Revitalizing Atigeo's Data Stack with Advanced Analytics
Atigeo was a data analytics company turning science into big data products and services. They built xPatterns, a cloud-based big data analytics PaaS that provided abstraction mechanisms for building analytic applications, available as a managed cloud solution or deployed on the customer's own infrastructure.
Business Challenge
The client's vision was to use the latest technology to tackle big data challenges in the early days of the space. They needed to provide one-click deployments to their clients and develop complex pipelines to ingest, pre-process, and analyze data at scale.
Key Features
Built a scalable data pipeline capable of processing large volumes of unstructured data.
Ensured the infrastructure could scale to handle petabytes of data.
Provided real-time processing through streaming ingestion with Kafka and Spark Streaming.
Implemented monitoring and alerting across the platform.
Results
Used Hadoop HDFS to store large volumes of unstructured and semi-structured data in a distributed manner, enabling efficient processing of terabytes of data.
Leveraged AWS EMR with Apache Spark for both batch and real-time processing of large datasets, dynamically scaling based on data volume and processing needs.
Used Elasticsearch to store and query processed data (sentiment scores, keywords, and trends) in near real time.
Used Cassandra for time-series queries, providing fast response times at scale.
Used Kafka with Spark Streaming for real-time data ingestion.
Built ETL pipelines for several analytical applications using custom Hadoop/Spark jobs and Hive/Spark SQL scripts, as well as distributed jobs for moving data between HDFS/Tachyon and Cassandra/Solr.
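To illustrate the kind of aggregation a streaming job might perform before writing time-series rows, here is a minimal Python sketch. It is not Atigeo's code: the event shape (key, epoch seconds, score), the 60-second window, and all function names are assumptions made for the example, and it uses only the standard library rather than Spark, Kafka, or Cassandra.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 60  # hypothetical window size, chosen for illustration


def window_start(epoch_seconds, window=WINDOW_SECONDS):
    """Return the start (in epoch seconds) of the tumbling window holding the timestamp."""
    return epoch_seconds - (epoch_seconds % window)


def day_bucket(epoch_seconds):
    """Return the UTC calendar day as a string, usable as a coarse partition bucket."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%d")


def aggregate(events, window=WINDOW_SECONDS):
    """Count events and average their scores per (key, window start) pair.

    `events` is an iterable of (key, epoch_seconds, score) tuples (assumed shape).
    """
    totals = defaultdict(lambda: [0, 0.0])  # (key, window start) -> [count, score sum]
    for key, ts, score in events:
        bucket = (key, window_start(ts, window))
        totals[bucket][0] += 1
        totals[bucket][1] += score
    return {
        bucket: {"count": count, "avg_score": score_sum / count}
        for bucket, (count, score_sum) in totals.items()
    }


if __name__ == "__main__":
    sample = [("a", 0, 1.0), ("a", 59, 3.0), ("a", 60, 5.0), ("b", 61, 2.0)]
    for (key, start), stats in sorted(aggregate(sample).items()):
        print(key, day_bucket(start), start, stats)
```

Grouping by window start keeps every event for a given key and interval in a single row, which is one common way to shape data for time-series lookups.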
Tech Stack
Apache Hive
Mesos
Apache Hadoop
Python
Apache Spark
GitHub
Apache Cassandra
Java