Managing High Data Latency and Lack of Control Over Web Traffic Analytics
The client faced several issues with its third-party analytics provider: high data latency, limited control over web traffic, and difficulty adding or removing data fields. These limitations hindered the client’s ability to gain quick insights, detect anomalies, and improve performance in its CI/CD environments. Additionally, its largest dataset originated outside its AWS infrastructure, which made processing and delivering near-real-time analytics inefficient.
Solution | Improving Time to Insight for Clickstream Analytics by 48 Times Using Amazon Kinesis Data Streams
The company implemented a solution that uses Amazon Kinesis Data Streams and Spark Streaming to automate the extract, transform, and load (ETL) process, enabling near-real-time data ingestion and analysis. Data is processed every 10 seconds, which reduces latency from hours to minutes. The result was a 48-fold improvement in time to insight for clickstream data: query time fell from 4 hours to 5 minutes (240 minutes ÷ 5 minutes = 48).
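The following minimal, standard-library Python sketch makes the 10-second processing interval and the 48x figure concrete. It is not the company's pipeline, and it does not call Kinesis or Spark. In a setup like the one described, Spark Streaming would read records from the Kinesis stream and run an aggregation of this kind on each micro-batch. Spark cuts micro-batches by arrival time. For simplicity, this sketch buckets events into 10-second windows by their own timestamps. The JSON fields `ts`, `page`, and `user` and the functions `batch_clickstream` and `speedup` are hypothetical.

```python
import json
from collections import Counter, defaultdict

BATCH_SECONDS = 10  # processing interval stated in the case study


def batch_clickstream(raw_records, batch_seconds=BATCH_SECONDS):
    """Count page views per 10-second window.

    Each record is assumed (hypothetically) to be a JSON string with an
    epoch-seconds "ts" field and a "page" field.
    Returns {window_start_seconds: Counter(page -> views)}.
    """
    batches = defaultdict(Counter)
    for raw in raw_records:
        event = json.loads(raw)
        # Floor the timestamp to the start of its 10-second window.
        window_start = int(event["ts"] // batch_seconds) * batch_seconds
        batches[window_start][event["page"]] += 1
    return dict(batches)


def speedup(old_minutes, new_minutes):
    """Ratio of old to new time to insight."""
    return old_minutes / new_minutes


if __name__ == "__main__":
    records = [
        '{"ts": 1000.2, "page": "/home", "user": "u1"}',
        '{"ts": 1004.9, "page": "/cart", "user": "u2"}',
        '{"ts": 1009.9, "page": "/home", "user": "u3"}',
        '{"ts": 1010.0, "page": "/home", "user": "u1"}',
    ]
    for start, counts in sorted(batch_clickstream(records).items()):
        print(start, dict(counts))
    print(speedup(4 * 60, 5))  # → 48.0
```

The first three events fall in the window starting at 1000. The event at exactly 1010.0 opens the next window.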
This solution supports continuous integration and continuous delivery (CI/CD) and A/B testing, enabling near-instantaneous detection of issues and faster iteration on product features. With real-time web insights and JSON support, the company can quickly identify and resolve anomalies, optimize customer experiences, and personalize user interactions offline by analyzing and acting on behavioral data almost instantly.
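One way to surface such anomalies is to compare each 10-second batch's event count against the batches just before it. The sketch below uses a rolling z-score, a technique chosen here purely for illustration. The case study does not say which detection method the company uses. The function `flag_anomalies` and its `window` and `threshold` defaults are hypothetical.

```python
import statistics


def flag_anomalies(batch_counts, window=6, threshold=3.0):
    """Return indices of batches whose count departs sharply from recent history.

    batch_counts: event counts per micro-batch, in time order.
    window: number of preceding batches used as the baseline (hypothetical default).
    threshold: z-score above which a batch is flagged (hypothetical default).
    """
    flagged = []
    for i in range(window, len(batch_counts)):
        history = batch_counts[i - window:i]
        mean = statistics.mean(history)
        # Floor the spread at 1 so a perfectly flat baseline cannot divide by zero.
        spread = max(statistics.pstdev(history), 1.0)
        if abs(batch_counts[i] - mean) / spread > threshold:
            flagged.append(i)
    return flagged
```

For example, a steady stream of about 100 events per batch that suddenly drops to 12 (say, after a deployment breaks a tracking tag) would be flagged: `flag_anomalies([100, 102, 98, 101, 99, 100, 12, 100])` returns `[6]`. A production system would likely also account for time-of-day seasonality, which this sketch ignores.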
“Using Amazon Kinesis Data Streams provides data to the appropriate teams in a consumable manner and reduces all friction points.”
Distinguished Engineer