The e-commerce sample application illustrates common use cases and best practices for implementing streaming data analytics and real-time AI. Use it to learn how to respond dynamically to customer actions by analyzing events in real time, and how to store, analyze, and visualize that event data for longer-term insights.
The application is implemented in Java, and uses the following products:
- Cloud Bigtable
The sample application is available on GitHub at retail-java-applications.
The application was designed to address the following requirements:
- Validate incoming data and apply corrections to it where possible.
- Analyze clickstream data to keep a count of the number of views per product in a given time period. Store this information in a low latency store where the application can use it to provide 'number of people who viewed this product' messages to customers on the web site.
- Use transaction data to inform inventory ordering:
  - Analyze transaction data to calculate the total number of sales for each item, both by store and globally, for a given period.
  - Analyze inventory data to calculate the incoming inventory for each item.
  - Pass this data to inventory systems on a continuous basis so it can be used for inventory purchasing decisions.
- Validate incoming data and apply corrections to it where possible. Write any uncorrectable data to a dead letter queue for additional analysis and processing. Expose a metric for monitoring and alerting that represents the percentage of incoming data sent to the dead letter queue.
- Process all incoming data into a standard format and store it in a data warehouse to use for future analysis and visualization.
- Denormalize transaction data for in-store sales so that it can include information like the latitude and longitude of the store location. Provide the store information through a slowly changing table in BigQuery, using the store ID as a key.
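The validation and dead letter requirement above can be illustrated with a small sketch. This is plain Java with no Apache Beam dependency, so it shows only the routing logic and the percentage metric, not how the sample application implements them. The `ClickEvent` class, its fields, and the correction and validation rules are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Plain-Java sketch of "validate, correct where possible, dead-letter the rest".
// All names and rules here are assumptions made for illustration.
final class DeadLetterSketch {

  // Hypothetical event shape; the sample application's real schema may differ.
  static final class ClickEvent {
    final String productId;
    final String eventType;
    final long timestampMillis;

    ClickEvent(String productId, String eventType, long timestampMillis) {
      this.productId = productId;
      this.eventType = eventType;
      this.timestampMillis = timestampMillis;
    }
  }

  // Holds both outputs and derives the dead letter percentage metric.
  static final class Result {
    final List<ClickEvent> valid = new ArrayList<>();
    final List<ClickEvent> deadLetter = new ArrayList<>();

    double deadLetterPercent() {
      int total = valid.size() + deadLetter.size();
      return total == 0 ? 0.0 : 100.0 * deadLetter.size() / total;
    }
  }

  // Assumed correction rule: normalize whitespace and case of the event type.
  static ClickEvent correct(ClickEvent e) {
    String type = e.eventType == null ? null : e.eventType.trim().toLowerCase(Locale.ROOT);
    return new ClickEvent(e.productId, type, e.timestampMillis);
  }

  // Assumed validation rule: all fields present and the timestamp positive.
  static boolean isValid(ClickEvent e) {
    return e.productId != null && !e.productId.trim().isEmpty()
        && e.eventType != null && !e.eventType.isEmpty()
        && e.timestampMillis > 0;
  }

  static Result process(List<ClickEvent> incoming) {
    Result result = new Result();
    for (ClickEvent original : incoming) {
      ClickEvent corrected = correct(original);
      if (isValid(corrected)) {
        result.valid.add(corrected);
      } else {
        // Keep the original, uncorrected event so it can be analyzed later.
        result.deadLetter.add(original);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Result r = process(Arrays.asList(
        new ClickEvent("sku-1", " VIEW ", 1000L), // correctable
        new ClickEvent("sku-2", "view", 2000L),   // already valid
        new ClickEvent(null, "view", 3000L),      // uncorrectable: no product ID
        new ClickEvent("sku-3", "view", -1L)));   // uncorrectable: bad timestamp
    System.out.println(r.valid.size() + " valid, "
        + r.deadLetterPercent() + "% sent to the dead letter queue");
  }
}
```

In a streaming pipeline the two lists would typically be separate outputs of one transform, and the percentage would be computed per time window rather than over a whole collection.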
The application processes the following types of data:
- Clickstream data being sent by Newkick's web interface.
- Transaction data being sent by on-premises or software-as-a-service (SaaS) systems.
- Inventory data being sent by on-premises or SaaS systems.
The application includes task patterns that show recommended ways to accomplish Java programming tasks commonly needed when building this type of application.
It contains the following task patterns:
- Using Apache Beam schemas to work with structured data
- Using JsonToRow to convert JSON data
- Using the AutoValue code generator to generate plain old Java objects (POJOs)
- Queuing unprocessable data for further analysis
- Applying data validation transforms serially
- Using DoFn.StartBundle to micro-batch calls to external services
- Using an appropriate side-input pattern
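As an illustration of the micro-batching pattern in the list above, the following sketch buffers elements and sends them to an external service in groups instead of one call per element. It is plain Java and does not use Apache Beam: the method names only mirror the bundle lifecycle callbacks of a Beam `DoFn` (start bundle, process element, finish bundle). The class name, `maxBatchSize`, and the `batchCall` consumer are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java sketch of micro-batching across a bundle of elements.
// This is not Beam code; it only imitates the shape of the lifecycle.
final class MicroBatchSketch<T> {
  private final int maxBatchSize;
  private final Consumer<List<T>> batchCall; // stands in for one external service call
  private List<T> buffer;

  MicroBatchSketch(int maxBatchSize, Consumer<List<T>> batchCall) {
    if (maxBatchSize < 1) {
      throw new IllegalArgumentException("maxBatchSize must be at least 1");
    }
    this.maxBatchSize = maxBatchSize;
    this.batchCall = batchCall;
  }

  // Called once before the first element of a bundle: start with an empty buffer.
  void startBundle() {
    buffer = new ArrayList<>();
  }

  // Called per element (after startBundle): buffer it, flush when the batch is full.
  void processElement(T element) {
    buffer.add(element);
    if (buffer.size() >= maxBatchSize) {
      flush();
    }
  }

  // Called once after the last element: send whatever is still buffered.
  void finishBundle() {
    flush();
  }

  private void flush() {
    if (buffer.isEmpty()) {
      return;
    }
    // Pass a copy so the receiver is unaffected when the buffer is cleared.
    batchCall.accept(new ArrayList<>(buffer));
    buffer.clear();
  }

  public static void main(String[] args) {
    List<List<String>> calls = new ArrayList<>();
    MicroBatchSketch<String> batcher = new MicroBatchSketch<>(2, calls::add);
    batcher.startBundle();
    for (String id : new String[] {"a", "b", "c"}) {
      batcher.processElement(id);
    }
    batcher.finishBundle();
    System.out.println(calls); // prints [[a, b], [c]]
  }
}
```

Flushing in the finish step matters: without it, a bundle whose size is not a multiple of the batch size would silently drop its last elements.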