
Power self-serve analytics and generative AI with Sparkflows and Google Cloud

February 13, 2024
Jayant Shekhar

Co-founder/CEO, Sparkflows

Maruti C

Global Partner Architect, Google

Self-service analytics powered by ML and generative AI is the new holy grail for data-driven enterprises, enabling enhanced decision-making through predictive insights, and providing a significant boost in operational efficiency and innovation. C-level executives increasingly see self-service analytics as the key driver of employee productivity and business efficiency.

Today, technical practitioners use a variety of open-source libraries, including Apache Spark, Ray, pandas, scikit-learn, H2O and many more, to create analytics and ML applications. This requires writing a lot of code, and these tools have a steep learning curve. Building secure, scalable front-end interfaces that let business users interact with these systems also takes a long time.

Enterprises also struggle to hire and retain data-science experts, and they incur overhead costs from managing a large number of heterogeneous tools and technologies. Handling a growing variety and volume of data from siloed sources is a major barrier to analytics initiatives, and the lack of seamless workload scaling slows the development of business solutions.

Analytics is best democratized, and ML applications best built, when business users and IT teams are empowered with cloud services through intuitive, easy-to-use workflows, analytical apps and conversational interfaces.

This creates a strong need for a unified self-service platform, built for all users, for creating and launching cloud-powered business solutions.

Sparkflows

Sparkflows is a Google Cloud partner that provides a powerful platform packed with self-service analytics, ML and gen AI capabilities for building data products. Sparkflows helps integrate diverse open-source technologies through intuitive, user-driven interfaces.

With Sparkflows, data analytics teams can turbocharge the development of ETL, exploratory analytics, feature engineering, ML models and gen AI apps using 460+ no-code/low-code processors and various workbenches, as shown below.

https://storage.googleapis.com/gweb-cloudblog-publish/images/sf1.max-1400x1400.png

Various AI and gen AI workbenches in Sparkflows

Self-service with Sparkflows and Google Cloud

Sparkflows running on Google Cloud provides unified self-serve data science capabilities with connectivity to BigQuery, Vertex AI, AlloyDB and Cloud Storage. The solution automatically pushes computation down to high-performance distributed job execution engines such as Dataproc and BigQuery. These automated integrations let business solutions scale to very large datasets.
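To make the pushdown idea concrete, here is a minimal sketch, assuming a hypothetical `GroupByNode` that stands in for a visual "group by + aggregate" step. Instead of pulling raw rows out of BigQuery, the node is compiled into a single SQL statement that BigQuery would execute where the data lives. The class, helper and table names are illustrative only; this is not Sparkflows' actual code generator.

```python
from dataclasses import dataclass, field

# Aggregate functions this sketch is willing to emit.
ALLOWED_FUNCS = {"SUM", "AVG", "COUNT", "MIN", "MAX"}


@dataclass
class GroupByNode:
    """Hypothetical stand-in for a visual 'group by + aggregate' workflow node."""
    table: str  # fully qualified BigQuery table, e.g. "project.dataset.table"
    group_by: list
    # Output column name -> (aggregate function, input column)
    aggregates: dict = field(default_factory=dict)


def to_pushdown_sql(node):
    """Compile the node into one SQL statement so the aggregation runs inside BigQuery."""
    select_parts = list(node.group_by)
    for alias, (func, column) in node.aggregates.items():
        func = func.upper()
        if func not in ALLOWED_FUNCS:
            raise ValueError(f"unsupported aggregate: {func}")
        select_parts.append(f"{func}({column}) AS {alias}")
    if not select_parts:
        raise ValueError("node selects no columns")
    sql = f"SELECT {', '.join(select_parts)} FROM `{node.table}`"
    if node.group_by:
        sql += f" GROUP BY {', '.join(node.group_by)}"
    return sql
```

For a node that groups `my-project.retail.transactions` by `customer_id` with `SUM(amount)` and `COUNT(*)`, only one aggregated row per customer would leave BigQuery instead of every transaction. Column names are interpolated directly here, so a real generator would also validate or quote identifiers.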

https://storage.googleapis.com/gweb-cloudblog-publish/images/sf2.max-1900x1900.png

Interaction diagram: Sparkflows and Google Cloud

Sparkflows has developed a large number of solutions for the sales and marketing, manufacturing and supply chain departments of retail and CPG customers.

Business scenarios using Sparkflows and Google Cloud

Let’s assume the engineering team of a retail company needs to give the marketing team a self-service analytics tool that can identify customers who are likely to churn and measure campaign effectiveness by analyzing coupon responsiveness, sales and demographic data.
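One simple way to quantify coupon responsiveness is the redemption rate per campaign: coupons redeemed divided by coupons issued. The sketch below assumes a hypothetical flat layout with one record per issued coupon, carrying a `campaign_id` and a boolean `redeemed` flag; the scenario does not specify the real schemas.

```python
from collections import defaultdict


def redemption_rate_by_campaign(coupon_records):
    """Return {campaign_id: redeemed / issued} from coupon-level records.

    Each record is assumed (hypothetically) to be a dict with 'campaign_id'
    and a boolean 'redeemed' flag, with one record per issued coupon.
    """
    issued = defaultdict(int)
    redeemed = defaultdict(int)
    for rec in coupon_records:
        issued[rec["campaign_id"]] += 1
        if rec["redeemed"]:
            redeemed[rec["campaign_id"]] += 1
    # Every campaign in `issued` has at least one coupon, so no division by zero.
    return {cid: redeemed[cid] / issued[cid] for cid in issued}
```

Comparing these rates across campaigns, alongside sales and demographic breakdowns, is one starting point for judging which campaigns actually moved customers.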

The team needs to ingest and prepare data quickly and build ML models, analytics reports and gen AI apps in an automated fashion, with Spark code generated automatically and jobs submitted to a Dataproc cluster without manual effort.
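To illustrate what "Spark code will be generated" can mean, here is a minimal sketch that turns a linear list of hypothetical workflow steps into PySpark source text. The step kinds and the `generate_pyspark` helper are assumptions made for this example, not Sparkflows' format. The emitted code assumes the Spark BigQuery connector is available on the cluster, which is what `spark.read.format('bigquery')` relies on.

```python
def generate_pyspark(steps):
    """Emit PySpark source for a linear workflow of (kind, params) steps.

    Supported kinds are hypothetical: 'read_bigquery', 'filter', 'write_gcs_parquet'.
    """
    lines = [
        "from pyspark.sql import SparkSession",
        "spark = SparkSession.builder.appName('generated-workflow').getOrCreate()",
    ]
    for kind, params in steps:
        if kind == "read_bigquery":
            # Requires the Spark BigQuery connector at run time.
            lines.append(f"df = spark.read.format('bigquery').load({params['table']!r})")
        elif kind == "filter":
            # SQL-style condition string, evaluated by Spark.
            lines.append(f"df = df.filter({params['condition']!r})")
        elif kind == "write_gcs_parquet":
            lines.append(f"df.write.mode('overwrite').parquet({params['path']!r})")
        else:
            raise ValueError(f"unsupported step: {kind}")
    return "\n".join(lines) + "\n"
```

Generating source text (rather than calling Spark directly) keeps the output inspectable and lets the same script be submitted unchanged to any Spark environment.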

Installation

As the first step, Sparkflows is installed inside the customer’s secure VPC network either on a virtual machine or in a container running in Google Cloud. Sparkflows runs securely with built-in SSO integration.

Configuration

Admin users configure Dataproc Serverless for Spark and various LLM services, such as the PaLM API, in the Sparkflows admin console.

https://storage.googleapis.com/gweb-cloudblog-publish/images/sf3.max-1200x1200.png

Self-service solution design & execution

Sparkflows enables a unified experience for continuous machine learning.

https://storage.googleapis.com/gweb-cloudblog-publish/images/sf4.max-1400x1400.jpg

Let’s now walk through the steps required to identify customers who are likely to churn and to analyze customer reviews to measure satisfaction.

Sparkflows connects with various Google Cloud services to perform these operations (see Interaction diagram: Sparkflows and Google Cloud).

Datasets

In this example, the datasets (customer transactions, campaigns, coupons and demographic info) are stored in BigQuery, and product review data is in Cloud Storage. Business users can select a domain such as retail and then view, within Sparkflows, all the datasets stored in Google Cloud. Users can browse files in Cloud Storage and explore and query BigQuery tables. The Sparkflows dataset explorer also connects seamlessly with Data Catalog.

https://storage.googleapis.com/gweb-cloudblog-publish/images/sf5.max-900x900.png
https://storage.googleapis.com/gweb-cloudblog-publish/images/sf6.max-800x800.png

Data preparation

Users can rapidly design workflows that ingest the datasets and perform data profiling, automated quality checks, cleaning and exploratory analysis using 350+ no-code/low-code data preparation processors. These workflows automate Spark code generation and functionality development for the business solution, cutting engineering time from weeks to hours.
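The profiling, quality-check and cleaning steps can be sketched in a few functions. This is a minimal, standard-library illustration over a list of dict rows; the rule names and the "drop any failing row" cleaning policy are assumptions for the example, and in the scenario the equivalent logic would run as generated Spark code.

```python
def profile_columns(rows):
    """Compute simple per-column stats: row count, null count and distinct count."""
    columns = sorted({col for row in rows for col in row})
    profile = {}
    for col in columns:
        values = [row.get(col) for row in rows]  # a missing key counts as null
        non_null = [v for v in values if v is not None]
        profile[col] = {
            "count": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
        }
    return profile


def run_quality_checks(rows, rules):
    """Apply row-level rules and return {rule_name: number of failing rows}.

    `rules` maps a rule name to a predicate that returns True for a valid row.
    """
    return {name: sum(1 for row in rows if not check(row)) for name, check in rules.items()}


def clean(rows, rules):
    """Keep only rows that pass every rule (one simple cleaning policy among many)."""
    return [row for row in rows if all(check(row) for check in rules.values())]


# Hypothetical rules for a transactions table.
RULES = {
    "customer_id_present": lambda r: r.get("customer_id") is not None,
    "amount_non_negative": lambda r: r.get("amount") is not None and r["amount"] >= 0,
}
```

Keeping rules as named predicates means the same definitions can feed both the quality report (failure counts) and the cleaning step.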

Each visual workflow automatically produces a Spark job that is launched on Dataproc Serverless. Dataproc Serverless is well suited to these jobs: it is a high-performance distributed computing platform that can quickly spin up additional compute resources as needed, and it is cost-effective because customers are billed only for the resources used during job execution.
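For readers curious about what launching on Dataproc Serverless involves underneath, the sketch below assembles the endpoint and JSON body for the Dataproc v1 `batches.create` REST method for a PySpark batch. It only builds the request; it does not authenticate or send it, and it is not a description of how Sparkflows submits jobs. The project, region, script URI, subnetwork and service account are placeholders.

```python
import json


def build_pyspark_batch_request(project, region, main_py_uri, args=None,
                                subnetwork_uri=None, service_account=None):
    """Return (url, body) for a Dataproc Serverless PySpark batch.

    Field names follow the Dataproc v1 Batch resource (pysparkBatch,
    environmentConfig.executionConfig); all values passed in are placeholders.
    """
    url = (f"https://dataproc.googleapis.com/v1/projects/{project}"
           f"/locations/{region}/batches")
    body = {"pysparkBatch": {"mainPythonFileUri": main_py_uri, "args": list(args or [])}}
    execution_config = {}
    if subnetwork_uri:
        execution_config["subnetworkUri"] = subnetwork_uri
    if service_account:
        execution_config["serviceAccount"] = service_account
    if execution_config:
        body["environmentConfig"] = {"executionConfig": execution_config}
    return url, body


if __name__ == "__main__":
    url, body = build_pyspark_batch_request(
        "my-project", "us-central1", "gs://my-bucket/jobs/churn_prep.py",
        args=["--date", "2024-02-01"], subnetwork_uri="default",
    )
    print("POST", url)
    print(json.dumps(body, indent=2))
```

In practice the same batch is often submitted with `gcloud dataproc batches submit pyspark` and a `--region` flag, which handles authentication; there is no cluster to size or shut down, which is where the pay-for-execution-time billing comes from.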

ML model training

Data scientists and analysts can perform feature engineering to calculate aggregated metrics from the data processed by the workflows designed in the previous steps. Developers can use 80+ no-code/low-code ML processors to create an ML modeling workflow. The features are then used to train a model that predicts which customers are most likely to churn.
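As an illustration of these feature-engineering and training steps, the sketch below aggregates per-customer features from transaction records and fits a plain logistic regression with batch gradient descent. It uses only the standard library so it runs anywhere. The record layout, the feature choice (recency, order count, coupon redemption rate) and the hyperparameters are assumptions, and in the scenario this work would run as generated Spark code rather than in a single Python process.

```python
import math
from collections import defaultdict


def customer_features(transactions, as_of_day):
    """Aggregate per-customer features from transaction dicts.

    Each transaction is assumed to have 'customer_id', 'day' (an integer day
    number) and 'used_coupon' (bool). Returns {id: [recency, orders, coupon_rate]}.
    """
    last_day, orders, coupons = {}, defaultdict(int), defaultdict(int)
    for t in transactions:
        cid = t["customer_id"]
        last_day[cid] = max(last_day.get(cid, t["day"]), t["day"])
        orders[cid] += 1
        coupons[cid] += int(t["used_coupon"])
    return {cid: [as_of_day - last_day[cid], orders[cid], coupons[cid] / orders[cid]]
            for cid in orders}


def standardize(X):
    """Scale each column to zero mean and unit variance; return (scaled, means, stds)."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]
    return scaled, means, stds


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """Churn probability for one standardized feature vector; w[0] is the bias."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights (bias first) by batch gradient descent on the mean log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w
```

Standardizing first keeps recency (measured in days) from dominating the gradient, so a single learning rate works for all three features. The means and standard deviations must be kept alongside the weights so that new customers are scaled the same way at prediction time.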

Features based on purchase patterns and coupon redemption are also used to segment customers.
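Segmentation is often done with a clustering algorithm. The sketch below is a plain k-means (Lloyd's algorithm) over two assumed per-customer features, purchase frequency and coupon redemption rate. The source does not say which segmentation method Sparkflows uses, so treat this as one possible choice. The initial centroids are simply the first k points, purely to keep the example deterministic.

```python
def kmeans(points, k, iterations=20):
    """Lloyd's k-means on lists of floats; returns (centroids, labels)."""
    if k > len(points):
        raise ValueError("k cannot exceed the number of points")
    centroids = [list(p) for p in points[:k]]  # deterministic init for the example
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((pi - ci) ** 2 for pi, ci in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels
```

With points such as `[orders_per_month, coupon_rate]`, this separates occasional, low-redemption buyers from frequent coupon users. Real pipelines would usually standardize the features first (so frequency does not dominate the distance) and use a smarter initialization such as k-means++.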

ML model prediction

Below is an example of the Prediction workflow for churn prediction.

The Prediction workflow can be triggered manually, via the built-in scheduler, through the API, or using the Analytical App UI.
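At its core, a prediction workflow loads a trained model, scores new records and outputs the ones that matter. The sketch below assumes a hypothetical logistic model given as a bias plus per-feature weights, together with the feature means and standard deviations used at training time. The weights and the 0.5 threshold are illustrative, not values from this scenario.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score_customers(customers, weights, bias, means, stds, threshold=0.5):
    """Return [(customer_id, churn_probability)] at or above `threshold`,
    highest risk first.

    `customers` maps customer_id -> raw feature list; features are standardized
    with the training-time `means` and `stds` before scoring.
    """
    flagged = []
    for cid, features in customers.items():
        z = bias + sum(w * (x - m) / s for w, x, m, s in zip(weights, features, means, stds))
        prob = _sigmoid(z)
        if prob >= threshold:
            flagged.append((cid, prob))
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged
```

Whether the workflow is triggered manually, by the scheduler or through the API, the scoring logic is the same; only the trigger and the destination of the ranked list change.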

https://storage.googleapis.com/gweb-cloudblog-publish/images/sf7.max-1500x1500.png

ML Model Prediction Workflow

Visualization - descriptive and predictive analytics

Business users can drag workflow nodes into the report designer UI to create reports that let data scientists inspect profiling stats, data quality results, exploratory insights, training metrics and prediction outputs.

When the underlying workflows are executed in a Dataproc cluster, the reports are automatically refreshed.

https://storage.googleapis.com/gweb-cloudblog-publish/images/sf8.max-900x900.png

Reports of descriptive and predictive analytics

Business analytical apps

Business analytical apps in Sparkflows let business users build interactive front-end applications for data products, which they then use from their browsers.

Gen AI apps

Now, let’s build a few gen AI apps that allow the business team to perform the following operations:

  • Ask questions from the product review data
  • Summarize, extract topics and translate texts
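Behind operations like these there is usually a task-specific prompt sent to the configured LLM. The sketch below shows one way to keep such prompts as templates. The template wording and the `build_prompt` helper are hypothetical, and the actual call to the Vertex AI model is deliberately left out.

```python
# Hypothetical prompt templates, one per gen AI operation.
TEMPLATES = {
    "summarize": "Summarize the following text in {max_sentences} sentences:\n\n{text}",
    "topics": "List the main topics discussed in the following text, one per line:\n\n{text}",
    "translate": "Translate the following text into {target_language}:\n\n{text}",
}


def build_prompt(task, text, **params):
    """Fill the template for `task`; raises KeyError for unknown tasks or missing params."""
    if task not in TEMPLATES:
        raise KeyError(f"unknown task: {task}")
    # Only the template is formatted, so braces inside `text` are left untouched.
    return TEMPLATES[task].format(text=text, **params)
```

Keeping templates as data rather than hard-coded strings makes it easy for an app to expose new operations, or tune wording, without changing code.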

The first step is to configure the Vertex AI PaLM API connection in the admin console and then select that connection in the Analytical App.

https://storage.googleapis.com/gweb-cloudblog-publish/images/sf9.max-700x700.png
  • Allow users to query product reviews and gain insights
https://storage.googleapis.com/gweb-cloudblog-publish/images/sf10.max-1300x1300.png
  • Allow users to translate and query documents
https://storage.googleapis.com/gweb-cloudblog-publish/images/sf11.max-1100x1100.png
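Question answering over product reviews is commonly done by retrieving the most relevant reviews and passing them to the LLM as context. The minimal sketch below ranks reviews by simple word overlap with the question and assembles a grounded prompt. A production app would more likely use embeddings, and this is not a description of how Sparkflows implements the feature. The stopword list is an assumption, and the LLM call itself is omitted.

```python
import re

# Small, illustrative stopword list so common words do not count as matches.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "and", "in",
             "on", "it", "do", "does", "how", "what", "why"}


def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return set(re.findall(r"[a-z0-9']+", text.lower())) - STOPWORDS


def top_reviews(question, reviews, k=3):
    """Rank reviews by words shared with the question; drop zero-overlap reviews."""
    q = tokenize(question)
    scored = [(len(q & tokenize(r)), i, r) for i, r in enumerate(reviews)]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))  # more overlap first, then original order
    return [r for _, _, r in scored[:k]]


def build_qa_prompt(question, reviews, k=3):
    """Assemble a prompt that asks the model to answer only from retrieved reviews."""
    context = "\n".join(f"- {r}" for r in top_reviews(question, reviews, k))
    return (
        "Answer the question using only the product reviews below. "
        "If they do not contain the answer, say so.\n\n"
        f"Reviews:\n{context}\n\nQuestion: {question}"
    )
```

Restricting the model to retrieved reviews keeps answers grounded in the customer's own data and keeps prompts small even when the review corpus is large.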

This is how Sparkflows helps sales and marketing teams of a retail company identify potential customer churn, measure campaign effectiveness, find target customer segments, and analyze product reviews and business documents.

ML solutions

https://storage.googleapis.com/gweb-cloudblog-publish/images/sf12.max-1400x1400.jpg

Sparkflows also enables a wide range of gen AI apps, from content synthesis, content generation and natural-language-query (NLQ) reports to prompt-based business solutions.

Generative AI solutions

https://storage.googleapis.com/gweb-cloudblog-publish/images/sf13.max-1900x1900.png

Better together

Being able to move fast with AI and generative AI is valuable to enterprises of all types. The partnership between Sparkflows and Google Cloud puts powerful, affordable self-serve AI and gen AI capabilities in the hands of users in a secure and scalable way. Building gen AI solutions with Sparkflows and Google Cloud is affordable thanks to Vertex AI’s cost-effective gen AI pricing model and Sparkflows’ discounted pricing package. Overall, Sparkflows with Google Cloud drives operational efficiency, accelerates business solutions and speeds up time to market, thereby propelling business growth.

Try out Sparkflows

Here are a few links to get started with Sparkflows and Google Cloud:


We thank the many Google Cloud and Sparkflows team members who contributed to this collaboration, especially Kaniska Mandal and Deb Dasgupta for their guidance during the process.
