Lightning Engine

Accelerate Apache Spark performance

Our vectorized engine offers an easier way to optimize Spark, delivering over 4.3x faster Spark performance* and reducing compute costs.

*The queries are derived from the TPC-DS standard and TPC-H standard and as such are not comparable to published TPC-DS standard and TPC-H standard results, as these runs do not comply with all requirements of the TPC-DS standard and TPC-H standard specification.

Apache Spark is a trademark of The Apache Software Foundation.

Features

Reduce job runtimes and lower costs

Experience a faster way to run Spark. Run your large-scale ETL, data science, and SQL workloads over 4.3x faster than open source Apache Spark. Shorter job runtimes mean less compute time, which lowers the total cost of ownership for your Spark workloads.
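The link between speedup and compute cost can be worked through with simple arithmetic. The sketch below is illustrative only: the 10-hour job, the hourly rate, and the `premium_multiplier` parameter are hypothetical values, not published pricing. It assumes cost scales linearly with runtime on the same resources.

```python
def accelerated_cost(baseline_hours, hourly_rate, speedup=4.3, premium_multiplier=1.0):
    """Estimate compute cost after a runtime speedup.

    premium_multiplier is a hypothetical factor for any price difference
    between tiers; 1.0 means the hourly rate is unchanged.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    accelerated_hours = baseline_hours / speedup
    return accelerated_hours * hourly_rate * premium_multiplier


# Hypothetical job: 10 hours at a rate of 8 currency units per hour.
baseline_cost = 10.0 * 8.0
fast_cost = accelerated_cost(10.0, 8.0)

# Fraction of compute time (and, at equal rates, cost) saved: 1 - 1/speedup.
savings_fraction = 1 - fast_cost / baseline_cost
print(f"savings: {savings_fraction:.1%}")
```

At equal hourly rates, a 4.3x speedup removes roughly 77% of compute time (1 - 1/4.3). If the accelerated tier costs more per hour, the job still comes out cheaper as long as the price multiplier stays below the speedup.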

Accelerate Spark performance

Discover an easier way to improve performance. Spend fewer valuable engineering cycles on optimizing Spark.

Intelligent data access and caching

Leverage a smarter architecture. Lightning Engine automatically caches hot data in memory and utilizes high-throughput, optimized connectors for Cloud Storage and BigQuery, significantly improving I/O latency and throughput for large-scale Spark data processing.
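To make the idea of caching hot data concrete, here is a minimal read-through cache sketch. It is not Lightning Engine's implementation: the page does not describe the caching policy, so least-recently-used eviction is an assumption, and `HotDataCache`, `slow_read`, and the file names are hypothetical.

```python
from collections import OrderedDict


class HotDataCache:
    """Read-through cache that keeps the most recently used entries in memory."""

    def __init__(self, loader, capacity):
        self._loader = loader          # function that fetches data on a miss
        self._capacity = capacity      # maximum number of cached entries
        self._entries = OrderedDict()  # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = self._loader(key)           # slow path: go to storage
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return value


def slow_read(path):
    """Stand-in for a remote object storage read (hypothetical)."""
    return f"contents of {path}"


cache = HotDataCache(slow_read, capacity=2)
for path in ["a.parquet", "a.parquet", "b.parquet", "a.parquet", "c.parquet", "b.parquet"]:
    cache.read(path)
print(cache.hits, cache.misses)
```

Repeated reads of the same file are served from memory, while files that fall out of use are evicted and must be fetched from storage again.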


The core technology: Vectorized execution

Lightning Engine leverages a native C++ vectorized execution engine to process data in batches, dramatically improving CPU efficiency over traditional row-by-row processing. This is a core component of its breakthrough Spark performance.
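The difference between row-by-row and batch processing can be sketched in a few lines. This is a conceptual illustration in Python, not the native C++ engine: the column names, sample data, and batch size are made up, and the sketch shows only the execution model, not the speedup a native engine gets from tight loops over contiguous columns.

```python
from array import array

# Hypothetical input rows: (price, quantity) per order line.
rows = [
    {"price": 10.0, "qty": 2},
    {"price": 5.0, "qty": 1},
    {"price": 2.5, "qty": 4},
    {"price": 1.0, "qty": 3},
]


def total_row_at_a_time(rows):
    """Filter and aggregate one row per step, as in a row-based engine."""
    total = 0.0
    for row in rows:
        if row["qty"] > 1:
            total += row["price"] * row["qty"]
    return total


def to_batches(rows, batch_size):
    """Split rows into columnar batches: one typed array per column."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        yield {
            "price": array("d", (r["price"] for r in chunk)),
            "qty": array("q", (r["qty"] for r in chunk)),
        }


def total_vectorized(batches):
    """Apply each operator to a whole batch of column values at a time."""
    total = 0.0
    for batch in batches:
        price, qty = batch["price"], batch["qty"]
        # Filter operator: build a selection vector of matching positions.
        selected = [i for i, q in enumerate(qty) if q > 1]
        # Projection and aggregation over the selected positions.
        total += sum(price[i] * qty[i] for i in selected)
    return total


row_result = total_row_at_a_time(rows)
batch_result = total_vectorized(to_batches(rows, batch_size=2))
print(row_result, batch_result)
```

Both functions compute the same total. The batch version touches one column at a time over many values, which is the access pattern that lets a native engine make better use of CPU caches and SIMD instructions.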


Availability

Lightning Engine is for your most demanding Spark workloads. You can access it with the premium tiers of Dataproc and Serverless for Apache Spark.

Availability: Generally available
Access: Dataproc on Google Compute Engine

Availability: In preview
Access: Coming soon

How It Works

Lightning Engine accelerates Spark data processing with a native C++ vectorized engine, intelligent caching, and optimized I/O. It processes data in batches for maximum CPU efficiency, reducing job runtimes and compute costs. Together, these optimizations deliver breakthrough Spark performance.

Common Uses

Ideal for your most demanding jobs

Large-scale ETL

Dramatically reduce the runtime of your most complex Spark data processing and transformation pipelines. This means you can meet tighter data freshness SLAs, shrink overnight batch windows, and significantly lower the TCO of your most resource-intensive data pipelines.

AI/ML data preparation

Accelerate the feature engineering and data preparation steps that are critical for your machine learning lifecycle. By speeding up the most time-consuming part of the ML workflow, your data scientists can run more experiments, iterate on models faster, and get valuable AI applications into production sooner.

Interactive analytics

Power fast, interactive SQL queries directly on your data lake for ad-hoc analysis and business intelligence. Empower your data analysts to maintain their train of thought with quicker query response times, leading to faster data exploration and more effective insights.

Pricing

Accelerated Spark, your way

Lightning Engine is a feature of the premium tiers of Dataproc and Google Cloud Serverless for Apache Spark.

Pricing: In preview, coming soon.

Pricing calculator

Estimate your monthly costs, including region-specific pricing and fees.

Custom quote

Connect with our sales team to get a custom quote for your organization.
