Big data and analytics resources

Last reviewed 2024-11-01 UTC

The Architecture Center provides resources on a wide variety of big data and analytics subjects.

Big data and analytics resources in the Architecture Center


Analyzing FHIR data in BigQuery

Explains the processes and considerations for analyzing Fast Healthcare Interoperability Resources (FHIR) data in BigQuery.

Products used: BigQuery

Architecture and functions in a data mesh

A series that describes how to implement a data mesh that is internal to an organization.

Build an ML vision analytics solution with Dataflow and Cloud Vision API

Shows how to deploy a Dataflow pipeline that processes large-scale image files with Cloud Vision. Dataflow stores the results in BigQuery so that you can use them to train BigQuery ML pre-built models.

Products used: BigQuery, Cloud Build, Cloud Pub/Sub, Cloud Storage, Cloud Vision, Dataflow

Cloud Monitoring metric export

Describes a way to export Cloud Monitoring metrics for long-term analysis.

Products used: App Engine, BigQuery, Cloud Monitoring, Cloud Pub/Sub, Cloud Scheduler, Datalab, Looker Studio

Continuous data replication to BigQuery using Striim

Demonstrates how to migrate a MySQL database to BigQuery using Striim. Striim is a comprehensive streaming extract, transform, and load (ETL) platform.

Products used: BigQuery, Cloud SQL for MySQL, Compute Engine

Continuous data replication to Spanner using Striim

Demonstrates how to migrate a MySQL database to Spanner using Striim.

Products used: Cloud SQL, Cloud SQL for MySQL, Compute Engine, Spanner

Data science with R on Google Cloud: Exploratory data analysis

Shows you how to get started with data science at scale with R on Google Cloud. This document is intended for those who have some experience with R and with Jupyter notebooks, and who are comfortable with SQL.

Products used: BigQuery, Cloud Storage, Notebooks, Vertex AI

Data transformation between MongoDB Atlas and Google Cloud

Describes data transformation between MongoDB Atlas, used as the operational data store, and BigQuery, used as the analytics data warehouse.

Products used: BigQuery, Cloud Pub/Sub, Dataflow

De-identification and re-identification of PII in large-scale datasets using Sensitive Data Protection

Discusses how to use Sensitive Data Protection to create an automated data transformation pipeline to de-identify sensitive data like personally identifiable information (PII).

Products used: BigQuery, Cloud Pub/Sub, Cloud Storage, Dataflow, Identity and Access Management, Sensitive Data Protection

Geospatial analytics architecture

Learn about Google Cloud geospatial capabilities and how you can use these capabilities in your geospatial analytics applications.

Products used: BigQuery, Dataflow

Import data from an external network into a secured BigQuery data warehouse

Describes an architecture that you can use to help secure a data warehouse in a production environment, and provides best practices for importing data into BigQuery from an external network such as an on-premises environment.

Products used: BigQuery

Import data from Google Cloud into a secured BigQuery data warehouse

Describes an architecture that you can use to help secure a data warehouse in a production environment, and provides best practices for data governance of a data warehouse in Google Cloud.

Products used: BigQuery, Cloud Key Management Service, Dataflow, Sensitive Data Protection

Jump Start Solution: Analytics lakehouse

Unify data lakes and data warehouses by creating an analytics lakehouse using BigQuery to store, process, analyze, and activate data.

Jump Start Solution: Data warehouse with BigQuery

Build a data warehouse with a dashboard and visualization tool using BigQuery.

Migrate to Google Cloud

Helps you plan, design, and implement the process of migrating your application and infrastructure workloads to Google Cloud, including computing, database, and storage workloads.

Products used: App Engine, Cloud Build, Cloud Data Fusion, Cloud Deployment Manager, Cloud Functions, Cloud Run, Cloud Storage, Container Registry, Data Catalog, Dataflow, Direct Peering, Google Kubernetes Engine (GKE), Transfer Appliance

Migrating On-Premises Hadoop Infrastructure to Google Cloud

Guidance on moving on-premises Hadoop workloads to Google Cloud...

Products used: BigQuery, Cloud Storage, Dataproc

Scalable BigQuery backup automation

Build a solution to automate recurrent BigQuery backup operations at scale, with two backup methods: BigQuery snapshots and exports to Cloud Storage.

Products used: BigQuery, Cloud Logging, Cloud Pub/Sub, Cloud Run, Cloud Scheduler, Cloud Storage

Security log analytics in Google Cloud

Shows how to collect, export, and analyze logs from Google Cloud to help you audit usage and detect threats to your data and workloads. Use the included threat detection queries for BigQuery or Chronicle, or bring your own SIEM.

Products used: BigQuery, Cloud Logging, Compute Engine, Looker Studio

Use a CI/CD pipeline for data-processing workflows

Describes how to set up a continuous integration/continuous deployment (CI/CD) pipeline for processing data by implementing CI/CD methods with managed products on Google Cloud.

Products used: Cloud Build, Cloud Composer, Cloud Source Repositories, Cloud Storage, Compute Engine, Dataflow

Use Apache Hive on Dataproc

Shows how to use Apache Hive on Dataproc in an efficient and flexible way by storing Hive data in Cloud Storage and hosting the Hive metastore in a MySQL database on Cloud SQL.

Products used: Cloud SQL, Cloud Storage, Dataproc