Google Cloud system design considerations

This section of the architecture framework explains specific Google Cloud features and services that you can combine in different ways to optimize your deployment for your business needs.

The framework consists of the following series of articles:

Geographic zones and regions

Regions are independent geographic areas that consist of multiple zones. A zone is a deployment area for Google Cloud resources within a region, and each zone should be considered a single failure domain. To deploy fault-tolerant applications with high availability, deploy your applications across multiple zones in a single region, or across zones in multiple regions, to help protect against both expected and unexpected downtime.
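
If you want to enumerate the zones that are available to a project before planning a multi-zone deployment, you can do so programmatically. The following is a minimal sketch, assuming the google-cloud-compute Python client and a placeholder project ID.

```python
from google.cloud import compute_v1

# List the zones available to a project, grouped by region, as a starting
# point for planning a multi-zone or multi-region deployment.
# "my-project" is a placeholder project ID.
client = compute_v1.ZonesClient()
zones_by_region = {}
for zone in client.list(project="my-project"):
    region = zone.region.rsplit("/", 1)[-1]  # the region is returned as a URL
    zones_by_region.setdefault(region, []).append(zone.name)

for region, zones in sorted(zones_by_region.items()):
    print(f"{region}: {', '.join(sorted(zones))}")
```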

Google Cloud services and resources can be zonal, regional, managed by Google across multiple regions, or global:

Zonal resources

Zonal resources operate within a single zone. If a zone becomes unavailable, all resources in that zone are unavailable until service is restored. Virtual machine (VM) instances and zonal persistent disks are examples of zonal resources.

Regional resources

Regional resources are deployed redundantly across multiple zones within a region, which gives them higher availability than zonal resources. App Engine and regional Cloud Storage buckets are examples of regional resources.

Multi-regional resources

Multi-regional resources are managed by Google to be redundant and distributed within and across regions. These services optimize availability, performance, and resource efficiency; as a result, they require tradeoffs between latency and consistency, which are documented on a product-specific basis.

Global resources

Like multi-regional resources, global resources are redundant and distributed within and across regions, but they are not tied to a given region or set of regions. Load balancers, Pub/Sub, and Speech-to-Text are examples of global resources.

Design questions

  • In which geographic regions are your applications' users located?
  • Which Google Cloud regions are closest to your users?
  • Do you have any regulatory requirements based on geography?
  • Do you need global deployment or will a regional deployment meet your requirements?

Recommendations

  • Select a region or set of regions geographically closest to your end users to minimize latency when serving traffic to external users.
  • Select a specific region or set of regions to meet any geographic regulatory requirements.
  • Use a global load balancer to provide a single anycast IP address that routes traffic to your application when you serve a global user base.
  • Connect your on-premises or colocation networks to Google Cloud through Cloud Interconnect for high-speed, private network connections.

Resource management

Google Cloud provides resource containers such as organizations, folders, and projects that allow you to group and hierarchically organize Google Cloud resources. This hierarchical organization lets you manage common aspects of your resources, such as access control, configuration settings, and policies. Resource Manager provides programmatic access to the resource containers.

The purpose of the Google Cloud resource hierarchy is two-fold:

  • To provide a hierarchy of ownership, which binds the lifecycle of a resource to its immediate parent in the hierarchy.
  • To provide attachment points and inheritance for access control and organization policies.

The Google Cloud resource hierarchy allows you to map your organizational structure into Google Cloud. The hierarchy also provides logical attachment points for Identity and Access Management (IAM) policies (which manage access to resources) as well as organization policies. Both IAM and organization policies are inherited through the hierarchy, and the effective policy at each node of the hierarchy is the result of policies applied at the node and policies inherited from its ancestors.

At the lowest level, resources are the fundamental components that make up all Google Cloud services. Examples of resources include Compute Engine virtual machines (VMs), Pub/Sub topics, Cloud Storage buckets, and App Engine instances. All these lower-level resources must exist within a project. Projects represent the first grouping level of the Google Cloud resource hierarchy.

The organization node is the top node of the hierarchy and does not have a parent. All resources that belong to an organization are grouped under the organization node. The organization node provides central visibility and control over every resource that belongs to an organization.

Folders are an additional grouping mechanism on top of projects. You must have an organization resource before you can create folders.

Design questions

  • Which roles in your organization require access to your Google Cloud infrastructure?
  • What access requirements do members of each role have for Google Cloud resources?
  • How will your organizational structure map to the Google Cloud resource hierarchy?
  • Do you have governance conventions for resource labeling?

Recommendations

  • Create an organization node in your domain.
  • Define a resource hierarchy that maps to your Google Cloud business needs.
  • Define your project structure. For example:
    • Anonymize information in project names.
    • Follow a project naming pattern like {company-initial-identifier}-{environment}-{app-name}, where the placeholders are unique but don't reveal company or application names.
  • Automate project creation, delegate billing, and set up IAM governance (see the sketch after this list).
  • Prevent accidental deletion by leveraging project liens.
  • Identify and plan for zonal, regional, and multi-regional deployment for your workloads.
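
To illustrate the naming and automation recommendations above, here is a minimal sketch that builds a project ID from the suggested pattern and creates the project under a folder. It assumes the google-cloud-resource-manager (v3) Python client; the folder number and name components are hypothetical.

```python
from google.cloud import resourcemanager_v3

def build_project_id(company: str, environment: str, app: str) -> str:
    """Build an ID following {company-initial-identifier}-{environment}-{app-name}."""
    return f"{company}-{environment}-{app}".lower()

# Hypothetical components; substitute your own (non-revealing) identifiers.
pid = build_project_id("acme01", "prod", "app1")

client = resourcemanager_v3.ProjectsClient()
operation = client.create_project(
    project=resourcemanager_v3.Project(
        project_id=pid,
        display_name=pid,
        parent="folders/123456789",  # hypothetical folder number
    )
)
print(operation.result().name)  # blocks until project creation completes
```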

Identity and access management

Identity and access management is a cornerstone of your Google Cloud deployment because it provides the authorization controls to Google Cloud resources. Using IAM, you manage employee, customer, and other identities and their respective access authorizations.

Google Cloud provides you with a set of enterprise-ready IAM services to help you secure access to your data, simplify management through intelligence, and transition to the cloud with confidence.

In IAM, you grant access to members. Members can be of the following types.

Google account

A Google account represents a developer, an administrator, or any other person who interacts with Google Cloud.

Service account

A service account is an account that belongs to an application instead of an individual end user.

Google group

A Google group is a named collection of Google accounts and service accounts.

Google Workspace domain

A Google Workspace domain represents a virtual group of all the Google accounts that have been created in an organization's Google Workspace account.

Cloud Identity domain

A Cloud Identity domain is like a Google Workspace domain because it represents a virtual group of all Google accounts in an organization.

Authorization

When an authenticated member attempts to access a resource, IAM checks the resource's IAM policy to determine whether the action is allowed. The entities and concepts involved in the authorization process are described below.

Resources

You can grant access to users for a Google Cloud resource. Some examples of resources are projects, Compute Engine instances, Cloud Storage buckets, and so on. Some services, such as Compute Engine and Cloud Storage, support granting IAM permissions at a granularity finer than the project level.

Permissions

Permissions determine what operations are allowed on a resource. In IAM, permissions are represented in the form service.resource.verb (for example, compute.instances.list). You don't assign permissions to users directly; instead, you grant them a role that contains one or more permissions.

Roles

A role is a collection of permissions. When you grant a role to a user, you grant them all the permissions that the role contains. There are three kinds of roles in IAM:

  • Basic roles: the Owner, Editor, and Viewer roles.
  • Predefined roles: IAM roles that give finer-grained access control than the basic roles.
  • Custom roles: roles that you create to tailor permissions to the needs of your organization when predefined roles don't meet your needs.

IAM policies

You can grant roles to users by creating an IAM policy, which is a collection of statements that define who has what type of access. A policy is attached to a resource and is used to enforce access control whenever that resource is accessed. An IAM policy is represented by the IAM policy object.
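
To make the policy object concrete, the following minimal sketch grants a group read access to a Cloud Storage bucket by appending a binding to the bucket's IAM policy. The bucket name and group are placeholders.

```python
from google.cloud import storage

# Grant a group read access to objects in a bucket by appending a binding
# (role plus members) to the bucket's IAM policy. Names are placeholders.
client = storage.Client()
bucket = client.bucket("example-bucket")

policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append(
    {
        "role": "roles/storage.objectViewer",
        "members": {"group:data-readers@example.com"},
    }
)
bucket.set_iam_policy(policy)
```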

Policy hierarchy

You can set an IAM policy at any level in the resource hierarchy: the organization, folder, project, or resource level. Resources inherit the policies of their parent resource. Set a policy at the organization level to have it automatically inherited by all child folders and projects, and set a policy at the project level to have it inherited by all of the project's child resources. The effective policy for a resource is the union of the policy set on that resource and the policies inherited from higher up in the hierarchy.

Design questions

  • How will you manage identities?
  • Will you federate from an existing identity source?
  • How do you plan to delegate admin access?
  • Do you have a governance process to create, update, and audit access control?
  • Do you group users and enforce multifactor authentication (MFA) based on access sensitivity?

Recommendations

  • Secure organization admin access.
  • Federate your identity provider with Google Cloud.
  • Use Cloud Identity for user account identity if you don't have an identity provider.
  • Use Google Accounts and appropriate IAM policies for every user.
  • Create your own custom service accounts to limit IAM permissions to least privilege.
  • Migrate unmanaged accounts.
  • Secure access to resources through least privilege.
  • Use groups and service accounts.
  • Use a group naming convention.
  • Audit the group membership request workflow.
  • Enforce MFA whenever possible, especially for users with high privilege access.
  • Review super admin access.
  • Use service accounts where appropriate, and audit how their access is used.
  • Remove or restrict overly permissive default IAM policies at the organization level.
  • Audit access management changes regularly.

Key services

Cloud Identity unifies identity, application, and device management to maximize user efficiency, protect company data, and transition your company to a digital workspace.

Identity Platform adds identity and access management functionality to your apps and services, and helps to protect user accounts.

Managed Service for Microsoft Active Directory (AD) manages your AD-dependent workloads, automates AD server maintenance and security configuration, and connects your on-premises AD domain to the cloud with a highly available, hardened Google Cloud service.

Compute

Most solutions use compute resources in some form, and the selection of compute for your application needs is critical. On Google Cloud, compute is offered as Compute Engine, App Engine, Google Kubernetes Engine (GKE), Cloud Functions, and Cloud Run. You should evaluate your application demands and then choose one of the following compute offerings.

Compute Engine provides graphics processing units (GPUs) that you can add to your virtual machine instances. You can use these GPUs to accelerate specific workloads on your instances, such as machine learning and data processing.

Generally, App Engine is a great candidate for hosting frontend applications, because it lets you focus on application development rather than on infrastructure operations. App Engine also supports container deployment, allowing for easier and quicker migration. App Engine can also host microservice architectures with multiple services.

When you need more administrative control, Google Kubernetes Engine is the recommended option. GKE is great for complex microservice architectures that need additional services such as Istio for service mesh control.

If neither App Engine nor GKE fulfill your needs, you can use Compute Engine for deploying your application, because you can build and run any custom VM images.

Cloud Functions lets you build automation code that lives for a short duration and performs actions in a scalable fashion. You can use Cloud Functions as glue to stitch together various pieces of your applications without worrying about infrastructure management.
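
As a minimal sketch, an HTTP-triggered Cloud Function in the Python runtime is just a function that receives a Flask request object; the function and parameter names below are arbitrary.

```python
# main.py, deployed with the Python runtime and an HTTP trigger.
def handle_request(request):
    """Respond to an HTTP request; `request` is a Flask request object."""
    name = request.args.get("name", "world")
    return f"Hello, {name}!"
```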

Design questions

  • How are you planning to use compute?
  • Are your applications containerized or do they have any legacy dependency?
  • Is your application stateful or stateless?
  • Do you have complex distributed service deployment (high inter-node networking)?
  • How do you manage instance access (including SSH keys)?

Recommendations

  • Choose the Google Cloud region closest to your user base, or the region that satisfies your compliance requirements.
  • Evaluate latency requirements for your workloads.
  • Determine your application's end-user latency requirements and choose a single-region or multi-region deployment strategy.
  • Ensure that instances are not configured to use the default service account with full access to all Cloud APIs.
  • Ensure that IP forwarding is not enabled on instances unless needed.
  • Ensure that Compute Engine instances do not have public IP addresses when they aren't needed; use Cloud NAT for outbound internet access instead. (A small audit sketch follows this list.)
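
The following is a minimal audit sketch for the last recommendation. It assumes the google-cloud-compute Python client and a placeholder project ID, and flags instances whose network interfaces carry an access config (which implies an external IP).

```python
from google.cloud import compute_v1

# Flag Compute Engine instances that expose an external IP address.
# "my-project" is a placeholder project ID.
client = compute_v1.InstancesClient()
for zone, scoped_list in client.aggregated_list(project="my-project"):
    for instance in scoped_list.instances:
        for nic in instance.network_interfaces:
            if nic.access_configs:  # an access config implies an external IP
                print(f"{zone}: {instance.name} has an external IP")
```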

Key services

Compute Engine delivers virtual machines running in Google's innovative data centers and worldwide fiber network. Compute Engine's tooling enables scaling from single instances to global, load-balanced infrastructure. Compute Engine VMs boot quickly, come with high-performance persistent and local disk options, and deliver consistent performance. Our virtual servers are available in many configurations, including predefined sizes, and include options to create custom machine types optimized for your specific needs. Flexible pricing and automatic sustained use discounts make Compute Engine flexible to match your price and performance requirements.

Google Kubernetes Engine provides a powerful cluster manager and orchestration system for running your Docker containers. GKE schedules your containers into the cluster, keeps them healthy, and manages them automatically based on requirements you define (such as CPU and memory). GKE is based on Kubernetes, the open-source container orchestration system. Using a platform based on open source provides you with the flexibility to deploy your containers on GKE, on-premises, or in another public cloud infrastructure. GKE provides you with a managed Kubernetes control plane that helps you focus on developing applications and eases the pain of managing Kubernetes deployments. GKE lets you deploy zonal or regional clusters depending on your needs, and supports private clusters and Knative.

App Engine is a platform for building scalable web applications and mobile and IoT backends. App Engine provides you with built-in services and APIs, such as NoSQL datastores, memcache, and a user authentication API, common to most applications. App Engine can scale your application automatically in response to the amount of traffic it receives, so you pay only for the resources you use. Just upload your code and Google will manage your app's availability—you don't need to provision or maintain a single server.

Cloud Functions is a serverless execution environment for building and connecting cloud services. With Cloud Functions, you write simple, single-purpose functions that are attached to events emitted from your cloud infrastructure and services. Your function is triggered when an event being watched is fired. Your code executes in a fully managed environment. There is no need to provision any infrastructure or worry about managing any servers.

Cloud Run is a managed compute platform that enables you to run stateless containers that are invocable through web requests or Pub/Sub events. Cloud Run is serverless: it abstracts away all infrastructure management, so you can focus on what matters most—building great applications. It is built from Knative, letting you choose to run your containers either fully managed with Cloud Run, or in your GKE cluster with Cloud Run for Anthos on Google Cloud.
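
A stateless Cloud Run service is typically a small web server that listens on the port the platform injects through the PORT environment variable. A minimal sketch with Flask (the service contents are arbitrary):

```python
import os

from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    # Stateless handler: no local state survives between requests.
    return "OK"

if __name__ == "__main__":
    # Cloud Run injects the port to listen on via the PORT environment variable.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```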

Networking

Google's private network connects our regional locations to more than 100 global network points of presence. Google Cloud uses software-defined networking and distributed systems technologies to host and deliver your services around the world as fast as possible. Google's global VPC uses the Google-owned global high-speed network to link your applications across regions privately and reliably. When every millisecond of latency counts, Google ensures that your content is delivered with the highest throughput, thanks to innovations like BBR congestion control intelligence.

Network design is another critical component: when done correctly, it helps you optimize for performance and secure how your application communicates with internal and external services. When you choose networking services, it's important to think a few steps ahead about your application's needs and how your applications will communicate with each other. Some components require global service, while others might need geo-locality in a specific region. Choose a deployment region close to your users for better performance.

Design questions

  • How complex is your application service connectivity deployment?
  • What are some networking requirements for your inter-application deployments?
  • If you have external services, how will you connect to the Google Cloud network?
  • If connecting your VPC and on-premises network, how much bandwidth do you require?
  • How do you segment your network and control access to it? By application? By team?
  • Do you have a governance process to create or update new or existing networking deployments? How frequently do you audit?
  • Do you have a separate network for sensitive applications? How do you monitor and restrict access?

Recommendations

  • Document your network design, including cross-project and hybrid deployments, and use a network topology graph to verify connectivity.
  • Use clear and consistent naming conventions for services like service accounts, network tags, and firewall rules.
  • Grant the network user role at the subnet level.
  • Choose an appropriate project:
    • Use a single host project if resources require multiple network interfaces.
    • Create a single VPC per project to map VPC quotas to projects.
    • Use multiple host projects if resource requirements exceed the quota of a single project.
    • Use multiple host projects if you need separate administration policies for each VPC.
  • Create a VPC for each autonomous team, with shared services in a common VPC.
  • Isolate sensitive data in its own VPC or project.
  • When using VPC Network Peering, verify that you won't exceed the network peering quota limits (forwarding rules, firewall rules, routes, and so on).
  • Use multi-NIC virtual appliances when you need to inspect or control traffic between VPC networks.
  • Use Shared VPC for administration of multiple working groups.
  • Create a shared services VPC if multiple VPCs need access to common resources but not to each other.
  • Use dynamic routing whenever possible.
  • Centralize network control.
  • Use Private DNS zones for name resolution whenever possible.
  • Frequently audit network access permissions and control.
  • Ensure that SSH and RDP access is restricted from the internet. (A firewall audit sketch follows this list.)
  • Enable VPC flow logs for critical projects.
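
To support the SSH/RDP recommendation, here is a minimal audit sketch that lists ingress firewall rules allowing 0.0.0.0/0 on ports 22 or 3389. It assumes the google-cloud-compute Python client and a placeholder project ID.

```python
from google.cloud import compute_v1

# Flag ingress firewall rules that open SSH (22) or RDP (3389) to the internet.
# "my-project" is a placeholder project ID. This sketch does not handle port
# ranges (like "20-30") or empty port lists (which mean all ports).
client = compute_v1.FirewallsClient()
for rule in client.list(project="my-project"):
    if rule.direction != "INGRESS" or "0.0.0.0/0" not in rule.source_ranges:
        continue
    for allowed in rule.allowed:
        if any(port in ("22", "3389") for port in allowed.ports):
            print(f"Review rule {rule.name}: allows {list(allowed.ports)} from anywhere")
```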

Key services

Virtual Private Cloud (VPC) provides networking functionality to Compute Engine virtual machine (VM) instances, GKE clusters, and App Engine flexible instances. VPC provides global, scalable, flexible networking for your cloud-based resources and services. A VPC network is a global resource that consists of a list of regional virtual subnetworks (subnets) in data centers, all connected by a global wide area network. VPC networks are logically isolated from each other in Google Cloud.

Shared VPC allows an organization to connect resources from multiple projects to a common VPC network, so that they can communicate with each other securely and efficiently using internal IP addresses from that network. When you use Shared VPC, you designate a project as a host project and attach one or more other service projects to it. The VPC networks in the host project are called Shared VPC networks.

Cloud Load Balancing gives you the ability to distribute load-balanced compute resources in single or multiple regions, to meet your high availability requirements, to put your resources behind a single anycast IP address, and to scale your resources up or down with intelligent autoscaling. Cloud Load Balancing is fully integrated with Cloud CDN for optimal content delivery. Google Cloud offers global, regional, and internal load balancers to help you optimize how you serve your application.

Cloud CDN (Content Delivery Network) uses Google's globally distributed edge points of presence to cache HTTP(S) load balanced content close to your users. Caching content at the edges of Google's network provides faster delivery of content to your users while reducing serving costs.

Cloud DNS is a scalable, reliable and managed authoritative Domain Name System (DNS) service running on the same infrastructure as Google. It has low latency, high availability, and is a cost-effective way to make your applications and services available to your users.

Cloud Interconnect extends your on-premises network to Google's network through a highly available, low-latency connection. You can use Cloud Interconnect - Dedicated (Dedicated Interconnect) to connect directly to Google or use Cloud Interconnect - Partner (Partner Interconnect) to connect to Google through a supported service provider.

Storage

Most deployments need some form of storage for their data. Google Cloud storage services can be broadly classified as object (blob) storage or block (disk) storage. Because storage is accessed over the network, also consider IOPS requirements when selecting your storage type. In Google Cloud, IOPS is bundled with storage and scales with your provisioned space. Some storage types, like Persistent Disk, require manual replication and backup because they are zonal or regional. Cloud Storage natively replicates data across the selected region or multi-region and is highly available.

When considering Google Cloud storage options, look at Cloud Storage for blobs, Persistent Disk for block storage, and Filestore for shared files. Cloud Storage is a regional or multi-regional resource. All Cloud Storage buckets have built-in redundancy to protect your data against equipment failure and to ensure data availability through datacenter maintenance events. Checksums are calculated for all Cloud Storage operations so Google can ensure that what you read is what you wrote. Persistent Disk is a zonal or regional resource, so you must take additional steps to snapshot, backup, or replicate your data for redundancy.

It's a best practice to determine your application performance needs and data requirements while you're choosing a storage type.

Design questions

  • How much and what types of storage do you require?
  • What are the access modes for your requirements?
  • Do you need active or archival storage?
  • Are you looking to host static objects for web hosting? CDN?
  • Do you store and process sensitive data? How do you monitor and manage access?
  • Do you have process and governance requirements for encryption?

Recommendations

  • Determine application storage requirements and choose appropriate storage options.
  • Make every bucket name unique across the entire Cloud Storage namespace. Do not include sensitive information in a bucket name. Choose bucket and object names that are difficult to guess.
  • Do a back-of-the-envelope estimate of the amount of traffic that will be sent to Cloud Storage in order to calculate transfer time.
  • If you are hosting public content, use Cloud CDN to minimize egress cost.
  • Store your data in a region closest to your application's users.
  • Keep compliance requirements in mind when choosing a location for user data.
  • For data that will be served at a high rate with high availability, use the Multi-Regional Storage or Regional Storage class. For data that will be infrequently accessed and can tolerate slightly lower availability, use the Nearline Storage or Coldline Storage class.
  • Ensure that your Cloud Storage buckets are not anonymously or publicly accessible. (A sketch follows this list.)
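
As a sketch of these recommendations, the following creates a bucket in a chosen location with uniform bucket-level access, then checks whether any IAM binding makes it public. The bucket name and location are placeholders.

```python
from google.cloud import storage

client = storage.Client()

# Create a bucket in a chosen location with uniform bucket-level access.
# The bucket name is a placeholder and must be globally unique.
bucket = storage.Bucket(client, name="example-unique-bucket-name")
bucket.iam_configuration.uniform_bucket_level_access_enabled = True
client.create_bucket(bucket, location="EU")

# Check whether the bucket is anonymously or publicly accessible.
policy = bucket.get_iam_policy(requested_policy_version=3)
for binding in policy.bindings:
    if {"allUsers", "allAuthenticatedUsers"} & set(binding["members"]):
        print(f"Public access via {binding['role']}")
```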

Key services

Cloud Storage allows world-wide storage and retrieval of any amount of data at any time. You can use Cloud Storage for a range of scenarios including serving website content, storing data for archival and disaster recovery, or distributing large data objects to users through direct download.

Cloud Storage offers four storage classes: Multi-Regional Storage, Regional Storage, Nearline Storage, and Coldline Storage. All storage classes offer low latency (time to first byte is typically tens of milliseconds) and high durability. The classes differ by their availability, minimum storage durations, and pricing for storage and access.

Persistent Disk is durable and high performance block storage for Google Cloud. Persistent Disk provides SSD and HDD storage that can be attached to instances running in either Compute Engine or Google Kubernetes Engine. Storage volumes can be transparently resized, quickly backed up, and offer the ability to support simultaneous readers.

Regional persistent disks provide durable storage and replication of data between two zones in the same region. If you need higher IOPS and lower latency, Google Cloud offers local SSDs, which are physically attached to the server that hosts your virtual machine instance. Use them as temporary disk space.

Filestore is a managed file storage service for applications that require a file system interface and a shared file system for data. Filestore gives users a simple, native experience for standing up managed Network Attached Storage (NAS) with their Compute Engine and Google Kubernetes Engine instances. The ability to fine-tune Filestore's performance and capacity independently leads to predictably fast performance for your file-based workloads.

Database

Selection of a database is another critical step in selecting components for your application. Broadly, databases are classified as relational and non-relational. You can choose to host your own database or database cluster on Compute Engine virtual machines (VMs), but it's a good idea to evaluate Google Cloud's managed database services before you choose to install your own. Managing your own database or database cluster carries the additional overhead of keeping it current with patches and updates, as well as day-to-day operational activities like monitoring and taking backups.

Google Cloud offers a wide variety of database services to choose from depending on your business use case. Selection criteria include, for example, low-latency access, time-series data processing, disaster recovery, and mobile-client synchronization. Cloud SQL is a regional service that supports read replicas in remote regions, low-latency reads, and disaster recovery. Cloud Spanner is a multi-regional offering that provides external consistency, global replication, and a five-nines (99.999%) availability SLA. Additional options include Cloud Bigtable, Memorystore, Firebase, and Firestore, and open source databases like MongoDB and MariaDB are available as well.

Similar to storage, let your functional and non-functional application requirements drive your database selection. Define the requirements and select the database that fits the requirements best.

When you move existing workloads to Google Cloud, database migration technology becomes an important component to enable and to execute zero-downtime migration. This is essential so that applications can continue to serve end users while database migration is ongoing. Several database migration technologies are available to you.

Design questions

  • What databases are you running? How are they used?
  • Do you have any specific requirements (latency, replication, consistency)?
  • Do you have any legacy dependency on certain databases or versions?
  • How much of your data is structured, and how much is unstructured?
  • How do you govern access to your database? At the application level and for internal consumption?

Recommendations

  • Choose the right schema for your table.
  • Choose row keys carefully to avoid hotspotting, especially for non-relational databases.
  • Shard your database instance whenever possible.
  • Use good connection management practices, such as connection pooling and exponential backoff. (A minimal backoff sketch follows this list.)
  • Avoid very large transactions.
  • Design and test your application's response to maintenance updates on databases.
  • Secure and isolate connections to your database.
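
Here is a minimal sketch of the exponential-backoff recommendation in plain Python, so it can wrap any database call; the transient error type is a stand-in for whatever your database driver raises.

```python
import random
import time

class TransientError(Exception):
    """Stand-in for the transient errors your database driver raises."""

def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=32.0):
    """Call fn, retrying with exponential backoff and jitter on transient errors."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.5))  # jitter avoids thundering herds
```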

Cloud SQL specific recommendations:

  • Use private IP networking (VPC).
    • For additional security:
      • Use the Cloud SQL Proxy with private networking.
      • Restrict public IP access (constraints/sql.restrictPublicIp).
  • If you need public IP networking:
    • Use the built-in firewall with a limited, narrow IP list, and ensure that Cloud SQL instances require incoming connections to use SSL.
    • For additional security:
      • Do not use an allowlist, and use the Cloud SQL Proxy.
      • Restrict authorized networks (constraints/sql.restrictAuthorizedNetworks).
  • Use the Cloud SQL Proxy whenever possible. (A connection sketch follows this list.)
  • Grant database users limited privileges.
  • Ensure that Cloud SQL database instances require all incoming connections to use SSL.
  • Ensure that Cloud SQL database instances are not open to the world.
  • Ensure that your MySQL database instances do not allow anyone to connect with administrative privileges.
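
As a connection sketch: when the Cloud SQL Proxy runs alongside your application, it exposes the instance on a local Unix socket under /cloudsql/, and you can point a pooled SQLAlchemy engine at that socket. The connection name, credentials, and psycopg2 driver below are assumptions to adapt to your own setup.

```python
import sqlalchemy

# Assumes the Cloud SQL Proxy is running and exposing the instance at
# /cloudsql/my-project:us-central1:my-instance (a placeholder connection name),
# and a PostgreSQL database reachable through the psycopg2 driver.
engine = sqlalchemy.create_engine(
    "postgresql+psycopg2://dbuser:dbpassword@/appdb"
    "?host=/cloudsql/my-project:us-central1:my-instance",
    pool_size=5,        # keep a small pool of reusable connections
    max_overflow=2,
    pool_recycle=1800,  # refresh connections before they go stale
)

with engine.connect() as conn:
    print(conn.execute(sqlalchemy.text("SELECT 1")).scalar())
```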

Key services

Cloud SQL is a fully managed database service that makes it easy to set up, maintain, manage, and administer your relational PostgreSQL, MySQL, and SQL Server databases in the cloud. Cloud SQL offers high performance, scalability, and convenience. Hosted on Google Cloud, Cloud SQL provides a database infrastructure for applications running anywhere.

Cloud Bigtable is a sparsely populated table that can scale to billions of rows and thousands of columns, enabling you to store terabytes or even petabytes of data. A single value in each row is indexed; this value is known as the row key. Cloud Bigtable is ideal for storing very large amounts of single-keyed data with very low latency. It supports high read and write throughput at low latency, and it is an ideal data source for MapReduce operations.

Cloud Spanner is the first scalable, enterprise-grade, globally-distributed, and strongly consistent database service built for the cloud specifically to combine the benefits of relational database structure with non-relational horizontal scale. This combination delivers high-performance transactions and strong consistency across rows, regions, and continents with an industry-leading 99.999% availability SLA, no planned downtime, and enterprise-grade security.

Memorystore for Redis is a fully managed Redis service for the Google Cloud. Applications running on Google Cloud can achieve extreme performance by leveraging the highly scalable, available, secure Redis service without the burden of managing complex Redis deployments.

Firestore is a NoSQL document database built for automatic scaling, high performance, and ease of application development. While the Firestore interface has many of the same features as traditional databases, as a NoSQL database it differs from them in the way it describes relationships between data objects.
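
A minimal sketch of the Firestore document model with the Python client; the collection, document, and field names are arbitrary.

```python
from google.cloud import firestore

db = firestore.Client()

# Documents live in collections and hold schemaless key-value data.
doc_ref = db.collection("users").document("alice")
doc_ref.set({"name": "Alice", "signup": firestore.SERVER_TIMESTAMP})

print(doc_ref.get().to_dict())
```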

Firebase Realtime Database is a cloud-hosted database. Data is stored as JSON and synchronized in real time to every connected client. When you build cross-platform apps with our iOS, Android, and JavaScript SDKs, all of your clients share one Realtime Database instance and automatically receive updates with the newest data.

Open source databases. Various partners provide different open source databases, including MongoDB, MariaDB, Redis, and many more.

Analytics

Most businesses want to analyze their data and glean insights from it. Google Cloud provides you with various managed tools that help you focus on writing your ETL pipeline while Google manages the underlying infrastructure for you. Depending on your business needs and what you want to achieve, Google Cloud offers the following services for ingesting, processing, transforming, analyzing, and viewing your data.

Choosing the right service can be tricky, but if you assess your team's expertise and comfort level, the decision becomes easier. For example, Dataflow lets you write complex transforms, but you must be comfortable writing code. Dataprep lets you visualize your data and build custom recipes to transform it, without code.

BigQuery is ideal for a data warehouse, because it is a fully managed service and automatically helps you save on long-term storage: if a table is not edited for 90 consecutive days, the price of storage for that table automatically drops by 50 percent. Always partition your data and optimize your BigQuery queries.

Design questions

  • How do you ingest and analyze your data?
  • Do you currently have an ETL pipeline setup? What does it look like?
  • What type of data do you typically analyze? Any proprietary data formats?
  • Do you have an estimate of your existing data and expected growth?
  • Do you perform any machine learning? Do you plan to use a managed or unmanaged service?
  • Do you have SLAs for jobs or workflows? How do you monitor them?

Recommendations

  • Determine whether your application needs an "exactly once" or an "at least once" delivery pipeline.
  • Decouple your ETL stages into small functions, using Pub/Sub as a buffer between stages, to make the pipeline scalable.
  • Use the Dataproc Jobs API to submit jobs to existing Dataproc clusters, which helps reduce cost by reusing running clusters.
  • Evaluate your query performance and partition your BigQuery tables to minimize query cost. (A partitioning sketch follows this list.)
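
As a sketch of the partitioning recommendation, the following creates a day-partitioned table that requires a partition filter on every query, so unpruned full-table scans are rejected. The project, dataset, and schema are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table ID and schema.
table = bigquery.Table(
    "my-project.analytics.events",
    schema=[
        bigquery.SchemaField("event_ts", "TIMESTAMP"),
        bigquery.SchemaField("payload", "STRING"),
    ],
)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_ts",  # partition on the event timestamp
)
table.require_partition_filter = True  # queries must prune partitions
client.create_table(table)
```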

Key services

Pub/Sub is a simple, reliable, scalable foundation for stream analytics and event-driven computing systems. You can send and receive messages between independent applications and syndicate data across projects and applications running on cloud, on-premises, or hybrid environments. You can use Pub/Sub's flexibility to decouple systems and components hosted on Google Cloud or elsewhere on the internet. And Pub/Sub is designed to provide "at least once" delivery at low latency with on-demand scaling to tens of millions of messages per second.
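
A minimal publishing sketch with the Pub/Sub Python client; the project and topic names are placeholders, and the topic is assumed to already exist.

```python
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "etl-events")  # placeholders

# publish() returns a future; result() blocks until the server acks the message.
future = publisher.publish(topic_path, data=b'{"stage": "ingest"}')
print(f"Published message {future.result()}")
```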

Dataflow is a fully managed service for transforming and enriching data in stream (real time) and batch (historical) modes with equal reliability and expressiveness. Dataflow's serverless approach frees you from operational tasks like capacity planning, resource management, and performance optimization, while you pay only for what you use. Plus, Dataflow not only works with Google's ingestion, data warehousing, and machine learning products, but also third-party tools like Apache Spark and Apache Beam.
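
Dataflow executes pipelines written with the Apache Beam SDK. A minimal sketch that runs locally with the DirectRunner (and on Dataflow by passing --runner=DataflowRunner plus project and region options):

```python
import apache_beam as beam

# A tiny batch pipeline: create a few elements, transform them, print them.
with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Create" >> beam.Create(["alpha", "beta", "gamma"])
        | "Uppercase" >> beam.Map(str.upper)
        | "Print" >> beam.Map(print)
    )
```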

Dataprep by Trifacta is an intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis. Because Dataprep is serverless and works at any scale, there is no infrastructure to deploy or manage. Your next ideal data transformation is suggested and predicted with each UI input, so you don't have to write code. And with automatic schema, data type, possible joins, and anomaly detection, you can skip time-consuming data profiling and focus on data analysis.

Datalab is a powerful interactive tool created to explore, analyze, transform, and visualize data and build machine-learning models on Google Cloud. It is an interactive notebook based on Jupyter, and it's integrated with BigQuery and AutoML to provide easy access to key data processing services. With TensorFlow or AutoML, you can easily turn data into deployed machine-learning models ready for prediction.

Dataproc is a fast, easy-to-use, fully managed cloud service for running Apache Spark and Apache Hadoop clusters in a simpler, more cost-efficient way. Operations that used to take hours or days take seconds or minutes instead — and you pay only for the resources you use with per-second billing. Dataproc integrates with storage, compute, and monitoring services across Google Cloud products, giving you a powerful and complete data processing platform.

Cloud Data Fusion is a fully managed, cloud-native data integration service that helps users efficiently build and manage ETL/ELT data pipelines. With a graphical interface and a broad open source library of preconfigured connectors and transformations, Cloud Data Fusion shifts an organization's focus away from code and integration to insights and action.

BigQuery is Google's fully managed, low-cost, serverless data warehouse that scales with your storage and computing power needs. With BigQuery, you get a columnar and ANSI SQL database that can quickly analyze terabytes to petabytes of data. Analyze geospatial data using familiar SQL with BigQuery GIS. Quickly build and operationalize ML models on large-scale structured or semi-structured data using simple SQL with BigQuery ML, and support real-time interactive dashboarding with sub-second query latency using BigQuery BI Engine. Plus, BigQuery offers data transfer services, flexible data ingestion, and pay-for-what-you-use pricing.

Cloud Composer is a fully managed workflow orchestration service that lets you author, schedule, and monitor pipelines that span across clouds and on-premises data centers. Built on the popular Apache Airflow open source project and operated using the Python programming language, Cloud Composer is free from lock-in and easy to use. Plus, with end-to-end integration for Google Cloud workloads, you can orchestrate a full pipeline with all of Google Cloud's big data products.

Data Catalog is a fully managed and scalable metadata management service that organizations can use to quickly discover, manage, and understand all their data in Google Cloud. It offers a simple and easy-to-use search interface for data discovery, a flexible and powerful cataloging system for capturing both technical and business metadata, and a strong security and compliance foundation with Cloud Data Loss Prevention (DLP) and Identity and Access Management integrations.

Google Data Studio is a fully managed visual analytics service that can help anyone in your organization unlock insights from data through easy-to-create and interactive dashboards that inspire smarter business decision-making. When Data Studio is combined with BigQuery BI Engine, an in-memory analysis service, data exploration and visual interactivity reach sub-second speeds, over massive datasets.
