This document provides a reference architecture to help you build the infrastructure to host a highly available enterprise application that uses an Oracle database, with the entire stack deployed on Compute Engine VMs. You can use this reference architecture to efficiently rehost (lift and shift) on-premises applications that use Oracle databases to Google Cloud. This document also includes guidance to help you build an Oracle Database topology in Google Cloud that meets Oracle's maximum availability architecture (MAA) requirements. The intended audience for this document is cloud architects and Oracle database administrators. The document assumes that you're familiar with Compute Engine and Oracle Database.
If you use Oracle Exadata or Oracle Real Application Clusters (Oracle RAC) to run Oracle databases on-premises, you can efficiently migrate your applications to Google Cloud and run your databases on Oracle Database@Google Cloud. For more information, see Enterprise application on Compute Engine VMs with Oracle Exadata in Google Cloud.
Architecture
The following diagram shows the infrastructure for a multi-tier enterprise application that uses Oracle Database. The web tier, application tier, and Oracle Database instances are hosted on Compute Engine VMs. The web tier and application tier run in active-active mode on VMs that are distributed across two zones within a Google Cloud region. The primary and standby database instances are deployed in separate zones. This architecture is aligned with the regional deployment archetype, which helps to ensure that your Google Cloud topology is robust against single-zone outages.
The architecture that's shown in the preceding diagram includes the following components:
Component | Purpose |
---|---|
Regional external Application Load Balancer | The regional external Application Load Balancer receives and distributes user requests to the web tier VMs. |
Google Cloud Armor security policy | The Google Cloud Armor security policy helps to protect your application stack against threats like distributed denial-of-service (DDoS) attacks and cross-site scripting (XSS). |
Regional managed instance group (MIG) for the web tier | The web tier of the application is deployed on Compute Engine VMs that are part of a regional MIG. This MIG is the backend for the external Application Load Balancer. The MIG contains Compute Engine VMs in two zones. Each of these VMs hosts an independent instance of the web tier of the application. |
Regional internal Application Load Balancer | The regional internal Application Load Balancer distributes traffic from the web tier VMs to the application tier VMs. |
Regional MIG for the application tier | The application tier, such as an Oracle WebLogic Server cluster, is deployed on Compute Engine VMs that are part of a regional MIG. This MIG is the backend for the internal Application Load Balancer. The MIG contains Compute Engine VMs in two zones. Each VM hosts an independent instance of the application server. |
Oracle Database instances deployed on Compute Engine VMs | The application in this architecture uses a primary-standby pair of Oracle Database instances that are deployed on Compute Engine VMs in separate zones. You bring your own licenses (BYOL) for these Oracle Database instances, and you manage the VMs and database instances. |
Hyperdisk Storage Pools | The VMs in each zone (across all the tiers in the application stack) use Hyperdisk Balanced volumes from a Hyperdisk Storage Pool. By creating and managing all the disks in a single storage pool, you improve capacity utilization and reduce operational complexity while maintaining the storage capacity and performance that the VMs need. |
Oracle Data Guard FSFO observer | The Oracle Data Guard Fast-Start Failover (FSFO) observer is a lightweight program that initiates automatic failover to the standby Oracle Database instance when the primary instance is unavailable. The observer runs on a Compute Engine VM in a zone that's different from the zones where the primary and standby database instances are deployed. |
Cloud Storage bucket | To store backups of the Oracle Database instances, this architecture uses a Cloud Storage bucket. To facilitate recovery of the database during a region outage, you can store the backups geo-redundantly in a dual-region or multi-region bucket. |
Virtual Private Cloud (VPC) network and subnet | All the Google Cloud resources in the architecture use a single VPC network and subnet. Depending on your requirements, you can choose to build an architecture that uses multiple VPC networks or multiple subnets. For more information, see Deciding whether to create multiple VPC networks. |
Public Cloud NAT gateway | The architecture includes a public Cloud NAT gateway to enable secure outbound connections from the Compute Engine VMs that have only internal IP addresses. |
Cloud Interconnect and Cloud VPN | To connect your on-premises network to the VPC network in Google Cloud, you can use Cloud Interconnect or Cloud VPN. For information about the relative advantages of each approach, see Choosing a Network Connectivity product. |
Cloud Monitoring and Cloud Logging | Cloud Monitoring helps you to observe the behavior, health, and performance of your application and Google Cloud resources. Ops Agent collects metrics and logs from the Compute Engine VMs, including the VMs that host the Oracle Database instances. The agent sends logs to Cloud Logging and sends metrics to Cloud Monitoring. |
Products used
This reference architecture uses the following Google Cloud products:
- Compute Engine: A secure and customizable compute service that lets you create and run VMs on Google's infrastructure.
- Google Cloud Hyperdisk: A network storage service that you can use to provision and dynamically scale block storage volumes with configurable and predictable performance.
- Cloud Load Balancing: A portfolio of high performance, scalable, global and regional load balancers.
- Cloud Storage: A low-cost, no-limit object store for diverse data types. Data can be accessed from within and outside Google Cloud, and it's replicated across locations for redundancy.
- Virtual Private Cloud (VPC): A virtual system that provides global, scalable networking functionality for your Google Cloud workloads.
- Google Cloud Armor: A network security service that offers web application firewall (WAF) rules and helps to protect against DDoS and application attacks.
- Cloud NAT: A service that provides Google Cloud-managed high-performance network address translation.
- Cloud Monitoring: A service that provides visibility into the performance, availability, and health of your applications and infrastructure.
- Cloud Logging: A real-time log management system with storage, search, analysis, and alerting.
- Cloud Interconnect: A service that extends your external network to the Google network through a high-availability, low-latency connection.
- Cloud VPN: A service that securely extends your peer network to Google's network through an IPsec VPN tunnel.
This reference architecture uses the following Oracle products:
- Oracle Database: A relational database management system (RDBMS) that extends the relational model to an object-relational model.
- Oracle Data Guard: A set of services to create, maintain, manage, and monitor one or more standby databases.
You're responsible for procuring licenses for the Oracle products that you deploy in Google Cloud, and you're responsible for complying with the terms and conditions of the Oracle licenses.
Design considerations
This section describes design factors, best practices, and design recommendations that you should consider when you use this reference architecture to develop a topology that meets your specific requirements for security, reliability, operational efficiency, cost, and performance.
The guidance in this section isn't exhaustive. Depending on the specific requirements of your application and the Google Cloud and third-party products and features that you use, there might be additional design factors and trade-offs that you should consider.
System design
This section provides guidance to help you to choose Google Cloud regions for your deployment and to select appropriate Google Cloud services.
Region selection
When you choose the Google Cloud region for your deployment, consider the following factors and requirements:
- Availability of Google Cloud services in each region. For more information, see Products available by location.
- Availability of Compute Engine machine types in each region. For more information, see Regions and zones.
- End-user latency requirements.
- Cost of Google Cloud resources.
- Regulatory requirements.
Some of these factors and requirements might involve trade-offs. For example, the most cost-efficient region might not have the lowest carbon footprint. For more information, see Best practices for Compute Engine regions selection.
Compute infrastructure
The reference architecture in this document uses Compute Engine VMs to host all the tiers of the application. Depending on the requirements of your application, you can choose the following other Google Cloud compute services:
- Containers: You can run containerized applications in Google Kubernetes Engine (GKE) clusters. GKE is a container-orchestration engine that automates deploying, scaling, and managing containerized applications.
- Serverless: If you prefer to focus your IT efforts on your data and applications instead of setting up and operating infrastructure resources, then you can use serverless services like Cloud Run.
The decision of whether to use VMs, containers, or serverless services involves a trade-off between configuration flexibility and management effort. VMs and containers provide more configuration flexibility and control, but you're responsible for managing the resources. In a serverless architecture, you deploy workloads to a preconfigured platform that requires minimal management effort. The design guidance for those services is outside the scope of this document. For more information about service options, see Application Hosting Options.
Storage options
The architecture shown in this document uses a Hyperdisk Storage Pool in each zone, with Hyperdisk Balanced volumes for the VMs in all the tiers. Hyperdisk volumes provide better performance, flexibility, and efficiency than Persistent Disk. For information about Hyperdisk types and features, see About Hyperdisk.
To store data that's shared across multiple VMs in a region, like configuration files for all the VMs in the web tier, you can use a Filestore regional instance. The data that you store in a Filestore regional instance is replicated synchronously across three zones within the region. This replication ensures high availability and robustness against zone outages. You can store shared configuration files, common tools and utilities, and centralized logs in the Filestore instance, and mount the instance on multiple VMs.
When you design storage for your workloads, consider the functional characteristics of the workloads, resilience requirements, performance expectations, and cost goals. For more information, see Design an optimal storage strategy for your cloud workload.
Network design
When you build infrastructure for a multi-tier application stack, you must choose a network design that meets your business and technical requirements. The architecture that's shown in this document uses a simple network topology with a single VPC network and subnet. Depending on your requirements, you can choose to use multiple VPC networks or multiple subnets. For more information, see the following documentation:
- Deciding whether to create multiple VPC networks
- Decide the network design for your Google Cloud landing zone
Security, privacy, and compliance
This section describes factors to consider when you use this reference architecture to design a topology in Google Cloud that meets the security and compliance requirements of your workloads.
Protection against external threats
To help protect your application against external threats like DDoS attacks and XSS, define appropriate Google Cloud Armor security policies based on your requirements. Each policy is a set of rules that specifies the conditions to be evaluated and actions to take when the conditions are met. For example, a rule could specify that if the source IP address of incoming traffic matches a specific IP address or CIDR range, then the traffic must be denied. You can also apply preconfigured web application firewall (WAF) rules. For more information, see Security policy overview.
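The rule model described above can be illustrated with a minimal Python sketch. It is not the Google Cloud Armor API: the `Rule` class, the `evaluate` function, and the sample CIDR ranges are hypothetical names chosen for illustration. The sketch assumes that rules are evaluated in priority order (lowest number first) and that the first matching rule decides the action.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for one security policy rule."""
    priority: int   # lower number is evaluated first
    src_range: str  # CIDR range to match against the source IP address
    action: str     # "allow" or "deny"


def evaluate(rules, source_ip, default_action="allow"):
    """Return the action of the first matching rule, in priority order."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if ip in ipaddress.ip_network(rule.src_range):
            return rule.action
    # No rule matched: fall back to the (assumed) default action.
    return default_action


# Illustrative policy: deny one documentation-range CIDR, allow the rest.
policy = [
    Rule(priority=1000, src_range="198.51.100.0/24", action="deny"),
    Rule(priority=2000, src_range="0.0.0.0/0", action="allow"),
]
```

With this policy, `evaluate(policy, "198.51.100.7")` yields the deny action because the higher-priority rule matches first, while other IPv4 sources fall through to the allow rule.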
External access for VMs
In the reference architecture that this document describes, the VMs that host the web tier, application tier, and Oracle Database instances don't need direct inbound access from the internet. Don't assign external IP addresses to those VMs. Google Cloud resources that have only private, internal IP addresses can still access certain Google APIs and services by using Private Service Connect or Private Google Access. For more information, see Private access options for services.
To enable secure outbound connections from Google Cloud resources that have only private IP addresses, like the Compute Engine VMs in this reference architecture, you can use Secure Web Proxy or Cloud NAT.
VM image security
Approved images are images with software that meets your policy or security requirements. To ensure that your VMs use only approved images, you can define an organization policy that restricts the use of images in specific public image projects. For more information, see Setting up trusted image policies.
Service account privileges
In Google Cloud projects where the Compute Engine API is enabled, a default service account is created automatically. For Google Cloud organizations that were created before May 3, 2024, this default service account is granted the Editor IAM role (roles/editor), unless this behavior is disabled.
By default, the default service account is attached to all VMs that you create by using the Google Cloud CLI or the Google Cloud console. The Editor role includes a broad range of permissions, so attaching the default service account to VMs creates a security risk. To avoid this risk, you can create and use dedicated service accounts for each tier of the application stack. To specify the resources that the service account can access, use fine-grained policies. For more information, see Limit service account privileges.
Disk encryption
By default, the data that's stored in Hyperdisk volumes is encrypted using Google-owned and Google-managed keys. As an additional layer of protection, you can choose to encrypt the Google-owned data encryption keys by using keys that you own and manage in Cloud Key Management Service (Cloud KMS). For more information, see About disk encryption.
Network security
To control network traffic between the resources in the architecture, you must configure appropriate Cloud Next Generation Firewall (NGFW) policies.
More security considerations
When you build the architecture for your workload, consider the platform-level security best practices and recommendations that are provided in the Enterprise foundations blueprint.
Reliability
This section describes design factors to consider when you use this reference architecture to build and operate reliable infrastructure for your deployment in Google Cloud.
Robustness against VM failures
In the architecture that's shown in this document, if a Compute Engine VM in any of the tiers fails, the application can continue to process requests.
- If a VM in the web tier or application tier crashes, the relevant MIG recreates the VM automatically. The load balancers forward requests to only the currently available web server instances and application server instances.
- If the VM that hosts the primary Oracle Database instance fails, the Oracle Data Guard FSFO observer initiates an automatic failover to the standby Oracle Database instance.
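The database failover decision described in the preceding list can be approximated with a small Python sketch. This is a simplified model and not Oracle's implementation: the `ObserverView` class, its fields, and the `threshold_seconds` parameter are hypothetical. The sketch assumes that the observer initiates a failover only after it has lost contact with the primary for at least a configured threshold, and only when the standby is reachable and synchronized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObserverView:
    """Hypothetical snapshot of what the observer currently sees."""
    seconds_since_primary_contact: float
    standby_reachable: bool
    standby_synchronized: bool


def should_initiate_failover(view, threshold_seconds):
    """Decide whether the (modeled) observer should trigger a failover."""
    primary_lost = view.seconds_since_primary_contact >= threshold_seconds
    standby_usable = view.standby_reachable and view.standby_synchronized
    return primary_lost and standby_usable
```

The threshold keeps a brief network interruption from triggering an unnecessary role change, and the standby checks keep the model from failing over to a database that can't safely take the primary role.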
VM autohealing
Sometimes the VMs that host your web tier and application tier might be running and available, but there might be issues with the application itself. The application might freeze, crash, or not have enough memory. In this scenario, the VMs won't respond to load balancer health checks, and the load balancer won't route traffic to the unresponsive VMs. To help ensure that applications respond as expected, you can configure application-based health checks as part of the autohealing policy of your MIGs. If the application on a particular VM isn't responding, the MIG autoheals (repairs) the VM. For more information about configuring autohealing, see About repairing VMs for high availability.
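As a rough illustration of how an application-based health check can drive autohealing, the following Python sketch marks a VM as unhealthy after a number of consecutive failed probes. It is a simplified model, not the Compute Engine health check implementation: the function name and the `unhealthy_threshold` parameter are assumptions, and real health checks also involve settings such as the probe interval and timeout.

```python
def is_unhealthy(probe_results, unhealthy_threshold):
    """Return True if the latest probes are all failures.

    probe_results: list of booleans, oldest first; True means the
    application responded to the probe.
    unhealthy_threshold: consecutive failures required (assumed setting).
    """
    if unhealthy_threshold < 1:
        raise ValueError("unhealthy_threshold must be at least 1")
    recent = probe_results[-unhealthy_threshold:]
    # Require a full window of probes, and every one of them failed.
    return len(recent) == unhealthy_threshold and not any(recent)
```

In this model, a single failed probe doesn't trigger a repair; the MIG would recreate the VM only after the failure persists for the whole window.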
Robustness against zone outages
If a zone outage occurs, the application remains available.
- The web tier and application tier are available (and responsive) because the VMs are in regional MIGs. The regional MIGs ensure that new VMs are created automatically in the other zone to maintain the configured minimum number of VMs. The load balancers forward requests to the available web server VMs and application server VMs.
- If an outage affects the zone that has the primary Oracle Database instance, then the Oracle Data Guard FSFO observer initiates an automatic failover to the standby Oracle Database instance. The FSFO observer runs on a VM in a zone that's different from the zones that have the primary and standby database instances.
- To ensure high availability of data in Hyperdisk volumes during a single-zone outage, you can use Hyperdisk Balanced High Availability. When data is written to a volume, the data is replicated synchronously between two zones in the same region.
Robustness against region outages
If both of the zones in the architecture have an outage or if a region outage occurs, then the application is unavailable. To reduce the downtime caused by multi-zone or region outages, you can implement the following approach:
- Maintain a passive (failover) replica of the infrastructure stack in another Google Cloud region.
- Use a dual-region or multi-region Cloud Storage bucket to store the Oracle Database backups. The backups are replicated asynchronously across at least two geographic locations. With replicated database backups, your architecture maps to Oracle's Maximum Availability Architecture (MAA) Silver tier. To achieve faster replication for backups stored in dual-region buckets, you can use turbo replication. For more information, see Data availability and durability.
- If an outage occurs in the primary region, use the database backup to restore the database and activate the application in the failover region. Use DNS routing policies to route traffic to the load balancer in the failover region.
For business-critical applications that must continue to be available even when a region outage occurs, consider using the multi-regional deployment archetype. For the database tier, you can use Oracle Active Data Guard FSFO to automatically fail over to a standby Oracle Database instance in the failover region. This approach maps to Oracle's MAA Gold tier.
MIG autoscaling
When you run your application on VMs in a regional MIG, the application remains available during isolated zone outages. The autoscaling capability of stateless MIGs lets you maintain application availability and performance at predictable levels. Stateful MIGs can't be autoscaled.
To control the autoscaling behavior of your MIGs, you can specify target utilization metrics, such as average CPU utilization. You can also configure schedule-based autoscaling. For more information, see Autoscaling groups of instances.
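To make the idea of a target utilization concrete, the following Python sketch computes a group size that would bring average utilization back to the target. It is a simple proportional model under stated assumptions, not the actual Compute Engine autoscaler algorithm; the function and parameter names are hypothetical, and utilization values are whole-number percentages.

```python
def recommended_size(current_size, observed_pct, target_pct,
                     min_size, max_size):
    """Estimate the number of VMs needed to reach the target utilization.

    observed_pct and target_pct are average CPU utilization percentages.
    The result is clamped to the configured minimum and maximum sizes.
    """
    if target_pct <= 0:
        raise ValueError("target_pct must be positive")
    # Integer ceiling of current_size * observed_pct / target_pct.
    needed = -(-(current_size * observed_pct) // target_pct)
    return max(min_size, min(max_size, needed))
```

For example, four VMs that run at 90% average CPU utilization against a 60% target would call for six VMs in this model, provided that the maximum size allows it.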
VM placement
In the architecture that this document describes, the application tier and web tier run on Compute Engine VMs that are distributed across multiple zones. This distribution ensures that your application is robust against single-zone outages. To improve this robustness further, you can create a spread placement policy and apply it to the MIG template. With a spread placement policy, when the MIG creates VMs, it places them within each zone on different physical servers (called hosts), so your VMs are robust against failures of individual hosts. However, a trade-off with this approach is that the latency for inter-VM network traffic might increase. For more information, see Placement policies overview.
VM capacity planning
To make sure that capacity for Compute Engine VMs is available when required for MIG autoscaling, you can create reservations. A reservation provides assured capacity in a specific zone for a specified number of VMs of a machine type that you choose. A reservation can be specific to a project, or it can be shared across multiple projects. You incur charges for reserved resources even if the resources aren't provisioned or used. For more information about reservations, including billing considerations, see Reservations of Compute Engine zonal resources.
Block storage availability
The architecture in this document uses a Hyperdisk Storage Pool in each zone to provide block storage for the Compute Engine VMs. You create a pool of block storage capacity for a zone. You then create Hyperdisk volumes in the storage pool and attach the volumes to VMs in the zone. The storage pool attempts to add capacity automatically to ensure that the utilization rate doesn't exceed 80% of the pool's provisioned capacity. This approach ensures that block storage space is available when required. For more information, see How Hyperdisk Storage Pools work.
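The 80% utilization threshold mentioned above translates into simple arithmetic, which the following Python sketch shows. The function names are hypothetical, and the sketch only computes the capacity that keeps utilization at or below the threshold; it doesn't model how or when the storage pool actually adds capacity.

```python
def utilization_pct(used_gib, provisioned_gib):
    """Return the pool utilization as a percentage."""
    return 100 * used_gib / provisioned_gib


def capacity_for_target_utilization(used_gib, target_pct=80):
    """Return the smallest provisioned capacity (in GiB) for which
    used_gib / capacity stays at or below target_pct percent."""
    # Integer ceiling of used_gib * 100 / target_pct.
    return -(-(used_gib * 100) // target_pct)
```

For example, a pool with 8,000 GiB in use needs at least 10,000 GiB of provisioned capacity to stay at or below 80% utilization.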
Stateful storage
A best practice in application design is to avoid the need for stateful local disks. But if the requirement exists, you can configure your disks to be stateful to ensure that the data is preserved when the VMs are repaired or recreated. However, we recommend that you keep the boot disks stateless, so that you can update them easily to the latest images with new versions and security patches. For more information, see Configuring stateful persistent disks in MIGs.
Backup and recovery
The architecture in this document uses Cloud Storage to store database backups. If you choose the dual-region or multi-region location type for the Cloud Storage bucket, the backups are replicated asynchronously across at least two geographic locations. If a region outage occurs, you can use the backups to restore the database in another region. With a dual-region bucket, you can achieve faster replication by enabling the turbo replication option. For more information, see Data availability and durability.
You can use Backup and DR Service to create, store, and manage backups of Compute Engine VMs. Backup and DR Service stores backup data in its original, application-readable format. When required, you can restore workloads to production by directly using data from long-term backup storage without time-consuming data-movement or preparation activities. For more information, see the Backup and DR Service documentation.
More reliability considerations
When you build the cloud architecture for your workload, review the reliability-related best practices and recommendations that are provided in the following documentation:
- Google Cloud infrastructure reliability guide
- Patterns for scalable and resilient apps
- Designing resilient systems
Cost optimization
This section provides guidance to optimize the cost of setting up and operating a Google Cloud topology that you build by using this reference architecture.
VM machine types
To help you optimize the utilization of your VM resources, Compute Engine provides machine type recommendations. Use the recommendations to choose machine types that match your workload's compute requirements. For workloads that have predictable resource requirements, you can customize the machine type to your needs and save money by using custom machine types.
VM provisioning model
If your application is fault tolerant, then Spot VMs can help to reduce the Compute Engine costs for your VMs in the web tier and application tier. The cost of Spot VMs is significantly lower than the cost of regular VMs. However, Compute Engine might preemptively stop or delete Spot VMs to reclaim capacity.
Spot VMs are suitable for batch jobs that can tolerate preemption and that don't have high availability requirements. Spot VMs offer the same machine types, options, and performance as regular VMs. However, when the resource capacity in a zone is limited, MIGs with Spot VMs might not be able to scale out (that is, create VMs) automatically to reach the specified target size until the required capacity becomes available again. Don't use Spot VMs for the VMs that host the Oracle Database instances.
VM resource utilization
The autoscaling capability of stateless MIGs enables your application to gracefully handle increases in traffic to the web tier and application tier. Autoscaling also helps you to reduce cost when the need for resources is low. Stateful MIGs can't be autoscaled.
Oracle Database licensing
You're responsible for procuring licenses for the Oracle products that you deploy on Compute Engine, and you're responsible for complying with the terms and conditions of the Oracle licenses. When you calculate the Oracle Database licensing cost, consider the number of Oracle Processor licenses that are required based on the machine type that you choose for the Compute Engine VMs that host the Oracle Database instances. For more information, see Licensing Oracle Software in the Cloud Computing Environment.
Block storage resource utilization
The architecture in this document uses a Hyperdisk Storage Pool in each zone to provide block storage for the Compute Engine VMs. You can improve the overall utilization of block storage capacity and reduce cost by using Advanced capacity storage pools, which use thin provisioning and data reduction technologies to improve storage efficiency.
More cost considerations
When you build the architecture for your workload, also consider the general best practices and recommendations that are provided in Google Cloud Architecture Framework: Cost optimization.
Operational efficiency
This section describes the factors to consider when you use this reference architecture to design a Google Cloud topology that you can operate efficiently.
VM configuration updates
To update the configuration of the VMs in a MIG (like the machine type or boot-disk image), you create a new instance template with the required configuration and then apply the new template to the MIG. The MIG updates the VMs by using an update method that you specify: automatic or selective. Choose an appropriate method based on your requirements for availability and operational efficiency. For more information about these MIG update methods, see Apply new VM configurations in a MIG.
VM images
For your MIG instance templates, instead of using Google-provided public images, we recommend that you create and use custom OS images that include the configurations and software that your applications require. You can group your custom images into a custom image family. An image family always points to the most recent image in that family, so your instance templates and scripts can use that image without you having to update references to a specific image version. You must regularly update your custom images to include the security updates and patches that are provided by the OS vendor.
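The way an image family resolves to a specific image can be sketched in a few lines of Python. This is an illustrative model rather than the Compute Engine API: the `Image` class, the `resolve_family` function, and the image names are hypothetical, and the sketch assumes that deprecated images are skipped when the family is resolved.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Image:
    """Hypothetical record for a custom OS image."""
    name: str
    family: str
    created: datetime
    deprecated: bool = False


def resolve_family(images, family):
    """Return the most recent non-deprecated image in the family."""
    candidates = [
        img for img in images
        if img.family == family and not img.deprecated
    ]
    if not candidates:
        raise LookupError(f"no usable image in family {family!r}")
    return max(candidates, key=lambda img: img.created)


# Illustrative data only.
images = [
    Image("web-v1", "web-tier", datetime(2024, 1, 10)),
    Image("web-v2", "web-tier", datetime(2024, 3, 5)),
    Image("web-v3", "web-tier", datetime(2024, 4, 1), deprecated=True),
    Image("app-v1", "app-tier", datetime(2024, 5, 1)),
]
```

Because templates and scripts reference the family rather than a specific image, publishing a newer image changes what the family resolves to without any change to those references.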
Deterministic instance templates
If the instance templates that you use for your MIGs include startup scripts (for example, to install third-party software), make sure that the scripts explicitly specify the software-installation parameters, like the software version. Otherwise, when the MIG creates the VMs, the software that's installed on the VMs might not be consistent. For example, if your instance template includes a startup script to install Apache HTTP Server (the apache2 package), then make sure that the script specifies the exact apache2 version that should be installed, such as version 2.4.53. For more information, see Deterministic instance templates.
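As a hedged illustration of version pinning, the following Python sketch builds the installation command that a startup script might run. It assumes a Debian-based image that uses apt, where the `package=version` syntax requests an exact version; the helper name is hypothetical, and real Debian version strings are usually longer than the upstream version number used here.

```python
def apt_install_command(package, version):
    """Build an apt-get command that installs an exact package version.

    Refuses to build an unpinned command, so that every VM that the
    MIG creates installs the same version.
    """
    if not version:
        raise ValueError("pin an explicit version for " + package)
    return f"apt-get install -y {package}={version}"


# Illustrative usage with the version from the example above.
command = apt_install_command("apache2", "2.4.53")
```

Generating the command from explicit parameters makes an unpinned installation an error instead of a silent source of inconsistency between VMs.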
Block storage management
The architecture in this document uses a Hyperdisk Storage Pool in each zone to provide block storage for the Compute Engine VMs. Hyperdisk Storage Pools help simplify storage management. Instead of allocating and managing capacity individually for numerous disks, you define a pool of capacity that can be shared across multiple workloads in a zone. You then create Hyperdisk volumes in the storage pool and attach the volumes to the VMs in the zone. The storage pool attempts to add capacity automatically to ensure that the utilization rate doesn't exceed 80% of the pool's provisioned capacity.
Application server to database connectivity
For connections from your application to Oracle Database, we recommend that you use the database VM's zonal internal DNS name rather than its IP address. Google Cloud automatically resolves the DNS name to the VM's primary internal IP address. An added advantage with this approach is that you don't need to reserve and assign static internal IP addresses for the database VMs.
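The following Python sketch shows how such a name can be assembled and used in a connection string. It assumes the zonal internal DNS name format VM_NAME.ZONE.c.PROJECT_ID.internal and Oracle's Easy Connect syntax (host:port/service_name); the VM name, project ID, service name, and helper functions are hypothetical examples.

```python
def zonal_dns_name(vm_name, zone, project_id):
    """Build the zonal internal DNS name of a Compute Engine VM."""
    return f"{vm_name}.{zone}.c.{project_id}.internal"


def easy_connect_string(host, service_name, port=1521):
    """Build an Easy Connect style string for a database listener."""
    return f"{host}:{port}/{service_name}"


# Illustrative values only.
db_host = zonal_dns_name("oracle-db-primary", "us-central1-a",
                         "example-project")
dsn = easy_connect_string(db_host, "ORCLPDB1")
```

Because the application references the DNS name, the connection string stays the same even if the VM's internal IP address changes.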
Oracle Database administration and support
When you run a self-managed Oracle Database instance on a Compute Engine VM, the operational concerns are similar to those of running Oracle Database on-premises. However, with a Compute Engine VM, you no longer need to manage the underlying compute, networking, and storage infrastructure.
- For guidance about operating and managing your Oracle Database instances, see the Oracle-provided documentation for the relevant release.
- For information about Oracle's support policy for Oracle Database instances that you deploy in Google Cloud, see Oracle Database Support for Non-Oracle Public Cloud Environments (Doc ID 2688277.1).
More operational considerations
When you build the architecture for your workload, consider the general best practices and recommendations for operational efficiency that are described in Google Cloud Architecture Framework: Operational excellence.
Performance optimization
This section describes the factors to consider when you use this reference architecture to design a topology in Google Cloud that meets the performance requirements of your workloads.
Compute performance
Compute Engine offers a wide range of predefined and customizable machine types that you can choose from depending on the performance requirements of your workloads.
- For the VMs that host the web tier and application tier, choose an appropriate machine type based on your performance requirements for those tiers. To get a list of the available machine types that support Hyperdisk volumes and that meet your performance and other requirements, use the Machine series comparison table.
- For the VMs that host the Oracle Database instances, we recommend that you use a machine type in the C4 machine series from the general-purpose machine family. C4 machine types provide consistently high performance for database workloads.
Network performance
For workloads that need low inter-VM network latency, you can create a compact placement policy and apply it to the MIG template that's used for the application tier. When the MIG creates VMs, it places the VMs on physical servers that are close to each other. While a compact placement policy helps improve inter-VM network performance, a spread placement policy can help improve VM availability as described earlier. To achieve an optimal balance between network performance and availability, when you create a compact placement policy, you can specify how far apart the VMs must be placed. For more information, see Placement policies overview.
Compute Engine has a per-VM limit for egress network bandwidth. This limit depends on the VM's machine type and whether traffic is routed through the same VPC network as the source VM. For VMs with certain machine types, to improve network performance, you can get a higher maximum egress bandwidth by enabling Tier_1 networking. For more information, see Configure per VM Tier_1 networking performance.
Hyperdisk storage performance
The architecture that's described in this document uses Hyperdisk volumes for the VMs in all the tiers. Hyperdisk lets you scale performance and capacity dynamically. You can adjust the provisioned IOPS, throughput, and the size of each volume to match your workload's storage performance and capacity needs. The performance of Hyperdisk volumes depends on the Hyperdisk type and the machine type of the VMs to which the volumes are attached. For more information about Hyperdisk performance limits and tuning, see About Hyperdisk.
More performance considerations
When you build the architecture for your workload, consider the general best practices and recommendations that are provided in Google Cloud Architecture Framework: Performance optimization.
What's next
- Accelerating cloud transformation with Google Cloud and Oracle
- Oracle MAA Reference Architectures
- For more reference architectures, diagrams, and best practices, explore the Cloud Architecture Center.
Contributors
Author: Kumar Dhanagopal | Cross-Product Solution Developer
Other contributors:
- Andy Colvin | Database Black Belt Engineer (Oracle on Google Cloud)
- Jeff Welsch | Director, Product Management
- Lee Gates | Group Product Manager
- Marc Fielding | Data Infrastructure Architect
- Mark Schlagenhauf | Technical Writer, Networking
- Michelle Burtoft | Senior Product Manager
- Rajesh Kasanagottu | Engineering Manager
- Sekou Page | Outbound Product Manager
- Souji Madhurapantula | Group Product Manager
- Victor Moreno | Product Manager, Cloud Networking
- Yeonsoo Kim | Product Manager