Best practices for VMware Engine security

This document describes recommended security best practices for managing and configuring Google Cloud VMware Engine and is intended for users who are already familiar with VMware Engine. If you're new to the service, consider starting with the prerequisites and the VMware Engine security overview.

VMware Engine follows a shared responsibility model for security: security in the cloud is achieved through the combined responsibilities of customers and Google as the service provider. Following these best practices can help you save time, prevent errors, and mitigate common points of failure.

Identify and understand all traffic flows of your environment

To protect your workloads and management interfaces on your private cloud, it's important to identify and understand all traffic flows of your environment. A list of key traffic flows in VMware Engine is available in the document private cloud networking for VMware Engine.

VMware Engine uses private services access (PSA) to expose a private network connection from a VMware Engine private cloud network to your VPC network. Ingress traffic from a VPC in your Google Cloud environment or from an on-premises network traverses a Google-managed service producer network.

Use VMware Engine's Public IP Service for internet ingress

Internet traffic can enter a private cloud directly through the Public IP Service of VMware Engine. Alternatively, internet traffic can enter through a public load balancer on Google Cloud, in which case the traffic is routed like any other ingress traffic through private services access. Note that these options are mutually exclusive. If you require custom controls for internet traffic, such as URL filtering, IPS/IDS, or traffic inspection provided by a central instance or service in your Google Cloud environment, route internet traffic through your VPC and the service producer network.

If this doesn't apply to you, or if you have implemented the controls within your private cloud, we recommend using the Public IP Service of VMware Engine. In addition, we recommend using stateful firewall rules to deny traffic patterns from the internet that don't apply to your applications.

Separate north-south and east-west firewall rules on gateway and distributed firewall in VMware Engine NSX-T

Configure the distributed firewall (DFW) in NSX-T to segment internal traffic between your virtual layer 2 domains. The NSX DFW is designed to handle internal east-west network traffic between segments and supports firewall rules that allow or deny traffic between individual instances, even inside the same segment.

For fine-grained network access control, apply a restrictive default policy on the DFW that denies network traffic between instances by default. Then use the DFW to explicitly allow traffic between applications and between the services inside your applications.

Configure the NSX gateway firewall to control north-south traffic that enters and leaves the private cloud.

The NSX gateway firewall is designed to control north-south traffic and recommended for use cases such as controlling the traffic at a perimeter to another security zone. If you need to configure the north-south traffic for the entire private cloud consistently, then configure the gateway firewall on the tier-0 router. If you need to configure the north-south traffic for each individual NSX-T segment, then configure the gateway firewall on the tier-1 router.

In addition to the NSX-T firewalls, use VPC firewall rules to allow and block east-west traffic between workloads in the VMware Engine private cloud and workloads in your VPCs. Ingress to Compute Engine instances from VMware Engine workloads should be denied by default, with only explicitly required traffic allowed.

Also use the VPC firewall to block egress from your VPCs to the management appliances and to the vSphere/vSAN CIDR range. Open egress towards the management appliances only from trusted hosts and IP addresses inside your network. Note that the management appliances are not inside an NSX-T segment, so DFW rules do not apply to restrict access to them.
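As a sketch of this pattern, the following gcloud commands deny egress from a VPC to the management range by default and re-allow it only from tagged admin instances. The network name, target tag, and CIDR ranges are placeholders for illustration; your private cloud's management range will differ.

```shell
# Assumed values: my-vpc, 192.168.0.0/24 as the vSphere/vSAN management CIDR,
# and instances tagged "vmware-admin" as the trusted admin hosts.

# Deny egress from the VPC to the management appliance range by default.
gcloud compute firewall-rules create deny-egress-to-vmware-mgmt \
  --network=my-vpc --direction=EGRESS --action=DENY --rules=all \
  --destination-ranges=192.168.0.0/24 --priority=1000

# Re-allow egress to the management appliances only from tagged admin hosts,
# at a higher priority (lower number) than the deny rule.
gcloud compute firewall-rules create allow-admin-to-vmware-mgmt \
  --network=my-vpc --direction=EGRESS --action=ALLOW --rules=tcp:443,tcp:22 \
  --destination-ranges=192.168.0.0/24 --target-tags=vmware-admin \
  --priority=900
```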

Apply Zero Trust Security principles and micro-segmentation in NSX-T

Use the NSX-T DFW to implement traffic controls for security segments as granular as individual virtual machines. This principle of protecting traffic between individual VMs, with traffic denied by default, is often referred to as micro-segmentation, a more granular approach to firewalling than the traditional placement of firewalls between layer 3 domains.

The DFW is enabled in the hypervisor kernel on all VMware Engine vSphere hosts in your private cloud and can control traffic flow between workloads that are either in the same or in separate NSX segments. Firewall rules to allow traffic to and from VMs can be defined by organizing the VMs into policy groups, which can have flexible membership criteria such as VM tag or name matching.
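A tag-based policy group could be sketched against the NSX-T Policy API as follows. The NSX Manager hostname, group ID, tag value, and credentials are hypothetical, and the exact expression schema should be verified against the NSX-T API reference for your version.

```shell
# Hypothetical NSX Manager endpoint; -k skips TLS verification for the sketch only.
NSX=https://nsx.example.internal

# Create (or update) a policy group whose members are all VMs tagged "app".
# Rules that reference this group then follow the workload, not its IP address.
curl -k -u admin -X PATCH \
  "$NSX/policy/api/v1/infra/domains/default/groups/app-tier" \
  -H 'Content-Type: application/json' \
  -d '{
        "display_name": "app-tier",
        "expression": [{
          "resource_type": "Condition",
          "member_type": "VirtualMachine",
          "key": "Tag",
          "operator": "EQUALS",
          "value": "app"
        }]
      }'
```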

Micro-segmentation allows you to implement a network with fine-grained traffic control where any desired traffic pattern needs to be explicitly allowed. The security concept where all network flows are controlled by identity and device verification processes rather than implicit trust is often also called Zero Trust Security.

Deploy a third-party firewall appliance from the Cloud Marketplace portal for IPS/IDS capabilities

If you require advanced layer 7 security, including IDS/IPS capabilities for ingress traffic into the private cloud from the rest of your network or between your NSX-T network segments, consider deploying a third-party firewall appliance. The third-party appliance can be deployed either as a multi-NIC appliance between two VPCs in your Google Cloud network or inside the private cloud with an integration with NSX-T.

For a deep dive into VMware Engine architectures with centralized appliances, which can be used for a variety of advanced security use cases such as IPS/IDS, DDoS protection, and SSL offloading, consult the document on network security using centralized appliances in the Cloud Architecture Center.

Use Google Cloud Armor to protect web services on VMware Engine from DDoS attacks

If you route ingress traffic to workloads on VMware Engine through the customer VPC, we recommend placing VMware Engine workloads in hybrid network endpoint groups (NEGs) behind either Traffic Director or the external HTTP(S) load balancer. Either setup allows you to add Google Cloud Armor for public-facing applications, mitigating DDoS attacks and common vulnerabilities such as SQL injection or cross-site scripting.

If you require Service Mesh features such as advanced traffic management using the Envoy proxy or integration of Certificate Authority Service, we recommend Traffic Director. In all other cases, we recommend the external HTTP(S) load balancer.
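A Cloud Armor policy with preconfigured WAF rules for the two vulnerability classes mentioned above might be sketched like this; the policy and backend service names are placeholders, and the backend service is assumed to already front your hybrid NEG.

```shell
# Create a Cloud Armor security policy.
gcloud compute security-policies create vmware-web-policy \
  --description="WAF for web apps running on VMware Engine"

# Block requests matching the preconfigured SQL injection rule set.
gcloud compute security-policies rules create 1000 \
  --security-policy=vmware-web-policy \
  --expression="evaluatePreconfiguredExpr('sqli-stable')" \
  --action=deny-403

# Block requests matching the preconfigured cross-site scripting rule set.
gcloud compute security-policies rules create 1001 \
  --security-policy=vmware-web-policy \
  --expression="evaluatePreconfiguredExpr('xss-stable')" \
  --action=deny-403

# Attach the policy to the load balancer's backend service.
gcloud compute backend-services update my-backend-service \
  --security-policy=vmware-web-policy --global
```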

Follow the documentation on adding VMware Engine workloads to hybrid NEGs for the setup you choose.
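As a minimal sketch, a hybrid connectivity NEG holding the private IP and port of a VMware Engine VM can be created like this; the zone, network, IP, and port values are placeholders.

```shell
# Hybrid NEG for endpoints that live outside Compute Engine,
# such as VMs in a VMware Engine private cloud.
gcloud compute network-endpoint-groups create vmware-web-neg \
  --zone=us-central1-a \
  --network=my-vpc \
  --network-endpoint-type=NON_GCP_PRIVATE_IP_PORT

# Register a VMware Engine VM's private IP and serving port as an endpoint.
gcloud compute network-endpoint-groups update vmware-web-neg \
  --zone=us-central1-a \
  --add-endpoint="ip=192.168.10.20,port=443"
```

The NEG can then be added as a backend of the load balancer's backend service.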

Connect to Google Cloud Services privately without internet access

VMware Engine private cloud workloads can access Google Cloud APIs such as the Cloud Storage API using Private Google Access. We recommend you use Private Google Access to access Google services without sending traffic over the internet because it reduces egress cost and latency. This also removes the need for a network path to the internet for workloads that only need Google API access. Follow the deep-dive into Private Google Access for more technical details and configuration steps.
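One common way to wire this up on the VPC side is a private DNS zone that maps googleapis.com to the documented private.googleapis.com VIP range (199.36.153.8/30). The zone and network names below are placeholders, and VMware Engine workloads additionally need their DNS queries forwarded so they resolve through this zone.

```shell
# Private zone overriding googleapis.com for the VPC.
gcloud dns managed-zones create googleapis-private \
  --description="Private Google Access for Google APIs" \
  --dns-name=googleapis.com. \
  --visibility=private \
  --networks=my-vpc

# A records pointing at the private.googleapis.com VIPs.
gcloud dns record-sets create private.googleapis.com. \
  --zone=googleapis-private --type=A --ttl=300 \
  --rrdatas=199.36.153.8,199.36.153.9,199.36.153.10,199.36.153.11

# Send all other googleapis.com hostnames to the same VIPs.
gcloud dns record-sets create "*.googleapis.com." \
  --zone=googleapis-private --type=CNAME --ttl=300 \
  --rrdatas=private.googleapis.com.
```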

Similarly, VMware Engine workloads that need to access Google Cloud resources in a service producer network, such as Cloud SQL or Memorystore instances, should connect privately using PSA. For more information on this topic, see the section on PSA for VMware Engine.

Encrypt the communication between your on-premises environment and Google Cloud

Workloads on VMware Engine that need to communicate with on-premises systems should connect over an encrypted channel. We recommend a layered approach for encryption in transit between your on-premises data centers and Google Cloud. The link between on-premises and Google Cloud can be encrypted either by setting up Cloud VPN with an IPsec tunnel or by using IPsec on the VLAN attachments of Cloud Interconnect. In addition, enable application-layer encryption between application components using TLS.
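An HA VPN setup along these lines could look as follows; the network, region, Cloud Router, peer address, and shared secret are placeholders, and the Cloud Router with matching BGP sessions is assumed to exist already.

```shell
# HA VPN gateway in the customer VPC.
gcloud compute vpn-gateways create ha-vpn-gw \
  --network=my-vpc --region=us-central1

# Representation of the on-premises VPN device.
gcloud compute external-vpn-gateways create onprem-gw \
  --interfaces=0=203.0.113.10

# One IPsec tunnel (a second tunnel on interface 1 is needed for the HA SLA).
gcloud compute vpn-tunnels create tunnel-0 \
  --region=us-central1 \
  --vpn-gateway=ha-vpn-gw --interface=0 \
  --peer-external-gateway=onprem-gw --peer-external-gateway-interface=0 \
  --router=my-cloud-router --ike-version=2 \
  --shared-secret="replace-with-a-strong-secret"
```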

Protect your data from exfiltration using VPC Service Controls

It's recommended to mitigate data exfiltration risks using VPC Service Controls by placing your sensitive resources such as Cloud Storage buckets and BigQuery datasets into a VPC Service Controls perimeter. Workloads that need to access data inside a perimeter need to be placed into the perimeter as well. Specifically, the Google Cloud project which hosts the private cloud needs to be part of the VPC Service Controls perimeter to access resources that are protected by VPC Service Controls.

You need to configure ingress and egress policies in your VPC Service Controls configuration to allow the VMware Engine producer service APIs into the perimeter. For detailed setup guidance, follow our documentation on using VPC Service Controls with VMware Engine.
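A perimeter of this shape might be created as follows; the access policy ID, project number, and perimeter name are placeholders, and the project number shown must be the project hosting the private cloud.

```shell
# Perimeter protecting Cloud Storage and BigQuery, with the VMware Engine
# host project (123456789) placed inside the perimeter.
gcloud access-context-manager perimeters create vmware_perimeter \
  --policy=POLICY_ID \
  --title="VMware Engine perimeter" \
  --resources=projects/123456789 \
  --restricted-services=storage.googleapis.com,bigquery.googleapis.com
```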

VMware Engine IAM and Permissions

The following sections introduce best practices for user permissions in a VMware Engine environment. It's important to manage permissions both within the VMware Engine environment and in the Google Cloud project in which the private cloud is deployed.

Use predefined roles or custom roles to grant access

You can manage vSphere roles and permissions in VMware Engine the same way you are used to managing them in other VMware environments. However, activities like deploying a cluster require permissions in Identity and Access Management (IAM). The following table lists relevant platform components, the identity sources that grant permissions to them, and example activities that they enable.

Platform component | Identity source | Where to configure permissions | Example activities
Google Cloud VMware Engine portal | Cloud Identity | Identity and Access Management | Private cloud deployment and deletion, cluster deployment and deletion
VMware Engine vCenter | LDAP | Hosts and clusters, VMs and folders, and datastores in the vCenter UI | VM creation, VM folder creation, datastore object creation and deletion
NSX-T | LDAP | "Users and Roles" in the NSX-T Manager UI | NSX segment creation, firewall configuration, load balancer configuration
VM guest operating system | Active Directory, LDAP, or local users, for example | In the guest operating system | SSH or RDP login, file operations

In Google Cloud IAM, there are two predefined roles with permissions to the VMware Engine portal:

  • VMware Engine Service Admin - gives full access to the VMware Engine service on Google Cloud.
  • VMware Engine Service Viewer - gives read-only access to the VMware Engine service on Google Cloud.

These permissions relate to actions in the VMware Engine portal and not to actions in the API or CLI. Note that the basic roles also include the ability to manage the VMware Engine service (Owner, Editor) or to view service details (Viewer). Generally, we recommend predefined roles over basic roles because they provide more fine-grained permissions.

Constrain programmatic access to VMware Engine by service accounts through the API or CLI using predefined or custom roles, because these include fine-grained permissions that apply only to VMware Engine. If programmatic access is used only for a task that requires a specific subset of the permissions of a predefined role, create a custom role.
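A custom read-only role for a service account could be sketched like this. The project, service account, and role names are placeholders, and the permission names shown are examples; confirm the exact VMware Engine permission strings against the IAM permissions reference before use.

```shell
# Custom role carrying only the read permissions the automation needs.
gcloud iam roles create vmwareEngineReadOnly --project=my-project \
  --title="VMware Engine read-only" \
  --permissions=vmwareengine.privateClouds.get,vmwareengine.privateClouds.list

# Bind the custom role to the automation service account at project level.
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:automation@my-project.iam.gserviceaccount.com" \
  --role="projects/my-project/roles/vmwareEngineReadOnly"
```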

Choose an appropriate location for the IAM role assignment in your organization's resource hierarchy. If you run all of your VMware Engine private clouds in a single project, assign the roles at the project level. If technical or organizational requirements place your private clouds in separate projects, define the required roles on a folder that is common to the projects used for your private clouds.

Cloud IAM permissions are not required for activities that only take place inside vCenter, NSX-T, or HCX. Staff who only operate these environments do not need the previously listed IAM roles; instead, they should use LDAP identities with permissions configured in vCenter and NSX-T. We recommend granting the VMware Engine Service Admin or VMware Engine Service Viewer roles to only a very limited number of users, because these roles grant access to the powerful CloudOwner user account for vCenter and the admin user account for NSX-T. Those accounts should be used only for initial setup or break-glass procedures.

Restrict and actively audit administrator access

The VMware Engine Service Admin role is very powerful and should only be assigned to users who need to manage the lifecycle of VMware Engine private clouds and their clusters. The manual addition or deletion of clusters and nodes typically happens infrequently and can have a high impact on billing or on the availability of a cluster. Assign this role to only a few people in your organization.

Regularly audit who has been assigned the VMware Engine Service Admin role, either directly on the project used for VMware Engine or on one of the parent levels of the resource hierarchy. Include other roles in the audit, such as the basic Editor and Owner roles, which contain critical permissions related to VMware Engine. You can use services like the IAM roles recommender to help identify overly privileged role grants.
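A simple audit query along these lines lists the members holding the admin role in a project; the project ID is a placeholder, and the role ID shown should be confirmed against the IAM roles reference.

```shell
# List all members bound to the VMware Engine Service Admin role.
gcloud projects get-iam-policy my-project \
  --flatten="bindings[].members" \
  --filter="bindings.role:roles/vmwareengine.vmwareengineAdmin" \
  --format="value(bindings.members)"
```

Running the same query on parent folders and the organization catches inherited grants that a project-level review would miss.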

Configure an LDAP or Active Directory identity source

An identity provider that supports LDAP authentication, such as Active Directory, should be configured to enable user authentication for vCenter and NSX Manager. This is recommended to centralize identity lifecycle management, group management, password management, and more. Note that directly joining vCenter and NSX-T to Active Directory for integrated Windows authentication is not supported.

Rotate the passwords of built-in service accounts

VMware Engine generates credentials for accessing management appliances in the private cloud (such as vCenter, NSX-T, and HCX). It's recommended to establish a process to rotate the passwords of the default vCenter service account CloudOwner@gve.local and the default NSX-T service account admin. Both accounts should be used only for initial configuration and break-glass procedures, and their passwords should be rotated regularly (for example, every 60 or 90 days). Likewise, regularly rotate the passwords of solution user accounts, which are commonly used for integrating third-party tooling. The more often you rotate service account passwords, the less likely it is that a password is still valid when a bad actor finds it.
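As part of such a process, the current appliance credentials can be read through gcloud; the private cloud name and location below are placeholders, and the exact command paths may differ by gcloud version. Resetting the passwords themselves is done from the VMware Engine portal or API.

```shell
# Read the generated vCenter credentials for a private cloud.
gcloud vmware private-clouds vcenter credentials describe \
  --private-cloud=my-private-cloud --location=us-central1-a

# Read the generated NSX-T credentials for the same private cloud.
gcloud vmware private-clouds nsx credentials describe \
  --private-cloud=my-private-cloud --location=us-central1-a
```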

VMware Engine Logging and Monitoring

The following sections introduce best practices for logging and monitoring of both VM workloads and the VMware Engine infrastructure, which provides the resources that workloads consume.

Ingest VMware Engine logs and metrics

Many organizations wish to collect and analyze logs centrally in a "single pane of glass." In Google Cloud, the Cloud Logging and Cloud Monitoring products provide centralized management of logs and metrics. VMware Engine can be integrated with Cloud Monitoring using a standalone agent. In this configuration, vCenter forwards metrics such as ESXi CPU and memory utilization to Cloud Monitoring. We recommend creating dashboards based on the metrics that vCenter forwards, or starting from the sample dashboards published on GitHub.

To collect platform logs, VMware Engine private clouds can forward syslog messages to a centralized log aggregator. This can be done for both vCenter and NSX-T syslog messages. Collecting, retaining, and analyzing syslog messages from vCenter has important security use cases, such as real-time alerting on administrative (or break-glass) user logins, which should occur only in exceptional circumstances. To analyze syslog messages, configure a syslog aggregator such as Fluentd or the standalone agent to relay the messages to Cloud Logging.

It's recommended to analyze logs from VMware Engine in a central dashboard in a single project. If your VMware Engine environment spans multiple projects, you'll additionally need to aggregate your projects by configuring log sinks and monitoring scopes.
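Aggregating across projects can be sketched with a log sink per source project pointing at a central log bucket; the project IDs, bucket name, and filter below are placeholders to adapt to your environment.

```shell
# Route matching log entries from a workload project into a central
# Cloud Logging bucket owned by the central logging project.
gcloud logging sinks create vmware-central-sink \
  logging.googleapis.com/projects/central-logging/locations/global/buckets/central-bucket \
  --project=vmware-project-1 \
  --log-filter='logName:"vmware"'
```

The sink's writer identity (printed on creation) must then be granted write access on the destination bucket.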

Use the Cloud Logging agent for workload VM logging

VMware Engine workload VMs can send logs directly to the Cloud Logging API using the Logging agent. The Logging agent is based on fluentd and streams logs from common third-party applications and system software to Cloud Logging. As a best practice, align the approach for collecting and analyzing logs for workload VMs on VMware Engine with the approach for Compute Engine instances and your on-premises estate (if applicable). Using the Logging agent on VMware Engine matches the approach used for VMs on Compute Engine, so workloads on both platforms send their logs to Cloud Logging.
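On a Debian or Ubuntu workload VM, installing the agent follows the documented repository script; the VM needs network reachability to Google APIs (for example, via Private Google Access) for the agent to deliver logs.

```shell
# Download and run Google's repository setup script, then install the agent.
curl -sSO https://dl.google.com/cloudagents/add-logging-agent-repo.sh
sudo bash add-logging-agent-repo.sh --also-install

# Verify the agent is running.
sudo service google-fluentd status
```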

Apply equivalent capabilities of Access Transparency and Access Approval policies

While VMware Engine does not yet natively support access transparency (AxT) and access approval (AxA) in Google Cloud, we have implemented processes with equivalent capabilities which can be enabled by request.

For access transparency equivalence, you need to consider several sources of logs, including:

  • vCenter logs - exportable using remote syslog server configuration.
  • ESXi logs - these can be collected using remote syslog configuration, however, you need to file a support request with Google Cloud to configure ESXi syslog forwarding.

If you have strict regulatory requirements, we implement a policy that provides equivalent capabilities for access approval. Under this policy, standard service operations require a support ticket that states the reason why service operators need access.

Google Cloud Access Approval exclusions apply.

VMware Engine Encryption

The following sections introduce best practices for the encryption of storage in the private cloud, and driving factors to consider when choosing a key provider for your private cloud.

Use the Google-managed key provider for vSAN encryption at rest

Encryption of data at rest is implemented using vSAN software-based encryption. By default, VMware Engine enables vSAN encryption on each ESXi cluster and configures a default key provider in vCenter. Google requires customers to keep vSAN encryption enabled on their ESXi clusters; disabling vSAN encryption violates the terms of service for VMware Engine. Many organizations require encryption at rest as part of their company policies or are obligated by regulations such as NIST or FIPS to encrypt data.

Each ESXi host encrypts data using the standard AES-256 XTS mode with different, randomly generated data encryption keys (DEKs). Each DEK is encrypted using a key encryption key (KEK) and stored on disk only in encrypted form. The vCenter server stores only the ID of the KEK, not the KEK itself, which is held in a key management service (KMS). You can choose the location of the KMS where your KEK is kept.

We recommend using the Google-managed default key provider. If, however, you're required to manage the KMS yourself, you can use a third-party KMIP 1.1-compliant KMS from one of the supported vendors. In both cases, the key provider can be used to encrypt data at rest and vMotion traffic in transit.

The following table highlights key differences between the default key provider and third-party KMS integrations:

Default Google-managed key provider

  Pros:
  • Simplicity: deployed out of the box, with no vendor management and no operational burden
  • End-to-end support by Google
  • Simple rotation of DEKs and KEKs
  • No additional cost
  • Built-in zonal redundancy for high availability

  Cons:
  • Bringing your own key material (BYOK) is not possible
  • KEKs are stored and managed in the Google infrastructure; external key managers (EKM) are not supported

Third-party KMS key provider

  Pros:
  • Full control over the encryption keys and the encrypted data
  • Hardware-backed keys can be stored in an HSM appliance

  Cons:
  • Additional complexity and operational overhead
  • Additional cost
  • Possible additional latency, especially with a SaaS KMS
  • Possibly lower availability

Note that it is not recommended to enable VM-level encryption together with vSAN datastore encryption because deduplication efficiency approaches zero for encrypted VMs.

Automate the rotation of encryption keys according to your organization's standards

You are responsible for KEK rotation, using the functionality provided by vSphere. This applies both to the default key provider and to an external KMS. KEK rotation can be initiated from vCenter or through its API. Consider automating KEK rotation according to your organization's requirements. You can find an example PowerCLI script on GitHub.

VMware Engine Backup and Disaster Recovery

It's important to protect your data against threats like ransomware, corruption, and human error. Further, business-critical applications rely on your data being available virtually at all times, leaving you little time to recover from sudden outages. This section doesn't cover every aspect of designing an effective backup and DR strategy to keep your data secure and available, but it contains the key considerations for choosing the right strategy for your VMware Engine environment.

Back up your workloads using Backup and DR Service

With Backup and DR Service, Google Cloud offers a centrally managed, native backup solution that can be used for a variety of use cases, including backup of workloads on Compute Engine and Google Cloud VMware Engine. Backup and DR Service is Google's recommended solution for workload backups because it offers a broad spectrum of workload support; space-efficient, incremental-forever backups; and flexible storage options.

Google Cloud VMware Engine also supports third-party, agent-based backup solutions, which you may prefer if you already have licenses for a third-party backup product. Such tools should meet the following prerequisites:

  • They provide application-level backups
  • They are certified by the application vendors
  • They are certified by VMware for vSAN
  • They support the VMware vStorage APIs for Data Protection (VADP) standard or take application-level backups

Regardless of the backup solution you choose, we recommend Cloud Storage as a cost-effective option for long-term retention of backups. Cloud Storage is highly durable, cost-effective object storage. Cloud Storage buckets can be configured to automatically replicate objects across multiple regions, which is ideal for multi-regional cloud topologies.

Cloud Storage is also well suited for long-term archival because it provides lifecycle policies that automatically move objects to another storage class once their age exceeds a predefined value. Use this option for cost-effective backup storage with medium to high RPOs, especially when cost is a driving factor.
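A lifecycle policy of this kind can be expressed as a small JSON file and applied to the backup bucket; the bucket name and the age thresholds below are example values.

```shell
# lifecycle.json: move backup objects to Archive storage after 30 days
# and delete them after 365 days.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {"action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
     "condition": {"age": 30}},
    {"action": {"type": "Delete"}, "condition": {"age": 365}}
  ]
}
EOF

# Apply the policy to the backup bucket.
gcloud storage buckets update gs://my-backup-bucket \
  --lifecycle-file=lifecycle.json
```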

Alternatively, you can choose vSAN storage to minimize RPO. Use this option if a higher cost for backup storage is acceptable and your RPO requirements can't be met with Cloud Storage. Avoid it for long-term archival, because VMware Engine cluster sizes risk becoming storage-bound.

Implement disaster recovery with Backup and DR Service

We recommend restoring applications on VMware Engine using Backup and DR Service. To protect your production workloads against the outage of a single zone, deploy and operate a private cloud in a secondary zone of the same region if VMware Engine is available in more than one zone there. Otherwise, restore your applications in a secondary region.

In addition to Google Cloud Backup and DR, VMware Engine is compatible with other DR options such as VMware Site Recovery Manager (SRM) and Zerto. Both rely on vSphere Replication, which generally supports lower RPO targets. If your RPO target is minutes rather than hours, consider DR solutions based on vSphere Replication.

Checklist summary

The following checklist summarizes the security best practices for using VMware Engine.

  • VMware Engine Networking
  • VMware Engine IAM and Permissions
  • VMware Engine Logging and Monitoring
  • VMware Engine Encryption
  • VMware Engine Backup and Disaster Recovery

What's next