Google Distributed Cloud air-gapped release notes

October 6, 2023 [GDCH 1.11.0]


Google Distributed Cloud air-gapped 1.11.0 is now available.

See the product overview to learn about the features of Google Distributed Cloud air-gapped.


Add-on Manager:

  • The Runtime Config Processing component processes runtime information and produces deterministic results.
  • Add-on Manager supports the auto-generation of OpenAPI schema for ConfigTypeSpec parameters.
  • Kind clusters can use a dedicated network.

Artifact Management:

  • Added support for package validation to ensure that the integrity of the artifacts used for provisioning or updating is maintained and that the artifacts are not tampered with.
  • Added support for IP privatization to protect intellectual property of system artifacts.

Backup and restore:

  • Backup and restore create, read, update, and delete operations for backup resources are available in the UI.
  • Replaced the VM backup and restore APIs on the system cluster with a new VM API on the org admin cluster. VM Backup APIs are available from the org admin cluster.

Billing:

  • Introduced automated hourly computation of usage costs.
  • Introduced automated monthly generation of invoices.
  • Added new processes for the IO to configure one-time charges and recurring charges based on customer usage and contract.
  • Added a new public SKUDescription resource that lets you query for prices for services.
  • Introduced PDF export for invoices and pricing calculator estimates.

Cluster:

  • Introduced the new Cluster KRM API for user cluster creation.
  • Enabled node pool deletion in the UI.
  • Enabled node pool downscaling in the UI.
  • Added Cluster dashboards.

Database Service:

  • Added support for creating a standby PostgreSQL database in the same zone and issuing a failover command.
  • Added support for defining maintenance windows for database clusters by using the gdcloud CLI.
  • Added support for PostgreSQL 14.5.
  • Added support for automatic in-place database minor version upgrades in user-specified maintenance windows.
  • Added support for configuring a list of supported database flags. Additional supported flags are added for PostgreSQL and Oracle. Database Service also validates and rejects unsafe flag values to avoid database outages.
  • The Database Service CLI command group moved from gdcloud alpha database to gdcloud database.
  • Added two new commands:
    • gdcloud database clusters update for updating the database settings
    • gdcloud database clusters failover for causing failover for an HA database cluster
  • Added support for enabling external connectivity through the gdcloud database clusters update command.
  • Added support for setting database flags through the gdcloud database clusters create and gdcloud database clusters update commands.
  • Added support for a new AlloyDB Omni database engine (Preview) in the GDC database service.

Disaster Recovery:

Firewall:

  • Health checks and firewall upgrade preflight checks are added.
  • Firewall upgrade for segmentation is supported.

Hardware security module:

  • HSM secret rotation is automated.
  • Introduced automated HSM backup improvements and a gdcloud command for taking manual HSM backups.
  • Security keys protecting storage data volumes are annotated by the associated resources that they protect.
  • Security keys are deleted when their associated HSM keys are destroyed.
  • HSM initialization is automated.
  • Introduced a gdcloud command to print HSM network info.
  • Introduced a gdcloud command to create a config file for communicating with HSMs.

Identity and access management:

  • After 15 or more minutes of inactivity in a session, the GDCH console and gdcloud CLI log you out.
  • If a member leaves your organization or team, you can revoke their access to GDC.
  • When running the gdcloud init command, an https:// prefix is added to the URL if the prefix is missing.
  • Encrypted tokens are supported for OIDC identity providers.
  • The web-tls certificate is forwarded to the system cluster.
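
The https:// prefix behavior described for gdcloud init can be illustrated with a minimal Python sketch. This is not the gdcloud implementation; the function name normalize_console_url is hypothetical, and leaving URLs that already carry another scheme unchanged is an assumption that the note does not confirm.

```python
def normalize_console_url(url: str) -> str:
    """Return the URL with an https:// prefix added when no scheme is present.

    Hypothetical helper: it only mirrors the behavior described in the
    release note and is not taken from gdcloud.
    """
    url = url.strip()
    if "://" in url:
        # Assumption: a URL that already names a scheme is left untouched.
        return url
    return "https://" + url


if __name__ == "__main__":
    print(normalize_console_url("console.example.com"))
    print(normalize_console_url("https://console.example.com"))
```

Both calls in the usage example produce the same https:// URL, so the normalization is idempotent.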

Key Management Service (KMS):

  • A new role is added for PAs and AOs to create and manage secrets in projects.
  • Improved the KMS Golden Metrics monitoring dashboard.
  • KMS can be configured with a Kubernetes secret based root key, or an HSM backed root key.
  • Improved KMS audit logging.
  • Introduced a gdcloud command to verify key signatures.

Logging and audit logging:

  • Added support for Loki high availability.
  • General performance, stability and scalability improvements.
  • Fully automated integration of logging and audit logging pipelines with object storage.
  • Added IO and PA/AO audit logs separation across different object storage buckets.
  • Loki version updated to v2.8.4.
  • Added support for NetFlow logs fanout across multiple organizations.

Marketplace:

  • The GDC Marketplace lets you access popular software that has gone through security scanning and been validated for GDC. Elastic Cloud on Kubernetes (ECK) is the official Elastic Kubernetes Operator. With ECK, application and operation teams can focus on their Elasticsearch, observability, and security use cases instead of spending valuable time managing Elasticsearch deployments.

Networking (physical):

  • Added support for configuring Direct Connect (DX) Interconnect.
  • Direct Server Return (DSR) is supported. DSR is a load balancing configuration where servers behind a load balancer return responses directly to the client.

Networking (virtual):

  • External load balancer (ELB) services are created in default-deny mode, and require ProjectNetworkPolicy to allow access to workloads using ELB.
  • Fabric trace tooling is added, which lets an operator view information about flows and traffic passing through the organization.
  • A network diagnose CLI command is added.
  • A BGP diagnose CLI tool is added.

Node platform and operating system:

  • Added support for applying runtime modifications to a GDCH operating system by using a custom orchestration mechanism to safely configure the target operating system.
  • The startup script is updated to use the pre-installed package OS.
  • The server reconciler supports the force power off API.
  • A fencing reconciler watches IM and controls the server.
  • The NodeUpgrade and NodeUpgradeTask APIs include additional columns, such as status and concurrency.

Object storage:

  • Project deletion checks if there are any buckets in the project before deleting.
  • The object storage load balancer is updated to not use a web-tls certificate for communicating with Istio.
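
The bucket check that precedes project deletion can be pictured with the following Python sketch. It assumes that deletion is refused while buckets remain, which the note does not state explicitly. The names delete_project and ProjectNotEmptyError are hypothetical, and the in-memory set and dictionary stand in for the real resource stores.

```python
class ProjectNotEmptyError(Exception):
    """Raised when a project still contains object storage buckets."""


def delete_project(project_id, projects, buckets_by_project):
    """Delete project_id from projects only if it owns no buckets.

    projects is a set of project IDs; buckets_by_project maps a project ID
    to the list of its bucket names. Both are illustrative stand-ins.
    """
    remaining = buckets_by_project.get(project_id, [])
    if remaining:
        # Assumed behavior: refuse the deletion and report what is left.
        raise ProjectNotEmptyError(
            f"project {project_id!r} still has {len(remaining)} bucket(s): "
            f"{sorted(remaining)}"
        )
    projects.remove(project_id)


if __name__ == "__main__":
    projects = {"analytics", "sandbox"}
    buckets = {"analytics": ["logs"]}
    delete_project("sandbox", projects, buckets)  # no buckets, so it succeeds
    print(projects)
```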

Platform authentication:

  • Improved the Bring your Own Certificate mode for all certificates served by the ingress (customer facing endpoints). The Infrastructure Operator can use the customer's public key infrastructure (PKI) for signing certificate signing requests and can install the certificates to customer facing endpoints.

Resource manager:

  • Added support for deleting a project using the GDC console and gdcloud CLI.
  • Added support for creating, managing, and deleting projects by using the gdcloud projects command group.

Ticketing system:

  • The ticketing system automatically deploys with the gdchservices organization.
  • A Security Incident Response module is added.
  • Audit logs are forwarded with the service name gdchservices.
  • Email to case workflow support is enabled.
  • Major Incident workflow is enabled.
  • Support for metrics collection and reporting is enabled.
  • Scripted configuration upgrade is enabled.

Transfer service:

  • Ongoing stats for long-lived transfers are logged.
  • Added support for custom CA certificates for object storage.

Upgrade:

  • IOs and PAs can pin certain nodes in the pool to upgrade instead of having to upgrade the entire pool.
  • Prometheus is configured to scrape the VM for metrics.
  • An upgrade status metric is added to the upgrade orchestrator.
  • An Upgrade Status dashboard is added.
  • Added pre-flight checks prior to controller upgrades.

Vertex AI:

  • You can enable or disable a specific Vertex AI pre-trained service. The pre-trained services are Optical Character Recognition (OCR), Speech-to-Text, and Translation.
  • Vertex AI OCR supports asynchronous file detection and annotation. Asynchronous OCR enables users to schedule multiple jobs to batch process a collection of files, such as images and PDFs. Each job is identified by a unique identifier, which lets the user monitor the status of the job.
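
The asynchronous pattern described for OCR, in which each batch job gets a unique identifier that is later used to check its status, can be sketched in Python. This is a generic in-memory illustration and does not use the Vertex AI OCR API; BatchJobRegistry, its methods, and the status strings are hypothetical names.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class BatchJobRegistry:
    """Runs batch jobs in the background and tracks them by a unique ID."""

    def __init__(self, max_workers=2):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}  # job ID -> Future

    def submit(self, files, process):
        """Schedule process(file) for every file and return the job ID."""
        job_id = str(uuid.uuid4())
        batch = list(files)
        self._jobs[job_id] = self._pool.submit(
            lambda: [process(f) for f in batch]
        )
        return job_id

    def status(self, job_id):
        """Report the job state without blocking."""
        future = self._jobs[job_id]
        if not future.done():
            return "IN_PROGRESS"
        return "FAILED" if future.exception() else "SUCCEEDED"

    def result(self, job_id, timeout=None):
        """Block until the job finishes and return its per-file results."""
        return self._jobs[job_id].result(timeout=timeout)


if __name__ == "__main__":
    registry = BatchJobRegistry()
    # str.upper stands in for a real per-file detection step.
    job = registry.submit(["scan1.png", "report.pdf"], str.upper)
    print(job, registry.result(job, timeout=5), registry.status(job))
```

The caller keeps only the job ID, so several batches can be scheduled and polled independently.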

Virtual machines (VM):

  • Replaced the VM API on the system cluster with a new VM API on the org admin cluster. VM APIs are available from the org admin cluster.
  • Launched VM HA to ensure VM availability during a node outage.
  • Launched the bring your own image feature, which lets you import your own Ubuntu and RHEL operating system images.
  • Added support for Rocky 8 and RHEL 8 operating systems, with one out-of-the-box Rocky image released.
  • Launched a CLI-oriented operation experience by using the gdcloud compute command group.
  • Launched VM package manager, which automatically sets up VMs to install and update APT and RPM packages from a GDCH-provided package manager.
  • The NTP of the VM is automatically set up without human intervention.
  • Introduced the ability to store and serve VM images with object storage instead of Harbor to increase image availability and support large VM image distribution.
  • Introduced a VM egress improvement that lets you enable or disable the VM egress field without rebooting the VM.

Add-on Manager:

Database Service:

  • Database Service monitors the health of the database after a database engine upgrade.
  • Database Service provides a runbook for manually restoring database data after upgrade.

Logging and audit logging:

  • New version of LoggingTarget and LoggingRule APIs.
  • New version of AuditLoggingTarget API.

Cluster:

  • A user cluster does not become ready in time to restart the coredns deployment.
  • A system cluster for the gdchservices organization gets stuck during its creation.
  • ClusterBGPPeer misses the node information.

Firewall:

  • Platform admins cannot find firewall logs in the monitoring dashboard.
  • TLS certificate is not installed.
  • The master key rotation might fail.
  • The firewall gdcloud commands for mgmt-setup and install might fail.

File and block storage:

  • Deployment netapp-harvest in root cluster is not up.

Hardware security module:

  • Hardware security module frequently toggles between the ServicesNotStarted and ready states.

Monitoring:

  • Alerts in organization system clusters don't reach the ticketing system.
  • The MonitoringTarget does not update when user clusters are added or removed from projects.
  • There is an issue with high CPU usage for cortex and cortex-tenant.
  • Alerts MON-A2004 (Metrics Data Plane Error Rate) and MON-A2006 (Alerting Data Plane Error Rate) are unreliable in 1.11.0.

Networking:

  • There is a limitation on the egress NAT performance for TCP traffic and UDP traffic.
  • Performance degradation for UDP services exposed through external load balancers.
  • The API server is blocked intermittently.
  • The etcd is overloaded.
  • Address Resolution Protocol (ARP) packets are rejected.
  • The org admin server is unable to PXE boot.

Node platform and operating system:

  • Bare metal server provisioning might hang, blocking organization creation.
  • An org admin server gets stuck in the available state.

Upgrade:

  • The GPU device plugin does not start during an upgrade.

Vertex AI:

  • The ODSPostgresDBCluster creation fails when recreating database clusters.
  • The Vertex AI monitoring dashboards don't display system metrics.

Virtual machines (VM):

  • Unbalanced worker node after upgrade.
  • The gdcloud compute ssh command does not work.
  • A VM is unresponsive after its host reboots.

VM Backup and Restore:

  • Role-based access control (RBAC) and schema settings in the VM manager prevent users from starting VM backup and restore processes.

Add-on Manager:

  • An issue with the cluster operator missing in the root admin cluster is fixed.

Artifact Management:

  • An issue with artifact registry pods not being evicted during the node draining process is fixed.
  • The Upgrade Admin role is updated to add permissions to use the docker-credential-gdcloud and load-oci CLI tools.
  • The failover registry preflight timeout of 30 seconds is applied over all checks.
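
One way to read "applied over all checks" is that the 30 seconds form a single budget shared by the whole preflight run, rather than a fresh 30 seconds per check. The Python sketch below illustrates that interpretation only; it is not the registry's implementation, and run_checks_with_shared_deadline and the result strings are hypothetical.

```python
import time


def run_checks_with_shared_deadline(checks, total_timeout_s=30.0,
                                    clock=time.monotonic):
    """Run (name, check) pairs against one deadline shared by all checks.

    Each check is called with the seconds still available and returns True
    or False. Checks that would start after the deadline are skipped.
    """
    deadline = clock() + total_timeout_s
    results = {}
    for name, check in checks:
        remaining = deadline - clock()
        if remaining <= 0:
            # The shared budget is used up, so later checks do not run.
            results[name] = "skipped"
            continue
        results[name] = "passed" if check(remaining) else "failed"
    return results


if __name__ == "__main__":
    checks = [
        ("registry-reachable", lambda remaining: True),
        ("artifact-listing", lambda remaining: remaining > 1.0),
    ]
    print(run_checks_with_shared_deadline(checks))
```

A check that wants to honor the budget passes the remaining value on as its own timeout, so a slow early check shortens the time left for the later ones.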

Backup and restore:

  • An issue with backup repository deletion failure is fixed.

Clusters:

  • A problem with the VM disk continuing to grow in a pending state is fixed.
  • When the InventoryMachineCreated condition type is true, it's not overwritten with the InstallingCertificate reason.
  • The default machine type for org admin clusters is 01-standard1-64-gdc-metal.

Disaster recovery:

  • Users with the DR admin role can create config maps.
  • The host network on the hardware controller is enabled to allow connectivity with HSM.

DNS:

  • systemd-resolved is disabled when configuring DNS on the bootstrapper.
  • Updated the DNSRegistration status with an error when neither the IP nor the Istio gateway config is specified.
  • Reverse DNS resources are added to IO predefined DNS roles.

Block storage:

  • Errors found during collecting metrics are returned.
  • The Google Distributed Cloud Virtual for Bare Metal node draining deadline is increased during user cluster upgrade to avoid potential data corruption.
  • An issue with StorageEncryptionConnection objects not being generated is fixed.
  • A problem with storage node volumes not being encrypted is fixed.
  • A storage cluster validation check is added after storage cluster bootstrap.
  • A problem provisioning persistent volumes in the system cluster during bootstrap is fixed.

Firewall:

  • The ConfigReplaceCompleted condition is marked as false to reflect the overall status of the firewall node.
  • The problem with the firewall skipping the antivirus signature install is fixed.
  • The problem with the bootstrap failing to initialize the PAN-OS client is fixed.
  • A problem with the ObjectStorageClientLif address group missing in the root vsys is fixed.
  • A firewall bootstrap failure with an invalid memory address is fixed.

Hardware security modules:

  • The target address for HSM provides a full URL.
  • A problem with USB ports not being disabled and USB events not being logged is fixed.

Identity and access management:

  • A problem logging in with the CLI when no user cluster is created is fixed.
  • The TTL reconciler can delete bindings created using Infrastructure as Code.
  • The OrganizationRole is not propagated to the system cluster as it does not have the RBACSelectorSystemScope attached.
  • An issue with assigning roles to service identities in the console is fixed.
  • An internal system error in the Access page is fixed.
  • A problem with the gdch- prefix being added to the user email after login is fixed.
  • An issue where a JupyterLab notebook cannot be opened is fixed.
  • Organization IAM Admins can assign the IDP Federation Admin role.

Key management service (KMS):

  • A problem with the KMS monitoring dashboard not receiving data is fixed.
  • A problem with KMS alerts failing to trigger is fixed.
  • Rotating the root key no longer causes downtime in the KMS data plane.

Monitoring:

  • The ObservabilityPipeline reconciler upgrades the LogMon CR with new annotations.
  • A problem with unhealthy metrics and cortex pods is fixed.
  • A problem with the monitoring target not honoring the project's cluster selector is fixed.
  • A problem with Loki data sources configuration is fixed.
  • A problem with firewall and switch metrics not displaying in the dashboard is fixed.

Networking (physical):

  • The gdcloud system preinstall cleanup command introduced an optional --switches flag that accepts a list of switches to reset.
  • There is a new field in the TORSwitch and AggSwitch custom resources to contain allocated dataplane IP addresses.
  • An issue which causes status updates to the switch to fail when it is paused is fixed.
  • A problem with the switches failing due to old certificates is fixed.

Node platform and operating system:

  • The root admin node OS upgrade creates manual distributions at the same time.
  • An issue with the OS artifact snapshot collector is fixed.
  • Pod crash looping due to SDS server start failure is fixed.
  • An issue with VM creation failure due to physical node BIOS without KVM enabled is fixed.
  • A problem with servers getting stuck in a BMC certificate generation loop, resulting in provisioning failures during bootstrap, is fixed.
  • A problem with an organization upgrade failure is fixed.

Object storage:

  • An issue with the ObjectStorageSite CR status not updating is fixed.
  • An issue with exceeding the object storage user tenant quota is fixed.
  • An issue with the object storage system failure after a VM reboot is fixed.

Ticketing system:

  • An issue with insufficient log storage causing intermittent outages is fixed.
  • Removed unused user and group objects.

Upgrade:

  • Google Distributed Cloud Virtual for Bare Metal user cluster overrides are applied to other versions of Google Distributed Cloud Virtual for Bare Metal.
  • A lock is acquired before restarting the firewall when upgrading.
  • The predicates for the remote node upgrade task are overridden in the watch NodeUpgrade context.
  • An issue where the system cluster upgrade cannot start because annotations are not updated is fixed.
  • A problem with the kube-state-metrics Helm chart RBAC missing permissions is fixed.
  • The problem with the org admin cluster upgrade taking a long time is fixed.
  • An issue with a missing web-tls cert is fixed.
  • The Upgrade Admin predefined role has permissions to upgrade nodes, file and block storage, and HSM.
  • A problem with the tenant org upgrade failing during artifact distribution is fixed.
  • A problem with IOs not being able to skip the preflight check during an org upgrade is fixed.
  • A problem with an unhealthy tenant org after root upgrade is fixed.
  • A problem with an org upgrade stuck at VM image distribution is fixed.

User interface:

  • The Create button is disabled in the Console when a create operation is in progress for projects, clusters, and VMs.
  • An issue with buttons not displaying after navigating from another page is fixed.
  • The autofocus is fixed on the cluster creation page.
  • An error message is shown when a service instance creation or deletion fails.
  • A problem with not being able to manage alert rule groups that are based on logs in the UI is fixed.

Virtual machines:

  • You can launch a VM with a larger memory footprint of up to 500 GB. Previously, large VMs might hit hypervisor-level out-of-memory errors, which led to VM crashes.
  • A problem where removing a GPU VM node left the GPU card in a lingering MIG-enabled state is fixed. This issue prevented the card from being reused in a user cluster without a manual reset.

Updated Canonical Ubuntu OS image version to 20230912 to apply the latest security patches and important updates. The following security vulnerabilities are fixed:


Clusters:

  • User cluster versioning (1.x.x-gdch-uc.x) is deprecated. In the user interface, PAs and AOs only see Kubernetes versions when creating or upgrading user clusters.

Database Service:

  • Existing database clusters won't be migrated forward from previous GDC versions. These database clusters will be automatically deleted after the GDC instance is upgraded to version 1.11.0.

    Before applying the release, confirm whether any database cluster holds critical data. If it does, preserve that data before the upgrade.