Databases

Cloud SQL for PostgreSQL data cache under the hood

May 8, 2024

Virender Singla

Database Engineer

Data is the lifeblood of the organization. Being able to make quick, accurate and actionable decisions based on authoritative data enables enterprises to offer differentiated services and improve customer satisfaction. The rise of generative AI has only further amplified this.

It’s therefore important that your database provides near-real-time performance when interacting with operational data. For PostgreSQL databases, we offer Cloud SQL for PostgreSQL Enterprise Plus edition, which offers improved performance out of the box, improved data protection (35 days of PITR) and improved availability (99.99% SLA and near zero downtime maintenance.)

Cloud SQL for PostgreSQL Enterprise Plus edition also includes an innovative data cache feature, which significantly improves read performance. The data cache is a read cache that uses a server-side SSD to cache data. Because it is co-located with compute in the server, data accesses have low latencies and high throughput. Workloads that are typically limited by read throughput and latency will therefore see significant benefits when using the data cache.

In this blog post, we will explore how the data cache works, its internal mechanisms, and the types of workloads that will benefit the most from it.

How does the data cache work?

Cloud SQL uses high-performance, low-latency local solid-state drives as a caching layer for data storage disks. Think of the data cache as an extension to the shared buffers in PostgreSQL. This means queries avoid unnecessary network hops and are not limited by the underlying storage. The data cache is bigger than the physical memory in the instance, which means that more of the working set fits inside the server. This helps queries to achieve a much better cache-hit ratio.

The table below gives the sense of size of different storage layers for a 32 vCPU instance.

https://storage.googleapis.com/gweb-cloudblog-publish/images/1_0jIJ77R.max-1600x1600.jpg

The data cache size is fixed for an instance and is a function of the number of vCPUs configured. The table below summarizes the data cache size for different vCPU configurations:

Number of vCPU	Memory (GB)	Data Cache Size (GB)
2	16	375
4	32	375
8	64	375
16	128	750
32	256	1500
48	384	3000
64	512	6000
80	640	6000
96	768	6000
128	864	9000

What workloads benefit from the data cache?

Workloads that benefit from the data cache feature are read workloads where the total dataset does not fit entirely into memory. This includes but is not limited to:

Workloads that are sensitive to read latencies (for example, key-value lookups)
Workloads that are sensitive to read throughput (for example, table scans)
Gen AI workloads that use vectors for similarity searches

Enabling the data cache

In the Cloud SQL console, go to SQL → Create Instance → Choose PostgreSQL and then choose Enable data cache. The data cache can also be enabled via gcloud and Terraform.

https://storage.googleapis.com/gweb-cloudblog-publish/images/2_brtZh8u.max-1100x1100.png

Monitoring

Cloud SQL for PostgreSQL also includes four new metrics for data cache observability:

Data cache quota: Maximum data cache size
Data cache used: Data cache used
PostgreSQL data cache hit count: Total number of data cache hits
PostgreSQL data cache miss count: Total number of data cache misses

Turbocharge your read-heavy workloads

In this blog we discussed how the data cache improves read performance. We also discussed how to easily monitor data cache operations. To learn more about Cloud SQL for PostgreSQL data cache, or to get started, check out the documentation.

Posted in