This page describes how to maintain the freshness of data in the columnar engine's column store in AlloyDB for PostgreSQL.
As table rows are added, deleted, or updated, the columnar engine doesn't immediately update data in the column store. Instead, it marks affected content in the column store as invalid, and queries are planned and executed using both column-oriented data from the column store and row-oriented data from the row cache and database storage.
Over time, as more and more data blocks become invalid, query performance improvements provided by the columnar engine are reduced to the point where the columnar data needs to be refreshed to regain query performance improvements. When this refresh threshold is reached, the columnar engine uses a background job to refresh the invalid data in the column store. By default, the columnar engine uses a refresh threshold of 50, indicating that given content in the column store gets marked for refresh when 50% of its content has become invalid. You can change this refresh threshold to adjust the overhead of the automatic refresh and potentially gain better ongoing query performance for your application workload.
Additionally, you can manually refresh tables and ScaNN indexes that are loaded in the columnar engine at any time. Usually, you perform a manual refresh when you want changes to a table to be reflected in the column store as soon as possible after making the changes.
Check invalid data percentage
As your application adds, deletes, or updates rows in a table, the columnar engine marks the affected data in the column store as invalid. You can check the percentage of invalid data for your tables and materialized views by querying the g_columnar_relations
view. This helps you decide if you need to perform a manual refresh or adjust the automatic refresh threshold.
To view the invalidation percentage for all relations in the columnar store, execute the following query:
SELECT relation_name, invalid_percentage FROM g_columnar_relations;
To check the invalidation percentage for a specific table, add a WHERE
clause to the query.
SELECT relation_name, CASE WHEN total_block_count > 0 THEN (invalid_block_count * 100.0 / total_block_count) ELSE 0.0 END AS invalid_block_percentage FROM g_columnar_relations; WHERE relation_name = TABLE_NAME
Replace TABLE_NAME
with a string containing the name of the table or the materialized view. If the resource is in a schema other
than public
, specify the name of the schema using the format
SCHEMA_NAME.TABLE_NAME
—for example, myschema.mytable
.
To check the invalidation percentage for a specific index, add a WHERE
clause to the query.
SELECT index_name, CASE WHEN total_block_count > 0 THEN (invalid_block_count * 100.0 / total_block_count) ELSE 0.0 END AS invalid_block_percentage FROM g_columnar_indexes; WHERE index_name = INDEX_NAME
Change the refresh threshold of the columnar engine
To change the refresh threshold of the columnar engine, you must modify the value of the google_columnar_engine.refresh_threshold_percentage
database flag for your AlloyDB instance.
Console
- In the Google Cloud console, go to the Clusters page.
- Click the name of the cluster that contains the instance you want to configure.
- In the Instances in your cluster section, find the instance and click Edit.
- In the Flags section, either add a new flag or modify the existing
google_columnar_engine.refresh_threshold_percentage
flag. - Set the value to an integer between
1
and100
. This value represents the percentage of invalid data that triggers a refresh. - Click Done.
gcloud
To use the gcloud CLI, you can install and initialize the Google Cloud CLI, or you can use Cloud Shell.
To change the refresh threshold of the columnar engine, run the following command:gcloud alloydb instances update INSTANCE_ID \ --database-flags=google_columnar_engine.refresh_threshold_percentage=THRESHOLD \ --cluster=CLUSTER_ID \ --region=REGION_ID \ --project=PROJECT_ID
Replace the following:
- INSTANCE_ID: the ID of the instance.
- THRESHOLD: an integer value between
1
and100
. This value specifies the percentage of data blocks in content that must become invalid for the content to be marked for refresh. - CLUSTER_ID: the ID of the cluster where the instance is located.
- REGION_ID: the region of the cluster.
- PROJECT_ID: the ID of the project.
Manually refresh tables and ScaNN indexes in the column store
You can manually refresh tables and ScaNN indexes in the columnar engine at any time using dedicated SQL functions.
To refresh a table's data in the columnar engine, execute the google_columnar_engine_refresh
function.
SELECT google_columnar_engine_refresh(TABLE_NAME);
Replace TABLE_NAME
with a string containing the name of the table or the materialized view. If the resource is in a schema other
than public
, specify the name of the schema using the format
SCHEMA_NAME.TABLE_NAME
—for example, myschema.mytable
.
Similarly, to manually refresh a ScaNN index in the columnar engine, execute the google_columnar_engine_refresh_index
function.
SELECT google_columnar_engine_refresh_index(index => INDEX_NAME);
Replace INDEX_NAME
with a string containing the name of the ScaNN index to refresh.