About database observability

Database observability is a measure of how accurately you can infer the internal state of a database system based on the data, or telemetry, that it generates in logs, metrics, and traces.

Diagnosing and troubleshooting application issues can be particularly difficult and time-consuming when a database is involved, which makes telemetry collection essential. Telemetry, when enriched with application context, makes database instances more understandable, observable, and easier to maintain. You can identify issues and problematic trends early and remedy them before they cause costly downtime. You can also use this data to configure new database instances so that they collect the right kind of data from the moment they start. In this way, you can use data proactively to prevent issues and focus on strategic innovation. This is particularly useful in the DevOps model, where database generalists need to independently analyze telemetry to monitor, evaluate, and optimize the performance and health of their rapidly evolving applications.

Google Cloud offers several powerful features spanning the four iterative observability stages to help you maintain the health of your Cloud SQL database.

The iterative stages of implementing observability

Automated telemetry collection

To achieve observability goals, start by collecting telemetry, preferably through an automated process. When collected over a period of time, telemetry helps you establish a baseline for metrics under different load conditions.
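As a rough illustration of what a baseline under different load conditions can look like, the following sketch groups metric samples by a load label and summarizes each group. The function name `build_baseline`, the load labels, and the sample values are all hypothetical; this is not how Cloud Monitoring stores or computes baselines.

```python
from collections import defaultdict
from statistics import mean


def build_baseline(samples):
    """Summarize (load_label, value) samples per load condition.

    `samples` is an iterable of (load_label, value) tuples, for example
    CPU utilization readings tagged with a hypothetical load label.
    """
    grouped = defaultdict(list)
    for load_label, value in samples:
        grouped[load_label].append(value)

    # One summary per load condition: average, peak, and sample count.
    return {
        load_label: {
            "mean": mean(values),
            "max": max(values),
            "count": len(values),
        }
        for load_label, values in grouped.items()
    }


# Made-up CPU utilization percentages, for illustration only.
samples = [("low", 10.0), ("low", 14.0), ("peak", 70.0), ("peak", 90.0)]
baseline = build_baseline(samples)
```

With a summary like this per load condition, a later reading can be judged against the matching baseline rather than against a single global average.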

Google Cloud services automatically generate observability data, including metrics, logs, and traces, which can help provide a complete observability overview. Cloud Monitoring collects measurements of your service and of the Google Cloud resources that you use. Cloud SQL uses built-in memory custom agents to collect query telemetry, resulting in low performance impact and eliminating the need for agent maintenance or security overhead. Cloud Logging collects logging data from common application components. For Cloud SQL, see also View instance logs.

Cloud Trace collects latency data and executed query plans from applications to help you track how requests propagate through your application. You can compare these latency distributions over time or across versions. Cloud Trace alerts you when it detects a significant shift in the latency profile of your application, if it's instrumented to use Cloud Trace.
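To make the idea of comparing latency distributions across versions concrete, here is a minimal sketch that compares a high percentile of two sets of latency samples. The helper names, the 95th percentile, and the 20% tolerance are assumptions chosen for illustration; the sketch does not reflect how Cloud Trace itself detects shifts.

```python
import math


def percentile(values, pct):
    """Return the nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_shift(before, after, pct=95, tolerance=0.2):
    """Compare one percentile of two latency samples.

    Flags a shift when the relative change exceeds `tolerance`
    (0.2 means 20%). Both thresholds are illustrative assumptions.
    """
    old = percentile(before, pct)
    new = percentile(after, pct)
    change = (new - old) / old
    return {"old": old, "new": new, "change": change,
            "shifted": abs(change) > tolerance}


# Made-up latencies in milliseconds for two application versions.
v1 = [10, 12, 11, 13, 12, 11, 10, 14, 12, 13]
v2 = [2 * ms for ms in v1]
report = latency_shift(v1, v2)
```

Comparing a high percentile rather than the mean keeps the check sensitive to slow outlier requests, which are often the ones that users notice.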

Developers working in a DevOps model are increasingly responsible for the entire application stack, so they need to be able to monitor their databases through the lens of an application. To this end, you can use sqlcommenter, an OpenTelemetry library for databases. Sqlcommenter automatically instruments ORMs to augment SQL statements with tags and allows OpenTelemetry trace context information to be propagated to the database. With tags and trace context available in the database, it is easy to correlate application code with database performance. This makes it simpler to troubleshoot modern microservices-based architectures, which redefine an application as an interconnected mesh of services.
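The following sketch shows roughly what augmenting a SQL statement with tags means: key-value pairs are sorted, URL-encoded, and appended to the statement as a trailing comment. The function `add_sql_comment` and the tag names are hypothetical and written only for illustration. This is a simplified imitation of the comment format, not the sqlcommenter library's API, and in practice the library adds the comment for you through ORM instrumentation.

```python
from urllib.parse import quote


def add_sql_comment(sql, tags):
    """Append application tags to a SQL statement as a trailing comment.

    Keys are sorted so that the output is deterministic, and keys and
    values are URL-encoded so that they cannot break out of the comment.
    """
    if not tags:
        return sql

    pairs = []
    for key in sorted(tags):
        encoded_key = quote(str(key), safe="")
        encoded_value = quote(str(tags[key]), safe="")
        pairs.append(f"{encoded_key}='{encoded_value}'")

    return f"{sql.rstrip()} /*{','.join(pairs)}*/"


# Hypothetical tags describing where in the application the query came from.
tagged = add_sql_comment(
    "SELECT * FROM orders",
    {"route": "/checkout", "controller": "orders"},
)
```

Because the tags travel with the statement text, the database-side telemetry for a query can be grouped by the application route or controller that issued it.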

Database monitoring

Proper monitoring helps you determine whether your application is working optimally. Implement monitoring early, such as before you initiate a migration or deploy a new application to a production environment. Monitoring also helps you distinguish application issues from underlying cloud issues.

The Cloud SQL Overview page shows graphs for some of the key metrics.

Cloud SQL also helps you compare metrics for selected instances.

You can use Cloud Monitoring to create custom dashboards to monitor metrics and to set up alert policies so that you can receive timely notifications.
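Conceptually, an alert condition often combines a threshold with a duration, so that a brief spike does not trigger a notification. The sketch below illustrates that idea in plain Python. The function name, the threshold, and the sample values are hypothetical; in practice Cloud Monitoring evaluates alert policies for you, and this code does not call or mirror its API.

```python
def threshold_breached(points, threshold, min_consecutive):
    """Return True if `points` stays above `threshold` for at least
    `min_consecutive` consecutive samples."""
    run = 0
    for value in points:
        # Extend the current run of breaching samples, or reset it.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Made-up disk utilization ratios sampled at a fixed interval.
sustained = threshold_breached([0.50, 0.92, 0.95, 0.97, 0.60], 0.9, 3)
spiky = threshold_breached([0.95, 0.50, 0.95, 0.50], 0.9, 2)
```

Requiring several consecutive breaching samples trades a little detection delay for fewer noisy notifications.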

Database tuning

You can iteratively troubleshoot and tune your database.

Cloud SQL recommenders help you analyze the current usage of your database and provide recommendations and insights based on heuristic methods and machine learning.

Cloud SQL recommenders are briefly described as follows:

Name | Description
--- | ---
Out-of-disk recommender | Reduce the risk of downtime that might be caused by your Cloud SQL instances running out of disk space.
Idle instance recommender | Reduce costs by shutting down Cloud SQL instances that are inadvertently idle.
Overprovisioned instance recommender | Reduce costs by resizing Cloud SQL instances that are unnecessarily large for a given workload.

