Sample SQL queries

This document contains sample queries over log entries that are stored in log buckets that have been upgraded to use Log Analytics. For these buckets, you can run SQL queries from the Log Analytics page in the Google Cloud console. For more samples, see the logging-analytics-samples and the security-analytics GitHub repositories.

This document doesn't describe SQL or how to route and store log entries. For information about those topics, see the What's next section.

Before you begin

  • To use the queries shown in this document on the Log Analytics page, replace TABLE with the name of the table that corresponds to the view that you want to query. The table name has the format project_ID.region.bucket_ID.view_ID. You can find the table name for a view on the Log Analytics page; the default query for a log view lists the table name in the FROM statement. For information about how to access the default query, see Query a log view. For an example FROM clause, see the sketch that follows this list.

  • To use the queries shown in this document on the BigQuery Studio page, replace TABLE with the path to the table in the linked dataset. For example, to query the view _AllLogs on the linked dataset mydataset that is in the project myproject, set this field to myproject.mydataset._AllLogs:

    In the navigation panel of the Google Cloud console, select BigQuery.

  • To open the Log Analytics page, do the following:

    1. In the navigation panel of the Google Cloud console, select Logging, and then select Log Analytics.

    2. Optional: To identify the table schema for a log view, in the Log views list, find the view, and then select the name of the view.

    The schema for the table is displayed. You can use the Filter field to locate specific fields. You can't modify the schema.
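
For example, the following sketch queries the _AllLogs view of the default log bucket, _Default, which is stored in the global region. The project name myproject is hypothetical; substitute your own values:

SELECT
  timestamp, severity, log_id
FROM
  `myproject.global._Default._AllLogs`
LIMIT 10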

Filter logs

SQL queries determine which rows of the table to process, and then they group the rows and perform aggregate operations. When no grouping or aggregation operations are specified, the result of the query includes the rows selected by the filter operation. The samples in this section illustrate filtering.

Filter by time

To set the time range of your query, we recommend that you use the time-range selector. This selector is used automatically when a query doesn't specify a timestamp field in the WHERE clause. For example, to view the data for the past week, select Last 7 days from the time-range selector. You can also use the time-range selector to specify a start and end time, specify a time to view around, and change time zones.

If you include a timestamp field in the WHERE clause, then the time-range selector setting isn't used. The following example filters the data by using the TIMESTAMP_SUB function, which lets you specify a look-back interval from the current time:

WHERE
  timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
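
To query an explicit window instead of a look-back interval, you can compare the timestamp field against timestamp literals. The following sketch uses placeholder dates; substitute your own:

WHERE
  timestamp >= TIMESTAMP("2024-01-01 00:00:00 UTC") AND
  timestamp < TIMESTAMP("2024-01-02 00:00:00 UTC")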

For more information about how to filter by time, see Time functions and Timestamp functions.

Filter by resource

To filter by resource, add a resource.type restriction.

For example, the following query reads the most recent hour of data, then retains those rows whose resource type matches gce_instance, and then sorts and displays up to 100 entries:

SELECT
  timestamp, log_name, severity, json_payload, resource, labels
FROM
  `TABLE`
WHERE
  resource.type = "gce_instance"
ORDER BY timestamp ASC
LIMIT 100

Filter by severity

You can filter by a specific severity with a restriction like severity = 'ERROR'. Another option is to use the IN operator and specify a set of valid values.

For example, the following query reads the most recent hour of data, and then retains only those rows that contain a severity field whose value is either 'INFO' or 'ERROR':

SELECT
  timestamp, log_name, severity, json_payload, resource, labels
FROM
  `TABLE`
WHERE
  severity IS NOT NULL AND
  severity IN ('INFO', 'ERROR')
ORDER BY timestamp ASC
LIMIT 100

The previous query filters by the value of the severity field. However, you can also write queries that filter by the numeric value of the log severity. For example, if you replace the severity lines with the following lines, the query returns all log entries whose severity level is at least NOTICE:

  severity_number IS NOT NULL AND
  severity_number > 200

In the LogSeverity enumeration, INFO is 200 and NOTICE is 300, so the previous restriction excludes log entries at the INFO level and below. For information about the enumerated values, see LogSeverity.

Filter by log name

To filter by a log name, you can add a restriction on the value of the log_name or the log_id field. The log_name field includes the resource path; that is, this field has values like projects/myproject/logs/mylog. The log_id field stores only the log name, such as mylog.

For example, the following query reads the most recent hour of data, then retains those rows where the value in the log_id field is cloudaudit.googleapis.com/data_access, and then sorts and displays the results:

SELECT
  timestamp, log_id, severity, json_payload, resource, labels
FROM
  `TABLE`
WHERE
  log_id = "cloudaudit.googleapis.com/data_access"
ORDER BY timestamp ASC
LIMIT 100
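
To write an equivalent filter on the log_name field, include the resource path. The following restriction is a sketch that assumes a project named myproject; note that in audit log names, the forward slash in the log ID is URL-encoded:

WHERE
  log_name = "projects/myproject/logs/cloudaudit.googleapis.com%2Fdata_access"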

Filter by resource label

Most monitored resource descriptors define labels that are used to identify the specific resource. For example, the descriptor for a Compute Engine instance includes labels for the zone, project ID, and instance ID. When a log entry is written, values are assigned to each of these fields. The following is an example of the resource field for a Compute Engine instance:

{
   type: "gce_instance"
   labels: {
      instance_id: "1234512345123451"
      project_id: "my-project"
      zone: "us-central1-f"
   }
}

Because the data type of the labels field is JSON, including a restriction like resource.labels.zone = "us-central1-f" in a query results in a syntax error. To get the value of a field with a data type of JSON, use the function JSON_VALUE.

For example, the following query reads the most recent data and then retains those rows where the resource is a Compute Engine instance that is located in the us-central1-f zone:

SELECT
  timestamp, log_name, severity, JSON_VALUE(resource.labels.zone) AS zone, json_payload, resource, labels
FROM
  `TABLE`
WHERE
  resource.type = "gce_instance" AND
  JSON_VALUE(resource.labels.zone) = "us-central1-f"
ORDER BY timestamp ASC
LIMIT 100

For information about all functions that can retrieve and transform JSON data, see JSON functions.

Filter by HTTP request

To filter the table to only include rows that correspond to an HTTP request or reply, add an http_request IS NOT NULL restriction:

SELECT
  timestamp, log_name, severity, http_request, resource, labels
FROM
  `TABLE`
WHERE
  http_request IS NOT NULL
ORDER BY timestamp
LIMIT 100

The following query only includes rows that correspond to GET or POST requests:

SELECT
  timestamp, log_name, severity, http_request, resource, labels
FROM
  `TABLE`
WHERE
  http_request IS NOT NULL AND
  http_request.request_method IN ('GET', 'POST')
ORDER BY timestamp ASC
LIMIT 100

Filter by HTTP status

To filter by HTTP status, modify the WHERE clause to require that the http_request.status field is defined:

SELECT
  timestamp, log_name, http_request.status, http_request, resource, labels
FROM
  `TABLE`
WHERE
  http_request IS NOT NULL AND
  http_request.status IS NOT NULL
ORDER BY timestamp ASC
LIMIT 100

To determine the type of data stored in a field, view the schema or display the field. The results of the previous query show that the http_request.status field stores integer values.
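
Because http_request.status stores integer values, you can filter with numeric comparisons. For example, replacing the WHERE clause in the previous query with the following lines is a minimal sketch that retains only server-side errors, that is, entries with a status of 500 or higher:

WHERE
  http_request IS NOT NULL AND
  http_request.status >= 500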

Filter by a field with a JSON type

To extract a value from a column whose data type is JSON, use the function JSON_VALUE.

Consider the following queries:

SELECT
  json_payload
FROM
  `TABLE`
WHERE
  json_payload.status IS NOT NULL

and

SELECT
  json_payload
FROM
  `TABLE`
WHERE
  JSON_VALUE(json_payload.status) IS NOT NULL

The previous queries test the value of the column json_payload; the content of this column is determined by the contents of a log entry. Both queries discard rows that don't contain a column labeled json_payload. The difference between these two queries is the final line, which defines what is tested against NULL. Now, consider a table that has two rows. In one row, the json_payload column has the following form:

{
    status: {
        measureTime: "1661517845"
    }
}

In the other row, the json_payload column has a different structure:

{
    @type: "type.googleapis.com/google.cloud.scheduler.logging.AttemptFinished"
    jobName: "projects/my-project/locations/us-central1/jobs/test1"
    relativeUrl: "/food=cake"
    status: "NOT_FOUND"
    targetType: "APP_ENGINE_HTTP"
}

Both of the previous rows satisfy the restriction json_payload.status IS NOT NULL. That is, the result of the query includes both rows. However, when the restriction is JSON_VALUE(json_payload.status) IS NOT NULL, only the second row is included in the result.
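
You can verify this behavior with JSON literals. In the following sketch, the first expression returns NULL because the status field contains an object instead of a scalar value:

SELECT
  -- Returns NULL: the status field contains an object, not a scalar.
  JSON_VALUE(JSON '{"status": {"measureTime": "1661517845"}}', '$.status') AS object_status,
  -- Returns "NOT_FOUND": the status field contains a scalar string.
  JSON_VALUE(JSON '{"status": "NOT_FOUND"}', '$.status') AS scalar_status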

Filter by regular expression

To return the substring that matches a regular expression, use the function REGEXP_EXTRACT. The return type of this function is either a STRING or BYTES.

The following query reads the most recent log entries, retains those with a json_payload.jobName field, and then displays the portion of the name that starts with test:

SELECT
  timestamp, REGEXP_EXTRACT(JSON_VALUE(json_payload.jobName), r".*(test.*)$") AS name
FROM
  `TABLE`
WHERE
  json_payload.jobName IS NOT NULL
ORDER BY timestamp DESC
LIMIT 20

For additional examples, see the REGEXP_EXTRACT documentation. For examples of other regular expressions that you can use, see Functions, operators, and conditionals.

The query shown in this example isn't efficient. For a substring match, like the one illustrated, use the CONTAINS_SUBSTR function.
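For example, the following sketch replaces the regular expression with a substring test. Instead of extracting the suffix, it retains the rows whose jobName field contains the string test:

SELECT
  timestamp, JSON_VALUE(json_payload.jobName) AS name
FROM
  `TABLE`
WHERE
  json_payload.jobName IS NOT NULL AND
  CONTAINS_SUBSTR(JSON_VALUE(json_payload.jobName), "test")
ORDER BY timestamp DESC
LIMIT 20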

Group and aggregate log entries

This section builds upon the previous samples and illustrates how you can group and aggregate table rows. If you don't specify a grouping but do specify an aggregation, a single result is returned because SQL treats all rows that satisfy the WHERE clause as a single group.

Every expression in the SELECT list must either be included in the GROUP BY clause or be aggregated.
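
For example, the following sketch specifies an aggregation but no grouping, so it returns a single row that counts every log entry selected by the time-range selector:

SELECT
  COUNT(*) AS count
FROM
  `TABLE`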

Group by time

To group data by time, use the function TIMESTAMP_TRUNC, which truncates a timestamp to a specified granularity like MINUTE. For example, a timestamp of 15:30:11, which is formatted as hours:minutes:seconds, becomes 15:30:00 when the granularity is set to MINUTE.

The following query reads the data received in the interval specified by the time-range picker and then retains those rows where the value of the json_payload.status field isn't NULL. The query truncates the timestamp on each row by hour, and then groups the rows by the truncated timestamp and status:

SELECT
  TIMESTAMP_TRUNC(timestamp, HOUR) AS hour,
  JSON_VALUE(json_payload.status) AS status,
  COUNT(*) AS count
FROM
  `TABLE`
WHERE
  json_payload IS NOT NULL AND
  JSON_VALUE(json_payload.status) IS NOT NULL
GROUP BY hour,status
ORDER BY hour ASC

For additional samples, see the TIMESTAMP_TRUNC documentation. For information about other time-based functions, see Datetime functions.

Group by resource

The following query reads the most recent hour of data, and then groups the rows by the resource type. It then counts the number of rows for each type, and returns a table with two columns. The first column lists the resource type while the second column is the number of rows for that resource type:

SELECT
   resource.type, COUNT(*) AS count
FROM
  `TABLE`
GROUP BY resource.type
LIMIT 100

Group by severity

The following query reads the most recent hour of data and then retains rows that have a severity field. The query then groups the rows by severity and counts the number of rows for each group:

SELECT
  severity, COUNT(*) AS count
FROM
  `TABLE`
WHERE
  severity IS NOT NULL
GROUP BY severity
ORDER BY severity
LIMIT 100

Group by log_id

The result of the following query is a table with two columns. The first column lists the log names while the second column lists the number of log entries that were written to that log in the most recent hour. The query sorts the results by the count of entries:

SELECT
  log_id, COUNT(*) AS count
FROM
  `TABLE`
GROUP BY log_id
ORDER BY count DESC
LIMIT 100

Compute average latency for HTTP request

The following query illustrates grouping by multiple columns and computing an average value. The query groups rows by the URL contained in the HTTP request and by the value of the labels.checker_location field. After grouping the rows, the query computes the average latency for each group:

SELECT
  JSON_VALUE(labels.checker_location) AS location,
  AVG(http_request.latency.seconds) AS secs, http_request.request_url
FROM
  `TABLE`
WHERE
  http_request IS NOT NULL AND
  http_request.request_method IN ('GET')
GROUP BY http_request.request_url, location
ORDER BY location
LIMIT 100

In the previous expression, JSON_VALUE is required to extract the value of the labels.checker_location field because the data type for labels is JSON. However, you don't use this function to extract the value from the http_request.latency.seconds field. The latter field has a data type of integer.

Compute average bytes sent for a subnetwork test

The following query illustrates how you might display the average number of bytes sent by location.

The query reads the most recent hour of data and then retains only those rows whose resource type is gce_subnetwork and whose json_payload column isn't NULL. Next, the query groups the rows by the location of the resource. Unlike the previous example, where the latency data is stored as a numeric value, the value of the bytes_sent field is a string, so you must cast the value to FLOAT64 before computing the average:

SELECT JSON_VALUE(resource.labels.location) AS location,
   AVG(CAST(JSON_VALUE(json_payload.bytes_sent) AS FLOAT64)) AS bytes
FROM
  `TABLE`
WHERE
  resource.type = "gce_subnetwork" AND
  json_payload IS NOT NULL
GROUP BY location
LIMIT 100

The result of the previous query is a table where each row lists a location and the average bytes sent for that location.

For information about all functions that can retrieve and transform JSON data, see JSON functions.

For information about CAST and other conversion functions, see Conversion functions.

Count the log entries with a field that matches a pattern

To return the substring that matches a regular expression, use the function REGEXP_EXTRACT. The return type of this function is either a STRING or BYTES.

The following query retains the log entries for which the value of the json_payload.jobName field is not NULL. Then, it groups the entries by the name suffix that begins with test. Lastly, the query counts the number of entries in each group:

SELECT
  REGEXP_EXTRACT(JSON_VALUE(json_payload.jobName), r".*(test.*)$") AS name,
  COUNT(*) AS count
FROM
  `TABLE`
WHERE
  json_payload.jobName IS NOT NULL
GROUP BY name
ORDER BY count
LIMIT 20

For additional examples, see the REGEXP_EXTRACT documentation. For examples of other regular expressions that you can use, see Functions, operators, and conditionals.

Search multiple columns of a table

This section describes two approaches that you can use to search multiple columns of a table: the SEARCH function and the CONTAINS_SUBSTR function.

To search a table for entries that match a set of search terms, use the function SEARCH. This function requires two parameters: where to search, and the search query. Because the SEARCH function has specific rules on how the data is searched, we recommend that you read the SEARCH documentation.

The following query retains only those rows that have a field that exactly matches "35.193.12.15":

SELECT
  timestamp, log_id, proto_payload, severity, resource.type, resource, labels
FROM
  `TABLE` AS t
WHERE
  proto_payload IS NOT NULL AND
  log_id = "cloudaudit.googleapis.com/data_access" AND
  SEARCH(t,"`35.193.12.15`")
ORDER BY timestamp ASC
LIMIT 20

In the previous query, backticks wrap the value to be searched for. This ensures that the SEARCH function searches for an exact match between a field value and the value between the backticks.

When backticks are omitted in the query string, the query string is split based on rules defined in the SEARCH documentation. For example, when the following statement is run, the query string is split into four tokens: "35", "193", "12", and "15":

  SEARCH(t,"35.193.12.15")

The previous SEARCH statement matches a row when a single field matches all four tokens. The order of the tokens doesn't matter.
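
For example, because the order of the tokens doesn't matter, the following statement should match the same rows as the previous one; the query string produces the same four tokens in a different order:

  SEARCH(t, "15 193 35 12")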

You can include multiple SEARCH statements in a query. For example, in the previous query, you could replace the filter on the log ID with a statement like the following:

  SEARCH(t,"`cloudaudit.googleapis.com/data_access`")

The previous statement searches the entire table while the original statement searches only the log_id column.

To perform multiple searches on a column, separate the individual strings with a space. For example, the following statement matches rows where a field contains "Hello World", "happy", and "days":

  SEARCH(t,"`Hello World` happy days")

Lastly, you can search specific columns of a table instead of searching an entire table. For example, the following statement only searches the columns named text_payload and json_payload:

  SEARCH((text_payload, json_payload), "`35.222.132.245`")

For information about how the parameters of the SEARCH function are processed, see the BigQuery reference page Search functions.

To perform a case-insensitive test to determine whether a value exists in an expression, use the function CONTAINS_SUBSTR. This function returns TRUE when the value exists and FALSE otherwise. The search value must be a STRING literal, but not the literal NULL.

For example, the following query fetches all Data Access audit log entries with a specific IP address whose timestamps are in a specific time range. Lastly, the query sorts the results and then displays the 20 oldest results:

SELECT
  timestamp, log_id, proto_payload, severity, resource.type, resource, labels
FROM
  `TABLE` AS t
WHERE
  proto_payload IS NOT NULL AND
  log_id = "cloudaudit.googleapis.com/data_access" AND
  CONTAINS_SUBSTR(t,"35.193.12.15")
ORDER BY timestamp ASC
LIMIT 20

The previous query performs a substring test. Therefore, a row that contains "35.193.12.152" matches the CONTAINS_SUBSTR statement.

Combine data from multiple sources

Query statements scan one or more tables or expressions and return the computed result rows. For example, you can use query statements to merge the results of SELECT statements on different tables or datasets in a variety of ways and then select the columns from the combined data.

Combine data from two tables with joins

To combine information from two tables, use one of the join operators. The type of join and the conditional clause that you use determine how rows are combined and discarded.

The following query gives you the json_payload fields from rows in two different tables written by the same trace span. The query performs an inner JOIN over two tables for rows where the values of the span_id and trace columns in both tables match. From this result, the query then selects the timestamp, severity, and json_payload fields that came from TABLE_1, the json_payload field from TABLE_2, and the values of the span_id and trace fields on which the two tables were joined, and returns up to 100 rows:

SELECT
  a.timestamp, a.severity, a.json_payload, b.json_payload, a.span_id, a.trace
FROM `TABLE_1` a
JOIN `TABLE_2` b
ON
  a.span_id = b.span_id AND
  a.trace = b.trace
LIMIT 100

Combine multiple selections with unions

To combine the results of two or more SELECT statements and discard duplicate rows, use the UNION operator. To retain duplicate rows, use the UNION ALL operator.

The following query reads the most recent hour of data from TABLE_1, merges the result with the most recent hour of data from TABLE_2, sorts the merged data by increasing timestamp, and then displays the oldest 100 entries:

SELECT
  timestamp, log_name, severity, json_payload, resource, labels
FROM (
  SELECT * FROM `TABLE_1`
  UNION ALL
  SELECT * FROM `TABLE_2`
)
ORDER BY timestamp ASC
LIMIT 100

What's next

For information about how to route and store log entries, see the following documents:

For SQL reference documentation, see the following documents: