This document contains sample queries over log entries that are stored in
log buckets that have been upgraded to use Log Analytics.
On these buckets, you can run SQL queries from the
Log Analytics page in the Google Cloud console. For more samples, see the
logging-analytics-samples
and the
security-analytics
GitHub repositories.
This document doesn't describe SQL or how to route and store log entries. For information about those topics, see the What's next section.
Before you begin
- To use the queries shown in this document on the Log Analytics page, replace TABLE with the name of the table that corresponds to the view that you want to query. The table name has the format project_ID.region.bucket_ID.view_ID. You can find the table name for a view on the Log Analytics page; the default query for a log view lists the table name in the FROM statement. For information about how to access the default query, see Query a log view. A filled-in example is shown after this list.
- To use the queries shown in this document on the BigQuery SQL workspace page, replace TABLE with the path to the table in the linked dataset. For example, to query the view _AllLogs on the linked dataset mydataset that is in the project myproject, set this field to myproject.mydataset._AllLogs.
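For example, on the Log Analytics page, a query with the table name filled in might look like the following sketch; the project, region, bucket, and view names are hypothetical placeholders:

SELECT
timestamp, severity, resource.type
FROM
`myproject.us-central1.mybucket._AllLogs`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
LIMIT 100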
Filter logs
SQL queries determine which rows of the table to process, then they group the rows and perform aggregate operations. When no grouping or aggregation operations are listed, the result of the query includes the rows selected by the filter operation. The samples in this section illustrate filtering.
Filter by time
One option to filter by time is to use the TIMESTAMP_SUB
function. This
function lets you specify a look-back interval from the current time.
For example, the following query reads the most recent hour of data, sorts the data by increasing timestamp, and then displays the oldest 100 entries:
SELECT
timestamp, log_name, severity, json_payload, resource, labels
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
ORDER BY timestamp ASC
LIMIT 100
Another option is to use the TIMESTAMP function. The following query
reads all logs that are received in a closed time range,
and then displays the oldest 100 entries:
SELECT
timestamp, log_name, severity, json_payload, resource, labels
FROM
`TABLE`
WHERE
timestamp >= TIMESTAMP("2022-08-25 13:00:00", "America/New_York") AND
timestamp <= TIMESTAMP("2022-08-25 17:00:00", "America/New_York")
ORDER BY timestamp ASC
LIMIT 100
For more information about how to filter by time, see Time functions and Timestamp functions.
Filter by resource
To filter by resource, add a resource.type
restriction.
For example, the following query reads the most recent hour of data,
retains those rows whose resource type matches gce_instance, and then
sorts and displays up to 100 entries:
SELECT
timestamp, log_name, severity, json_payload, resource, labels
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND
resource.type = "gce_instance"
ORDER BY timestamp ASC
LIMIT 100
Filter by severity
You can filter by a specific severity with a restriction like
severity = 'ERROR'. Another option is to use the IN operator
and specify a set of valid values.
For example, the following query reads the most recent hour of data, and
then retains only those rows that contain a severity field whose value is
either 'INFO' or 'ERROR':
SELECT
timestamp, log_name, severity, json_payload, resource, labels
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND
severity IS NOT NULL AND
severity IN ('INFO', 'ERROR')
ORDER BY timestamp ASC
LIMIT 100
The previous query filters by the value of the severity field. However,
you can also write queries that filter by the numeric value of the log severity.
For example, if you replace the severity lines with the following lines,
the query returns all log entries whose severity level is at least NOTICE:
severity_number IS NOT NULL AND
severity_number > 200
For information about the enumerated values, see LogSeverity.
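For reference, here is the complete query from the previous section with the severity lines replaced by the severity_number restriction:

SELECT
timestamp, log_name, severity, json_payload, resource, labels
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND
severity_number IS NOT NULL AND
severity_number > 200
ORDER BY timestamp ASC
LIMIT 100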
Filter by log name
To filter by a log name, you can add a restriction on the value of the
log_name or the log_id field. The log_name field includes the resource
path; that is, this field has values like projects/myproject/logs/mylog.
The log_id field stores only the log name, such as mylog.
For example, the following query reads the most recent hour of data,
retains those rows where the value in the log_name field ends with
data_access, and then sorts and displays the results:
SELECT
timestamp, log_name, severity, json_payload, resource, labels
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND
log_name LIKE "%data_access"
ORDER BY timestamp ASC
LIMIT 100
The previous example specifies a partial name and uses the
LIKE operator with the wildcard %.
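If you know the full log ID, you can match it exactly by using the log_id field instead. The following sketch assumes a hypothetical log named mylog:

SELECT
timestamp, log_name, severity, json_payload, resource, labels
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND
log_id = "mylog"
ORDER BY timestamp ASC
LIMIT 100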
Filter by resource label
Most monitored resource descriptors define labels that are used to identify the specific resource. For example, the descriptor for a Compute Engine instance includes labels for the zone, project ID, and instance ID. When the log entry is written, values are assigned to each field, as shown in the following example:
{
type: "gce_instance"
labels: {
instance_id: "1234512345123451"
project_id: "my-project"
zone: "us-central1-f"
}
}
Because the data type of the labels field is JSON, including a restriction
like resource.labels.zone = "us-central1-f" in a query results in a syntax
error. To get the value of a field with a data type of JSON, use the function
JSON_VALUE.
For example, the following query reads the most recent data and then retains
those rows where the resource is a Compute Engine instance that is
located in the us-central1-f
zone:
SELECT
timestamp, log_name, severity, JSON_VALUE(resource.labels.zone) AS zone, json_payload, resource, labels
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND
resource.type = "gce_instance" AND
JSON_VALUE(resource.labels.zone) = "us-central1-f"
ORDER BY timestamp ASC
LIMIT 100
For information about all functions that can retrieve and transform JSON data, see JSON functions.
Filter by HTTP request
To filter the table to include only those rows that correspond to an HTTP request
or reply, add an http_request IS NOT NULL restriction:
SELECT
timestamp, log_name, severity, http_request, resource, labels
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND
http_request IS NOT NULL
ORDER BY timestamp
LIMIT 100
The following query only includes rows that correspond to GET
or POST
requests:
SELECT
timestamp, log_name, severity, http_request, resource, labels
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND
http_request IS NOT NULL AND
http_request.request_method IN ('GET', 'POST')
ORDER BY timestamp ASC
LIMIT 100
Filter by HTTP status
To filter by HTTP status, modify the WHERE clause to require that the
http_request.status field is defined:
SELECT
timestamp, log_name, http_request.status, http_request, resource, labels
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND
http_request IS NOT NULL AND
http_request.status IS NOT NULL
ORDER BY timestamp ASC
LIMIT 100
To determine the type of data stored in a field, view the schema or display
the field. The results of the previous query show that the
http_request.status
field stores integer values.
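Because this field stores integers, you can also filter on a numeric range. For example, the following sketch, which assumes that you want only server errors (status codes of 500 or greater), restricts the previous query to those rows:

SELECT
timestamp, log_name, http_request.status, http_request, resource, labels
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND
http_request IS NOT NULL AND
http_request.status >= 500
ORDER BY timestamp ASC
LIMIT 100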
Filter by a field with a JSON type
To extract a value from a column whose data type is JSON, use the function
JSON_VALUE.
Consider the following queries:
SELECT
json_payload
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND
json_payload.status IS NOT NULL
and
SELECT
json_payload
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND
JSON_VALUE(json_payload.status) IS NOT NULL
The previous queries test the value of the column json_payload; the content
of this column is determined by the contents of a log entry. Both queries
discard rows that don't contain a column labeled json_payload.
The difference between these two queries is the final line, which defines
what is tested against NULL. Now, consider a table that has two rows. In one
row, the json_payload column has the following form:
{
status: {
measureTime: "1661517845"
}
}
In the other row, the json_payload
column has a different structure:
{
@type: "type.googleapis.com/google.cloud.scheduler.logging.AttemptFinished"
jobName: "projects/my-project/locations/us-central1/jobs/test1"
relativeUrl: "/food=cake"
status: "NOT_FOUND"
targetType: "APP_ENGINE_HTTP"
}
Both of the previous rows satisfy the restriction
json_payload.status IS NOT NULL; that is, the result of the first query
includes both rows. However, when the restriction is
JSON_VALUE(json_payload.status) IS NOT NULL, only the second row is included
in the result.
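To see why, you can inspect the structure of the field directly. The following sketch uses the JSON_TYPE function, which returns the type of the outermost JSON value, to report whether json_payload.status holds an object (as in the first row) or a string (as in the second row):

SELECT
JSON_TYPE(json_payload.status) AS status_type,
COUNT(*) AS count
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND
json_payload.status IS NOT NULL
GROUP BY status_type

JSON_VALUE returns NULL for anything other than a scalar value, which is why the first row is excluded by the second query.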
Filter by regular expression
To return the substring that matches a regular expression, use the function
REGEXP_EXTRACT. The return type of this function is either a STRING or BYTES.
The following query retrieves the most recently received log entries, retains
those entries that have a json_payload.jobName field, and then displays the
portion of the name that starts with test:
SELECT
timestamp, REGEXP_EXTRACT(JSON_VALUE(json_payload.jobName), r".*(test.*)$") AS name
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND
json_payload.jobName IS NOT NULL
ORDER BY timestamp DESC
LIMIT 20
For additional examples, see the
REGEXP_EXTRACT
documentation.
For examples of other regular expressions that
you can use, see Functions, operators, and conditionals.
The query shown in this example isn't efficient. For a substring match, like
the one illustrated, use the CONTAINS_SUBSTR
function.
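For example, a sketch of the same filter rewritten with CONTAINS_SUBSTR might look like the following. Note that unlike REGEXP_EXTRACT, this version returns the full job name rather than only the matching suffix:

SELECT
timestamp, JSON_VALUE(json_payload.jobName) AS name
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND
json_payload.jobName IS NOT NULL AND
CONTAINS_SUBSTR(JSON_VALUE(json_payload.jobName), "test")
ORDER BY timestamp DESC
LIMIT 20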
Group and aggregate log entries
This section builds on the previous samples and illustrates how you can
group and aggregate table rows. If you don't specify a grouping but do
specify an aggregation, a single row is returned because SQL treats all
rows that satisfy the WHERE clause as a single group.
Every expression in the SELECT list must either appear in the GROUP BY
clause or be aggregated.
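For example, a minimal aggregation with no GROUP BY clause returns a single row that counts all log entries received in the most recent hour:

SELECT
COUNT(*) AS count
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)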
Group by time
To group data by time, use the function TIMESTAMP_TRUNC,
which truncates a timestamp to a specified granularity like MINUTE. For
example, a timestamp of 15:30:11, which is formatted as
hours:minutes:seconds, becomes 15:30:00 when the granularity is set to
MINUTE.
The following query reads the most recent 10 hours of data and then retains
those rows where the value of the json_payload.status field isn't NULL.
The query truncates the timestamp on each row by hour, and then groups the
rows by the truncated timestamp and status:
SELECT
TIMESTAMP_TRUNC(timestamp, HOUR) AS hour,
JSON_VALUE(json_payload.status) AS status,
COUNT(*) AS count
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 10 HOUR) AND
json_payload IS NOT NULL AND
JSON_VALUE(json_payload.status) IS NOT NULL
GROUP BY hour, status
ORDER BY hour ASC
For additional samples, see the
TIMESTAMP_TRUNC
documentation.
For information about other time-based functions, see
Datetime functions.
Group by resource
The following query reads the most recent hour of data, and then groups the rows by the resource type. It then counts the number of rows for each type, and returns a table with two columns. The first column lists the resource type while the second column is the number of rows for that resource type:
SELECT
resource.type, COUNT(*) AS count
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
GROUP BY resource.type
LIMIT 100
Group by severity
The following query reads the most recent hour of data and then retains rows that have a severity field. The query then groups the rows by severity and counts the number of rows for each group:
SELECT
severity, COUNT(*) AS count
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND
severity IS NOT NULL
GROUP BY severity
ORDER BY severity
LIMIT 100
Group by log_id
The result of the following query is a table with two columns. The first column lists the log names while the second column lists the number of log entries that were written to that log in the most recent hour. The query sorts the results by the count of entries:
SELECT
log_id, COUNT(*) AS count
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
GROUP BY log_id
ORDER BY count DESC
LIMIT 100
Compute average latency for HTTP request
The following query illustrates grouping by multiple columns, and computing
an average value. The query groups rows by the URL contained in the HTTP
request and by the value of the labels.checker_location
field. After
grouping the rows, the query computes the average latency for each group:
SELECT
JSON_VALUE(labels.checker_location) AS location,
AVG(http_request.latency.seconds) AS secs, http_request.request_url
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND
http_request IS NOT NULL AND
http_request.request_method IN ('GET')
GROUP BY http_request.request_url, location
ORDER BY location
LIMIT 100
In the previous expression, JSON_VALUE
is required to extract the value
of the labels.checker_location
field because the data type for
labels
is JSON.
However, you don't use this function to extract the value from the
http_request.latency.seconds
field. The latter field has a data type of
integer.
Compute average bytes sent for a subnetwork test
The following query illustrates how you might display the average number of bytes sent by location.
The query reads the most recent hour of data and then retains only those rows
whose resource type column is gce_subnetwork
and whose json_payload
column isn't NULL. Next, the query groups the rows by the location of the
resource. Unlike the previous example where the data is stored as a numeric
value, the value of the bytes_sent
field is a string and therefore you must
convert the value to a FLOAT64
before computing the average:
SELECT JSON_VALUE(resource.labels.location) AS location,
AVG(CAST(JSON_VALUE(json_payload.bytes_sent) AS FLOAT64)) AS bytes
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND
resource.type = "gce_subnetwork" AND
json_payload IS NOT NULL
GROUP BY location
LIMIT 100
The result of the previous query is a table where each row lists a location and the average bytes sent for that location.
For information about all functions that can retrieve and transform JSON data, see JSON functions.
For information about CAST
and other conversion functions, see
Conversion functions.
Count the log entries with a field that matches a pattern
To return the substring that matches a regular expression, use the function
REGEXP_EXTRACT. The return type of this function is either a STRING or BYTES.
The following query retains the log entries for which the value
of the json_payload.jobName field is not NULL.
Then, it groups the entries by the name suffix that begins
with test. Lastly, the query counts the number of entries in each group:
SELECT
REGEXP_EXTRACT(JSON_VALUE(json_payload.jobName), r".*(test.*)$") AS name,
COUNT(*) AS count
FROM
`TABLE`
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND
json_payload.jobName IS NOT NULL
GROUP BY name
ORDER BY count
LIMIT 20
For additional examples, see the
REGEXP_EXTRACT
documentation.
For examples of other regular expressions that
you can use, see Functions, operators, and conditionals.
Cross-column search
This section describes two different approaches that you can use to search multiple columns of a table.
Token-based search
To search a table for entries that match a set of search terms,
use the function SEARCH. This function requires two parameters:
where to search, and the search query.
Because the SEARCH function has specific rules on how the data is searched,
we recommend that you read the SEARCH documentation.
The following query retains only those rows that have a field that exactly matches "35.193.12.15":
SELECT
timestamp, log_name, proto_payload, severity, resource.type, resource, labels
FROM
`TABLE` AS t
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND
proto_payload IS NOT NULL AND
log_name LIKE "%cloudaudit.googleapis.com%" AND
SEARCH(t,"`35.193.12.15`")
ORDER BY timestamp ASC
LIMIT 20
In the previous query, backticks wrap the value to be searched for. This
ensures that the SEARCH
function searches for an exact match between
a field value and the value between the backticks.
When backticks are omitted in the query string, the query string is split
based on rules defined in the SEARCH
documentation.
For example, when the following statement is run,
the query string is split into four tokens: "35", "193", "12", and "15":
SEARCH(t,"35.193.12.15")
The previous SEARCH
statement matches a row when a single field
matches all four tokens. The order of the tokens doesn't matter.
You can include multiple SEARCH
statements in a query. For example, in the
previous query, you could replace the filter on the log name with a
statement like the following:
SEARCH(t,"`cloudaudit.googleapis.com`")
The previous statement searches the entire table while the original statement
searches only the log_name
column.
To perform multiple searches on a column, separate the individual strings with a space. For example, the following statement matches rows where a field contains "Hello World", "happy", and "days":
SEARCH(t,"`Hello World` happy days")
Lastly, you can search specific columns of a table instead of searching an
entire table. For example, the following statement searches only
the columns named text_payload and json_payload:
SEARCH((text_payload, json_payload) ,"`35.222.132.245`")
For information about how the parameters of the SEARCH
function are processed,
see the BigQuery reference page Search functions.
Substring search
To perform a case-insensitive test to determine whether a value exists in an
expression, use the function CONTAINS_SUBSTR.
This function returns TRUE when the value exists and FALSE otherwise.
The search value must be a STRING literal, but not the literal NULL.
For example, the following query fetches the audit log entries from the most recent hour that contain a specific IP address, sorts the results, and then displays the 20 oldest entries:
SELECT
timestamp, log_name, proto_payload, severity, resource.type, resource, labels
FROM
`TABLE` AS t
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) AND
proto_payload IS NOT NULL AND
log_name LIKE "%cloudaudit.googleapis.com%" AND
CONTAINS_SUBSTR(t,"35.193.12.15")
ORDER BY timestamp ASC
LIMIT 20
The previous query performs a substring test. Therefore, a row that contains
"35.193.12.152" matches the CONTAINS_SUBSTR
statement.
What's next
For information about how to route and store log entries, see the following documents:
- Create a log bucket
- Upgrade a bucket to use Log Analytics
- Link a log bucket to a BigQuery dataset
- Configure and manage sinks
For SQL reference documentation, see the following documents: