Query Amazon S3 data

This document describes how to query data stored in an Amazon Simple Storage Service (Amazon S3) BigLake table.

Before you begin

Ensure that you have a Amazon S3 BigLake table.

Required roles

To query Amazon S3 BigLake tables, ensure that the caller of the BigQuery API has the following roles:

  • BigQuery Connection User (roles/bigquery.connectionUser)
  • BigQuery Data Viewer (roles/bigquery.dataViewer)
  • BigQuery User (roles/bigquery.user)

The caller can be your account or an Amazon S3 connection service account. Depending on your permissions, you can grant these roles to yourself or ask your administrator to grant them to you. For more information about granting roles, see Viewing the grantable roles on resources.

To see the exact permissions that are required to query Amazon S3 BigLake tables, expand the Required permissions section:

Required permissions

You might also be able to get these permissions with custom roles or other predefined roles.

Query Amazon S3 BigLake tables

After creating a Amazon S3 BigLake table, you can query it using GoogleSQL syntax, the same as if it were a standard BigQuery table.

The cached query results are stored in a BigQuery temporary table. To query a temporary BigLake table, see Query a temporary BigLake table. For more information about BigQuery Omni limitations and quotas, see limitations and quotas.

When creating a reservation in a BigQuery Omni region, use the Enterprise edition. To learn how to create a reservation with an edition, see Create reservations.

Run a query on a BigLake Amazon S3 table:

SQL

To query the table:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    SELECT * FROM DATASET_NAME.TABLE_NAME;
    

    Replace the following:

    • DATASET_NAME: the dataset name that you created
    • TABLE_NAME: the name of the table that you created

    • Click Run.

For more information about how to run queries, see Running interactive queries.

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.CsvOptions;
import com.google.cloud.bigquery.DatasetId;
import com.google.cloud.bigquery.ExternalTableDefinition;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;
import com.google.cloud.bigquery.TableResult;

// Sample to queries an external data source aws s3 using a permanent table
public class QueryExternalTableAws {

  public static void main(String[] args) throws InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "MY_PROJECT_ID";
    String datasetName = "MY_DATASET_NAME";
    String externalTableName = "MY_EXTERNAL_TABLE_NAME";
    // Query to find states starting with 'W'
    String query =
        String.format(
            "SELECT * FROM s%.%s.%s WHERE name LIKE 'W%%'",
            projectId, datasetName, externalTableName);
    queryExternalTableAws(query);
  }

  public static void queryExternalTableAws(String query) throws InterruptedException {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableResult results = bigquery.query(QueryJobConfiguration.of(query));

      results
          .iterateAll()
          .forEach(row -> row.forEach(val -> System.out.printf("%s,", val.toString())));

      System.out.println("Query on aws external permanent table performed successfully.");
    } catch (BigQueryException e) {
      System.out.println("Query not performed \n" + e.toString());
    }
  }
}

Query a temporary table

BigQuery creates temporary tables to store query results. To retrieve query result from temporary tables, you can use the Google Cloud console or the BigQuery API.

Select one of the following options:

Console

When you query a BigLake table that references external cloud data, you can view the query results displayed in the Google Cloud console.

API

To query a BigLake table using the API, follow these steps:

  1. Create a Job object.
  2. Call the jobs.insert method to run the query asynchronously or the jobs.query method to run the query synchronously, passing in the Job object.
  3. Read rows with the jobs.getQueryResults by passing the given job reference, and the tabledata.list methods by passing the given table reference of the query result.

What's next