Quickstart: Using client libraries

This page shows you how to get started with the BigQuery API in your programming language of choice.

Before you begin

  1. Sign in to your Google Account.

    If you don't already have a Google Account, sign up for a new account.

  2. In the GCP Console, on the project selector page, select or create a GCP project.

    Go to the project selector page

  3. Make sure that billing is enabled for your Google Cloud Platform project. Learn how to confirm that billing is enabled for your project.
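The client libraries in this quickstart authenticate with Application Default Credentials. When running outside a GCP environment, you typically point the GOOGLE_APPLICATION_CREDENTIALS environment variable at a service account key file. A minimal sketch (the path is a placeholder, not a real key file):

```shell
# Hypothetical example: point Application Default Credentials at a
# service account key file downloaded from the GCP Console.
# Replace the placeholder path with the location of your own JSON key.
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
```

See the "Getting started with authentication" guide for how to create the key file itself.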

Installing the client library

C#

To learn more about setting up a C# development environment, see the C# development environment setup guide.
Install-Package Google.Cloud.BigQuery.V2 -Pre

Go

go get -u cloud.google.com/go/bigquery

Java

To learn more about setting up a Java development environment, see the Java development environment setup guide. If you are using Maven, add the following to your pom.xml file:
<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-bigquery</artifactId>
  <version>1.100.0</version>
</dependency>
If you are using Gradle, add the following to your dependencies:
compile 'com.google.cloud:google-cloud-bigquery:1.100.0'
If you are using SBT, add the following to your dependencies:
libraryDependencies += "com.google.cloud" % "google-cloud-bigquery" % "1.100.0"

If you are using IntelliJ or Eclipse, you can add the client library to your project with the following IDE plugins:

The plugins provide additional functionality as well, such as key management for service accounts. Refer to each plugin's documentation for details.

Node.js

To learn more about setting up a Node.js development environment, see the Node.js development environment setup guide.
npm install --save @google-cloud/bigquery

PHP

composer require google/cloud-bigquery

Python

To learn more about setting up a Python development environment, see the Python development environment setup guide.
pip install --upgrade google-cloud-bigquery

Ruby

To learn more about setting up a Ruby development environment, see the Ruby development environment setup guide.
gem install google-cloud-bigquery

Importing the library

C#

For more information, see the BigQuery C# API reference documentation.


    using System;
    using Google.Cloud.BigQuery.V2;

Go

For more information, see the BigQuery Go API reference documentation.

    import (
    	"context"
    	"fmt"
    	"io"
    	"log"
    	"os"

    	"cloud.google.com/go/bigquery"
    	"google.golang.org/api/iterator"
    )

Java

For more information, see the BigQuery Java API reference documentation.


    import com.google.cloud.bigquery.BigQuery;
    import com.google.cloud.bigquery.BigQueryOptions;
    import com.google.cloud.bigquery.FieldValueList;
    import com.google.cloud.bigquery.Job;
    import com.google.cloud.bigquery.JobId;
    import com.google.cloud.bigquery.JobInfo;
    import com.google.cloud.bigquery.QueryJobConfiguration;
    import com.google.cloud.bigquery.TableResult;
    import java.util.UUID;

Node.js

For more information, see the BigQuery Node.js API reference documentation.

    // Import the Google Cloud client library
    const {BigQuery} = require('@google-cloud/bigquery');

PHP

For more information, see the BigQuery PHP API reference documentation.

    use Google\Cloud\BigQuery\BigQueryClient;

Python

For more information, see the BigQuery Python API reference documentation.

from google.cloud import bigquery

Ruby

For more information, see the BigQuery Ruby API reference documentation.

require "google/cloud/bigquery"

Initializing a BigQuery client

C#

Use the BigQueryClient.Create() function to create a BigQuery client.

    string projectId = "YOUR-PROJECT-ID";
    var client = BigQueryClient.Create(projectId);

Go

Use the bigquery.NewClient() function to create a BigQuery client.

    ctx := context.Background()

    client, err := bigquery.NewClient(ctx, projectID)
    if err != nil {
    	log.Fatalf("bigquery.NewClient: %v", err)
    }
    defer client.Close()

Java

Use the BigQueryOptions.getDefaultInstance() function to use the default authentication options, and the BigQueryOptions.getService() function to create a BigQuery client.

BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

Node.js

Instantiate the BigQuery class to create a BigQuery client.

    // Create a client
    const bigqueryClient = new BigQuery();

PHP

Instantiate the BigQueryClient class to create a BigQuery client.

    $bigQuery = new BigQueryClient([
        'projectId' => $projectId,
    ]);

Python

Instantiate the bigquery.Client class to create a BigQuery client.

client = bigquery.Client()

Ruby

Use the Google::Cloud::Bigquery.new function to create a BigQuery client.

    # This uses Application Default Credentials to authenticate.
    # @see https://cloud.google.com/bigquery/docs/authentication/getting-started
    bigquery = Google::Cloud::Bigquery.new

Running queries

Query the Stack Overflow public dataset for the most-viewed questions tagged google-bigquery.

    SELECT
      CONCAT(
        'https://stackoverflow.com/questions/',
        CAST(id as STRING)) as url,
      view_count
    FROM `bigquery-public-data.stackoverflow.posts_questions`
    WHERE tags like '%google-bigquery%'
    ORDER BY view_count DESC
    LIMIT 10

This query uses standard SQL syntax, which is described in the query reference guide. The client libraries default to standard SQL syntax. To change SQL dialects, see Switching SQL dialects.
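To illustrate the dialect difference with a small sketch (not part of the official sample): standard SQL references a table with backticks and dots, while legacy SQL uses square brackets and a colon after the project ID.

```python
# Build the same table reference in both BigQuery SQL dialects.
project, dataset, table = "bigquery-public-data", "stackoverflow", "posts_questions"

standard_sql_ref = f"`{project}.{dataset}.{table}`"  # standard SQL (the default)
legacy_sql_ref = f"[{project}:{dataset}.{table}]"    # legacy SQL

print(standard_sql_ref)  # `bigquery-public-data.stackoverflow.posts_questions`
print(legacy_sql_ref)    # [bigquery-public-data:stackoverflow.posts_questions]
```

The query shown above uses the standard form, so it runs unchanged in every client library in this quickstart.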

Run the query

Query with the authenticated BigQuery client.

C#

Define the query string, then use the client.ExecuteQuery() function to submit the query and get the results.

    string query = @"SELECT
        CONCAT(
            'https://stackoverflow.com/questions/',
            CAST(id as STRING)) as url, view_count
        FROM `bigquery-public-data.stackoverflow.posts_questions`
        WHERE tags like '%google-bigquery%'
        ORDER BY view_count DESC
        LIMIT 10";
    var result = client.ExecuteQuery(query, parameters: null);

Go

Use the bigquery.Query() function to define the query, then use the Query.Read() function to submit it and get the results.

    query := client.Query(
    	`SELECT
    		CONCAT(
    			'https://stackoverflow.com/questions/',
    			CAST(id as STRING)) as url,
    		view_count
    	FROM ` + "`bigquery-public-data.stackoverflow.posts_questions`" + `
    	WHERE tags like '%google-bigquery%'
    	ORDER BY view_count DESC
    	LIMIT 10;`)
    return query.Read(ctx)

Java

Define the query with a QueryJobConfiguration instance, then start the query job with the BigQuery.create() method.

    QueryJobConfiguration queryConfig =
        QueryJobConfiguration.newBuilder(
                "SELECT "
                    + "CONCAT('https://stackoverflow.com/questions/', CAST(id as STRING)) as url, "
                    + "view_count "
                    + "FROM `bigquery-public-data.stackoverflow.posts_questions` "
                    + "WHERE tags like '%google-bigquery%' "
                    + "ORDER BY view_count DESC LIMIT 10")
            // Use standard SQL syntax for queries.
            // See: https://cloud.google.com/bigquery/sql-reference/
            .setUseLegacySql(false)
            .build();

    // Create a job ID so that we can safely retry.
    JobId jobId = JobId.of(UUID.randomUUID().toString());
    Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());

    // Wait for the query to complete.
    queryJob = queryJob.waitFor();

    // Check for errors
    if (queryJob == null) {
      throw new RuntimeException("Job no longer exists");
    } else if (queryJob.getStatus().getError() != null) {
      // You can also look at queryJob.getStatus().getExecutionErrors() for all
      // errors, not just the latest one.
      throw new RuntimeException(queryJob.getStatus().getError().toString());
    }

Node.js

Use the BigQuery.query() method to start the query.

    // The SQL query to run
    const sqlQuery = `SELECT
      CONCAT(
        'https://stackoverflow.com/questions/',
        CAST(id as STRING)) as url,
      view_count
      FROM \`bigquery-public-data.stackoverflow.posts_questions\`
      WHERE tags like '%google-bigquery%'
      ORDER BY view_count DESC
      LIMIT 10`;

    const options = {
      query: sqlQuery,
      // Location must match that of the dataset(s) referenced in the query.
      location: 'US',
    };

    // Run the query
    const [rows] = await bigqueryClient.query(options);

PHP

Create a query job configuration with the BigQueryClient.query() method, then run the query with the BigQueryClient.runQuery() method.

    $query = <<<ENDSQL
    SELECT
      CONCAT(
        'https://stackoverflow.com/questions/',
        CAST(id as STRING)) as url,
      view_count
    FROM `bigquery-public-data.stackoverflow.posts_questions`
    WHERE tags like '%google-bigquery%'
    ORDER BY view_count DESC
    LIMIT 10;
    ENDSQL;
    $queryJobConfig = $bigQuery->query($query);
    $queryResults = $bigQuery->runQuery($queryJobConfig);

Python

Use the Client.query() method to start the query.

    query_job = client.query("""
        SELECT
          CONCAT(
            'https://stackoverflow.com/questions/',
            CAST(id as STRING)) as url,
          view_count
        FROM `bigquery-public-data.stackoverflow.posts_questions`
        WHERE tags like '%google-bigquery%'
        ORDER BY view_count DESC
        LIMIT 10""")

    results = query_job.result()  # Waits for job to complete.

Ruby

Use the Google::Cloud::Bigquery::Project.query function to start the query and wait for the results.

    sql     = "SELECT " +
              "CONCAT('https://stackoverflow.com/questions/', " +
              "       CAST(id as STRING)) as url, view_count " +
              "FROM `bigquery-public-data.stackoverflow.posts_questions` " +
              "WHERE tags like '%google-bigquery%' " +
              "ORDER BY view_count DESC LIMIT 10"
    results = bigquery.query sql

For more examples of running BigQuery queries, see:

Displaying the query results

Display the query results.

C#

    Console.Write("\nQuery Results:\n------------\n");
    foreach (var row in result)
    {
        Console.WriteLine($"{row["url"]}: {row["view_count"]} views");
    }

Go

Use the RowIterator.Next() function to load each row into a struct pointer.

    type StackOverflowRow struct {
    	URL       string `bigquery:"url"`
    	ViewCount int64  `bigquery:"view_count"`
    }

    // printResults prints results from a query to the Stack Overflow public dataset.
    func printResults(w io.Writer, iter *bigquery.RowIterator) error {
    	for {
    		var row StackOverflowRow
    		err := iter.Next(&row)
    		if err == iterator.Done {
    			return nil
    		}
    		if err != nil {
    			return fmt.Errorf("error iterating through results: %v", err)
    		}

    		fmt.Fprintf(w, "url: %s views: %d\n", row.URL, row.ViewCount)
    	}
    }

Java

Iterate over the TableResult to get all the rows in the results. The iterator automatically handles pagination. Each FieldValueList exposes its columns by numeric index or by column name.

    // Get the results.
    TableResult result = queryJob.getQueryResults();

    // Print all pages of the results.
    for (FieldValueList row : result.iterateAll()) {
      String url = row.get("url").getStringValue();
      long viewCount = row.get("view_count").getLongValue();
      System.out.printf("url: %s views: %d%n", url, viewCount);
    }

Node.js

The query results are returned as a list of rows, where each row is an object keyed by column name.

    console.log('Query Results:');
    rows.forEach(row => {
      const url = row['url'];
      const viewCount = row['view_count'];
      console.log(`url: ${url}, ${viewCount} views`);
    });

PHP

Check the QueryResults.isComplete() method to verify that the query finished. Each row in the query results is an associative array.

    if ($queryResults->isComplete()) {
        $i = 0;
        $rows = $queryResults->rows();
        foreach ($rows as $row) {
            printf('--- Row %s ---' . PHP_EOL, ++$i);
            printf('url: %s, %s views' . PHP_EOL, $row['url'], $row['view_count']);
        }
        printf('Found %s row(s)' . PHP_EOL, $i);
    } else {
        throw new Exception('The query failed to complete');
    }

Python

Iterate over the RowIterator to get all the rows in the results. The iterator automatically handles pagination. Each Row exposes its columns by numeric index, by column name, or as Python attributes.

    for row in results:
        print("{} : {} views".format(row.url, row.view_count))
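The automatic pagination that RowIterator performs can be pictured with a simplified, hypothetical sketch (this is not the library's actual implementation): each page of rows comes back with a token for the next page, and the iterator keeps fetching pages until the token runs out.

```python
def iterate_rows(fetch_page):
    """Yield every row across all pages of a result set.

    fetch_page(token) is assumed to return (rows, next_token), with
    next_token None on the last page.
    """
    token = None
    while True:
        rows, token = fetch_page(token)
        for row in rows:
            yield row
        if token is None:
            return

# Demo with a fake two-page result set.
pages = {None: (["row1", "row2"], "page2"), "page2": (["row3"], None)}
all_rows = list(iterate_rows(lambda t: pages[t]))
print(all_rows)  # ['row1', 'row2', 'row3']
```

Because the client library does this for you, the simple `for row in results:` loop above already visits every row, no matter how many pages the result spans.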

Ruby

The Google::Cloud::Bigquery::Data class exposes each row as a hash.

    results.each do |row|
      puts "#{row[:url]}: #{row[:view_count]} views"
    end

Learn more about working with data rows in BigQuery:

Complete source code

Here is the complete source code for the sample.

C#


    using System;
    using Google.Cloud.BigQuery.V2;

    namespace GoogleCloudSamples
    {
        public class Program
        {
            public static void Main(string[] args)
            {
                string projectId = "YOUR-PROJECT-ID";
                var client = BigQueryClient.Create(projectId);
                string query = @"SELECT
                    CONCAT(
                        'https://stackoverflow.com/questions/',
                        CAST(id as STRING)) as url, view_count
                    FROM `bigquery-public-data.stackoverflow.posts_questions`
                    WHERE tags like '%google-bigquery%'
                    ORDER BY view_count DESC
                    LIMIT 10";
                var result = client.ExecuteQuery(query, parameters: null);
                Console.Write("\nQuery Results:\n------------\n");
                foreach (var row in result)
                {
                    Console.WriteLine($"{row["url"]}: {row["view_count"]} views");
                }
            }
        }
    }

Go

    import (
    	"context"
    	"fmt"
    	"io"
    	"log"
    	"os"

    	"cloud.google.com/go/bigquery"
    	"google.golang.org/api/iterator"
    )

    func main() {
    	projectID := os.Getenv("GOOGLE_CLOUD_PROJECT")
    	if projectID == "" {
    		fmt.Println("GOOGLE_CLOUD_PROJECT environment variable must be set.")
    		os.Exit(1)
    	}

    	ctx := context.Background()

    	client, err := bigquery.NewClient(ctx, projectID)
    	if err != nil {
    		log.Fatalf("bigquery.NewClient: %v", err)
    	}
    	defer client.Close()

    	rows, err := query(ctx, client)
    	if err != nil {
    		log.Fatal(err)
    	}
    	if err := printResults(os.Stdout, rows); err != nil {
    		log.Fatal(err)
    	}
    }

    // query returns a row iterator suitable for reading query results.
    func query(ctx context.Context, client *bigquery.Client) (*bigquery.RowIterator, error) {

    	query := client.Query(
    		`SELECT
    			CONCAT(
    				'https://stackoverflow.com/questions/',
    				CAST(id as STRING)) as url,
    			view_count
    		FROM ` + "`bigquery-public-data.stackoverflow.posts_questions`" + `
    		WHERE tags like '%google-bigquery%'
    		ORDER BY view_count DESC
    		LIMIT 10;`)
    	return query.Read(ctx)
    }

    type StackOverflowRow struct {
    	URL       string `bigquery:"url"`
    	ViewCount int64  `bigquery:"view_count"`
    }

    // printResults prints results from a query to the Stack Overflow public dataset.
    func printResults(w io.Writer, iter *bigquery.RowIterator) error {
    	for {
    		var row StackOverflowRow
    		err := iter.Next(&row)
    		if err == iterator.Done {
    			return nil
    		}
    		if err != nil {
    			return fmt.Errorf("error iterating through results: %v", err)
    		}

    		fmt.Fprintf(w, "url: %s views: %d\n", row.URL, row.ViewCount)
    	}
    }

Java


    import com.google.cloud.bigquery.BigQuery;
    import com.google.cloud.bigquery.BigQueryOptions;
    import com.google.cloud.bigquery.FieldValueList;
    import com.google.cloud.bigquery.Job;
    import com.google.cloud.bigquery.JobId;
    import com.google.cloud.bigquery.JobInfo;
    import com.google.cloud.bigquery.QueryJobConfiguration;
    import com.google.cloud.bigquery.TableResult;
    import java.util.UUID;

    public class SimpleApp {
      public static void main(String... args) throws Exception {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        QueryJobConfiguration queryConfig =
            QueryJobConfiguration.newBuilder(
                    "SELECT "
                        + "CONCAT('https://stackoverflow.com/questions/', CAST(id as STRING)) as url, "
                        + "view_count "
                        + "FROM `bigquery-public-data.stackoverflow.posts_questions` "
                        + "WHERE tags like '%google-bigquery%' "
                        + "ORDER BY view_count DESC LIMIT 10")
                // Use standard SQL syntax for queries.
                // See: https://cloud.google.com/bigquery/sql-reference/
                .setUseLegacySql(false)
                .build();

        // Create a job ID so that we can safely retry.
        JobId jobId = JobId.of(UUID.randomUUID().toString());
        Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());

        // Wait for the query to complete.
        queryJob = queryJob.waitFor();

        // Check for errors
        if (queryJob == null) {
          throw new RuntimeException("Job no longer exists");
        } else if (queryJob.getStatus().getError() != null) {
          // You can also look at queryJob.getStatus().getExecutionErrors() for all
          // errors, not just the latest one.
          throw new RuntimeException(queryJob.getStatus().getError().toString());
        }

        // Get the results.
        TableResult result = queryJob.getQueryResults();

        // Print all pages of the results.
        for (FieldValueList row : result.iterateAll()) {
          String url = row.get("url").getStringValue();
          long viewCount = row.get("view_count").getLongValue();
          System.out.printf("url: %s views: %d%n", url, viewCount);
        }
      }
    }

Node.js

    // Import the Google Cloud client library
    const {BigQuery} = require('@google-cloud/bigquery');

    async function queryStackOverflow() {
      // Queries a public Stack Overflow dataset.

      // Create a client
      const bigqueryClient = new BigQuery();

      // The SQL query to run
      const sqlQuery = `SELECT
        CONCAT(
          'https://stackoverflow.com/questions/',
          CAST(id as STRING)) as url,
        view_count
        FROM \`bigquery-public-data.stackoverflow.posts_questions\`
        WHERE tags like '%google-bigquery%'
        ORDER BY view_count DESC
        LIMIT 10`;

      const options = {
        query: sqlQuery,
        // Location must match that of the dataset(s) referenced in the query.
        location: 'US',
      };

      // Run the query
      const [rows] = await bigqueryClient.query(options);

      console.log('Query Results:');
      rows.forEach(row => {
        const url = row['url'];
        const viewCount = row['view_count'];
        console.log(`url: ${url}, ${viewCount} views`);
      });
    }
    queryStackOverflow();

PHP

    require __DIR__ . '/vendor/autoload.php';

    use Google\Cloud\BigQuery\BigQueryClient;

    // get the project ID as the first argument
    if (2 != count($argv)) {
        die("Usage: php stackoverflow.php YOUR_PROJECT_ID\n");
    }

    $projectId = $argv[1];

    $bigQuery = new BigQueryClient([
        'projectId' => $projectId,
    ]);
    $query = <<<ENDSQL
    SELECT
      CONCAT(
        'https://stackoverflow.com/questions/',
        CAST(id as STRING)) as url,
      view_count
    FROM `bigquery-public-data.stackoverflow.posts_questions`
    WHERE tags like '%google-bigquery%'
    ORDER BY view_count DESC
    LIMIT 10;
    ENDSQL;
    $queryJobConfig = $bigQuery->query($query);
    $queryResults = $bigQuery->runQuery($queryJobConfig);

    if ($queryResults->isComplete()) {
        $i = 0;
        $rows = $queryResults->rows();
        foreach ($rows as $row) {
            printf('--- Row %s ---' . PHP_EOL, ++$i);
            printf('url: %s, %s views' . PHP_EOL, $row['url'], $row['view_count']);
        }
        printf('Found %s row(s)' . PHP_EOL, $i);
    } else {
        throw new Exception('The query failed to complete');
    }

Python

    from google.cloud import bigquery

    def query_stackoverflow():
        client = bigquery.Client()
        query_job = client.query("""
            SELECT
              CONCAT(
                'https://stackoverflow.com/questions/',
                CAST(id as STRING)) as url,
              view_count
            FROM `bigquery-public-data.stackoverflow.posts_questions`
            WHERE tags like '%google-bigquery%'
            ORDER BY view_count DESC
            LIMIT 10""")

        results = query_job.result()  # Waits for job to complete.

        for row in results:
            print("{} : {} views".format(row.url, row.view_count))

    if __name__ == '__main__':
        query_stackoverflow()

Ruby

    require "google/cloud/bigquery"

    # This uses Application Default Credentials to authenticate.
    # @see https://cloud.google.com/bigquery/docs/authentication/getting-started
    bigquery = Google::Cloud::Bigquery.new

    sql     = "SELECT " +
              "CONCAT('https://stackoverflow.com/questions/', " +
              "       CAST(id as STRING)) as url, view_count " +
              "FROM `bigquery-public-data.stackoverflow.posts_questions` " +
              "WHERE tags like '%google-bigquery%' " +
              "ORDER BY view_count DESC LIMIT 10"
    results = bigquery.query sql

    results.each do |row|
      puts "#{row[:url]}: #{row[:view_count]} views"
    end

Congratulations! You've made your first request to BigQuery.


What's next

Learn more about our BigQuery API client libraries.