Quickstart: Using client libraries

This page shows you how to get started with the BigQuery API in your preferred programming language.

Before you begin

  1. Sign in to your Google Account.

    If you don't already have a Google Account, sign up for a new account.

  2. In the Cloud Console, on the project selector page, select or create a Cloud project.

    Go to the project selector page

  3. Enable the BigQuery API.

    Enable the API

  4. Set up authentication:
    1. In the Cloud Console, go to the Create service account key page.

      Go to the Create Service Account Key page
    2. From the Service account list, select New service account.
    3. In the Service account name field, enter a name.
    4. From the Role list, select Project > Owner.

      Note: The Role field authorizes your service account to access resources. You can view and change this field later by using the Cloud Console. If you are developing a production app, specify more granular permissions than Project > Owner. For more information, see Granting roles to service accounts.
    5. Click Create. A JSON file that contains your key downloads to your computer.
  5. Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file that contains your service account key. This variable only applies to your current shell session, so if you open a new session, set the variable again (for example, as shown below).
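
For example, as a minimal sketch for a Linux or macOS shell (the path is only a placeholder for wherever you saved the downloaded JSON key; Windows shells use a different syntax):

export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-key.json"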

Installing the client library

C#

For more on setting up your C# development environment, refer to the C# Development Environment Setup Guide.

Install-Package Google.Cloud.BigQuery.V2 -Pre

Go

go get -u cloud.google.com/go/bigquery

Java

For more on setting up your Java development environment, refer to the Java Development Environment Setup Guide.

If you are using Maven, add the following to your pom.xml file. For more information about BOMs, see The Google Cloud Platform Libraries BOM.

<!--  Using libraries-bom to manage versions.
See https://github.com/GoogleCloudPlatform/cloud-opensource-java/wiki/The-Google-Cloud-Platform-Libraries-BOM -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>libraries-bom</artifactId>
      <version>11.1.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-bigquery</artifactId>
  </dependency>
</dependencies>

If you are using Gradle, add the following to your dependencies:

compile 'com.google.cloud:google-cloud-bigquery:1.122.1'

If you are using sbt, add the following to your dependencies:

libraryDependencies += "com.google.cloud" % "google-cloud-bigquery" % "1.122.1"

If you're using IntelliJ or Eclipse, you can add the client libraries to your project by using a plugin for your IDE.

The plugins provide additional functionality, such as key management for service accounts. Refer to each plugin's documentation for details.

Node.js

For more on setting up your Node.js development environment, refer to the Node.js Development Environment Setup Guide.

npm install --save @google-cloud/bigquery

PHP

composer require google/cloud-bigquery

Python

For more on setting up your Python development environment, refer to the Python Development Environment Setup Guide.

pip install --upgrade google-cloud-bigquery

Ruby

For more on setting up your Ruby development environment, refer to the Ruby Development Environment Setup Guide.

gem install google-cloud-bigquery

Importing the library

C#

For more information, see the BigQuery C# API reference documentation.


using System;
using Google.Cloud.BigQuery.V2;

Go

For more information, see the BigQuery Go API reference documentation.

import (
	"context"
	"fmt"
	"io"
	"log"
	"os"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/iterator"
)

Java

For more information, see the BigQuery Java API reference documentation.


import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobId;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;
import java.util.UUID;

Node.js

For more information, see the BigQuery Node.js API reference documentation.

// Import the Google Cloud client library
const {BigQuery} = require('@google-cloud/bigquery');

PHP

For more information, see the BigQuery PHP API reference documentation.

use Google\Cloud\BigQuery\BigQueryClient;

Python

For more information, see the BigQuery Python API reference documentation.

from google.cloud import bigquery

Ruby

For more information, see the BigQuery Ruby API reference documentation.

require "google/cloud/bigquery"

Initializing a BigQuery client

C#

Create a BigQuery client by using the BigQueryClient.Create() function.

string projectId = "YOUR-PROJECT-ID";
var client = BigQueryClient.Create(projectId);

Go

Create a BigQuery client by using the bigquery.NewClient() function.

ctx := context.Background()

client, err := bigquery.NewClient(ctx, projectID)
if err != nil {
	log.Fatalf("bigquery.NewClient: %v", err)
}
defer client.Close()

Java

Use the BigQueryOptions.getDefaultInstance() function to use the default authentication options. Create a BigQuery client by using the BigQueryOptions.getService() function.

BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

Node.js

Instantiate the BigQuery class to create a BigQuery client.

// Create a client
const bigqueryClient = new BigQuery();

PHP

Instantiate the BigQueryClient class to create a BigQuery client.

$bigQuery = new BigQueryClient([
    'projectId' => $projectId,
]);

Python

Instantiate the bigquery.Client class to create a BigQuery client.

client = bigquery.Client()

Ruby

Create a BigQuery client by using the Google::Cloud::Bigquery.new function.

# This uses Application Default Credentials to authenticate.
# @see https://cloud.google.com/bigquery/docs/authentication/getting-started
bigquery = Google::Cloud::Bigquery.new

Running queries

Query the Stack Overflow public dataset to find the most viewed questions tagged google-bigquery.

SELECT
  CONCAT(
    'https://stackoverflow.com/questions/',
    CAST(id as STRING)) as url,
  view_count
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE tags like '%google-bigquery%'
ORDER BY view_count DESC
LIMIT 10

This query uses standard SQL syntax, which is described in the query reference guide. The client libraries default to standard SQL syntax. To change SQL dialects, see Switching SQL dialects.
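
As a brief, hedged sketch of switching dialects (Python shown; it mirrors the setUseLegacySql(false) flag in the Java example below), you could opt a single query into legacy SQL through a job configuration:

from google.cloud import bigquery

client = bigquery.Client()

# Standard SQL is the default; this job config switches one query to legacy SQL.
job_config = bigquery.QueryJobConfig(use_legacy_sql=True)
query_job = client.query("SELECT 1", job_config=job_config)
print(list(query_job.result()))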

Run the query

Query data by using the authenticated BigQuery client.

C#

Define the query string, then submit the query and get the results by using the client.ExecuteQuery() function.

string query = @"SELECT
    CONCAT(
        'https://stackoverflow.com/questions/',
        CAST(id as STRING)) as url, view_count
    FROM `bigquery-public-data.stackoverflow.posts_questions`
    WHERE tags like '%google-bigquery%'
    ORDER BY view_count DESC
    LIMIT 10";
var result = client.ExecuteQuery(query, parameters: null);

Go

Define the query by using the bigquery.Query() function, then submit the query and get the results by using the Query.Read() function.

query := client.Query(
	`SELECT
		CONCAT(
			'https://stackoverflow.com/questions/',
			CAST(id as STRING)) as url,
		view_count
	FROM ` + "`bigquery-public-data.stackoverflow.posts_questions`" + `
	WHERE tags like '%google-bigquery%'
	ORDER BY view_count DESC
	LIMIT 10;`)
return query.Read(ctx)

Java

Define the query with a QueryJobConfiguration instance, and start the query job by using the BigQuery.create() method.

QueryJobConfiguration queryConfig =
    QueryJobConfiguration.newBuilder(
            "SELECT "
                + "CONCAT('https://stackoverflow.com/questions/', CAST(id as STRING)) as url, "
                + "view_count "
                + "FROM `bigquery-public-data.stackoverflow.posts_questions` "
                + "WHERE tags like '%google-bigquery%' "
                + "ORDER BY view_count DESC LIMIT 10")
        // Use standard SQL syntax for queries.
        // See: https://cloud.google.com/bigquery/sql-reference/
        .setUseLegacySql(false)
        .build();

// Create a job ID so that we can safely retry.
JobId jobId = JobId.of(UUID.randomUUID().toString());
Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());

// Wait for the query to complete.
queryJob = queryJob.waitFor();

// Check for errors
if (queryJob == null) {
  throw new RuntimeException("Job no longer exists");
} else if (queryJob.getStatus().getError() != null) {
  // You can also look at queryJob.getStatus().getExecutionErrors() for all
  // errors, not just the latest one.
  throw new RuntimeException(queryJob.getStatus().getError().toString());
}

Node.js

Start the query by using the BigQuery.query() method.

// The SQL query to run
const sqlQuery = `SELECT
  CONCAT(
    'https://stackoverflow.com/questions/',
    CAST(id as STRING)) as url,
  view_count
  FROM \`bigquery-public-data.stackoverflow.posts_questions\`
  WHERE tags like '%google-bigquery%'
  ORDER BY view_count DESC
  LIMIT 10`;

const options = {
  query: sqlQuery,
  // Location must match that of the dataset(s) referenced in the query.
  location: 'US',
};

// Run the query
const [rows] = await bigqueryClient.query(options);

PHP

Create the query job configuration by using the BigQueryClient.query() method, then run the query by using the BigQueryClient.runQuery() method.

$query = <<<ENDSQL
SELECT
  CONCAT(
    'https://stackoverflow.com/questions/',
    CAST(id as STRING)) as url,
  view_count
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE tags like '%google-bigquery%'
ORDER BY view_count DESC
LIMIT 10;
ENDSQL;
$queryJobConfig = $bigQuery->query($query);
$queryResults = $bigQuery->runQuery($queryJobConfig);

Python

Start the query by using the Client.query() method.

query_job = client.query(
    """
    SELECT
      CONCAT(
        'https://stackoverflow.com/questions/',
        CAST(id as STRING)) as url,
      view_count
    FROM `bigquery-public-data.stackoverflow.posts_questions`
    WHERE tags like '%google-bigquery%'
    ORDER BY view_count DESC
    LIMIT 10"""
)

results = query_job.result()  # Waits for job to complete.

Ruby

Start the query and wait for the results by using the Google::Cloud::Bigquery::Project.query method.

sql     = "SELECT " +
          "CONCAT('https://stackoverflow.com/questions/', " +
          "       CAST(id as STRING)) as url, view_count " +
          "FROM `bigquery-public-data.stackoverflow.posts_questions` " +
          "WHERE tags like '%google-bigquery%' " +
          "ORDER BY view_count DESC LIMIT 10"
results = bigquery.query sql

For more examples of running BigQuery queries, see the BigQuery documentation.

Displaying the query results

Display the results of the query.

C#

Console.Write("\nQuery Results:\n------------\n");
foreach (var row in result)
{
    Console.WriteLine($"{row["url"]}: {row["view_count"]} views");
}

Go

Use the RowIterator.Next() function to load each row into a pointer to a struct.

type StackOverflowRow struct {
	URL       string `bigquery:"url"`
	ViewCount int64  `bigquery:"view_count"`
}

// printResults prints results from a query to the Stack Overflow public dataset.
func printResults(w io.Writer, iter *bigquery.RowIterator) error {
	for {
		var row StackOverflowRow
		err := iter.Next(&row)
		if err == iterator.Done {
			return nil
		}
		if err != nil {
			return fmt.Errorf("error iterating through results: %v", err)
		}

		fmt.Fprintf(w, "url: %s views: %d\n", row.URL, row.ViewCount)
	}
}

Java

Iterate over the QueryResponse to get all the rows in the results. The iterator automatically handles pagination. Each FieldList exposes the columns by numerical index or by column name.

// Get the results.
TableResult result = queryJob.getQueryResults();

// Print all pages of the results.
for (FieldValueList row : result.iterateAll()) {
  String url = row.get("url").getStringValue();
  long viewCount = row.get("view_count").getLongValue();
  System.out.printf("url: %s views: %d%n", url, viewCount);
}

Node.js

The query results are returned as a list of rows, where each row is a dictionary.

console.log('Query Results:');
rows.forEach(row => {
  const url = row['url'];
  const viewCount = row['view_count'];
  console.log(`url: ${url}, ${viewCount} views`);
});

PHP

Call the Job.queryResults() method to wait for the query to finish. Each row in the query results is an associative array.

if ($queryResults->isComplete()) {
    $i = 0;
    $rows = $queryResults->rows();
    foreach ($rows as $row) {
        printf('--- Row %s ---' . PHP_EOL, ++$i);
        printf('url: %s, %s views' . PHP_EOL, $row['url'], $row['view_count']);
    }
    printf('Found %s row(s)' . PHP_EOL, $i);
} else {
    throw new Exception('The query failed to complete');
}

Python

Iterate over the RowIterator to get all the rows in the results. The iterator automatically handles pagination. Each Row exposes its columns by numerical index, by column name, or as Python attributes.

for row in results:
    print("{} : {} views".format(row.url, row.view_count))
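
As a small, hedged sketch of those three access styles, the snippet below runs its own one-row query so that it does not consume the results iterator above; the greeting column name is purely illustrative:

row = list(client.query("SELECT 'hello' AS greeting").result())[0]

# The same column value by attribute, by column name, and by position.
print(row.greeting)
print(row["greeting"])
print(row[0])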

Ruby

The Google::Cloud::Bigquery::Data class exposes each row as a hash.

results.each do |row|
  puts "#{row[:url]}: #{row[:view_count]} views"
end

Learn more about working with rows of data in BigQuery.

Complete source code

Here is the complete source code for the example.

C#


using System;
using Google.Cloud.BigQuery.V2;

namespace GoogleCloudSamples
{
    public class Program
    {
        public static void Main(string[] args)
        {
            string projectId = "YOUR-PROJECT-ID";
            var client = BigQueryClient.Create(projectId);
            string query = @"SELECT
                CONCAT(
                    'https://stackoverflow.com/questions/',
                    CAST(id as STRING)) as url, view_count
                FROM `bigquery-public-data.stackoverflow.posts_questions`
                WHERE tags like '%google-bigquery%'
                ORDER BY view_count DESC
                LIMIT 10";
            var result = client.ExecuteQuery(query, parameters: null);
            Console.Write("\nQuery Results:\n------------\n");
            foreach (var row in result)
            {
                Console.WriteLine($"{row["url"]}: {row["view_count"]} views");
            }
        }
    }
}

Go

import (
	"context"
	"fmt"
	"io"
	"log"
	"os"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/iterator"
)

func main() {
	projectID := os.Getenv("GOOGLE_CLOUD_PROJECT")
	if projectID == "" {
		fmt.Println("GOOGLE_CLOUD_PROJECT environment variable must be set.")
		os.Exit(1)
	}

	ctx := context.Background()

	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		log.Fatalf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	rows, err := query(ctx, client)
	if err != nil {
		log.Fatal(err)
	}
	if err := printResults(os.Stdout, rows); err != nil {
		log.Fatal(err)
	}
}

// query returns a row iterator suitable for reading query results.
func query(ctx context.Context, client *bigquery.Client) (*bigquery.RowIterator, error) {

	query := client.Query(
		`SELECT
			CONCAT(
				'https://stackoverflow.com/questions/',
				CAST(id as STRING)) as url,
			view_count
		FROM ` + "`bigquery-public-data.stackoverflow.posts_questions`" + `
		WHERE tags like '%google-bigquery%'
		ORDER BY view_count DESC
		LIMIT 10;`)
	return query.Read(ctx)
}

type StackOverflowRow struct {
	URL       string `bigquery:"url"`
	ViewCount int64  `bigquery:"view_count"`
}

// printResults prints results from a query to the Stack Overflow public dataset.
func printResults(w io.Writer, iter *bigquery.RowIterator) error {
	for {
		var row StackOverflowRow
		err := iter.Next(&row)
		if err == iterator.Done {
			return nil
		}
		if err != nil {
			return fmt.Errorf("error iterating through results: %v", err)
		}

		fmt.Fprintf(w, "url: %s views: %d\n", row.URL, row.ViewCount)
	}
}

Java


import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobId;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;
import java.util.UUID;

public class SimpleApp {
  public static void main(String... args) throws Exception {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
    QueryJobConfiguration queryConfig =
        QueryJobConfiguration.newBuilder(
                "SELECT "
                    + "CONCAT('https://stackoverflow.com/questions/', CAST(id as STRING)) as url, "
                    + "view_count "
                    + "FROM `bigquery-public-data.stackoverflow.posts_questions` "
                    + "WHERE tags like '%google-bigquery%' "
                    + "ORDER BY view_count DESC LIMIT 10")
            // Use standard SQL syntax for queries.
            // See: https://cloud.google.com/bigquery/sql-reference/
            .setUseLegacySql(false)
            .build();

    // Create a job ID so that we can safely retry.
    JobId jobId = JobId.of(UUID.randomUUID().toString());
    Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());

    // Wait for the query to complete.
    queryJob = queryJob.waitFor();

    // Check for errors
    if (queryJob == null) {
      throw new RuntimeException("Job no longer exists");
    } else if (queryJob.getStatus().getError() != null) {
      // You can also look at queryJob.getStatus().getExecutionErrors() for all
      // errors, not just the latest one.
      throw new RuntimeException(queryJob.getStatus().getError().toString());
    }

    // Get the results.
    TableResult result = queryJob.getQueryResults();

    // Print all pages of the results.
    for (FieldValueList row : result.iterateAll()) {
      String url = row.get("url").getStringValue();
      long viewCount = row.get("view_count").getLongValue();
      System.out.printf("url: %s views: %d%n", url, viewCount);
    }
  }
}

Node.js

// Import the Google Cloud client library
const {BigQuery} = require('@google-cloud/bigquery');

async function queryStackOverflow() {
  // Queries a public Stack Overflow dataset.

  // Create a client
  const bigqueryClient = new BigQuery();

  // The SQL query to run
  const sqlQuery = `SELECT
    CONCAT(
      'https://stackoverflow.com/questions/',
      CAST(id as STRING)) as url,
    view_count
    FROM \`bigquery-public-data.stackoverflow.posts_questions\`
    WHERE tags like '%google-bigquery%'
    ORDER BY view_count DESC
    LIMIT 10`;

  const options = {
    query: sqlQuery,
    // Location must match that of the dataset(s) referenced in the query.
    location: 'US',
  };

  // Run the query
  const [rows] = await bigqueryClient.query(options);

  console.log('Query Results:');
  rows.forEach(row => {
    const url = row['url'];
    const viewCount = row['view_count'];
    console.log(`url: ${url}, ${viewCount} views`);
  });
}
queryStackOverflow();

PHP

require __DIR__ . '/vendor/autoload.php';

use Google\Cloud\BigQuery\BigQueryClient;

// get the project ID as the first argument
if (2 != count($argv)) {
    die("Usage: php stackoverflow.php YOUR_PROJECT_ID\n");
}

$projectId = $argv[1];

$bigQuery = new BigQueryClient([
    'projectId' => $projectId,
]);
$query = <<<ENDSQL
SELECT
  CONCAT(
    'https://stackoverflow.com/questions/',
    CAST(id as STRING)) as url,
  view_count
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE tags like '%google-bigquery%'
ORDER BY view_count DESC
LIMIT 10;
ENDSQL;
$queryJobConfig = $bigQuery->query($query);
$queryResults = $bigQuery->runQuery($queryJobConfig);

if ($queryResults->isComplete()) {
    $i = 0;
    $rows = $queryResults->rows();
    foreach ($rows as $row) {
        printf('--- Row %s ---' . PHP_EOL, ++$i);
        printf('url: %s, %s views' . PHP_EOL, $row['url'], $row['view_count']);
    }
    printf('Found %s row(s)' . PHP_EOL, $i);
} else {
    throw new Exception('The query failed to complete');
}

Python

from google.cloud import bigquery

def query_stackoverflow():
    client = bigquery.Client()
    query_job = client.query(
        """
        SELECT
          CONCAT(
            'https://stackoverflow.com/questions/',
            CAST(id as STRING)) as url,
          view_count
        FROM `bigquery-public-data.stackoverflow.posts_questions`
        WHERE tags like '%google-bigquery%'
        ORDER BY view_count DESC
        LIMIT 10"""
    )

    results = query_job.result()  # Waits for job to complete.

    for row in results:
        print("{} : {} views".format(row.url, row.view_count))

if __name__ == "__main__":
    query_stackoverflow()

Ruby

require "google/cloud/bigquery"

# This uses Application Default Credentials to authenticate.
# @see https://cloud.google.com/bigquery/docs/authentication/getting-started
bigquery = Google::Cloud::Bigquery.new

sql     = "SELECT " +
          "CONCAT('https://stackoverflow.com/questions/', " +
          "       CAST(id as STRING)) as url, view_count " +
          "FROM `bigquery-public-data.stackoverflow.posts_questions` " +
          "WHERE tags like '%google-bigquery%' " +
          "ORDER BY view_count DESC LIMIT 10"
results = bigquery.query sql

results.each do |row|
  puts "#{row[:url]}: #{row[:view_count]} views"
end

Congratulations! You've sent your first request to BigQuery.

What's next

Learn more about our BigQuery API client libraries.