Google.Cloud.BigQuery.V2

Google.Cloud.BigQuery.V2 is a.NET client library for the Google BigQuery API. It wraps the Google.Apis.Bigquery.v2 generated library, providing a higher-level API to make it easier to use.

Note: This documentation is for version 3.5.0 of the library. Some samples may not work with other versions.

Installation

Install the Google.Cloud.BigQuery.V2 package from NuGet. Add it to your project in the normal way (for example by right-clicking on the project in Visual Studio and choosing "Manage NuGet Packages...").

Authentication

When running on Google Cloud, no action needs to be taken to authenticate.

Otherwise, the simplest way of authenticating your API calls is to download a service account JSON file then set the GOOGLE_APPLICATION_CREDENTIALS environment variable to refer to it. The credentials will automatically be used to authenticate. See the Getting Started With Authentication guide for more details.

Getting started

Common operations are exposed via the BigQueryClient class, and additional wrapper classes are present to make operations with datasets, tables and query results simpler.

Client life-cycle management

In many cases you don't need to worry about disposing of BigQueryClient objects, and can create them reasonably freely - but be aware that this can causes issues with memory and network connection usage. We advise you to reuse a single client object if possible; if your architecture requires you to frequently create new client objects, please dispose of them to help with timely resource clean-up. See the resource clean-up guide for more details.

Sample code

Querying

BigQueryClient client = BigQueryClient.Create(projectId);
BigQueryTable table = client.GetTable("bigquery-public-data", "samples", "shakespeare");

string sql = $"SELECT corpus AS title, COUNT(word) AS unique_words FROM {table} GROUP BY title ORDER BY unique_words DESC LIMIT 10";
BigQueryParameter[] parameters = null;
BigQueryResults results = client.ExecuteQuery(sql, parameters);

foreach (BigQueryRow row in results)
{
    Console.WriteLine($"{row["title"]}: {row["unique_words"]}");
}

Parameterized queries

Queries can be provided with parameters, either using names (the default):

BigQueryClient client = BigQueryClient.Create(projectId);
BigQueryTable table = client.GetTable(datasetId, tableId);
string sql = $"SELECT player, score, level FROM {table} WHERE score >= @score AND level >= @level";
BigQueryParameter[] parameters = new[]
{
    new BigQueryParameter("level", BigQueryDbType.Int64, 2),
    new BigQueryParameter("score", BigQueryDbType.Int64, 1500)
};
BigQueryResults results = client.ExecuteQuery(sql, parameters);
foreach (BigQueryRow row in results)
{
    Console.WriteLine($"Name: {row["player"]}; Score: {row["score"]}; Level: {row["level"]}");
}

Or using positional parameters:

BigQueryClient client = BigQueryClient.Create(projectId);
BigQueryTable table = client.GetTable(datasetId, tableId);
string sql = $"SELECT player, score, level FROM {table} WHERE score >= ? AND level >= ?";
BigQueryParameter[] parameters = new[]
{
    new BigQueryParameter(BigQueryDbType.Int64, 1500), // For score
    new BigQueryParameter(BigQueryDbType.Int64, 2), // For level
};
QueryOptions queryOptions = new QueryOptions { ParameterMode = BigQueryParameterMode.Positional };
BigQueryResults results = client.ExecuteQuery(sql, parameters, queryOptions);
foreach (BigQueryRow row in results)
{
    Console.WriteLine($"Name: {row["player"]}; Score: {row["score"]}; Level: {row["level"]}");
}

Using legacy SQL

By default, BigQueryClient uses Standard SQL. To use Legacy SQL, simply set UseLegacySql to true in the query options, and make sure that you use the legacy format for the table name, as shown below.

BigQueryClient client = BigQueryClient.Create(projectId);
BigQueryTable table = client.GetTable("bigquery-public-data", "samples", "shakespeare");

string sql = $"SELECT TOP(corpus, 10) AS title, COUNT(*) AS unique_words FROM {table:legacy}";
BigQueryParameter[] parameters = null;
BigQueryResults results = client.ExecuteQuery(sql, parameters, new QueryOptions { UseLegacySql = true });

foreach (BigQueryRow row in results)
{
    Console.WriteLine($"{row["title"]}: {row["unique_words"]}");
}

Wildcard queries

Wildcard queries can be used to query multiple tables at the same time. Wildcard table names only work in queries written using Standard SQL, so make sure to use the standard format for the table name as shown below.

BigQueryClient client = BigQueryClient.Create(projectId);

string sql = $"SELECT year, mo, da, temp, min, max FROM `bigquery-public-data.noaa_gsod.gsod*` where `max` > 120 and `max` < 121 LIMIT 10";
BigQueryParameter[] parameters = null;
BigQueryResults results = client.ExecuteQuery(sql, parameters);

foreach (BigQueryRow row in results)
{
    Console.WriteLine($"On {row["year"]}-{row["mo"]}-{row["da"]} the mean temperature was {row["temp"]} with min temperature at {row["min"]} and max temperature at {row["max"]}.");
}

Data insertion

BigQueryClient client = BigQueryClient.Create(projectId);

// Create the dataset if it doesn't exist.
BigQueryDataset dataset = client.GetOrCreateDataset("mydata");

// Create the table if it doesn't exist.
BigQueryTable table = dataset.GetOrCreateTable("scores", new TableSchemaBuilder
{
    { "player", BigQueryDbType.String },
    { "gameStarted", BigQueryDbType.Timestamp },
    { "score", BigQueryDbType.Int64 }
}.Build());

// Insert a single row. There are many other ways of inserting
// data into a table.
table.InsertRow(new BigQueryInsertRow
{
    { "player", "Bob" },
    { "score", 85 },
    { "gameStarted", new DateTime(2000, 1, 14, 10, 30, 0, DateTimeKind.Utc) }
});

DML

BigQuery supports DML.

Suppose we have a high score table, and we realize that on one day we accidentally recorded incorrect scores: each player was only awarded half the score they actually earned. We can update the data afterwards using DML:

BigQueryClient client = BigQueryClient.Create(projectId);
BigQueryTable table = client.GetTable(datasetId, tableId);
BigQueryResults result = client.ExecuteQuery(
    $"UPDATE {table} SET score = score * 2 WHERE DATE(game_started) = @date",
    new[] { new BigQueryParameter("date", BigQueryDbType.Date, new DateTime(2013, 6, 1)) })
    .ThrowOnAnyError();
Console.WriteLine($"Modified {result.NumDmlAffectedRows} row(s)");

Important note on the result returned by DML operations (in version 1.3.0)

In version 1.3.0, iterating over the results of a BigQueryResults object returned from a DML operation will iterate over the entire table modified by that operation. This is a side-effect of the way the underlying API is called, but it's rarely useful to iterate over the results. The NumDmlAffectedRows property of the results object is useful, however, in determining how many rows were modified.

From version 1.4.0-beta01 onwards, the BigQueryResults object returned from a DML operation returns no rows, but NumDmlAffectedRows still returns the number of affected rows.

Creating a table partitioned by time

BigQueryClient client = BigQueryClient.Create(projectId);
TableSchema schema = new TableSchemaBuilder
{
    { "message", BigQueryDbType.String }
}.Build();
Table tableToCreate = new Table
{
    TimePartitioning = TimePartition.CreateDailyPartitioning(expiration: null),
    Schema = schema
};
BigQueryTable table = client.CreateTable(datasetId, tableId, tableToCreate);
// Upload a single row to the table, using JSON rather than the streaming buffer, as
// the _PARTITIONTIME column will be null while it's being served from the streaming buffer.
// This code assumes the upload succeeds; normally, you should check the job results.
table.UploadJson(new[] { "{ \"message\": \"Sample message\" }" }).PollUntilCompleted();

BigQueryResults results = client.ExecuteQuery(
    $"SELECT message, _PARTITIONTIME AS pt FROM {table}",
    parameters: null);
List<BigQueryRow> rows = results.ToList();
foreach (BigQueryRow row in rows)
{
    string message = (string) row["message"];
    DateTime partition = (DateTime) row["pt"];
    Console.WriteLine($"Message: {message}; partition: {partition:yyyy-MM-dd}");
}

Querying an external data source

As described in the documentation, BigQuery can query some external data sources. The sample code below demonstrates querying a CSV file stored in Google Cloud Storage.

BigQueryClient client = BigQueryClient.Create(projectId);
TableSchema schema = new TableSchemaBuilder
{
    { "name", BigQueryDbType.String },
    { "score", BigQueryDbType.Int64 }
}.Build();
Table tableToCreate = new Table
{
    ExternalDataConfiguration = new ExternalDataConfiguration
    {
        SourceFormat = "CSV",
        SourceUris = new[] { $"gs://{bucket}/{objectName}" }
    },
    Schema = schema
};
BigQueryTable table = client.CreateTable(datasetId, tableId, tableToCreate);
BigQueryParameter[] parameters = null;
List<BigQueryRow> rows = client.ExecuteQuery($"SELECT name, score FROM {table} ORDER BY score", parameters).ToList();
foreach (BigQueryRow row in rows)
{
    Console.WriteLine($"{row["name"]} - {row["score"]}");
}