Google.Cloud.BigQuery.V2
Google.Cloud.BigQuery.V2
is a.NET client library for the Google BigQuery API.
It wraps the Google.Apis.Bigquery.v2
generated library, providing a higher-level API to make it easier to use.
Note:
This documentation is for version 3.5.0
of the library.
Some samples may not work with other versions.
Installation
Install the Google.Cloud.BigQuery.V2
package from NuGet. Add it to
your project in the normal way (for example by right-clicking on the
project in Visual Studio and choosing "Manage NuGet Packages...").
Authentication
When running on Google Cloud, no action needs to be taken to authenticate.
Otherwise, the simplest way of authenticating your API calls is to
download a service account JSON file then set the GOOGLE_APPLICATION_CREDENTIALS
environment variable to refer to it.
The credentials will automatically be used to authenticate. See the Getting Started With
Authentication guide for more details.
Getting started
Common operations are exposed via the BigQueryClient class, and additional wrapper classes are present to make operations with datasets, tables and query results simpler.
Client life-cycle management
In many cases you don't need to worry about disposing of
BigQueryClient
objects, and can create them reasonably freely -
but be aware that this can causes issues with memory and network
connection usage. We advise you to reuse a single client object if
possible; if your architecture requires you to frequently create new
client objects, please dispose of them to help with timely resource
clean-up. See the resource clean-up guide for more
details.
Sample code
Querying
BigQueryClient client = BigQueryClient.Create(projectId);
BigQueryTable table = client.GetTable("bigquery-public-data", "samples", "shakespeare");
string sql = $"SELECT corpus AS title, COUNT(word) AS unique_words FROM {table} GROUP BY title ORDER BY unique_words DESC LIMIT 10";
BigQueryParameter[] parameters = null;
BigQueryResults results = client.ExecuteQuery(sql, parameters);
foreach (BigQueryRow row in results)
{
Console.WriteLine($"{row["title"]}: {row["unique_words"]}");
}
Parameterized queries
Queries can be provided with parameters, either using names (the default):
BigQueryClient client = BigQueryClient.Create(projectId);
BigQueryTable table = client.GetTable(datasetId, tableId);
string sql = $"SELECT player, score, level FROM {table} WHERE score >= @score AND level >= @level";
BigQueryParameter[] parameters = new[]
{
new BigQueryParameter("level", BigQueryDbType.Int64, 2),
new BigQueryParameter("score", BigQueryDbType.Int64, 1500)
};
BigQueryResults results = client.ExecuteQuery(sql, parameters);
foreach (BigQueryRow row in results)
{
Console.WriteLine($"Name: {row["player"]}; Score: {row["score"]}; Level: {row["level"]}");
}
Or using positional parameters:
BigQueryClient client = BigQueryClient.Create(projectId);
BigQueryTable table = client.GetTable(datasetId, tableId);
string sql = $"SELECT player, score, level FROM {table} WHERE score >= ? AND level >= ?";
BigQueryParameter[] parameters = new[]
{
new BigQueryParameter(BigQueryDbType.Int64, 1500), // For score
new BigQueryParameter(BigQueryDbType.Int64, 2), // For level
};
QueryOptions queryOptions = new QueryOptions { ParameterMode = BigQueryParameterMode.Positional };
BigQueryResults results = client.ExecuteQuery(sql, parameters, queryOptions);
foreach (BigQueryRow row in results)
{
Console.WriteLine($"Name: {row["player"]}; Score: {row["score"]}; Level: {row["level"]}");
}
Using legacy SQL
By default, BigQueryClient
uses Standard SQL. To
use Legacy SQL,
simply set UseLegacySql
to true in the query options, and make
sure that you use the legacy format for the table name, as shown
below.
BigQueryClient client = BigQueryClient.Create(projectId);
BigQueryTable table = client.GetTable("bigquery-public-data", "samples", "shakespeare");
string sql = $"SELECT TOP(corpus, 10) AS title, COUNT(*) AS unique_words FROM {table:legacy}";
BigQueryParameter[] parameters = null;
BigQueryResults results = client.ExecuteQuery(sql, parameters, new QueryOptions { UseLegacySql = true });
foreach (BigQueryRow row in results)
{
Console.WriteLine($"{row["title"]}: {row["unique_words"]}");
}
Wildcard queries
Wildcard queries can be used to query multiple tables at the same time. Wildcard table names only work in queries written using Standard SQL, so make sure to use the standard format for the table name as shown below.
BigQueryClient client = BigQueryClient.Create(projectId);
string sql = $"SELECT year, mo, da, temp, min, max FROM `bigquery-public-data.noaa_gsod.gsod*` where `max` > 120 and `max` < 121 LIMIT 10";
BigQueryParameter[] parameters = null;
BigQueryResults results = client.ExecuteQuery(sql, parameters);
foreach (BigQueryRow row in results)
{
Console.WriteLine($"On {row["year"]}-{row["mo"]}-{row["da"]} the mean temperature was {row["temp"]} with min temperature at {row["min"]} and max temperature at {row["max"]}.");
}
Data insertion
BigQueryClient client = BigQueryClient.Create(projectId);
// Create the dataset if it doesn't exist.
BigQueryDataset dataset = client.GetOrCreateDataset("mydata");
// Create the table if it doesn't exist.
BigQueryTable table = dataset.GetOrCreateTable("scores", new TableSchemaBuilder
{
{ "player", BigQueryDbType.String },
{ "gameStarted", BigQueryDbType.Timestamp },
{ "score", BigQueryDbType.Int64 }
}.Build());
// Insert a single row. There are many other ways of inserting
// data into a table.
table.InsertRow(new BigQueryInsertRow
{
{ "player", "Bob" },
{ "score", 85 },
{ "gameStarted", new DateTime(2000, 1, 14, 10, 30, 0, DateTimeKind.Utc) }
});
DML
BigQuery supports DML.
Suppose we have a high score table, and we realize that on one day we accidentally recorded incorrect scores: each player was only awarded half the score they actually earned. We can update the data afterwards using DML:
BigQueryClient client = BigQueryClient.Create(projectId);
BigQueryTable table = client.GetTable(datasetId, tableId);
BigQueryResults result = client.ExecuteQuery(
$"UPDATE {table} SET score = score * 2 WHERE DATE(game_started) = @date",
new[] { new BigQueryParameter("date", BigQueryDbType.Date, new DateTime(2013, 6, 1)) })
.ThrowOnAnyError();
Console.WriteLine($"Modified {result.NumDmlAffectedRows} row(s)");
Important note on the result returned by DML operations (in version 1.3.0)
In version 1.3.0, iterating over the results of a BigQueryResults
object returned
from a DML operation will iterate over the entire table modified by
that operation. This is a side-effect of the way the underlying API
is called, but it's rarely useful to iterate over the results. The
NumDmlAffectedRows
property of the results object is useful,
however, in determining how many rows were modified.
From version 1.4.0-beta01 onwards, the BigQueryResults
object
returned from a DML operation returns no rows, but
NumDmlAffectedRows
still returns the number of affected rows.
Creating a table partitioned by time
BigQueryClient client = BigQueryClient.Create(projectId);
TableSchema schema = new TableSchemaBuilder
{
{ "message", BigQueryDbType.String }
}.Build();
Table tableToCreate = new Table
{
TimePartitioning = TimePartition.CreateDailyPartitioning(expiration: null),
Schema = schema
};
BigQueryTable table = client.CreateTable(datasetId, tableId, tableToCreate);
// Upload a single row to the table, using JSON rather than the streaming buffer, as
// the _PARTITIONTIME column will be null while it's being served from the streaming buffer.
// This code assumes the upload succeeds; normally, you should check the job results.
table.UploadJson(new[] { "{ \"message\": \"Sample message\" }" }).PollUntilCompleted();
BigQueryResults results = client.ExecuteQuery(
$"SELECT message, _PARTITIONTIME AS pt FROM {table}",
parameters: null);
List<BigQueryRow> rows = results.ToList();
foreach (BigQueryRow row in rows)
{
string message = (string) row["message"];
DateTime partition = (DateTime) row["pt"];
Console.WriteLine($"Message: {message}; partition: {partition:yyyy-MM-dd}");
}
Querying an external data source
As described in the documentation, BigQuery can query some external data sources. The sample code below demonstrates querying a CSV file stored in Google Cloud Storage.
BigQueryClient client = BigQueryClient.Create(projectId);
TableSchema schema = new TableSchemaBuilder
{
{ "name", BigQueryDbType.String },
{ "score", BigQueryDbType.Int64 }
}.Build();
Table tableToCreate = new Table
{
ExternalDataConfiguration = new ExternalDataConfiguration
{
SourceFormat = "CSV",
SourceUris = new[] { $"gs://{bucket}/{objectName}" }
},
Schema = schema
};
BigQueryTable table = client.CreateTable(datasetId, tableId, tableToCreate);
BigQueryParameter[] parameters = null;
List<BigQueryRow> rows = client.ExecuteQuery($"SELECT name, score FROM {table} ORDER BY score", parameters).ToList();
foreach (BigQueryRow row in rows)
{
Console.WriteLine($"{row["name"]} - {row["score"]}");
}