BigQuery API - Module Google::Cloud::Bigquery::External (v1.39.0)

Reference documentation and code samples for the BigQuery API module Google::Cloud::Bigquery::External.

External

Creates a new DataSource (or subclass) object that represents the external data source that can be queried from directly, even though the data is not stored in BigQuery. Instead of loading or streaming the data, this object references the external data source.

See DataSource, CsvSource, JsonSource, SheetsSource, BigtableSource

Examples

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

csv_url = "gs://bucket/path/to/data.csv"
csv_table = bigquery.external csv_url do |csv|
  csv.autodetect = true
  csv.skip_leading_rows = 1
end

data = bigquery.query "SELECT * FROM my_ext_table",
                      external: { my_ext_table: csv_table }

# Iterate over the first page of results
data.each do |row|
  puts row[:name]
end
# Retrieve the next page of results
data = data.next if data.next?

Hive partitioning options:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

gcs_uri = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/*"
source_uri_prefix = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/"
external_data = bigquery.external gcs_uri, format: :parquet do |ext|
  ext.hive_partitioning_mode = :auto
  ext.hive_partitioning_require_partition_filter = true
  ext.hive_partitioning_source_uri_prefix = source_uri_prefix
end

external_data.hive_partitioning? #=> true
external_data.hive_partitioning_mode #=> "AUTO"
external_data.hive_partitioning_require_partition_filter? #=> true
external_data.hive_partitioning_source_uri_prefix #=> source_uri_prefix