title: Using Node.js to Calculate the Size of a BigQuery Dataset
description: Learn how to use Node.js to calculate the size of a BigQuery dataset.
author: jmdobry
tags: BigQuery, Node.js
date_published: 2016-01-18


This tutorial shows how to use Node.js to calculate the size of a BigQuery dataset.

We will create a small Node.js script that accepts a project ID and a dataset ID and calculates the size of the dataset.

Setup

  1. Install Node.js.
  2. Create a file named index.js.
  3. Install the @google-cloud/bigquery package from NPM:

    npm install @google-cloud/bigquery
    

Validating input

The first step is to read and validate the arguments that are passed to the script from the command line. Add the following to index.js:

// Read and validate the input arguments
const [projectId, datasetId] = process.argv.slice(2);

if (!projectId || !datasetId) {
  console.error('Usage: node index.js PROJECT_ID DATASET_ID');
  console.error('Example: node index.js bigquery-public-data hacker_news');
  // Exit with a non-zero status so callers can detect the failure
  process.exit(1);
}

This code ensures that both a project ID and a dataset ID are passed to the script.
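The same validation can be sketched as a small function, which makes it easier to test without running the whole script. The name parseArgs is hypothetical and not part of the original tutorial; the function simply returns null when either argument is missing.

```javascript
'use strict';

// Hypothetical helper (not part of the original script): extracts the
// project ID and dataset ID from an argv-style array.
// Returns null when either value is missing.
function parseArgs(argv) {
  // The first two entries are the node binary and the script path
  const [projectId, datasetId] = argv.slice(2);
  if (!projectId || !datasetId) {
    return null;
  }
  return { projectId, datasetId };
}

// Example: parse a hand-written argv array
const args = parseArgs(['node', 'index.js', 'bigquery-public-data', 'hacker_news']);
console.log(args);
```

In the script itself, this would be called as parseArgs(process.argv), printing the usage message and exiting when the result is null.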

Instantiating a BigQuery client

The next step is to instantiate the BigQuery Node.js client. Add the following to index.js:

// Instantiate a BigQuery client
// Current releases of the package export a BigQuery class
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery({
  projectId: projectId
});

Specifying the target dataset

With the client instantiated, we can now create a reference to the specified dataset. Add the following to index.js:

// References an existing dataset, e.g. "my_dataset"
const dataset = bigquery.dataset(datasetId);

Calculating the dataset size

Finally, we can load the dataset's tables and sum their sizes. Add the following to index.js:

// Lists all tables in the dataset
dataset.getTables()
  .then((results) => results[0])
  // Retrieve the metadata for each table
  .then((tables) => Promise.all(tables.map((table) => table.get())))
  .then((results) => results.map((result) => result[0]))
  // Select the size of each table, converting bytes to megabytes
  .then((tables) => tables.map((table) => (parseInt(table.metadata.numBytes, 10) / 1000) / 1000))
  // Sum up the sizes
  .then((sizes) => sizes.reduce((total, size) => total + size, 0))
  // Print the total size
  .then((sum) => {
    console.log(`Size of ${dataset.id}: ${sum} MB`);
  })
  // Report any error and exit with a non-zero status
  .catch((err) => {
    console.error('ERROR:', err);
    process.exit(1);
  });
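The summing step above can also be pulled out into a pure function and exercised with hand-written metadata. This is a minimal sketch, not the tutorial's own code: the name totalMegabytes is hypothetical, and treating a missing or non-numeric numBytes as 0 is an assumption made here for robustness. It sums the byte counts first and divides once, which avoids accumulating rounding error from many small divisions.

```javascript
'use strict';

// Hypothetical helper: sums the numBytes field of each table's metadata
// and returns the total in decimal megabytes (1 MB = 1,000,000 bytes).
function totalMegabytes(tables) {
  const totalBytes = tables.reduce((total, table) => {
    // numBytes is read from the metadata as a string, so convert it first
    const bytes = Number(table.metadata.numBytes);
    // Assumption: count a missing or non-numeric value as 0 bytes
    return total + (Number.isFinite(bytes) ? bytes : 0);
  }, 0);
  return totalBytes / 1e6;
}

// Example with made-up metadata objects
const sampleTables = [
  { metadata: { numBytes: '1000000' } },
  { metadata: { numBytes: '2500000' } },
  { metadata: {} }
];
console.log(`${totalMegabytes(sampleTables)} MB`);
```

JavaScript numbers represent integers exactly only up to 2^53, roughly 9 petabytes when counting bytes, so a dataset larger than that would need BigInt arithmetic instead.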

Run the script

Run the index.js script against BigQuery's public Hacker News dataset:

node index.js bigquery-public-data hacker_news
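The script reports the size as a raw megabyte figure, which can be hard to read for very small or very large datasets. As an optional refinement, the sketch below picks a suitable decimal unit and rounds to two places. The name formatSize is hypothetical, and it expects a byte count rather than the megabyte value the script currently prints.

```javascript
'use strict';

// Hypothetical helper: formats a byte count using decimal units,
// matching the tutorial's convention of 1 MB = 1000 * 1000 bytes.
function formatSize(bytes) {
  const units = ['B', 'KB', 'MB', 'GB', 'TB', 'PB'];
  let value = bytes;
  let unit = 0;
  // Step up one unit at a time until the value drops below 1000
  while (value >= 1000 && unit < units.length - 1) {
    value /= 1000;
    unit += 1;
  }
  return `${value.toFixed(2)} ${units[unit]}`;
}

console.log(formatSize(3500000)); // prints "3.50 MB"
```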

The complete code

Here is the complete source code:

'use strict';

// Read and validate the input arguments
const [projectId, datasetId] = process.argv.slice(2);

if (!projectId || !datasetId) {
  console.error('Usage: node index.js PROJECT_ID DATASET_ID');
  console.error('Example: node index.js bigquery-public-data hacker_news');
  // Exit with a non-zero status so callers can detect the failure
  process.exit(1);
}

// Instantiate a BigQuery client
// Current releases of the package export a BigQuery class
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery({
  projectId: projectId
});

// References an existing dataset, e.g. "my_dataset"
const dataset = bigquery.dataset(datasetId);

// Lists all tables in the dataset
dataset.getTables()
  .then((results) => results[0])
  // Retrieve the metadata for each table
  .then((tables) => Promise.all(tables.map((table) => table.get())))
  .then((results) => results.map((result) => result[0]))
  // Select the size of each table, converting bytes to megabytes
  .then((tables) => tables.map((table) => (parseInt(table.metadata.numBytes, 10) / 1000) / 1000))
  // Sum up the sizes
  .then((sizes) => sizes.reduce((total, size) => total + size, 0))
  // Print the total size
  .then((sum) => {
    console.log(`Size of ${dataset.id}: ${sum} MB`);
  })
  // Report any error and exit with a non-zero status
  .catch((err) => {
    console.error('ERROR:', err);
    process.exit(1);
  });

You can also view the source code and its tests here.

You can check out Node.js and Google Cloud Platform to get an overview of Node.js and learn ways to run Node.js apps on Google Cloud Platform.
