Data Analytics

Analyzing GCP costs using folders and BigQuery Billing export

Pricing static hero

Effectively managing and reporting on costs is a critical part of financial stewardship, whether you’re running a multi-billion-dollar enterprise business or small household budget. But making data-driven decisions on how to cut costs or double down on a business venture all starts with collecting the data you’ll need to inform those decisions. 

At Google Cloud Platform (GCP), one of our goals is to help customers make data-driven decisions about their cloud costs and usage. You can do this through multiple native capabilities available in the Cloud Console, including Billing reports, budgets and alerts, configurable quotas, resource hierarchies, and Billing export

While costs per project, product, and SKU are available via our native Billing reports, we often hear that enterprises face a challenge when reporting comprehensive spend, KPIs, and cost of goods sold (COGS) per team or department, which often involves more comprehensive data. Fortunately, detailed GCP Billing data is easily exportable into BigQuery and visualized with a BI tool like Data Studio so you can do more detailed analysis. 

Right now, GCP folder hierarchies (denoted as project.ancestry_numbers in the Billing export) only reflect their numerical and immutable ID form rather than their folder name (i.e., a GCP project under folder /acme/production/app1 is reflected by something like /720879011932/792354814932/1068832318756).

To make it easier to allocate costs across departments, teams, cost centers, etc., we’ve put together this solution. Read on to see how to convert IDs into folder names so you can more easily track and analyze costs.

Generating folder hierarchy names

To convert your Billing export’s project.ancestry_numbers into something human-readable, you’ll first need a key that will map ancestry numbers like 720879011932 into the folder name “acme,” and a way to link parent/child folders.

Using our native tools to view your folder structure is simple for one-off requests via the Cloud Console or gcloud CLI, or by API, but it can be complex if your enterprise organization is built from dozens of departments, teams, and hierarchical structures. To make this process even simpler, consider using a GitHub tool like folder-lookup, which uses Cloud Functions to programmatically crawl your organization’s folder structure and create a folder-structure key directly in BigQuery

After running this tool, you’ll see a simple folder-structure key to join against your Billing export, like so:

bigquery_simple folder-structure key.png

Enhancing your BigQuery Billing export

The next step is to convert Folder IDs into Folder Names within your Billing export. In order to cut down costs (i.e., the total amount of data queried), we’ll run this query only once per day via a scheduled BigQuery query. This query joins the previous day’s Billing export to this folder-structure key and appends a running Billing export with a folder data table. If data freshness is a high priority for you, you can easily modify the scheduled query to pull data on a more regular cadence (you’ll also have to alter the line ”WHERE _PARTITIONDATE = DATE_SUB(current_date(), INTERVAL 1 DAY)”).

The query runs against the previous day’s data, so here we’ll run the scheduled query at 7 a.m. every day to get a full picture of the previous day’s spend. Note that the below query assumes a maximum folder depth of four, so it may have to be slightly augmented to meet your requirements.

  #Query to join your previous day's Billing Export with a folder-structure key table.
  IF(LENGTH(folder) > 0, CONCAT(base, '/', folder), base)

WITH CombinedData AS (
    DATE(usage_end_time) AS Date,
    invoice.month, as Project_Name,
    project.ancestry_numbers as Ancestry_Numbers,
    #Be mindful of the number of levels in your folder hierarchy. You may need to add more than 5.
    (SELECT name FROM `<Path_to_your_Folder_Structure_key>` WHERE id = SPLIT(project.ancestry_numbers, '/')[SAFE_OFFSET(1)]) AS level1,
    (SELECT name FROM `<Path_to_your_Folder_Structure_key>` WHERE id = SPLIT(project.ancestry_numbers, '/')[SAFE_OFFSET(2)]) AS level2,
    (SELECT name FROM `<Path_to_your_Folder_Structure_key>` WHERE id = SPLIT(project.ancestry_numbers, '/')[SAFE_OFFSET(3)]) AS level3,
    (SELECT name FROM `<Path_to_your_Folder_Structure_key>` WHERE id = SPLIT(project.ancestry_numbers, '/')[SAFE_OFFSET(4)]) AS level4,
    (SELECT name FROM `<Path_to_your_Folder_Structure_key> WHERE id = SPLIT(project.ancestry_numbers, '/')[SAFE_OFFSET(5)]) AS level5,
    service.description as service_description,
    sku.description as sku_description,
    ROUND(SUM(CAST(cost AS NUMERIC) + (SELECT IFNULL(SUM(CAST(c.amount AS NUMERIC)), 0) FROM UNNEST(credits) AS c)), 2) AS Total_Spend
    FROM `<Path_to_your_Billing_Export>` 
  GROUP BY ancestry_numbers, Date, month, Project_Name, service_description, sku_description
  ConcatFolder(ConcatFolder(ConcatFolder(ConcatFolder(level1, level2), level3), level4), level5) as Folder_Name_Hierarchy
FROM CombinedData
ORDER BY Date DESC, Total_Spend DESC

After running this query, you’ll see there’s now more detailed information on spend per department, and it’s easier to aggregate costs by environment or application.

aggregate costs by environment or application.png

And thanks to Data Studio, visualizing this data is a piece of cake too:

Data Studio.png

Monitoring your company’s cloud spend is crucial for quickly catching issues, prioritizing resources, and making better financial decisions. GCP provides tools to help you conduct a comprehensive analysis of your costs with Billing reports, Billing export, and Data Studio. And thanks to BigQuery’s powerful capabilities, crunching big data and using these insights to make data-driven decisions has never been easier. To see more options for BigQuery Billing export queries, check out our billing query example guide.