
How to calculate your AI costs on Google Cloud

March 3, 2025
Pathik Sharma

Cost Optimization Lead, Google Cloud Consulting

Eric Lam

Head of Cloud FinOps, Google Cloud Consulting

What is the true cost of enterprise AI?

As a technology leader and a steward of company resources, understanding these costs isn't just prudent – it's essential for sustainable AI adoption. To help, we’ll unveil a comprehensive approach to understanding and managing your AI costs on Google Cloud, ensuring your organization captures maximum value from its AI investments.

Whether you're just beginning your AI journey or scaling existing solutions, this approach will equip you with the insights needed to make informed decisions about your AI strategy.

Why understanding AI costs matters now

Google Cloud offers a vast and ever-expanding array of AI services, each with its own pricing structure. Without a clear understanding of these costs, you risk budget overruns, stalled projects, and ultimately, a failure to realize the full potential of your AI investments. This isn't just about saving money; it's about responsible AI development – building solutions that are both innovative and financially sustainable.

Breaking down the Total Cost of Ownership (TCO) for AI on Google Cloud

Let's dissect the major cost components of running AI workloads on Google Cloud:

  • Model serving cost: The cost of running your trained AI model to make predictions (inference). This is often a per-request or per-unit-of-time cost. Example services: OOTB models available in Vertex AI, Vertex AI Prediction, GKE (if self-managing), Cloud Run functions (for serverless inference).

  • Training and tuning costs: The expense of training your AI model on your data and fine-tuning it for optimal performance. This includes compute resources (GPUs/TPUs) and potentially the cost of the training data itself. Example services: Vertex AI Training, Compute Engine (with GPUs/TPUs), GKE or Cloud Run (with GPUs/TPUs).

  • Cloud hosting costs: The fundamental infrastructure costs for running your AI application, including compute, networking, and storage. Example services: Compute Engine, GKE or Cloud Run, Cloud Storage, Cloud SQL (if your application uses a database).

  • Training data storage and adapter layer costs: The cost of storing your training data and any "adapter layers" (intermediate representations or fine-tuned model components) created during the training process. Example services: Cloud Storage, BigQuery.

  • Application layer and setup costs: The expenses associated with any additional cloud services needed to support your AI application, such as API gateways, load balancers, monitoring tools, etc. Example services: Cloud Load Balancing, Cloud Monitoring, Cloud Logging, API Gateway, Cloud Functions (for supporting logic).

  • Operational support cost: The ongoing costs of maintaining and supporting your AI model, including monitoring performance, troubleshooting issues, and potentially retraining the model over time. Example services: Google Cloud Support, internal staff time, potential third-party monitoring tools.

Let’s estimate costs with an example

Let's illustrate this with a hypothetical, yet realistic, generative AI use case: imagine you're a retailer deploying an automated customer support chatbot.

Scenario: A medium-sized e-commerce company wants to deploy a chatbot on their website to handle common customer inquiries (order status, returns, product information and more). They plan to use a pre-trained language model (like one available through Vertex AI Model Garden) and fine-tune it on their own customer support data.

Assumptions:

  • Model: a fine-tuned, low-latency language model (in this case, Gemini 1.5 Flash).

  • Training data: 1 million customer support conversations (text data).

  • Traffic: 100K chatbot interactions per day.

  • Hosting: Vertex AI Prediction for serving the model.

  • Fine-tuning frequency: Monthly.

Cost estimation

Here's how you, as the retailer in this example, might approach the estimate.

1. First, discover your model serving cost:

  • Gemini 1.5 Flash pricing on Vertex AI is modality-based; since our input and output are both text, the usage unit is characters. Let's assume an average of 1,000 input characters and 500 output characters per interaction (the short sketch after the figure below reproduces this math).
  • Cost per 1M input characters: $0.0375
  • Cost per 1M output characters: $0.15
  • Input cost per day: 100,000 interactions * 1,000 characters * $0.0375 / 1,000,000 = $3.75
  • Output cost per day: 100,000 interactions * 500 characters * $0.15 / 1,000,000 = $7.50
  • Total model serving cost per day: $11.25
  • Total model serving cost per month (~30 days): ~$337
[Figure: Serving cost of the Gemini 1.5 Flash model]
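
For a quick sanity check, here is a minimal Python sketch of the serving-cost arithmetic above; the per-character prices are the assumptions used in this example, so substitute current Vertex AI rates for your own estimate.

# Serving-cost sketch for the chatbot example.
# The unit prices below are this article's assumptions; check the
# Vertex AI pricing page for current rates.
INTERACTIONS_PER_DAY = 100_000
AVG_INPUT_CHARS = 1_000              # average input characters per interaction
AVG_OUTPUT_CHARS = 500               # average output characters per interaction
INPUT_PRICE_PER_1M_CHARS = 0.0375    # USD (assumed)
OUTPUT_PRICE_PER_1M_CHARS = 0.15     # USD (assumed)

input_cost_per_day = INTERACTIONS_PER_DAY * AVG_INPUT_CHARS * INPUT_PRICE_PER_1M_CHARS / 1_000_000
output_cost_per_day = INTERACTIONS_PER_DAY * AVG_OUTPUT_CHARS * OUTPUT_PRICE_PER_1M_CHARS / 1_000_000
daily_total = input_cost_per_day + output_cost_per_day   # $3.75 + $7.50 = $11.25
monthly_total = daily_total * 30                          # ~$337.50

print(f"Serving cost: ${daily_total:.2f}/day, ~${monthly_total:.2f}/month")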

2. Second, identify your training and tuning costs:

In this scenario, we want to improve the model's accuracy and relevance for our specific use case through fine-tuning: feeding in one million past chat conversations so the model can deliver more precise, customized responses. The short sketch after the list below reproduces the arithmetic.

  • Cost per 1M training tokens: $8
  • Cost per 1M training characters: $2 (one token is roughly 4 characters)
  • Tuning cost (first month): 1,000,000 conversations (training data) * 1,500 characters (input + output) * $2 / 1,000,000 = $3,000
  • Tuning cost (subsequent months): 100,000 conversations (new training data) * 1,500 characters (input + output) * $2 / 1,000,000 = $300
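
The same tuning math as a small sketch, treating the $2 per 1M training characters rate as an assumption from this example:

# Fine-tuning cost sketch. The rate is the assumption used above
# ($2 per 1M training characters, roughly 4 characters per token).
TUNING_PRICE_PER_1M_CHARS = 2.0       # USD (assumed)
AVG_CHARS_PER_CONVERSATION = 1_500    # input + output characters

def tuning_cost(num_conversations: int) -> float:
    """Estimated cost of one fine-tuning run over the given conversations."""
    return num_conversations * AVG_CHARS_PER_CONVERSATION * TUNING_PRICE_PER_1M_CHARS / 1_000_000

print(tuning_cost(1_000_000))   # first month, full history: 3000.0
print(tuning_cost(100_000))     # subsequent months, new data only: 300.0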

3. Third, understand the cloud hosting costs:

Since we're using Vertex AI Prediction, the underlying infrastructure is managed by Google Cloud and its cost is included in the per-request pricing. If we were self-managing the model on GKE or Compute Engine, we'd also need to factor in VM costs, GPU/TPU costs (if applicable), and networking costs. For this example, we assume $0, as hosting is covered by the Vertex AI cost.

4. Fourth, define the training data storage and adapter layers costs:

The infrastructure costs for deploying machine learning models often raise concerns, but the data storage components can be economical at moderate scales. When implementing a conversational AI system, storing both the training data and the specialized model adapters represents a minor fraction of the overall costs. Let's break down these storage requirements and their associated expenses.

  • 1M conversations, at an average size of ~5 KB per conversation, is roughly 5 GB of data.
  • Cloud Storage cost for 5 GB is negligible: about $0.10 per month.
  • Adapter layers (fine-tuned model weights) might add another 1 GB of storage, which is still very inexpensive: about $0.02 per month.
  • Total storage cost: well under $1 per month (see the sketch below)
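
A minimal sketch of the storage math, assuming roughly $0.02 per GB per month (in the ballpark of standard-class Cloud Storage; check current pricing for your region and storage class):

# Storage-cost sketch. The per-GB rate is an assumption; look up the
# Cloud Storage price for your storage class and region.
PRICE_PER_GB_MONTH = 0.02                        # USD (assumed)

training_data_gb = 1_000_000 * 5 / 1_000_000     # 1M conversations at ~5 KB each ≈ 5 GB
adapter_layers_gb = 1.0                          # fine-tuned adapter weights

monthly_storage_cost = (training_data_gb + adapter_layers_gb) * PRICE_PER_GB_MONTH
print(f"Storage: ~${monthly_storage_cost:.2f}/month")   # ≈ $0.12/month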

5. Fifth, consider the application layer and setup costs: 

This depends heavily on the specific application. In this case we use Cloud Run functions plus Cloud Logging and Monitoring (a generic request-based billing sketch follows the total below):

  • Cloud Run functions handle pre- and post-processing of chatbot requests (e.g., formatting, database lookups). With request-based billing, we are charged only while a request is being processed. At 3M requests per month (100K per day * 30 days) and an average execution time of 1 second, the estimate is about $14.30.

[Figure: Cloud Run function cost with request-based billing]

  • Cloud Logging and Monitoring track chatbot performance and help debug issues. Let's estimate 100 GB of logging volume per month (on the higher end), with logs retained for three months: about $28.
[Figure: Cloud Logging costs for storage and retention]

Total application layer cost per month: ~$40
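
The Cloud Run and Cloud Logging figures above come from the pricing calculator. As a rough, generic sketch of how request-based billing adds up, the helper below deliberately leaves every unit price as a parameter; plug in current values from the pricing calculator rather than relying on hard-coded rates.

# Generic request-based billing sketch for a Cloud Run function.
# All unit prices are parameters on purpose: take them from the
# Google Cloud pricing calculator instead of trusting numbers
# hard-coded here. Free-tier credit is modeled as a simple deduction.
def cloud_run_monthly_estimate(requests_per_month: int,
                               avg_exec_seconds: float,
                               vcpu: float,
                               memory_gib: float,
                               price_per_1m_requests: float,
                               price_per_vcpu_second: float,
                               price_per_gib_second: float,
                               free_tier_credit_usd: float = 0.0) -> float:
    """Per-request fee plus CPU and memory time billed only while handling requests."""
    request_fee = requests_per_month / 1_000_000 * price_per_1m_requests
    cpu_fee = requests_per_month * avg_exec_seconds * vcpu * price_per_vcpu_second
    memory_fee = requests_per_month * avg_exec_seconds * memory_gib * price_per_gib_second
    return max(request_fee + cpu_fee + memory_fee - free_tier_credit_usd, 0.0)

# Shape of a call for this article's workload (3M requests, ~1 s average
# execution); the unit prices themselves are left for you to supply:
# cloud_run_monthly_estimate(3_000_000, 1.0, vcpu=1.0, memory_gib=0.5, ...)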

6. Finally, incorporate the Operational support cost:

This is the hardest to estimate, as it depends on the internal team's size and responsibilities. Let's assume a conservative estimate of 5 hours per week of an engineer's time dedicated to monitoring and maintaining the chatbot, at an hourly rate of $100.

  • Total operational support cost per month: 5 hours/week * 4 weeks/month * $100/hour = $2,000

Total estimated monthly cost (first month): $340 (serving) + $3,000 (training) + $1 (storage) + $40 (application) + $2,000 (operational) = $5,381

Total estimated monthly cost (subsequent months): $340 (serving) + $300 (training) + $1 (storage) + $40 (application) + $2,000 (operational) = $2,681
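
Tallying the rounded component estimates reproduces those totals:

# Monthly roll-up of the rounded component estimates from this example.
first_month = {"serving": 340, "training": 3_000, "storage": 1,
               "application": 40, "operational": 2_000}
subsequent_months = {**first_month, "training": 300}   # only new data is tuned

print(sum(first_month.values()))        # 5381 for the first month
print(sum(subsequent_months.values()))  # 2681 for subsequent months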

You can find the full cost estimate here. Note that it does not include the tuning and operational costs, as those are not yet available in the pricing export.

Once you have a good understanding of your AI costs, it is important to develop an optimization strategy that encompasses infrastructure choices, resource utilization, and monitoring practices to maintain performance while controlling expenses. By understanding the various cost components and leveraging Google Cloud's tools and resources, you can confidently embark on your AI journey. Cost management isn't a barrier; it's an enabler. It allows you to experiment, innovate, and build transformative AI solutions in a financially responsible way. 

