What are foundation models?

Foundation models, sometimes known as base models, are powerful artificial intelligence (AI) models that are trained on a massive amount of data and can be adapted to a wide range of tasks. The term "foundation model" was coined by the Stanford Institute for Human-Centered Artificial Intelligence (HAI) in 2021.

This technology offers new possibilities across industries, from streamlining software development to improving customer service interactions.

Foundation models defined

Foundation models are AI models pre-trained on large amounts of data so they can be adapted to a broad range of tasks. This training process, which often uses self-supervised learning, allows them to learn complex patterns and relationships within the data, helping them perform various tasks with improved accuracy. More importantly, this massive scale can lead to emergent capabilities, where the model can complete tasks it wasn’t explicitly trained to do. This shift from specialized tools to adaptable, general-purpose models is the hallmark of the foundation model paradigm.

What is the difference between a foundation model and an LLM?

The terms "foundation model" and "large language model" (LLM) are often used interchangeably, but there's a key distinction. LLMs are a major type of foundation model, but they aren't the only kind. Think of it as a parent-child relationship: all LLMs are foundation models, but not all foundation models are LLMs.

The key difference is the type of data they're built on. LLMs, as the name implies, are trained specifically on vast amounts of text and code. The broader category of 'foundation models' also includes models trained on other data types, such as images, audio, and video, or a combination of them (multimodal).

What is the difference between generative AI and foundation models?

Generative AI and foundation models are distinct but closely related. The most helpful way to understand the difference is to think of them as the 'engine' vs. the 'function':

  • A foundation model is the powerful, pre-trained engine; it's the underlying technology built on massive data, designed for adaptation
  • Generative AI is a primary function that this engine can perform—the ability to create new content like text, images, or code

While most popular foundation models are used for generative tasks, a foundation model could be adapted for non-generative purposes like complex classification or analysis. Therefore, not all foundation models are inherently generative, but they are the key technology powering the current wave of generative AI applications.
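
To make the distinction concrete, here is a minimal sketch, assuming the Hugging Face Transformers library and two illustrative public checkpoints (neither is named in this article): the same pre-trained-model machinery backs a generative text task and a purely non-generative classification task.

    from transformers import pipeline

    # Generative use: produce new text from a prompt.
    # "gpt2" is an illustrative small checkpoint, not a model discussed here.
    generator = pipeline("text-generation", model="gpt2")
    print(generator("Foundation models are", max_new_tokens=20)[0]["generated_text"])

    # Non-generative use: the same pre-training paradigm backs a text classifier.
    classifier = pipeline(
        "text-classification",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )
    print(classifier("This product exceeded my expectations."))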

What are the types of foundation models?

Foundation models encompass various architectures, each designed with unique strengths and applications. Here are a few notable types:

  • Large language models (LLMs): These models specialize in understanding and generating human language, excelling in tasks like translation, text summarization, and chatbot interactions.
  • Multimodal models: Trained on diverse data types, including text, images, and audio, these models can analyze and generate content across multiple modalities.
  • Generative adversarial networks (GANs): GANs are a type of foundation model involving two neural networks contesting with each other in a zero-sum game. One network, the generator, makes new data instances, while the other, the discriminator, assesses their authenticity. This adversarial process leads to the generation of increasingly realistic and complex content (a minimal training sketch follows this list).
  • Computer vision models: These models are trained on image datasets to perform tasks like image classification, object detection, and image generation. They can be fine-tuned for specific applications, such as medical image analysis or object recognition in autonomous vehicles.
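
As a rough illustration of the adversarial setup described above, the following sketch (PyTorch, with made-up toy data and tiny network sizes) trains a generator and a discriminator against each other; it is a toy, not a production GAN.

    import torch
    import torch.nn as nn

    latent_dim, data_dim = 16, 2

    # Two small fully connected networks stand in for the generator and discriminator.
    generator = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
    discriminator = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()

    real_batch = torch.randn(32, data_dim) + 3.0  # stand-in "real" data

    for step in range(200):
        # Discriminator step: label real samples 1 and generated samples 0.
        fake_batch = generator(torch.randn(32, latent_dim)).detach()
        d_loss = bce(discriminator(real_batch), torch.ones(32, 1)) + \
                 bce(discriminator(fake_batch), torch.zeros(32, 1))
        d_opt.zero_grad()
        d_loss.backward()
        d_opt.step()

        # Generator step: try to make the discriminator label fakes as real.
        fake_batch = generator(torch.randn(32, latent_dim))
        g_loss = bce(discriminator(fake_batch), torch.ones(32, 1))
        g_opt.zero_grad()
        g_loss.backward()
        g_opt.step()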

How do foundation models work?

Foundation models are trained on vast datasets using self-supervised learning, an approach in which the training signal comes from the data itself rather than from labels supplied by human annotators. A typical setup hides part of the input, such as a masked word or image patch, and trains the model to predict the missing piece. As the model makes these predictions, it learns to identify patterns, relationships, and underlying structures within the data.
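
A minimal sketch of the masked-prediction idea, assuming the Hugging Face Transformers library and the public bert-base-uncased checkpoint (both illustrative choices, not tools named in this article):

    from transformers import pipeline

    # Masked-token prediction is one common self-supervised objective: the model
    # is asked to fill in a token that was hidden from it.
    fill_mask = pipeline("fill-mask", model="bert-base-uncased")

    for candidate in fill_mask("Foundation models are trained on [MASK] amounts of data."):
        print(candidate["token_str"], round(candidate["score"], 3))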

The training process for a foundation model is similar to that of training a machine learning model, and typically involves several key steps:

Data collection and preparation

  • A large and diverse dataset is gathered that is representative of the real-world distribution of the data the model will encounter during deployment
  • The data is preprocessed to remove noise, outliers, and inconsistencies; this may include techniques such as data cleaning, normalization, and feature engineering
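
A minimal sketch of the kind of cleaning and deduplication this step involves, using plain Python on made-up example records:

    import re

    def clean_text(example: str) -> str:
        """Minimal text normalization: strip markup-like noise and normalize whitespace."""
        example = re.sub(r"<[^>]+>", " ", example)      # drop HTML-like tags
        example = re.sub(r"\s+", " ", example).strip()  # collapse whitespace
        return example.lower()

    raw_corpus = [
        "<p>Foundation   models are adaptable.</p>",
        "Foundation models are adaptable.",   # near-duplicate of the first record
        "",                                   # empty record to be dropped
    ]

    # Clean, drop empties, and deduplicate before the text reaches pre-training.
    cleaned = {clean_text(doc) for doc in raw_corpus if doc.strip()}
    print(sorted(cleaned))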

Model architecture selection

  • An appropriate model architecture is chosen based on several factors, including the complexity of the task, the type and volume of data, and the available computational resources
  • Common model architectures used for self-supervised learning include convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformers
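
For instance, a transformer encoder, the architecture behind many of today's foundation models, can be assembled in a few lines with PyTorch; the dimensions below are illustrative:

    import torch
    import torch.nn as nn

    # A stack of standard transformer encoder layers; sizes are placeholders.
    encoder_layer = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
    encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)

    tokens = torch.randn(2, 128, 256)   # (batch, sequence length, embedding dim)
    contextual_embeddings = encoder(tokens)
    print(contextual_embeddings.shape)  # torch.Size([2, 128, 256])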

Self-supervised training

  • The model is trained using self-supervised learning techniques, which involve creating pseudo-labels for the data and training the model to predict these labels
  • This can be done using various methods, such as contrastive learning, masked language modeling, and jigsaw puzzles
  • Self-supervised training allows the model to learn useful representations of the data without relying on manually annotated labels, which can be expensive and time-consuming to obtain
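
As one example, here is a minimal sketch of a contrastive objective (an InfoNCE-style loss) in PyTorch, where two augmented views of the same example act as the pseudo-labels; the shapes and values are illustrative:

    import torch
    import torch.nn.functional as F

    def info_nce_loss(anchor: torch.Tensor, positive: torch.Tensor, temperature: float = 0.1):
        """Contrastive objective: each anchor should match its own positive view
        and be pushed away from every other example in the batch."""
        anchor = F.normalize(anchor, dim=1)
        positive = F.normalize(positive, dim=1)
        logits = anchor @ positive.T / temperature   # pairwise similarities
        targets = torch.arange(anchor.size(0))       # the i-th positive matches the i-th anchor
        return F.cross_entropy(logits, targets)

    # Embeddings of two augmented "views" of the same 8 examples.
    view_a, view_b = torch.randn(8, 64), torch.randn(8, 64)
    print(info_nce_loss(view_a, view_b))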

Fine-tuning

  • After the model has been pre-trained using self-supervised learning, it can be fine-tuned on a more niche and task-specific collection of data
  • This involves tailoring the model's parameters to optimize performance on the target task
  • Fine-tuning helps the model adapt to the specific requirements of the task and improve its overall performance
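
A minimal fine-tuning sketch in PyTorch, with a stand-in "pre-trained" encoder and random placeholder data: the pre-trained weights are frozen and only a small task-specific head is trained.

    import torch
    import torch.nn as nn

    # Stand-in for an encoder whose weights were loaded from pre-training.
    pretrained_encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True), num_layers=4
    )
    classifier_head = nn.Linear(256, 3)  # e.g., a hypothetical 3-class task

    # Freeze the pre-trained parameters; only the new head is updated.
    for param in pretrained_encoder.parameters():
        param.requires_grad = False
    optimizer = torch.optim.AdamW(classifier_head.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    # One illustrative training step on random stand-in data.
    inputs, labels = torch.randn(16, 32, 256), torch.randint(0, 3, (16,))
    features = pretrained_encoder(inputs).mean(dim=1)   # pool token features
    loss = loss_fn(classifier_head(features), labels)
    loss.backward()
    optimizer.step()
    print(loss.item())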

Alignment and safety training

  • After pre-training and fine-tuning, most state-of-the-art models undergo an alignment phase to ensure their outputs are helpful, harmless, and aligned with human intent
  • This critical step often uses techniques like Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO), where human reviewers rate the model's responses to guide it toward more desirable behaviors
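
For illustration, here is a minimal sketch of the DPO objective in PyTorch; the log-probabilities are random placeholders standing in for the policy and frozen reference model scores of the chosen and rejected responses:

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logp, policy_rejected_logp,
                 ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
        """Direct Preference Optimization: push the policy to prefer the response a
        human reviewer chose over the one they rejected, relative to a frozen reference model."""
        chosen_margin = policy_chosen_logp - ref_chosen_logp
        rejected_margin = policy_rejected_logp - ref_rejected_logp
        return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

    # Illustrative sequence log-probabilities for a batch of 4 preference pairs.
    loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
    print(loss)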

Evaluation and deployment

  • Once the model has been trained and fine-tuned, it’s assessed on a held-out test set to evaluate its performance
  • If the model meets the desired performance criteria, it can be deployed into production, where it can be used to solve real-world problems
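
A minimal evaluation sketch in PyTorch, with a stand-in model and random held-out data used purely for illustration:

    import torch

    def evaluate_accuracy(model, dataloader):
        """Measure accuracy on a held-out test set the model never saw during training."""
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for inputs, labels in dataloader:
                predictions = model(inputs).argmax(dim=1)
                correct += (predictions == labels).sum().item()
                total += labels.size(0)
        return correct / total

    # Illustrative usage with a placeholder model and random held-out data.
    dummy_model = torch.nn.Linear(10, 3)
    test_data = torch.utils.data.TensorDataset(torch.randn(64, 10), torch.randint(0, 3, (64,)))
    loader = torch.utils.data.DataLoader(test_data, batch_size=16)
    print(f"held-out accuracy: {evaluate_accuracy(dummy_model, loader):.2f}")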

Benefits of using foundation models

Foundation models offer several potential advantages for businesses and developers:

Versatility

Foundation models can be adapted to a wide range of tasks, eliminating the need to train separate models for each specific application. This adaptability makes them valuable across various industries and use cases.

Efficiency

Using pre-trained foundation models can significantly reduce the time and resources required to develop new AI applications. Fine-tuning a pre-trained model is often faster and more efficient than training a model from scratch.

Accuracy

Due to their extensive training on vast datasets, foundation models can achieve high accuracy on various tasks, outperforming models trained on smaller datasets.

Cost-effectiveness

By reducing the need for extensive training data and computational resources, foundation models can offer a cost-effective solution for developing AI applications.

Innovation

Foundation models are helping drive innovation in the field of AI, enabling the development of new and more sophisticated AI applications.

Scalability

Foundation models can be scaled to handle large datasets and complex tasks, making them suitable for demanding applications.

What are the challenges and risks of foundation models?

Despite these benefits, foundation models present significant challenges that users and developers must navigate:

  • Bias and fairness: Foundation models can inherit and amplify societal biases present in their vast training data, leading to unfair or prejudiced outputs
  • Hallucinations: Models can generate confident-sounding but factually incorrect or nonsensical information, a phenomenon known as "hallucination"
  • High computational cost: Training these models requires enormous computational power and energy, raising environmental and financial concerns

Examples of foundation models

The foundation model ecosystem is vibrant and competitive. Here are some of the most influential examples from key industry players:

  • Google: Known for the Gemini family, a series of powerful multimodal models (Gemini 2.5 Pro is a leading example), and Gemma, a family of open-weight, lightweight models for developers; Google has also developed specialized models like Imagen for text-to-image generation and Veo for video generation
  • OpenAI: The developer of the highly influential GPT (Generative Pre-trained Transformer) series, including the widely used GPT-4
  • Anthropic: Focuses on AI safety and has developed the Claude family of models; the Claude 3 series (including Opus, Sonnet, and Haiku) is known for its large context windows and strong reasoning capabilities
  • Meta: A major proponent of open source AI, Meta developed the Llama series; Llama 3 is an open model that has accelerated innovation across the entire community
  • Mistral AI: A European company that has gained significant traction with high-performing open and commercial models, such as Mistral Large and the open source Mixtral models, which use a Mixture-of-Experts (MoE) architecture for greater efficiency

How does Google Cloud use foundation models?

Google Cloud provides an end-to-end enterprise platform, Vertex AI, designed to help organizations access, customize, and deploy foundation models for real-world applications. The strategy is built on providing choice, powerful tools, and integrated infrastructure.

Here’s how Google Cloud uses foundation models:

  • A diverse and open model ecosystem: Through the Vertex AI Model Garden, Google Cloud offers access to a comprehensive library of over 130 foundation models. This includes Google's own state-of-the-art models like the Gemini family (for multimodal tasks) and Gemma (for open, lightweight development), alongside popular third-party and open source models from partners like Anthropic (Claude), Meta (Llama), and Mistral. This allows developers to choose the best model for their specific cost and performance needs (a minimal code sketch follows this list).
  • Tools for customization and grounding: Vertex AI provides a full suite of tools to move beyond simple prompts. With Generative AI Studio, teams can test and tune models. A key feature is the ability to ground models in an organization's own enterprise data. This connects the model's reasoning capabilities with a company's specific data sources, significantly reducing hallucinations and making responses factually consistent and relevant.
  • Building AI agents and applications: Google Cloud is focused on helping developers build sophisticated AI applications, not just chatbots. With Vertex AI Agent Builder, organizations can create and deploy conversational AI agents for customer service, internal helpdesks, and other business processes.
  • Embedding generative AI into workflows: Foundation models are being integrated directly into the Google Cloud services businesses already use. For example, Gemini Code Assist acts as an AI-powered assistant for developers to write, explain, and test code faster, while features in BigQuery allow for AI-driven data analysis directly within the data warehouse.
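
As a minimal sketch of what this looks like in practice, the snippet below assumes the Vertex AI Python SDK is installed and uses placeholder project, region, and model names; check Model Garden and the current SDK documentation for the versions available to you.

    import vertexai
    from vertexai.generative_models import GenerativeModel

    # Hypothetical project and region; replace with your own Google Cloud settings.
    vertexai.init(project="my-project-id", location="us-central1")

    # The model name is a placeholder; Model Garden lists currently available versions.
    model = GenerativeModel("gemini-1.5-pro")
    response = model.generate_content("Summarize what a foundation model is in two sentences.")
    print(response.text)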

Take the next step

Start building on Google Cloud with $300 in free credits and 20+ always free products.
