Benchmarking recipes

To help you run your workloads, we have curated a set of reproducible benchmark recipes that use some of the most common machine learning (ML) frameworks and models. The recipes are stored in GitHub repositories; to access them, see the AI Hypercomputer GitHub organization.

Overview

Before you get started with these recipes, ensure that you have completed the following steps:

  1. Choose an accelerator that best suits your workload. See Choose a deployment strategy.
  2. Select a consumption method based on your chosen accelerator. See Consumption options.
  3. Create your cluster based on the accelerator you selected. See Cluster deployment guides.
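As a rough sketch of step 3, a GKE cluster with GPU nodes can be created with the gcloud CLI. All names, zones, machine types, and accelerator types below are placeholder assumptions, not values taken from this guide; follow the cluster deployment guide for your chosen accelerator for the authoritative steps.

```shell
# Hypothetical sketch only: create a GKE cluster, then add a GPU node pool.
# Cluster name, zone, machine type, accelerator type, and node count are
# all example values -- substitute the ones from your deployment guide.
gcloud container clusters create my-benchmark-cluster \
    --zone=us-central1-a

# Example A3 Mega node pool (machine and accelerator types are assumptions).
gcloud container node-pools create gpu-pool \
    --cluster=my-benchmark-cluster \
    --zone=us-central1-a \
    --machine-type=a3-megagpu-8g \
    --accelerator=type=nvidia-h100-mega-80gb,count=8 \
    --num-nodes=2
```

This fragment requires an authenticated gcloud environment with a project configured, so it is meant to be adapted rather than run verbatim.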

Recipes for pre-training

The following reproducible benchmark recipes are available for pre-training on GKE clusters.

To search the catalog, filter by any combination of framework, model, and accelerator.

| Recipe | Framework | Workload type | Model | Orchestrator | Accelerator |
| --- | --- | --- | --- | --- | --- |
| Llama3.1 70B - A3 Ultra | MaxText | LLM training | Llama3.1 70B | GKE | A3 Ultra |
| Llama3.1 70B - A3 Ultra | NeMo | LLM training | Llama3.1 70B | GKE | A3 Ultra |
| Mixtral-8-7B - A3 Ultra | MaxText | LLM training | Mixtral-8-7B | GKE | A3 Ultra |
| Mixtral-8-7B - A3 Ultra | NeMo | LLM training | Mixtral-8-7B | GKE | A3 Ultra |
| GPT3-175B - A3 Mega | NeMo | LLM training | GPT3-175B | GKE | A3 Mega |
| Mixtral 8x7B - A3 Mega | NeMo | Sparse MoE | Mixtral 8x7B | GKE | A3 Mega |
| Llama3 70B, Llama3.1 70B - A3 Mega | NeMo | LLM training | Llama3 70B, Llama3.1 70B | GKE | A3 Mega |
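The catalog filtering described above can be sketched in a few lines. The recipe entries below are transcribed from the table; the `filter_recipes` helper and its parameter names are illustrative assumptions, not part of any published API.

```python
# Minimal sketch of filtering the recipe catalog by framework, model,
# and accelerator. RECIPES mirrors the table above; filter_recipes is
# a hypothetical helper, not an official tool.
RECIPES = [
    {"recipe": "Llama3.1 70B - A3 Ultra", "framework": "MaxText",
     "model": "Llama3.1 70B", "accelerator": "A3 Ultra"},
    {"recipe": "Llama3.1 70B - A3 Ultra", "framework": "NeMo",
     "model": "Llama3.1 70B", "accelerator": "A3 Ultra"},
    {"recipe": "Mixtral-8-7B - A3 Ultra", "framework": "MaxText",
     "model": "Mixtral-8-7B", "accelerator": "A3 Ultra"},
    {"recipe": "Mixtral-8-7B - A3 Ultra", "framework": "NeMo",
     "model": "Mixtral-8-7B", "accelerator": "A3 Ultra"},
    {"recipe": "GPT3-175B - A3 Mega", "framework": "NeMo",
     "model": "GPT3-175B", "accelerator": "A3 Mega"},
    {"recipe": "Mixtral 8x7B - A3 Mega", "framework": "NeMo",
     "model": "Mixtral 8x7B", "accelerator": "A3 Mega"},
]

def filter_recipes(recipes, framework=None, model=None, accelerator=None):
    """Return recipes matching every filter that is not None."""
    def matches(r):
        return ((framework is None or r["framework"] == framework)
                and (model is None or r["model"] == model)
                and (accelerator is None or r["accelerator"] == accelerator))
    return [r for r in recipes if matches(r)]

# Example: all MaxText recipes that target A3 Ultra.
for r in filter_recipes(RECIPES, framework="MaxText", accelerator="A3 Ultra"):
    print(r["recipe"])
```

Combining filters narrows the result set, so passing only `accelerator="A3 Mega"` returns every A3 Mega recipe regardless of framework.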