To help you run your workloads, we have curated a set of reproducible benchmark recipes that use some of the most common machine learning (ML) frameworks and models. These recipes are stored in GitHub repositories. To access them, see the AI Hypercomputer GitHub organization.
Overview
Before you get started with these recipes, ensure that you have completed the following steps:
- Choose an accelerator that best suits your workload. See Choose a deployment strategy.
- Select a consumption method based on your chosen accelerator. See Consumption options.
- Create your cluster based on the type of accelerator selected. See Cluster deployment guides.
Recipes for pre-training
The following reproducible benchmark recipes are available for pre-training on GKE clusters.
To search the catalog, you can filter by any combination of framework, model, and accelerator.
Recipe | Framework | Workload type | Model | Orchestrator | Accelerators |
---|---|---|---|---|---|
Llama3.1 70B - A3 Ultra | MaxText | LLM training | Llama3.1 70B | GKE | A3 Ultra |
Llama3.1 70B - A3 Ultra | NeMo | LLM training | Llama3.1 70B | GKE | A3 Ultra |
Mixtral-8-7B - A3 Ultra | MaxText | LLM training | Mixtral-8-7B | GKE | A3 Ultra |
Mixtral-8-7B - A3 Ultra | NeMo | LLM training | Mixtral-8-7B | GKE | A3 Ultra |
GPT3-175B - A3 Mega | NeMo | LLM training | GPT3-175B | GKE | A3 Mega |
Mixtral 8x7B - A3 Mega | NeMo | Sparse MoE | Mixtral 8x7B | GKE | A3 Mega |
 | NeMo | LLM training | | GKE | A3 Mega |
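If you prefer to script this kind of filtering rather than use the page filters, you can mirror the table above as structured data. The following Python sketch is illustrative only and is not part of the recipe repositories; the `Recipe` class and `filter_recipes` helper are hypothetical names, and the entries are copied from the table above.

```python
# Minimal sketch (not part of the official recipes): mirror the catalog as
# structured data so it can be filtered by framework, model, and accelerator.
from dataclasses import dataclass

@dataclass(frozen=True)
class Recipe:
    name: str
    framework: str
    workload_type: str
    model: str
    orchestrator: str
    accelerator: str

# Entries copied from the pre-training catalog table above.
CATALOG = [
    Recipe("Llama3.1 70B - A3 Ultra", "MaxText", "LLM training", "Llama3.1 70B", "GKE", "A3 Ultra"),
    Recipe("Llama3.1 70B - A3 Ultra", "NeMo", "LLM training", "Llama3.1 70B", "GKE", "A3 Ultra"),
    Recipe("Mixtral-8-7B - A3 Ultra", "MaxText", "LLM training", "Mixtral-8-7B", "GKE", "A3 Ultra"),
    Recipe("Mixtral-8-7B - A3 Ultra", "NeMo", "LLM training", "Mixtral-8-7B", "GKE", "A3 Ultra"),
    Recipe("GPT3-175B - A3 Mega", "NeMo", "LLM training", "GPT3-175B", "GKE", "A3 Mega"),
    Recipe("Mixtral 8x7B - A3 Mega", "NeMo", "Sparse MoE", "Mixtral 8x7B", "GKE", "A3 Mega"),
]

def filter_recipes(framework=None, model=None, accelerator=None):
    """Return catalog entries that match every filter that is set."""
    return [
        r for r in CATALOG
        if (framework is None or r.framework == framework)
        and (model is None or r.model == model)
        and (accelerator is None or r.accelerator == accelerator)
    ]

# Example: list all NeMo recipes that run on A3 Ultra.
for recipe in filter_recipes(framework="NeMo", accelerator="A3 Ultra"):
    print(recipe.name)
```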