MaxDiffusion inference on v6e TPUs
This tutorial shows how to serve MaxDiffusion models on TPU v6e. You generate images using the Stable Diffusion XL model.
Before you begin
Prepare to provision a TPU v6e with 4 chips:
- Sign in to your Google Account. If you haven't already, sign up for a new account.
- In the Google Cloud console, select or create a Google Cloud project from the project selector page.
- Enable billing for your Google Cloud project. Billing is required for all Google Cloud usage.
- Install the gcloud alpha components.
Run the following command to install the latest version of gcloud components:
gcloud components update
Enable the TPU API with the following gcloud command in Cloud Shell. You can also enable it from the Google Cloud console.
gcloud services enable tpu.googleapis.com
Create a service identity for the TPU VM.
gcloud alpha compute tpus tpu-vm service-identity create --zone=ZONE
Create a TPU service account and grant access to Google Cloud services.
Service accounts allow the Google Cloud TPU service to access other Google Cloud services. A user-managed service account is recommended. Follow these guides to create and grant roles. The following roles are necessary:
- TPU Admin: Needed to create a TPU
- Storage Admin: Needed for accessing Cloud Storage
- Logs Writer: Needed for writing logs with the Logging API
- Monitoring Metric Writer: Needed for writing metrics to Cloud Monitoring
Authenticate with Google Cloud and configure the default project and zone for Google Cloud CLI.
gcloud auth login
gcloud config set project PROJECT_ID
gcloud config set compute/zone ZONE
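Later commands in this tutorial reuse the same project, zone, TPU, and queued resource identifiers. The following is a minimal sketch, assuming you prefer to keep them in shell variables. The values are placeholders, and `require_vars` is a hypothetical helper for illustration, not part of gcloud.

```shell
# Placeholder values (assumptions): replace each with your own before running.
PROJECT_ID="my-project"
ZONE="us-east5-b"              # example only; use a zone where you have v6e quota
TPU_NAME="my-v6e-tpu"
QUEUED_RESOURCE_ID="my-v6e-qr"

# Hypothetical helper: fail early if any named variable is empty or unset.
require_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      return 1
    fi
  done
}

require_vars PROJECT_ID ZONE TPU_NAME QUEUED_RESOURCE_ID
```

Checking the variables once up front avoids sending a half-filled command to gcloud later in the tutorial.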
Secure capacity
Contact your Cloud TPU sales or account team to request TPU quota and to ask any questions about capacity.
Provision the Cloud TPU environment
You can provision v6e TPUs with GKE, with GKE and XPK, or as queued resources.
Prerequisites
- Verify that your project has enough TPUS_PER_TPU_FAMILY quota, which specifies the maximum number of chips you can access within your Google Cloud project.
- This tutorial was tested with the following configuration:
  - Python 3.10 or later
  - Nightly software versions:
    - nightly JAX 0.4.32.dev20240912
    - nightly LibTPU 0.1.dev20240912+nightly
  - Stable software versions:
    - JAX + JAX Lib of v0.4.35
- Verify that your project has enough quota for the following:
  - TPU VMs
  - IP addresses
  - Hyperdisk Balanced
- User project permissions
  - If you are using GKE with XPK, see Cloud Console Permissions on the user or service account for the permissions needed to run XPK.
Provision a TPU v6e
gcloud alpha compute tpus queued-resources create QUEUED_RESOURCE_ID \
  --node-id TPU_NAME \
  --project PROJECT_ID \
  --zone ZONE \
  --accelerator-type v6e-4 \
  --runtime-version v2-alpha-tpuv6e \
  --service-account SERVICE_ACCOUNT
Use the list or describe commands to query the status of your queued resource.
gcloud alpha compute tpus queued-resources describe ${QUEUED_RESOURCE_ID} \
--project ${PROJECT_ID} --zone ${ZONE}
For a complete list of queued resource request statuses, see the Queued Resources documentation.
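A queued resource request can take a while to be fulfilled, so you may want to poll its status instead of re-running the describe command by hand. The following is a minimal sketch, not an official tool: `wait_until_active` is a hypothetical helper, and the example invocation assumes that the state is exposed as `state.state` in the describe output and that `ACTIVE`, `FAILED`, and `SUSPENDED` are among the reported states.

```shell
# Hypothetical helper: run a state-reporting command repeatedly until it
# prints ACTIVE, or stop early on a state that is not expected to recover.
# Usage: wait_until_active MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until_active() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    state="$("$@" 2>/dev/null)"
    echo "attempt $i: state=${state:-unknown}"
    case "$state" in
      ACTIVE) return 0 ;;
      FAILED|SUSPENDED) return 1 ;;
    esac
    i=$((i + 1))
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example invocation (assumption: the state field path is state.state):
# wait_until_active 60 30 gcloud alpha compute tpus queued-resources describe \
#   "${QUEUED_RESOURCE_ID}" --project "${PROJECT_ID}" --zone "${ZONE}" \
#   --format="value(state.state)"
```

Passing the status command as arguments keeps the loop independent of gcloud, so you can try it first with a stub command before pointing it at a real queued resource.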
Connect to the TPU using SSH
gcloud compute tpus tpu-vm ssh TPU_NAME
Create a Conda environment
Create a directory for Miniconda:
mkdir -p ~/miniconda3
Download the Miniconda installer script:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
Install Miniconda:
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
Remove the Miniconda installer script:
rm -rf ~/miniconda3/miniconda.sh
Add Miniconda to your PATH variable:
export PATH="$HOME/miniconda3/bin:$PATH"
Reload ~/.bashrc to apply the changes to the PATH variable:
source ~/.bashrc
Create a new Conda environment:
conda create -n tpu python=3.10
Activate the Conda environment:
source activate tpu
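Because this tutorial was tested with Python 3.10 or later, it can help to confirm which interpreter the activated environment provides. The following is a minimal sketch. `python_version_ok` is a hypothetical helper that only compares the major and minor version numbers.

```shell
# Hypothetical helper: succeed when a version string such as "3.10" or
# "3.12.1" is 3.10 or later.
python_version_ok() {
  major="${1%%.*}"      # text before the first dot
  rest="${1#*.}"        # text after the first dot
  minor="${rest%%.*}"   # second component only
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 10 ]; }
}

# Example, inside the activated environment:
# python_version_ok "$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')" \
#   && echo "Python version OK"
```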
Set up MaxDiffusion
Clone the MaxDiffusion repository and navigate to the MaxDiffusion directory:
git clone https://github.com/google/maxdiffusion.git && cd maxdiffusion
Switch to the mlperf4.1 branch:
git checkout mlperf4.1
Install MaxDiffusion:
pip install -e .
Install dependencies:
pip install -r requirements.txt
Install JAX:
pip install -U --pre jax[tpu] -f https://storage.googleapis.com/jax-releases/jax_nightly_releases.html -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
Install additional dependencies:
pip install huggingface_hub==0.25 absl-py flax tensorboardX google-cloud-storage torch tensorflow transformers
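Before generating images, you may want to confirm that the JAX installation can see the TPU chips. The following is a minimal sketch. `check_device_count` is a hypothetical helper, and the example invocation assumes that each of the 4 chips in a v6e-4 appears as one JAX device.

```shell
# Hypothetical helper: run a command that prints a device count and compare
# it with the expected number.
# Usage: check_device_count EXPECTED COMMAND [ARGS...]
check_device_count() {
  expected="$1"
  shift
  actual="$("$@")" || return 1
  echo "devices reported: $actual (expected $expected)"
  [ "$actual" = "$expected" ]
}

# Example on a v6e-4 (assumption: one JAX device per chip):
# check_device_count 4 python -c 'import jax; print(jax.device_count())'
```

If the reported count is lower than expected, or JAX falls back to CPU, revisit the JAX installation step before running the generation script.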
Generate images
Set environment variables to configure the TPU runtime:
export LIBTPU_INIT_ARGS="--xla_tpu_rwb_fusion=false --xla_tpu_dot_dot_fusion_duplicated=true --xla_tpu_scoped_vmem_limit_kib=65536"
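The TPU runtime reads these flags from the environment of the Python process, so the variable has to be exported and not only assigned in the current shell. The following is a minimal sketch that builds the same flag string one flag per line and then checks that a child process inherits it. The flags are the ones listed in this tutorial, and the `sh -c` check is only an illustration.

```shell
# Flags copied from this tutorial, one per line for readability.
LIBTPU_INIT_ARGS="--xla_tpu_rwb_fusion=false"
LIBTPU_INIT_ARGS="${LIBTPU_INIT_ARGS} --xla_tpu_dot_dot_fusion_duplicated=true"
LIBTPU_INIT_ARGS="${LIBTPU_INIT_ARGS} --xla_tpu_scoped_vmem_limit_kib=65536"

# export makes the variable visible to child processes such as python.
export LIBTPU_INIT_ARGS

# Confirm that a child process inherits the value.
sh -c 'echo "child sees: ${LIBTPU_INIT_ARGS}"'
```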
Generate images using the prompt and configurations defined in src/maxdiffusion/configs/base_xl.yml:
python -m src.maxdiffusion.generate_sdxl src/maxdiffusion/configs/base_xl.yml run_name="my_run"
When the images have been generated, be sure to clean up the TPU resources.
Clean up
Delete the TPU:
gcloud compute tpus queued-resources delete QUEUED_RESOURCE_ID \
  --project PROJECT_ID \
  --zone ZONE \
  --force \
  --async