Run a hybrid render farm proof of concept


This document shows how to run a proof of concept (PoC) to build a hybrid render farm on Google Cloud. This document is a companion to Build a hybrid render farm and is designed to facilitate testing and benchmarking rendering for animation, film, commercials, or video games on Google Cloud.

You can run a PoC for your hybrid render farm on Google Cloud if you narrow the scope of your tests to only the essential components. In contrast to architecting an entire end-to-end solution, consider the following purposes of a PoC:

  • Determine how to reproduce your on-premises rendering environment on the cloud.
  • Measure differences in rendering and networking performance between on-premises render workers and cloud instances.
  • Determine cost differences between on-premises and cloud workloads.

Of lesser importance are the following tasks that you can postpone or even eliminate from a PoC:

  • Determine how assets are synchronized (if at all) between your facility and the cloud.
  • Determine how to deploy jobs to cloud render workers by using queue management software.
  • Determine the best way to connect to Google Cloud.
  • Measure latency between your facility and Google data centers.

Connectivity

For a rendering PoC, you don't need enterprise-grade connectivity to Google. A connection over the public internet is sufficient. Connection speed, latency, and bandwidth are of secondary importance to rendering performance.

You can treat connectivity as a separate PoC: arranging Dedicated Interconnect or Partner Interconnect can take time, and that work can proceed concurrently with your rendering tests.

Objectives

  • Create a Compute Engine instance and customize it to serve as a render worker.
  • Create a custom image.
  • Deploy a render worker.
  • Copy assets to the render worker.
  • Perform render benchmarks.
  • Copy test renders from the render worker to your local workstation for evaluation.

Costs

When you estimate your projected usage, estimate the difference in cost between on-premises and cloud-based render workers.

In this document, you use the following billable components of Google Cloud:

  • Compute Engine
  • Cloud Storage

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Install the Google Cloud CLI.
  5. To initialize the gcloud CLI, run the following command:

    gcloud init
  6. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

    At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

In this document, you mostly use Cloud Shell to perform the steps, but copying data from an on-premises machine to Cloud Storage requires that you have the Google Cloud CLI running on that machine.

Setting up your environment

  • In Cloud Shell, set the Compute Engine zone:

    gcloud config set compute/zone [ZONE]
    

    Where [ZONE] is the zone where all of your resources are created.

Deploying an instance

For your PoC, you might want to recreate your on-premises render worker hardware. While Google Cloud offers a number of CPU platforms that might match your own hardware, the architecture of a cloud-based virtual machine is different from a bare-metal render blade in an on-premises render farm.

On Google Cloud, resources are virtualized and independent of other resources. Virtual machines (instances) are composed of the following major components:

  • Virtual CPUs (vCPUs)
  • Memory (RAM)
  • Disks

    • Boot disk and guest OS
    • Additional storage disks
  • NVIDIA Tesla GPUs (optional)

You can also control other aspects of resources, such as networking, firewall rules, and user access. But for the purposes of your PoC, you need only pay attention to the four components mentioned previously.

Create an instance

  1. In Cloud Shell, create your prototype render worker instance:

    gcloud compute instances create [INSTANCE_NAME] \
        --machine-type [MACHINE_TYPE] \
        --image-project [IMAGE_PROJECT] \
        --image-family [IMAGE_FAMILY] \
        --boot-disk-size [SIZE]
    

    Where:

    • [INSTANCE_NAME] is a name for your instance.
    • [MACHINE_TYPE] is either a predefined machine type or a custom machine type in the format custom-[NUMBER_OF_CPUS]-[NUMBER_OF_MB], where you define the number of vCPUs and the amount of memory (in MB) for the machine type.
    • [IMAGE_PROJECT] is the project that hosts the image family.
    • [IMAGE_FAMILY] is an optional flag that specifies the image family the boot-disk image belongs to; gcloud uses the latest nondeprecated image in that family.
    • [SIZE] is the size of the boot disk in GB.

    For example:

    gcloud compute instances create render-worker-proto \
        --machine-type custom-24-32768 \
        --image-project centos-cloud \
        --image-family centos-7 \
        --boot-disk-size 100
    

    The preceding command creates a CentOS 7 instance with 24 vCPUs, 32 GB RAM, and a standard 100 GB boot disk. The instance is created in the zone you set earlier as your default compute zone.
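The custom machine type string encodes the vCPU count and the memory size in MB. The following sketch shows how the example above maps GB of RAM to that format; the helper name is illustrative, and note that for N1 custom machine types gcloud also requires an even vCPU count and memory in multiples of 256 MB:

```shell
# Build a custom machine type string from a vCPU count and memory in GB.
# Illustrative helper: gcloud expects the memory portion in MB.
machine_type() {
  local vcpus=$1 mem_gb=$2
  echo "custom-${vcpus}-$(( mem_gb * 1024 ))"
}

machine_type 24 32   # prints custom-24-32768
```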

You can create a VM of almost any size: up to 96 vCPUs (if you need more, consider the ultramem machine types), up to 624 GB of RAM, and multiple NVIDIA Tesla GPUs. Be careful not to overprovision, though; the goal is a cost-effective, scalable, cloud-based render farm that suits jobs of any size.

Logging in to an instance

  1. In Cloud Shell, connect to your instance by using SSH:

    gcloud compute ssh [INSTANCE_NAME]
    
  2. Install and license software on your instance as you would with an on-premises render worker.

Building your default image

Unless you have custom software to test that requires things like a custom Linux kernel or older OS versions, we recommend you start with one of our public disk images and add the software you're going to use.

If you choose to import your own image, you need to configure this image by installing additional libraries to enable your guest OS to communicate with Google Cloud.

Set up your render worker

  1. In Cloud Shell, on the instance you created earlier, set up your render worker as you would your on-premises worker by installing your software and libraries.

  2. Stop the instance:

    gcloud compute instances stop [INSTANCE_NAME]
    

Create a custom image

  1. In Cloud Shell, determine the name of your VM's boot disk:

    gcloud compute instances describe [INSTANCE_NAME]
    

    The output contains the name of your instance's boot disk:

    mode: READ_WRITE
    source: https://www.googleapis.com/compute/v1/projects/[PROJECT]/zones/[ZONE]/disks/[DISK_NAME]
    

    Where:

    • [PROJECT] is the name of your Google Cloud project.
    • [ZONE] is the zone where the disk is located.
    • [DISK_NAME] is the name of the boot disk attached to your instance. The disk name is typically the same as (or similar to) your instance name.
  2. Create an image from your instance:

    gcloud compute images create [IMAGE_NAME] \
        --source-disk [DISK_NAME] \
        --source-disk-zone [ZONE]
    

    Where:

    • [IMAGE_NAME] is a name for the new image.
    • [DISK_NAME] is the disk from which you want to create the new image.
    • [ZONE] is the zone where the disk is located.
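If you script this step, the disk name can be pulled out of the describe output rather than read by eye. The sketch below extracts the final path segment of a disk source URL; the URL shown is illustrative, and it assumes the boot disk is the instance's first disk (for example, obtained with `gcloud compute instances describe [INSTANCE_NAME] --format='value(disks[0].source)'`):

```shell
# Extract a disk name from its full resource URL: the name is the
# final path segment after the last "/".
disk_name_from_url() {
  local url=$1
  echo "${url##*/}"
}

disk_name_from_url \
  "https://www.googleapis.com/compute/v1/projects/my-proj/zones/us-central1-a/disks/render-worker-proto"
# prints render-worker-proto
```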

Deploying a render worker

Now that you have a custom image ready with the OS, software, and libraries you need, you can deploy a render worker instance using your custom image, rather than using a public image.

  • In Cloud Shell, create a render worker instance. Add the scope devstorage.read_write so that you can write to Cloud Storage from this instance.

    gcloud compute instances create [WORKER_NAME] \
        --machine-type [MACHINE_TYPE] \
        --image [IMAGE_NAME] \
        --scopes https://www.googleapis.com/auth/devstorage.read_write \
        --boot-disk-size [SIZE]
    

    Where [WORKER_NAME] is a name for the render worker.

Licensing software

You can use your on-premises license server to provide licenses during a PoC, which avoids reissuing licenses for new cloud-based license servers. To securely connect to your on-premises license server from your cloud instance, create a firewall rule that allows traffic only over the necessary ports, and only from the IP address of your on-premises internet gateway or of the license server itself.

You might need to configure your facility's internet gateway to allow traffic from your Google Cloud instance to reach your on-premises license server.

Use your on-premises license server

You can allow traffic into your Virtual Private Cloud (VPC) network by creating a firewall rule.

  • In Cloud Shell, create the firewall rule:

    gcloud compute firewall-rules create [RULE_NAME] \
       --direction=INGRESS \
       --priority=1000 \
       --network=default \
       --action=ALLOW \
       --rules=[PROTOCOL]:[PORT] \
       --source-ranges=[IP_ADDRESS]
    

Where:

  • [RULE_NAME] is a name for the firewall rule.
  • [PROTOCOL] is the protocol for the traffic.
  • [PORT] is the port over which the traffic travels.
  • [IP_ADDRESS] is the IP address of your on-premises license server.
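As a concrete sketch, suppose your license manager is FlexLM-style and listens on TCP ports 27000-27010. The helper below assembles the matching --rules and --source-ranges values; the port range and the address 203.0.113.10 are illustrative assumptions, not values from this document:

```shell
# Assemble firewall-rule flag values for a license server.
# The /32 suffix restricts the rule to a single source address.
rule_flags() {
  local ports=$1 source_ip=$2
  printf '%s\n' "--rules=tcp:${ports} --source-ranges=${source_ip}/32"
}

rule_flags 27000-27010 203.0.113.10
# prints --rules=tcp:27000-27010 --source-ranges=203.0.113.10/32
```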

Use a cloud-based license server

A cloud-based license server doesn't require connectivity to your on-premises network, and runs on the same VPC network as your render worker. Because license serving is a relatively lightweight task, a small instance (2-4 vCPUs, 6-8 GB RAM) can handle the workload of serving licenses to a handful of render workers.

Depending on the type of software you need to license, you might need to re-key your licenses to a unique hardware ID number, such as the MAC address of the license server. Other license managers can validate software licenses from any internet-connected host. There are many license managers, so consult your product licensing documentation for instructions.

Allow communication between instances

Render workers and license server instances need to communicate with each other. The firewall rule default-allow-internal allows all instances in your project to communicate with each other. This firewall rule is created when you create a new project. If you're using a new project, you can skip this section. If you're using an existing project, you need to test if the firewall rule is still in your Google Cloud project.

  1. In Cloud Shell, check to see if the firewall rule is in your project:

    gcloud compute firewall-rules list \
        --filter="name=default-allow-internal"
    

    If the firewall rule is in your project, you see the following output:

    NAME                    NETWORK  DIRECTION  PRIORITY  ALLOW                         DENY  DISABLED
    default-allow-internal  default  INGRESS    65534     tcp:0-65535,udp:0-65535,icmp        False
    

    If the firewall rule isn't in your project, the output doesn't display anything.

  2. If you need to create the firewall rule, use the following command:

    gcloud compute firewall-rules create default-allow-internal \
        --direction=INGRESS \
        --priority=65534 \
        --network=default \
        --action=ALLOW \
        --rules=tcp:0-65535,udp:0-65535,icmp \
        --source-ranges=10.128.0.0/9
    

Storing assets

Render pipelines can differ vastly, even within a single company. To implement your PoC quickly and with minimal configuration, you can use the boot disk of your render worker instance to store assets. Your PoC shouldn't yet evaluate data synchronization or more advanced storage solutions. You can evaluate those options in a separate PoC.

There are a number of storage options available on Google Cloud, but we recommend testing a scalable shared storage solution in a separate PoC.

If you're testing multiple render worker configurations and need a shared file system, you can create a Filestore volume and mount it by using NFS to your render workers. Filestore is a managed file storage service that can be mounted to read/write across many instances, acting as a file server.
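If you do mount a Filestore share, it behaves like a standard NFS export. The following /etc/fstab entry is a sketch only; the server IP address (10.0.0.2), share name (vol1), and mount point are assumptions for illustration, and the real values come from your Filestore instance:

```
# Example /etc/fstab entry for a Filestore share (illustrative values).
10.0.0.2:/vol1  /mnt/assets  nfs  defaults,_netdev  0  0
```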

Getting data to Google Cloud

To run a render PoC, you need to get your scene files, caches, and assets to your render workers. For larger (>10 GB) datasets, you can use gsutil to copy your data to Cloud Storage and then onto your render workers. For smaller (<10 GB) datasets, you can use the gcloud CLI to copy data directly to a path on your render workers (Linux only).
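The 10 GB guideline above is easy to automate when you script transfers. The sketch below measures a directory with `du` and prints which route to take; the bracketed bucket and worker names in its output are placeholders, not real resources:

```shell
# Suggest a transfer route based on the ~10 GB guideline: large datasets
# go through Cloud Storage with gsutil, smaller ones go straight to the
# worker with gcloud compute scp.
pick_transfer() {
  local dir=$1
  local size_kb
  size_kb=$(du -sk "$dir" | cut -f1)
  if [ "$size_kb" -gt $(( 10 * 1024 * 1024 )) ]; then
    echo "gsutil -m cp -r $dir gs://[BUCKET_NAME_ASSETS]"
  else
    echo "gcloud compute scp --recurse $dir [WORKER_NAME]:[ASSET_DIR]"
  fi
}
```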

Create a destination directory on your render worker

  1. In Cloud Shell, connect to your render worker by using SSH:

    gcloud compute ssh [WORKER_NAME]
    

    Where [WORKER_NAME] is the name of your render worker.

  2. Create a destination directory for your data:

    mkdir [ASSET_DIR]
    

    Where [ASSET_DIR] is a local directory anywhere on your render worker.

Use gsutil to copy large amounts of data

If you're transferring large datasets to your render worker, use gsutil with Cloud Storage as an intermediate step. If you're transferring smaller datasets, you can skip to the next section and use the gcloud CLI to transfer smaller amounts of data.

  1. On your local workstation, create a Cloud Storage bucket:

    gsutil mb gs://[BUCKET_NAME_ASSETS]
    

    Where [BUCKET_NAME_ASSETS] represents the name of the Cloud Storage bucket for your files or directories that you want to copy.

  2. Copy data from your local directory to the bucket:

    gsutil -m cp -r [ASSETS] gs://[BUCKET_NAME_ASSETS]
    

    Where [ASSETS] is a list of files or directories to copy to your bucket.

  3. Connect to your render worker by using SSH:

    gcloud compute ssh [WORKER_NAME]
    
  4. Copy the contents of your bucket to your render worker:

    gsutil -m cp -r gs://[BUCKET_NAME_ASSETS]/* [ASSET_DIR]
    

Use the gcloud CLI to copy small amounts of data

If you're transferring smaller datasets, you can copy directly from your local workstation to a running Linux render worker by using the gcloud CLI.

  • On your local workstation, copy data between your local directory and your render worker:

    gcloud compute scp --recurse [ASSETS] [WORKER_NAME]:[ASSET_DIR]
    

    Where:

    • [ASSETS] is a list of files or directories to copy to your render worker.
    • [WORKER_NAME] is the name of your render worker.
    • [ASSET_DIR] is the destination path on your render worker.

Running test renders

After you've installed and licensed your render software and copied scene data, you're ready to run render tests. This process depends entirely on how your render pipeline runs render commands.

Benchmark tools

If you want to benchmark cloud resources against your on-premises hardware, you can use Perfkit Benchmarker to measure statistics for things such as network bandwidth and disk performance.

Some rendering software has its own benchmarking tools, such as those for V-Ray, OctaneRender, or Maxon's Cinebench, which you might want to run both on-premises and in the cloud to compare common render configurations.

Getting data from Google Cloud

After you've performed your render tests and want to see the results, you need to copy the resulting renders to your local workstation. Depending on the size of the dataset to transfer, you can either use gsutil or the gcloud CLI.

Create a destination directory on your local workstation

  • On your local workstation, create a directory for your renders:

    mkdir [RENDER_DIR]
    

    Where [RENDER_DIR] is a local path on your workstation.

Use gsutil to copy large amounts of data

If you're transferring large datasets, use gsutil. Otherwise, skip to the next section to use the gcloud CLI. To copy data from your render worker to a Cloud Storage bucket, create a separate Cloud Storage bucket to keep your renders separate from your asset data.

  1. On your local workstation, create a new Cloud Storage bucket:

    gsutil mb gs://[BUCKET_NAME_RENDERS]
    

    Where [BUCKET_NAME_RENDERS] represents the name of your Cloud Storage bucket for your rendered data.

  2. Connect to your render worker by using SSH:

    gcloud compute ssh [WORKER_NAME]
    
  3. Copy your rendered data to your bucket:

    gsutil -m cp -r [RENDERS] gs://[BUCKET_NAME_RENDERS]
    

    Where [RENDERS] is a list of files or directories to copy to your bucket.
  4. On your local workstation, copy the files from the Cloud Storage bucket to a local directory:

    gsutil -m cp -r gs://[BUCKET_NAME_RENDERS]/* [RENDER_DIR]
    

Use the gcloud CLI to copy small amounts of data

If you're copying smaller datasets, you can copy directly from your render worker to your local workstation.

  • On your local workstation, copy renders into your destination directory:

    gcloud compute scp --recurse [WORKER_NAME]:[RENDERS] [RENDER_DIR]
    

    Where [RENDERS] is a list of files or directories to copy to your local workstation.

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Delete individual resources

  1. Delete the instances:

    gcloud compute instances delete [INSTANCE_NAME] [WORKER_NAME]
    
  2. Delete the buckets:

    gcloud storage buckets delete gs://[BUCKET_NAME_ASSETS] gs://[BUCKET_NAME_RENDERS]

What's next