Securing Rendering Workloads

This article explains how to take steps to help secure your rendering pipeline on Google Cloud Platform (GCP). You can take advantage of GCP security features such as projects and Google Cloud IAM for access control, automated policy checking, encryption, subnetworks, and firewall rules. The article also explains how to adhere to the security protocols demanded by major motion picture studios.

The following image shows a hybrid rendering architecture as a reference:

Diagram of a hybrid rendering architecture


Projects and access

Projects are a core organizational component of GCP. Projects provide an abstract grouping that you can use to associate resources with a particular department, application, or functional team. All GCP resources, such as Google Compute Engine instances and Google Cloud Storage buckets, belong to a project. You can manage projects by using the Google Cloud Platform Console or the Resource Manager API. The Google Cloud Identity and Access Management (IAM) API includes a set of methods to manage project permissions through the Resource Manager API.

Using projects to control access to resources

Projects provide an isolation boundary for both network data and project administration. However, you can explicitly create interconnections between the GCP resources used by your organization or by other projects within your organization. You can grant users and groups different roles, such as viewer, editor, and owner, for different projects. To assign roles, use the IAM & Admin page in the Cloud Platform Console or the Cloud IAM API.

Further, you can delegate control over who has access to a particular project. Users granted the owner role can grant and revoke access for users, groups, and service accounts.

The following image shows an example of a GCP resource hierarchy:

Diagram of a project structure

Granting access

If your organization has implemented G Suite, you can grant access to any Cloud Platform project to any user or group in your organization. If you manage your identities outside of G Suite, you can also establish user credentials based on your own LDAP server, including Microsoft Active Directory, using Google Cloud Directory Sync.

You can also give any user in an organization access to a project or resource by adding them to a Google Group that has access to that project or resource. Groups enable you to quickly give access to external parties, such as contractors and freelancers. Depending on your security policies, however, your organization might not want to allow this degree of flexibility. You can use the GCP APIs to build monitoring functionality that watches for off-policy assignments, and then raises an alert or revokes them automatically.
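As a simple sketch, a scheduled job could export the project's IAM policy and compare it against a known-good copy (the project ID and file names here are hypothetical):

  gcloud projects get-iam-policy mystudio-myproject-prod --format json > current-policy.json
  diff known-good-policy.json current-policy.json || echo "Off-policy IAM change detected"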

Access through SDK and API

If you access GCP through the Cloud SDK command-line tools, such as gcloud or gsutil, you must authenticate when you first connect to the Google Cloud APIs. You only have to do this once per local user environment.

If your applications or scripts access GCP through API client libraries, you must first authenticate through the SDK. The API client libraries then pick up the created credentials.
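For example, a typical first-time setup on a workstation might look like the following; the first command authorizes the command-line tools, and the second creates the Application Default Credentials that the client libraries pick up:

  gcloud auth login
  gcloud auth application-default login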

Identifying projects

Each project has a universally unique project ID, which is a short string of lowercase letters, digits, and dashes. When you create a project, you specify your own project name. The project ID is based on this name, with numbers appended to make it globally unique. You can override the assigned project ID, but the ID you choose must be globally unique.

Your project is also assigned a long, globally unique, random project number, which is automatically generated. Project IDs can be from 6 to 30 characters long, while project names can be from 4 to 30 characters long.

After project creation, the project ID and project number stay the same, even if you change the project name.

We recommend that you spend some time planning your project names for manageability. Properly named projects can sort correctly and reduce confusion.

A typical project-naming convention might use the following pattern:

[studio]-[project]-[role (rnd, dev, prod)]

A resulting project name might be, for example: mystudio-myproject-rnd.
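As a sketch, you could create such a project from the command line (the project ID here is hypothetical):

  gcloud projects create mystudio-myproject-rnd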

Automating security checks

Policy Scanner provides a framework for performing automated security checks on your project to help ensure policies are set correctly. Scanned projects that deviate from a known-good policy are tagged, and you are alerted to the issues. You can run Policy Scanner on demand, or set it to run on a daily or weekly schedule.

Controlling user access

Not everyone on a project should have unrestricted access to all running instances, services, and stored data. In a rendering pipeline, user permissions are typically handled on-premises by OS-level permission settings, coupled with a directory service such as LDAP or Active Directory.

Unlike in a typical on-premises rendering pipeline, most artists don't need project access at all, because most cloud-based tasks, such as asset synchronization, rendering, and writing or copying data, are performed by the queue management software operating under a service account.

By implementing the principle of least privilege both on-premises and in the cloud, you can restrict users’ access to only the areas of the project or the information necessary to perform specific tasks based on their role.

Unlike web-based workloads, rendering pipelines typically require direct access to running instances, for example, to troubleshoot a rendering issue on a particular instance type. You can sign in to an instance by using the gcloud compute ssh command, or by using SSH to connect directly to an instance if you have established SSH key pairs.
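For example, an administrator could connect to a render instance as follows (the instance and zone names here are hypothetical); the gcloud command creates and propagates an SSH key pair for you if one does not already exist:

  gcloud compute ssh render-node-042 --zone us-central1-a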

Limit direct access to system administrators and other roles responsible for managing or troubleshooting renders. Direct access is essential for:

  • Debugging or troubleshooting failing jobs.
  • Rendering queue manager control over launching and terminating instances.
  • Access by a logging mechanism or by software that tracks memory or CPU usage.
  • Executing tasks that run during the job itself.

It is important to differentiate between user permissions in Cloud IAM and permissions that are set at OS-level on a running instance. Although they work together to provide a complete user profile, the two systems serve different purposes and are created and modified with different mechanisms.

Cloud IAM does not manage SSH or user access to individual instances. Access is handled by OS-level user permissions, which can be synchronized with Cloud IAM. Granting Cloud IAM permissions to a user or service account does not change how users log into instances, or their permissions once logged in. The two systems are designed to be complementary: IAM is used to grant access to GCP resources such as the console or the API, and direct access is provisioned only for users that need it.

If you build a custom image, you can enable or disable SSH access by modifying the ssh and Google account daemons on the boot disk. For guidance, investigate the security features already incorporated into our public images.

Identity and Access Management (IAM)

To manage GCP resources, Cloud IAM enables you to create and manage permissions at the organization, project, and resource levels. Cloud IAM unifies access control for GCP services into a single system and presents a consistent set of operations.

IAM roles and groups

There are three primitive Cloud IAM roles: owner, editor, and viewer.

Owner: Only a restricted group of trusted people in a facility or on a project should have project owner-level privileges, such as members of the IT Department, systems administrators, or production managers. Project owners can change everything from billing accounts to access levels for any other user, so this role should be assigned with extreme care.

You can create a project with one or more owners. The Organization Administrator role can maintain these owners at the organization level. This admin role can create projects, modify project billing, and assign roles, all without giving complete owner-level access to any individual project.

Organization administrators receive most notifications for all projects in the organization, although, by design, some notifications go only to project owners.

Editor: A project editor can perform actions that modify state, such as reading and writing project data, launching and terminating instances, or reading and writing project metadata on all resources in the project.

Viewer: This read-only role might not be useful in all pipelines, but you might want to assign it to service accounts that monitor and log to external systems, for example, dailies review systems that read images or videos, or APIs that communicate with project management systems such as Shotgun.

Predefined roles: A number of predefined roles restrict users or service accounts to specific tasks. These roles can help ensure that, for example, an artist doesn't have access to the billing data of a production, or a production assistant is prevented from deleting a running instance.
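As a sketch, granting an artist read-only access to Compute Engine resources might look like the following (the user and project ID here are hypothetical):

  gcloud projects add-iam-policy-binding mystudio-myproject-rnd \
      --member user:artist@example.com --role roles/compute.viewer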

Using resource hierarchy for access control

You can set IAM policies at different levels of the resource hierarchy. Resources inherit the policies of their parent resource, which allows you to mirror your organization structure in your IAM policy hierarchy. We recommend following the established best practices for implementing resource hierarchy.

Disable unused roles

A number of roles are enabled by default and available for assignment by project creators and owners. To reduce confusion, you might want to disable roles that are not applicable to your particular workflow. You cannot do this on a per-project basis; it must be done in your organization's settings.

Controlling SSH access to instances

Ensuring the right people have access to resources requires:

  • Synchronization between your directory service and Cloud IAM using Google Cloud Directory Sync. This helps ensure you have an identical list of users and their permissions both on-premises and in the cloud.
  • User authentication mechanisms for SSH access to instances, for example, using the PAM Linux module coupled with LDAP.

For render workloads, we recommend restricting SSH access to a select few users, such as members of the IT Department, systems administrators, and render wranglers.

Jobs submitted to the cloud by a queue manager are typically owned by a dedicated render user or daemon. For example, by default, Qube! render jobs run as the user 'qubeproxy'. We recommend changing this configuration to what Qube calls 'user mode', which runs all processes under the user who launched the job. Completed renders retain this ownership.

You should set up your boot image as you would any on-premises render worker, with authentication performed using the same protocols as your on-premises render workers.

Service accounts

A service account is a special Google account that can be used to access Google services and resources programmatically.

For rendering pipelines, service accounts are useful for controlling how instances are deployed or terminated, and how jobs are allocated and run on instances. Your render queuing software launches instances on GCP by using service account credentials, which then allows jobs to run on the new instances.

When a new project is created, a number of default service accounts are created. We recommend that you keep only the service account named Compute Engine default service account, as well as any service accounts used by your queuing software to launch instances. Use caution when deleting service accounts, as doing so can remove access to some project resources.

You might choose to have separate service accounts for individual pipeline tasks to run program-driven events. These service accounts would be assigned a Cloud IAM role based on their scope of needs. For example:

  • Instance deployment by render queue manager: The main service account for running render jobs on GCP. This service account would be assigned the roles compute.instanceAdmin and iam.serviceAccountActor.
  • Asset manager: A service account for asset publishing, retrieval, or database management. If using Cloud Storage, this service account would be assigned the role storage.admin.
  • Logging agent: A service account used specifically by your project’s logging mechanism, such as Google Stackdriver. This service account would be assigned the role logging.logWriter.
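A minimal sketch of creating the first of these service accounts and granting it the roles listed above (the account and project names here are hypothetical; some role names have evolved over time, so check the current IAM documentation):

  gcloud iam service-accounts create render-queue \
      --display-name "Render queue manager"
  gcloud projects add-iam-policy-binding mystudio-myproject-prod \
      --member serviceAccount:render-queue@mystudio-myproject-prod.iam.gserviceaccount.com \
      --role roles/compute.instanceAdmin
  gcloud projects add-iam-policy-binding mystudio-myproject-prod \
      --member serviceAccount:render-queue@mystudio-myproject-prod.iam.gserviceaccount.com \
      --role roles/iam.serviceAccountActor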

Access scopes

Access scopes are the legacy method of specifying permissions for an instance. These permissions apply to any user on the instance. Access scopes grant default permissions from an instance to Google APIs. Resources such as Compute Engine and Cloud Storage are accessed through these APIs.

Instead of granting default permissions to all users from an instance, you can use Cloud IAM roles in concert with access scopes to grant permission to a single user or service account.

Specify the --no-scopes flag to prevent default scopes from being applied when you create an instance. If you specify neither --no-scopes nor the --scopes flag, your instance is created with a default set of scopes.
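For example (the instance and zone names here are hypothetical):

  gcloud compute instances create render-node-042 \
      --zone us-central1-a --no-scopes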

By default, instances start with a set of scopes, most of which are necessary to access Cloud IAM user accounts, read from Cloud Storage buckets and write logs using the Stackdriver API.

When you create a new instance, the following scopes are granted to the instance:

  • devstorage.read_only: Allows read-only access to Cloud Storage buckets. This scope prevents any user on the instance from writing to a Cloud Storage bucket.
  • logging.write: Permits the instance to write to the Compute Engine logs using the Stackdriver Logging API (v2).
  • monitoring.write: Allows the instance to publish metric data to your GCP projects using the Stackdriver Monitoring API (v3).
  • servicecontrol: Allows the instance to manage your Google Service Control data using the Service Control API.
  • service.management.readonly: Allows the instance read-only access to your Google Service Management configuration using the Service Management API.
  • trace.append: Allows the instance to collect and write latency data for a project or application using the Stackdriver Trace API.

The default set of scopes, for example, doesn't permit instances to write to Cloud Storage buckets. If your rendering pipeline requires instances to write finished renders to Cloud Storage, add the storage-rw scope before you start render-only worker instances. Note, however, that doing so permits users to copy any data off the instance, so don't add this scope to instances with access to sensitive data.
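For example, you might create a render worker that can write finished frames to Cloud Storage as follows (the instance and zone names here are hypothetical); storage-rw is the gcloud alias for the devstorage.read_write scope:

  gcloud compute instances create render-node-042 \
      --zone us-central1-a --scopes storage-rw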

Encryption Key Management

Cloud Storage

All project data, as well as all data on GCP, is encrypted at rest using AES128 or AES256 encryption. You can also choose to provide your own encryption keys for Cloud Storage and Compute Engine disks.

Cloud Storage always encrypts your data on the server side before it is written to disk. By default, Cloud Storage uses its own server-side keys to encrypt data. Data is encrypted at a fine granularity with a Data Encryption Key (DEK), which is itself encrypted by a Key Encryption Key (KEK). KEKs are managed in a central key management service, the same service that other Google products, such as Gmail, use.

To decrypt a data chunk, the storage service calls Google's internal Key Management Service to retrieve the unwrapped data encryption key (DEK) for that data chunk.

Note that you can also choose Google Cloud KMS to manage your Key Encryption Keys.

Though we often refer to just a single key, we really mean that data is protected using a key set: one key active for encryption and a set of historical keys for decryption, the number of which is determined by the key rotation schedule.

Alternatively, you can provide your own encryption keys for use in Cloud Storage and for Compute Engine persistent disks, but unless you already have an on-premises key management service, we strongly recommend you let Google manage and rotate your storage data’s keys, which Google rotates every 90 days.

Cloud KMS is a GCP service that allows you to keep encryption keys centrally in the cloud, for direct use by cloud services. Cloud KMS does not currently offer storage-layer integrations for encrypting your data stored in other GCP services.

We recommend you set up a separate, centralized project to run Cloud KMS for all your projects.

Service Accounts

When you create a service account, a public/private key pair is automatically generated that is specific to that account. The public key is maintained by Google, but the private key is managed by you. This private key is needed to authenticate the service account when it runs tasks on GCP.
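For example, you can generate and download a private key for a service account as follows (the file and account names here are hypothetical). Store the key file securely; anyone who holds it can act as the service account:

  gcloud iam service-accounts keys create ~/render-queue-key.json \
      --iam-account render-queue@mystudio-myproject-prod.iam.gserviceaccount.com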

Network security

All Compute Engine instances are created as members of a Cloud Virtual Network. By default, all networks are auto-subnet networks, in which regional subnetworks are automatically created for you. Each network is constrained to a single project; multiple projects cannot exist on the same network. Only users with specific roles, such as project owner, organization admin, or Compute Network Admin, can change network properties.

Networks and subnetworks

You can isolate resources on separate networks to add an extra level of security. For example, a sequence of shots with highly confidential content can be rendered within a separate network, isolated from the rest of the project's data. Individual projects can be an even more effective way of separating data.

When you create a new project, because of the auto-subnets feature, multiple subnetworks are created, one per Compute Engine region. When you start a new instance in a specific region, it is placed in that region’s subnetwork, and assigned an internal IP within that subnetwork.
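As a sketch, you could create a separate network for confidential work as follows (the network name here is hypothetical, and flag syntax can vary between SDK versions):

  gcloud compute networks create confidential-renders --subnet-mode auto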

Firewall rules

Each network has a firewall that blocks all traffic to instances. To allow incoming traffic, you must create "allow" firewall rules.

The network labeled default in each project is created with default firewall rules, as shown below. No manually created network of any type has preconfigured firewall rules; for all networks except the default network, you must create any firewall rules you need.

Not all of these default rules are necessary for a rendering pipeline:

default-allow-internal
Note: Permits communication between instances on the same network. If your queue manager is on-premises, your instances probably don't need to communicate with each other.
Recommendation: Delete this rule if your instances do not need to communicate with other instances.

default-allow-ssh
Note: Allows access through SSH over port 22.
Recommendation: Delete this rule and create a similar one that only allows SSH across a VPN or from a known IP address.

default-allow-rdp
Note: Only necessary if you want to access instances over Remote Desktop Protocol (RDP) via port 3389. Most of the time, SSH access is sufficient.
Recommendation: Delete this rule unless you are using machines running Windows.

default-allow-icmp
Note: Permits communication of error or operational information across the network. This rule allows access from any IP.
Recommendation: Delete this rule and create a similar one that only allows ICMP from known IP addresses.

Firewall rules, by default, apply to the entire network. If you want two subnetworks to exchange traffic, you must configure allow permissions in both directions. Firewall rules are only allow rules. You cannot create deny rules.

You might want to incorporate instance tags into your pipeline to permit access to specific instance types with a firewall rule. For example, you could tag all render instances to allow SSH access for troubleshooting by certain roles. Any instance absent this tag would automatically restrict SSH access, such as to your license server.

If neither sourceRanges nor sourceTags are specified when creating a firewall rule, the default sourceRange will be 0.0.0.0/0, so the rule will apply to all incoming traffic inside and outside the network.

If no port is specified when creating a TCP or UDP firewall rule, connections will be allowed from all ports.
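Following these recommendations, a locked-down replacement for the default SSH rule might specify an explicit port, a known source range, and a target tag so that the rule applies only to tagged render instances (the rule, network, and tag names, and the 203.0.113.0/24 range, are hypothetical):

  gcloud compute firewall-rules create allow-ssh-render \
      --network confidential-renders --allow tcp:22 \
      --source-ranges 203.0.113.0/24 --target-tags render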

Network routes

All networks have automatically created routes to the public Internet and to the IP ranges in the network. Outbound traffic is not blocked by default. Only instances with an external IP address and a default internet gateway route can send packets outside of the network.

Google Cloud APIs, which tools such as gcloud and gsutil call, can be accessed only through public IPs, so you must retain the Default internet gateway route under Networking > Routes.

Disabling external IP addresses

For security purposes, we recommend that your instances not have an external IP address. By default, an external IP address is assigned to all instances on launch. To prevent this, you can launch your instances with the --no-address flag.
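For example (the instance and zone names here are hypothetical):

  gcloud compute instances create render-node-042 \
      --zone us-central1-a --no-address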

In order for your render queue manager to control your instances without an external IP address, you must implement a Cloud VPN. The VPN Gateway is the only resource in your network with an external IP address, unless you add a Cloud Router, which uses Border Gateway Protocol (BGP) to broadcast private IP ranges between your on-premises network and your Cloud Platform networks.

Disk images

In a VFX pipeline, we recommend using a separate project at the organization level for image management. This approach prevents modification to facility-wide default image templates, which may be in active use by all projects. This approach also helps organize boot images into a central location, accessible by all other projects, given appropriate role assignment.

You can use IAM roles to share images across projects. See Sharing Images Across Projects for more information.

Public images

Compute Engine offers many preconfigured public images running supported Linux and Windows operating systems. Each OS image is configured to include certain packages, such as the Google Cloud SDK, or to have certain services enabled, such as SSH.

These images also include a collection of packages that set up and manage user accounts, as well as enable SSH key-based authentication.

Custom images

You can create your own custom disk image based on an existing image. We strongly recommend that your images comply with these security best practices.

We recommend you install the Linux Guest Environment for Google Compute Engine to access the functionality available by default in public images. Installing the guest environment allows you to perform tasks on your custom image with the same security controls as on public images, such as metadata access, system configuration, and OS optimization for use on GCP.

Connectivity

There are a number of ways to connect to Google from your facility. In all cases, you must implement a Virtual Private Network (VPN). Some methods require additional configuration, as described below, to help ensure the secure transmission of your data.

Some of the security methods applied to your data are:

  • Encrypting data links to Google using TLS with a 2048-bit certificate generated by Google’s Certificate Authority.
  • Encrypting data as it moves between our data centers on our private network.
  • Upgrading all RSA certificates to 2048-bit keys, making our encryption in transit for GCP and all other Google services even stronger.
  • Using Perfect Forward Secrecy (PFS), which helps minimize the impact of a compromised key, or a cryptographic breakthrough.

Connecting over the Internet

You can connect to Google’s network and take advantage of our end-to-end security model simply by accessing GCP services over the Internet. When traveling across VPN tunnels, your data is protected by authenticated and encrypted protocols.

Direct peering

Google hosts edge networking infrastructure at more than 100 point-of-presence facilities around the world to which GCP customers can connect directly. Any GCP customer who has a public ASN and a publicly routable IP prefix is welcome to peer with Google. This option utilizes the same interconnection model as the public Internet.

Cloud Interconnect

For customers who do not have public ASNs, or otherwise want to connect to Google using a service provider, the Google Cloud Interconnect service is an option. Cloud Interconnect is for customers who want enterprise-grade connectivity to Google's edge. Cloud Interconnect partner service providers help deliver enterprise-grade connectivity in one of two ways:

  • Over existing peering connections, which are jointly capacity-managed to ensure high performance and low latency.
  • Over dedicated interconnects that are intended to carry only GCP customer traffic (although Google announces routes for all services over these links).

Google Cloud VPN

On-premises rendering pipelines don’t always encrypt data in transit. For a hybrid cloud rendering pipeline, however, we recommend that all data in transit be encrypted.

Regardless of how you’re connected to Google, you must secure your connection with a VPN. Google Cloud VPN connects your peer VPN gateway to your GCP network through an IPsec VPN connection. Traffic traveling between the two networks is encrypted by one VPN gateway, then decrypted by the other VPN gateway. This helps protect your data as it travels over the Internet, and does not require data encryption to be implemented as part of your rendering pipeline.

If your facility has multiple locations or networks, you can keep your routes in sync across these locations and your Cloud VPN using a Cloud Router.

Customer-supplied VPN

You can set up your own VPN gateway within GCP, but Cloud VPN generally offers more flexibility and better integration with GCP.

File systems

There are a number of file server options available to manage your data. Depending on your pipeline methodology, you may need to implement more than one.

Object-based

Cloud Storage is unified object storage that is appropriate for all data generated or consumed throughout the rendering pipeline. Because it’s part of GCP, Cloud Storage can take advantage of the security features of GCP, such as access control, Cloud IAM, and encryption.

When you create a bucket in a regional storage class, the data within the bucket is accessible to project members globally, yet the data is stored within a specific data center. From a performance point of view, it’s best to keep storage and compute in the same region for better throughput and lower latency.

Data on Cloud Storage is available globally, so you can share data with another facility in another part of the world without requiring replication. This might incur additional egress charges. Because this data is globally accessible, it is essential that you manage your VM scopes, users and access keys appropriately to avoid data escaping your rendering pipeline.

You might need to adapt your asset management pipeline in order to interface with the object-based architecture of Cloud Storage.

POSIX-compliant

Live production data is often stored on a POSIX-compliant file server, which can be well-suited for rendering pipelines that require access to file metadata such as modification times or rely on file paths for scene assets.

Depending on your facility’s needs and workload, you have a few choices when implementing an NFS file system.

Single-node file server

A POSIX-compliant NFS server is available as a click-to-deploy solution. You can run multiple single-node file servers and mount them on your instances. This means you can isolate storage for each portion of your pipeline, restricting access at the operating-system user and group levels in the same manner as on-premises file systems.

You can also help secure data on single-node file servers by mounting them as read-only on your render instances. Software, pipeline tools, and asset libraries should never be modified from a render instance, so mounting them as read-only is an easy way to enforce this restriction.
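As a sketch, a render instance could mount a tools volume read-only with a standard NFS mount (the server and path names here are hypothetical):

  sudo mount -t nfs -o ro fileserver-1:/tools /mnt/tools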

To help secure your project further, you could also deploy one single-node file server per network, because instances can mount only file servers that are on the same network.

You can also create snapshots of your software or pipeline disk for quick rollback to previous versions with minimal impact on production.
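For example, you can snapshot a pipeline disk from the command line (the disk, zone, and snapshot names here are hypothetical):

  gcloud compute disks snapshot pipeline-tools \
      --zone us-central1-a --snapshot-names pipeline-tools-backup-001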

Other file systems

Other third-party file systems are available for use with GCP, such as clustered and caching file systems. Consult the individual vendor’s compliance documentation for security on third-party caching filesystems.

Storage security

By default, Cloud Storage manages server-side encryption keys on your behalf using the same hardened key management systems that we use for our own encrypted data, including strict key-access controls and auditing. Cloud Storage encrypts user content at rest, and each encryption key is itself encrypted with a regularly rotated set of master keys.

All storage classes support the same OAuth and granular access controls to secure your data.

We recommend using Cloud IAM to restrict who has access to data within Cloud Storage buckets or a project. You can also use Access Control Lists (ACLs) if you need to manage access to only a small number of objects.

Transfer options

The security of data in transit refers to the safety of your data as it passes back and forth from your on-premises storage to the cloud. There are numerous pipeline methodologies that help manage the movement of data between on-premises and cloud, the design and implementation of which is outside the scope of this document. All transfer methods outlined below (except for third-party transfer methods) run within Google’s full security suite for authentication and authorization.

Command line

For transferring data to or from Cloud Storage, we recommend using the gsutil command to copy, move or sync data that is less than 10 TB in size. The gsutil command uses the same security features and authentication as GCP, respects Cloud IAM roles, and performs all operations using transport-layer encryption (HTTPS). gsutil also supports parallel uploads.

To transfer to or from POSIX-compliant file systems such as single-node file servers and Persistent Disk, we recommend using scp or rsync across a VPN connection.
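For example, the following commands (bucket, path, and host names are hypothetical) sync a directory of finished renders to a Cloud Storage bucket using parallel transfers, and copy the same directory to an on-premises file server over SSH:

  gsutil -m rsync -r /renders gs://mystudio-renders
  rsync -av /renders/ wrangler@fileserver-1:/renders/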

UDP

If you choose to use a third party, UDP-based Data Transfer Protocol for uploading data directly to a Cloud Storage bucket, such as Aspera, Tervela Cloud FastPath, BitSpeed, or FDT, refer to the third party’s documentation to learn about their security model and best practices. These third-party services are not managed by Google.

Logging with Stackdriver

Google Stackdriver allows you to monitor and log a variety of activities within your project or organization. Stackdriver was originally written for web applications and services but can be used as a logging server for a rendering pipeline, providing a collection point for the massive amount of data generated by command-line renders.

Stackdriver APIs are not enabled by default on new GCP projects. There are currently four APIs: Debugging, Logging, Monitoring and Trace. Not all of them will apply to your workflow, but we recommend at least enabling Stackdriver Logging, so that Stackdriver can act as a logging server for external applications.

Audit logging helps you track administrative activity, such as console or API operations that modify the configuration or metadata of a service or project. You cannot modify or delete audit logs.

All logs are kept for a specified length of time and are then deleted. The Stackdriver Logging Quota Policy explains how long log entries are retained. To retain logs beyond their retention period, you can export them to a Cloud Storage bucket, a Google BigQuery dataset, a Google Cloud Pub/Sub topic, or any combination of the three.
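As a sketch, you can export logs to a Cloud Storage bucket by creating a sink (the sink and bucket names here are hypothetical):

  gcloud logging sinks create render-logs-archive \
      storage.googleapis.com/mystudio-render-logs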

Logs are collected from individual instances through the Logging Agent, which is not installed by default. See the Stackdriver Logging documentation for installation instructions.

Other considerations

This section discusses topics that are outside of the Google product line, but are typically part of a hybrid rendering pipeline.

Queue management

Many studios use a queue manager to control the deployment and tracking of tasks on an on-premises render farm. You can sometimes use the same queue manager to deploy jobs to GCP. The specific approach might differ depending on the software involved.

Some queue managers provide software plugins to allow both servers and clients to connect to GCP. Consult the third-party documentation to review their security practices.

Instructions sent to GCP by the queue manager should be issued using the gcloud command. If you need to send commands over SSH, you must generate an SSH key for communication. To avoid this, you might consider running your queue management server on GCP rather than on-premises.

Automating instance creation and termination

In some cases, you will want to automate the creation of instances when a job starts, as well as the termination of instances when the job finishes successfully. For cost and security reasons, avoid keeping instances running when they are not processing a job.

Custom software

Rendering pipelines commonly include both third-party and custom software. Custom software can include anything from simple scripts to complex, compiled binaries with multiple dependencies.

To manipulate GCP instances from within scripts or programs, use the available client libraries. Each version provides methods for OAuth 2.0 authorization.

Licensing

On-premises license server

Using your own on-premises license server can help provide a more secure environment if you’re running across a VPN. The level of security is still subject to the limitations of the licensing technology in use.

Cloud license server

If you run your own license server on GCP, we recommend running it on a separate network to allow for additional control and monitoring.

What’s next

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.
