Using Cloud Storage FUSE with Cloud Run tutorial

This tutorial shows Operators how to mount Cloud Storage as a network file system onto a Cloud Run service. The tutorial uses the open source FUSE adapter to share data between multiple containers and services. This tutorial uses the Cloud Run second generation execution environment.

The second generation execution environment allows network file systems to be mounted into a directory in the container. Mounting a file system lets a host system and container instances share resources, and lets those resources persist after a container instance is garbage collected.

Using a network file system with Cloud Run requires advanced Docker knowledge because your container must run multiple processes, including the file system mount and application process. This tutorial explains the necessary concepts alongside a working example; however, as you adapt this tutorial to your own application, make sure you understand the implications of any changes you might make.

Design overview

[Diagram: filesystem architecture]

The diagram shows the Cloud Run service connecting to the Cloud Storage bucket via the gcsfuse FUSE adapter. The Cloud Run service and the Cloud Storage bucket are located within the same region to avoid networking costs and for best performance.

Limitations

  • This tutorial does not describe how to choose a file system nor does it cover production-readiness. Review the Key differences from a POSIX file system and other semantics of Cloud Storage FUSE.

  • This tutorial does not show how to work with a file system or discuss file access patterns.

Objectives

  • Create a Cloud Storage bucket to serve as a file share.

  • Build a Dockerfile with system packages and init-process to manage the mount and application processes.

  • Deploy to Cloud Run and verify access to the file system in the service.

Costs

This tutorial uses billable components of Google Cloud, including Cloud Run, Cloud Storage, Artifact Registry, and Cloud Build.

Permissions

The permissions needed for this tutorial can be fulfilled by the Owner or Editor roles, or by a minimal set of more granular roles.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.

  4. Enable the Cloud Run, Cloud Storage, Artifact Registry, and Cloud Build APIs.

    Enable the APIs

  5. Install and initialize the Cloud SDK.

Setting up gcloud defaults

To configure gcloud with defaults for your Cloud Run service:

  1. Set your default project:

    gcloud config set project PROJECT_ID

    Replace PROJECT_ID with the ID of the project you created for this tutorial.

  2. Configure gcloud for your chosen region:

    gcloud config set run/region REGION

    Replace REGION with the supported Cloud Run region of your choice.

Cloud Run locations

Cloud Run is regional, which means the infrastructure that runs your Cloud Run services is located in a specific region and is managed by Google to be redundantly available across all the zones within that region.

Your latency, availability, and durability requirements are the primary factors for selecting the region where your Cloud Run services run. You can generally select the region nearest to your users, but you should also consider the location of the other Google Cloud products used by your Cloud Run service. Using Google Cloud products together across multiple locations can affect your service's latency as well as its cost.

Cloud Run is available in the following regions:

Subject to Tier 1 pricing

  • asia-east1 (Taiwan)
  • asia-northeast1 (Tokyo)
  • asia-northeast2 (Osaka)
  • europe-north1 (Finland) Low CO2
  • europe-west1 (Belgium) Low CO2
  • europe-west4 (Netherlands)
  • us-central1 (Iowa) Low CO2
  • us-east1 (South Carolina)
  • us-east4 (Northern Virginia)
  • us-west1 (Oregon) Low CO2

Subject to Tier 2 pricing

  • asia-east2 (Hong Kong)
  • asia-northeast3 (Seoul, South Korea)
  • asia-southeast1 (Singapore)
  • asia-southeast2 (Jakarta)
  • asia-south1 (Mumbai, India)
  • asia-south2 (Delhi, India)
  • australia-southeast1 (Sydney)
  • australia-southeast2 (Melbourne)
  • europe-central2 (Warsaw, Poland)
  • europe-west2 (London, UK)
  • europe-west3 (Frankfurt, Germany)
  • europe-west6 (Zurich, Switzerland) Low CO2
  • northamerica-northeast1 (Montreal) Low CO2
  • northamerica-northeast2 (Toronto) Low CO2
  • southamerica-east1 (Sao Paulo, Brazil) Low CO2
  • southamerica-west1 (Santiago, Chile)
  • us-west2 (Los Angeles)
  • us-west3 (Salt Lake City)
  • us-west4 (Las Vegas)

If you already created a Cloud Run service, you can view the region in the Cloud Run dashboard in the Cloud Console.

Retrieving the code sample

To retrieve the code sample for use:

  1. Clone the sample app repository to your local machine:

    Python

    git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git

    Alternatively, you can download the sample as a zip file and extract it.

    Java

    git clone https://github.com/GoogleCloudPlatform/java-docs-samples.git

    Alternatively, you can download the sample as a zip file and extract it.

  2. Change to the directory that contains the Cloud Run sample code:

    Python

    cd python-docs-samples/run/filesystem/

    Java

    cd java-docs-samples/run/filesystem/

Understanding the code

Normally, you should run a single process or application within a container. Running a single process per container reduces the complexity of managing the life cycle of multiple processes: managing restarts, terminating the container if any process fails, and PID 1 responsibilities such as signal forwarding and reaping zombie child processes. However, using network file systems in Cloud Run requires you to run a multi-process container with both the file system mount process and the application. This tutorial shows how to terminate the container on process failure and how to manage PID 1 responsibilities. The mount command has built-in retry functionality.

You can use a process manager to run and manage multiple processes as the container's entrypoint. This tutorial uses tini, an init replacement that cleans up zombie processes and performs signal forwarding. Specifically, this init process allows the SIGTERM signal on shutdown to propagate to the application. The SIGTERM signal can be caught for graceful termination of the application. Learn more about the lifecycle of a container on Cloud Run.
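As a sketch of what graceful termination can look like in the application, the snippet below registers a SIGTERM handler, which receives the signal that tini forwards on shutdown. The handler name and the shutting_down flag are illustrative assumptions, not part of the tutorial's sample code:

```python
import os
import signal

shutting_down = False

def handle_sigterm(signum, frame):
    """Mark the service as draining so in-flight work can finish."""
    global shutting_down
    shutting_down = True

# Register the handler; tini forwards SIGTERM to this process on shutdown.
signal.signal(signal.SIGTERM, handle_sigterm)

# Simulate the shutdown signal Cloud Run sends before instance teardown.
os.kill(os.getpid(), signal.SIGTERM)
```

Without a handler, SIGTERM would terminate the process immediately; catching it gives the application a window to flush writes to the mounted file system before exit.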

Defining your environment configuration with the Dockerfile

This Cloud Run service requires one or more additional system packages not available by default. The RUN instruction will install tini as our init-process and gcsfuse, the FUSE adapter. Read more about working with system packages in your Cloud Run service in the Using system packages tutorial.

The next set of instructions creates a working directory, copies the source code, and installs the app dependencies.

The ENTRYPOINT specifies the init-process binary, which is prepended to the CMD instructions, in this case the startup script. This launches a single tini process, which then proxies all received signals to a session rooted at that child process.

The CMD instruction sets the command to be executed when running the image, the startup script. It also provides default arguments for the ENTRYPOINT. Understand how CMD and ENTRYPOINT interact.

Python

# Use the official lightweight Python image.
# https://hub.docker.com/_/python
FROM python:3.9-buster

# Install system dependencies
RUN set -e; \
    apt-get update -y && apt-get install -y \
    tini \
    lsb-release; \
    gcsFuseRepo=gcsfuse-`lsb_release -c -s`; \
    echo "deb http://packages.cloud.google.com/apt $gcsFuseRepo main" | \
    tee /etc/apt/sources.list.d/gcsfuse.list; \
    curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | \
    apt-key add -; \
    apt-get update; \
    apt-get install -y gcsfuse \
    && apt-get clean

# Set fallback mount directory
ENV MNT_DIR /mnt/gcs

# Copy local code to the container image.
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./

# Install production dependencies.
RUN pip install -r requirements.txt

# Ensure the script is executable
RUN chmod +x /app/gcsfuse_run.sh

# Use tini to manage zombie processes and signal forwarding
# https://github.com/krallin/tini
ENTRYPOINT ["/usr/bin/tini", "--"] 

# Pass the startup script as arguments to Tini
CMD ["/app/gcsfuse_run.sh"]

Java

# Use the official maven/Java 11 image to create a build artifact.
# https://hub.docker.com/_/maven
FROM maven:3.8.4-jdk-11 as builder

# Copy local code to the container image.
WORKDIR /app
COPY pom.xml .
COPY src ./src

# Build a release artifact.
RUN mvn package -DskipTests

# Use AdoptOpenJDK for base image.
# https://docs.docker.com/develop/develop-images/multistage-build/#use-multi-stage-builds
FROM eclipse-temurin:11-jdk

# Install system dependencies
RUN set -e; \
    apt-get update -y && apt-get install -y \
    gnupg2 \
    tini \
    lsb-release; \
    gcsFuseRepo=gcsfuse-`lsb_release -c -s`; \
    echo "deb http://packages.cloud.google.com/apt $gcsFuseRepo main" | \
    tee /etc/apt/sources.list.d/gcsfuse.list; \
    curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | \
    apt-key add -; \
    apt-get update; \
    apt-get install -y gcsfuse && apt-get clean

# Set fallback mount directory
ENV MNT_DIR /mnt/gcs

# Copy the jar to the production image from the builder stage.
COPY --from=builder /app/target/filesystem-*.jar /filesystem.jar

# Copy the startup script
COPY gcsfuse_run.sh ./gcsfuse_run.sh
RUN chmod +x ./gcsfuse_run.sh

# Use tini to manage zombie processes and signal forwarding
# https://github.com/krallin/tini
ENTRYPOINT ["/usr/bin/tini", "--"]

# Run the web service on container startup.
CMD ["/gcsfuse_run.sh"]

Defining your processes in the startup script

The startup script creates the mount point directory, where the Cloud Storage bucket will be made accessible. Next, the script attaches the Cloud Storage bucket to the service's mount point using the gcsfuse command, and then starts the application server. The gcsfuse command has built-in retry functionality, so no further bash scripting is necessary. Lastly, the wait command listens for any background process to exit and then exits the script.

Python

#!/usr/bin/env bash
set -eo pipefail

# Create mount directory for service
mkdir -p $MNT_DIR

echo "Mounting GCS Fuse."
gcsfuse --debug_gcs --debug_fuse $BUCKET $MNT_DIR 
echo "Mounting completed."

# Run the web service on container startup. Here we use the gunicorn
# webserver, with one worker process and 8 threads.
# For environments with multiple CPU cores, increase the number of workers
# to be equal to the cores available.
# Timeout is set to 0 to disable the timeouts of the workers to allow Cloud Run to handle instance scaling.
exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app &

# Exit immediately when one of the background processes terminate.
wait -n

Java

#!/usr/bin/env bash
set -eo pipefail

# Create mount directory for service
mkdir -p $MNT_DIR

echo "Mounting GCS Fuse."
gcsfuse --debug_gcs --debug_fuse $BUCKET $MNT_DIR
echo "Mounting completed."

# Start the application in the background so the script can wait on it
java -jar filesystem.jar &

# Exit immediately when one of the background processes terminate.
wait -n

Working with files

Python

See main.py for interacting with the filesystem.

Java

See FilesystemApplication.java for interacting with the filesystem.
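As a rough sketch of the kind of file interaction those samples perform, the helpers below write and list files under the mount directory. The function names and the test.txt filename are illustrative assumptions, not the samples' actual code; through gcsfuse, each file written here becomes an object in the bucket:

```python
import datetime
import os
from pathlib import Path

# MNT_DIR is set in the Dockerfile; /mnt/gcs matches the tutorial's fallback.
MNT_DIR = Path(os.environ.get("MNT_DIR", "/mnt/gcs"))

def write_timestamp_file(directory: Path) -> Path:
    """Write a small file into the mounted directory and return its path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "test.txt"
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
    path.write_text(f"Accessed at {stamp}\n")
    return path

def list_files(directory: Path) -> list[str]:
    """Return the names of the files visible at the mount point."""
    return sorted(p.name for p in directory.iterdir() if p.is_file())
```

Because the mount behaves like an ordinary directory, standard file APIs work unchanged; keep the Cloud Storage FUSE semantics in mind, since not all POSIX behaviors carry over.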

Shipping the service

  1. Create a Cloud Storage bucket or reuse an existing bucket:

    gsutil mb gs://BUCKET_NAME -l REGION
    

    Replace BUCKET_NAME with the name of the Cloud Storage bucket, for example, my-fuse-bucket. Cloud Storage bucket names must be globally unique and are subject to naming requirements.

    Use the -l flag to specify the location of your bucket, for example, us-central1. For the best performance and to avoid cross-regional networking charges, ensure that the Cloud Storage bucket is located in the same region as the Cloud Run services that need to access it.

  2. Create a service account to serve as the service identity:

    gcloud iam service-accounts create fs-identity
  3. Grant the service account access to the Cloud Storage bucket:

    gcloud projects add-iam-policy-binding PROJECT_ID \
         --member "serviceAccount:fs-identity@PROJECT_ID.iam.gserviceaccount.com" \
         --role "roles/storage.objectAdmin"
    
  4. To deploy from source, delete the extra Dockerfile and copy the tutorial Dockerfile into its place:

    rm Dockerfile
    cp gcsfuse.Dockerfile Dockerfile
    
  5. Build and deploy the container image to Cloud Run:

    gcloud beta run deploy filesystem-app --source . \
        --execution-environment gen2 \
        --allow-unauthenticated \
        --service-account fs-identity \
        --update-env-vars BUCKET=BUCKET_NAME
    

    This command builds and deploys the Cloud Run service and specifies the second generation execution environment. Deploying from source builds the image based on the Dockerfile and pushes the image to the Artifact Registry repository cloud-run-source-deploy.

    Learn more about Deploying from source code.

Debugging

If the deployment does not succeed, check Cloud logging for further details.

If you would like all the logs from the mount process, use the --foreground flag in combination with the mount command in the startup script, gcsfuse_run.sh:

  gcsfuse --foreground --debug_gcs --debug_fuse GCSFUSE_BUCKET MNT_DIRECTORY &
  

  • Add --debug_http for HTTP request/response debug output.
  • Add --debug_fuse to enable fuse-related debugging output.
  • Add --debug_gcs to print GCS request and timing information.

Find more tips on Troubleshooting guidance.

Trying it out

To try out the complete service:

  1. Navigate your browser to the URL provided by the deployment step above.
  2. You should see a newly created file in your Cloud Storage bucket.
  3. Click on the file to see the contents.

If you choose to continue developing these services, remember that they have restricted Identity and Access Management (IAM) access to the rest of Google Cloud and will need to be given additional IAM roles to access many other services.

Cost discussion

Cloud Storage pricing depends heavily on data storage (the amount of data stored, by storage class and bucket location) and network usage (the amount of data read from or moved between your buckets). Learn more about charges incurred with Cloud Storage FUSE.

Cloud Run is priced by resource usage (memory, CPU, number of requests, and networking), rounded to the nearest 100 ms. The cost therefore varies with your service's settings, number of requests, and execution time.

For example, 1 TiB of data stored in a Standard storage bucket hosted in Iowa (us-central1) costs $0.02 per GiB per month, which is approximately 1024 GiB * $0.02 = $20.48 per month. This estimate assumes that the Cloud Run service and the Cloud Storage bucket are hosted in the same region, which negates egress costs.
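The storage arithmetic above can be sketched as a quick calculation. The $0.02 per GiB-month Standard storage rate is the tutorial's illustrative figure; check the current pricing pages before relying on it:

```python
# Illustrative Standard-storage rate for us-central1 from the tutorial,
# in dollars per GiB per month. Real rates change; consult the pricing page.
RATE_PER_GIB_MONTH = 0.02

GIB_PER_TIB = 1024

def monthly_storage_cost(tib_stored: float) -> float:
    """Estimate the monthly storage cost in dollars for the given TiB."""
    return tib_stored * GIB_PER_TIB * RATE_PER_GIB_MONTH

cost = monthly_storage_cost(1)  # 1 TiB stored for one month -> ~$20.48
```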

Visit individual pricing pages for the most up-to-date pricing or explore an estimate in the Google Cloud Pricing Calculator.

Clean up

If you created a new project for this tutorial, delete the project. If you used an existing project and wish to keep it without the changes added in this tutorial, delete resources created for the tutorial.

Deleting the project

The easiest way to eliminate billing is to delete the project that you created for the tutorial.

To delete the project:

  1. In the Cloud Console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Deleting tutorial resources

  1. Delete the Cloud Run service you deployed in this tutorial:

    gcloud run services delete SERVICE_NAME

    Replace SERVICE_NAME with your chosen service name.

    You can also delete Cloud Run services from the Google Cloud Console.

  2. Remove the gcloud default region configuration you added during tutorial setup:

     gcloud config unset run/region
    
  3. Remove the project configuration:

     gcloud config unset project
    
  4. Delete other Google Cloud resources created in this tutorial, such as the Cloud Storage bucket, the fs-identity service account, and the container image in the cloud-run-source-deploy Artifact Registry repository.

What's next