Uploading images directly to Cloud Storage using Signed URL

As the cloud grows in popularity, it has become common to deploy applications there and provide them to end users. At the same time, the trend toward serverless architecture means that an unspecified number of end users can access resources in the cloud directly. If you’re offering services to users and serving them content, you’ll want a cloud architecture that solves traditional problems intelligently and reduces the cost of developing and hosting servers.

In this blog post, you’ll see how to implement a service hosted on Google Cloud Platform (GCP) that allows users to upload images into Cloud Storage using Signed URL, then serve that static content using managed services. This is particularly helpful if you’re running a social media company with constant image uploads, or hosting and managing consumer-generated media.

Without using the cloud, it can be difficult to develop and manage a server that lets users upload image files, especially at high scale. You have to queue requests to the process responsible for uploading the files to control the flow rate, and you have to prevent the system from going down due to request overload. You also need to set appropriate resource limits for finite resources (such as RAM) of each server that’s involved.

Furthermore, a server that accepts very large file uploads can be very costly to develop and host. And if the upload server is to be used by millions of users, it will require a lot of work to ensure its scalability and user experience.

Let’s get started building an image upload feature to use with Cloud Storage. You can access the full set of source files that we’ve used on GitHub, here.

Getting started building an image upload feature
Before diving into the GCP components needed to implement this service, let’s define our requirements:

  • Use managed services as much as possible
  • Enable only authenticated users to upload files
  • Validate/filter the content uploaded by users

In order to satisfy the above requirements, we devised the following architecture with GCP services:

[Figure: architecture with GCP services]

The key characteristic of this architecture is that it uses a serverless platform from file upload to delivery. This is the general workflow you’ll be setting up. Let’s detail each step shown in the diagram.

1. When App Engine receives a request from an authenticated user, it applies the application’s domain logic and generates a Signed URL that allows a PUT request only for a specific Cloud Storage bucket and object.

2. Next, the user uploads a file to that specific bucket and object using the Signed URL.

3. Once the user uploads the file to Cloud Storage, the object finalize event triggers a Cloud Function, which validates the uploaded file.

4. After step 3 confirms that the file is an image of an appropriate size, the function sends the image to the Cloud Vision API for annotation to filter inappropriate content.

5. Once the validations in steps 3 and 4 are complete, copy the image file from the Uploadable Bucket to the Distribution Bucket.

6. The copied image file is now available to the public.

Next, we will devise an actual implementation for the above steps.

Generate Signed URL on App Engine standard runtime
Cloud Storage provides the Signed URL feature to let individual end users perform specific actions. Signed URL makes it possible to generate temporary credentials valid only for a specific end user to securely upload a file. The Google Cloud official client library makes it easy to generate a Signed URL. Using this feature, you’ll prepare an API server to generate a Signed URL dynamically for specific end users. For our purposes, we’re implementing the API using App Engine Standard.

Here’s how to generate a Signed URL:

1. Create a new service account or prepare the service account for signing bytes in App Engine.

2. Create a string necessary for the signature to generate a Signed URL (note that Content_MD5 and Canonicalized_Extension_Headers are omitted).
    a. Choose PUT for Cloud Storage API that accepts file upload.
    b. The value of Content-Type depends on the MIME type of the file being uploaded. This is determined by the API request from the end user.
    c. Set the expiration as a Unix epoch time. Choose the shortest practical lifetime, counted from the moment the API request is accepted. We will use 15 minutes here.
    d. Specify a bucket and an object as Canonicalized_Resource. The object key is dynamically generated so it doesn’t duplicate an existing one.

3. Sign the string generated in step 2 with the secret key of the service account prepared in step 1.

Check out the official documentation for more details on the procedure. If you implement all of the above steps in Go, it will look like this:

package main

import (
        "context"
        "encoding/base64"
        "fmt"
        "log"
        "net/http"
        "os"
        "time"

        "cloud.google.com/go/storage"
        "github.com/google/uuid"
        "golang.org/x/oauth2/google"
        iam "google.golang.org/api/iam/v1"
)

var (
        // iamService is a client for calling the signBlob API.
        iamService *iam.Service

        // serviceAccountName represents Service Account Name.
        // See more details: https://cloud.google.com/iam/docs/service-accounts
        serviceAccountName string

        // serviceAccountID follows the below format.
        // "projects/%s/serviceAccounts/%s"
        serviceAccountID string

        // uploadableBucket is the destination bucket.
        // All users will upload files directly to this bucket by using generated Signed URL.
        uploadableBucket string
)

func signHandler(w http.ResponseWriter, r *http.Request) {
        // Accepts only POST method.
        // Otherwise, this handler returns 405.
        if r.Method != "POST" {
                w.Header().Set("Allow", "POST")
                http.Error(w, "Only POST is supported", http.StatusMethodNotAllowed)
                return
        }

        ct := r.FormValue("content_type")
        if ct == "" {
                http.Error(w, "content_type must be set", http.StatusBadRequest)
                return
        }

        // Generates an object key for the new Cloud Storage object.
        // Using a UUID keeps it from colliding with any existing object key.
        key := uuid.New().String()
        if ext := r.FormValue("ext"); ext != "" {
                key += fmt.Sprintf(".%s", ext)
        }

        // Generates a signed URL for use in the PUT request to GCS.
        // Generated URL should be expired after 15 mins.
        url, err := storage.SignedURL(uploadableBucket, key, &storage.SignedURLOptions{
                GoogleAccessID: serviceAccountName,
                Method:         "PUT",
                Expires:        time.Now().Add(15 * time.Minute),
                ContentType:    ct,
                // To avoid managing a private key, use SignBytes instead of PrivateKey.
                // In this example, we are using the `iam.serviceAccounts.signBlob` API for signing bytes.
                // If you want to avoid an API call every time bytes are signed,
                // you can host the private key yourself and pass it in PrivateKey.
                SignBytes: func(b []byte) ([]byte, error) {
                        resp, err := iamService.Projects.ServiceAccounts.SignBlob(
                                serviceAccountID,
                                &iam.SignBlobRequest{BytesToSign: base64.StdEncoding.EncodeToString(b)},
                        ).Context(r.Context()).Do()
                        if err != nil {
                                return nil, err
                        }
                        return base64.StdEncoding.DecodeString(resp.Signature)
                },
        })
        if err != nil {
                log.Printf("sign: failed to sign, err = %v\n", err)
                http.Error(w, "failed to sign by internal server error", http.StatusInternalServerError)
                return
        }
        w.WriteHeader(http.StatusOK)
        fmt.Fprintln(w, url)
}

func main() {
        // DefaultClient returns an HTTP client authenticated with Application Default Credentials.
        client, err := google.DefaultClient(context.Background(), iam.CloudPlatformScope)
        if err != nil {
                log.Fatal(err)
        }
        iamService, err = iam.New(client)
        if err != nil {
                log.Fatal(err)
        }

        uploadableBucket = os.Getenv("UPLOADABLE_BUCKET")
        serviceAccountName = os.Getenv("SERVICE_ACCOUNT")
        serviceAccountID = fmt.Sprintf(
                "projects/%s/serviceAccounts/%s",
                os.Getenv("GOOGLE_CLOUD_PROJECT"),
                serviceAccountName,
        )

        http.HandleFunc("/sign", signHandler)
        log.Fatal(http.ListenAndServe(fmt.Sprintf(":%s", os.Getenv("PORT")), nil))
}

As mentioned in the code, there are two ways to sign the generated bytes to create the Signed URL.

One way is to sign it yourself using the private key associated with the service account. This is suitable for use in environments such as Google Compute Engine and Google Kubernetes Engine (GKE). If you choose this option, you have to manage the private key yourself.

The second way is to use the serviceAccounts.signBlob API, which is one of the Cloud Identity and Access Management (IAM) Service Account APIs. This lets you sign the bytes without managing the private key in the application runtime. Here we’re using the signBlob API to avoid managing the private key.

In addition, you need to grant the service account the following Cloud Storage permissions in order to generate a working Signed URL.

  • storage.buckets.get
  • storage.objects.create
  • storage.objects.delete

If you would like to use the signBlob API, you also have to grant your service account the Service Account Token Creator role.

Uploading files using Signed URL
Now, a user can upload files directly to Cloud Storage using a Signed URL issued as described above. Here, we’ll use PUT Object, one of the Cloud Storage XML APIs, with the Signed URL that App Engine has generated.

You can run this from a mobile application or front-end code of a web application. Here’s the sample file upload code:

package main

import (
        "bytes"
        "fmt"
        "io/ioutil"
        "log"
        "net/http"
        "net/url"
        "strings"
)

const signerUrl = "<APPENGINE_URL>"

func getSignedURL(target string, values url.Values) (string, error) {
        resp, err := http.PostForm(target, values)
        if err != nil {
                return "", err
        }
        defer resp.Body.Close()
        b, err := ioutil.ReadAll(resp.Body)
        if err != nil {
                return "", err
        }
        return strings.TrimSpace(string(b)), nil
}

func main() {
        // Get signed url from the API server hosted on App Engine.
        u, err := getSignedURL(signerUrl, url.Values{"content_type": {"image/png"}})
        if err != nil {
                log.Fatal(err)
        }
        fmt.Printf("Signed URL here: %q\n", u)

        b, err := ioutil.ReadFile("/path/to/sample.png")
        if err != nil {
                log.Fatal(err)
        }
        // Generates *http.Request to request with PUT method to the Signed URL.
        req, err := http.NewRequest("PUT", u, bytes.NewReader(b))
        if err != nil {
                log.Fatal(err)
        }
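        // The Content-Type must match the one used to generate the Signed URL.
        req.Header.Add("Content-Type", "image/png")
        client := new(http.Client)
        resp, err := client.Do(req)
        if err != nil {
                log.Fatal(err)
        }
        defer resp.Body.Close()
        // Prints the response status, which shows whether the upload was accepted.
        fmt.Println(resp.Status)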
}
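
In an application, the caller will usually want to treat a non-2xx response as a failed upload. Here is a minimal sketch of that check. The helper names putObject and newFakeStorage are hypothetical, and the local test server only stands in for Cloud Storage so the example runs without credentials; its rejection of a mismatched Content-Type imitates what happens when a request does not match what was signed.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// putObject uploads body to signedURL and returns an error on any non-2xx status.
func putObject(signedURL, contentType string, body []byte) error {
	req, err := http.NewRequest(http.MethodPut, signedURL, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", contentType)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		// Include a bounded part of the response body to help with debugging.
		msg, _ := io.ReadAll(io.LimitReader(resp.Body, 1024))
		return fmt.Errorf("upload failed: %s: %s", resp.Status, bytes.TrimSpace(msg))
	}
	return nil
}

// newFakeStorage returns a local server that accepts only PUT requests
// carrying the expected Content-Type. It is a stand-in, not Cloud Storage.
func newFakeStorage(wantType string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPut || r.Header.Get("Content-Type") != wantType {
			http.Error(w, "request does not match the signature", http.StatusForbidden)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
}

func main() {
	srv := newFakeStorage("image/png")
	defer srv.Close()
	fmt.Println("matching type:", putObject(srv.URL, "image/png", []byte("fake image bytes")))
	fmt.Println("mismatched type:", putObject(srv.URL, "image/jpeg", []byte("fake image bytes")))
}
```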

Using Bucket Lock with Signed URL
As you can see from the architecture diagram above, we’ve set up two Cloud Storage buckets: the upload destination (Uploadable Bucket) and the delivery source (Distribution Bucket).

Although all the files will be copied to the Distribution Bucket after validation, they will still reside in the Uploadable Bucket as well. However, no processing module refers to them any longer, so there is no reason to keep the already-copied files in the Uploadable Bucket.

In addition, end users can upload files to the Uploadable Bucket any number of times until the Signed URL expires.

To solve these two problems, use Object Lifecycle Management and Retention Policy.

First, define the lifecycle to make sure there are no unnecessary objects left in the Uploadable Bucket. You only need to define the lifecycle conditions for the bucket and the behaviors that align with each condition. Here, let’s define the lifecycle of one day before deleting the object, which would look like this:

{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 1}
    }
  ]
}

Next, let’s make sure users can’t upload the file over and over again within the prescribed period of time. You can do this by defining a retention policy.

A retention policy lets you specify a retention period for the bucket, and makes it impossible to overwrite or delete an object during that period. Set the retention period to match the lifetime of the Signed URL (15 minutes here). The retention period must also be shorter than the lifecycle age to avoid a conflict with the lifecycle setting, since the object is to be deleted after one day. (Note that retention policies cannot be enabled for buckets with versioning enabled.)

In order to create a bucket with these settings, execute the following command:

REGION="REGION"
PROJECT_ID="PROJECT_ID"
UPLOADABLE_BUCKET="UPLOADABLE_BUCKET"
DISTRIBUTION_BUCKET="DISTRIBUTION_BUCKET"
LIFECYCLE_POLICY_FILE="/path/to/lifecycle.json"

# Creates the uploadable bucket
gsutil mb -p $PROJECT_ID -l $REGION --retention 900s gs://$UPLOADABLE_BUCKET
# Creates the bucket for distribution
gsutil mb -p $PROJECT_ID -l $REGION gs://$DISTRIBUTION_BUCKET
# Sets the lifecycle for the uploadable bucket
gsutil lifecycle set $LIFECYCLE_POLICY_FILE gs://$UPLOADABLE_BUCKET

Validate and copy the uploaded file
You’ve so far seen how to generate a Signed URL to upload files directly to Cloud Storage. However, before making this available to users, more validation is necessary. You’ll want to verify the validity of the file uploaded by the authenticated user, and Signed URL and Cloud Storage alone can’t validate the content.

To do this, we’ll implement the validation/copy process using Cloud Functions. To run a Cloud Function in response to a specific event on GCP, use Background Functions, in this case triggered by a Cloud Storage event. We use google.storage.object.finalize as the object creation event.

Next, how should we validate the uploaded file? Here are the requirements in order.

1. Make sure that an object with the same object key as the uploaded file does not exist in Distribution Bucket.
    a. If the object already exists, stop further processing.
    b. This branch can be reached because Cloud Functions guarantees at-least-once execution, so the same event may be delivered more than once.
2. Get Content-Type and the size from the metadata of the target object in the Uploadable Bucket.
    a. Check that the size does not exceed the size limit you have chosen. If it does, stop further processing.
    b. The size limit should be determined based on the specification of your runtime (in this case, the Cloud Function is the runtime for validation).
3. Read the target object in the Uploadable Bucket, and validate its content against the Content-Type acquired in step 2.
    a. There’s no guarantee that the Content-Type specified when the Signed URL was generated matches the content of the object that was actually uploaded, so verify it here.
4. Post the target object in the Uploadable Bucket to the Vision API, and check for inappropriate content by using Safe Search Annotation.
    a. Reject the object if any of the adult, medical, violence, or racy categories has a likelihood of POSSIBLE or higher.
5. If the object passed validation, copy the object from the Uploadable Bucket to the Distribution Bucket.
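
Step 3 can be illustrated in isolation. The sketch below is a lighter variant of the check, not the implementation used in this post: it reads only the image header with image.DecodeConfig and compares the detected format with the declared Content-Type, whereas fully decoding the image is stricter. The names formatByType, matchesDeclaredType, and samplePNG are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	_ "image/gif"  // registers the GIF format
	_ "image/jpeg" // registers the JPEG format
	"image/png"    // registers the PNG format and provides Encode
)

// formatByType maps accepted Content-Type values to the format names
// reported by the image package.
var formatByType = map[string]string{
	"image/png":  "png",
	"image/jpeg": "jpeg",
	"image/gif":  "gif",
}

// matchesDeclaredType reports whether the header of data identifies
// the same image format as the declared Content-Type.
func matchesDeclaredType(data []byte, contentType string) bool {
	want, ok := formatByType[contentType]
	if !ok {
		return false // unsupported Content-Type
	}
	_, got, err := image.DecodeConfig(bytes.NewReader(data))
	return err == nil && got == want
}

// samplePNG returns the bytes of a 1x1 PNG generated in memory.
func samplePNG() []byte {
	var buf bytes.Buffer
	if err := png.Encode(&buf, image.NewRGBA(image.Rect(0, 0, 1, 1))); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	data := samplePNG()
	fmt.Println("declared image/png:", matchesDeclaredType(data, "image/png"))
	fmt.Println("declared image/jpeg:", matchesDeclaredType(data, "image/jpeg"))
}
```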

To create an implementation that meets the above requirements, use the Cloud Functions Go 1.11 runtime. The source code is as follows:

package function

import (
        "context"
        "errors"
        "fmt"
        "image"
        "image/gif"
        "image/jpeg"
        "image/png"
        "log"

        "cloud.google.com/go/storage"
        vision "cloud.google.com/go/vision/apiv1"
        "golang.org/x/xerrors"
        pb "google.golang.org/genproto/googleapis/cloud/vision/v1"
)

type GCSEvent struct {
        Bucket string `json:"bucket"`
        Name   string `json:"name"`
}

var retryableError = xerrors.New("upload: retryable error")

func validate(ctx context.Context, obj *storage.ObjectHandle) error {
        attrs, err := obj.Attrs(ctx)
        if err != nil {
                return xerrors.Errorf("upload: failed to get object attributes %q : %w",
                        obj.ObjectName(), retryableError)
        }
        // Rejects files of 100 KB or more.
        // You can enlarge the maximum size up to 20MB by modifying this line.
        if attrs.Size >= 1024*100 {
                return fmt.Errorf("upload: image file is too large, got = %d", attrs.Size)
        }
        // Validates obj and returns an error unless it conforms to a supported image format.
        if err := validateMIMEType(ctx, attrs, obj); err != nil {
                return err
        }
        // Validates obj by calling Vision API.
        return validateByVisionAPI(ctx, obj)
}

func validateMIMEType(ctx context.Context, attrs *storage.ObjectAttrs, obj *storage.ObjectHandle) error {
        r, err := obj.NewReader(ctx)
        if err != nil {
                return xerrors.Errorf("upload: failed to open new file %q : %w",
                        obj.ObjectName(), retryableError)
        }
        defer r.Close()
        if _, err := func(ct string) (image.Image, error) {
                switch ct {
                case "image/png":
                        return png.Decode(r)
                case "image/jpeg", "image/jpg":
                        return jpeg.Decode(r)
                case "image/gif":
                        return gif.Decode(r)
                default:
                        return nil, fmt.Errorf("upload: unsupported MIME type, got = %q", ct)
                }
        }(attrs.ContentType); err != nil {
                return err
        }
        return nil
}

// validateByVisionAPI uses Safe Search Detection provided by Cloud Vision API.
// See more details: https://cloud.google.com/vision/docs/detecting-safe-search
func validateByVisionAPI(ctx context.Context, obj *storage.ObjectHandle) error {
        client, err := vision.NewImageAnnotatorClient(ctx)
        if err != nil {
                return xerrors.Errorf(
                        "upload: failed to create an ImageAnnotator client, error = %v : %w",
                        err,
                        retryableError,
                )
        }
        defer client.Close()
        ssa, err := client.DetectSafeSearch(
                ctx,
                vision.NewImageFromURI(fmt.Sprintf("gs://%s/%s", obj.BucketName(), obj.ObjectName())),
                nil,
        )
        if err != nil {
                return xerrors.Errorf(
                        "upload: failed to detect safe search, error = %v : %w",
                        err,
                        retryableError,
                )
        }
        // Returns an unretryable error if there is any possibility of inappropriate image.
        // Likelihood has been defined in the following:
        // https://github.com/google/go-genproto/blob/5fe7a883aa19554f42890211544aa549836af7b7/googleapis/cloud/vision/v1/image_annotator.pb.go#L37-L50
        if ssa.Adult >= pb.Likelihood_POSSIBLE ||
                ssa.Medical >= pb.Likelihood_POSSIBLE ||
                ssa.Violence >= pb.Likelihood_POSSIBLE ||
                ssa.Racy >= pb.Likelihood_POSSIBLE {
                return errors.New("upload: exceeds the prescribed likelihood")
        }
        return nil
}

// distributionBucket is the distribution bucket.
// It's used for distributing all files that passed validation.
// TODO: This value MUST be updated before deploying this function.
const distributionBucket = "DISTRIBUTION_BUCKET"

// UploadImage validates the object and copies it into the distribution bucket.
func UploadImage(ctx context.Context, e GCSEvent) error {
        client, err := storage.NewClient(ctx)
        if err != nil {
                return fmt.Errorf("upload: failed to construct a client, error = %v", err)
        }
        defer client.Close()

        dst := client.Bucket(distributionBucket).Object(e.Name)
        _, err = dst.Attrs(ctx)
        // Avoid proceeding if the object has been copied to destination.
        if err == nil {
                log.Printf("upload: %s has already been copied to destination\n", e.Name)
                return nil
        }
        // Any error other than "object does not exist" may be temporary, so return it to trigger a retry.
        if err != storage.ErrObjectNotExist {
                return err
        }
        src := client.Bucket(e.Bucket).Object(e.Name)
        if err := validate(ctx, src); err != nil {
                if xerrors.Is(err, retryableError) {
                        return err
                }
                log.Println(err)
                return nil
        }
        // Returns an error if the copy operation failed.
        // Will retry the same processing later.
        if _, err := dst.CopierFrom(src).Run(ctx); err != nil {
                return err
        }

        return nil
}

To deploy UploadImage in the above source code to Cloud Functions, execute the following command:

  gcloud functions deploy UploadImage --runtime go111 --trigger-resource [UPLOADABLE_BUCKET] --trigger-event google.storage.object.finalize --retry

Don’t forget to turn on --retry. Temporary errors can occur, so the background function should be written with retries in mind. In addition, because Cloud Functions guarantees at-least-once execution, the function may run more than once for the same object, and the object in the Distribution Bucket may be written more than once. However, the repeated copy has no side effects.

Upload a file!
Let’s actually upload a file using the system we’ve built, which uses managed services all the way from image upload to delivery. Testing is easy; just run the sample upload code shown earlier, which sends the image to the Signed URL.

After the background function finishes running, make sure that the object in the Uploadable Bucket has been copied to the Distribution Bucket. Because these processes run asynchronously, you can also introduce push notifications to let the user know when distribution starts, or update the database referenced by the original application if necessary. You can do this processing in Cloud Functions as well, or have the function notify your application so that the two stay loosely coupled.

You can also customize this system to create another file upload feature that fits your purpose. For example, you may decide to use a Signed Policy Document instead of a Signed URL if you want to set more detailed conditions or use the POST Object API. Furthermore, if you’re using Firebase, our mobile and web development tool, you can use Cloud Storage for Firebase instead of this feature because similar functionality can be easily implemented on Firebase.