Automating log uploads with gcloud transfer
Nicholas Hartunian
Software Engineer
Hi! We just released the gcloud transfer command-line tool. This tutorial will show you how to use it for a common task: uploading logs to the cloud.
Setup
You’ll need a device running a Linux operating system with at least 8 GB of RAM to continue. If you don’t have one lying around, it’s easy to spin up a Compute Engine virtual machine.
Let’s create some logs to upload. In the real world, Google Cloud’s transfer service is a great tool if you have large amounts of data (terabytes and up). But tutorials don’t usually ask people to create multiple hard drives’ worth of fake data. So let’s do this:
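Something like this should do; the “my-logs” folder name is my own placeholder, but the rest of the tutorial refers to it:

    # Create a folder of small fake log files.
    mkdir my-logs && cd my-logs
    for i in $(seq 1 10); do
      echo "$(date) fake log entry $i" > "log-$i.txt"
    done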
Perfect. That will fool them.
On to the gcloud CLI. If you haven’t already, install the gcloud CLI. You should be prompted to log into Google during the installation process.
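If the installer doesn’t prompt you, you can kick off sign-in and default project selection yourself:

    # Sign in to Google and choose a default project.
    gcloud init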
You’re probably wondering how much this tutorial will cost to complete in your Google Cloud project. At the time of writing, transfer jobs cost “$0.0125 per GB transferred to the destination successfully.” Here’s the current price table.
Next, you’ll need a Google Cloud Storage bucket to upload to. Object storage also shouldn’t be very expensive, but please save resource names for cleanup at the end of the tutorial. Here’s the price table. You can create a bucket by running:
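    # Bucket names are globally unique; "my-logs-bucket" is a placeholder,
    # so swap in a name of your own (and remember it for later steps).
    gsutil mb gs://my-logs-bucket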
Using gcloud transfer
To begin, let’s grant ourselves the permissions necessary to use all gcloud transfer features:
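    # At the time of writing, the CLI can find and grant the IAM roles
    # that gcloud transfer features need.
    gcloud transfer authorize --add-missing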
Creating transfers from one cloud bucket to another is straightforward with gcloud transfer. Setting up your local file system to handle transfer jobs requires a little more work. Specifically, you need to install an “agent.” An agent is basically a Docker container that runs a program dedicated to copying files.
Before installing any agents, you need an agent pool. When a transfer job assigns work to an agent pool, any agent in that pool might end up copying files. If not all the agents in an agent pool can access the files needed for a transfer, you may encounter errors.
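Creating one takes a single command; “log-pool” is just an example name:

    # Create an agent pool to group the agents for this transfer.
    gcloud transfer agent-pools create log-pool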
Now, to install an agent on your system, run:
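    # Requires Docker. Pulls and starts an agent container that joins
    # the "log-pool" pool created above.
    gcloud transfer agents install --pool=log-pool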
All right, now we can upload our fake logs! Storage Transfer Service expects absolute paths, so use the “pwd” command to get the path of your current folder (you should still be inside the “my-logs” folder from earlier).
We require a “posix://” scheme for uploading from a POSIX file system (Linux & Mac). I know it’s a bit odd, but it leaves room for supporting transfer jobs dedicated to other file system types in the future (e.g. “ntfs://”).
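Putting that together (swap in your own bucket name for the “my-logs-bucket” placeholder):

    # $(pwd) expands to an absolute path, producing the required
    # posix:///absolute/path form.
    gcloud transfer jobs create \
      posix://$(pwd) \
      gs://my-logs-bucket \
      --source-agent-pool=log-pool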
Great, the above should return your new transfer job’s metadata. To monitor the transfer, run the below with the value for the “name” key returned above:
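    # Replace the job name below with the "name" value from the
    # previous output (it looks like transferJobs/<some id>).
    gcloud transfer jobs monitor transferJobs/EXAMPLE_ID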
Automation
Say we wanted to upload logs every midnight from 2022 to 2023. The ability to schedule recurring transfers of large amounts of data is what differentiates gcloud transfer from tools like gcloud storage or gsutil. To do this, we just need to update the schedule properties of our job:
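    # Repeat every 24 hours, starting at midnight UTC on
    # January 1, 2022 and stopping at the start of 2023.
    gcloud transfer jobs update transferJobs/EXAMPLE_ID \
      --schedule-starts=2022-01-01T00:00:00Z \
      --schedule-repeats-every=24h \
      --schedule-repeats-until=2023-01-01T00:00:00Z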
If you have another machine and you don’t care which one uploads the logs, you could install an agent on that machine in the same pool as before.
More realistically, if you want each machine in your fleet to upload logs to a different cloud destination, you can write a script to run once on each device. Just make sure the agent pool and destination arguments are different for each device; otherwise, more than one machine may upload to the same location.
You don’t have to run this script on multiple computers to complete the tutorial, but for demonstration purposes, here’s a sketch that derives a unique pool name and destination prefix from each machine’s hostname (the log directory and bucket name are made-up examples):
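    #!/bin/bash
    # Sketch only: the pool name and destination prefix come from the
    # hostname so no two devices collide. LOG_DIR and the bucket name
    # are example values.
    HOST=$(hostname -s)
    LOG_DIR="/var/log/my-app"

    gcloud transfer agent-pools create "pool-$HOST"
    gcloud transfer agents install --pool="pool-$HOST"
    gcloud transfer jobs create \
      "posix://$LOG_DIR" \
      "gs://my-logs-bucket/$HOST/" \
      --source-agent-pool="pool-$HOST" \
      --schedule-starts=2022-01-01T00:00:00Z \
      --schedule-repeats-every=24h \
      --schedule-repeats-until=2023-01-01T00:00:00Z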
If you’re interested in more complex scripting, the “jobs create” and “jobs run” commands have a “--no-async” flag you can use to block until a transfer completes.
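For example, this hypothetical one-liner kicks off a run and only prints once the copy is done:

    # --no-async blocks until the transfer finishes.
    gcloud transfer jobs run transferJobs/EXAMPLE_ID --no-async && echo "logs uploaded"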
Teardown
This is the part where we delete everything to save you monthly costs.
First, let’s delete the transfer job:
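    # Use the same job name you monitored earlier.
    gcloud transfer jobs delete transferJobs/EXAMPLE_ID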
Next, follow the instructions provided by this command to delete any agents you installed:
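    # Prints the commands to run on each machine to stop and
    # remove its agent containers.
    gcloud transfer agents delete --all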
Now, let’s delete the empty agent pool:
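    # Delete the now-unused agent pool from earlier.
    gcloud transfer agent-pools delete log-pool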
Lastly, let’s delete the Google Cloud Storage bucket and the fake logs on your device:
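    # Removes every object plus the bucket itself, then the fake logs.
    # (Assumes you're still inside the "my-logs" folder from earlier.)
    gsutil rm -r gs://my-logs-bucket
    cd .. && rm -rf my-logs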
Conclusion
Superb! You’ve learned how to build an automated log uploader.
If you’re comparing gcloud transfer to other tools like gsutil, I linked some helpful articles in the “Related” section. TL;DR: gcloud transfer is for copying huge amounts of data (even petabytes!) and automating recurring copies. gsutil is better for less than a terabyte of data, and recurring copies have to be manually scripted (e.g. a cron job that calls gsutil).
If you’re copying files between clouds, we also support Amazon S3 and Azure Storage sources.
Congratulations on adding another tool to your Google toolkit!