Use Storage Transfer Service to move large datasets from Cloud Storage to your Filestore file shares.
Storage Transfer Service helps you to quickly and securely transfer large datasets between object and file storage systems, whether your data is hosted in Cloud Storage, third-party cloud providers, or on-premises.
Storage Transfer Service supports accelerated transfers of large datasets, handling hundreds of TB of data or more. Move your large datasets to the cloud to take advantage of analytics and machine learning operations available from the underlying Compute Engine instances where your Filestore instances are mounted.
With Storage Transfer Service you can create Google-managed transfers or configure self-hosted transfers for full control over network routing and bandwidth usage.
Transfer data from a Cloud Storage bucket to a Filestore file share
Transferring data from Cloud Storage to a Filestore file share using Storage Transfer Service requires the following tasks:
- Set up your environment.
- Configure Filestore.
- Configure Storage Transfer Service.
- Create and initiate the transfer job.
The following sections walk you through each task.
Set up your environment
Select or create a project.
For the purposes of this guide, ensure your source and destination resources reside in the same project.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
If you are testing out Filestore and don't plan to keep the resources that you create, we recommend that you create a project instead of selecting an existing project. Once you're done testing, you can delete the project, removing all resources associated with the project.
Enable billing.
Make sure that billing is enabled for your Google Cloud project. Learn how to confirm that billing is enabled for your project.
Enable the following APIs:
Filestore API
Resource Manager API
Pub/Sub API
Cloud Storage API
Storage Transfer API
Cloud Logging API
Compute Engine API
Service Usage API
Identity and Access Management API
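If you prefer the command line to the console, the APIs listed above can be enabled in a single call. The following is a minimal sketch rather than part of the documented procedure: the service names are assumed to correspond to the display names in the list, and PROJECT_ID, run, and enable_apis are illustrative names. By default the script only prints the command so that you can review it first.

```shell
#!/bin/sh
# PROJECT_ID is a placeholder; replace it with your own project ID.
PROJECT_ID="${PROJECT_ID:-my-project}"

# Assumed service names for the APIs listed above, in the same order.
APIS="file.googleapis.com
cloudresourcemanager.googleapis.com
pubsub.googleapis.com
storage.googleapis.com
storagetransfer.googleapis.com
logging.googleapis.com
compute.googleapis.com
serviceusage.googleapis.com
iam.googleapis.com"

# Print the command by default; set DRY_RUN=0 to execute it instead.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Word splitting on $APIS is intentional: one argument per service.
enable_apis() {
  run gcloud services enable $APIS --project="$PROJECT_ID"
}

enable_apis
```

Run the script once to check the printed command, then run it again with DRY_RUN=0 to apply it.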
Optional: gcloud, a major component of the Google Cloud SDK, is installed on every Compute Engine VM. If you perform any of the following steps from your local command line, install and initialize the Google Cloud SDK.
If you installed Google Cloud SDK previously, make sure you have the latest available version by running:
gcloud components update
Create a service account. In the Grant this service account access to project section, assign the following roles:
Owner
Project IAM Admin
Role Administrator
Pub/Sub Editor
Cloud Filestore Editor
Storage Object Admin
Storage Transfer Admin
Storage Transfer Agent
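The same role assignments can also be scripted. The sketch below is an illustration under stated assumptions, not the documented console flow: the role IDs are assumed to match the role names listed above, and PROJECT_ID, SA_EMAIL, run, and grant_roles are hypothetical names. Verify each role ID in the IAM documentation before applying anything.

```shell
#!/bin/sh
# Placeholders; replace with your own project ID and service account email.
PROJECT_ID="${PROJECT_ID:-my-project}"
SA_EMAIL="${SA_EMAIL:-my-service-account@${PROJECT_ID}.iam.gserviceaccount.com}"

# Assumed role IDs for the role names listed above, in the same order.
ROLES="roles/owner
roles/resourcemanager.projectIamAdmin
roles/iam.roleAdmin
roles/pubsub.editor
roles/file.editor
roles/storage.objectAdmin
roles/storagetransfer.admin
roles/storagetransfer.transferAgent"

# Print each command by default; set DRY_RUN=0 to execute it instead.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Add one IAM policy binding per role for the service account.
grant_roles() {
  for role in $ROLES; do
    run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="serviceAccount:${SA_EMAIL}" --role="$role"
  done
}

grant_roles
```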
Copy and save the name of the service account you created for a later step.
Create a service account key for the account you just created. For the purposes of this guide, create only one key. Download the key file and save it for a later step.
Assign roles to a user account. In the IAM page, find your user account and assign it the following roles:
Owner
Project IAM Admin
Role Administrator
Storage Transfer Admin
Storage Admin
For more information see User permissions.
Configure Filestore
Create a Filestore instance. When creating the instance, apply the following specifications:
Ensure the Cloud Storage bucket, client VM, and Filestore instance all reside in the same region.
Select a regional or enterprise instance type.
Optional: For larger datasets, request a quota increase.
Copy the instance name and IP address and save for a later step.
Mount a Filestore instance on a client machine.
This guide describes a transfer that uses four Compute Engine VMs as NFS client machines. You'll create a single service account that operates on behalf of the four client machines. Each client machine will have three Storage Transfer Service agents installed.
Create a Compute Engine VM instance with access to other Google Cloud services.
Configure a VM with the following specifications:
When specifying a location, ensure the Cloud Storage bucket, client VM, and Filestore instance all reside in the same region.
Each Storage Transfer Service agent needs 4 vCPU and 8 GB RAM. For best performance, run multiple agents per VM. For the purposes of this guide, provision an e2-standard-32 Compute Engine virtual machine instance.
In the Identity and API Access section, specify the following:
- In the Service accounts drop-down, select the service account you just created.
Once the Compute Engine VM instance is created, sign in to the machine using SSH. From the Compute Engine VM instance page, locate the instance you created, and click SSH.
Use a text editor such as Vim to create a copy of the service account key file and temporarily save it locally to the VM. For example, service-account-key.json.
gcloud is already installed on the Compute Engine VM instance. From the SSH command line, enter the following command to authorize the service account to use gcloud:
gcloud auth activate-service-account ACCOUNT --key-file=KEY_FILE
where:
ACCOUNT is the email address for the service account you created. For example, my-service-account@my-project.iam.gserviceaccount.com.
KEY_FILE is the relative local path to the key file you copied earlier. For example, sa-key.json.
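To avoid mistyping the ACCOUNT value, you can read it from the key file itself. The following sketch assumes the standard service account key layout, in which the email address is stored in a client_email field; the sa_email_from_key helper and the KEY_FILE default are illustrative. The script prints the resulting command instead of running it.

```shell
#!/bin/sh
# Print the client_email value from a service account key file.
# Assumes the key file contains a "client_email": "..." pair.
sa_email_from_key() {
  sed -n 's/.*"client_email"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# KEY_FILE is a placeholder path to the key you copied to the VM.
KEY_FILE="${KEY_FILE:-sa-key.json}"

if [ -f "$KEY_FILE" ]; then
  ACCOUNT="$(sa_email_from_key "$KEY_FILE")"
  # Print the command for review rather than executing it.
  echo "gcloud auth activate-service-account $ACCOUNT --key-file=$KEY_FILE"
else
  echo "Key file not found: $KEY_FILE" >&2
fi
```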
Still from the SSH command line, install NFS:
sudo apt-get -y update && sudo apt-get install nfs-common
Make a local directory to map to the Filestore file share. When you repeat these steps for subsequent Compute Engine VM instances, use the same name and path:
sudo mkdir -p MY_DIRECTORY
where:
- MY_DIRECTORY is the name of the local POSIX directory for the Compute Engine VM instance. For example, /usr/local/my_dir.
Mount the file share associated with the Filestore instance by running the mount command. You can use any NFS mount options. For the best performance, see the NFS mount recommendations in Mounting a file share on a Compute Engine VM instance:
sudo mount -o rw IP_ADDRESS:/FILE_SHARE MY_DIRECTORY
where:
IP_ADDRESS is the IP address for the Filestore instance. This can be found from the Filestore instances page.
FILE_SHARE is the name of the file share on the instance. For example, my_fs_instance.
MY_DIRECTORY is the name of the directory you mapped to in the previous step. This is a directory on the Compute Engine VM instance where you want to mount the Filestore instance.
Confirm the mount point:
mount -l | grep nfs
This returns the following or similar:
10.66.55.194:/my_fs_instance on /home/usr/my_dir type nfs (rw,relatime,vers=3,rsize=262144,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.66.55.194,mountvers=3,mountport=2050,mountproto=udp,local_lock=none,addr=10.66.55.194)
Alternatively, you can also use the following command:
df -h --type=nfs
This returns the following or similar:
Filesystem                     Size  Used  Avail  Use%  Mounted on
10.66.55.194:/my_fs_instance   1.0T     0   1.0T    0%  /home/usr/my_dir
Make note of the local POSIX directory path and save for a later step.
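Because the same install, mkdir, and mount steps are repeated on every client VM, it can help to collect them in one script. The following is a minimal sketch that reuses the example values from the sample output above; FILESTORE_IP, FILE_SHARE, MOUNT_DIR, run, and mount_share are illustrative names, and the -y flag on the install command is an addition for unattended runs. Commands are printed by default and executed only when DRY_RUN=0.

```shell
#!/bin/sh
# Placeholders taken from the sample output above; replace with your values.
FILESTORE_IP="${FILESTORE_IP:-10.66.55.194}"
FILE_SHARE="${FILE_SHARE:-my_fs_instance}"
MOUNT_DIR="${MOUNT_DIR:-/home/usr/my_dir}"

# Print each command by default; set DRY_RUN=0 to execute it instead.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Install the NFS client, create the mount point, mount, then verify.
mount_share() {
  run sudo apt-get -y update
  run sudo apt-get install -y nfs-common
  run sudo mkdir -p "$MOUNT_DIR"
  run sudo mount -o rw "${FILESTORE_IP}:/${FILE_SHARE}" "$MOUNT_DIR"
  run df -h --type=nfs
}

mount_share
```

Using the same MOUNT_DIR on every VM keeps the destination path identical across all agents.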
Repeat the previous steps to create three more Compute Engine VM instances and mount the same Filestore instance to each. Use the same service account to manage all four Compute Engine VMs. Temporarily save a local copy of the service account key to each VM.
Configure Storage Transfer Service
Authorize the service agent for all Storage Transfer Service features.
Enter the following command:
gcloud transfer authorize --add-missing --creds-file=KEY_FILE
where:
- KEY_FILE is the relative local path to the key file you copied earlier. For example, sa-key.json.
Note the returned notification regarding the service agent and save the associated email address for the next step.
After a few minutes, you should see the service agent in the IAM page. Once propagated, verify the following roles are assigned:
Pub/Sub Editor
Storage Admin
Install transfer agents.
Each Storage Transfer Service agent requires 4 vCPU and 8 GB RAM.
We recommend installing multiple agents to maximize fault tolerance and to take advantage of the dynamic scaling offered by Storage Transfer Service. The following example shows how to install three agents on a client machine. From the SSH command line, run the following command:
gcloud transfer agents install --pool=MY_AGENT_POOL --count=3 \
  --creds-file=MY_SERVICE_ACCOUNT_KEY_FILE
where:
MY_AGENT_POOL is the name of the agent pool you previously created. For example, my-agent-pool.
MY_SERVICE_ACCOUNT_KEY_FILE is the relative path to the service account key. For example, /relative/path/to/service-account-key.json.
Repeat these steps for each client machine.
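As a quick sanity check on sizing, the per-agent requirement of 4 vCPU and 8 GB RAM gives an upper bound on how many agents a machine can host. The sketch below is plain arithmetic with an illustrative helper name, agents_per_vm; it ignores operating system overhead, so treat the result as a ceiling rather than a recommendation.

```shell
#!/bin/sh
# agents_per_vm VCPUS MEM_GB
# Upper bound on agents for one VM, at 4 vCPU and 8 GB RAM per agent.
agents_per_vm() {
  by_cpu=$(( $1 / 4 ))
  by_mem=$(( $2 / 8 ))
  # The scarcer resource sets the limit.
  if [ "$by_cpu" -lt "$by_mem" ]; then
    echo "$by_cpu"
  else
    echo "$by_mem"
  fi
}

# An e2-standard-32 machine provides 32 vCPUs and 128 GB of memory.
agents_per_vm 32 128   # prints 8

# Total agents in this guide: four client VMs with three agents each.
echo $(( 4 * 3 ))      # prints 12
```

The three agents per VM used in this guide therefore fit comfortably within an e2-standard-32 machine.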
Create and initiate the transfer job
- Create a transfer job to move data from your Cloud Storage bucket to your Filestore instance.
Reference the local POSIX directory you saved earlier to specify the destination path. For example, /home/usr/my_dir.
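From the command line, a job of this shape might be created as sketched below. This is an assumption-laden illustration, not the documented procedure: it assumes that gcloud transfer jobs create accepts a gs:// source, a posix:// destination, and a --destination-agent-pool flag, and BUCKET, MOUNT_DIR, AGENT_POOL, run, and create_job are hypothetical names. Check gcloud transfer jobs create --help before running it; by default the script only prints the command.

```shell
#!/bin/sh
# Placeholders; replace with your bucket, mount directory, and agent pool.
BUCKET="${BUCKET:-my-bucket}"
MOUNT_DIR="${MOUNT_DIR:-/home/usr/my_dir}"
AGENT_POOL="${AGENT_POOL:-my-agent-pool}"

# Print the command by default; set DRY_RUN=0 to execute it instead.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# MOUNT_DIR is absolute, so the destination becomes posix:///home/usr/my_dir.
create_job() {
  run gcloud transfer jobs create "gs://${BUCKET}" "posix://${MOUNT_DIR}" \
    --destination-agent-pool="$AGENT_POOL"
}

create_job
```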
Monitor transfer status
Console
Monitor the status of your transfer from the Transfer jobs page of the Google Cloud console.
Command line
You can monitor status using the command line:
gcloud transfer jobs monitor JOB_NAME
where:
- JOB_NAME is the name of your transfer job. For example, transferJobs/OPI6300379522015192941.
The response shows the following or similar:
Polling for latest operation name...done.
Operation name: my-sts-project_transferJobs/OPI6300379522015192941_0000000001660692377
Parent job: OPI6300379522015192941
Start time: 2022-08-16T23:26:17.600981Z
SUCCESS | 100% (731.9MiB of 731.9MiB) | Skipped: 129.8kiB | Errors: 0
End time: 2022-08-16T23:27:23.429472Z
For more information, see Monitor agent activity or File system transfer details.
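If you want to act on the result in a script, the summary line of the monitor output can be parsed with standard tools. The sketch below assumes the output keeps the shape of the sample response above (a status word, a percentage, and an Errors count separated by pipes); transfer_status and transfer_errors are illustrative helper names, and other output formats are not handled.

```shell
#!/bin/sh
# Read monitor output on stdin and print the status word of the summary line.
transfer_status() {
  sed -n 's/^\([A-Z_]*\) | [0-9]*%.*/\1/p'
}

# Read monitor output on stdin and print the error count of the summary line.
transfer_errors() {
  sed -n 's/.*| Errors: \([0-9][0-9]*\).*/\1/p'
}

# Summary line copied from the sample response above.
SUMMARY='SUCCESS | 100% (731.9MiB of 731.9MiB) | Skipped: 129.8kiB | Errors: 0'

echo "$SUMMARY" | transfer_status   # prints SUCCESS
echo "$SUMMARY" | transfer_errors   # prints 0
```

In practice you would pipe the output of gcloud transfer jobs monitor JOB_NAME into these helpers instead of the hard-coded sample line.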
What's next
- Improve performance across Google Cloud resources.
- Create a Compute Engine VM instance with access to other Google Cloud services.