Developers & Practitioners

How to transfer your data to Google Cloud

April 28, 2021
Priyanka Vergadia

Staff Developer Advocate, Google Cloud

So you’ve decided to migrate your business to the cloud—good call!

Now comes the question of transferring the data. Here’s what you need to know about transferring your data to Google Cloud, and what tools are available.

Any number of factors can motivate your need to move data into Google Cloud, including data center migration, machine learning, content storage and delivery, and backup and archival requirements. When moving data between locations it's important to think about reliability, predictability, scalability, security, and manageability. Google Cloud provides four major transfer solutions that meet these requirements across a variety of use cases.

https://storage.googleapis.com/gweb-cloudblog-publish/images/Data-Transfer-Service_v03-30-21.max-2000x2000.jpeg

Google Cloud Data Transfer Options

You can get your data into Google Cloud using any of four major approaches:

  1. Cloud Storage transfer tools—These tools help you upload data directly from your computer to Cloud Storage, and are typically used for small transfers of up to a few TB. They include the Google Cloud Console UI, the JSON API, and the gsutil command-line interface. gsutil is an open-source command-line utility for scripted transfers from your shell, and it also lets you manage Cloud Storage buckets. It supports rsync mode for incremental copies, streaming mode for piping script output into a bucket, and multi-threaded/multi-process operation for large data moves. Use it in place of the UNIX cp (copy) command, which is not multithreaded.
  2. Storage Transfer Service—This service lets you quickly import online data into Cloud Storage from other clouds, from on-premises sources, or from one Cloud Storage bucket to another, and it scales to tens of Gbps. You can set up recurring transfer jobs to save time and resources, and you can automate the creation and management of transfer jobs with the Storage Transfer API or client libraries in the language of your choice. Compared to gsutil, Storage Transfer Service is a managed solution that handles retries and provides detailed transfer logging. Transfers are fast because the data moves over high-bandwidth network pipes, and the on-premises transfer service minimizes transfer time by using the maximum available bandwidth and applying performance optimizations.
  3. Transfer Appliance—This is a great option if you want to migrate a large dataset and don’t have much bandwidth to spare. Transfer Appliance enables seamless, secure, and speedy data transfer to Google Cloud. For example, a 1 PB transfer can be completed in just over 40 days using Transfer Appliance, as compared to the roughly three years it would take to complete an online transfer over a typical 100 Mbps network. Transfer Appliance is a physical box that comes in two form factors: TA40 (40 TB) and TA300 (300 TB). The process is simple. First, you order the appliance through the Cloud Console. Once it is shipped to you, you copy your data to the appliance (via a file copy over NFS), where it is encrypted and secured. Finally, you ship the appliance back to Google, the data is transferred into your Cloud Storage bucket, and then it is erased from the appliance. Transfer Appliance is highly performant because it uses all solid-state drives, minimal software, and multiple network connectivity options.
  4. BigQuery Data Transfer Service—With this option your analytics team can lay the foundation for a BigQuery data warehouse without writing a single line of code. It automates data movement into BigQuery on a scheduled, managed basis. It supports several third-party sources along with transfers from Google SaaS apps, external cloud storage providers, and data warehouses such as Teradata and Amazon Redshift. Once the data is in, you can use it right inside BigQuery for analytics, machine learning, or just warehousing.
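To make option 1 concrete, here is what the gsutil workflows look like on the command line. The bucket name and local paths below are placeholders:

```shell
# Parallel (-m), recursive (-r) upload of a local directory to a bucket.
gsutil -m cp -r ./local-data gs://my-example-bucket/data

# Incremental sync: only objects that changed since the last run are copied.
gsutil -m rsync -r ./local-data gs://my-example-bucket/data

# Streaming mode: pipe a script's output straight into an object ("-" reads stdin).
./nightly-report.sh | gsutil cp - gs://my-example-bucket/reports/latest.txt
```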
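Storage Transfer Service jobs (option 2) can be created from the console, the API, or the CLI. The sketch below uses the gcloud transfer command surface; the bucket names and credentials file are illustrative placeholders, and flag names may vary by gcloud version, so check `gcloud transfer jobs create --help`:

```shell
# One-off transfer from an Amazon S3 bucket into Cloud Storage.
gcloud transfer jobs create s3://source-bucket gs://destination-bucket \
    --source-creds-file=aws-creds.json

# Recurring daily transfer from one Cloud Storage bucket to another.
gcloud transfer jobs create gs://staging-bucket gs://archive-bucket \
    --schedule-repeats-every=1d
```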
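The "roughly three years at 100 Mbps" figure in option 3 is easy to sanity-check with shell arithmetic. The sketch below uses decimal units and assumes the link runs at full line rate; real-world protocol overhead and contention push the raw ~2.5 years toward three:

```shell
# 1 PB = 10^15 bytes = 8 * 10^15 bits; a 100 Mbps link moves 10^8 bits/s.
DATA_BITS=$((8 * 10**15))
RATE_BPS=$((100 * 10**6))
DAYS=$(( DATA_BITS / RATE_BPS / 86400 ))   # seconds needed, converted to days
echo "${DAYS} days"   # about 925 days (~2.5 years) at full line rate
```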
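And for option 4, a scheduled transfer can also be configured from the bq CLI. The dataset name, bucket path, and parameter values below are illustrative placeholders, not a definitive configuration:

```shell
# Create a recurring BigQuery Data Transfer Service config that loads
# CSV exports from Cloud Storage into a BigQuery table.
bq mk --transfer_config \
    --target_dataset=analytics \
    --display_name="Daily GCS load" \
    --data_source=google_cloud_storage \
    --params='{"data_path_template":"gs://my-example-bucket/exports/*.csv","destination_table_name_template":"events","file_format":"CSV"}'
```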

Conclusion

Whatever your use case for data transfer may be, getting it done fast, reliably, securely, and consistently is important. And no matter how much data you have to move, where it’s located, or how much bandwidth you have, there is an option that can work for you. For a more in-depth look, check out the documentation.

For more #GCPSketchnote, follow the GitHub repo. For similar cloud content, follow me on Twitter @pvergadia and keep an eye on thecloudgirl.dev.
