This page documents known limitations of Cloud Storage and Transfer for on-premises.
Cloud Storage 5TB object size limit
Cloud Storage supports a maximum single-object size of 5 terabytes. If you have objects larger than 5TB, the transfer fails for those objects, whether you use Storage Transfer Service or Transfer for on-premises.
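As a pre-flight check, you can scan an on-premises source for files that exceed the limit before starting a transfer. A minimal sketch, assuming a hypothetical source path:

```python
import os

# Minimal sketch: flag files that exceed the Cloud Storage object size
# limit before starting a transfer. SOURCE_DIR is an illustrative path.
SOURCE_DIR = "/data/export"
LIMIT_BYTES = 5 * 1000**4  # 5 TB, per the limit described above

for root, _dirs, files in os.walk(SOURCE_DIR):
    for name in files:
        path = os.path.join(root, name)
        try:
            size = os.path.getsize(path)
        except OSError:
            continue  # skip entries that can't be stat'ed
        if size > LIMIT_BYTES:
            print(f"exceeds the object size limit ({size} bytes): {path}")
```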
Cloud Storage object naming requirements
Cloud Storage imposes object name requirements that apply to all Storage Transfer Service transfers.
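As a rough pre-check, the following sketch validates names against the core requirements documented for Cloud Storage (valid UTF-8, 1-1024 bytes when encoded, no line feed or carriage return, not "." or ".."); confirm the full, current list in the Cloud Storage documentation:

```python
# Minimal sketch of a pre-check against the core object naming rules
# documented for Cloud Storage; confirm the full, current list in the
# Cloud Storage documentation before relying on this.
def is_valid_object_name(name: str) -> bool:
    try:
        encoded = name.encode("utf-8")  # names must be valid UTF-8
    except UnicodeEncodeError:
        return False
    if not 1 <= len(encoded) <= 1024:   # 1-1024 bytes when UTF-8 encoded
        return False
    if "\n" in name or "\r" in name:    # no line feed or carriage return
        return False
    if name in (".", ".."):             # these names are not allowed
        return False
    return True

print(is_valid_object_name("logs/2024/app.log"))  # True
print(is_valid_object_name("bad\nname"))          # False
```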
Changed objects aren't transferred
If an object's data is updated during a transfer, Storage Transfer Service responds as follows:
- Transfers from non-Google clouds to Google Cloud: Storage Transfer Service fails the transfer for that particular object, and the object isn't transferred.
- Transfers from on-premises to Google Cloud: Transfer for on-premises attempts the upload again. If the upload fails multiple times, Transfer for on-premises logs a FILE_MODIFIED_FAILURE. For more information, see Troubleshooting Transfer for on-premises.
- Transfers from Google Cloud to on-premises: Transfer for on-premises attempts the download again. If the download fails multiple times, Transfer for on-premises logs a PRECONDITION_FAILURE. For more information, see Troubleshooting Transfer for on-premises.
To resolve the failure:
- Attempt the transfer again.
- If the object's transfer continues to fail, ensure that its data cannot be updated during the transfer. After the transfer completes, you can re-enable updates to the object.
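On a POSIX file system, one way to prevent updates during the transfer is to make the file read-only and restore its permissions afterwards. A minimal sketch, assuming a hypothetical source path:

```python
import os
import stat

# Minimal sketch: drop write permissions so the file can't change while
# it transfers, then restore them afterwards. PATH is an illustrative
# placeholder for a file in your transfer source.
PATH = "/data/export/report.bin"

original_mode = os.stat(PATH).st_mode
read_only = original_mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)

os.chmod(PATH, read_only)  # block updates before the transfer starts
try:
    pass  # start and monitor the transfer here
finally:
    os.chmod(PATH, original_mode)  # re-enable updates after it completes
```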
Folders in Cloud Storage
Cloud Storage objects reside within a flat namespace within a bucket; for more information, see Object name considerations. Because of this, Storage Transfer Service doesn't create hierarchical namespaces in Cloud Storage. For example, if you're transferring from Azure Data Lake Storage (ADLS) Gen 2, Storage Transfer Service doesn't recreate the ADLS Gen 2 namespaces in Cloud Storage.
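To see the flat namespace directly, you can list objects with a delimiter, which simulates one level of folder hierarchy client-side. A sketch assuming the google-cloud-storage Python client, with illustrative bucket and prefix names:

```python
from google.cloud import storage  # pip install google-cloud-storage

client = storage.Client()

# Objects live in a flat namespace; "folders" are just name prefixes.
# Listing with a delimiter simulates one level of hierarchy client-side.
blobs = client.list_blobs("my-bucket", prefix="adls-export/", delimiter="/")
for blob in blobs:
    print("object:", blob.name)      # e.g. "adls-export/file.csv"
print("prefixes:", blobs.prefixes)   # populated only after iteration
```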
Deleting objects in versioning-suspended Amazon S3 buckets
When using Storage Transfer Service's delete objects from source after transfer feature on a versioning-suspended Amazon S3 bucket, Storage Transfer Service removes the version of the object that has a null version ID, not the current version.
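Before relying on this feature with a versioning-suspended bucket, you may want to inspect which version of an object carries the null version ID. A sketch assuming the boto3 library, with illustrative bucket and key names:

```python
import boto3  # pip install boto3

# Minimal sketch: inspect version IDs for an object in a
# versioning-suspended bucket. Bucket and key names are illustrative.
s3 = boto3.client("s3")
response = s3.list_object_versions(Bucket="my-bucket", Prefix="data/file.bin")
for version in response.get("Versions", []):
    # Objects written while versioning is suspended carry the literal
    # version ID "null"; older versions keep their original IDs.
    print(version["Key"], version["VersionId"], "latest:", version["IsLatest"])
```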
Location of Storage Transfer Service jobs
Storage Transfer Service chooses a job's location based on the region of the source Cloud Storage bucket. Currently, we create Storage Transfer Service jobs only in a limited set of supported locations. This list may change as Storage Transfer Service adds support for new regions.
If your source Cloud Storage bucket is located in a region outside this set, we choose the default region within its broader geographic region.
Known limitations of Transfer for on-premises
No real-time support
Transfer service for on-premises data does not support sub-hourly change detection. It is a batch data movement service that can scan the source at most once an hour.
Supported operating system configurations
Transfer for on-premises agents require Docker to be installed, and they run on Linux servers or virtual machines (VMs). To copy data from a CIFS or SMB file system, mount the volume on a Linux server or VM and run the agent from that server or VM.
Memory requirements
The following are memory requirements for Transfer service for on-premises data agents:
- Minimum memory: 1GiB
- Minimum memory to support high-performance uploads: 6GiB
Transfer service for on-premises data supports individual transfers that are:
- Hundreds of terabytes in size
- Up to 1 billion files
- Several tens of Gbps in transfer speed
Individual transfers larger than these limits are reliable, but have not been tested for performance.
If you have a larger data set than these limits, we recommend that you split your data across multiple transfer jobs.
We currently support large directories, as long as every agent has at least 1GB of memory available for every 1 million files in the largest directory. This headroom lets us iterate over the directory contents without exceeding memory.
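For example, under this rule (the file count is an illustrative assumption):

```python
# Worked example of the sizing rule above: at least 1GB of agent memory
# per 1 million files in the largest single directory. The file count
# below is an illustrative assumption.
files_in_largest_dir = 25_000_000
min_agent_memory_gb = max(1, files_in_largest_dir / 1_000_000)
print(f"each agent needs at least {min_agent_memory_gb:.0f}GB of memory")
# -> each agent needs at least 25GB of memory
```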
We support up to 100 agents for a single transfer project. It is unlikely that you'll need more agents to achieve better performance given typical on-premises environments.
Single directory per job
We support transferring only the full contents of a file system directory (recursively). You can partition the transfer by creating multiple jobs that transfer different subdirectories of your dataset, but we don't currently support file globbing or filtering within a single job.
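One way to partition is to create one job per top-level subdirectory, as in this sketch (the path and the job-creation helper are hypothetical):

```python
import os

# Minimal sketch: one transfer job per top-level subdirectory.
# SOURCE_ROOT is an illustrative path; create_transfer_job is a
# hypothetical stand-in for however you create jobs.
SOURCE_ROOT = "/data/export"

def create_transfer_job(source_directory: str) -> None:
    print("would create a transfer job for:", source_directory)

for entry in sorted(os.scandir(SOURCE_ROOT), key=lambda e: e.name):
    if entry.is_dir(follow_symlinks=False):
        create_transfer_job(entry.path)
```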
For more information about how Transfer service for on-premises data differs from other Cloud Storage transfer options, see Differences between Cloud Storage transfer options.
Supported file names
We expect that file names are Unicode-compatible and don't contain newlines. If your source directory contains file names with newlines, the file listing task for that directory fails.
If this occurs, replace any newlines in your file names and re-run the job.
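A minimal sketch that finds and renames such files, assuming a hypothetical source path and an underscore as the replacement character:

```python
import os

# Minimal sketch: rename files whose names contain newlines before
# re-running the job. SOURCE_DIR and the replacement character are
# illustrative choices.
SOURCE_DIR = "/data/export"

for root, _dirs, files in os.walk(SOURCE_DIR):
    for name in files:
        if "\n" in name or "\r" in name:
            fixed = name.replace("\n", "_").replace("\r", "_")
            os.rename(os.path.join(root, name), os.path.join(root, fixed))
```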
Supported file types
Transfer service for on-premises data supports transferring regular files and Unix-style hidden files, which are files whose names begin with a . character. When Transfer service for on-premises data encounters a non-regular file, such as a device, named pipe, or socket, it skips the file.
Empty directories are not created in Cloud Storage, because objects don't reside within subdirectories within a bucket. For more information, see Object name considerations.
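To find non-regular files before starting a transfer, you can check file modes with os.lstat. A minimal sketch, assuming a hypothetical source path:

```python
import os
import stat

# Minimal sketch: report non-regular files (devices, named pipes,
# sockets) in the source before starting a transfer. SOURCE_DIR is an
# illustrative path. Note that symbolic links are also reported here.
SOURCE_DIR = "/data/export"

for root, _dirs, files in os.walk(SOURCE_DIR):
    for name in files:
        path = os.path.join(root, name)
        mode = os.lstat(path).st_mode
        if not stat.S_ISREG(mode):
            print("non-regular file:", path)
```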
Maximum path length
Transfer service for on-premises data follows Cloud Storage's maximum object name length of 1024 bytes. The object prefix for the destination object counts toward this limit, because the prefix is incorporated into the object's name in Cloud Storage.
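A sketch that pre-checks destination object name lengths, assuming a hypothetical prefix and source path:

```python
import os

# Minimal sketch: verify that each destination object name (prefix plus
# the file's relative path) fits within the 1024-byte limit. The prefix
# and source path are illustrative.
DESTINATION_PREFIX = "backups/2024/"
SOURCE_DIR = "/data/export"

for root, _dirs, files in os.walk(SOURCE_DIR):
    for name in files:
        rel = os.path.relpath(os.path.join(root, name), SOURCE_DIR)
        object_name = DESTINATION_PREFIX + rel.replace(os.sep, "/")
        if len(object_name.encode("utf-8")) > 1024:
            print("object name too long:", object_name)
```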
Supported file metadata
See Metadata preservation for details on which metadata is preserved, either by default or optionally.
Extended job pauses
Jobs that are paused for more than 30 days are considered inactive. When a job becomes inactive, the paused job is aborted and the job configuration's schedule is disabled. No new job runs start unless you explicitly enable the job again.
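To re-enable a job programmatically, you can patch its status back to ENABLED. A sketch assuming the google-cloud-storage-transfer Python client, with placeholder job name and project ID; verify the request shape against the client library reference:

```python
from google.cloud import storage_transfer  # pip install google-cloud-storage-transfer

# Minimal sketch: patch a disabled job's status back to ENABLED. The job
# name and project ID are placeholders.
client = storage_transfer.StorageTransferServiceClient()
client.update_transfer_job(
    {
        "job_name": "transferJobs/example-job-id",
        "project_id": "my-project",
        "transfer_job": {"status": "ENABLED"},
        "update_transfer_job_field_mask": {"paths": ["status"]},
    }
)
```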