This page documents known limitations of Cloud Storage and Transfer for on-premises.
Cloud Storage 5TB object size limit
Cloud Storage supports a maximum single-object size of up to 5 terabytes. If you have objects larger than 5TB, the transfer of those objects fails for both Cloud Storage and Transfer for on-premises.
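As a pre-flight check, you could scan the source directory for files that exceed the size limit before starting a transfer. The following is a minimal sketch, not part of Storage Transfer Service. The function names are hypothetical, and the limit is assumed to be 5 TiB (5 * 2**40 bytes); adjust the constant if your interpretation of the limit differs.

```python
import os

# Assumption: the 5TB limit is interpreted as 5 TiB (5 * 2**40 bytes).
MAX_OBJECT_BYTES = 5 * 2**40


def is_oversized(size_bytes, limit=MAX_OBJECT_BYTES):
    """Return True if a file of this size is larger than the limit."""
    return size_bytes > limit


def find_oversized_files(root, limit=MAX_OBJECT_BYTES):
    """Walk root and return (path, size) for every file larger than limit."""
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Skip entries that disappear or cannot be read during the scan.
                continue
            if is_oversized(size, limit):
                oversized.append((path, size))
    return oversized
```

Any path that this scan reports would need to be split or excluded before the transfer.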
Cloud Storage object naming requirements
Cloud Storage imposes object name requirements that apply to all Storage Transfer Service transfers.
Changed objects aren't transferred
If an object's data is updated during a transfer, the following describes how Storage Transfer Service responds:
- Transfers from non-Google clouds to Google Cloud: If an object's data is updated during a transfer, Storage Transfer Service fails the transfer for that particular object and the object isn't transferred.
- Transfers from on-premises to Google Cloud: If an object's data is updated during a transfer, Transfer for on-premises attempts the upload again. If the upload fails multiple times, Transfer for on-premises logs a FILE_MODIFIED_FAILURE. For more information, see Troubleshooting Transfer for on-premises.
- Transfers from Google Cloud to on-premises: If an object's data is updated during a transfer, Transfer for on-premises attempts the download again. If the download fails multiple times, Transfer for on-premises logs a PRECONDITION_FAILURE. For more information, see Troubleshooting Transfer for on-premises.
To resolve the failure:
- Attempt the transfer again.
- If the object's transfer continues to fail, ensure that its data cannot be updated during the transfer. After the transfer completes, you can re-enable updates to the object.
Folders in Cloud Storage
Cloud Storage objects reside within a flat namespace within a bucket. For more information, see Object name considerations. Due to this, Storage Transfer Service doesn't create hierarchical namespaces within Cloud Storage. For instance, if you're transferring from Azure Data Lake Storage (ADLS) Gen 2, then Storage Transfer Service does not recreate the ADLS Gen 2 namespaces in Cloud Storage.
Known limitations of Transfer for on-premises
No real-time support
Transfer service for on-premises data does not support sub-hourly change detection. Transfer service for on-premises data is a batch data movement service that can scan the source with a frequency of up to once an hour.
Supported operating system configurations
Transfer for on-premises agents require Docker to be installed, and they run on Linux servers or virtual machines (VMs). To copy data on a CIFS or SMB filesystem, you can mount the volume on a Linux server or VM and then run the agent from that Linux server or VM.
Memory requirements
The following are memory requirements for Transfer service for on-premises data agents:
- Minimum memory: 1GiB
- Minimum memory to support high-performance uploads: 6GiB
Transfer service for on-premises data supports individual transfers that are:
- Hundreds of terabytes in size
- Up to 1 billion files
- Several 10s of Gbps in transfer speed
Individual transfers greater than these sizes are reliable, but have not been tested for performance.
If you have a larger data set than these limits, we recommend that you split your data across multiple transfer jobs.
We currently support large directories, as long as every agent has at least 1GB of memory available for every 1 million files in the largest directory, so we can iterate over the directory contents without exceeding memory.
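The memory guideline above can be turned into a rough sizing estimate. The following is a minimal sketch with hypothetical function names. It assumes that "largest directory" means the directory with the most direct file entries, and that the result is never lower than the 1 GB agent minimum.

```python
import math
import os

# From the guideline above: 1 GB of memory per 1 million files
# in the largest directory.
FILES_PER_GB = 1_000_000


def largest_directory_file_count(root):
    """Return the highest number of files found directly in any one directory."""
    largest = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        largest = max(largest, len(filenames))
    return largest


def required_memory_gb(file_count):
    """Estimate per-agent memory in GB, rounded up.

    Assumption: the estimate never drops below the 1 GB agent minimum.
    """
    return max(1, math.ceil(file_count / FILES_PER_GB))
```

For example, a largest directory holding 3.5 million files yields an estimate of 4 GB per agent.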
We support up to 100 agents for a single transfer project. It is unlikely that you'll need more agents to achieve better performance given typical on-premises environments.
Single directory per job
We support transferring only the full contents of a file system directory (recursively). You may partition the transfer by creating multiple jobs that transfer different subdirectories of your dataset, but we currently do not support file globbing or filtering within a single job.
For more information about the differences between Transfer service for on-premises data and Cloud Storage, see Differences between Cloud Storage transfer options.
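One way to plan the partitioning described above is to list the immediate subdirectories of the dataset root and create one job per subdirectory. The following is a minimal planning sketch with a hypothetical function name. It only inspects the local file system and does not call any Storage Transfer Service API.

```python
import os


def plan_jobs_by_subdirectory(root):
    """Return (subdirs, top_level_files) for the dataset root.

    Each entry in subdirs is a candidate source directory for its own job.
    Files that sit directly in root are returned separately, because they
    are not covered by any per-subdirectory job.
    """
    subdirs = []
    top_level_files = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            else:
                top_level_files.append(entry.path)
    return sorted(subdirs), sorted(top_level_files)
```

If the second list is not empty, those files need a separate arrangement, such as moving them into their own subdirectory.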
Uniform file system access for agents
Transfer service for on-premises data assumes that all running agents have equal permissions to read data from the source file system, and that every job within a Google Cloud project has the same access to the source file system.
If you need to run agents across multiple data centers that have different permissions set on source directories, you must segregate the jobs and agents to different Google Cloud projects. Each job in turn has a different Pub/Sub topic and subscription to communicate with that job's set of agents.
Supported file names
We expect that file names are Unicode-compatible and don't contain newlines. If your source directory contains file names with newlines, the file listing task for that directory fails.
If this occurs, replace any newlines in your file names and re-run the job.
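To locate the offending names before re-running a job, you could scan the source directory. The following is a minimal sketch with hypothetical function names. Treating a carriage return as a newline is a conservative assumption that this page does not state.

```python
import os


def has_newline(name):
    """Return True if a file or directory name contains a newline.

    Assumption: carriage returns are flagged as well as line feeds.
    """
    return "\n" in name or "\r" in name


def find_names_with_newlines(root):
    """Walk root and return every path whose final component has a newline."""
    offenders = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if has_newline(name):
                offenders.append(os.path.join(dirpath, name))
    return offenders
```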
Supported file types
Transfer service for on-premises data supports transferring regular files and Unix-like hidden files.
Unix-style hidden files are files that start with a . character. When Transfer service for on-premises data encounters a non-regular file, such as a device, named pipe, or socket, it
Empty directories are not created in Cloud Storage, because objects don't reside within subdirectories within a bucket. For more information, see Object name considerations.
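To see which entries in a source directory are not regular files, you could classify them by file mode before starting a transfer. The following is a minimal sketch that uses Python's standard stat module, with hypothetical function names. It also reports symbolic links, although this page does not state how the service handles them.

```python
import os
import stat


def classify_mode(mode):
    """Map a st_mode value to a short description of the file type."""
    if stat.S_ISREG(mode):
        return "regular"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISFIFO(mode):
        return "named pipe"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
        return "device"
    return "other"


def find_non_regular(root):
    """Walk root and return (path, kind) for every non-regular file entry."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat inspects the entry itself rather than a symlink target.
            kind = classify_mode(os.lstat(path).st_mode)
            if kind != "regular":
                found.append((path, kind))
    return found
```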
Maximum path length
Transfer service for on-premises data follows Cloud Storage's maximum path length of 1024 bytes. The object prefix for the destination object is included in the length limitation, as the prefix is incorporated in the object's name in Cloud Storage.
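Because the limit is measured in bytes and includes the destination prefix, a quick length check can catch problem paths early. The following is a minimal sketch with hypothetical function names. It assumes that the object name is the prefix followed directly by the file's relative path, encoded as UTF-8.

```python
# From the limit above: object names can be at most 1024 bytes.
MAX_OBJECT_NAME_BYTES = 1024


def object_name_bytes(prefix, relative_path):
    """Return the UTF-8 byte length of the assumed destination object name.

    Assumption: the object name is prefix + relative_path with no extra
    separator added.
    """
    return len((prefix + relative_path).encode("utf-8"))


def exceeds_limit(prefix, relative_path, limit=MAX_OBJECT_NAME_BYTES):
    """Return True if the assumed object name is longer than the limit."""
    return object_name_bytes(prefix, relative_path) > limit
```

Non-ASCII characters count as more than one byte each, so a path can exceed the limit with fewer than 1024 characters.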
Supported file metadata
Transfer service for on-premises data extracts the last modified time (mtime) from the source file to copy into the corresponding Cloud Storage destination object. Other file metadata is not preserved in the transfer.
Extended job pauses
Jobs that are paused for more than 30 days are considered inactive. When a job is inactive, the paused job is aborted and the job configuration schedule is disabled. No new job runs start unless you explicitly enable the job again.