This page documents known limitations of Cloud Storage and Transfer service for on-premises data.
Cloud Storage 5TB object size limit
Cloud Storage supports a maximum single-object size of 5 terabytes. If you have objects larger than 5TB, the transfer fails for those objects, for both Cloud Storage and Transfer for on-premises transfers.
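As a sketch, you could scan a source directory for oversize files before starting a transfer. This helper is an assumption for illustration, and it treats the 5TB limit as 5 × 2⁴⁰ bytes:

```python
import os

# Assumed threshold: the 5TB limit expressed in binary units (5 TiB).
MAX_OBJECT_BYTES = 5 * 2**40

def find_oversize_files(root):
    """Return paths under `root` whose size exceeds the object limit."""
    oversize = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) > MAX_OBJECT_BYTES:
                oversize.append(path)
    return oversize
```

Running this before creating a transfer job lets you exclude or split oversize files up front instead of discovering per-object failures mid-transfer.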
Cloud Storage object naming requirements
Cloud Storage imposes object name requirements that apply to all Storage Transfer Service transfers.
Changed objects aren't transferred
If an object's data is updated during a transfer, Storage Transfer Service fails the transfer for that object, and the object isn't transferred.
To resolve the failure:
- Attempt the transfer again.
- If the object's transfer continues to fail, prevent updates to its data for the duration of the transfer, for example by making the source file read-only.
After the transfer completes, you can re-enable updates to the object.
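For a POSIX source, one way to prevent updates during a transfer is to temporarily drop the file's write permission bits. A minimal sketch (the helper names are our own, not part of the service):

```python
import os
import stat

def set_read_only(path):
    """Remove all write bits so the file can't be modified mid-transfer."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))

def restore_writable(path):
    """Re-enable owner writes after the transfer completes."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IWUSR)
```

Call `set_read_only` before starting the transfer and `restore_writable` once it completes.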
Known limitations of Transfer for on-premises
Transfer service for on-premises data is accessible only via the Google Cloud Console. We don't currently offer an API interface to Transfer service for on-premises data jobs.
No real-time support
Transfer service for on-premises data does not support sub-hourly change detection. It is a batch data movement service that can scan the source at most once an hour.
Supported operating system configurations
Transfer for on-premises agents require Docker to be installed, and run on Linux servers or virtual machines (VMs). To copy data from a CIFS or SMB file system, mount the volume on a Linux server or VM and run the agent from that server or VM.
Transfer service for on-premises data supports individual transfers that are:
- Hundreds of terabytes in size
- Up to 1 billion files
- Several 10s of Gbps in transfer speed
Individual transfers larger than these limits should be reliable, but have not been tested for performance.
If you have a larger data set than these limits, we recommend that you split your data across multiple transfer jobs.
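One simple way to partition a dataset is to give each top-level subdirectory its own transfer job. A sketch of the planning step (creating the jobs themselves is left out, since there is no API for on-premises jobs):

```python
import os

def plan_job_sources(root):
    """Return one source path per top-level subdirectory of `root`,
    so each can be configured as the source of a separate transfer job."""
    return sorted(
        os.path.join(root, entry)
        for entry in os.listdir(root)
        if os.path.isdir(os.path.join(root, entry))
    )
```

Because each job transfers a directory's full contents recursively, partitioning by non-overlapping subdirectories keeps the jobs from copying the same files twice.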
We currently support large directories, as long as every agent has at least 1GB of memory available for every 1 million files in the largest directory, so that agents can iterate over a directory's contents without running out of memory.
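That rule works out to roughly 1GB of agent memory per million files, rounded up. For example:

```python
import math

def min_agent_memory_gb(files_in_largest_dir):
    """Minimum agent memory in GB: at least 1GB per 1 million files
    in the largest directory, rounded up."""
    return math.ceil(files_in_largest_dir / 1_000_000)

# A largest directory of 25 million files needs agents with
# at least 25GB of memory available.
```
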
We support up to 100 agents for a single transfer project. It is unlikely that you'll need more agents to achieve better performance given typical on-premises environments.
Single directory per job
We support transferring only the full contents of a file system directory (recursively). You can partition the transfer by creating multiple jobs that transfer different subdirectories of your dataset, but we don't currently support file globbing or filtering within a single job.
For more information about the differences between Transfer service for on-premises data and other Cloud Storage transfer options, see Differences between Cloud Storage transfer options.
Uniform file system access for agents
Transfer service for on-premises data assumes that all running agents have identical permissions to read data from the source file system, and that this access is the same for all jobs within a Google Cloud project.
If you need to run agents across multiple data centers that have different permissions set on source directories, you must segregate the jobs and agents to different Google Cloud projects. Each job will in turn have a different Pub/Sub topic and subscription to communicate with that job's set of agents.
Supported file names
We expect that file names are Unicode-compatible and don't contain newlines. If your source directory contains file names with newlines, the file listing task for that directory will fail.
If this occurs, replace any newlines in your file names and re-run the job.
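A sketch of renaming files whose names contain newlines; replacing each newline with an underscore is an arbitrary choice made for illustration:

```python
import os

def fix_newline_names(root):
    """Rename files under `root` whose names contain newlines,
    replacing each newline with an underscore."""
    renamed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if "\n" in name:
                new_name = name.replace("\n", "_")
                os.rename(os.path.join(dirpath, name),
                          os.path.join(dirpath, new_name))
                renamed.append(new_name)
    return renamed
```

After renaming, re-run the transfer job so the file listing task for that directory can succeed.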
Maximum path length
Transfer service for on-premises data follows Cloud Storage's maximum path length of 1024 bytes. The object prefix for the destination object is included in the length limitation, as the prefix is incorporated in the object's name in Cloud Storage.
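Because the limit is measured in bytes of the full object name (the destination prefix plus the file's path relative to the source directory), a sketch of the check might look like:

```python
MAX_OBJECT_NAME_BYTES = 1024

def object_name_fits(prefix, relative_path):
    """True if prefix + relative path stays within Cloud Storage's
    1024-byte object-name limit, measured after UTF-8 encoding."""
    return len((prefix + relative_path).encode("utf-8")) <= MAX_OBJECT_NAME_BYTES
```

Note that the length is counted in encoded bytes, not characters, so multi-byte Unicode file names reach the limit sooner.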
Supported file metadata
Transfer service for on-premises data extracts the last modified time (mtime) from the source file to copy into the corresponding Cloud Storage destination object. Other file metadata is not preserved in the transfer.
Extended job pauses
Jobs that are paused for more than 30 days are considered inactive. When a job is inactive, the paused job is aborted and the job configuration schedule is disabled. No new job runs will start unless you explicitly enable the job again.