cp - Copy files and objects
gsutil cp [OPTION]... src_url dst_url
gsutil cp [OPTION]... src_url... dst_url
gsutil cp [OPTION]... -I dst_url
The gsutil cp command allows you to copy data between your local file
system and the cloud, within the cloud, and between
cloud storage providers. For example, to upload all text files from the
local directory to a bucket, you can run:
gsutil cp *.txt gs://my-bucket
You can also download text files from a bucket:
gsutil cp gs://my-bucket/*.txt .
Use the -r option to copy an entire directory tree.
For example, to upload the directory tree dir:
gsutil cp -r dir gs://my-bucket
If you have a large number of files to transfer, you can perform a parallel
multi-threaded/multi-processing copy using the
-m option (see gsutil help options):
gsutil -m cp -r dir gs://my-bucket
You can use the
-I option with
stdin to specify a list of URLs to
copy, one per line. This allows you to use gsutil
in a pipeline to upload or download objects as generated by a program:
cat filelist | gsutil -m cp -I gs://my-bucket
cat filelist | gsutil -m cp -I ./download_dir
where the output of
cat filelist is a list of files, cloud URLs, and
wildcards of files and cloud URLs.
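One way to produce such a list is with find. The following is a minimal sketch rather than a recommended recipe: the helper name list_logs, the *.log pattern, and the ./logs directory in the usage comment are all assumptions made for illustration.

```shell
# list_logs DIR: print one path per line for every *.log file under DIR.
# The function name and the *.log pattern are illustrative assumptions.
list_logs() {
  find "$1" -type f -name '*.log' | LC_ALL=C sort
}

# Usage (requires gsutil and a bucket you can write to):
#   list_logs ./logs | gsutil -m cp -I gs://my-bucket
```

Because each path arrives as an individually named source, the naming rules for individually named files described in the next section apply.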
How Names Are Constructed
The gsutil cp command attempts to name objects in ways that are consistent with the
Linux cp command. This means that names are constructed depending
on whether you're performing a recursive directory copy or copying
individually-named objects, or whether you're copying to an existing or
non-existent subdirectory.
When you perform recursive directory copies, object names are constructed to
mirror the source directory structure starting at the point of recursive
processing. For example, if
dir1/dir2 contains the file
a/b/c, then the
following command creates the object gs://my-bucket/dir2/a/b/c:
gsutil cp -r dir1/dir2 gs://my-bucket
In contrast, copying individually-named files results in objects named by
the final path component of the source files. For example, assuming again that
dir1/dir2 contains a/b/c, the following command creates the object gs://my-bucket/c:
gsutil cp dir1/dir2/** gs://my-bucket
Note that in the above example, the '**' wildcard matches all names
anywhere under dir. The wildcard '*' matches names just one level deep. For
more details, see gsutil help wildcards.
The same rules apply for uploads and downloads: recursive copies of buckets and bucket subdirectories produce a mirrored filename structure, while copying individually named or wildcard-named objects produces flatly-named files.
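The two naming rules can be modeled as simple path manipulation. The sketch below only illustrates the rules as described here and is not how gsutil is implemented; the function names recursive_name and flat_name are hypothetical, and the destination is assumed to be a bucket root.

```shell
# recursive_name ROOT FILE DEST: name produced for FILE by a recursive copy
# of ROOT to DEST. The name mirrors the path starting at ROOT's last component.
recursive_name() {
  root=${1%/}
  dest=${3%/}
  base=$(basename "$root")
  rel=${2#"$root"/}
  echo "$dest/$base/$rel"
}

# flat_name FILE DEST: name produced when FILE is named individually or
# matched by a wildcard. Only the final path component is kept.
flat_name() {
  echo "${2%/}/$(basename "$1")"
}

# recursive_name dir1/dir2 dir1/dir2/a/b/c gs://my-bucket  -> gs://my-bucket/dir2/a/b/c
# flat_name dir1/dir2/a/b/c gs://my-bucket                 -> gs://my-bucket/c
```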
In addition, the resulting names depend on whether the destination subdirectory
exists. For example, if
gs://my-bucket/subdir exists as a subdirectory,
the following command creates the object gs://my-bucket/subdir/dir2/a/b/c:
gsutil cp -r dir1/dir2 gs://my-bucket/subdir
In contrast, if
gs://my-bucket/subdir does not exist, this same
command creates the object gs://my-bucket/subdir/a/b/c.
Copying To/From Subdirectories; Distributing Transfers Across Machines
You can use gsutil to copy to and from subdirectories by using a command like this:
gsutil cp -r dir gs://my-bucket/data
This causes dir and all of its files and nested subdirectories to be
copied under the specified destination, resulting in objects with names like
gs://my-bucket/data/dir/a/b/c. Similarly, you can download from bucket
subdirectories using the following command:
gsutil cp -r gs://my-bucket/data dir
This causes everything nested under
gs://my-bucket/data to be downloaded
into dir, resulting in files with names like dir/data/a/b/c.
Copying subdirectories is useful if you want to add data to an existing
bucket directory structure over time. It's also useful if you want
to parallelize uploads and downloads across multiple machines (potentially
reducing overall transfer time compared with running
cp on one machine). For example, if your bucket contains this structure:
gs://my-bucket/data/result_set_01/
gs://my-bucket/data/result_set_02/
...
gs://my-bucket/data/result_set_99/
you can perform concurrent downloads across 3 machines by running these commands on each machine, respectively:
gsutil -m cp -r gs://my-bucket/data/result_set_[0-3]* dir
gsutil -m cp -r gs://my-bucket/data/result_set_[4-6]* dir
gsutil -m cp -r gs://my-bucket/data/result_set_[7-9]* dir
dir could be a local directory on each machine, or a
directory mounted off of a shared file server. The performance of the latter
depends on several factors, so we recommend experimenting
to find out what works best for your computing environment.
Copying In The Cloud And Metadata Preservation
If both the source and destination URL are cloud URLs from the same
provider, gsutil copies data "in the cloud" (without downloading
to and uploading from the machine where you run gsutil). In addition to
the performance and cost advantages of doing this, copying in the cloud
preserves metadata such as
Cache-Control. In contrast,
when you download data from the cloud, it ends up in a file with
no associated metadata, unless you have some way to keep
or re-create that metadata.
Copies spanning locations and/or storage classes cause data to be rewritten in the cloud, which may take some time (but is still faster than downloading and re-uploading). Such operations can be resumed with the same command if they are interrupted, so long as the command parameters are identical.
Note that by default, the gsutil
cp command does not copy the object
ACL to the new object, and instead uses the default bucket ACL (see
gsutil help defacl). You can override this behavior with the -p option.
When copying in the cloud, if the destination bucket has Object Versioning
enabled, by default
gsutil cp copies only live versions of the
source object. For example, the following command causes only the single live
version of gs://bucket1/obj to be copied to
gs://bucket2, even if there
are noncurrent versions of gs://bucket1/obj:
gsutil cp gs://bucket1/obj gs://bucket2
To also copy noncurrent versions, use the -A flag:
gsutil cp -A gs://bucket1/obj gs://bucket2
The top-level gsutil
-m flag is not allowed when using the
cp -A flag, to
ensure that version ordering is preserved.
At the end of every upload or download, the
gsutil cp command validates that
the checksum it computes for the source file matches the checksum that
the service computes. If the checksums do not match, gsutil deletes the
corrupted object and prints a warning message. If this happens, contact
If you know the MD5 of a file before uploading, you can specify it in the Content-MD5 header, which enables the cloud storage service to reject the upload if the MD5 doesn't match the value computed by the service. For example:
% gsutil hash obj
Hashing obj:
Hashes [base64] for obj:
    Hash (crc32c): lIMoIw==
    Hash (md5): VgyllJgiiaRAbyUUIqDMmw==
% gsutil -h Content-MD5:VgyllJgiiaRAbyUUIqDMmw== cp obj gs://your-bucket/obj
Copying file://obj [Content-Type=text/plain]...
Uploading gs://your-bucket/obj: 182 b/182 B
If the checksums don't match, the service rejects the upload and gsutil prints a message like:
BadRequestException: 400 Provided MD5 hash "VgyllJgiiaRAbyUUIqDMmw==" doesn't match calculated MD5 hash "7gyllJgiiaRAbyUUIqDMmw==".
Specifying the Content-MD5 header has several advantages:
- It prevents the corrupted object from becoming visible. If you don't specify the header, the object is visible for 1-3 seconds before gsutil deletes it.
- If an object already exists with the given name, specifying the Content-MD5 header prevents the existing object from being replaced. Otherwise, the existing object is replaced by the corrupted object and deleted a few seconds later.
- If you don't specify the Content-MD5 header, it's possible for the gsutil process to complete the upload but then be interrupted or fail before it can delete the corrupted object, leaving the corrupted object in the cloud.
- It supports a customer-to-service integrity check handoff. For example,
if you have a content production pipeline that generates data to be
uploaded to the cloud along with checksums of that data, specifying the
MD5 computed by your content pipeline when you run
gsutil cp ensures that the checksums match all the way through the process. This way, you can detect if data gets corrupted on your local disk between the time it was written by your content pipeline and the time it was uploaded to Cloud Storage.
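If the pipeline that produces the data does not run gsutil hash, the header value can also be computed with standard tools. The following is a sketch that assumes the openssl command-line tool is installed; the helper name content_md5 is hypothetical. It prints the base64-encoded MD5 digest, which is the same format shown for Hash (md5) above.

```shell
# content_md5 FILE: print the base64-encoded MD5 digest of FILE,
# the format used for the Content-MD5 header value.
# Assumes the openssl command-line tool is available.
content_md5() {
  openssl dgst -md5 -binary "$1" | openssl base64
}

# Usage (requires gsutil; obj and the bucket name are placeholders):
#   gsutil -h "Content-MD5:$(content_md5 obj)" cp obj gs://your-bucket/obj
```

For the integrity handoff to be meaningful, compute the digest when the data is produced and store it alongside the file, rather than recomputing it immediately before the upload.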
The cp command retries when failures occur, but if enough failures happen
during a particular copy or delete operation, the
cp command skips that
object and moves on. If any failures were not successfully retried by the end
of the copy run, the
cp command reports the number of failures, and
exits with a non-zero status.
Note that there are cases where retrying never succeeds, such as if you have insufficient write permissions to the destination bucket, or if the destination path for an object is longer than the maximum allowed length.
For more details about gsutil's retry handling, see gsutil help retries.
gsutil automatically performs a resumable upload whenever you use the
cp command to upload an object that is larger than 8 MiB. You do not need to
specify any special command line options to make this happen. If your upload
is interrupted, you can restart the upload by running the same
cp command that
you used to start the upload. You can adjust the minimum size for performing
resumable uploads by changing the
resumable_threshold parameter in the
boto configuration file.
Until the upload has completed successfully, it is not visible at the destination object and does not replace any existing object the upload is intended to overwrite. However, parallel composite uploads may leave temporary component objects in place during the upload process. See Parallel Composite Uploads for more information.
Similarly, gsutil automatically performs resumable downloads using standard
HTTP Range GET operations whenever you use the
cp command, unless the
destination is a stream. During a resumable download, a partially downloaded
temporary file is visible in the destination directory. Upon completion, any
existing file at the destination is replaced with the downloaded contents.
Resumable uploads and downloads store state information in files under ~/.gsutil, named by the destination object or file. If you attempt to resume a transfer from a machine with a different directory, the transfer starts over from scratch.
See gsutil help prod for details on using resumable transfers in production.
Use '-' in place of src_url or dst_url to perform a streaming transfer. For example:
long_running_computation | gsutil cp - gs://my-bucket/obj
Streaming uploads using the JSON API (see gsutil help apis) are buffered in memory part-way back into the file and can thus retry in the event of network or service problems.
Streaming transfers using the XML API do not support resumable uploads or downloads. If you have a large amount of data to upload or download, over 100 MiB for example, we recommend that you write the data to a local file and copy that file rather than streaming it.
Sliced Object Downloads
gsutil uses HTTP Range GET requests to perform "sliced" downloads in parallel when downloading large objects from Cloud Storage. This means that disk space for the temporary download destination file is pre-allocated and byte ranges (slices) within the file are downloaded in parallel. Once all slices have completed downloading, the temporary file is renamed to the destination file. No additional local disk space is required for this operation.
This feature is only available for Cloud Storage objects because it requires a fast composable checksum (CRC32C) to verify the data integrity of the slices. Because sliced object downloads depend on CRC32C, they require a compiled crcmod on the machine performing the download. If compiled crcmod is not available, a non-sliced object download is performed instead.
Parallel Composite Uploads
gsutil can automatically use object composition
to perform uploads in parallel for large, local files being uploaded to
Cloud Storage. If enabled, a large file is split into
component pieces that are uploaded in parallel and composed in the cloud. The
temporary components are deleted afterwards. A file can be broken into as
many as 32 component pieces. Until this piece limit is reached, the maximum
size of each component piece is determined by the variable
"parallel_composite_upload_component_size," specified in the [GSUtil] section
of your .boto configuration file. For files that are otherwise too big,
components are as large as needed to fit into 32 pieces. No additional local
disk space is required for this operation. Parallel composite uploads are disabled
by default and cannot be used when uploading an object to a bucket that has a default
customer-managed encryption key.
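The sizing rule above can be worked through with integer arithmetic. This is a sketch of the rule as stated in this section, not gsutil's actual code; the function name component_plan is hypothetical, and sizes are plain byte counts.

```shell
# component_plan FILE_SIZE COMPONENT_SIZE
# Prints "<component count> <maximum bytes per component>" under the rule
# described above: components are at most COMPONENT_SIZE bytes until that
# would need more than 32 pieces, after which they grow to fit into 32.
component_plan() {
  size=$1
  comp=$2
  limit=32
  count=$(( (size + comp - 1) / comp ))
  if [ "$count" -gt "$limit" ]; then
    count=$limit
    comp=$(( (size + limit - 1) / limit ))
  fi
  echo "$count $comp"
}

# A 150 MiB file with 50 MiB components fits in 3 pieces:
#   component_plan 157286400 52428800   -> 3 52428800
# A file that would need 64 pieces is capped at 32 larger ones:
#   component_plan 6400 100             -> 32 200
```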
Using parallel composite uploads presents a tradeoff between upload performance and download configuration. Your uploads run faster if you enable parallel composite uploads, but crcmod is required to download objects that are uploaded through parallel composite uploads if you are using gsutil or other Python applications. You should only enable parallel composite uploads if:
- All users who need to download the data using gsutil or other Python applications can install crcmod.
- No gsutil or Python users need to download your objects.
For example, if you use gsutil to upload video assets that are only served by a Java application, it would make sense to enable parallel composite uploads on your machine, since there are efficient CRC32C implementations available in Java.
To try parallel composite uploads, you can run the command:
gsutil -o GSUtil:parallel_composite_upload_threshold=150M cp bigfile gs://your-bucket
where bigfile is larger than 150 MiB. Note that the upload
progress indicator continuously updates for the file, until all parts of the
upload complete. If you want to enable parallel composite
uploads for all of your future uploads, you can uncomment and set the
"parallel_composite_upload_threshold" config value in your
.boto configuration file to 150M or your desired value.
If a parallel composite upload fails prior to composition, run the gsutil command again to take advantage of resumable uploads for the components that failed. The component objects are deleted after the first successful attempt. Any temporary objects that were uploaded successfully before the failure remain until the upload is completed successfully. The temporary objects are named in the following fashion:
where <random ID> is a numerical value, and <hash> is an MD5 hash (not related to the hash of the contents of the file or object).
To avoid leaving temporary objects around, you should check the exit status from the gsutil command. You can do this in a bash script by running:
if ! gsutil cp ./local-file gs://your-bucket/your-object; then
  << Code that handles failures >>
fi
Or, for copying a directory, run this script:
if ! gsutil cp -c -L cp.log -r ./dir gs://bucket; then
  << Code that handles failures >>
fi
Note that an object uploaded using parallel composite uploads has a CRC32C hash, but no MD5 hash. For details, see gsutil help crc32c.
Disable parallel composite uploads by setting the
"parallel_composite_upload_threshold" variable in the
.boto config file to 0.
Changing Temp Directories
gsutil writes data to a temporary directory in several cases:
- when compressing data to be uploaded (see the -z and -Z options)
- when decompressing data being downloaded (for example, when the data has
Content-Encoding:gzip as a result of being uploaded using gsutil cp -z or gsutil cp -Z)
- when running integration tests using the gsutil test command
In these cases, it's possible the temporary file location on your system that
gsutil selects by default may not have enough space. If gsutil runs out of
space during one of these operations (for example, raising
"CommandException: Inadequate temp space available to compress <your file>"
during a gsutil cp -z operation), you can change where it writes these
temp files by setting the TMPDIR environment variable. On Linux and macOS,
you can set the variable as follows:
TMPDIR=/some/directory gsutil cp ...
You can also add this line to your ~/.bashrc file and restart the shell before running gsutil:
export TMPDIR=/some/directory
On Windows 7, you can change the TMPDIR environment variable from Start -> Computer -> System -> Advanced System Settings -> Environment Variables. You need to reboot after making this change for it to take effect. Rebooting is not necessary after running the export command on Linux and macOS.
|-a canned_acl||Applies the specific canned_acl to uploaded objects.|
|-A||Copy all source versions from a source bucket or folder. If not set, only the live version of each source object is copied.|
|-c||If an error occurs, continue attempting to copy the remaining files. If any copies are unsuccessful, gsutil's exit status is non-zero, even if this flag is set. This option is implicitly set when running gsutil -m cp.|
|-D||Copy in "daisy chain" mode, which means copying between two buckets by first downloading to the machine where gsutil is run, then uploading to the destination bucket. The default mode is a "copy in the cloud," where data is copied between two buckets without uploading or downloading.|
During a "copy in the cloud," a source composite object remains composite at its destination. However, you can use "daisy chain" mode to change a composite object into a non-composite object. For example:
gsutil cp -D -p gs://bucket/obj gs://bucket/obj_tmp
gsutil mv -p gs://bucket/obj_tmp gs://bucket/obj
|-e||Exclude symlinks. When specified, symbolic links are not copied.|
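|-I||Reads the list of files or objects to copy from stdin, one per line. For example:|
cat filelist | gsutil -m cp -I gs://my-bucket
where the output of cat filelist is a list of files, cloud URLs, and wildcards of files and cloud URLs.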
|-j <ext,...>||Applies gzip transport encoding to any file upload whose extension matches the -j extension list.|
When you specify the
Note that if you want to use the top-level
gsutil -o "GSUtil:max_upload_compression_buffer_size=8G" \
  -m cp -j html,txt -r /local/source/dir gs://bucket/path
|-J||Applies gzip transport encoding to file uploads. This option works like the -j option described above, but it applies to all uploaded files, regardless of extension.|
|-L <file>||Outputs a manifest log file with detailed information about each item that was copied. This manifest contains the following information for each item:|
If the log file already exists, gsutil uses the file as an
input to the copy process, and appends log items to
the existing file. Objects that are marked in the
existing log file as having been successfully copied or
skipped are ignored. Objects without entries are
copied and ones previously marked as unsuccessful are
retried. This option can be used in conjunction with the -c option in a bash loop like the following:
until gsutil cp -c -L cp.log -r ./dir gs://bucket; do
  sleep 1
done
The -c option enables copying to continue after failures occur, and the -L option allows gsutil to pick up where it left off without duplicating work. The loop continues running as long as gsutil exits with a non-zero status. A non-zero status indicates there was at least one failure during the copy operation.
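The loop above runs indefinitely if the failure is one that retrying never fixes, such as insufficient permissions. A bounded variant is sketched below; the function name run_until_success and the attempt limit are assumptions for illustration, and the wrapper works with any command, not only gsutil.

```shell
# run_until_success MAX_ATTEMPTS COMMAND [ARG...]
# Runs COMMAND until it exits with status 0, sleeping 1 second between tries.
# Returns 1 if COMMAND still fails after MAX_ATTEMPTS attempts.
run_until_success() {
  max=$1
  shift
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
}

# Usage (requires gsutil; paths and bucket are placeholders):
#   run_until_success 5 gsutil cp -c -L cp.log -r ./dir gs://bucket
```

Because the -L manifest records which items already succeeded, each new attempt only repeats the items that failed or were never tried.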
|-n||No-clobber. When specified, existing files or objects at the destination are not overwritten. Any items that are skipped by this option are reported as skipped. gsutil performs an additional GET request to check if an item exists before attempting to upload the data. This saves gsutil from retransmitting data, but the additional HTTP requests may make small object transfers slower and more expensive.|
|-p||Preserves ACLs when copying in the cloud. Note that this option has performance and cost implications only when using the XML API, as the XML API requires separate HTTP calls for interacting with ACLs. You can mitigate this performance issue using gsutil -m cp to copy in parallel. Note that it's not valid to specify both the -a and -p options.|
|-P||Enables POSIX attributes to be preserved when objects are copied.|
On Windows, this flag only sets and restores access time and modification time. This is because Windows doesn't support POSIX uid/gid/mode.
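|-R, -r||The -R and -r options are synonymous. They cause directories, buckets, and bucket subdirectories to be copied recursively.|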
|-s <class>||Specifies the storage class of the destination object. If not specified, the default storage class of the destination bucket is used. This option is not valid for copying to non-cloud destinations.|
|-U||Skips objects with unsupported object types instead of failing. Unsupported object types include Amazon S3 objects in the GLACIER storage class.|
|-v||Prints the version-specific URL for each uploaded object. You can use these URLs to safely make concurrent upload requests, because Cloud Storage refuses to perform an update if the current object version doesn't match the version-specific URL. See gsutil help versions for more details.|
|-z <ext,...>||Applies gzip content-encoding to any file upload whose extension matches the -z extension list.|
When you specify the
For example, the following command:
gsutil cp -z html -a public-read \
  cattypes.html tabby.jpeg gs://mycats
does the following:
Note that if you download an object with Content-Encoding:gzip, gsutil decompresses the content before writing the local file.
|-Z||Applies gzip content-encoding to file uploads. This option works like the -z option described above, but it applies to all uploaded files, regardless of extension.|