CRC32C and Installing crcmod
Google Cloud Storage provides a cyclic redundancy check (CRC) header that
allows clients to verify the integrity of object contents. For non-composite
objects Google Cloud Storage also provides an MD5 header to allow clients to
verify object integrity, but for composite objects only the CRC is available.
gsutil automatically performs integrity checks on all uploads and downloads.
Additionally, you can use the gsutil hash command to calculate a CRC for any
local file.
The CRC variant used by Google Cloud Storage is called CRC32C (Castagnoli), which is not available in the standard Python distribution. The implementation of CRC32C used by gsutil is provided by a third-party Python module called crcmod.
The crcmod module contains a pure-Python implementation of CRC32C, but using it results in very poor performance. A Python C extension is also provided by crcmod, which requires compiling into a binary module for use. gsutil ships with a precompiled crcmod C extension for macOS; for other platforms, see the installation instructions below.
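To make the algorithm concrete, here is a minimal pure-Python sketch of a table-driven CRC32C. It is not crcmod's implementation; the function names are hypothetical, and the base64 helper assumes the checksum is encoded as four big-endian bytes. The per-byte loop also hints at why a pure-Python CRC is slow on large files.

```python
import base64
import struct

# Reflected (bit-reversed) form of the Castagnoli polynomial 0x1EDC6F41.
_POLY = 0x82F63B78


def _build_table():
    """Precompute the CRC of every possible byte value."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c(data, crc=0):
    """Return the CRC32C of data, optionally continuing from a previous result."""
    crc ^= 0xFFFFFFFF
    for byte in data:  # one table lookup per input byte
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32c_base64(data):
    """Encode the checksum as base64 of 4 big-endian bytes (assumed format)."""
    return base64.b64encode(struct.pack(">I", crc32c(data))).decode("ascii")


print(hex(crc32c(b"123456789")))  # 0xe3069283, the standard CRC32C check value
```

Passing the previous result as the second argument lets a large file be processed in chunks, for example crc32c(b"6789", crc32c(b"12345")).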
At the end of each copy operation, the gsutil cp and related commands validate
that the checksum of the source file/object matches the checksum of the
destination file/object. If the checksums do not match, gsutil deletes the
invalid copy and prints a warning message. This happens very rarely, but if it
does, please contact email@example.com.
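The validate-then-delete behavior described above can be sketched for two local files as follows. This is an illustration only, not gsutil's code: the function names are hypothetical, and it uses zlib.crc32 (plain CRC32, not CRC32C) as a stand-in because that is what the standard library provides.

```python
import os
import shutil
import zlib


def file_checksum(path, chunk_size=1 << 20):
    """Return the CRC32 of a file, reading it in chunks (stand-in for CRC32C)."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def copy_and_verify(src, dst):
    """Copy src to dst, then delete dst and warn if the checksums differ."""
    shutil.copyfile(src, dst)
    if file_checksum(src) != file_checksum(dst):
        os.remove(dst)
        print(f"WARNING: checksum mismatch, removed invalid copy {dst}")
        return False
    return True
```

The function returns True when the copy is intact, so a caller can retry or report the failure when it returns False.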
To determine whether the compiled version of crcmod is available in your Python
environment, inspect the output of the gsutil version command for the
"compiled crcmod" entry:
$ gsutil version -l
...
compiled crcmod: True
...
If your crcmod library is compiled to a native binary, this value will be True. If using the pure-Python version, the value will be False.
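The same check can be approximated directly from Python. The sketch below assumes that crcmod records whether its C extension loaded in a private attribute named _usingExtension on the crcmod.crcmod module; because it is private, it may change between releases, so the lookup falls back to False. The function name is hypothetical.

```python
def compiled_crcmod_available():
    """Return True for compiled crcmod, False for pure Python, None if absent."""
    try:
        import crcmod.crcmod as crcmod_impl
    except ImportError:
        return None  # crcmod is not installed in this environment
    # Assumption: private flag set by crcmod when the C extension imports.
    return bool(getattr(crcmod_impl, "_usingExtension", False))


print("compiled crcmod:", compiled_crcmod_available())
```

Run it with the same Python interpreter that gsutil uses; a different interpreter may see a different crcmod installation.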
To control gsutil's behavior in response to crcmod's status, you can set the
"check_hashes" configuration variable. For details on this variable, see the
surrounding comments in your boto configuration file. If "check_hashes" is not
present in your configuration file, rerun gsutil config to regenerate the file.
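To see which value is currently set, the configuration file can be read with Python's configparser. This is a rough sketch under two assumptions: that the file lives at ~/.boto and that "check_hashes" sits in a section named GSUtil. The function names are hypothetical.

```python
import configparser
import os


def check_hashes_from_text(config_text):
    """Return the check_hashes value from boto config text, or None if unset."""
    # Interpolation is disabled so that literal % characters do not raise.
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, allow_no_value=True
    )
    parser.read_string(config_text)
    # Assumption: the option lives in the [GSUtil] section.
    return parser.get("GSUtil", "check_hashes", fallback=None)


def check_hashes_from_file(path="~/.boto"):
    """Read the value from a boto config file (default path is an assumption)."""
    with open(os.path.expanduser(path), encoding="utf-8") as f:
        return check_hashes_from_text(f.read())
```

A result of None means the variable is absent (or commented out), which is the case where rerunning gsutil config is suggested.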
These installation instructions assume that:
- You have pip installed. Consult the pip installation instructions for details on how to install pip.
- Your installation of pip can be found in your PATH environment variable. If it cannot, you may need to replace pip in the commands below with the full path to the executable.
- You are installing the crcmod package for use with your system installation of Python, and thus use the sudo command. If installing crcmod for a different Python environment (e.g. in a virtualenv), you should omit sudo from the commands below.
CentOS, RHEL, and Fedora
To compile and install crcmod:
sudo yum install gcc python-devel python-setuptools redhat-rpm-config
sudo pip uninstall crcmod
sudo pip install -U crcmod
Debian and Ubuntu
To compile and install crcmod:
sudo apt-get install gcc python-dev python-setuptools
sudo pip uninstall crcmod
sudo pip install -U crcmod
macOS
gsutil distributes a pre-compiled version of crcmod for macOS, so you shouldn't
need to compile and install it yourself. If for some reason the pre-compiled
version is not being detected, please let the Google Cloud Storage team know
(see gsutil help support).
To compile manually on macOS, you will first need to install Xcode and then run:
sudo pip install -U crcmod
Windows
An installer for the compiled version of crcmod is available from the Python Package Index (PyPI).
MSI installers are available for the 32-bit versions of Python 2.7. Make sure to install to a 32-bit Python directory. The 32-bit crcmod does not work with 64-bit Python; if you are using 64-bit Python, you will need to install 32-bit Python in order to use crcmod.
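If you are unsure whether a given Python installation is 32-bit or 64-bit, a short check like the following can help. It is a generic sketch using only the standard library, with a hypothetical function name, and infers the bitness from the size of a C pointer.

```python
import struct
import sys


def python_bitness():
    """Return 32 or 64 depending on the pointer size of this interpreter."""
    return struct.calcsize("P") * 8


print(f"{python_bitness()}-bit Python at {sys.executable}")
```

Run it with the interpreter you intend to install crcmod into, since several Python installations can coexist on one machine.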
Note: If you have installed crcmod and gsutil hasn't detected it, it may have been installed to the wrong directory. It should be located at <python_dir>\files\Lib\site-packages\crcmod
In some cases the installer will incorrectly install to <python_dir>\Lib\site-packages\crcmod
Manually copying the crcmod directory to the correct location should resolve the issue.
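To check where Python actually finds crcmod, you can ask the import system for the module's location and compare it with the interpreter's site-packages directory. This is a sketch using only the standard library; module_location is a hypothetical helper name.

```python
import importlib.util
import sysconfig


def module_location(name="crcmod"):
    """Return the directory or file a module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # the module is not importable from this interpreter
    if spec.submodule_search_locations:
        return list(spec.submodule_search_locations)[0]  # package directory
    return spec.origin  # single-file module


print("site-packages:", sysconfig.get_paths()["purelib"])
print("crcmod found at:", module_location())
```

If the second line prints None while a crcmod directory exists elsewhere on disk, the package was probably installed to a directory this interpreter does not search.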