Data integrity

Storage Transfer Service uses metadata available from the source storage system, such as checksums and file sizes, to ensure that data written to Cloud Storage is the same data read from the source.

When checksum metadata is available

If the checksum metadata on the source storage system indicates that the data Storage Transfer Service received doesn't match the source data, Storage Transfer Service records a failure for the transfer operation. Sources that provide checksum metadata include most Amazon Simple Storage Service (Amazon S3) and Microsoft Azure Blob Storage objects (with some exceptions), and HTTP transfers, for which the user supplies the checksum metadata.
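The comparison described above can be sketched as follows. This is a minimal illustration, not the service's implementation: the function names are hypothetical, and the choice of a base64-encoded MD5 digest is an assumption made for the example, since the algorithm actually used depends on what the source system exposes.

```python
import base64
import hashlib


def md5_base64(data: bytes) -> str:
    """Return the MD5 digest of data, base64-encoded (assumed format)."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def verify_received(data: bytes, source_checksum: str) -> bool:
    """Return True if the received bytes match the source-reported checksum."""
    return md5_base64(data) == source_checksum


# Hypothetical usage: the checksum would normally come from source metadata.
reported = md5_base64(b"example object contents")
if not verify_received(b"example object contentz", reported):
    print("checksum mismatch: record a failure for this object")
```

A mismatch here corresponds to the case in which a failure is recorded for the transfer operation.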

When checksum metadata is unavailable

When agents can run near the source

If checksum metadata isn't available from the underlying source storage system, but agents can run near the source storage system, Storage Transfer Service attempts to read the source data and compute a checksum before sending the data to Cloud Storage. This is the case when you move data from file systems to Cloud Storage.
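As a rough sketch of what computing a checksum near the source can look like, the following reads a local file in fixed-size chunks and feeds them to a hash, so that large files never need to fit in memory. The function name, the default algorithm, and the chunk size are assumptions chosen for illustration; the source does not state which algorithm the agents use.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "md5",
                  chunk_size: int = 1024 * 1024) -> str:
    """Compute a hex digest of a file by streaming it in chunks."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting digest can travel with the data, so the receiving side can recompute it and compare.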

When agents can't run near the source

If checksum metadata isn't available from the underlying source storage system, and agents can't run near the source storage system, Storage Transfer Service can't compute a checksum until the data arrives in Cloud Storage. In this scenario, Storage Transfer Service copies the data but can't perform end-to-end data integrity checks to confirm that the data received is the same as the source data. Instead, Storage Transfer Service takes a best-effort approach, using available metadata such as file size to validate that the file copied to Cloud Storage matches the source file.
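The limits of a size-only check can be seen in a small sketch. The function name and sample data below are hypothetical; the point is only that comparing byte counts catches truncation but not same-length corruption.

```python
def passes_size_check(source_size: int, destination_size: int) -> bool:
    """Best-effort validation: compare byte counts only."""
    return source_size == destination_size


source = b"transfer-me"
truncated = b"transfer"        # shorter than the source
corrupted = b"transfer-mf"     # same length, one byte changed

print(passes_size_check(len(source), len(truncated)))  # False
print(passes_size_check(len(source), len(corrupted)))  # True
```

The second result shows why a matching size is weaker evidence than a matching checksum.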

For example, Storage Transfer Service uses file sizes to validate data for:

After transfer checks

After your transfer is complete, we recommend performing additional data integrity checks to validate that:

  • The correct version of each file is copied, for files that change at the source.
  • The correct set and number of files are copied, to verify that you've set up the transfer jobs correctly.
  • The files were copied correctly, by verifying file metadata such as checksums and sizes.
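One way to approach these checks is to compare a listing of the source against a listing of the destination. The sketch below assumes you have already built both listings as dictionaries that map file names to size and checksum; the `FileInfo`, `Report`, and `compare_manifests` names are hypothetical, and how you obtain each listing depends on your source system and tooling.

```python
from dataclasses import dataclass
from typing import Dict, List, NamedTuple


class FileInfo(NamedTuple):
    size: int
    checksum: str


@dataclass
class Report:
    missing: List[str]      # in the source listing, not at the destination
    unexpected: List[str]   # at the destination, not in the source listing
    mismatched: List[str]   # present in both, but size or checksum differs


def compare_manifests(source: Dict[str, FileInfo],
                      destination: Dict[str, FileInfo]) -> Report:
    """Compare two name -> FileInfo listings and report the differences."""
    common = source.keys() & destination.keys()
    return Report(
        missing=sorted(source.keys() - destination.keys()),
        unexpected=sorted(destination.keys() - source.keys()),
        mismatched=sorted(n for n in common if source[n] != destination[n]),
    )


src = {"a.txt": FileInfo(3, "aaa"), "b.txt": FileInfo(5, "bbb")}
dst = {"a.txt": FileInfo(3, "aaa"), "b.txt": FileInfo(5, "xxx"),
       "c.txt": FileInfo(1, "ccc")}
print(compare_manifests(src, dst))
```

An empty report suggests that the set and number of files match and that each file's recorded metadata agrees. A file that changed at the source after the listing was taken would appear under `mismatched`.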