This page explains how to create a URL list and how to test your process for generating MD5 checksums. You can use Storage Transfer Service to transfer data from a list of public data locations to a Cloud Storage bucket. When you configure your transfer, you refer to the URL list.
The following are requirements of URL lists:
The URL list must be a tab-separated values (TSV) file.
URLs must be sorted in UTF-8 lexicographical order.
The server sets a strong ETag header in the HTTP response when it returns the URL list.
The URL list is accessible from a URL beginning with either http:// or https://.
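You can test the ordering and ETag requirements above locally before you publish a list. The following Python sketch illustrates one way to do this. The function names are hypothetical. It assumes you already have the list's URLs and the ETag value your server returns, for example from a HEAD request. UTF-8 preserves Unicode code point order, so comparing the encoded bytes gives the same result as Python's default string comparison. The explicit encoding just makes the requirement visible.

```python
from typing import Sequence


def is_utf8_sorted(urls: Sequence[str]) -> bool:
    """Return True if the URLs are in non-decreasing UTF-8 byte order."""
    keys = [url.encode("utf-8") for url in urls]
    return all(a <= b for a, b in zip(keys, keys[1:]))


def is_strong_etag(etag: str) -> bool:
    """Return True for a strong entity tag such as '"abc"'.

    Weak validators carry a 'W/' prefix; a strong one is just a quoted string.
    """
    value = etag.strip()
    return (
        not value.startswith("W/")
        and len(value) >= 2
        and value.startswith('"')
        and value.endswith('"')
    )
```

In byte order, uppercase ASCII letters sort before lowercase ones. A path segment "B" therefore comes before "a".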
To ensure your data is transferable, verify the following:
That each URL you specify is publicly accessible.
For example, in Cloud Storage, you can share an object publicly and get a link to it.
That the robots.txt file allows access to each URL.
That the server hosting each object returns a Content-Length header in each response.
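You can partly automate the checks in this list. The Python sketch below uses only the standard library, and its helper names are hypothetical. It makes two further assumptions. First, it uses the user agent "*" because this page does not say which user agent Storage Transfer Service sends. Second, it assumes the server answers HEAD requests. If an unauthenticated HEAD request succeeds, the object is probably publicly accessible. For error statuses such as 403, urlopen raises HTTPError.

```python
import urllib.request
import urllib.robotparser
from typing import Iterable, Mapping, Optional, Tuple


def head_request(url: str, timeout: float = 10.0) -> Tuple[int, Mapping[str, str]]:
    """Send an unauthenticated HEAD request and return (status, headers).

    Raises urllib.error.HTTPError for 4xx/5xx responses, e.g. a private object.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.headers


def content_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return the Content-Length value as an int, or None if missing or invalid."""
    value = headers.get("Content-Length")
    if value is None:
        return None
    value = value.strip()
    return int(value) if value.isascii() and value.isdigit() else None


def robots_allows(robots_lines: Iterable[str], url: str, user_agent: str = "*") -> bool:
    """Check a URL against already-downloaded robots.txt lines.

    The '*' user agent is an assumption, not the service's documented agent.
    """
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)
```

For example, to check one object you might call head_request(url) and pass the returned headers to content_length. Here url stands for any entry in your list.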
Formatting the URL list
Do the following to format a URL list:
Create a tab-separated values (TSV) file.
Insert the format specifier TsvHttpData-1.0 on the first line.
Add one line for each object to transfer. Each line contains the following tab-separated fields, in order:
The HTTP or HTTPS URL of a source object.
When an object located at http(s)://[HOSTNAME]:[PORT]/[URL_PATH] is transferred to Cloud Storage, the name of the object in Cloud Storage is
The size of the object in bytes.
Ensure that the specified size matches the actual size of the object when it is fetched. If the size of the object received by Cloud Storage does not match the size specified, the object transfer will fail.
The Base64-encoded MD5 checksum of the object.
Ensure that the specified MD5 checksum matches the MD5 checksum computed from the transferred bytes. If the MD5 checksum of the object received by Cloud Storage does not match the MD5 checksum specified, the object transfer will fail.
See Generating MD5 checksums for information on generating MD5 checksums.
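Each entry line combines the URL, the byte count, and the Base64-encoded MD5 digest. The size and MD5 must both be computed over the exact bytes the server returns. The following minimal Python sketch assumes the object's content is already available locally as bytes. The function names are hypothetical.

```python
import base64
import hashlib


def md5_base64(data: bytes) -> str:
    """Base64-encode the raw 16-byte MD5 digest (not the hex string)."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def url_list_line(url: str, data: bytes) -> str:
    """Build one tab-separated entry: URL, size in bytes, Base64 MD5."""
    return f"{url}\t{len(data)}\t{md5_base64(data)}"
```

Encode the binary digest, not its hexadecimal form. Base64 of a hex string produces a different, longer value that will not match.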
The following is a sample TSV file that specifies two objects to transfer:
TsvHttpData-1.0
https://example.com/buckets/obj1	1357	wHENa08V36iPYAsOa2JAdw==
https://example.com/buckets/obj2	2468	R9acAaveoPd2y8nniLUYbw==
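A complete URL list starts with the format specifier, followed by one entry per object, with URLs sorted in UTF-8 byte order. The following Python sketch is illustrative only. build_url_list is a hypothetical helper that takes precomputed (URL, size, MD5) tuples. It ends the file with a newline, which is an assumption about what the service accepts rather than something this page specifies.

```python
from typing import Iterable, Tuple

FORMAT_SPECIFIER = "TsvHttpData-1.0"


def build_url_list(entries: Iterable[Tuple[str, int, str]]) -> str:
    """Return URL list text: the format specifier, then sorted TSV entries."""
    # Sort by the UTF-8 bytes of each URL.
    rows = sorted(entries, key=lambda entry: entry[0].encode("utf-8"))
    lines = [FORMAT_SPECIFIER]
    for url, size, md5_b64 in rows:
        # A tab or line break inside a field would corrupt the TSV structure.
        if any(ch in url for ch in "\t\r\n"):
            raise ValueError(f"URL contains a tab or line break: {url!r}")
        if size < 0:
            raise ValueError(f"negative size for {url!r}: {size}")
        lines.append(f"{url}\t{size}\t{md5_b64}")
    # Assumption: a trailing newline at the end of the file is accepted.
    return "\n".join(lines) + "\n"
```

To save the result, write it in UTF-8 with newline="\n", so that Python does not translate line endings on Windows.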
Generating MD5 checksums
Cloud Storage uses the MD5 checksum you provide for each object to verify your data's integrity.
Use the following public object to verify that you are generating MD5 checksums correctly:
This object has a Base64-encoded MD5 checksum of
Copy the object to a local file called md5-test, and verify the checksum using OpenSSL:
openssl md5 -binary md5-test | openssl enc -base64
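If OpenSSL is not available, Python can compute the same Base64-encoded MD5. This sketch reads the file in chunks, so large objects do not have to fit in memory. The function names and the 1 MiB chunk size are arbitrary choices, not part of any Google tooling. Run against md5-test, it should print the same value as the openssl command.

```python
import base64
import hashlib
import sys
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; any positive size works


def md5_base64_stream(stream: BinaryIO) -> str:
    """Hash a binary stream chunk by chunk; return the Base64 MD5 digest."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def md5_base64_file(path: str) -> str:
    """Open a local file in binary mode and return its Base64 MD5 digest."""
    with open(path, "rb") as handle:
        return md5_base64_stream(handle)


if __name__ == "__main__":
    # Print "<checksum>  <path>" for each file named on the command line.
    for path in sys.argv[1:]:
        print(f"{md5_base64_file(path)}  {path}")
```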
- Configure access to your data sources and sinks.
- Create and manage transfer jobs with the Google Cloud Console.
- Create and manage transfer jobs programmatically.