이 페이지에서는 URL 목록을 만들고 MD5 해시 생성 프로세스를 테스트하는 방법을 설명합니다. Storage Transfer Service를 사용하여 데이터를 공개 데이터 위치 목록에서 Cloud Storage 버킷으로 전송할 수 있습니다. 전송을 구성할 때 URL 목록을 명시하기만 하면 됩니다.
요구사항
URL 목록의 요구사항은 다음과 같습니다.
URL 목록은 탭으로 구분된 값(TSV) 파일이어야 합니다.
URL은 UTF-8 사전 순서로 정렬되어야 합니다.
서버는 URL 목록을 반환할 때 HTTP 응답에 강력한 Etag 헤더를 설정합니다.
http 또는 https로 시작하는 URL에서 URL 목록에 공개적으로 액세스할 수 있습니다.
데이터를 전송하려면 다음을 확인하세요.
지정한 각 URL에 공개적으로 액세스할 수 있습니다.
예를 들면 Cloud Storage에서 객체를 공개적으로 공유하고 이 객체에 대한 링크를 가져올 수 있습니다.
서버의 robots.txt 파일은 각 URL에 대한 액세스를 허용합니다.
각 객체를 호스팅하는 서버는 다음과 같습니다.
Range 요청을 지원합니다.
각 응답에서 Content-Length 헤더를 반환합니다.
URL 목록 형식 지정
URL 목록의 형식을 지정하려면 다음 단계를 따르세요.
탭으로 구분된 값(TSV) 파일을 만듭니다.
형식 지정자 TsvHttpData-1.0을 첫 번째 행에 삽입합니다.
전송할 각 객체에 새로운 줄을 추가합니다. 각 줄에 다음과 같은 탭으로 구분된 필드를 순서대로 포함합니다.
소스 객체의 HTTP 또는 HTTPS URL.
http(s)://[HOSTNAME]:[PORT]/[URL_PATH]에 있는 객체가 Cloud Storage로 전송되면 Cloud Storage의 객체 이름은 [HOSTNAME]/[URL_PATH]입니다.
객체 크기(바이트).
객체를 가져올 때 지정된 크기가 객체의 실제 크기와 일치하는지 확인합니다. Cloud Storage에서 수신한 객체의 크기가 지정된 크기와 일치하지 않으면 객체 전송이 실패합니다.
객체의 Base64로 인코딩된 MD5 체크섬입니다.
지정된 MD5 체크섬이 전송된 바이트에서 계산된 MD5 체크섬과 일치하는지 확인합니다. Cloud Storage에서 수신한 객체의 MD5 체크섬이 지정된 MD5 체크섬과 일치하지 않으면 객체 전송이 실패합니다.
[[["이해하기 쉬움","easyToUnderstand","thumb-up"],["문제가 해결됨","solvedMyProblem","thumb-up"],["기타","otherUp","thumb-up"]],[["이해하기 어려움","hardToUnderstand","thumb-down"],["잘못된 정보 또는 샘플 코드","incorrectInformationOrSampleCode","thumb-down"],["필요한 정보/샘플이 없음","missingTheInformationSamplesINeed","thumb-down"],["번역 문제","translationIssue","thumb-down"],["기타","otherDown","thumb-down"]],["최종 업데이트: 2024-12-21(UTC)"],[],[],null,["# Transfer from public URLs to Cloud Storage\n\nStorage Transfer Service can copy files from a list of public URLs to your\nCloud Storage bucket.\n\nWhen creating a transfer, you provide a link to a tab-separated values (TSV)\nfile containing the URLs and details of the objects to transfer. The TSV file\ncan be hosted in any publicly-accessible HTTP or HTTPS location; or can be\nstored in a Cloud Storage bucket.\n\nThis page explains how to create a URL list and pass it to the job creation\ncommand.\n\nSource file requirements\n------------------------\n\n- URLs must be publicly-accessible and use HTTP or HTTPS protocols.\n- The server hosting each object:\n - Must not deny access to the object with a `robots.txt`.\n - Supports `Range` requests.\n - Returns a `Content-Length` header in each response.\n\nURL list format\n---------------\n\nA URL list must adhere to the following requirements:\n\n- The file must be formatted as tab-separated values.\n- URLs must be in UTF-8 lexicographical order.\n- The first line must specify `TsvHttpData-1.0`.\n- After the first line, specify one object per row.\n- Each row must contain the URL, and may also contain the file size and the base64-encoded MD5 checksum of the object.\n\nThe following is a sample TSV file that specifies two objects to transfer. Note\nthat on this page the tabs are rendered as spaces; for your own file, make sure\nto use tabs between fields. \n\n TsvHttpData-1.0\n https://example.com/myfile.pdf 1357 wHENa08V36iPYAsOa2JAdw==\n https://example2.com/images/dataset1/flower.png 2468 R9acAaveoPd2y8nniLUYbw==\n\nEach line contains:\n\n- The HTTP or HTTPS URL of a source object.\n- (Optional) The size of the object in bytes.\n\n Ensure that the specified size matches the actual size of the object\n when it is fetched. If the size of the object received by\n Cloud Storage does not match the size specified, the object\n transfer will fail.\n- (Optional) The base64-encoded MD5 checksum of the object.\n\n Ensure that the specified MD5 checksum matches the MD5 checksum computed\n from the transferred bytes. If the MD5 checksum of the object received\n by Cloud Storage does not match the MD5 checksum specified, the\n object transfer will fail.\n | **Note:** Many checksumming tools output an object's MD5 checksum in base16-encoded format. Ensure the values provided are in base64-encoded format.\n\nWhile the object size and MD5 checksum values are optional, we strongly\nrecommend including them to help ensure data integrity.\n\nHosting the URL list\n--------------------\n\nThe URL list can be hosted in one of two locations:\n\n- A publicly-accessible URL.\n- A Cloud Storage bucket, to which the service agent for Storage Transfer Service\n has been granted access.\n\n### Publicly-accessible URLs\n\nWhen storing the URL list at a publicly-accessible URL, the following\nrequirements apply:\n\n- The URL must begin with `http://` or `https://`.\n- The server must set a strong `Etag` header in the HTTP response when it returns the URL list.\n\nFor example, you can store the list in a Cloud Storage bucket and\n[share the object publicly](/storage/docs/cloud-console#_sharingdata) to\nget a link to it.\n\n### Cloud Storage buckets\n\nTo avoid storing your list in a public location, you can store it in a\nCloud Storage bucket, and grant access to the service agent\nfor Storage Transfer Service.\n\nThe service agent must be granted the following permissions:\n\n- The `storage.object.get` permission on the object. This can be granted by granting the `roles/storage.legacyObjectReader` role on the bucket, or with a custom role.\n- The `storage.buckets.get` permission on the bucket. This can be granted by granting the `roles/storage.legacyBucketReader` role, or with a custom role.\n\nTo grant permissions to the service agent:\n\n#### Find the service agent's email\n\n1. Go to the\n [`googleServiceAccounts.get` reference\n page](/storage-transfer/docs/reference/rest/v1/googleServiceAccounts/get).\n\n An interactive panel opens, titled **Try this method**.\n2. In the panel, under **Request parameters**, enter your\n project ID. The project you specify here must be the project you're\n using to manage Storage Transfer Service, which might be different from the URL\n list bucket's project.\n\n3. Click **Execute**.\n\nYour service agent's email is returned as the value of `accountEmail` and\nuses the format\n`project-`\u003cvar translate=\"no\"\u003ePROJECT_NUMBER\u003c/var\u003e`@storage-transfer-service.iam.gserviceaccount.com`.\n\nCopy this value.\n\n#### Grant the required roles\n\nTo grant the `roles/storage.objectViewer` role and the\n`roles/storage.bucketViewer` role to the service agent on the bucket containing\nthe URL list, follow the instructions in\n[Set and manage IAM policies on buckets](/storage/docs/access-control/using-iam-permissions).\n\nThe principal you are adding is the service agent's email address. If required,\nthe principal identifier is `serviceAccount`. For example,\n`serviceAccount:project-`\u003cvar translate=\"no\"\u003ePROJECT_NUMBER\u003c/var\u003e`@storage-transfer-service.iam.gserviceaccount.com`.\n\nCreate a URL list transfer job\n------------------------------\n\nTo specify a URL list when creating a transfer job, follow these instructions: \n\n### Google Cloud console\n\nFollow the instructions in\n[Create a transfer job](/storage-transfer/docs/create-transfers#google-cloud-console).\n\nWhen choosing a source:\n\n1. Under **Source type** , select **URL list** and click **Next step**.\n\n2. Under **URL of TSV file** , provide the URL to your tab-separated values\n (TSV) file. The URL is either an HTTP/HTTPS address (e.g.\n `https://example.com/urllist.tsv`) or a Cloud Storage path\n (e.g. `gs://my-bucket/urllist.tsv`).\n\n### gcloud CLI\n\nTo create a new transfer job, use the `gcloud transfer jobs create`\ncommand. \n\n gcloud transfer jobs create \\\n \u003cvar translate=\"no\"\u003eSOURCE\u003c/var\u003e \u003cvar translate=\"no\"\u003eDESTINATION\u003c/var\u003e\n\nFor URL list transfers, the value of \u003cvar translate=\"no\"\u003eSOURCE\u003c/var\u003e is the URL of the\nTSV file. The URL is either an HTTP/HTTPS address (e.g.\n`https://example.com/urllist.tsv`) or a Cloud Storage path\n(e.g. `gs://my-bucket/urllist.tsv`).\n\nFor more information on creating transfers using gcloud CLI, see\n[Create transfer jobs](/storage-transfer/docs/create-transfers#gcloud-cli).\n\n### REST\n\nTo create a URL list transfer job using the REST API, specify the URL of the\nTSV file in the `listUrl` field: \n\n {\n \"projectId\": \"\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e\",\n \"transferSpec\": {\n \"httpDataSource\": {\n \"listUrl\": \"\u003cvar translate=\"no\"\u003eURL\u003c/var\u003e\"\n },\n \"gcsDataSink\": {\n \"bucketName\": \"DESTINATION_BUCKET\"\n }\n },\n \"status\": \"ENABLED\"\n }\n\nThe value of \u003cvar translate=\"no\"\u003eURL\u003c/var\u003e is either an HTTP/HTTPS address (e.g.\n`https://example.com/urllist.tsv`) or a Cloud Storage path\n(e.g. `gs://my-bucket/urllist.tsv`).\n\nFor more details on creating transfers using the REST API, see the\n[REST API reference](/storage-transfer/docs/reference/rest/v1/transferJobs/create)."]]