This page demonstrates how to transfer data from other services with Google Transfer Operators in your DAGs.

About Google Transfer Operators

[Google Transfer Operators](https://airflow.apache.org/docs/apache-airflow-providers-google/stable/operators/transfer/index.html) are a set of Airflow operators that you can use to pull data from other services into Google Cloud.

This guide shows operators for Azure FileShare Storage and Amazon S3 that work with Cloud Storage. There are many more transfer operators that work with services within Google Cloud and with services other than Google Cloud.
Amazon S3 to Cloud Storage
This section demonstrates how to synchronize data from Amazon S3 to a Cloud Storage bucket.
Install the Amazon provider package
The `apache-airflow-providers-amazon` package contains the connection types and functionality that interact with Amazon S3. [Install this PyPI package](/composer/docs/composer-2/install-python-dependencies#install-pypi) in your environment.
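If you want to confirm from inside the environment that the provider is available, a quick check like the following can help. This is only an illustrative sketch, not part of the official installation steps.

```python
# Illustrative sketch (not part of the official installation steps): confirm
# that the Amazon provider package is installed and that its S3 hook imports.
from importlib.metadata import version

from airflow.providers.amazon.aws.hooks.s3 import S3Hook  # noqa: F401

print('apache-airflow-providers-amazon', version('apache-airflow-providers-amazon'))
```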
Configure a connection to Amazon S3
The Amazon provider package provides a connection type for Amazon S3. You create a connection of this type. The connection for Cloud Storage, named `google_cloud_default`, is already set up in your environment.
Set up a connection to Amazon S3 in the following way:

1. In the [Airflow UI](/composer/docs/composer-2/access-airflow-web-interface), go to **Admin** > **Connections**.
2. Create a new connection.
3. Select `Amazon S3` as the connection type.
4. The following example uses a connection named `aws_s3`. You can use this name or any other name for the connection.
5. Specify connection parameters as described in the Airflow documentation for [Amazon Web Services Connection](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/connections/aws.html). For example, to set up a connection with AWS access keys, generate an access key for your account on AWS, then provide the AWS access key ID as the login and the AWS secret access key as the password for the connection.

**Note:** We recommend that you store all credentials for connections in Secret Manager. For more information, see [Configure Secret Manager for your environment](/composer/docs/composer-2/configure-secret-manager). For example, you can create a secret named `airflow-connections-aws_s3` that stores credentials for the `aws_s3` connection.
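After the connection is saved, it can be useful to verify it before wiring up the transfer itself. The following is only a minimal sketch, assuming the `aws_s3` connection from this example and the example bucket name used in the DAG later on; swap in a bucket that your credentials can actually read.

```python
# Minimal sanity check for the aws_s3 connection (illustrative, not part of
# the official steps). Lists object keys under the prefix that the transfer
# example below copies.
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

def check_aws_s3_connection():
    hook = S3Hook(aws_conn_id='aws_s3')
    keys = hook.list_keys(
        bucket_name='example-s3-bucket-transfer-operators',
        prefix='data-for-gcs',
    )
    print(f'Found {len(keys or [])} objects under data-for-gcs/')
```

You can wire a function like this into a `PythonOperator` task if you want the check to run inside Airflow itself.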
Transfer data from Amazon S3
If you want to operate on the synchronized data later in another DAG or task, pull it to the `/data` folder of your environment's bucket. This folder is synchronized to other Airflow workers, so that tasks in your DAG can operate on it.
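On Airflow workers, the `/data` folder of the environment's bucket is available under `/home/airflow/gcs/data`, which is why the example DAG below can list the transferred files with a plain `ls`. As a hedged sketch of what a downstream task could look like instead (the summarizing logic here is purely illustrative), you might replace that `ls` step with a Python callable:

```python
# Illustrative only: a downstream task that processes files synchronized to
# the environment bucket's /data folder, visible to workers under
# /home/airflow/gcs/data. The summary logic is a placeholder.
from pathlib import Path

from airflow.operators.python import PythonOperator

def summarize_transferred_files():
    data_dir = Path('/home/airflow/gcs/data/from-s3/data-for-gcs')
    files = sorted(p for p in data_dir.iterdir() if p.is_file())
    total_bytes = sum(p.stat().st_size for p in files)
    print(f'{len(files)} files, {total_bytes} bytes in {data_dir}')

# Inside the DAG below, this could take the place of the print_dir_files
# BashOperator task:
# summarize = PythonOperator(
#     task_id='summarize_transferred_files',
#     python_callable=summarize_transferred_files,
# )
```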
The following example DAG does the following:

- Synchronizes the contents of the `/data-for-gcs` directory from an S3 bucket to the `/data/from-s3/data-for-gcs/` folder in your environment's bucket.
- Waits two minutes for the data to synchronize to all Airflow workers in your environment.
- Outputs the list of files in this directory using the `ls` command. Replace this task with other Airflow operators that work with your data.
```python
import datetime

import airflow
from airflow.providers.google.cloud.transfers.s3_to_gcs import S3ToGCSOperator
from airflow.operators.bash_operator import BashOperator

with airflow.DAG(
    'composer_sample_aws_to_gcs',
    start_date=datetime.datetime(2022, 1, 1),
    schedule_interval=None,
) as dag:

    transfer_dir_from_s3 = S3ToGCSOperator(
        task_id='transfer_dir_from_s3',
        aws_conn_id='aws_s3',
        prefix='data-for-gcs',
        bucket='example-s3-bucket-transfer-operators',
        dest_gcs='gs://us-central1-example-environ-361f2312-bucket/data/from-s3/')

    sleep_2min = BashOperator(
        task_id='sleep_2min',
        bash_command='sleep 2m')

    print_dir_files = BashOperator(
        task_id='print_dir_files',
        bash_command='ls /home/airflow/gcs/data/from-s3/data-for-gcs/')

    transfer_dir_from_s3 >> sleep_2min >> print_dir_files
```

Azure FileShare to Cloud Storage

This section demonstrates how to synchronize data from Azure FileShare to a Cloud Storage bucket.

Install the Microsoft Azure provider package

The `apache-airflow-providers-microsoft-azure` package contains the connection types and functionality that interact with Microsoft Azure. [Install this PyPI package](/composer/docs/composer-2/install-python-dependencies#install-pypi) in your environment.

Configure a connection to Azure FileShare

The Microsoft Azure provider package provides a connection type for Azure File Share. You create a connection of this type. The connection for Cloud Storage, named `google_cloud_default`, is already set up in your environment.

Set up a connection to Azure FileShare in the following way:

1. In the [Airflow UI](/composer/docs/composer-2/access-airflow-web-interface), go to **Admin** > **Connections**.
2. Create a new connection.
3. Select `Azure FileShare` as the connection type.
4. The following example uses a connection named `azure_fileshare`. You can use this name or any other name for the connection.
5. Specify connection parameters as described in the Airflow documentation for [Microsoft Azure File Share Connection](https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/stable/connections/azure_fileshare.html). For example, you can specify a connection string for your storage account access key.

**Note:** We recommend that you store all credentials for connections in Secret Manager. For more information, see [Configure Secret Manager for your environment](/composer/docs/composer-2/configure-secret-manager). For example, you can create a secret named `airflow-connections-azure_fileshare` that stores credentials for the `azure_fileshare` connection.

Transfer data from Azure FileShare

If you want to operate on the synchronized data later in another DAG or task, pull it to the `/data` folder of your environment's bucket. This folder is synchronized to other Airflow workers, so that tasks in your DAG can operate on it.

The following example DAG does the following:

- Synchronizes the contents of the `/data-for-gcs` directory from Azure File Share to the `/data/from-azure` folder in your environment's bucket.
- Waits two minutes for the data to synchronize to all Airflow workers in your environment.
- Outputs the list of files in this directory using the `ls` command. Replace this task with other Airflow operators that work with your data.

```python
import datetime

import airflow
from airflow.providers.google.cloud.transfers.azure_fileshare_to_gcs import AzureFileShareToGCSOperator
from airflow.operators.bash_operator import BashOperator

with airflow.DAG(
    'composer_sample_azure_to_gcs',
    start_date=datetime.datetime(2022, 1, 1),
    schedule_interval=None,
) as dag:

    transfer_dir_from_azure = AzureFileShareToGCSOperator(
        task_id='transfer_dir_from_azure',
        azure_fileshare_conn_id='azure_fileshare',
        share_name='example-file-share',
        directory_name='data-for-gcs',
        dest_gcs='gs://us-central1-example-environ-361f2312-bucket/data/from-azure/')

    sleep_2min = BashOperator(
        task_id='sleep_2min',
        bash_command='sleep 2m')

    print_dir_files = BashOperator(
        task_id='print_dir_files',
        bash_command='ls /home/airflow/gcs/data/from-azure/')

    transfer_dir_from_azure >> sleep_2min >> print_dir_files
```

What's next

- [Use GKE operators](/composer/docs/composer-2/use-gke-operator)