This guide shows operators for Azure FileShare Storage and Amazon S3 that work with Cloud Storage. In addition to these, there are many more transfer operators that work with services within Google Cloud and with services other than Google Cloud.
Before you begin

This guide is for Airflow 2. If your environment uses Airflow 1, use backport provider packages to import the operators and to make the required connection types available in your environment.
Amazon S3 to Cloud Storage

This section demonstrates how to synchronize data from Amazon S3 to a Cloud Storage bucket.
Install the Amazon provider package

The apache-airflow-providers-amazon package contains the connection types and functionality for interacting with Amazon S3. Install this PyPI package in your environment.
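For example, you can add the package to your environment with the gcloud CLI. The following is a minimal sketch, assuming an environment named example-environment in us-central1 and an illustrative version pin; adjust these values for your environment.

    # Hypothetical environment name and location; the version pin is illustrative.
    gcloud composer environments update example-environment \
        --location us-central1 \
        --update-pypi-package "apache-airflow-providers-amazon>=2.0.0"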
Configure a connection to Amazon S3

The Amazon provider package provides a connection type for Amazon S3. Create a connection of this type. The connection for Cloud Storage, named google_cloud_default, is already set up in your environment.

Set up a connection to Amazon S3 in the following way:

1. In the Airflow UI, go to Admin > Connections and create a new connection.
2. Select Amazon S3 as the connection type.
3. The following example uses a connection named aws_s3. You can use this name or any other name for the connection.
4. Specify the connection parameters as described in the Airflow documentation for Amazon Web Services Connection. For example, to set up a connection with AWS access keys, generate an access key for your account in AWS, then provide the AWS access key ID as the login and the AWS secret access key as the password for the connection.

Note: We recommend storing all credentials for connections in Secret Manager. For example, you can create a secret named airflow-connections-aws_s3 that stores the credentials for the aws_s3 connection.
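If you prefer to create the connection from the command line instead of the Airflow UI, a minimal sketch with the Airflow 2 CLI could look like the following. It assumes the aws_s3 connection name used in this guide and placeholder credentials; in Cloud Composer you can run Airflow CLI commands through gcloud composer environments run.

    # Placeholders: replace with the access key ID and secret access key
    # generated for your AWS account.
    airflow connections add aws_s3 \
        --conn-type aws \
        --conn-login "AWS_ACCESS_KEY_ID" \
        --conn-password "AWS_SECRET_ACCESS_KEY"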
Transfer data from Amazon S3

If you want to operate on the synchronized data later in another DAG or task, pull it to the /data folder of your environment's bucket. This folder is synchronized to other Airflow workers, so that tasks in your DAG can operate on it.
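For reference, the following sketch shows how the bucket folder maps to a local path on the workers; the bucket name is a placeholder.

    # Objects that land in the data/ folder of the environment's bucket ...
    gsutil ls gs://YOUR_ENVIRONMENT_BUCKET/data/from-s3/
    # ... are synchronized to Airflow workers, where a task (for example a
    # BashOperator) can read them at /home/airflow/gcs/data/from-s3/.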
The following example DAG does the following:

- Synchronizes the contents of the /data-for-gcs directory from an S3 bucket to the /data/from-s3/data-for-gcs/ folder in your environment's bucket.
- Waits for two minutes, so that the data synchronizes to all Airflow workers in your environment.
- Outputs the list of files in this directory by using the ls command. Replace this task with other Airflow operators that work with your data.
[[["이해하기 쉬움","easyToUnderstand","thumb-up"],["문제가 해결됨","solvedMyProblem","thumb-up"],["기타","otherUp","thumb-up"]],[["이해하기 어려움","hardToUnderstand","thumb-down"],["잘못된 정보 또는 샘플 코드","incorrectInformationOrSampleCode","thumb-down"],["필요한 정보/샘플이 없음","missingTheInformationSamplesINeed","thumb-down"],["번역 문제","translationIssue","thumb-down"],["기타","otherDown","thumb-down"]],["최종 업데이트: 2025-08-29(UTC)"],[[["\u003cp\u003eThis page demonstrates how to use Google Transfer Operators in Cloud Composer 1 to transfer data from external services like Amazon S3 and Azure FileShare into Google Cloud Storage.\u003c/p\u003e\n"],["\u003cp\u003eTo use these operators, you must first install the relevant provider packages, such as \u003ccode\u003eapache-airflow-providers-amazon\u003c/code\u003e for Amazon S3 and \u003ccode\u003eapache-airflow-providers-microsoft-azure\u003c/code\u003e for Azure FileShare, in your Cloud Composer environment.\u003c/p\u003e\n"],["\u003cp\u003eA connection to the external service must be configured through the Airflow UI, where you can define parameters like access keys, connection strings, or account details, and it is recommended to store all credentials for connections in Secret Manager.\u003c/p\u003e\n"],["\u003cp\u003eExample DAGs are provided that synchronize data from both Amazon S3 and Azure FileShare to a specified folder within the Cloud Storage bucket of the environment, where the data can then be used in tasks or other DAGs.\u003c/p\u003e\n"],["\u003cp\u003eThe page emphasizes that these operators support transferring data from various services, not just the mentioned examples, and recommends the use of backport provider packages for Airflow 1 environments.\u003c/p\u003e\n"]]],[],null,["# Transfer data from other services with Google Transfer Operators\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\n[Cloud Composer 3](/composer/docs/composer-3/transfer-data-with-transfer-operators \"View this page for Cloud Composer 3\") \\| [Cloud Composer 2](/composer/docs/composer-2/transfer-data-with-transfer-operators \"View this page for Cloud Composer 2\") \\| **Cloud Composer 1**\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\nThis page demonstrates how to transfer data from other services with Google\nTransfer Operators in your DAGs.\n\nAbout Google Transfer Operators\n-------------------------------\n\n[Google Transfer Operators](https://airflow.apache.org/docs/apache-airflow-providers-google/stable/operators/transfer/index.html) are a\nset of Airflow operators that you can use to pull data from other services into\nGoogle Cloud.\n\nThis guide shows operators for Azure FileShare Storage and Amazon S3 that work\nwith Cloud Storage. There are many more transfer operators that work\nwith services within Google Cloud and with services other than\nGoogle Cloud.\n\nBefore you begin\n----------------\n\n- This guide is for Airflow 2. 
If your environment uses Airflow 1, use [backport provider packages](/composer/docs/composer-1/backport-packages) to import operators and to make required connection types available in your environment.\n\nAmazon S3 to Cloud Storage\n--------------------------\n\nThis section demonstrates how to synchronize data from Amazon S3 to a\nCloud Storage bucket.\n\n### Install the Amazon provider package\n\nThe `apache-airflow-providers-amazon` package contains the connection\ntypes and functionality that interacts with Amazon S3.\n[Install this PyPI package](/composer/docs/composer-1/install-python-dependencies#install-pypi) in your\nenvironment.\n\n### Configure a connection to Amazon S3\n\nThe Amazon provider package provides a connection type for Amazon S3. You\ncreate a connection of this type. The connection for Cloud Storage,\nnamed `google_cloud_default` is already set up in your environment.\n\nSet up a connection to Amazon S3 in the following way:\n\n1. In [Airflow UI](/composer/docs/composer-1/access-airflow-web-interface), go to **Admin** \\\u003e **Connections**.\n2. Create a new connection.\n3. Select `Amazon S3` as the connection type.\n4. The following example uses a connection named `aws_s3`. You can use this name, or any other name for the connection.\n5. Specify connection parameters as described in the Airflow documentation for [Amazon Web Services Connection](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/connections/aws.html). For example, to set up a connection with AWS access keys, you generate an access key for your account on AWS, then provide the AWS access key ID as a login the AWS secret access key as a password for the connection.\n\n| **Note:** We recommend to **store all credentials for connections in Secret Manager** . For more information, see [Configure Secret Manager for your environment](/composer/docs/composer-1/configure-secret-manager). For example, you can create a secret named `airflow-connections-aws_s3` that stores credentials for the `aws_s3` connection.\n\n### Transfer data from Amazon S3\n\nIf you want to operate on the synchronized data later in another DAG or task,\npull it to the `/data` folder of your environment's bucket. This folder is\nsynchronized to other Airflow workers, so that tasks in your DAG\ncan operate on it.\n\nThe following example DAG does the following:\n\n- Synchronizes contents of the `/data-for-gcs` directory from an S3 bucket to the `/data/from-s3/data-for-gcs/` folder in your environment's bucket.\n- Waits for two minutes, for the data to synchronize to all Airflow workers in your environment.\n- Outputs the list of files in this directory using the `ls` command. 
Replace this task with other Airflow operators that work with your data.\n\n import datetime\n import airflow\n from airflow.providers.google.cloud.transfers.s3_to_gcs import S3ToGCSOperator\n from airflow.operators.bash_operator import BashOperator\n\n with airflow.DAG(\n 'composer_sample_aws_to_gcs',\n start_date=datetime.datetime(2022, 1, 1),\n schedule_interval=None,\n ) as dag:\n\n transfer_dir_from_s3 = S3ToGCSOperator(\n task_id='transfer_dir_from_s3',\n aws_conn_id='aws_s3',\n prefix='data-for-gcs',\n bucket='example-s3-bucket-transfer-operators',\n dest_gcs='gs://us-central1-example-environ-361f2312-bucket/data/from-s3/')\n\n sleep_2min = BashOperator(\n task_id='sleep_2min',\n bash_command='sleep 2m')\n\n print_dir_files = BashOperator(\n task_id='print_dir_files',\n bash_command='ls /home/airflow/gcs/data/from-s3/data-for-gcs/')\n\n\n transfer_dir_from_s3 \u003e\u003e sleep_2min \u003e\u003e print_dir_files\n\nAzure FileShare to Cloud Storage\n--------------------------------\n\nThis section demonstrates how to synchronize data from Azure FileShare to a\nCloud Storage bucket.\n\n### Install the Microsoft Azure provider package\n\nThe `apache-airflow-providers-microsoft-azure` package contains the connection\ntypes and functionality that interacts with Microsoft Azure.\n[Install this PyPI package](/composer/docs/composer-1/install-python-dependencies#install-pypi) in your\nenvironment.\n\n### Configure a connection to Azure FileShare\n\nThe Microsoft Azure provider package provides a connection type for Azure File\nShare. You create a connection of this type. The connection for\nCloud Storage, named `google_cloud_default` is already set up in\nyour environment.\n\nSet up a connection to Azure FileShare in the following way:\n\n1. In [Airflow UI](/composer/docs/composer-1/access-airflow-web-interface), go to **Admin** \\\u003e **Connections**.\n2. Create a new connection.\n3. Select `Azure FileShare` as the connection type.\n4. The following example uses a connection named `azure_fileshare`. You can use this name, or any other name for the connection.\n5. Specify connection parameters as described in the Airflow documentation for [Microsoft Azure File Share Connection](https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/stable/connections/azure_fileshare.html). For example, you can specify a connection string for your storage account access key.\n\n| **Note:** We recommend to **store all credentials for connections in Secret Manager** . For more information, see [Configure Secret Manager for your environment](/composer/docs/composer-1/configure-secret-manager). For example, you can create a secret named `airflow-connections-azure_fileshare` that stores credentials for the `azure_fileshare` connection.\n\n### Transfer data from Azure FileShare\n\nIf you want to operate on the synchronized data later in another DAG or task,\npull it to the `/data` folder of your environment's bucket. This folder is\nsynchronized to other Airflow workers, so that tasks in your DAG\ncan operate on it.\n\nThe following DAG does the following:\n\nThe following example DAG does the following:\n\n- Synchronizes contents of the `/data-for-gcs` directory from Azure File Share to the `/data/from-azure` folder in your environment's bucket.\n- Waits for two minutes, for the data to synchronize to all Airflow workers in your environment.\n- Outputs the list of files in this directory using the `ls` command. 
Replace this task with other Airflow operators that work with your data.\n\n import datetime\n import airflow\n from airflow.providers.google.cloud.transfers.azure_fileshare_to_gcs import AzureFileShareToGCSOperator\n from airflow.operators.bash_operator import BashOperator\n\n with airflow.DAG(\n 'composer_sample_azure_to_gcs',\n start_date=datetime.datetime(2022, 1, 1),\n schedule_interval=None,\n ) as dag:\n\n transfer_dir_from_azure = AzureFileShareToGCSOperator(\n task_id='transfer_dir_from_azure',\n azure_fileshare_conn_id='azure_fileshare',\n share_name='example-file-share',\n directory_name='data-for-gcs',\n dest_gcs='gs://us-central1-example-environ-361f2312-bucket/data/from-azure/')\n\n sleep_2min = BashOperator(\n task_id='sleep_2min',\n bash_command='sleep 2m')\n\n print_dir_files = BashOperator(\n task_id='print_dir_files',\n bash_command='ls /home/airflow/gcs/data/from-azure/')\n\n\n transfer_dir_from_azure \u003e\u003e sleep_2min \u003e\u003e print_dir_files\n\nWhat's next\n-----------\n\n- [Use GKE operators](/composer/docs/composer-1/use-gke-operator)"]]