Azure Data Lake Storage
The Azure Data Lake Storage connector lets you connect to Azure Data Lake Storage and use SQL to retrieve and update the Azure Data Lake Storage data.
Before you begin
Before using the Azure Data Lake Storage connector, do the following tasks:
- In your Google Cloud project:
- Ensure that network connectivity is set up. For information about network patterns, see Network connectivity.
- Grant the roles/connectors.admin IAM role to the user configuring the connector.
- Grant the following IAM roles to the service account that you want to use for the connector:
roles/secretmanager.viewer
roles/secretmanager.secretAccessor
A service account is a special type of Google account intended to represent a non-human user that needs to authenticate and be authorized to access data in Google APIs. If you don't have a service account, you must create a service account. For more information, see Creating a service account.
- Enable the following services:
secretmanager.googleapis.com (Secret Manager API)
connectors.googleapis.com (Connectors API)
To understand how to enable services, see Enabling services.
If these services or permissions have not been enabled for your project previously, you are prompted to enable them when configuring the connector.
Configure the connector
Configuring the connector requires you to create a connection to your data source (backend system). A connection is specific to a data source, so if you have many data sources, you must create a separate connection for each one. To create a connection, follow these steps:
- In the Cloud console, go to the Integration Connectors > Connections page and then select or create a Google Cloud project.
- Click + Create new to open the Create Connection page.
- In the Location section, choose the location for the connection.
- Region: Select a location from the drop-down list.
For the list of all supported regions, see Locations.
- Click Next.
- In the Connection Details section, complete the following:
- Connector: Select Azure Data Lake Storage from the drop-down list of available connectors.
- Connector version: Select the connector version from the drop-down list of available versions.
- In the Connection Name field, enter a name for the Connection instance.
Connection names must meet the following criteria:
- Connection names can use letters, numbers, or hyphens.
- Letters must be lower-case.
- Connection names must begin with a letter and end with a letter or number.
- Connection names cannot exceed 49 characters.
- Optionally, enter a Description for the connection instance.
- Optionally, enable Cloud logging, and then select a log level. By default, the log level is set to Error.
- Service Account: Select a service account that has the required roles.
- Optionally, configure the Connection node settings:
- Minimum number of nodes: Enter the minimum number of connection nodes.
- Maximum number of nodes: Enter the maximum number of connection nodes.
A node is a unit (or replica) of a connection that processes transactions. More nodes are required to process more transactions for a connection and conversely, fewer nodes are required to process fewer transactions. To understand how the nodes affect your connector pricing, see Pricing for connection nodes. If you don't enter any values, by default the minimum nodes are set to 2 (for better availability) and the maximum nodes are set to 50.
- Account: This property specifies the name of the Azure Data Lake Storage account.
- Directory: This property specifies the root path to list files and folders.
- File System: This property specifies the name of the file system to use in a Gen 2 storage account. For example, the name of your Azure blob container.
- Chunk Size: The size of chunks (in MB) to use when uploading large files.
- Include Sub Directories: Choose whether subdirectory paths are listed in the Resources view in the ADLSGen2 schema.
- Optionally, click + Add label to add a label to the Connection in the form of a key/value pair.
- Click Next.
- In the Authentication section, enter the authentication details.
- Select an Authentication type and enter the relevant details.
The following authentication types are supported by the Azure Data Lake Storage connection:
- Shared Access Signature
- Account Access Key
To understand how to configure these authentication types, see Configure authentication.
- Click Next.
- Review: Review your connection and authentication details.
- Click Create.
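The connection name criteria listed in the steps above can be checked before you open the console. The following is a minimal Python sketch and is not part of the connector. The function name `is_valid_connection_name` is hypothetical, and the pattern only encodes the four criteria stated above.

```python
import re

# Encodes the naming criteria stated above: lowercase letters, digits, or
# hyphens; starts with a letter; ends with a letter or digit; at most 49
# characters. The helper name is hypothetical, not a connector API.
_NAME_PATTERN = re.compile(r"[a-z](?:[a-z0-9-]{0,47}[a-z0-9])?")


def is_valid_connection_name(name: str) -> bool:
    """Return True if `name` satisfies the documented naming criteria."""
    return _NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["adls-prod-1", "Adls-Prod", "adls-", "1adls"]:
        print(candidate, is_valid_connection_name(candidate))
```

The console remains the authority on what it accepts. A local check like this only catches the documented cases early.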
Configure authentication
Enter the details based on the authentication you want to use.
- Shared Access Signature
If you want to use anonymous login, select Not Available.
- Shared Access Signature: Secret Manager Secret containing the Shared Access Signature.
- Account Access Key
If you want to use anonymous login, select Not Available.
- Account Access Key: Secret Manager Secret containing the Account Access Key.
Entities, operations, and actions
All the Integration Connectors provide a layer of abstraction for the objects of the connected application. You can access an application's objects only through this abstraction. The abstraction is exposed to you as entities, operations, and actions.
- Entity: An entity can be thought of as an object, or a collection of properties, in the connected application or service. The definition of an entity differs from connector to connector. For example, in a database connector, tables are the entities; in a file server connector, folders are the entities; and in a messaging system connector, queues are the entities.
However, it is possible that a connector doesn't support or have any entities, in which case the Entities list will be empty.
- Operation: An operation is the activity that you can perform on an entity. Selecting an entity from the available list generates a list of operations available for that entity. For a detailed description of the operations, see the Connectors task's entity operations. However, if a connector doesn't support any of the entity operations, such unsupported operations aren't listed in the Operations list.
- Action: An action is a first-class function that is made available to the integration through the connector interface. An action lets you make changes to an entity or entities, and actions vary from connector to connector. Normally, an action has some input parameters and an output parameter. However, it is possible that a connector doesn't support any action, in which case the Actions list will be empty.
System limitations
The Azure Data Lake Storage connector can process 5 transactions per second, per node, and throttles any transactions beyond this limit. By default, Integration Connectors allocates 2 nodes (for better availability) for a connection.
For information on the limits applicable to Integration Connectors, see Limits.
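The per-node limit above lends itself to a rough sizing estimate. The following Python sketch assumes that throughput scales linearly with the number of nodes, which the "per node" wording suggests but this page does not guarantee. The function name `estimate_nodes` is hypothetical. The defaults of 5 transactions per second per node and a minimum of 2 nodes are taken from this page.

```python
import math


def estimate_nodes(expected_tps: float, per_node_tps: int = 5, min_nodes: int = 2) -> int:
    """Estimate how many connection nodes a target throughput needs.

    Assumes linear scaling: each node handles `per_node_tps` transactions
    per second. This is a sizing aid, not an official formula.
    """
    if expected_tps < 0:
        raise ValueError("expected_tps must be non-negative")
    # Round up so the estimate never falls below the target throughput,
    # and never go below the default minimum node count.
    return max(min_nodes, math.ceil(expected_tps / per_node_tps))


if __name__ == "__main__":
    for tps in (4, 12, 250):
        print(tps, "TPS ->", estimate_nodes(tps), "nodes")
```

Compare the result with the maximum number of nodes configured for the connection. If the estimate is higher, transactions beyond the limit are throttled.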
Actions
This section lists the actions supported by the connector. To understand how to configure the actions, see Action examples.
DownloadFile action
This action lets you download the contents of a particular blob from a directory or a container.
Input parameters of the DownloadFile action
Parameter Name | Data Type | Required | Description |
---|---|---|---|
Path | String | Yes | The path of the file (including the file name) to download. |
HasBytes | Boolean | No | Whether to download content as bytes (Base64 format). The default value is false. |
Output parameters of the DownloadFile action
If the action is successful, it returns the contents of the file or the blob.
For an example of how to configure the DownloadFile action, see Action examples.
CreateFile action
This action lets you create a blob or a file in a container or a directory.
Input parameters of the CreateFile action
Parameter Name | Data Type | Required | Description |
---|---|---|---|
Path | String | Yes | The path of the file which will be created. |
For an example of how to configure the CreateFile action, see Action examples.
CopyFile action
This action lets you copy the contents of a file or a blob to another file or blob in the same container or directory.
Input parameters of the CopyFile action
Parameter Name | Data Type | Required | Description |
---|---|---|---|
SourcePath | String | Yes | The path of the file which will be copied. |
DestinationPath | String | Yes | The path to which the file will be copied. |
For an example of how to configure the CopyFile action, see Action examples.
DeleteObject action
This action lets you delete a file or a blob.
Input parameters of the DeleteObject action
Parameter Name | Data Type | Required | Description |
---|---|---|---|
Recursive | String | No | Set this to true to delete all the folder's content including any sub-folders. |
Path | String | Yes | The path of the file or folder to be deleted. |
DeleteType | String | Yes | |
For an example of how to configure the DeleteObject action, see Action examples.
LeaseBlob action
This action lets you create and manage a lock on a blob.
Input parameters of the LeaseBlob action
Parameter Name | Data Type | Required | Description |
---|---|---|---|
Path | String | Yes | The path of the file. |
LeaseAction | String | Yes | Specifies the lease action to execute. |
LeaseDuration | Integer | Yes | Specifies the duration of the lease. |
For an example of how to configure the LeaseBlob action, see Action examples.
UploadFile action
This action lets you upload content to a particular blob or container.
Input parameters of the UploadFile action
Parameter Name | Data Type | Required | Description |
---|---|---|---|
Path | String | Yes | The path of the file to be uploaded. |
HasBytes | Boolean | No | Whether to upload content as bytes. |
Content | String | Yes | Content to upload. |
For an example of how to configure the UploadFile action, see Action examples.
RenameObject action
This action lets you rename a file or a folder.
Input parameters of the RenameObject action
Parameter Name | Data Type | Required | Description |
---|---|---|---|
Path | String | Yes | The path of the file or folder to rename. |
RenameTo | String | Yes | The new name of the file or the folder. |
For an example of how to configure the RenameObject action, see Action examples.
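The required parameters in the tables above can be checked before a payload reaches the Connectors task. The following Python sketch is illustrative only: `REQUIRED_PARAMS` and `build_action_payload` are hypothetical names, and the mapping is copied from the "Required" columns on this page. It does not validate parameter values or optional parameters.

```python
import json

# Required input parameters per action, copied from the tables above.
# The dictionary and function names are hypothetical helpers.
REQUIRED_PARAMS = {
    "DownloadFile": ["Path"],
    "CreateFile": ["Path"],
    "CopyFile": ["SourcePath", "DestinationPath"],
    "DeleteObject": ["Path", "DeleteType"],
    "LeaseBlob": ["Path", "LeaseAction", "LeaseDuration"],
    "UploadFile": ["Path", "Content"],
    "RenameObject": ["Path", "RenameTo"],
}


def build_action_payload(action: str, **params) -> str:
    """Return a JSON string for connectorInputPayload, or raise ValueError."""
    if action not in REQUIRED_PARAMS:
        raise ValueError(f"Unknown action: {action}")
    missing = [name for name in REQUIRED_PARAMS[action] if name not in params]
    if missing:
        raise ValueError(f"{action} is missing required parameters: {', '.join(missing)}")
    return json.dumps(params)


if __name__ == "__main__":
    print(build_action_payload("RenameObject", Path="testblob", RenameTo="testblob6"))
```

The resulting string can be pasted into the Default Value field of connectorInputPayload.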
Action examples
Example - Download a file
This example downloads a binary file.
- In the Configure connector task dialog, click Actions.
- Select the DownloadFile action, and then click Done.
- In the Task Input section of the Connectors task, click connectorInputPayload and then enter a value similar to the following in the Default Value field:
{ "Path": "testdirectory1/test1.pdf", "HasBytes": true }
If the action is successful, the DownloadFile task's connectorOutputPayload response parameter will have a value similar to the following:
[{ "Success": "True", "ContentBytes": "UEsDBBQABgAIAAAAIQCj77sdZQEAAFIFAAATAAgCW0NvbnRlbnRfVHlwZXNdLnhtbCCiBAIooA" }]
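When HasBytes is true, the ContentBytes field carries the file as Base64 text, so a consumer has to decode it to get the original bytes. The following Python sketch shows one way to do that, using only the standard library. The function name `extract_file_bytes` is hypothetical. The sketch uses its own small sample payload, because the ContentBytes value shown above may be shortened for display.

```python
import base64
import json


def extract_file_bytes(connector_output_payload: str) -> bytes:
    """Decode the Base64 ContentBytes field of a DownloadFile response.

    Assumes the response is a JSON array whose first element contains
    ContentBytes, as in the sample output above.
    """
    records = json.loads(connector_output_payload)
    return base64.b64decode(records[0]["ContentBytes"])


if __name__ == "__main__":
    # Hypothetical sample: "aGVsbG8gd29ybGQ=" is Base64 for b"hello world".
    sample = '[{ "Success": "True", "ContentBytes": "aGVsbG8gd29ybGQ=" }]'
    data = extract_file_bytes(sample)
    print(len(data), "bytes decoded")
```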
Example - Upload a file
This example uploads content as a blob.
- In the Configure connector task dialog, click Actions.
- Select the UploadFile action, and then click Done.
- In the Task Input section of the Connectors task, click connectorInputPayload and then enter a value similar to the following in the Default Value field:
{ "Path": "testblob4", "HasBytes": true, "Content": "abcdef\nabcdef" }
If the action is successful, the UploadFile task's connectorOutputPayload response parameter will have a value similar to the following:
[{ "Success": "true" }]
Example - Create a file
This example creates a file in the specified directory.
- In the Configure connector task dialog, click Actions.
- Select the CreateFile action, and then click Done.
- In the Task Input section of the Connectors task, click connectorInputPayload and then enter a value similar to the following in the Default Value field:
{ "Path": "testdirectory1/testblob" }
If the action is successful, the CreateFile task's connectorOutputPayload response parameter will have a value similar to the following:
[{ "Success": "true" }]
Example - Copy a file
This example copies a file from one location to another location.
- In the Configure connector task dialog, click Actions.
- Select the CopyFile action, and then click Done.
- In the Task Input section of the Connectors task, click connectorInputPayload and then enter a value similar to the following in the Default Value field:
{ "SourcePath": "testdirectory1/testblob", "DestinationPath": "testblob" }
If the action is successful, the CopyFile task's connectorOutputPayload response parameter will have a value similar to the following:
[{ "Success": "true" }]
Example - Delete a blob
This example deletes the specified blob.
- In the Configure connector task dialog, click Actions.
- Select the DeleteObject action, and then click Done.
- In the Task Input section of the Connectors task, click connectorInputPayload and then enter a value similar to the following in the Default Value field:
{ "Path": "testdirectory1/testblob" }
If the action is successful, the DeleteObject task's connectorOutputPayload response parameter will have a value similar to the following:
[{ "Success": "true" }]
Example - Lease a blob
This example leases the specified blob.
- In the Configure connector task dialog, click Actions.
- Select the LeaseBlob action, and then click Done.
- In the Task Input section of the Connectors task, click connectorInputPayload and then enter a value similar to the following in the Default Value field:
{ "Path": "testblob2", "LeaseAction": "Acquire", "LeaseDuration": 60.0 }
If the action is successful, the LeaseBlob task's connectorOutputPayload response parameter will have a value similar to the following:
[{ "LeaseId": "7aae9ca2-f015-41b6-9bdf-5fd3401fc493", "Success": "true" }]
Example - Rename a blob
This example renames a blob.
- In the Configure connector task dialog, click Actions.
- Select the RenameObject action, and then click Done.
- In the Task Input section of the Connectors task, click connectorInputPayload and then enter a value similar to the following in the Default Value field:
{ "Path": "testblob", "RenameTo": "testblob6" }
If the action is successful, the RenameObject task's connectorOutputPayload response parameter will have a value similar to the following:
[{ "Success": true }]
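The sample outputs in this section show the Success field in three forms: the strings "True" and "true", and the JSON boolean true. A downstream step that branches on success may want to normalize these. The following Python sketch is one possible approach. The function name `action_succeeded` is hypothetical, and treating a missing field or an empty response as failure is an assumption rather than documented connector behavior.

```python
import json


def action_succeeded(connector_output_payload: str) -> bool:
    """Return True if the first record of the response reports success.

    Accepts a JSON boolean or a case-insensitive "true" string. A missing
    Success field or an empty response is treated as failure (assumption).
    """
    records = json.loads(connector_output_payload)
    if not records:
        return False
    value = records[0].get("Success")
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


if __name__ == "__main__":
    for sample in ('[{ "Success": "True" }]', '[{ "Success": true }]', '[{ "Success": "false" }]'):
        print(sample, "->", action_succeeded(sample))
```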
Entity operation examples
This section shows how to perform some of the entity operations in this connector.
Example - List all the records
This example lists all the records in the Resource entity.
- In the Configure connector task dialog, click Entities.
- Select Resource from the Entity list.
- Select the List operation, and then click Done.
- Optionally, in the Task Input section of the Connectors task, you can filter your result set by specifying a filter clause. Always specify the filter clause value within single quotes (').
Example - Get a record
This example gets a record with the specified ID from the Resource entity.
- In the Configure connector task dialog, click Entities.
- Select Resource from the Entity list.
- Select the Get operation, and then click Done.
- In the Task Input section of the Connectors task, click EntityId and then enter testdirectory1/testblob1 in the Default Value field.
Here, testdirectory1/testblob1 is a unique record ID in the Resource entity.
Use the Azure Data Lake Storage connection in an integration
After you create the connection, it becomes available in both Apigee Integration and Application Integration. You can use the connection in an integration through the Connectors task.
- To understand how to create and use the Connectors task in Apigee Integration, see Connectors task.
- To understand how to create and use the Connectors task in Application Integration, see Connectors task.
Get help from the Google Cloud community
You can post your questions and discuss this connector in the Google Cloud community at Cloud Forums.
What's next
- Understand how to suspend and resume a connection.
- Understand how to monitor connector usage.
- Understand how to view connector logs.