This page explains how to parse files when you prepare data in the Wrangler
workspace of the Cloud Data Fusion Studio. Wrangler lets you parse a file before
loading it into the Wrangler workspace:
- Wrangler infers data types and maps each column to the inferred data type in the same way file source plugins do in the Pipeline Studio.
- When schema inference isn't possible, you can import the schema for a file format, such as JSON.
- The recipe doesn't include the parse directive, which reduces transformation logic during pipeline runs.
- When you create a pipeline from Wrangler, the source plugin includes all the same parsing properties and values that you set in Wrangler.
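The type inference described above can be sketched as follows: for each column, pick the narrowest type that parses every sample value. This is only an illustration of the idea, not Wrangler's actual inference algorithm:

```python
def infer_type(values):
    """Pick the narrowest type name that parses every sample value.

    An illustration of column type inference, not Wrangler's
    actual algorithm.
    """
    for cast, name in ((int, "int"), (float, "double")):
        try:
            for v in values:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"

print(infer_type(["1", "2", "3"]))  # int
print(infer_type(["1.5", "2"]))     # double
print(infer_type(["a", "1"]))       # string
```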
Create a file connection
To parse a file before loading it into Wrangler, you must use a file connection,
such as File, Cloud Storage, or Amazon S3.
1. Go to the Wrangler workspace in Cloud Data Fusion.
2. Click the Select data expander arrow to view the available connections.
3. Add a connection for File, Cloud Storage, or S3. For more information, see Create and manage connections.
4. To open the parsing options dialog, go to the Select data panel and click the name of the file.
5. In the Parsing options dialog, enter the following information:
In the Format field, choose the file format of the data being
read—for example, csv. For more information, see Supported
formats.
If you choose the delimiter format, in the Delimiter field that
appears, enter the delimiter information.
If you choose the CSV, TSV, or delimiter format, an Enable quoted
values field appears. If your data is wrapped in quotation marks,
select True. This setting trims quotation marks from the parsed
output. For example, the input 1, "a, b, c" parses into two
fields: the first field has the value 1, and the second field has
the value a, b, c. A newline delimiter can't appear within quotes.
If you choose the text, CSV, TSV, or delimiter format, a Use first
row as header field appears. To use the first line of each file as
a column header, select True.
In the File encoding field, choose the file encoding type of the
source file—for example, UTF-8.
Optional: To import a schema or override the inferred schema for the
file, click Import Schema. Import a schema for formats where schema
inference isn't possible, such as JSON and some Avro files. The
schema must be in the Avro format.
Click Confirm. The parsed file appears in the Wrangler workspace.
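The quoted-values behavior described above can be sketched with Python's standard csv module, which handles quotation marks in a comparable way: a quoted field isn't split on embedded delimiters, and the quotes are trimmed from the parsed output.

```python
import csv
import io

# Parse the example input from the steps above. skipinitialspace=True
# ignores the space after each delimiter so the quote character is
# recognized at the start of the field.
row = next(csv.reader(io.StringIO('1, "a, b, c"'), skipinitialspace=True))
print(row)  # ['1', 'a, b, c'] -- two fields, quotation marks removed
```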
Supported formats
The following formats are supported for file parsing:

- Avro
- Blob (the blob format requires a schema that contains a field named body of type bytes)
- CSV
- Delimited
- JSON
- Parquet
- Text (the text format requires a schema that contains a field named body of type string)
- TSV
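When you import a schema, such as for the JSON format or for the text and blob formats listed above, it must be in the Avro format. As a minimal sketch, the schema for the text format, which requires a body field of type string, could look like the following (the record name here is an assumption, not a required value):

```json
{
  "type": "record",
  "name": "textRecord",
  "fields": [
    {"name": "body", "type": "string"}
  ]
}
```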
What's next

Learn more about Wrangler directives.