
Document splitter behavior

General splitter behavior

Splitter output contains split information for the input document, including a confidence score for each split. The Document AI API returns a Document JSON object and uses its entities field to represent the document splits. Any additional information depends on the specific type of splitter.

  • Entity.type specifies the document classification. For a full list of document types that can be identified, see the following lists.

  • Entity.pageAnchor.pageRefs[] specifies the pages that contain each sub-document. Note that pageRefs[].page is zero-based and is the index into the document.pages[] field.
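To illustrate these fields, the following Python sketch reads the splits out of a Document JSON object that has already been loaded into a dict. It assumes the standard proto3 JSON mapping, in which int64 fields such as `page` are serialized as strings and a default value of 0 may be omitted. It also reads the 1-based `pageNumber` of the referenced `document.pages[]` entry to show how the zero-based index relates to human-readable page numbers. `Split` and `read_splits` are hypothetical names, not part of the Document AI client library.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Split:
    doc_type: str             # Entity.type, for example "invoice_statement" or "other"
    confidence: float         # Entity.confidence
    page_indices: List[int]   # zero-based indices into document["pages"]
    page_numbers: List[int]   # 1-based page numbers, for display


def read_splits(document: Dict[str, Any]) -> List[Split]:
    """Collect one Split per entity in a Document JSON object loaded as a dict."""
    pages = document.get("pages", [])
    splits = []
    for entity in document.get("entities", []):
        refs = entity.get("pageAnchor", {}).get("pageRefs", [])
        # proto3 JSON writes int64 values as strings and may omit a page of 0.
        indices = sorted({int(ref.get("page", 0)) for ref in refs})
        # pageRefs[].page indexes document.pages[]; prefer that page's 1-based
        # pageNumber and fall back to index + 1 if the entry is unavailable.
        numbers = [
            int(pages[i].get("pageNumber", i + 1)) if i < len(pages) else i + 1
            for i in indices
        ]
        splits.append(Split(
            doc_type=entity.get("type", ""),
            confidence=float(entity.get("confidence", 0.0)),
            page_indices=indices,
            page_numbers=numbers,
        ))
    return splits
```

For instance, an entity whose `pageRefs` are `[{}, {"page": "1"}]` covers the first two entries of `document.pages[]`, which a reader would see as pages 1 and 2.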

The splitter is not designed to split logical documents that are longer than 30 pages. A logical document that is longer than 30 pages (for example, a 40-page bank statement) may be split into two or more documents that are classified separately.
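If your documents can exceed this limit, one way to find candidates for manual review is to look for runs of adjacent splits that share a type, have contiguous pages, and together exceed 30 pages. The sketch below is an illustrative heuristic, not Document AI behavior. It assumes the splits are given in page order as hypothetical `(type, zero-based page indices)` tuples, and `find_possible_fragments` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

# Design limit stated above: logical documents over 30 pages may be split.
MAX_LOGICAL_PAGES = 30

# Hypothetical input shape: (Entity.type, zero-based page indices).
SplitPages = Tuple[str, Sequence[int]]


def find_possible_fragments(
    splits: Sequence[SplitPages], limit: int = MAX_LOGICAL_PAGES
) -> List[List[int]]:
    """Return runs of split indices that may be pieces of one long document.

    A run is a sequence of adjacent splits with the same type whose pages are
    contiguous; it is flagged only if its total page count exceeds `limit`.
    This is a heuristic for manual review, not Document AI behavior.
    """
    runs: List[List[int]] = []
    current: List[int] = []
    for i, (doc_type, pages) in enumerate(splits):
        if current:
            prev_type, prev_pages = splits[current[-1]]
            continues = (
                doc_type == prev_type
                and len(prev_pages) > 0
                and len(pages) > 0
                and max(prev_pages) + 1 == min(pages)
            )
            if not continues:
                runs.append(current)
                current = []
        current.append(i)
    if current:
        runs.append(current)
    return [
        run for run in runs
        if len(run) > 1 and sum(len(splits[i][1]) for i in run) > limit
    ]
```

For a 40-page bank statement returned as a 30-page and a 10-page `account_statement_bank` split, the two splits would be flagged together. Two genuinely separate short statements of the same type stay unflagged because their combined length is under the limit.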

Splitters identify page boundaries, but do not actually split the input document for you. Here is a code sample that physically splits a PDF file by using the page boundaries:
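A minimal Python sketch of this approach follows. It assumes that the splitter output was saved as a Document JSON file containing the `Document` object itself (not a wrapping response), that `document.pages[]` lines up one-to-one with the pages of the source PDF, and that the third-party `pypdf` package is installed. `page_groups` and `split_pdf` are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Tuple


def page_groups(document: Dict[str, Any]) -> List[Tuple[str, List[int]]]:
    """Return (Entity.type, sorted zero-based page indices) for each split."""
    groups = []
    for entity in document.get("entities", []):
        refs = entity.get("pageAnchor", {}).get("pageRefs", [])
        # proto3 JSON writes int64 values as strings and may omit a page of 0.
        pages = sorted({int(ref.get("page", 0)) for ref in refs})
        groups.append((entity.get("type", "other"), pages))
    return groups


def split_pdf(pdf_path: str, document_json_path: str, out_dir: str) -> List[Path]:
    """Write one PDF per split and return the paths of the written files."""
    # Imported here so page_groups works without pypdf (pip install pypdf).
    from pypdf import PdfReader, PdfWriter

    document = json.loads(Path(document_json_path).read_text(encoding="utf-8"))
    reader = PdfReader(pdf_path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, (doc_type, pages) in enumerate(page_groups(document)):
        writer = PdfWriter()
        for page_index in pages:
            # Assumes document.pages[] and the PDF pages share the same order.
            writer.add_page(reader.pages[page_index])
        target = out / f"{i:03d}_{doc_type}.pdf"
        with open(target, "wb") as f:
            writer.write(f)
        written.append(target)
    return written
```

For example, `split_pdf("input.pdf", "document.json", "out")` (hypothetical paths) would write files such as `out/000_invoice_statement.pdf`, with a numeric prefix that keeps the sub-documents in their original order.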

Document types identified

Procurement Document Splitter & Classifier

  • Utility statement: A bill or receipt issued by a utility company (telecommunications, gas, electric, or cable service) that shows the amount the customer owes for the services provided. It may also show previous payments made by the customer for current or prior services.
    • Return type(s): utility_statement
  • Debit note: A document issued by a business stating a monetary amount a client owes to the business.
    • Return type(s): debit_note
  • Credit note: A document issued by a business to provide a client with a discount or a refund, or to correct a previous invoicing error.
    • Return type(s): credit_note
  • Credit card slip: A document that shows a payment made by credit card. It typically includes the total charge amount, a tip amount (mostly on US documents), and a total payment; the tip amount and total payment are usually handwritten. This document is relevant to expense processing, but it is not suitable as proof of expense.
    • Return type(s): credit_card_slip
  • Restaurant statement: A document issued by a restaurant to a customer itemizing the specific items consumed, the taxes, total amount, tips, and amount paid.
    • Return type(s): restaurant_statement
  • Air travel statement: A document issued by an airline to a customer itemizing the specific flight and non-flight charges, and the amount paid (if available).
    • Return type(s): air_travel_statement
  • Hotel statement: A document issued by a hotel to a customer itemizing the specific charges related to a hotel stay, and the amount paid (if available).
    • Return type(s): hotel_statement
  • Car rental statement: A document issued by a car rental company to a customer itemizing the specific charges related to a car rental, and the amount paid (if available).
    • Return type(s): car_rental_statement
  • Ground transportation statement: A document issued by a ground transportation company (ride sharing, train/subway) to a customer itemizing the specific charges related to a trip, and the amount paid (if available).
    • Return type(s): ground_transportation_statement
  • Invoice statement: A document sent by a seller to a customer that requests payment for products or services and (for the purposes of this taxonomy) is not covered by any other document type definition.
    • Return type(s): invoice_statement
  • Receipt statement: A document that serves as proof of payment, confirming that a customer has received the goods or services they paid a business for. Conversely, it can be a document showing that the business was compensated for the goods or services it sold to a customer. In either case, for the purposes of this taxonomy, the document is not covered by any other document type definition.
    • Return type(s): receipt_statement
  • If the splitter cannot identify the type of the document, it returns other.
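Downstream code that routes procurement splits might keep the return types above in a lookup set. The hypothetical sketch below tallies split types and, mirroring the splitter's own fallback, counts anything outside the listed types as `other`. The set reflects only the types listed here and is an assumption that may need updating if the processor adds types; `tally_procurement_types` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable

# Return types listed above for the Procurement Document Splitter & Classifier.
PROCUREMENT_TYPES = frozenset({
    "utility_statement", "debit_note", "credit_note", "credit_card_slip",
    "restaurant_statement", "air_travel_statement", "hotel_statement",
    "car_rental_statement", "ground_transportation_statement",
    "invoice_statement", "receipt_statement",
})


def tally_procurement_types(entity_types: Iterable[str]) -> Counter:
    """Count split types, folding anything not in the list above into "other"."""
    return Counter(t if t in PROCUREMENT_TYPES else "other" for t in entity_types)
```

Folding unknown values into `other` keeps downstream routing to a closed set of cases, at the cost of hiding any new type until the set is updated.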

Lending Document Splitter & Classifier

  • 1003 - Legacy Form (standard and customized versions)
    • Return type(s): 1003[1], 1003_2009
  • 1040 - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1040[1], 1040_2018, 1040_2019, 1040_2020[1]
  • 1040 Schedule C - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1040sc[1], 1040sc_2018[1], 1040sc_2019, 1040sc_2020
  • 1040 Schedule E - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1040se[1], 1040se_2018[1], 1040se_2019, 1040se_2020
  • 1065 - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1065[1], 1065_2018[1], 1065_2019, 1065_2020
  • 1099-DIV - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1099div[1], 1099div_2018, 1099div_2019, 1099div_2020
  • 1099-G - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1099g[1], 1099g_2018[1], 1099g_2019, 1099g_2020
  • 1099-INT - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1099int[1], 1099int_2018, 1099int_2019, 1099int_2020
  • 1099-MISC - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1099misc[1], 1099misc_2018, 1099misc_2019, 1099misc_2020
  • 1099-NEC - 2020 (standard and customized versions)
    • Return type(s): 1099nec[1], 1099nec_2020
  • 1099-R - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1099r[1], 1099r_2018, 1099r_2019, 1099r_2020
  • 1120 - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1120[1], 1120_2018[1], 1120_2019, 1120_2020
  • 1120S - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1120s[1], 1120s_2018[1], 1120s_2019, 1120s_2020
  • Bank Statement
    • Return type(s): account_statement_bank
  • Pay Slip
    • Return type(s): payslip
  • SSA-1099 - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1099ssa[1], 1099ssa_2018[1], 1099ssa_2019, 1099ssa_2020
  • US Driver License
    • Return type(s): US_Driver_License
  • US Passport
    • Return type(s): US_Passport
  • W2 - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): w2[1], w2_2018, w2_2019, w2_2020
  • W9 - Rev. 10-2018, Rev. 11-2017
    • Return type(s): w9[1], w9_2017, w9_2018
  • If the splitter cannot identify the type of the document, it returns other.

[1] The corresponding parser for this form does not support this document type. This means that the splitter can identify and classify documents of this type, but Document AI does not provide a parser to extract information.
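The lending return types follow a visible pattern (a form name, optionally followed by `_` and a four-digit year), so a pipeline might decompose them and check the [1] marker before sending a split to a parser. The sketch below is illustrative only: the naming pattern is inferred from the list above rather than documented as a contract, and `NO_PARSER_TYPES` and `describe_lending_type` are hypothetical names.

```python
import re
from typing import Optional, Tuple

# Lending return types marked [1] above: the splitter can classify them,
# but Document AI provides no corresponding parser to extract information.
NO_PARSER_TYPES = frozenset({
    "1003", "1040", "1040_2020", "1040sc", "1040sc_2018",
    "1040se", "1040se_2018", "1065", "1065_2018", "1099div",
    "1099g", "1099g_2018", "1099int", "1099misc", "1099nec", "1099r",
    "1120", "1120_2018", "1120s", "1120s_2018",
    "1099ssa", "1099ssa_2018", "w2", "w9",
})

# Assumed naming pattern (inferred from the list, not a documented contract):
# a form name, optionally followed by "_" and a four-digit year.
_YEAR_SUFFIX = re.compile(r"^(?P<form>.+)_(?P<year>\d{4})$")


def describe_lending_type(entity_type: str) -> Tuple[str, Optional[int], bool]:
    """Return (form name, year suffix or None, True if marked as having no parser)."""
    match = _YEAR_SUFFIX.match(entity_type)
    if match:
        form, year = match["form"], int(match["year"])
    else:
        form, year = entity_type, None
    return form, year, entity_type in NO_PARSER_TYPES
```

For example, `1040sc_2019` decomposes to form `1040sc` with year 2019 and is not marked, while `1040_2020` is marked as having no parser. Types such as `US_Driver_License` have no year suffix and are returned unchanged.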

Output examples