
Full processor and detail list

This page contains detailed information on all processors offered by Document AI. You can see a list of all processors by solution type.

All Document AI processors adhere to the Data Processing and Security Terms.

General processors

Document OCR (Optical Character Recognition)

Description

Identify and extract text in different types of documents.

This processor allows you to identify and extract text, including handwritten text, from documents in over 200 languages. The processor also uses machine learning to perform a quality assessment of a document based on the readability of its content.

Category General
Functions OCR, Quality Analysis
Release stage General availability
Access status Public
Type in API OCR_PROCESSOR
Supported languages
Full list of languages
  • af: Afrikaans
  • sq: Albanian
  • ar: Arabic
  • hy: Armenian
  • be: Belarusian
  • bn: Bengali
  • bg: Bulgarian
  • ca: Catalan
  • zh: Chinese
  • hr: Croatian
  • cs: Czech
  • da: Danish
  • nl: Dutch
  • en: English
  • et: Estonian
  • fil: Filipino
  • fi: Finnish
  • fr: French
  • de: German
  • el: Greek
  • gu: Gujarati
  • iw: Hebrew
  • hi: Hindi
  • hu: Hungarian
  • is: Icelandic
  • id: Indonesian
  • it: Italian
  • ja: Japanese
  • kn: Kannada
  • km: Khmer
  • ko: Korean
  • lo: Lao
  • lv: Latvian
  • lt: Lithuanian
  • mk: Macedonian
  • ms: Malay
  • ml: Malayalam
  • mr: Marathi
  • ne: Nepali
  • no: Norwegian
  • fa: Persian
  • pl: Polish
  • pt: Portuguese (Brazilian & Continental)
  • pa: Punjabi
  • ro: Romanian
  • ru: Russian
  • sr: Serbian
  • sk: Slovak
  • sl: Slovenian
  • es: Spanish
  • sv: Swedish
  • tl: Tagalog
  • ta: Tamil
  • te: Telugu
  • th: Thai
  • tr: Turkish
  • uk: Ukrainian
  • vi: Vietnamese
  • yi: Yiddish
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-ocr-v1.0-2020-09-23 Stable

None

None

pretrained-ocr-v1.1-2022-09-12 Release Candidate
Show fields
  • quality/defect_blurry
  • quality/defect_noisy
  • quality/defect_dark
  • quality/defect_faint
  • quality/defect_text_too_small
  • quality/defect_document_cutoff
  • quality/defect_text_cutoff
  • quality/defect_glare

None

Adds a quality assessment of the document based on its readability. The assessment is returned as a quality score in the range [0, 1], where 1 means perfect quality, in the image_quality_scores field on the Page object. All detected defects are listed as quality/defect_* and sorted in descending order by confidence value.

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 10
Maximum pages (batch/offline/asynchronous requests): 500
Uptraining
Human-in-the-Loop[2]
Sample Input File Open in new window.
Sample Output Open in new window.
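
The page limits listed above decide whether a file can be sent as an online request or has to go through batch processing. A minimal sketch of that decision, assuming the page count is already known: the function name `choose_request_mode` and its return strings are hypothetical, and the default limits are the Document OCR values from this section.

```python
def choose_request_mode(page_count: int, online_max: int = 10, batch_max: int = 500) -> str:
    """Pick a request mode from a processor's page limits.

    Defaults are the Document OCR limits listed on this page.
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    if page_count <= online_max:
        return "online"
    if page_count <= batch_max:
        return "batch"
    raise ValueError(
        f"{page_count} pages exceeds the batch limit of {batch_max}; split the file first"
    )


print(choose_request_mode(3))    # online
print(choose_request_mode(120))  # batch
```

Other processors can reuse the helper by passing their own limits, for example `online_max=5, batch_max=100` for Form Parser.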

Form Parser

Description

Extract form elements such as text and checkboxes.

Category General
Functions OCR, Form Parsing
Release stage General availability
Access status Public
Type in API FORM_PARSER_PROCESSOR
Notes
  • The current Form Parser model does not support checkboxes in tables, because checkboxes are treated as key-value pairs. However, checkboxes that are recognized may be stored in the table as Unicode characters (both checked and unchecked).
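
Because checkboxes inside tables may surface only as Unicode characters in the cell text, a caller can look for them directly. A minimal sketch, assuming the characters are the standard ballot-box code points U+2610 (unchecked), U+2611 and U+2612 (checked). This page does not say which characters the processor emits, so the mapping and the helper name `checkbox_states` are hypothetical.

```python
# Assumed mapping from checkbox characters to checked state (not confirmed by this page).
CHECKBOX_STATES = {
    "\u2610": False,  # BALLOT BOX (unchecked)
    "\u2611": True,   # BALLOT BOX WITH CHECK
    "\u2612": True,   # BALLOT BOX WITH X
}


def checkbox_states(cell_text: str) -> list:
    """Return the checked state of each checkbox character found in a table cell, in order."""
    return [CHECKBOX_STATES[ch] for ch in cell_text if ch in CHECKBOX_STATES]


print(checkbox_states("\u2611 Yes  \u2610 No"))  # [True, False]
```
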
Supported languages
Full list of languages
  • af: Afrikaans
  • sq: Albanian
  • ca: Catalan
  • hr: Croatian
  • cs: Czech
  • da: Danish
  • nl: Dutch
  • en: English
  • et: Estonian
  • fil: Filipino
  • fi: Finnish
  • fr: French
  • de: German
  • hu: Hungarian
  • is: Icelandic
  • id: Indonesian
  • it: Italian
  • lv: Latvian
  • lt: Lithuanian
  • ms: Malay
  • no: Norwegian
  • pl: Polish
  • pt: Portuguese (Brazilian & Continental)
  • ro: Romanian
  • sr: Serbian
  • sk: Slovak
  • sl: Slovenian
  • es: Spanish
  • sv: Swedish
  • tl: Tagalog
  • tr: Turkish
  • vi: Vietnamese
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-form-parser-v1.0-2020-09-23 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 5
Maximum pages (batch/offline/asynchronous requests): 100
Uptraining
Human-in-the-Loop[2]
Sample Input File Open in new window.
Sample Output Open in new window.

Intelligent Document Quality Processor

Description

Perform quality assessment of a document based on its readability and get a quality score.

The Intelligent Document Quality processor uses machine learning to assess the quality of a document based on the readability of its content. The assessment is returned as a quality score in the range [0, 1], where 1 means perfect quality. If the detected quality score is lower than 0.5, a list of negative quality reasons, sorted by likelihood, is also returned.

Category General
Functions OCR, Quality Analysis
Release stage Preview
Access status Limited [4]
Type in API DOCUMENT_QUALITY_PROCESSOR
Notes
  • Quality score is returned in the confidence field of the entity with type="quality_score".
  • The quality/defect_* properties are sorted in descending order by confidence value.
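
A minimal sketch of reading these results, using plain dictionaries in place of the API's entity objects. The flat `type`/`confidence` layout of each item, the helper name `read_quality`, and the sample values are assumptions for illustration. Only the `quality_score` type, the `quality/defect_*` names, and the 0.5 threshold come from this page.

```python
def read_quality(items: list) -> dict:
    """Extract the quality score and the defects, most likely defect first."""
    score = next(
        (item["confidence"] for item in items if item["type"] == "quality_score"),
        None,
    )
    if score is None:
        raise ValueError("no quality_score item found")
    defects = [item for item in items if item["type"].startswith("quality/defect_")]
    # Sort explicitly so the result does not depend on the input order.
    defects.sort(key=lambda item: item["confidence"], reverse=True)
    return {
        "score": score,
        "low_quality": score < 0.5,
        "defects": [item["type"] for item in defects],
    }


sample = [
    {"type": "quality_score", "confidence": 0.42},
    {"type": "quality/defect_dark", "confidence": 0.35},
    {"type": "quality/defect_blurry", "confidence": 0.81},
]
print(read_quality(sample)["defects"])  # ['quality/defect_blurry', 'quality/defect_dark']
```
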
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-document-quality-v1.0-2021-01-20 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 5
Maximum pages (batch/offline/asynchronous requests): 100
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • quality_score
  • quality/defect_blurry
  • quality/defect_dark
  • quality/defect_faint
  • quality/defect_noisy
  • quality/defect_text_too_small
Uptraining
Human-in-the-Loop[2]
Sample Input File Open in new window.
Sample Output Open in new window.

Document Splitter

Description

Programmatically split documents on logical boundaries.

Document Splitter uses machine learning to separate documents on logical boundaries. For example, if you have one PDF document with multiple scanned files, the Document AI API will suggest the page location of a new file.

Category General
Functions OCR, Splitting
Release stage Deprecated
Access status Limited [4]
Type in API DOCUMENT_SPLIT_PROCESSOR
Notes
  • Maximum image size: 65500 x 65500 pixels
  • The splitter is not designed to split logical documents that are over 30 pages long. Logical documents longer than 30 pages (for example, a 40-page bank statement) may be split into two or more documents.
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-document-split-v1.0-2020-09-20 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 15
Maximum pages (batch/offline/asynchronous requests): 2000
Uptraining
Human-in-the-Loop[2]
More information Document splitters behavior

Specialized processors

Contract parser

Description

Extract text and values from legal contracts such as agreement date, effective date, and parties.

Category Specialized
Solution type Contract
Functions OCR, Entity Extraction
Release stage Preview
Access status Limited [3]
Type in API CONTRACT_PROCESSOR
Notes
  • Duration entities such as renewal_term, notice_to_terminate_renewal, and initial_term are normalized to 'years-months-days' format. For example, 'The initial term is five (5) months' has the normalized value '0-5-0'.
  • If expiration_date is not explicit in the document, it is inferred from the effective_date and initial_term.
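
The normalized duration format is easy to consume downstream. The sketch below parses a 'years-months-days' value and adds it to a start date, which is one plausible way to reproduce an inferred expiration date. How the processor itself computes expiration_date is not documented here, so the helper names `parse_duration` and `add_duration`, and the choice to clamp to the end of shorter months, are assumptions.

```python
import calendar
from datetime import date, timedelta
from typing import Tuple


def parse_duration(value: str) -> Tuple[int, int, int]:
    """Split a normalized 'years-months-days' string into three integers."""
    years, months, days = (int(part) for part in value.split("-"))
    return years, months, days


def add_duration(start: date, duration: Tuple[int, int, int]) -> date:
    """Add years and months (clamping to the month's last day), then add days."""
    years, months, days = duration
    month_index = start.month - 1 + months
    year = start.year + years + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day) + timedelta(days=days)


term = parse_duration("0-5-0")                # (0, 5, 0)
print(add_duration(date(2022, 1, 31), term))  # 2022-06-30
```
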
Supported languages
  • en: English
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-contract-v1.2-2022-06-01 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 15
Maximum pages (batch/offline/asynchronous requests): 200
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • agreement_date
  • arbitration_venue
  • confidentiality_clause
  • document_name
  • effective_date
  • expiration_date
  • governing_law
  • indemnity_clause
  • initial_term
  • litigation_venue
  • notice_to_terminate_renewal
  • non_compete_clause
  • parties
  • renewal_term
Normalized fields

You can find more information in the Enrichment & normalization page.

Full list of normalized fields
  • agreement_date
  • effective_date
  • expiration_date
  • initial_term
  • renewal_term
  • notice_to_terminate_renewal
Uptraining
Human-in-the-Loop[2]
Sample Input File Open in new window.
Sample Output Open in new window.

France Driver License Parser

Description

Extract fields such as names, document ID, date of birth, etc.

Category Specialized
Solution type Identity
Functions OCR, Entity Extraction
Release stage Preview
Access status Limited [4]
Type in API FR_DRIVER_LICENSE_PROCESSOR
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-fr-driver-license-v1.0-2021-06-14 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 2
Maximum pages (batch/offline/asynchronous requests): 2
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • Family Name
  • Given Names
  • Document Id
  • Expiration Date
  • Date Of Birth
  • Issue Date
  • Portrait
Uptraining
Human-in-the-Loop[2]

France National ID Parser

Description

Extract fields such as names, document ID, date of birth, etc.

Category Specialized
Solution type Identity
Functions OCR, Entity Extraction
Release stage Preview
Access status Limited [4]
Type in API FR_NATIONAL_ID_PROCESSOR
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-fr-national-id-v1.0-2021-06-14 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 2
Maximum pages (batch/offline/asynchronous requests): 2
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • Family Name
  • Given Names
  • Document Id
  • Expiration Date
  • Date Of Birth
  • Issue Date
  • Address
  • Portrait
Normalized fields

You can find more information in the Enrichment & normalization page.

Full list of normalized fields
  • Date Of Birth
  • Expiration Date
  • Issue Date
Uptraining
Human-in-the-Loop[2]

France Passport Parser

Description

Extract fields such as names, document ID, date of birth, etc.

Category Specialized
Solution type Identity
Functions OCR, Entity Extraction
Release stage Preview
Access status Limited [4]
Type in API FR_PASSPORT_PROCESSOR
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-fr-passport-v1.0-2022-04-29 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 2
Maximum pages (batch/offline/asynchronous requests): 2
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • family_name
  • given_name
  • document_id
  • expiration_date
  • date_of_birth
  • issue_date
  • address
  • place_of_birth
  • portrait
Normalized fields

You can find more information in the Enrichment & normalization page.

Full list of normalized fields
  • Date Of Birth
  • Expiration Date
  • Issue Date
Uptraining
Human-in-the-Loop[2]

Identity Document Proofing Parser

Description

Predict the validity of ID documents using multiple signals.

The Identity Document Proofing processor is designed to help predict the validity of ID documents using four different signals.

The processor currently returns information from the following signals:

  • is_identity_document detection: Predicts whether an image contains a recognized identity document.
  • suspicious_words detection: Predicts whether words are present that aren't typical on IDs.
  • image_manipulation detection: Predicts whether the image was altered or tampered with using an image editing tool.
  • online_duplicate detection: Predicts whether the image can be found online.

Category Specialized
Solution type Identity
Functions OCR, Quality Analysis
Release stage Preview
Access status Public
Type in API ID_PROOFING_PROCESSOR
Notes
  • The `Online Duplicate Detection` feature is currently processed in US data centers. Regional and multi-regional support is unavailable for this feature outside of the US.
  • This processor is supported by algorithms that are updated more frequently than new processor versions are released. For this reason, the processor may return different outputs over time even when using the same processor version. For example, the online duplicate detection system monitors images present on the web, and hence its behavior can change more quickly than can be tracked in processor versions.
Supported form/versions
  • Global document support
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-id-proofing-v1.0-2022-10-03 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 2
Maximum pages (batch/offline/asynchronous requests): 2
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • fraud_signals_is_identity_document
  • fraud_signals_suspicious_words
  • evidence_suspicious_word
  • evidence_inconclusive_suspicious_word
  • fraud_signals_image_manipulation
  • fraud_signals_online_duplicate
  • evidence_hostname
  • evidence_thumbnail_url
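
The field names above fall into two families, `fraud_signals_*` and `evidence_*`, so a response can be reorganized by prefix. A minimal sketch over plain dictionaries: the `type`/`value` keys, the helper name `group_proofing_fields`, and the assumption that evidence fields may repeat are all hypothetical. This page does not list the possible signal values, so the code passes them through without interpreting them.

```python
def group_proofing_fields(entities: list) -> dict:
    """Split ID proofing results into one value per signal and a list of values per evidence type."""
    signals = {}
    evidence = {}
    for entity in entities:
        name = entity["type"]
        if name.startswith("fraud_signals_"):
            signals[name[len("fraud_signals_"):]] = entity["value"]
        elif name.startswith("evidence_"):
            evidence.setdefault(name[len("evidence_"):], []).append(entity["value"])
    return {"signals": signals, "evidence": evidence}
```

For example, an item of type `fraud_signals_online_duplicate` ends up under `signals["online_duplicate"]`, and every `evidence_hostname` item is collected under `evidence["hostname"]`.
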
Normalized fields

You can find more information in the Enrichment & normalization page.

Full list of normalized fields
  • fraud_signals_image_manipulation
  • fraud_signals_online_duplicate
  • fraud_signals_is_identity_document
  • fraud_signals_suspicious_words
Uptraining
Human-in-the-Loop[2]
Sample Input File Open in new window.
Sample Output Open in new window.

US Driver License Parser

Description

Extract fields such as names, document ID, date of birth, etc.

Category Specialized
Solution type Identity
Functions OCR, Entity Extraction
Release stage General availability
Access status Public
Type in API US_DRIVER_LICENSE_PROCESSOR
Supported languages
  • en: English
Supported form/versions
  • Supports all 50 States and D.C.
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-us-driver-license-v1.0-2021-06-14 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 2
Maximum pages (batch/offline/asynchronous requests): 2
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • Family Name
  • Given Names
  • Document Id
  • Expiration Date
  • Date Of Birth
  • Issue Date
  • Address
  • Portrait
Normalized fields

You can find more information in the Enrichment & normalization page.

Full list of normalized fields
  • Date Of Birth
  • Expiration Date
  • Issue Date
Uptraining
Human-in-the-Loop[2]
Sample Input File Open in new window.
Sample Output Open in new window.

US Passport Parser

Description

Extract fields such as names, document ID, date of birth, etc.

Category Specialized
Solution type Identity
Functions OCR, Entity Extraction
Release stage General availability
Access status Public
Type in API US_PASSPORT_PROCESSOR
Supported languages
  • en: English
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-us-passport-v1.0-2021-06-14 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 2
Maximum pages (batch/offline/asynchronous requests): 2
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • Family Name
  • Given Names
  • Document Id
  • Expiration Date
  • Date Of Birth
  • Issue Date
  • MRZ Code
  • Portrait
Normalized fields

You can find more information in the Enrichment & normalization page.

Full list of normalized fields
  • Date Of Birth
  • Expiration Date
  • Issue Date
Uptraining
Human-in-the-Loop[2]
Sample Input File Open in new window.
Sample Output Open in new window.

1003 Parser

Description

Extract over 50 fields from Fannie Mae Form 1003 (URLA).

The 1003 Form is Fannie Mae's form number for the Uniform Residential Loan Application (URLA), a borrower’s application for a mortgage. Freddie Mac's form number is Form 65; both refer to the same form.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage General availability
Access status Limited [4]
Type in API FORM_1003_PROCESSOR
Notes
  • Batch processing is currently not available for this processor.
  • If a page of a multi-page input file is the correct document type and one of the supported versions, the processor performs entity extraction on the first supported document. If the processor doesn't find any applicable documents in the input file, the processor returns an error message.
Supported languages
  • en: English
Supported form/versions
  • Legacy Form 1003 (standard and customized versions)
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-1003-v1.0-2020-10-01 Stable

None

None

pretrained-1003-v1.1-2021-12-10 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 15
Maximum pages (batch/offline/asynchronous requests): 15
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • AmortizationType
  • BuiltYear
  • HasPrimaryBorrowerSignaturePage1
  • HasSecondaryBorrowerSignaturePage1
  • LoanAmount
  • LoanPurpose
  • LoanTerm
  • MortgageType
  • NoteRate
  • PrimaryBorrowerCurrentJobTitleYearsTerm
  • PrimaryBorrowerDateOfBirth
  • PrimaryBorrowerDependentAge
  • PrimaryBorrowerDependentCount
  • PrimaryBorrowerDependentCountAndAge
  • PrimaryBorrowerEmployerNameAndAddress
  • PrimaryBorrowerHomePhone
  • PrimaryBorrowerJobTitle
  • PrimaryBorrowerJobYearsTerm
  • PrimaryBorrowerMarrageStatus
  • PrimaryBorrowerName
  • PrimaryBorrowerSSN
  • PrimaryBorrowerSelfEmployedIndicator
  • PrimaryBorrowerWorkPhoneNumber
  • PropertyAddress
  • PropertyUsageType
  • RefinanceLoanInfo
  • RefinancePurposeType
  • SecondaryBorrowerCurrentJobTitleYearsTerm
  • SecondaryBorrowerDateOfBirth
  • SecondaryBorrowerDependentAge
  • SecondaryBorrowerDependentCount
  • SecondaryBorrowerDependentCountAndAge
  • SecondaryBorrowerEmployerNameAndAddress
  • SecondaryBorrowerHomePhone
  • SecondaryBorrowerJobTitle
  • SecondaryBorrowerJobYearsTerm
  • SecondaryBorrowerMarrageStatus
  • SecondaryBorrowerName
  • SecondaryBorrowerSSN
  • SecondaryBorrowerSelfEmployedIndicator
  • SecondaryBorrowerWorkPhoneNumber
  • Units
Uptraining
Human-in-the-Loop[2]

1040 Parser

Description

Extract fields from Form 1040, including name, filing status, amounts, etc.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage General availability
Access status Limited [4]
Type in API FORM_1040_PROCESSOR
Notes
  • If a page of a multi-page input file is the correct document type and one of the supported versions, the processor performs entity extraction on the first supported document. If the processor doesn't find any applicable documents in the input file, the processor returns an error message.
Supported languages
  • en: English
Supported form/versions
  • 2021 (pretrained-1040-v2.0-2022-08-24 version only)
  • 2020 (pretrained-1040-v2.0-2022-08-24 version only)
  • 2019
  • 2018
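
Since form years 2020 and 2021 are handled only by the v2.0 version, callers may want to pick a processor version from the form year. A minimal sketch, assuming the unqualified years (2018 and 2019) are supported by every version listed in this section's Processor versions table. The helper name `versions_for_form_year` is hypothetical.

```python
V2 = "pretrained-1040-v2.0-2022-08-24"
ALL_VERSIONS = (
    "pretrained-1040-v1.0-2020-10-01",
    "pretrained-1040-v1.1-2021-12-10",
    V2,
)

# Form year -> processor versions that support it, per the list above.
SUPPORTED_VERSIONS = {
    2018: ALL_VERSIONS,
    2019: ALL_VERSIONS,
    2020: (V2,),
    2021: (V2,),
}


def versions_for_form_year(year: int) -> tuple:
    """Return the processor versions that list support for the given form year."""
    if year not in SUPPORTED_VERSIONS:
        raise ValueError(f"Form 1040 year {year} is not listed as supported")
    return SUPPORTED_VERSIONS[year]


print(versions_for_form_year(2021))  # ('pretrained-1040-v2.0-2022-08-24',)
```
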
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-1040-v1.0-2020-10-01 Stable

None

None

pretrained-1040-v1.1-2021-12-10 Stable

None

None

pretrained-1040-v2.0-2022-08-24 Release Candidate
Show fields
  • pensions_annuities_taxable
  • social_security_benefits_taxable
  • ira_distributions
  • ira_distributions_taxable_amount
  • qualified_dividends
  • ordinary_dividends
  • tax_exempt_interest
  • taxable_interest

None

Quality improvements.

Added support for the 2020 and 2021 forms.

Breaking change: all extracted fields have been renamed from CamelCase to snake_case (for example, first_name instead of FirstName).

This change was made to standardize the format of field names across Document AI.
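
Code that still uses the CamelCase names needs a mapping to the new ones. A minimal sketch of a generic CamelCase to snake_case conversion follows. The helper `camel_to_snake` is hypothetical, and only the FirstName to first_name pair is confirmed on this page, so check every converted name against the version's actual field list before relying on it.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a CamelCase identifier to snake_case, keeping acronyms such as SSN together."""
    # Insert "_" between a lowercase letter or digit and the uppercase letter that follows.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Insert "_" between an acronym and the capitalized word that follows it.
    name = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", name)
    return name.lower()


print(camel_to_snake("FirstName"))  # first_name
print(camel_to_snake("SpouseSSN"))  # spouse_ssn (output of this helper, not a confirmed field name)
```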

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 15
Maximum pages (batch/offline/asynchronous requests): 15
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • AddressApt
  • AddressCityStateZip
  • AddressStreet
  • CapitalGain
  • EmailAddress
  • FilingStatusCheckbox
  • FirstName
  • LastName
  • Occupation
  • OtherIncome
  • PensionsAnnuities
  • PhoneNumber
  • SSN
  • SocialSecurityBenefits
  • SpouseFirstName
  • SpouseLastName
  • SpouseOccupation
  • SpouseSSN
  • TotalIncome
  • WagesSalariesTips
  • Year
Uptraining
Human-in-the-Loop[2]
Labeling Instructions Open in new window.
Sample Input File Open in new window.
Sample Output Open in new window.

1040 Schedule C Parser

Description

Extract fields from Form 1040 Schedule C, including name, wages, etc.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage General availability
Access status Limited [4]
Type in API FORM_1040SCH_C_PROCESSOR
Supported languages
  • en: English
Supported form/versions
  • 2020 (standard and customized versions)
  • 2019 (standard and customized versions)
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-1040sch-c-v1.0-2021-05-27 Stable

None

None

pretrained-1040sch-c-v1.1-2021-12-10 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 15
Maximum pages (batch/offline/asynchronous requests): 15
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • ProprietorName
  • EmployerIdentificationNumber
  • Wages
  • SocialSecurityNumber
  • BusinessName
  • BusinessAddress
  • OtherIncome
  • Depletion
  • DepreciationAndSection179ExpenseDeduction
  • DeductibleMeals
  • ExpensesForBusinessUseOfYourHome
  • NetProfitOrLoss
  • TotalMilesDrivenWithVehicleForBusiness
Uptraining
Human-in-the-Loop[2]
Labeling Instructions Open in new window.

1040 Schedule D Parser

Description

Extract fields from Form 1040 Schedule D, including name, gains, losses, etc.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage Preview
Access status Limited [4]
Type in API FORM_1040SCH_D_PROCESSOR
Supported languages
  • en: English
Supported form/versions
  • 2020 (standard and customized versions)
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-1040sch-d-v1.0-2021-11-17 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 10
Maximum pages (batch/offline/asynchronous requests): 15
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • amount_if_line_16_is_loss
  • amount_unrecaptured_section_1250_gain
  • are_lines_15_and_16_both_gains_checkbox
  • are_lines_18_and_19_both_zero_or_blank_checkbox
  • capital_gain_distributions
  • combine_net_short_and_long_term_capital_gain_loss
  • dispose_of_investment_during_tax_year_checkbox
  • long_term_capital_loss_carryover
  • long_term_gain_or_loss
  • name
  • net_long_term_capital_gain_or_loss
  • net_long_term_gain_or_loss
  • net_short_term_capital_gain_or_loss
  • net_short_term_gain_or_loss
  • qualified_dividends_checkbox
  • short_term_capital_loss_carryover
  • short_term_gain
  • social_security_number
  • totals_long_term_transactions_boxd_adjustments
  • totals_long_term_transactions_boxd_cost
  • totals_long_term_transactions_boxd_gain_or_loss
  • totals_long_term_transactions_boxd_proceeds
  • totals_long_term_transactions_boxe_adjustments
  • totals_long_term_transactions_boxe_cost
  • totals_long_term_transactions_boxe_gain_or_loss
  • totals_long_term_transactions_boxe_proceeds
  • totals_long_term_transactions_boxf_adjustments
  • totals_long_term_transactions_boxf_cost
  • totals_long_term_transactions_boxf_gain_or_loss
  • totals_long_term_transactions_boxf_proceeds
  • totals_long_term_transactions_cost
  • totals_long_term_transactions_gain_or_loss
  • totals_long_term_transactions_proceeds
  • totals_short_term_transactions_boxa_adjustments
  • totals_short_term_transactions_boxa_cost
  • totals_short_term_transactions_boxa_gain_or_loss
  • totals_short_term_transactions_boxa_proceeds
  • totals_short_term_transactions_boxb_adjustments
  • totals_short_term_transactions_boxb_cost
  • totals_short_term_transactions_boxb_gain_or_loss
  • totals_short_term_transactions_boxb_proceeds
  • totals_short_term_transactions_boxc_adjustments
  • totals_short_term_transactions_boxc_cost
  • totals_short_term_transactions_boxc_gain_or_loss
  • totals_short_term_transactions_boxc_proceeds
  • totals_short_term_transactions_cost
  • totals_short_term_transactions_gain_or_loss
  • totals_short_term_transactions_proceeds
Uptraining
Human-in-the-Loop[2]

1040 Schedule E Parser

Description

Extract fields from Form 1040 Schedule E, including name, expenses, etc.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage General availability
Access status Limited [4]
Type in API FORM_1040SCH_E_PROCESSOR
Supported languages
  • en: English
Supported form/versions
  • 2020 (standard and customized versions)
  • 2019 (standard and customized versions)
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-1040sch-e-v1.0-2021-04-14 Stable

None

None

pretrained-1040sch-e-v1.1-2021-12-10 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 15
Maximum pages (batch/offline/asynchronous requests): 15
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • NamesShownOnReturn
  • YourSocialSecurityNumber
  • PhysicalAddressOfEachProperty_A
  • PhysicalAddressOfEachProperty_B
  • PhysicalAddressOfEachProperty_C
  • TypeOfProperty_A
  • TypeOfProperty_B
  • TypeOfProperty_C
  • FairRentalDays_A
  • FairRentalDays_B
  • FairRentalDays_C
  • PersonalUseDays_A
  • PersonalUseDays_B
  • PersonalUseDays_C
  • RentsReceived_A
  • RentsReceived_B
  • RentsReceived_C
  • RoyaltiesReceived_A
  • RoyaltiesReceived_B
  • RoyaltiesReceived_C
  • Insurance_A
  • Insurance_B
  • Insurance_C
  • MortgageInterest_A
  • MortgageInterest_B
  • MortgageInterest_C
  • Taxes_A
  • Taxes_B
  • Taxes_C
  • Depletion_A
  • Depletion_B
  • Depletion_C
  • OtherExpenses_A
  • OtherExpenses_B
  • OtherExpenses_C
  • TotalExpenses_A
  • TotalExpenses_B
  • TotalExpenses_C
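
Most of these fields carry an `_A`, `_B`, or `_C` suffix that identifies the property they belong to, so the flat output can be regrouped per property. A minimal sketch, assuming the extracted fields have already been collected into a name-to-value dictionary. The helper name `group_by_property` and the sample values are hypothetical.

```python
def group_by_property(fields: dict) -> dict:
    """Group Schedule E fields by their _A/_B/_C suffix; unsuffixed fields go under 'shared'."""
    shared = {}
    properties = {}
    for name, value in fields.items():
        base, sep, suffix = name.rpartition("_")
        if sep and suffix in ("A", "B", "C"):
            properties.setdefault(suffix, {})[base] = value
        else:
            shared[name] = value
    return {"shared": shared, "properties": properties}


grouped = group_by_property({
    "NamesShownOnReturn": "Jane Doe",
    "RentsReceived_A": "12000",
    "Taxes_A": "900",
    "RentsReceived_B": "8000",
})
print(grouped["properties"]["A"])  # {'RentsReceived': '12000', 'Taxes': '900'}
```
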
Uptraining
Human-in-the-Loop[2]

1099-DIV Parser

Description

Extract fields from Form 1099-DIV, including account number, qualified dividends, federal income tax withheld, etc.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage General availability
Access status Limited [4]
Type in API FORM_1099DIV_PROCESSOR
Notes
  • If a page of a multi-page input file is the correct document type and one of the supported versions, the processor performs entity extraction on the first supported document. If the processor doesn't find any applicable documents in the input file, the processor returns an error message.
Supported languages
  • en: English
Supported form/versions
  • 2020 (standard and customized versions)
  • 2019 (standard and customized versions)
  • 2018 (standard and customized versions)
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-1099div-v1.0-2021-05-27 Stable

None

None

pretrained-1099div-v1.1-2021-12-10 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 15
Maximum pages (batch/offline/asynchronous requests): 15
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • PayerTIN
  • RecipientTIN
  • RecipientName
  • RecipientAddressLine1
  • RecipientAddressLine2
  • PayerName
  • PayerAddressAndPhone
  • AccountNumber
  • TotalOrdinaryDividends
  • QualifiedDividends
  • TotalCapitalGainDistribution
  • UnrecapSec1250Gain
  • Section1202Gain
  • CollectiblesGain
  • NondividendDistributions
  • FederalIncomeTaxWithheld
  • Section199ADividends
  • CashLiquidationDistributions
  • NoncashLiquidationDistributions
  • ExemptInterestDividends
  • SpecifiedPrivateActivityBondInterestDividends
  • FatcaFilingRequirement
Uptraining
Human-in-the-Loop[2]
Sample Input File Open in new window.
Sample Output Open in new window.

1099-G Parser

Description

Extract fields from Form 1099-G, including payer, recipient, etc.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage General availability
Access status Limited [4]
Type in API FORM_1099G_PROCESSOR
Notes
  • If a page of a multi-page input file is the correct document type and one of the supported versions, the processor performs entity extraction on the first supported document. If the processor doesn't find any applicable documents in the input file, the processor returns an error message.
Supported languages
  • en: English
Supported form/versions
  • 2020 (standard and customized versions)
  • 2019 (standard and customized versions)
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-1099g-v1.0-2021-05-27 Stable

None

None

pretrained-1099g-v1.1-2021-12-10 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 15
Maximum pages (batch/offline/asynchronous requests): 15
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • PayerName
  • PayerAddressAndPhone
  • PayerTIN
  • RecipientTIN
  • RecipientName
  • RecipientAddress
  • RecipientCityOrTownStateOrProvinceCountryAndZipOrForeignPostalCode
  • AccountNumber
  • UnemploymentCompensation
  • StateOrLocalIncomeTaxRefundsCreditsOrOffsets
  • Box2AmountIsForTaxYear
  • FederalIncomeTaxWithheld
  • RTAAPayments
  • TaxableGrants
  • AgriculturePayments
  • MarketGain
  • State_Line1
  • State_Line2
  • StateIdentificationNo_Line1
  • StateIdentificationNo_Line2
  • StateIncomeTaxWithheld_Line1
  • StateIncomeTaxWithheld_Line2
  • IsBox2TradeOrBusinessIncome
Uptraining
Human-in-the-Loop[2]

1099-INT Parser

Description

Extract fields from Form 1099-INT, including payer, recipient, etc.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage General availability
Access status Limited [4]
Type in API FORM_1099INT_PROCESSOR
Notes
  • If a page of a multi-page input file is the correct document type and one of the supported versions, the processor performs entity extraction on the first supported document. If the processor doesn't find any applicable documents in the input file, the processor returns an error message.
Supported languages
  • en: English
Supported form/versions
  • 2020 (standard and customized versions)
  • 2019 (standard and customized versions)
  • 2018 (standard and customized versions)
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-1099int-v1.0-2021-05-27 Stable

None

None

pretrained-1099int-v1.1-2021-12-10 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 15
Maximum pages (batch/offline/asynchronous requests): 15
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • PayerName
  • PayerAddressAndPhone
  • PayerTIN
  • RecipientTIN
  • RecipientName
  • RecipientAddress
  • RecipientCityOrTownStateOrProvinceCountryAndZipOrForeignPostalCode
  • AccountNumber
  • PayerRTN
  • InterestIncome
  • EarlyWithdrawalPenalty
  • InterestOnUSSavingsBondsAndTreasObligations
  • FederalIncomeTaxWithheld
  • InvestmentExpenses
  • ForeignTaxPaid
  • ForeignCountryOrUSPossession
  • TaxExemptInterest
  • SpecifiedPrivateActivityBondInterest
  • MarketDiscount
  • BondPremium
  • BondPremiumOnTreasuryObligations
  • BondPremiumOnTaxExemptBond
  • TaxExemptAndTaxCreditBondCUSIPNo
  • State_Line1
  • State_Line2
  • StateIdentificationNo_Line1
  • StateIdentificationNo_Line2
  • StateTaxWithheld_Line1
  • StateTaxWithheld_Line2
  • FatcaFilingRequirement
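
The _Line1 and _Line2 suffixes in the list above suggest that each state row on the form yields its own set of fields. A minimal Python sketch of regrouping them per row, assuming the extracted fields have already been flattened into a plain dictionary of field name to text. The group_state_lines helper and the sample values are hypothetical and not part of Document AI:

```python
import re

# Matches names such as "StateTaxWithheld_Line2" and captures the base name and line number.
LINE_SUFFIX = re.compile(r"^(?P<name>.+)_Line(?P<line>\d+)$")


def group_state_lines(fields):
    """Group "<Name>_Line<N>" fields into one dict per line number, in line order."""
    lines = {}
    for key, value in fields.items():
        match = LINE_SUFFIX.match(key)
        if match:
            lines.setdefault(int(match["line"]), {})[match["name"]] = value
    return [lines[number] for number in sorted(lines)]


# Hypothetical, hand-written values for illustration only.
example = {
    "PayerName": "Example Bank",
    "State_Line1": "CA",
    "StateTaxWithheld_Line1": "12.00",
    "State_Line2": "NY",
}
rows = group_state_lines(example)
# rows == [{"State": "CA", "StateTaxWithheld": "12.00"}, {"State": "NY"}]
```

Fields without a _LineN suffix, such as PayerName, are ignored by this helper and can be read directly.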
Uptraining
Human-in-the-Loop[2]
Sample Input File
Sample Output

1099-MISC Parser

Description

Extract fields from Form 1099-MISC, including payer, recipient, amounts, etc.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage General availability
Access status Limited [4]
Type in API FORM_1099MISC_PROCESSOR
Notes
  • If a page of a multi-page input file is the correct document type and one of the supported versions, the processor performs entity extraction on the first supported document. If the processor doesn't find any applicable documents in the input file, the processor returns an error message.
Supported form/versions
  • 2020 (standard and customized versions)
  • 2019 (standard and customized versions)
  • 2018 (standard and customized versions)
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-1099misc-v1.0-2021-05-27 Stable

None

None

pretrained-1099misc-v1.1-2021-12-10 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 15
Maximum pages (batch/offline/asynchronous requests): 15
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • PayerName
  • PayerAddressAndPhone
  • PayerTIN
  • RecipientTIN
  • RecipientName
  • RecipientAddress
  • Rents
  • Royalties
  • OtherIncome
  • FishingBoatProceeds
  • NonemployeeCompensation
Uptraining
Human-in-the-Loop[2]
Sample Input File
Sample Output

1099-NEC Parser

Description

Extract fields from Form 1099-NEC, including payer, recipient, etc.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage Preview
Access status Limited [4]
Type in API FORM_1099NEC_PROCESSOR
Notes
  • This processor assumes the input file contains the supported document from the beginning and will not classify or split the input file. If your input file does not meet this assumption, please run the Lending Document Splitter & Classifier first and preprocess the input file.
Supported languages
  • en: English
Supported form/versions
  • 2021 (standard and customized versions)
  • 2020 (standard and customized versions)
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-1099nec-v1.0-2021-08-11 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 10
Maximum pages (batch/offline/asynchronous requests): 15
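
Because the online and batch page limits differ for this parser, a caller may want to pick the request mode from the page count. A minimal Python sketch, assuming the page count is already known. The choose_request_mode helper and its return labels are hypothetical; only the two limits come from this page:

```python
# Page limits copied from the "Quotas and limits" entry for the 1099-NEC Parser.
ONLINE_MAX_PAGES = 10
BATCH_MAX_PAGES = 15


def choose_request_mode(page_count):
    """Return "online", "batch", or "too_large" for a file with page_count pages."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    if page_count <= ONLINE_MAX_PAGES:
        return "online"
    if page_count <= BATCH_MAX_PAGES:
        return "batch"
    # Over both limits: the file would need to be split before processing.
    return "too_large"
```

With these limits, a 10-page file can go through an online request, an 11-page file needs a batch request, and a 16-page file exceeds both.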
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • FederalIncomeTaxWithheld
  • NonemployeeCompensation
  • PayersAddress
  • PayersName
  • RecipientAddress
  • RecipientCityStateCountry
  • RecipientName
  • RecipientStreetAddress
  • RecipientTIN
  • StateIncome_Line1
  • StateIncome_Line2
  • StateTaxWithheld_Line1
  • StateTaxWithheld_Line2
Uptraining
Human-in-the-Loop[2]
Sample Input File
Sample Output

1099-R Parser

Description

Extract fields from Form 1099-R, including payer, recipient, etc.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage Preview
Access status Limited [4]
Type in API FORM_1099R_PROCESSOR
Notes
  • This processor assumes the input file contains the supported document from the beginning and will not classify or split the input file. If your input file does not meet this assumption, please run the Lending Document Splitter & Classifier first and preprocess the input file.
Supported languages
  • en: English
Supported form/versions
  • 2021 (standard and customized versions)
  • 2020 (standard and customized versions)
  • 2019 (standard and customized versions)
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-1099r-v1.0-2021-08-11 Stable

None

None

pretrained-1099r-v2.0-2022-07-25 Release Candidate
Show fields
  • FormYear
  • PayerFirstName
  • PayerLastName
  • PayerMiddleInitial
  • PayerOrganizationName
  • PayerStreetAddress_Line1
  • PayerStreetAddress_Line2
  • PayerCity
  • PayerState
  • PayerZipcode
  • RecipientFirstName
  • RecipientLastName
  • RecipientMiddleInitial
  • RecipientOrganizationName
  • RecipientCity
  • RecipientState
  • RecipientZipcode
  • RecipientStreetAddress1
  • RecipientStreetAddress2

None

Quality improvements.

Uptraining supported.

Page limit increased from 10 to 15.

Breaking change: PayersName, PayersAddress, RecipientName, RecipientCityStateCountry, RecipientAddress, RecipientStreetAddress, and EmployerNameAndAddress are no longer part of the output; they are replaced with additional fields (for example, PayerStreetAddress_Line1, PayerStreetAddress_Line2, PayerCity, PayerState, and PayerZipcode instead of PayersAddress).

LocalTaxWithheld_Line2 is not supported in this version. To get predictions for this field, use uptraining.

For more information, see Managing processor versions.
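
The breaking change in pretrained-1099r-v2.0-2022-07-25 means downstream code that reads the older field names will silently find nothing. A minimal Python sketch of a pre-migration check, assuming you keep a list of the field names your code expects. The find_stale_fields helper is hypothetical, the removed names are spelled as in the field list for the earliest version, and the only replacement mapping included is the one the note spells out:

```python
# Fields that v2.0 no longer returns, per the breaking-change note.
REMOVED_IN_V2 = {
    "PayersName",
    "PayersAddress",
    "RecipientName",
    "RecipientCityStateCountry",
    "RecipientAddress",
    "RecipientStreetAddress",
    "EmployerNameAndAddress",
}

# Only the PayersAddress replacement is stated explicitly on this page.
# Other replacements are deliberately left out rather than guessed.
REPLACEMENTS = {
    "PayersAddress": [
        "PayerStreetAddress_Line1",
        "PayerStreetAddress_Line2",
        "PayerCity",
        "PayerState",
        "PayerZipcode",
    ],
}


def find_stale_fields(expected_fields):
    """Map each removed field in expected_fields to its known replacements (possibly empty)."""
    return {
        name: REPLACEMENTS.get(name, [])
        for name in expected_fields
        if name in REMOVED_IN_V2
    }
```

An empty replacement list signals that the field was removed but that the matching new fields still need to be looked up in the "Show fields" list for v2.0.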

Quotas and limits
Maximum pages (online/synchronous requests): 10
Maximum pages (batch/offline/asynchronous requests): 15
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • AmountAllocableToIRRWithin5Years
  • DistributionCode
  • FederalIncomeTaxWithheld
  • GrossDistribution
  • LocalTaxWithheld_Line1
  • LocalTaxWithheld_Line2
  • Other
  • PayersAddress
  • PayersName
  • RecipientAddress
  • RecipientCityStateCountry
  • RecipientName
  • RecipientStreetAddress
  • RecipientTIN
  • StateTaxWithheld_Line1
  • StateTaxWithheld_Line2
  • TaxableAmount
  • TotalEmployeeContributions
Uptraining
Human-in-the-Loop[2]
Labeling Instructions

1065 Parser

Description

Extract fields from Form 1065, including partnership name, address, assets, etc.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage Preview
Access status Limited [4]
Type in API FORM_1065_PROCESSOR
Notes
  • This processor assumes the input file contains the supported document from the beginning and will not classify or split the input file. If your input file does not meet this assumption, please run the Lending Document Splitter & Classifier first and preprocess the input file.
Supported languages
  • en: English
Supported form/versions
  • 2020 (standard and customized versions)
  • 2019 (standard and customized versions)
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-1065-v1.0-2021-08-11 Stable

None

None

pretrained-1065-v2.0-2022-02-03 Stable

None

None

Quality improvements.

Breaking change: all extracted fields have been renamed from CamelCase to snake_case (for example, end_of_tax_year_cash instead of EndOfTaxYear_Cash).

This change was made to standardize the format of field names across Document AI.

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 10
Maximum pages (batch/offline/asynchronous requests): 15
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • AmountOwed
  • BeginningOfTaxYear_Cash
  • BeginningOfTaxYear_PartnersCapitalAccounts
  • BeginningOfTaxYear_TotalAssets
  • BeginningOfTaxYear_TotalLiabilitiesAndCapital
  • BeginningOfYearBalance
  • BusinessStartDate
  • CityStateCountry
  • EIN
  • EndOfTaxYear_Cash
  • EndOfTaxYear_PartnersCapitalAccounts
  • EndOfTaxYear_TotalAssets
  • EndOfTaxYear_TotalLiabilitiesAndCapital
  • EndOfYearBalance
  • IncomeOrLoss
  • IndividualActive_GeneralPartners
  • IndividualActive_LimitedPartners
  • IndividualPassive_GeneralPartners
  • IndividualPassive_LimitedPartners
  • NetIncomeOrLoss
  • NumberOfScheduleK1
  • OrdinaryBusinessIncomeOrLoss
  • OrdinaryBusinessIncomeOrLoss_ScheduleK
  • Overpayment
  • PartnershipName
  • Partnership_GeneralPartners
  • Partnership_LimitedPartners
  • ScheduleM1_NetIncomeOrLossPerBooks
  • ScheduleM2_NetIncomeOrLossPerBooks
  • SelfEmploymentNetEarningsOrLoss
  • StreetAddress
  • TotalAssets
  • TotalBalanceDue
  • TotalDeductions
  • TotalIncomeOrLoss
Uptraining
Human-in-the-Loop[2]

1120 Parser

Description

Extract fields from Form 1120, including corporation name, address, assets, etc.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage Preview
Access status Limited [4]
Type in API FORM_1120_PROCESSOR
Notes
  • This processor assumes the input file contains the supported document from the beginning and will not classify or split the input file. If your input file does not meet this assumption, please run the Lending Document Splitter & Classifier first and preprocess the input file.
Supported languages
  • en: English
Supported form/versions
  • 2021 (pretrained-1120-v3.0-2022-04-26 version only)
  • 2020 (standard and customized versions)
  • 2019 (standard and customized versions)
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-1120-v1.0-2021-08-11 Stable

None

None

pretrained-1120-v2.0-2022-02-03 Stable
Show fields
  • amount_owed
  • bal_begin_tax_yr
  • bal_end_tax_yr
  • begin_of_tax_yr_cash
  • begin_of_tax_yr_total_assets
  • begin_of_tax_yr_total_liab_sh_eq
  • city_state_country
  • credited_to_2021_estimated_tax
  • date_incorporated
  • employer_identification_number
  • end_of_tax_yr_cash
  • end_of_tax_yr_total_assets
  • end_of_tax_yr_total_liab_sh_eq
  • income
  • name
  • net_income_or_loss_per_books
  • over_payment
  • refunded
  • street_address
  • total_assets
  • total_deductions
  • total_income

None

Quality improvements.

Breaking change: all extracted fields have been renamed from CamelCase to snake_case (for example, end_of_tax_yr_cash instead of EndOfTheTaxYear_Cash).

This change was made to standardize the format of field names across Document AI.

pretrained-1120-v3.0-2022-04-26 Release Candidate
Show fields
  • begin_of_tax_yr_mortgages_notes_bonds_payable_below_1_year
  • beginning_date
  • capital_gain_net_income
  • city_state_country_zipcode
  • cost_of_goods_sold
  • depletion
  • depreciation
  • end_of_tax_yr_mortgages_notes_bonds_payable_below_1_year
  • ending_date
  • final_return_checkbox
  • foreign_ownership_no_checkbox
  • foreign_ownership_yes_checkbox
  • gross_income
  • net_gain_or_loss
  • net_operating_loss_deduction
  • other_income
  • other_ownership_no_checkbox_options
  • other_ownership_yes_checkbox_options
  • tax_year
  • taxable_income
  • total_tax_page1
  • travel_and_entertainment

None

Quality improvements.

Added support for Year 2021.

Entity list is in snake_case format similar to pretrained-1120-v2.0-2022-02-03.

Entity city_state_country is renamed to city_state_country_zipcode.

For more information, see Managing processor versions.
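
The snake_case names introduced in pretrained-1120-v2.0-2022-02-03 are abbreviated, so a mechanical CamelCase-to-snake_case conversion does not reproduce them. A minimal Python sketch of an explicit lookup table instead. The pairs below are inferred by lining up the v2.0 "Show fields" list with the list of fields detected in the earliest version, which is an assumption rather than a documented mapping, and the rename_fields helper is hypothetical:

```python
# Partial old-name to new-name table for the 1120 Parser (inferred pairing, not official).
V1_TO_V2 = {
    "AmountOwed": "amount_owed",
    "BalanceAtBeginningOfYear": "bal_begin_tax_yr",
    "BalanceAtEndOfYear": "bal_end_tax_yr",
    "BeginningOfTheTaxYear_Cash": "begin_of_tax_yr_cash",
    "EndOfTheTaxYear_Cash": "end_of_tax_yr_cash",
    "EIN": "employer_identification_number",
    "Overpayment": "over_payment",
    "TotalIncomeOrLoss": "total_income",
}


def rename_fields(v1_record):
    """Return (record keyed by v2 names, list of keys with no known mapping)."""
    renamed = {}
    unmapped = []
    for key, value in v1_record.items():
        if key in V1_TO_V2:
            renamed[V1_TO_V2[key]] = value
        else:
            unmapped.append(key)
    return renamed, unmapped
```

Returning the unmapped keys separately makes it easy to spot fields that still need an entry in the table before switching versions.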

Quotas and limits
Maximum pages (online/synchronous requests): 10
Maximum pages (batch/offline/asynchronous requests): 15
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • AmountOwed
  • BalanceAtBeginningOfYear
  • BalanceAtEndOfYear
  • BeginningOfTheTaxYear_Cash
  • BeginningOfTheTaxYear_TotalAssets
  • BeginningOfTheTaxYear_TotalLiabilitiesAndShareholdersEquity
  • CityStateCountry
  • CreditedTo2021EstimatedTax
  • DateIncorporated
  • EIN
  • EndOfTheTaxYear_Cash
  • EndOfTheTaxYear_TotalAssets
  • EndOfTheTaxYear_TotalLiabilitiesAndShareholdersEquity
  • Income
  • Name
  • NetIncomeOrLossPerBooks
  • Overpayment
  • Refunded
  • StreetAddress
  • TotalAssets
  • TotalDeductions
  • TotalIncomeOrLoss
Uptraining
Human-in-the-Loop[2]
Labeling Instructions

1120S Parser

Description

Extract fields from Form 1120S, including name, address, assets, etc.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage Preview
Access status Limited [4]
Type in API FORM_1120S_PROCESSOR
Notes
  • This processor assumes the input file contains the supported document from the beginning and will not classify or split the input file. If your input file does not meet this assumption, please run the Lending Document Splitter & Classifier first and preprocess the input file.
Supported languages
  • en: English
Supported form/versions
  • 2021 (pretrained-1120s-v2.1-2022-07-22 version only)
  • 2020 (standard and customized versions)
  • 2019 (standard and customized versions)
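
The list above ties form year 2021 to a single processor version. One way to encode that when choosing a version is sketched below in Python. The version IDs are the ones listed under Processor versions for this parser; the assumption that every version handles 2019 and 2020 forms, and the versions_for_year helper itself, are illustrative and not stated on this page:

```python
# Form years each 1120S Parser version is assumed to handle.
SUPPORTED_YEARS = {
    "pretrained-1120s-v1.0-2021-08-11": {2019, 2020},
    "pretrained-1120s-v2.0-2022-02-03": {2019, 2020},
    "pretrained-1120s-v2.1-2022-07-22": {2019, 2020, 2021},
}


def versions_for_year(form_year):
    """Return the version IDs (sorted by name) assumed to support the given form year."""
    return sorted(
        version
        for version, years in SUPPORTED_YEARS.items()
        if form_year in years
    )
```

For a 2021 form this returns only pretrained-1120s-v2.1-2022-07-22, and for an unsupported year it returns an empty list.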
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-1120s-v1.0-2021-08-11 Stable

None

None

pretrained-1120s-v2.0-2022-02-03 Stable
Show fields
  • accum_adj_acc_bal_begin_tax_yr
  • accum_adj_acc_bal_end_tax_yr
  • accum_earn_prft_bal_begin_tax_yr
  • accum_earn_prft_bal_end_tax_yr
  • amount_owed
  • begin_of_tax_yr_cash
  • begin_of_tax_yr_total_assets
  • begin_of_tax_yr_total_liab_sh_eq
  • city_state_country
  • credited_to_2021_estimated_tax
  • date_incorporated
  • employer_identification_number
  • end_of_tax_yr_cash
  • end_of_tax_yr_total_assets
  • end_of_tax_yr_total_liab_sh_eq
  • form_year
  • income_or_loss
  • income_or_loss_reconciliation
  • name
  • net_income_or_loss_per_books
  • number_of_shareholders
  • ordinary_biz_income_or_loss
  • ordinary_biz_income_loss_sch_k
  • other_adj_acc_bal_begin_tax_yr
  • other_adj_acc_bal_end_tax_yr
  • over_payment
  • refunded
  • street_address
  • taxable_income_bal_begin_tax_yr
  • taxable_income_bal_end_tax_yr
  • total_assets
  • total_deductions
  • total_income_or_loss

None

Quality improvements.

Breaking change: all extracted fields have been renamed from CamelCase to snake_case (for example, end_of_tax_yr_cash instead of EndOfTaxYear_Cash).

This change was made to standardize the format of field names across Document AI.

pretrained-1120s-v2.1-2022-07-22 Release Candidate
Show fields
  • begin_of_tax_yr_accounts_payable
  • begin_of_tax_yr_mortgages_notes_bonds_less_than_a_yr
  • begin_of_tax_yr_other_assets
  • begin_of_tax_yr_other_current_assets
  • begin_of_tax_yr_other_current_liabilities
  • begin_of_tax_yr_tax_exempt_securities
  • begin_of_tax_yr_trade_notes_and_accounts
  • begin_of_tax_yr_us_govt_obligations
  • cost_of_goods_sold
  • depletion
  • depreciation
  • end_of_tax_yr_accounts_payable
  • end_of_tax_yr_mortgages_notes_bonds_less_than_a_yr
  • end_of_tax_yr_other_assets
  • end_of_tax_yr_other_current_assets
  • end_of_tax_yr_other_current_liabilities
  • end_of_tax_yr_tax_exempt_securities
  • end_of_tax_yr_trade_notes_and_accounts
  • end_of_tax_yr_us_govt_obligations
  • tax_year_begin_date
  • tax_year_end_date
  • travel_and_entertainment
  • other_income_or_loss

None

Quality improvements and support for new fields.

Added support for year 2021.

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 10
Maximum pages (batch/offline/asynchronous requests): 15
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • AccumulatedAdjustmentsAccount_BalanceAtBeginningOfTaxYear
  • AccumulatedAdjustmentsAccount_BalanceAtEndOfTaxYear
  • AccumulatedEarningsAndProfits_BalanceAtBeginningOfTaxYear
  • AccumulatedEarningsAndProfits_BalanceAtEndOfTaxYear
  • AmountOwed
  • BeginningOfTaxYear_Cash
  • BeginningOfTaxYear_TotalAssets
  • BeginningOfTaxYear_TotalLiabilitiesAndShareHoldersEquity
  • CityStateCountry
  • CreditedTo2021EstimatedTax
  • DateIncorporated
  • EIN
  • EndOfTaxYear_Cash
  • EndOfTaxYear_TotalAssets
  • EndOfTaxYear_TotalLiabilitiesAndShareHoldersEquity
  • IncomeOrLoss
  • IncomeOrLossReconciliation
  • Name
  • NetIncomeOrLossPerBooks
  • NumberOfShareholders
  • OrdinaryBusinessIncomeOrLoss
  • OrdinaryBusinessIncomeOrLoss_ScheduleBK
  • OtherAdjustmentsAccount_BalanceAtBeginningOfTaxYear
  • OtherAdjustmentsAccount_BalanceAtEndOfTaxYear
  • Overpayment
  • Refunded
  • ShareholdersUndistributedTaxableIncomePreviouslyTaxed_BalanceAtBeginningOfTaxYear
  • ShareholdersUndistributedTaxableIncomePreviouslyTaxed_BalanceAtEndOfTaxYear
  • StreetAddress
  • TotalAssets
  • TotalDeductions
  • TotalIncomeOrLoss
Uptraining
Human-in-the-Loop[2]
Labeling Instructions

Bank Statement Parser

Description

Extract fields from bank statements, including name, account, transactions, etc.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage General availability
Access status Limited [4]
Type in API BANK_STATEMENT_PROCESSOR
Notes
  • If a page of a multi-page input file is the correct document type and one of the supported versions, the processor performs entity extraction on the first supported document. If the processor doesn't find any applicable documents in the input file, the processor returns an error message.
Supported languages
  • en: English
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-bankstatement-v1.0-2021-08-08 Stable

None

None

pretrained-bankstatement-v1.1-2021-08-13 Stable

None

None

pretrained-bankstatement-v2.0-2021-12-10 Stable

None

None

pretrained-bankstatement-v3.0-2022-05-16 Release Candidate

None

None

This version assumes that the input file contains a single bank statement. Unlike the default version, this version does not check the input file for bank statements and will not return an error if no bank statements are found. If your input document contains multiple bank statements, use the Lending Document Splitter & Classifier for splitting before sending it to this processor.

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 15
Maximum pages (batch/offline/asynchronous requests): 30
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • account_number
  • account_type
  • bank_address
  • bank_name
  • client_address
  • client_name
  • ending_balance
  • starting_balance
  • statement_date
  • statement_end_date
  • statement_start_date
  • table_item
    • table_item/transaction_deposit
    • table_item/transaction_deposit_date
    • table_item/transaction_deposit_description
    • table_item/transaction_withdrawal
    • table_item/transaction_withdrawal_date
    • table_item/transaction_withdrawal_description
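
The table_item/... names above indicate that each transaction row is a parent table_item entity with child fields. A minimal Python sketch of turning such entities into one dictionary per row. It assumes the processor output has been converted to plain dictionaries with "type", "mention_text", and "properties" keys; that shape, the table_rows helper, and the sample values are illustrative assumptions, not the exact API response format:

```python
def table_rows(entities):
    """Collect the children of every "table_item" entity into one dict per row."""
    rows = []
    for entity in entities:
        if entity.get("type") != "table_item":
            continue
        row = {}
        for child in entity.get("properties", []):
            # Drop the "table_item/" prefix to get a short column name.
            column = child["type"].split("/", 1)[-1]
            row[column] = child.get("mention_text", "")
        rows.append(row)
    return rows


# Hypothetical, hand-written entities for illustration only.
sample_entities = [
    {"type": "bank_name", "mention_text": "Example Bank"},
    {
        "type": "table_item",
        "properties": [
            {"type": "table_item/transaction_deposit", "mention_text": "100.00"},
            {"type": "table_item/transaction_deposit_date", "mention_text": "01/02"},
        ],
    },
]
rows = table_rows(sample_entities)
# rows == [{"transaction_deposit": "100.00", "transaction_deposit_date": "01/02"}]
```

Top-level entities such as bank_name are skipped here and can be read separately.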
Enriched fields

You can find more information in the Enrichment & normalization page.

Full list of enriched fields
  • bank_address
  • bank_name
Normalized fields

You can find more information in the Enrichment & normalization page.

Full list of normalized fields
  • ending_balance
  • starting_balance
  • statement_date
  • statement_end_date
  • statement_start_date
  • table_item/transaction_deposit
  • table_item/transaction_deposit_date
  • table_item/transaction_withdrawal
  • table_item/transaction_withdrawal_date
Uptraining
Human-in-the-Loop[2]
Labeling Instructions
Sample Input File
Sample Output

HOA Statement Parser

Description

Extract fields from Homeowner Association (HOA) statements, including name, address, due amount, etc.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage Preview
Access status Limited [4]
Type in API HOA_STATEMENT_PROCESSOR
Notes
  • This processor assumes the input file contains the supported document from the beginning and will not classify or split the input file.
  • Extracted 'other_name' and 'other_address' represent HOA-related names and addresses (such as property management company, HOA organization, payment service company, etc.) other than the explicitly extracted property_owner_name, property_owner_address, or property_address.
Supported languages
  • en: English
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-hoa-statement-v1.0-2021-12-08 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 15
Maximum pages (batch/offline/asynchronous requests): 50
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • due_amount
  • due_date
  • frequency
  • late_fee
  • other_address
  • other_name
  • paid_amount
  • paid_date
  • paid_through_amount
  • paid_through_date
  • property_address
  • property_owner_address
  • property_owner_name
  • statement_date
  • table_item
    • table_item/transaction_charge
    • table_item/transaction_date
    • table_item/transaction_description
    • table_item/transaction_payment
Enriched fields

You can find more information in the Enrichment & normalization page.

Full list of enriched fields
  • other_address
  • other_name
Uptraining
Human-in-the-Loop[2]

HUD-92900B Parser

Description

Detect whether borrower signatures and signature dates are present on Form HUD-92900B.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage Preview
Access status Limited [4]
Type in API FORM_HUD92900B_PROCESSOR
Supported languages
  • en: English
Supported form/versions
  • 2019 (standard version only)
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-hud92900b-v1.0-2021-09-16 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 10
Maximum pages (batch/offline/asynchronous requests): 15
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • borrower1_signature_date_exists
  • borrower1_signature_exists
  • borrower2_signature_date_exists
  • borrower2_signature_exists
  • borrower3_signature_date_exists
  • borrower3_signature_exists
  • borrower4_signature_date_exists
  • borrower4_signature_exists
Uptraining
Human-in-the-Loop[2]

Lending Document Splitter & Classifier

Description

Identify documents in a large file and classify known lending document types.

Mortgage application packages and other lending documents often contain multiple documents (such as 1040 tax forms, W2, bank statements, etc.) in a single file. Lending document splitter allows you to programmatically split these combined lending documents on logical boundaries. The split files are then classified based on the document type, so that the appropriate extraction model can be applied to each file.

Category Specialized
Solution type Lending
Functions OCR, Classification, Splitting
Release stage General availability
Access status Limited [4]
Type in API LENDING_DOCUMENT_SPLIT_PROCESSOR
Notes
  • The splitter is not designed to split logical documents that are over 30 pages long. Such documents (for example, a 40-page bank statement) may be split into two or more documents and classified separately.
Supported languages
  • en: English
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-lending-document-split-v1.0-2021-12-08 Stable

None

None

pretrained-lending-document-split-v2.0-2021-12-09 Stable
Show fields
  • 1005_1996
  • 1040_2021[1]
  • 1040nr[1]
  • 1040nr_2018
  • 1040nr_2019
  • 1040nr_2020
  • 1040nr_2021[1]
  • 1040sb[1]
  • 1040sb_2018
  • 1040sb_2019
  • 1040sb_2020
  • 1040sb_2021[1]
  • 1040sc_2021[1]
  • 1040sd[1]
  • 1040sd_2018
  • 1040sd_2019
  • 1040sd_2020
  • 1040sd_2021[1]
  • 1040se_2021[1]
  • 1040sr[1]
  • 1040sr_2018
  • 1040sr_2019
  • 1040sr_2020
  • 1040sr_2021[1]
  • 1065_2021
  • 1076_2016
  • 1099div_2021[1]
  • 1099g_2021[1]
  • 1099int_2021[1]
  • 1099misc_2021[1]
  • 1099nec_2018[1]
  • 1099nec_2019[1]
  • 1099nec_2021
  • 1099r_2021[1]
  • 1099ssa_2021
  • 1120_2021[1]
  • 1120s_2021[1]
  • 1_4_Family_Rider_3170
  • 3108_Adjustable_Rate_Rider
  • 3140_Condominium_Rider
  • 3190_Balloon_Rider
  • 3890_Second_Home_Rider
  • 4506_T[1]
  • 4506_T_2018[1]
  • 4506_T_2019[1]
  • 4506_T_2020[1]
  • 4506_T_2021
  • 4506_T_EZ[1]
  • 4506_T_EZ_2018[1]
  • 4506_T_EZ_2019
  • 4506_T_EZ_2020[1]
  • 4506_T_EZ_2021
  • account_statement_investment_and_retirement
  • appraisal_ucdp_ssr
  • dhs_flood_certification
  • f11_12956_2017[1]
  • hud_54114
  • hud_92051
  • hud_92541
  • hud_92544
  • hud_92800
  • hud_92900a
  • hud_92900b
  • hud_92900lt
  • hud_92900ws
  • mortgage_statements
  • property_insurance
  • pud_rider
  • revocable_trust_rider
  • ssa_89[1]
  • ssa_89_2018[1]
  • ssa_89_2019[1]
  • ssa_89_2020
  • ssa_89_2021
  • ucc_financing_statement
  • usda_ad_3030
  • vba_26_0551_2004
  • vba_26_8923_2021
  • w2_2021[1]
  • w9_2019[1]
  • w9_2020[1]
  • w9_2021[1]

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 15
Maximum pages (batch/offline/asynchronous requests): 1250
Uptraining
Human-in-the-Loop[2]
Document types identified
Show types

This splitter can identify and classify the following types of documents and forms:

  • 1003 - Legacy Form (standard and customized versions)
    • Return type(s): 1003[1], 1003_2009
  • 1040 - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1040[1], 1040_2018, 1040_2019, 1040_2020[1]
  • 1040 Schedule C - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1040sc[1], 1040sc_2018[1], 1040sc_2019, 1040sc_2020
  • 1040 Schedule E - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1040se[1], 1040se_2018[1], 1040se_2019, 1040se_2020
  • 1065 - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1065[1], 1065_2018[1], 1065_2019, 1065_2020
  • 1099-DIV - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1099div[1], 1099div_2018, 1099div_2019, 1099div_2020
  • 1099-G - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1099g[1], 1099g_2018[1], 1099g_2019, 1099g_2020
  • 1099-INT - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1099int[1], 1099int_2018, 1099int_2019, 1099int_2020
  • 1099-MISC - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1099misc[1], 1099misc_2018, 1099misc_2019, 1099misc_2020
  • 1099-NEC - 2020 (standard and customized versions)
    • Return type(s): 1099nec[1], 1099nec_2020
  • 1099-R - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1099r[1], 1099r_2018, 1099r_2019, 1099r_2020
  • 1120 - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1120[1], 1120_2018[1], 1120_2019, 1120_2020
  • 1120S - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1120s[1], 1120s_2018[1], 1120s_2019, 1120s_2020
  • Bank Statement
    • Return type(s): account_statement_bank
  • Pay Slip
    • Return type(s): payslip
  • SSA-1099 - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): 1099ssa[1], 1099ssa_2018[1], 1099ssa_2019, 1099ssa_2020
  • US Driver License
    • Return type(s): US_Driver_License
  • US Passport
    • Return type(s): US_Passport
  • W2 - 2018, 2019, 2020 (standard and customized versions)
    • Return type(s): w2[1], w2_2018, w2_2019, w2_2020
  • W9 - Rev. 10-2018, Rev. 11-2017
    • Return type(s): w9[1], w9_2017, w9_2018
  • If the splitter cannot identify the type of the document, it returns other.
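
Once the splitter has returned a type for each split file, the file can be routed to the matching parser. A minimal Python sketch of that routing step, using return types from the list above and "Type in API" values from the parser entries on this page. The parser_for helper is hypothetical, and stripping a trailing four-digit year is a heuristic assumption about the return-type naming:

```python
import re

# Splitter return type (without year suffix) -> "Type in API" of the matching parser.
PARSER_FOR_TYPE = {
    "1065": "FORM_1065_PROCESSOR",
    "1099g": "FORM_1099G_PROCESSOR",
    "1099int": "FORM_1099INT_PROCESSOR",
    "1099misc": "FORM_1099MISC_PROCESSOR",
    "1099nec": "FORM_1099NEC_PROCESSOR",
    "1099r": "FORM_1099R_PROCESSOR",
    "1120": "FORM_1120_PROCESSOR",
    "1120s": "FORM_1120S_PROCESSOR",
    "account_statement_bank": "BANK_STATEMENT_PROCESSOR",
    "hud_92900b": "FORM_HUD92900B_PROCESSOR",
    "mortgage_statements": "MORTGAGE_STATEMENT_PROCESSOR",
    "payslip": "PAYSTUB_PROCESSOR",
}


def parser_for(return_type):
    """Return the parser type for a splitter return type, or None if there is no match."""
    # Heuristic: "1099g_2020" and "1099g" should both route to the same parser.
    base = re.sub(r"_\d{4}$", "", return_type)
    return PARSER_FOR_TYPE.get(base)
```

Types with no entry in the table, including other, return None so the caller can decide whether to skip the file or send it for manual review.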
Sample Input File
Sample Output
More information Document splitters behavior

Mortgage Statement Parser

Description

Extract fields from mortgage statements, including name, address, due amount, etc.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage Preview
Access status Limited [4]
Type in API MORTGAGE_STATEMENT_PROCESSOR
Notes
  • This processor assumes the input file contains the supported document from the beginning and will not classify or split the input file.
Supported languages
  • en: English
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-mortgage-statement-v1.0-2021-10-17 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 15
Maximum pages (batch/offline/asynchronous requests): 50
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • borrower_address
  • borrower_name
  • due_date
  • fees_due
  • frequency
  • insurance_escrow_due
  • interest_due
  • interest_rate
  • loan_number
  • loan_type
  • maturity_date
  • others_due
  • past_due_amount
  • principal_balance
  • principal_due
  • property_address
  • regular_payment_amount
  • servicer_address
  • servicer_name
  • statement_date
  • table_item
    • table_item/description
    • table_item/effective_date
    • table_item/escrow_amount
    • table_item/fees_amount
    • table_item/interest_amount
    • table_item/others_amount
    • table_item/principal_amount
    • table_item/total_amount
    • table_item/total_charged
    • table_item/total_paid
    • table_item/transaction_date
    • table_item/unapplied_amount
  • tax_escrow_due
  • total_due
  • total_escrow_due
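
The slash-separated names in the list above (for example, table_item/description) denote child fields nested under a parent field. As a rough illustration, the following Python sketch groups such names into a parent-to-children mapping. The function name group_fields is hypothetical, and the input is a plain list of names copied from the list above, not a Document AI API response.

```python
def group_fields(field_names):
    """Group slash-separated field names under their parent field.

    Returns a dict that maps each top-level field to the list of its child
    names (an empty list for fields without children).
    """
    grouped = {}
    for name in field_names:
        # "table_item/description" -> parent "table_item", child "description"
        parent, _, child = name.partition("/")
        children = grouped.setdefault(parent, [])
        if child:
            children.append(child)
    return grouped


# A few names copied from the field list above.
fields = [
    "borrower_name",
    "table_item",
    "table_item/description",
    "table_item/total_amount",
    "total_due",
]
schema = group_fields(fields)
print(schema)
```

Fields without a slash map to an empty list, so the same helper can be applied to any of the field lists on this page.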
Enriched fields

You can find more information in the Enrichment & normalization page.

Full list of enriched fields
  • servicer_address
  • servicer_name
Uptraining
Human-in-the-Loop[2]
Labeling Instructions Open in new window.

Pay Slip Parser

Description

Extract fields from pay slips, including employee name, employer, and pay amounts.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage General availability
Access status Limited [4]
Type in API PAYSTUB_PROCESSOR
Notes
  • If a multi-page input document contains more than one valid pay slip, the processor extracts entities from only the first valid pay slip. If no pay slips are found in the input file, the processor returns an error message.
Supported languages
  • en: English
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-paystub-v1.0-2021-03-19 Stable

None

None

pretrained-paystub-v1.1-2021-08-13 Stable
Show fields
  • net_pay
  • net_pay_ytd
  • employee_account_number

None

Quality improvements and support for new fields.
pretrained-paystub-v1.2-2021-12-10 Stable

None

None

pretrained-paystub-v2.0-2022-05-17 Release Candidate
Show fields
  • deduction_item
  • deduction_item/deduction_type
  • deduction_item/deduction_this_period
  • deduction_item/deduction_ytd
  • direct_deposit_item
  • direct_deposit_item/direct_deposit
  • direct_deposit_item/employee_account_number
  • earning_item
  • earning_item/earning_type
  • earning_item/earning_rate
  • earning_item/earning_hours
  • earning_item/earning_this_period
  • earning_item/earning_ytd
  • page_number
  • tax_item
  • tax_item/tax_type
  • tax_item/tax_this_period
  • tax_item/tax_ytd
  • federal_additional_tax
  • federal_allowance
  • federal_marital_status
  • state_additional_tax
  • state_allowance
  • state_marital_status

None

This version assumes that the input file contains a single pay slip. Unlike the default version, this version does not check the input file for pay slips and will not return an error if no pay slips are found. If your input document contains multiple pay slips, use the Lending Document Splitter & Classifier for splitting before sending it to this processor.

Quality improvements, support for new fields, and a new schema. Bonus, Commissions, Holiday, Overtime, Regular Pay, and Vacation are now part of earning_item/earning_this_period, and their year-to-date versions are in earning_item/earning_ytd. Direct Deposit and Employee Account Number are now nested under direct_deposit_item.

Async page limit is 10.

pretrained-paystub-v2.0-2022-07-22 Stable

None

None

Quality improvement and uptraining enhancements.

For more information, see Managing processor versions.
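
If you have code written against the flat v1.x pay slip fields, the schema change described for pretrained-paystub-v2.0-2022-05-17 can be pictured with the following Python sketch. It rearranges a flat dictionary into the nested shape. The function to_v2_shape, the earning_type labels, and the rule that all other fields keep their names are assumptions made for illustration; check them against real processor output before relying on them.

```python
# Flat v1.x field names that move under earning_item in v2.0.
# The earning_type labels on the right are assumptions, not documented values.
EARNING_FIELDS = {
    "bonus": "Bonus",
    "commissions": "Commissions",
    "holiday": "Holiday",
    "overtime": "Overtime",
    "regular_pay": "Regular Pay",
    "vacation": "Vacation",
}


def to_v2_shape(v1_fields):
    """Rearrange a flat v1.x-style dict into a nested v2.0-style dict."""
    v2 = {"earning_item": [], "direct_deposit_item": []}

    # Each earning type becomes one earning_item with this-period and YTD values.
    for name, label in EARNING_FIELDS.items():
        this_period = v1_fields.get(name)
        ytd = v1_fields.get(f"{name}_ytd")
        if this_period is None and ytd is None:
            continue
        v2["earning_item"].append({
            "earning_type": label,
            "earning_this_period": this_period,
            "earning_ytd": ytd,
        })

    # Direct deposit and account number are nested under direct_deposit_item.
    deposit = v1_fields.get("direct_deposit")
    account = v1_fields.get("employee_account_number")
    if deposit is not None or account is not None:
        v2["direct_deposit_item"].append({
            "direct_deposit": deposit,
            "employee_account_number": account,
        })

    # Assumption: every other field keeps its v1.x name.
    moved = set(EARNING_FIELDS) | {f"{n}_ytd" for n in EARNING_FIELDS}
    moved |= {"direct_deposit", "employee_account_number"}
    for name, value in v1_fields.items():
        if name not in moved:
            v2[name] = value
    return v2


example = {
    "employee_name": "A. Example",
    "regular_pay": "1000.00",
    "regular_pay_ytd": "12000.00",
    "direct_deposit": "900.00",
}
nested = to_v2_shape(example)
print(nested)
```

The sketch only reshapes data you already have. It does not reproduce the additional v2.0 fields such as deduction_item or tax_item.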

Quotas and limits
Maximum pages (online/synchronous requests): 15
Maximum pages (batch/offline/asynchronous requests): 50
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • bonus
  • bonus_ytd
  • commissions
  • commissions_ytd
  • direct_deposit
  • employee_account_number (Added in "pretrained-paystub-v1.1-2021-08-13")
  • employee_address
  • employee_name
  • employer_address
  • employer_name
  • end_date
  • gross_earnings
  • gross_earnings_ytd
  • holiday
  • holiday_ytd
  • net_pay (Added in "pretrained-paystub-v1.1-2021-08-13")
  • net_pay_ytd (Added in "pretrained-paystub-v1.1-2021-08-13")
  • overtime
  • overtime_ytd
  • pay_date
  • regular_pay
  • regular_pay_ytd
  • ssn
  • start_date
  • vacation
  • vacation_ytd
Enriched fields

You can find more information in the Enrichment & normalization page.

Full list of enriched fields
  • employer_address
  • employer_name
Normalized fields

You can find more information in the Enrichment & normalization page.

Full list of normalized fields
  • bonus
  • bonus_ytd
  • commissions
  • commissions_ytd
  • direct_deposit
  • end_date
  • gross_earnings
  • gross_earnings_ytd
  • holiday
  • holiday_ytd
  • net_pay
  • net_pay_ytd
  • overtime
  • overtime_ytd
  • pay_date
  • regular_pay
  • regular_pay_ytd
  • start_date
  • vacation
  • vacation_ytd
Uptraining
Human-in-the-Loop[2]
Labeling Instructions Open in new window.

Retirement/Investment Statement Parser

Description

Extract fields from retirement and investment statements, including name, address, and account balances.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage Preview
Access status Limited [4]
Type in API RETIREMENT_INVESTMENT_STATEMENT_PROCESSOR
Notes
  • This processor assumes the input file contains the supported document from the beginning and will not classify or split the input file.
Supported languages
  • en: English
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-retirement-investment-statement-v1.0-2021-12-03 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 15
Maximum pages (batch/offline/asynchronous requests): 30
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • account_number
  • account_type
  • client_address
  • client_name
  • deposit_amount
  • deposit_source
  • ending_balance
  • fbo_itf_or_trust_indicator
  • financial_institution_address
  • financial_institution_name
  • starting_balance
  • statement_end_date
  • statement_start_date
  • withdrawal_amount
  • withdrawal_source
Enriched fields

You can find more information in the Enrichment & normalization page.

Full list of enriched fields
  • financial_institution_address
  • financial_institution_name
Normalized fields

You can find more information in the Enrichment & normalization page.

Full list of normalized fields
  • deposit_amount
  • ending_balance
  • starting_balance
  • statement_end_date
  • statement_start_date
  • withdrawal_amount
Uptraining
Human-in-the-Loop[2]
Labeling Instructions Open in new window.

SSA-89 Parser

Description

Extract fields from Form SSA-89, including name, address, and SSN.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage Preview
Access status Limited [4]
Type in API FORM_SSA89_PROCESSOR
Supported languages
  • en: English
Supported form/versions
  • 2020 (standard and customized versions)
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-ssa89-v1.0-2021-09-16 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 10
Maximum pages (batch/offline/asynchronous requests): 15
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • agents_address
  • agents_name
  • company_address
  • company_name
  • date_of_birth
  • date_signed_exists
  • initials_exists
  • no_of_days_after_signature_that_consent_is_valid
  • reason_for_authorizing_consent_other_value
  • printed_name
  • reason_for_authorizing_consent
  • relationship
  • signature_exists
  • social_security_number
Uptraining
Human-in-the-Loop[2]

SSA-1099 Parser

Description

Extract fields from Form SSA-1099, including name, address, and SSN.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage Preview
Access status Limited [4]
Type in API FORM_SSA1099_PROCESSOR
Notes
  • This processor assumes the input file contains the supported document from the beginning and will not classify or split the input file. If your input file does not meet this assumption, please run the Lending Document Splitter & Classifier first and preprocess the input file.
Supported languages
  • en: English
Supported form/versions
  • 2020 (standard and customized versions)
  • 2019 (standard and customized versions)
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-ssa1099-v1.0-2021-08-09 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 10
Maximum pages (batch/offline/asynchronous requests): 15
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • Address
  • BenefitsPaid
  • BenefitsRepaidToSSA
  • DescriptionOfAmountInBox3
  • DescriptionOfAmountInBox4
  • Name
  • NetBenefits
  • SSN
  • VoluntaryFederalIncomeTaxWithheld
Uptraining
Human-in-the-Loop[2]

VBA26-0551 Parser

Description

Extract fields from Form VBA26-0551, including coborrower signature and veteran signature.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage Preview
Access status Limited [4]
Type in API FORM_VBA26_0551_PROCESSOR
Supported languages
  • en: English
Supported form/versions
  • 2004 (standard and customized versions)
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-vba26-0551-v1.0-2021-09-16 Stable

None

None

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 10
Maximum pages (batch/offline/asynchronous requests): 15
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • delinquent_default_any_debt_federal_govt_checkbox
  • past5year_loan_foreclosure_liue_judgment_checkbox
  • signature_of_coborrower_date_exist
  • signature_of_coborrower_exist
  • signature_of_veteran_date_exist
  • signature_of_veteran_exist
Uptraining
Human-in-the-Loop[2]

W2 Parser

Description

Extract fields from Form W2, including employee, employer, and wages.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage General availability
Access status Limited [4]
Type in API FORM_W2_PROCESSOR
Notes
  • If a page of a multi-page input file is the correct document type and one of the supported versions, the processor performs entity extraction on the first supported document. If the processor doesn't find any applicable documents in the input file, the processor returns an error message.
Supported languages
  • en: English
Supported form/versions
  • 2020 (standard and customized versions)
  • 2019 (standard and customized versions)
  • 2018 (standard and customized versions)
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-w2-v1.0-2020-10-01 Stable

None

None

pretrained-w2-v1.1-2022-01-27 Stable

None

None

pretrained-w2-v1.2-2022-01-28 Stable
Show fields
  • AllocatedTips
  • ControlNumber
  • DependentCareBenefits
  • EIN
  • EmployeeAddress
  • EmployeeName
  • EmployerNameAndAddress
  • EmployerStateIdNumber_Line1
  • FederalIncomeTaxWithheld
  • FormYear
  • LocalIncomeTax_Line1
  • LocalityName_Line1
  • LocalWagesTipsEtc_Line1
  • MedicareTaxWithheld
  • MedicareWagesAndTips
  • NonqualifiedPlans
  • SocialSecurityTaxWithheld
  • SocialSecurityTips
  • SocialSecurityWages
  • SSN
  • State_Line1
  • StateIncomeTax_Line1
  • StateWagesTipsEtc_Line1
  • WagesTipsOtherCompensation

None

Quality improvements and support for new fields. Does not include the splitter.

pretrained-w2-v2.0-2022-03-30 Release Candidate
Show fields
  • AllocatedTips
  • ControlNumber
  • DependentCareBenefits
  • EIN
  • EmployeeAddress_AdditionalStreetAddressOrPostalBox
  • EmployeeAddress_City
  • EmployeeAddress_State
  • EmployeeAddress_StreetAddressOrPostalBox
  • EmployeeAddress_Zip
  • EmployeeName_FirstName
  • EmployeeName_LastName
  • EmployeeName_MiddleNameOrInitial
  • EmployerAddress_AdditionalStreetAddressOrPostalBox
  • EmployerAddress_City
  • EmployerAddress_State
  • EmployerAddress_StreetAddressOrPostalBox
  • EmployerAddress_Zip
  • EmployerName
  • EmployerStateIdNumber_Line1
  • FederalIncomeTaxWithheld
  • FormYear
  • LocalIncomeTax_Line1
  • LocalWagesTipsEtc_Line1
  • LocalityName_Line1
  • MedicareTaxWithheld
  • MedicareWagesAndTips
  • NonqualifiedPlans
  • SSN
  • SocialSecurityTaxWithheld
  • SocialSecurityTips
  • SocialSecurityWages
  • StateIncomeTax_Line1
  • StateWagesTipsEtc_Line1
  • State_Line1
  • WagesTipsOtherCompensation
  • a_Code
  • a_Value
  • b_Code
  • b_Value
  • c_Code
  • c_Value
  • d_Code
  • d_Value

None

Quality improvements, support for box 12 fields, and fine-grained predictions for EmployeeName, EmployeeAddress, and EmployerNameAndAddress. These three fields are no longer part of the output; they are replaced by the more granular fields listed for this version.

pretrained-w2-v2.1-2022-06-08 Stable
Show fields
  • AllocatedTips
  • ControlNumber
  • DependentCareBenefits
  • EIN
  • EmployeeAddress_AdditionalStreetAddressOrPostalBox
  • EmployeeAddress_City
  • EmployeeAddress_State
  • EmployeeAddress_StreetAddressOrPostalBox
  • EmployeeAddress_Zip
  • EmployeeName_FirstName
  • EmployeeName_LastName
  • EmployeeName_MiddleNameOrInitial
  • EmployeeName_Suffix
  • EmployerAddress_AdditionalStreetAddressOrPostalBox
  • EmployerAddress_City
  • EmployerAddress_State
  • EmployerAddress_StreetAddressOrPostalBox
  • EmployerAddress_Zip
  • EmployerName
  • EmployerStateIdNumber_Line1
  • FederalIncomeTaxWithheld
  • FormYear
  • LocalIncomeTax_Line1
  • LocalWagesTipsEtc_Line1
  • LocalityName_Line1
  • MedicareTaxWithheld
  • MedicareWagesAndTips
  • NonqualifiedPlans
  • SSN
  • SocialSecurityTaxWithheld
  • SocialSecurityTips
  • SocialSecurityWages
  • StateIncomeTax_Line1
  • StateWagesTipsEtc_Line1
  • State_Line1
  • WagesTipsOtherCompensation
  • a_Code
  • a_Value
  • b_Code
  • b_Value
  • c_Code
  • c_Value
  • d_Code
  • d_Value

None

Similar to version pretrained-w2-v2.0-2022-03-30, with further quality enhancements and one additional entity, EmployeeName_Suffix.

For more information, see Managing processor versions.
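
Because the v2.x versions no longer return a single EmployeeName field, downstream code that expects one may need to rebuild it from the fine-grained parts. The following Python sketch shows one way to do that. The helper full_employee_name is hypothetical, and it assumes you have already collected the extracted text into a plain dictionary keyed by field name.

```python
# Fine-grained name fields listed for the v2.x versions, in display order.
NAME_PARTS = (
    "EmployeeName_FirstName",
    "EmployeeName_MiddleNameOrInitial",
    "EmployeeName_LastName",
    "EmployeeName_Suffix",  # listed only for pretrained-w2-v2.1-2022-06-08
)


def full_employee_name(entities):
    """Join the available name parts into a single display string.

    `entities` maps field names to extracted text. Missing or empty parts
    are skipped.
    """
    parts = [(entities.get(key) or "").strip() for key in NAME_PARTS]
    return " ".join(part for part in parts if part)


extracted = {
    "EmployeeName_FirstName": "Jane",
    "EmployeeName_MiddleNameOrInitial": "Q",
    "EmployeeName_LastName": "Doe",
}
print(full_employee_name(extracted))
```

The same pattern applies to the EmployeeAddress_* and EmployerAddress_* fields, although the separator and ordering you want for an address are a formatting choice rather than something this page specifies.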

Quotas and limits
Maximum pages (online/synchronous requests): 15
Maximum pages (batch/offline/asynchronous requests): 15
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • ControlNumber
  • EIN
  • EmployeeAddress
  • EmployeeName
  • EmployerNameAndAddress
  • FederalIncomeTaxWithheld
  • MedicareTaxWithheld
  • MedicareWagesAndTips
  • SSN
  • SocialSecurityTaxWithheld
  • SocialSecurityWages
  • WagesTipsOtherCompensation
Enriched fields

You can find more information in the Enrichment & normalization page.

Full list of enriched fields
  • EmployerNameAndAddress
  • EIN
Uptraining
Human-in-the-Loop[2]
Sample Input File Open in new window.
Sample Output Open in new window.

W9 Parser

Description

Extract fields from Form W9, including name, address, and TIN.

Category Specialized
Solution type Lending
Functions OCR, Entity Extraction
Release stage General availability
Access status Limited [4]
Type in API FORM_W9_PROCESSOR
Notes
  • If a page of a multi-page input file is the correct document type and one of the supported versions, the processor performs entity extraction on the first supported document. If the processor doesn't find any applicable documents in the input file, the processor returns an error message.
Supported languages
  • en: English
Supported form/versions
  • Form (Rev. 10-2018, Rev. 11-2017)
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-w9-v1.0-2020-09-25 Stable

None

None

pretrained-w9-v1.1-2021-12-10 Stable

None

None

pretrained-w9-v1.2-2022-01-27 Stable

None

None

Handles documents with variations other than the standard template. Does not include the splitter and classifier model, which has its own service and can be called separately.
pretrained-w9-v2.0-2022-06-23 Release Candidate

None

None

Quality improvements.

Sync page limit is 10.

Breaking change: all extracted fields have been renamed from CamelCase to snake_case (for example, business_name instead of BusinessName).

This change was made to standardize the format of field names across Document AI.

This processor assumes the input file contains the supported document from the beginning and will not classify or split the input file. If your input file does not meet this assumption, please run the Lending Document Splitter & Classifier first and preprocess the input file.

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 15
Maximum pages (batch/offline/asynchronous requests): 15
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • AccountNumbers
  • Address
  • BusinessName
  • CityStateZip
  • EIN
  • ExemptFatcaCode
  • ExemptPayeeCode
  • FederalTaxClassification
  • FederalTaxClassificationOther
  • FormRevisionDate
  • HasSignature
  • HasSignatureDate
  • LlcTaxClassification
  • Name
  • SSN

Breaking change in release candidate pretrained-w9-v2.0-2022-06-23: all extracted fields have been renamed from CamelCase to snake_case (for example, business_name instead of BusinessName). This change was made to standardize the format of field names across Document AI.
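
If your code refers to the CamelCase names listed above, a small conversion helper can help you find the likely snake_case equivalents. The following Python sketch is an approximation only: business_name is the only renamed field confirmed on this page, and the treatment of all-uppercase names such as EIN and SSN is an assumption. Verify the results against the actual output of the release candidate.

```python
import re

# Matches the position between a lowercase letter or digit and an uppercase letter.
_WORD_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def to_snake_case(name):
    """Convert a CamelCase field name to snake_case.

    Runs of capitals (for example "EIN") are kept together and lowercased.
    """
    return _WORD_BOUNDARY.sub("_", name).lower()


for old_name in ("BusinessName", "HasSignatureDate", "EIN"):
    print(old_name, "->", to_snake_case(old_name))
```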

Enriched fields

You can find more information in the Enrichment & normalization page.

Full list of enriched fields
  • Address
  • BusinessName
  • EIN
Uptraining
Human-in-the-Loop[2]
Labeling Instructions Open in new window.

Expense Parser

Description

Extract text and values from expense documents such as expense date, supplier name, total amount, and currency.

Category Specialized
Solution type Procurement
Functions OCR, Entity Extraction
Release stage General availability
Access status Public
Type in API EXPENSE_PROCESSOR
Supported languages
Full list of languages
  • de: German
  • en: English
  • es: Spanish
  • fr: French
  • ja: Japanese
  • nl: Dutch
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-expense-v1.1-2021-04-09 Stable

None

None

Launched in April 2021. Deprecation is planned soon.
pretrained-expense-v1.2-2022-02-18 Stable

None

None

pretrained-expense-v1.3-2022-07-15 Stable
Show fields
  • credit_card_last_four_digits
  • line_item/quantity
  • payment_type
Additional languages supported:
  • ja: Japanese
Support for hotel and car rental folios.
pretrained-expense-v1.4-2022-11-18 Release Candidate
Show fields
  • traveler_name
  • reservation_id
  • line_item/transaction_date
Additional languages supported:
  • ja: Japanese
  • it: Italian
  • pt: Portuguese (Brazilian & Continental)
Performance improvements and support for uptraining. Maximum pages (online/synchronous requests) limit has been increased to 15.

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 10
Maximum pages (batch/offline/asynchronous requests): 10
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • credit_card_last_four_digits
  • currency
  • end_date
  • net_amount
  • payment_type
  • purchase_time
  • receipt_date
  • start_date
  • supplier_address
  • supplier_city
  • supplier_name
  • tip_amount
  • total_amount
  • total_tax_amount
  • line_item
    • line_item/amount
    • line_item/description
    • line_item/product_code
Enriched fields

You can find more information in the Enrichment & normalization page.

Full list of enriched fields
  • supplier_address
  • supplier_name
  • supplier_phone
Normalized fields

You can find more information in the Enrichment & normalization page.

Full list of normalized fields
  • currency
  • total_amount
  • total_tax_amount
  • net_amount
  • receipt_date
  • purchase_time
  • start_date
  • end_date
  • line_item/amount
  • line_item/payment_date
  • line_item/payment_amount
Uptraining
Human-in-the-Loop[2]
Labeling Instructions Open in new window.
Sample Input File Open in new window.
Sample Output Open in new window.

Invoice Parser

Description

Extract text and values from invoices, such as invoice number, supplier name, invoice amount, tax amount, invoice date, and due date.

The Invoice Parser extracts both header and line item fields, such as invoice number, supplier name, invoice amount, tax amount, invoice date, due date, and line item amounts.

Category Specialized
Solution type Procurement
Functions OCR, Entity Extraction
Release stage General availability
Access status Public
Type in API INVOICE_PROCESSOR
Supported languages
Full list of languages
  • de: German
  • en: English
  • es: Spanish
  • et: Estonian
  • fr: French
  • it: Italian
  • lv: Latvian
  • lt: Lithuanian
  • nl: Dutch
  • pt: Portuguese (Brazilian & Continental)
  • ro: Romanian
  • sv: Swedish
Processor versions
Version ID Release Channel Additional fields detected Additional languages supported Description
pretrained-invoice-v1.1-2021-04-09 Stable

None

None

pretrained-invoice-v1.2-2022-02-18 Stable

None

None

Deprecation is planned soon.
pretrained-invoice-v1.3-2022-07-15 Stable

None

  • it: Italian
  • pt: Portuguese (Brazilian & Continental)
  • ro: Romanian
  • sv: Swedish
  • et: Estonian
  • lv: Latvian
  • lt: Lithuanian
Uptrainable processor version. Maximum pages (online/synchronous requests) has been increased to 15.
pretrained-invoice-v1.4-2022-10-21 Release Candidate

None

None

Uptrainable processor version. Maximum pages (online/synchronous requests) has been increased to 15.

For more information, see Managing processor versions.

Quotas and limits
Maximum pages (online/synchronous requests): 15
Maximum pages (batch/offline/asynchronous requests): 200
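
The page limits in the Quotas and limits blocks for the processors above determine whether a document can be sent as an online request or needs a batch request. The following Python sketch collects those default limits into a lookup table and picks a request mode. The names PAGE_LIMITS and choose_request_mode are hypothetical. The table uses only the default limits; it ignores version-specific limits noted in the version descriptions, such as the sync page limit of 10 for pretrained-w9-v2.0-2022-06-23.

```python
# Default (online, batch) page limits copied from the Quotas and limits blocks,
# keyed by the "Type in API" value of each processor.
PAGE_LIMITS = {
    "MORTGAGE_STATEMENT_PROCESSOR": (15, 50),
    "PAYSTUB_PROCESSOR": (15, 50),
    "RETIREMENT_INVESTMENT_STATEMENT_PROCESSOR": (15, 30),
    "FORM_SSA89_PROCESSOR": (10, 15),
    "FORM_SSA1099_PROCESSOR": (10, 15),
    "FORM_VBA26_0551_PROCESSOR": (10, 15),
    "FORM_W2_PROCESSOR": (15, 15),
    "FORM_W9_PROCESSOR": (15, 15),
    "EXPENSE_PROCESSOR": (10, 10),
    "INVOICE_PROCESSOR": (15, 200),
}


def choose_request_mode(processor_type, page_count):
    """Return "online" or "batch" for a document of `page_count` pages.

    Raises ValueError if the document exceeds the batch limit and must be
    split before processing.
    """
    online_max, batch_max = PAGE_LIMITS[processor_type]
    if page_count <= online_max:
        return "online"
    if page_count <= batch_max:
        return "batch"
    raise ValueError(
        f"{page_count} pages exceeds the batch limit of {batch_max} "
        f"for {processor_type}"
    )


print(choose_request_mode("INVOICE_PROCESSOR", 40))
```

Keeping the limits in one table makes it easier to update them when a new processor version changes a limit.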
Fields detected in the earliest version

You can also find this information in the Field detected page.

Full list of fields
  • amount_due
  • amount_paid_since_last_invoice
  • carrier
  • currency
  • currency_exchange_rate
  • delivery_date
  • due_date
  • freight_amount
  • invoice_date
  • invoice_id
  • line_item
    • line_item/amount
    • line_item/description
    • line_item/product_code
    • line_item/purchase_order
    • line_item/quantity
    • line_item/unit
    • line_item/unit_price
  • net_amount
  • payment_terms
  • purchase_order
  • receiver_address
  • receiver_email
  • receiver_name
  • receiver_phone
  • receiver_tax_id
  • receiver_website
  • remit_to_address
  • remit_to_name
  • ship_from_address
  • ship_from_name
  • ship_to_address
  • ship_to_name
  • supplier_address
  • supplier_email
  • supplier_iban
  • s