Enterprise Document OCR (Optical Character Recognition)
Description
Identify and extract text in different types of documents.
This processor allows you to identify and extract text, including handwritten text, from documents in more than 200 languages. The processor also uses machine learning to perform a quality assessment of a document based on the readability of its content.
Only the English language is officially supported.
Available in the US and EU multi-regions and in the northamerica-northeast1 and asia-southeast1 regions.
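The OCR output is a Document object. Its page elements (lines, paragraphs, tokens) do not hold text themselves; they point back into one shared `text` string. The following minimal sketch assumes the response is available as a dict in the Document AI v1 REST JSON shape: `text` and `pages[].lines[].layout.textAnchor.textSegments`, with `startIndex`/`endIndex` serialized as strings and `startIndex` omitted when it is 0. The helper names `anchor_text` and `page_lines` are hypothetical.

```python
def anchor_text(document: dict, layout: dict) -> str:
    """Join the slices of document["text"] that a layout's text anchor points to."""
    full_text = document.get("text", "")
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    # int64 indices arrive as JSON strings; a missing startIndex means 0.
    return "".join(
        full_text[int(seg.get("startIndex", 0)):int(seg["endIndex"])] for seg in segments
    )


def page_lines(document: dict) -> list:
    """Return the OCR'd lines of each page, without trailing newlines."""
    return [
        [anchor_text(document, line["layout"]).rstrip("\n") for line in page.get("lines", [])]
        for page in document.get("pages", [])
    ]


# Hand-written sample in the assumed response shape.
sample = {
    "text": "Hello\nWorld\n",
    "pages": [{"lines": [
        {"layout": {"textAnchor": {"textSegments": [{"endIndex": "6"}]}}},
        {"layout": {"textAnchor": {"textSegments": [{"startIndex": "6", "endIndex": "12"}]}}},
    ]}],
}
print(page_lines(sample))  # [['Hello', 'World']]
```

Resolving anchors against one string keeps the layout hierarchy cheap to store. It also means any element's text can be recovered the same way.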
Supported languages
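| Language Name | BCP 47 Tag | Script | Handwriting supported |
|---|---|---|---|
| Afrikaans | af | Latn | |
| Arabic | ar | Arab | |
| Azerbaijani | az | Latn | |
| Azerbaijani (Cyrillic) | az-Cyrl | Cyrl | |
| Belarusian | be | Cyrl | |
| Bulgarian | bg | Cyrl | |
| Bosnian | bs | Latn | |
| Catalan | ca | Latn | |
| Cebuano | ceb | Latn | |
| Czech | cs | Latn | |
| Welsh | cy | Latn | |
| Danish | da | Latn | |
| German | de | Latn | |
| Greek | el | Grek | |
| English | en | Latn | |
| Esperanto | eo | Latn | |
| Spanish | es | Latn | |
| Estonian | et | Latn | |
| Basque | eu | Latn | |
| Persian | fa | Arab | |
| Finnish | fi | Latn | |
| Filipino | fil | Latn | |
| French | fr | Latn | |
| Irish | ga | Latn | |
| Galician | gl | Latn | |
| Hindi | hi | Deva | |
| Croatian | hr | Latn | |
| Haitian Creole | ht | Latn | |
| Hungarian | hu | Latn | |
| Indonesian | id | Latn | |
| Icelandic | is | Latn | |
| Italian | it | Latn | |
| Hebrew | iw | Hebr | |
| Japanese | ja | Jpan | |
| Javanese | jv | Latn | |
| Kazakh | kk | Cyrl | |
| Korean | ko | Kore | |
| Kyrgyz | ky | Cyrl | |
| Latin | la | Latn | |
| Lithuanian | lt | Latn | |
| Latvian | lv | Latn | |
| Macedonian | mk | Cyrl | |
| Mongolian | mn | Cyrl | |
| Marathi | mr | Deva | |
| Malay | ms | Latn | |
| Maltese | mt | Latn | |
| Nepali | ne | Deva | |
| Dutch | nl | Latn | |
| Norwegian | no | Latn | |
| Polish | pl | Latn | |
| Pashto | ps | Arab | |
| Portuguese (Portugal & Brazil) | pt | Latn | |
| Romanian | ro | Latn | |
| Russian | ru | Cyrl | |
| Russian (Petrine Orthography) | ru-PETR1708 | Cyrl | |
| Sanskrit | sa | Deva | |
| Slovak | sk | Latn | |
| Slovenian | sl | Latn | |
| Albanian | sq | Latn | |
| Serbian | sr | Cyrl | |
| Swedish | sv | Latn | |
| Swahili | sw | Latn | |
| Tagalog | tl | Latn | |
| Turkish | tr | Latn | |
| Ukrainian | uk | Cyrl | |
| Urdu | ur | Arab | |
| Uzbek | uz | Latn | |
| Uzbek (Cyrillic) | uz-Cyrl | Cyrl | |
| Vietnamese | vi | Latn | |
| Yiddish | yi | Hebr | |
| Chinese simplified | zh-Hans | Hani | |
| Chinese traditional | zh-Hant | Hani | |
| Zulu | zu | Latn | |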
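The BCP 47 tags in the table are the values you would pass as OCR language hints. The following sketch builds a JSON request body that checks hints against an allow-list before sending them. It assumes the v1 REST field names `rawDocument` (`content`, `mimeType`) and `processOptions.ocrConfig.hints.languageHints`. The `SUPPORTED_TAGS` subset and the `build_ocr_request` helper are hypothetical.

```python
import base64

# Hypothetical allow-list: a few BCP 47 tags copied from the table above.
SUPPORTED_TAGS = {"en", "fr", "de", "ja", "zh-Hans", "zh-Hant", "az-Cyrl", "ru-PETR1708"}


def build_ocr_request(pdf_bytes: bytes, language_hints: list) -> dict:
    """Build a process-request body with OCR language hints (assumed REST field names)."""
    unknown = [tag for tag in language_hints if tag not in SUPPORTED_TAGS]
    if unknown:
        # Fail early rather than send a tag the table does not list.
        raise ValueError(f"unsupported language hints: {unknown}")
    return {
        "rawDocument": {
            # The JSON API carries file bytes as base64 text.
            "content": base64.b64encode(pdf_bytes).decode("ascii"),
            "mimeType": "application/pdf",
        },
        "processOptions": {"ocrConfig": {"hints": {"languageHints": list(language_hints)}}},
    }


body = build_ocr_request(b"%PDF", ["en", "fr"])
print(body["processOptions"]["ocrConfig"]["hints"]["languageHints"])  # ['en', 'fr']
```

The tag comparison here is exact and case-sensitive. That is stricter than BCP 47 itself, which treats tags case-insensitively.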
Processor versions
| Version ID | Release Channel | Additional fields detected | Additional languages supported | Description |
|---|---|---|---|---|
| pretrained-foundation-model-v1.0-2023-08-22 | Stable | None | None | Production-ready candidate specialized for document use cases powered by specialized vision models and foundation models from Google. Recommended stable version. |
| pretrained-foundation-model-v1.1-2024-03-12 | Release Candidate | None | None | Release candidate powered by Gemini 1.0 Pro LLM and newly developed technologies, including specialized language and vision models. Also includes advanced OCR features such as checkbox detection. Recommended for users who want to use the increased token limits or experiment with newer models. |
| pretrained-foundation-model-v1.2-2024-05-10 | Release Candidate | None | None | Release candidate powered by Gemini 1.5 Pro LLM and newly developed technologies, including specialized language and vision models. Also includes advanced OCR features such as checkbox detection. Recommended for users who want to use the largest supported token limits or experiment with newer models. |
| pretrained-foundation-model-v1.3-2024-08-31 | Release Candidate | None | None | Release candidate powered by the Gemini 1.5 Flash LLM from Google. Also includes advanced OCR features such as checkbox detection. Recommended for those who want the lowest latency. |
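Pinning an explicit version ID keeps results from shifting when a processor's default version changes. The following sketch picks the newest version in a release channel from the table above and builds a REST URL that pins it. It assumes the `projects/{project}/locations/{location}/processors/{processor_id}/processorVersions/{version_id}:process` path and the `{location}-documentai.googleapis.com` endpoint. The ID pattern in `VERSION_ID`, the helper names, and the project and processor IDs are hypothetical.

```python
import re

# Release channels copied from the version table above.
OCR_VERSIONS = {
    "pretrained-foundation-model-v1.0-2023-08-22": "Stable",
    "pretrained-foundation-model-v1.1-2024-03-12": "Release Candidate",
    "pretrained-foundation-model-v1.2-2024-05-10": "Release Candidate",
    "pretrained-foundation-model-v1.3-2024-08-31": "Release Candidate",
}

# Assumed ID pattern: pretrained-<family>-v<major>.<minor>-<YYYY-MM-DD>.
VERSION_ID = re.compile(
    r"^pretrained-(?P<family>.+)-v(?P<major>\d+)\.(?P<minor>\d+)-(?P<date>\d{4}-\d{2}-\d{2})$"
)


def version_key(version_id: str) -> tuple:
    """Return (major, minor) so IDs compare numerically rather than as strings."""
    match = VERSION_ID.match(version_id)
    if match is None:
        raise ValueError(f"unrecognized version ID: {version_id}")
    return int(match["major"]), int(match["minor"])


def newest(channel: str) -> str:
    """Pick the highest-numbered version in the given release channel."""
    candidates = [v for v, c in OCR_VERSIONS.items() if c == channel]
    return max(candidates, key=version_key)


def process_url(project: str, location: str, processor_id: str, version_id: str) -> str:
    """Build a regional REST URL that pins one processor version."""
    name = (
        f"projects/{project}/locations/{location}/processors/{processor_id}"
        f"/processorVersions/{version_id}"
    )
    return f"https://{location}-documentai.googleapis.com/v1/{name}:process"


print(newest("Stable"))  # pretrained-foundation-model-v1.0-2023-08-22
```

Sorting on parsed `(major, minor)` avoids lexical surprises such as `v1.10` sorting before `v1.2`.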
Extract general key-value pairs (entity and checkbox), tables, and generic entities from documents in addition to OCR text.
This processor applies advanced machine learning technologies to extract key-value pairs, checkboxes, and tables from documents in more than 200 languages. It also uses deep learning models to extract 11 generic entities that are common across document types.
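Form fields arrive as name/value layouts, each anchored into the document's shared `text`. The following sketch turns them into plain pairs and maps checkbox fields to booleans. It assumes the v1 JSON shape `pages[].formFields[]` with `fieldName`, `fieldValue`, and `valueType`. It also assumes the checkbox `valueType` strings are `filled_checkbox` and `unfilled_checkbox`. The helper names are hypothetical.

```python
def anchor_text(document: dict, layout: dict) -> str:
    """Join the slices of document["text"] that a layout's text anchor points to."""
    full_text = document.get("text", "")
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    return "".join(full_text[int(s.get("startIndex", 0)):int(s["endIndex"])] for s in segments)


def form_fields(document: dict) -> list:
    """Return (name, value) pairs; checkbox values become True/False."""
    pairs = []
    for page in document.get("pages", []):
        for field in page.get("formFields", []):
            # Drop surrounding whitespace and a trailing colon from the key.
            name = anchor_text(document, field.get("fieldName", {})).strip().rstrip(":")
            value = anchor_text(document, field.get("fieldValue", {})).strip()
            # Assumed valueType strings for checkbox fields.
            value_type = field.get("valueType", "")
            if value_type in ("filled_checkbox", "unfilled_checkbox"):
                value = value_type == "filled_checkbox"
            pairs.append((name, value))
    return pairs


# Hand-written sample in the assumed response shape.
sample = {
    "text": "Name: Ada\nSubscribed\n",
    "pages": [{"formFields": [
        {"fieldName": {"textAnchor": {"textSegments": [{"endIndex": "5"}]}},
         "fieldValue": {"textAnchor": {"textSegments": [{"startIndex": "6", "endIndex": "9"}]}}},
        {"fieldName": {"textAnchor": {"textSegments": [{"startIndex": "10", "endIndex": "20"}]}},
         "fieldValue": {}, "valueType": "filled_checkbox"},
    ]}],
}
print(form_fields(sample))  # [('Name', 'Ada'), ('Subscribed', True)]
```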
Extracts document content elements (text, tables, and lists) and creates context-aware chunks.
Layout Parser extracts document content elements like text, tables, and lists, and creates context-aware chunks that facilitate information retrieval in generative AI and discovery applications.
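For retrieval use cases, the chunks usually feed an index. The following sketch shows a request fragment that asks for chunking, plus a function that flattens the returned chunks into index records. It assumes the field names `processOptions.layoutConfig.chunkingConfig` (`chunkSize`, `includeAncestorHeadings`) and `chunkedDocument.chunks[]` (`chunkId`, `content`, `pageSpan`). The helper names and the chunk size of 500 are illustrative, not documented defaults.

```python
def layout_options(chunk_size: int = 500, include_ancestor_headings: bool = True) -> dict:
    """processOptions fragment requesting context-aware chunks (assumed field names)."""
    # 500 is an illustrative value, not a documented default.
    return {
        "layoutConfig": {
            "chunkingConfig": {
                "chunkSize": chunk_size,
                "includeAncestorHeadings": include_ancestor_headings,
            }
        }
    }


def chunks_for_index(document: dict) -> list:
    """Flatten chunkedDocument.chunks into records for a retrieval index."""
    records = []
    for chunk in document.get("chunkedDocument", {}).get("chunks", []):
        span = chunk.get("pageSpan", {})
        records.append({
            "id": chunk.get("chunkId", ""),
            "text": chunk.get("content", ""),
            # Keep the page range so search hits can cite their source pages.
            "pages": (span.get("pageStart"), span.get("pageEnd")),
        })
    return records


# Hand-written sample in the assumed response shape.
sample = {"chunkedDocument": {"chunks": [
    {"chunkId": "c1", "content": "Intro text", "pageSpan": {"pageStart": 1, "pageEnd": 1}},
    {"chunkId": "c2", "content": "Table rows", "pageSpan": {"pageStart": 2, "pageEnd": 3}},
]}}
print([r["id"] for r in chunks_for_index(sample)])  # ['c1', 'c2']
```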
If a multi-page input file contains a document of the correct type and a supported version, the processor extracts entities from the first such document only. If the processor finds no applicable document in the input file, it returns an error message.
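Because extraction covers one document, a response describes a single statement. That statement may still yield several candidate mentions of the same entity type. The following sketch keeps the highest-confidence mention per type. It assumes the v1 JSON shape `entities[]` with `type`, `mentionText`, and `confidence`. The entity type names and values in the sample are hypothetical.

```python
def best_mentions(document: dict) -> dict:
    """Keep the highest-confidence mention text for each entity type."""
    best = {}
    for entity in document.get("entities", []):
        entity_type = entity.get("type", "")
        current = best.get(entity_type)
        # Replace the stored entity only when this one is strictly more confident.
        if current is None or entity.get("confidence", 0.0) > current.get("confidence", 0.0):
            best[entity_type] = entity
    return {t: e.get("mentionText", "") for t, e in best.items()}


# Hypothetical entity types and values, for illustration only.
sample = {"entities": [
    {"type": "account_number", "mentionText": "1234-5678", "confidence": 0.97},
    {"type": "account_number", "mentionText": "1234-5673", "confidence": 0.41},
    {"type": "ending_balance", "mentionText": "$2,310.00", "confidence": 0.92},
]}
print(best_mentions(sample))  # {'account_number': '1234-5678', 'ending_balance': '$2,310.00'}
```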
Supported languages
| Language Name | BCP 47 Tag | Script | Handwriting supported |
|---|---|---|---|
| English | en | Latn | |
Processor versions
| Version ID | Release Channel | Additional fields detected | Additional languages supported |
|---|---|---|---|
| pretrained-bankstatement-v1.0-2021-08-08 | Stable | None | None |
| pretrained-bankstatement-v1.1-2021-08-13 | Stable | None | None |
| pretrained-bankstatement-v2.0-2021-12-10 | Stable | None | None |
| pretrained-bankstatement-v3.0-2022-05-16 | Stable | None | None |
This version assumes that the input file contains a single bank statement. Unlike the default version, this version does not check the input file for bank statements and will not return an error if no bank statements are found.
If a multi-page input file contains a document of the correct type and a supported version, the processor extracts entities from the first such document only. If the processor finds no applicable document in the input file, it returns an error message.
Supported languages
| Language Name | BCP 47 Tag | Script | Handwriting supported |
|---|---|---|---|
| English | en | Latn | |
Supported form/versions
2020 (standard and customized versions)
2019 (standard and customized versions)
2018 (standard and customized versions)
Processor versions
Version ID
Release Channel
Additional fields detected
Additional languages supported
Description
pretrained-w2-v1.0-2020-10-01
Stable
None
None
pretrained-w2-v1.1-2022-01-27
Stable
None
None
pretrained-w2-v1.2-2022-01-28
Stable
Show fields
AllocatedTips
ControlNumber
DependentCareBenefits
EIN
EmployeeAddress
EmployeeName
EmployerNameAndAddress
EmployerStateIdNumber_Line1
FederalIncomeTaxWithheld
FormYear
LocalIncomeTax_Line1
LocalityName_Line1
LocalWagesTipsEtc_Line1
MedicareTaxWithheld
MedicareWagesAndTips
NonqualifiedPlans
SocialSecurityTaxWithheld
SocialSecurityTips
SocialSecurityWages
SSN
State_Line1
StateIncomeTax_Line1
StateWagesTipsEtc_Line1
WagesTipsOtherCompensation
None
Quality improvements and support for new fields; does not include a splitter.
Quality improvements and support for box 12 fields and fine-grained predictions of EmployeeName, EmployeeAddress, and EmployerNameAndAddress, all of which are no longer part of the output and are replaced with additional fields.
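Most of the W2 fields listed above are money amounts that arrive as text. The following sketch parses them into `Decimal` and sets aside values that do not parse, such as a misread character, for human review. It assumes the entity `type` strings match the field names in the list above and that `mentionText` holds the printed amount. `AMOUNT_FIELDS` and `w2_amounts` are hypothetical.

```python
from decimal import Decimal, InvalidOperation

# Field names copied from the W2 field list above; assumed to match entity "type" values.
AMOUNT_FIELDS = {
    "WagesTipsOtherCompensation",
    "FederalIncomeTaxWithheld",
    "SocialSecurityWages",
    "SocialSecurityTaxWithheld",
    "MedicareWagesAndTips",
    "MedicareTaxWithheld",
}


def w2_amounts(document: dict) -> tuple:
    """Split W2 money fields into parsed Decimals and raw values needing review."""
    parsed, needs_review = {}, {}
    for entity in document.get("entities", []):
        field = entity.get("type", "")
        if field not in AMOUNT_FIELDS:
            continue  # skip non-money fields such as EIN or SSN
        raw = entity.get("mentionText", "")
        try:
            parsed[field] = Decimal(raw.replace("$", "").replace(",", "").strip())
        except InvalidOperation:
            needs_review[field] = raw  # not a clean number; route to a person
    return parsed, needs_review


# Hand-written sample; "1O" contains a letter O, as an OCR misread might.
sample = {"entities": [
    {"type": "WagesTipsOtherCompensation", "mentionText": "$52,000.00"},
    {"type": "FederalIncomeTaxWithheld", "mentionText": "6,1O0.00"},
    {"type": "EIN", "mentionText": "12-3456789"},
]}
parsed, review = w2_amounts(sample)
print(sorted(parsed), sorted(review))
```

`Decimal` is used instead of `float` so cents survive exactly when amounts are added or compared.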
The Online Duplicate Detection feature is currently processed in US data centers. Regional and multi-regional support is unavailable for this feature outside of the US.
This processor relies on algorithms that are updated more often than new processor versions are released, so it might return different outputs over time even when you use the same processor version. For example, the Online Duplicate Detection system monitors images on the web, so its behavior can change faster than processor versions can track.
See the notes on Responsible AI [2] and human review [3].
Supported languages
| Language Name | BCP 47 Tag | Script | Handwriting supported |
|---|---|---|---|
| English | en | Latn | |
Supported form/versions
Support for US passports, passcards and driver's licenses.
If a multi-page input document contains more than one valid pay slip, the processor extracts entities from the first valid pay slip only. If no pay slips are found in the input file, the processor returns an error message.
Supported languages
| Language Name | BCP 47 Tag | Script | Handwriting supported |
|---|---|---|---|
| English | en | Latn | |
Processor versions
Version ID
Release Channel
Additional fields detected
Additional languages supported
Description
pretrained-paystub-v1.0-2021-03-19
Stable
None
None
pretrained-paystub-v1.1-2021-08-13
Stable
Show fields
net_pay
net_pay_ytd
employee_account_number
None
Quality improvement and support for new fields.
pretrained-paystub-v1.2-2021-12-10
Stable
None
None
pretrained-paystub-v2.0-2022-05-17
Release Candidate
Show fields
deduction_item
deduction_item/deduction_type
deduction_item/deduction_this_period
deduction_item/deduction_ytd
direct_deposit_item
direct_deposit_item/direct_deposit
direct_deposit_item/employee_account_number
earning_item
earning_item/earning_type
earning_item/earning_rate
earning_item/earning_hours
earning_item/earning_this_period
earning_item/earning_ytd
page_number
tax_item
tax_item/tax_type
tax_item/tax_this_period
tax_item/tax_ytd
federal_additional_tax
federal_allowance
federal_marital_status
state_additional_tax
state_allowance
state_marital_status
None
This version assumes that the input file contains a single pay slip. Unlike the default version, this version does not check the input file for pay slips and will not return an error if no pay slips are found.
Quality improvement, new fields support and new schema. Bonus, Commissions, Holiday, Overtime, Regular Pay and Vacation are now part of earning_item/earning_this_period, and their year-to-date versions are in earning_item/earning_ytd. Direct Deposit and Employee Account Number are now nested under direct_deposit_item.
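In the v2.0 schema, each earning is a parent `earning_item` entity whose details sit in child properties. The following sketch flattens those children and totals the current-period earnings. It assumes the v1 JSON shape, where a parent entity carries child entities in `properties[]`. It also assumes child `type` values look like the paths listed above (`earning_item/earning_this_period`). Only the last path segment is used, so an unprefixed child type also works. The helper names are hypothetical.

```python
from decimal import Decimal


def to_decimal(text: str) -> Decimal:
    """Parse a mention such as "$1,234.50"; empty text counts as zero."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return Decimal(cleaned) if cleaned else Decimal(0)


def earning_rows(document: dict) -> list:
    """Flatten each earning_item parent entity into a dict keyed by child field name."""
    rows = []
    for entity in document.get("entities", []):
        if entity.get("type") != "earning_item":
            continue
        # "earning_item/earning_type" -> "earning_type"
        rows.append({
            prop.get("type", "").split("/")[-1]: prop.get("mentionText", "")
            for prop in entity.get("properties", [])
        })
    return rows


def total_this_period(document: dict) -> Decimal:
    """Sum earning_this_period across all earning_item entities."""
    return sum(
        (to_decimal(row.get("earning_this_period", "")) for row in earning_rows(document)),
        Decimal(0),
    )


# Hand-written sample in the assumed nested shape.
sample = {"entities": [
    {"type": "earning_item", "properties": [
        {"type": "earning_item/earning_type", "mentionText": "Regular"},
        {"type": "earning_item/earning_this_period", "mentionText": "$2,000.00"},
    ]},
    {"type": "earning_item", "properties": [
        {"type": "earning_item/earning_type", "mentionText": "Overtime"},
        {"type": "earning_item/earning_this_period", "mentionText": "150.50"},
    ]},
]}
print(total_this_period(sample))  # 2150.50
```

Because Bonus, Overtime, Regular Pay, and similar earnings now share one nested type, a single loop covers them all. Per-type fields are no longer needed.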
Extract text and values from invoices, such as invoice number, supplier name, invoice amount, tax amount, invoice date, and due date.
The Invoice Parser extracts both header and line-item fields, such as invoice number, supplier name, invoice amount, tax amount, invoice date, due date, and line-item amounts.
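Extracting both header and line-item fields allows a simple cross-check: the line items plus tax should add up to the invoice total. The following sketch performs that check. It assumes the header entity types `total_amount` and `total_tax_amount`, and a `line_item` parent entity with an `amount` child property. It also makes the simplifying assumption that line-item amounts are pre-tax, which not every invoice follows. All helper names are hypothetical.

```python
from decimal import Decimal


def money(text: str) -> Decimal:
    """Parse a mention such as "$1,234.50"; empty text counts as zero."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return Decimal(cleaned) if cleaned else Decimal(0)


def header_amount(document: dict, entity_type: str) -> Decimal:
    """Return the first header entity of the given type as a Decimal, or 0 if absent."""
    for entity in document.get("entities", []):
        if entity.get("type") == entity_type:
            return money(entity.get("mentionText", ""))
    return Decimal(0)


def line_item_total(document: dict) -> Decimal:
    """Sum the amount child property of every line_item entity."""
    total = Decimal(0)
    for entity in document.get("entities", []):
        if entity.get("type") != "line_item":
            continue
        for prop in entity.get("properties", []):
            if prop.get("type", "").split("/")[-1] == "amount":
                total += money(prop.get("mentionText", ""))
    return total


def totals_consistent(document: dict, tolerance: Decimal = Decimal("0.01")) -> bool:
    """Check line items + tax against the invoice total (assumes pre-tax line amounts)."""
    expected = header_amount(document, "total_amount")
    computed = line_item_total(document) + header_amount(document, "total_tax_amount")
    return abs(expected - computed) <= tolerance


# Hand-written sample in the assumed shape.
sample = {"entities": [
    {"type": "total_amount", "mentionText": "$165.00"},
    {"type": "total_tax_amount", "mentionText": "15.00"},
    {"type": "line_item", "properties": [{"type": "line_item/amount", "mentionText": "100.00"}]},
    {"type": "line_item", "properties": [{"type": "line_item/amount", "mentionText": "50.00"}]},
]}
print(totals_consistent(sample))  # True
```

A failed check does not mean the extraction is wrong. Discounts, shipping, or tax-inclusive line amounts break the simple formula, so a mismatch is better treated as a signal for review.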
[1] This processor is only available to limited-access customers. To request API access, fill out and submit the Document AI limited access customer request form. The form requests information about you, your company, and your use case. A Google Cloud project ID is required for access. To create a new Google Cloud project or find your existing project's ID, see the Google Cloud instructions on creating and managing projects. After you submit the form, the Document AI team will review your request to ensure that you meet the criteria for access. If your request is approved, you will receive an email with instructions on how to access and use this feature.
[2] Identity Document Proofing extracts and evaluates information from ID documents to help identify whether the input image represents an authentic ID.
At Google Cloud, we prioritize helping customers safely develop and implement AI solutions, and Identity Proofing has been developed in accordance with Google's AI Principles.
Based on Google's AI Principles and current product design, we strongly recommend using caution and carefully evaluating the potential benefits and risks of using Identity Document Proofing for the following:
Decision-making without a human in the loop for predictions that can impact human rights.
In sensitive domains including but not limited to employment, access to public services, healthcare, and safety-critical contexts.
[3] Always use Identity Proofing as part of your broader identity detection process and workflow.
It is important to have a human reviewer in your workflow to verify whether the predicted signals are accurate. The Identity Proofing processor isn't meant to replace human review of IDs in a workflow; it is meant to assist human reviewers in validating ID documents. Don't use the Identity Proofing processor as an automated decision tool to determine whether an ID is valid. Human review can improve document-processing accuracy, and purpose-built review tools help businesses evaluate the processor's predictions.
Make sure that you review the regulations in the region where you are implementing this technology, and research existing industry guidance to learn about policy guidelines and common fairness issues. Read about fairness in machine learning, including ways to mitigate bias in training datasets, evaluate custom models for performance disparities, and address other considerations when using a custom model.
We encourage customers to keep fairness, interpretability, and privacy and security best practices in mind when implementing Identity Proofing. To learn more about how to implement responsible AI, read Google's recommendations for Responsible AI practices.