
Language support

The Document AI API's text recognition (OCR) feature can detect text in a wide variety of languages, including documents that contain more than one language.

Languages detected by the Document AI API are returned in the Document object, in the detectedLanguages field of each page, as BCP-47 language codes.
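As an illustration of reading these results, the sketch below collects the detected languages from a Document in its JSON form (camelCase field names, as in REST responses), where each page carries a detectedLanguages list and each entry has a languageCode and a confidence. It only reads the page-level field, and the helper name summarize_detected_languages and the sample values are hypothetical, not real API output.

```python
import json


def summarize_detected_languages(document: dict) -> dict[str, float]:
    """Map each BCP-47 code found on any page to the highest confidence reported for it.

    Hypothetical helper: assumes `document` is a Document in JSON form,
    with pages[].detectedLanguages[].{languageCode, confidence}.
    """
    best: dict[str, float] = {}
    for page in document.get("pages", []):
        for detected in page.get("detectedLanguages", []):
            code = detected.get("languageCode")
            if not code:
                continue  # skip entries without a language code
            # JSON output may omit zero-valued fields, so default a missing confidence to 0.0
            confidence = float(detected.get("confidence", 0.0))
            best[code] = max(confidence, best.get(code, 0.0))
    return best


# Illustrative input only: the values below are made up.
sample_json = """
{
  "pages": [
    {"pageNumber": 1, "detectedLanguages": [{"languageCode": "en", "confidence": 0.92}]},
    {"pageNumber": 2, "detectedLanguages": [
      {"languageCode": "fr", "confidence": 0.71},
      {"languageCode": "en", "confidence": 0.40}
    ]}
  ]
}
"""

print(summarize_detected_languages(json.loads(sample_json)))
# {'en': 0.92, 'fr': 0.71}
```

Keeping the maximum confidence per code gives a quick document-level view of which languages appear, while the per-page lists remain available if you need to know where each language occurs.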

For more information about OCR language support, see the Cloud Vision OCR Language Support documentation.

See the tables below or the full Processor List for details.

General processors

Processor Supported Languages
Document OCR (Optical Character Recognition)
  • af: Afrikaans
  • sq: Albanian
  • ar: Arabic
  • hy: Armenian
  • be: Belarusian
  • bn: Bengali
  • bg: Bulgarian
  • ca: Catalan
  • zh: Chinese
  • hr: Croatian
  • cs: Czech
  • da: Danish
  • nl: Dutch
  • en: English
  • et: Estonian
  • fil: Filipino
  • fi: Finnish
  • fr: French
  • de: German
  • el: Greek
  • gu: Gujarati
  • iw: Hebrew
  • hi: Hindi
  • hu: Hungarian
  • is: Icelandic
  • id: Indonesian
  • it: Italian
  • ja: Japanese
  • kn: Kannada
  • km: Khmer
  • ko: Korean
  • lo: Lao
  • lv: Latvian
  • lt: Lithuanian
  • mk: Macedonian
  • ms: Malay
  • ml: Malayalam
  • mr: Marathi
  • ne: Nepali
  • no: Norwegian
  • fa: Persian
  • pl: Polish
  • pt: Portuguese (Brazilian & Continental)
  • pa: Punjabi
  • ro: Romanian
  • ru: Russian
  • sr: Serbian
  • sk: Slovak
  • sl: Slovenian
  • es: Spanish
  • sv: Swedish
  • tl: Tagalog
  • ta: Tamil
  • te: Telugu
  • th: Thai
  • tr: Turkish
  • uk: Ukrainian
  • vi: Vietnamese
  • yi: Yiddish
Form Parser
  • af: Afrikaans
  • sq: Albanian
  • ca: Catalan
  • hr: Croatian
  • cs: Czech
  • da: Danish
  • nl: Dutch
  • en: English
  • et: Estonian
  • fil: Filipino
  • fi: Finnish
  • fr: French
  • de: German
  • hu: Hungarian
  • is: Icelandic
  • id: Indonesian
  • it: Italian
  • lv: Latvian
  • lt: Lithuanian
  • ms: Malay
  • no: Norwegian
  • pl: Polish
  • pt: Portuguese (Brazilian & Continental)
  • ro: Romanian
  • sr: Serbian
  • sk: Slovak
  • sl: Slovenian
  • es: Spanish
  • sv: Swedish
  • tl: Tagalog
  • tr: Turkish
  • vi: Vietnamese

Specialized processors

Processor Supported Languages
Contract Parser
  • en: English
Identity Document Proofing Parser
  • en: English
US Driver License Parser
  • en: English
US Passport Parser
  • en: English
1003 Parser
  • en: English
1040 Parser
  • en: English
1040 Schedule C Parser
  • en: English
1040 Schedule D Parser
  • en: English
1040 Schedule E Parser
  • en: English
1099-DIV Parser
  • en: English
1099-G Parser
  • en: English
1099-INT Parser
  • en: English
1099-NEC Parser
  • en: English
1099-R Parser
  • en: English
1065 Parser
  • en: English
1120 Parser
  • en: English
1120S Parser
  • en: English
Bank Statement Parser
  • en: English
HOA Statement Parser
  • en: English
HUD-92900B Parser
  • en: English
Lending Document Splitter & Classifier
  • en: English
Mortgage Statement Parser
  • en: English
Pay Slip Parser
  • en: English
Retirement/Investment Statement Parser
  • en: English
SSA-89 Parser
  • en: English
SSA-1099 Parser
  • en: English
VBA26-0551 Parser
  • en: English
W2 Parser
  • en: English
W9 Parser
  • en: English
Expense Parser
  • de: German
  • en: English
  • es: Spanish
  • fr: French
  • ja: Japanese
  • nl: Dutch
Invoice Parser
  • de: German
  • en: English
  • es: Spanish
  • et: Estonian
  • fr: French
  • it: Italian
  • lv: Latvian
  • lt: Lithuanian
  • nl: Dutch
  • pt: Portuguese (Brazilian & Continental)
  • ro: Romanian
  • sv: Swedish
Procurement Document Splitter & Classifier
  • en: English
Purchase Order Parser
  • en: English
Utility Parser
  • en: English
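Because most specialized processors support only English, it can help to check a detected language against a processor's list before routing a document to it. The sketch below hard-codes only the Expense Parser and Invoice Parser rows from the table above and compares on the primary language subtag (so "pt-BR" matches "pt"); the dictionary and function names are hypothetical and not part of the Document AI API.

```python
# Supported-language codes copied from the Expense Parser and Invoice Parser
# rows of the table above. Only these two processors are included here.
SUPPORTED_LANGUAGES: dict[str, frozenset[str]] = {
    "Expense Parser": frozenset({"de", "en", "es", "fr", "ja", "nl"}),
    "Invoice Parser": frozenset(
        {"de", "en", "es", "et", "fr", "it", "lv", "lt", "nl", "pt", "ro", "sv"}
    ),
}


def primary_subtag(code: str) -> str:
    """Reduce a BCP-47 tag such as 'pt-BR' to its primary language subtag 'pt'."""
    return code.split("-", 1)[0].lower()


def processors_supporting(code: str) -> list[str]:
    """Return the processors in SUPPORTED_LANGUAGES whose list covers `code`."""
    lang = primary_subtag(code)
    return sorted(name for name, langs in SUPPORTED_LANGUAGES.items() if lang in langs)


print(processors_supporting("pt-BR"))  # ['Invoice Parser']
print(processors_supporting("ja"))     # ['Expense Parser']
print(processors_supporting("ko"))     # []
```

Matching on the primary subtag is a simplification: the tables list codes such as pt without region variants, so region information in a detected tag is ignored here.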

Custom processors

Processor Supported Languages
Custom Document Extractor
  • af: Afrikaans
  • sq: Albanian
  • ca: Catalan
  • hr: Croatian
  • cs: Czech
  • da: Danish
  • nl: Dutch
  • en: English
  • et: Estonian
  • tl: Tagalog
  • fi: Finnish
  • fr: French
  • de: German
  • hu: Hungarian
  • is: Icelandic
  • id: Indonesian
  • it: Italian
  • lv: Latvian
  • lt: Lithuanian
  • ms: Malay
  • no: Norwegian
  • pl: Polish
  • pt: Portuguese (Brazilian & Continental)
  • ro: Romanian
  • sk: Slovak
  • sl: Slovenian
  • es: Spanish
  • sv: Swedish
  • tr: Turkish
  • vi: Vietnamese
Custom Document Classifier
  • af: Afrikaans
  • sq: Albanian
  • ca: Catalan
  • hr: Croatian
  • cs: Czech
  • da: Danish
  • nl: Dutch
  • en: English
  • et: Estonian
  • tl: Tagalog
  • fi: Finnish
  • fr: French
  • de: German
  • hu: Hungarian
  • is: Icelandic
  • id: Indonesian
  • it: Italian
  • lv: Latvian
  • lt: Lithuanian
  • ms: Malay
  • no: Norwegian
  • pl: Polish
  • pt: Portuguese (Brazilian & Continental)
  • ro: Romanian
  • sk: Slovak
  • sl: Slovenian
  • es: Spanish
  • sv: Swedish
  • tr: Turkish
  • vi: Vietnamese
Custom Document Splitter
  • en: English

Handwriting recognition

The following languages are supported for handwriting recognition.

  • af: Afrikaans
  • sq: Albanian
  • be: Belarusian
  • bn: Bengali
  • bg: Bulgarian
  • ca: Catalan
  • zh: Chinese
  • hr: Croatian
  • cs: Czech
  • da: Danish
  • nl: Dutch
  • et: Estonian
  • tl: Filipino
  • fi: Finnish
  • de: German
  • el: Greek
  • hi: Hindi
  • hu: Hungarian
  • is: Icelandic
  • id: Indonesian
  • it: Italian
  • ja: Japanese
  • ko: Korean
  • lv: Latvian
  • lt: Lithuanian
  • mk: Macedonian
  • ms: Malay
  • mr: Marathi
  • ne: Nepali
  • pl: Polish
  • pt: Portuguese (Brazilian & Continental)
  • ro: Romanian
  • ru: Russian
  • sr: Serbian
  • sk: Slovak
  • sl: Slovenian
  • es: Spanish
  • sv: Swedish
  • tr: Turkish
  • uk: Ukrainian
  • vi: Vietnamese