Language support

The Document AI API's optical character recognition (OCR) feature can detect text in a wide variety of languages, including documents that mix several languages.

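The Document AI API returns the languages it detects in the detectedLanguages field of the Document object, on each page and on its layout elements (blocks, paragraphs, lines, and tokens). Each entry identifies a language by its BCP-47 code and includes a confidence score.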
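As a minimal sketch, the following Python reads detected languages from a Document that has already been returned as JSON (for example, a saved REST response). It assumes the REST field names pages, detectedLanguages, languageCode, and confidence, and it reads only page-level entries. The sample document and its confidence values are made up for illustration, and detected_languages is a hypothetical helper. With the google-cloud-documentai Python client, the corresponding attributes are page.detected_languages, language_code, and confidence.

```python
from collections import defaultdict

# Made-up, trimmed Document in its REST/JSON (camelCase) form. Real responses
# carry many more fields; only the ones read below are shown.
SAMPLE_DOCUMENT = {
    "pages": [
        {
            "pageNumber": 1,
            "detectedLanguages": [
                {"languageCode": "en", "confidence": 0.91},
                {"languageCode": "fr", "confidence": 0.07},
            ],
        },
        {
            "pageNumber": 2,
            "detectedLanguages": [{"languageCode": "fr", "confidence": 0.95}],
        },
    ]
}


def detected_languages(document: dict) -> dict[str, float]:
    """Map each BCP-47 code found on any page to its highest page-level confidence."""
    best: dict[str, float] = defaultdict(float)
    for page in document.get("pages", []):
        for lang in page.get("detectedLanguages", []):
            code = lang.get("languageCode")
            if code:
                # A missing confidence is treated as 0.0 in this sketch.
                best[code] = max(best[code], lang.get("confidence", 0.0))
    return dict(best)


if __name__ == "__main__":
    for code, conf in sorted(detected_languages(SAMPLE_DOCUMENT).items()):
        print(f"{code}: {conf:.2f}")
```

Keeping the highest confidence per code is only one way to aggregate. A page that mixes languages reports several codes, as page 1 of the sample does.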

For more information about OCR language support, see the Cloud Vision OCR Language Support documentation.

See the tables below or the full Processor List for details.

General processors

Enterprise Document OCR (Optical Character Recognition)
Language Name BCP 47 Tag Script Handwriting supported
Afrikaans af Latn
Albanian sq Latn
Arabic ar Arab
Armenian hy Armn
Belarusian be Cyrl
Bangla bn Beng
Bengali bn Beng
Bulgarian bg Cyrl
Catalan ca Latn
Chinese zh Hani
Croatian hr Latn
Czech cs Latn
Danish da Latn
Dutch nl Latn
English en Latn
Estonian et Latn
Filipino fil Latn
Finnish fi Latn
French fr Latn
German de Latn
Greek el Grek
Gujarati gu Gujr
Hebrew iw Hebr
Hindi hi Deva
Hungarian hu Latn
Icelandic is Latn
Indonesian id Latn
Italian it Latn
Japanese ja Jpan
Kannada kn Knda
Khmer km Khmr
Korean ko Kore
Lao lo Laoo
Latvian lv Latn
Lithuanian lt Latn
Macedonian mk Cyrl
Malay ms Latn
Malayalam ml Mlym
Marathi mr Deva
Nepali ne Deva
Norwegian no Latn
Persian fa Arab
Polish pl Latn
Portuguese (Portugal & Brazil) pt Latn
Punjabi pa Guru
Romanian ro Latn
Russian ru Cyrl
Serbian sr Cyrl
Slovak sk Latn
Slovenian sl Latn
Spanish es Latn
Swedish sv Latn
Tagalog tl Latn
Tamil ta Taml
Telugu te Telu
Thai th Thai
Turkish tr Latn
Ukrainian uk Cyrl
Vietnamese vi Latn
Yiddish yi Hebr
Form Parser
Language Name BCP 47 Tag Script Handwriting supported
Afrikaans af Latn
Albanian sq Latn
Arabic ar Arab
Armenian hy Armn
Belarusian be Cyrl
Bangla bn Beng
Bengali bn Beng
Bulgarian bg Cyrl
Catalan ca Latn
Chinese zh Hani
Croatian hr Latn
Czech cs Latn
Danish da Latn
Dutch nl Latn
English en Latn
Estonian et Latn
Filipino fil Latn
Finnish fi Latn
French fr Latn
German de Latn
Greek el Grek
Gujarati gu Gujr
Hebrew iw Hebr
Hindi hi Deva
Hungarian hu Latn
Icelandic is Latn
Indonesian id Latn
Italian it Latn
Japanese ja Jpan
Kannada kn Knda
Khmer km Khmr
Korean ko Kore
Lao lo Laoo
Latvian lv Latn
Lithuanian lt Latn
Macedonian mk Cyrl
Malay ms Latn
Malayalam ml Mlym
Marathi mr Deva
Nepali ne Deva
Norwegian no Latn
Persian fa Arab
Polish pl Latn
Portuguese (Portugal & Brazil) pt Latn
Punjabi pa Guru
Romanian ro Latn
Russian ru Cyrl
Serbian sr Cyrl
Slovak sk Latn
Slovenian sl Latn
Spanish es Latn
Swedish sv Latn
Tagalog tl Latn
Tamil ta Taml
Telugu te Telu
Thai th Thai
Turkish tr Latn
Ukrainian uk Cyrl
Vietnamese vi Latn
Yiddish yi Hebr

Specialized processors

Identity Document Proofing Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
US Driver License Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
US Passport Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
1003 Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
1040 Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
1040 Schedule C Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
1040 Schedule D Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
1040 Schedule E Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
1099-DIV Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
1099-G Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
1099-INT Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
1099-NEC Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
1099-R Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
1065 Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
1120 Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
1120S Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
Bank Statement Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
HOA Statement Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
HUD-92900B Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
Lending Document Splitter & Classifier
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
Mortgage Statement Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
Pay Slip Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
Retirement/Investment Statement Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
SSA-89 Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
SSA-1099 Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
VBA26-0551 Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
W2 Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
W9 Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
Expense Parser
Language Name BCP 47 Tag Script Handwriting supported
German de Latn
English en Latn
Spanish es Latn
French fr Latn
Japanese ja Jpan
Dutch nl Latn
Invoice Parser
Language Name BCP 47 Tag Script Handwriting supported
German de Latn
English en Latn
Spanish es Latn
Estonian et Latn
French fr Latn
Italian it Latn
Latvian lv Latn
Lithuanian lt Latn
Dutch nl Latn
Portuguese (Portugal & Brazil) pt Latn
Romanian ro Latn
Swedish sv Latn
Procurement Document Splitter & Classifier
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
Purchase Order Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
Utility Parser
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
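The language tables can also serve as a pre-flight check before a document is sent to a specialized processor. The following minimal Python sketch assumes you have already collected detected language codes, for example from detectedLanguages. The tag sets are copied from the Expense Parser and Invoice Parser tables above. Falling back from a regional tag such as pt-BR to its primary subtag pt is an assumption of this sketch, not documented API behavior. SUPPORTED, is_supported, and unsupported are hypothetical names.

```python
# Hypothetical lookup table: BCP-47 tags copied from the Expense Parser and
# Invoice Parser tables, stored lowercase because matching is case-insensitive.
SUPPORTED: dict[str, frozenset[str]] = {
    "Expense Parser": frozenset({"de", "en", "es", "fr", "ja", "nl"}),
    "Invoice Parser": frozenset(
        {"de", "en", "es", "et", "fr", "it", "lv", "lt", "nl", "pt", "ro", "sv"}
    ),
}


def is_supported(processor: str, tag: str) -> bool:
    """Return True if the processor's table lists the tag.

    If the full tag is not listed, fall back to its primary language subtag
    (e.g. "pt-BR" -> "pt"); this fallback is an assumption of the sketch.
    Unknown processor names are treated as supporting nothing.
    """
    tags = SUPPORTED.get(processor, frozenset())
    normalized = tag.strip().lower()
    return normalized in tags or normalized.split("-", 1)[0] in tags


def unsupported(processor: str, tags: list[str]) -> list[str]:
    """Return the given tags that the processor does not list, de-duplicated and sorted."""
    return sorted({t for t in tags if not is_supported(processor, t)})


if __name__ == "__main__":
    print(unsupported("Expense Parser", ["en", "it", "ja"]))  # ['it']
    print(is_supported("Invoice Parser", "pt-BR"))  # True
```

A document whose detected languages fall outside a specialized processor's list could instead be routed to a processor with broader coverage, such as Enterprise Document OCR.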

Custom processors

Custom Document Extractor
Language Name BCP 47 Tag Script Handwriting supported
Afrikaans af Latn
Arabic ar Arab
Azerbaijani az Latn
Azerbaijani (Cyrillic) az-Cyrl Cyrl
Belarusian be Cyrl
Bulgarian bg Cyrl
Bosnian bs Latn
Catalan ca Latn
Cebuano ceb Latn
Czech cs Latn
Welsh cy Latn
Danish da Latn
German de Latn
Greek el Grek
English en Latn
Esperanto eo Latn
Spanish es Latn
Estonian et Latn
Basque eu Latn
Persian fa Arab
Finnish fi Latn
Filipino fil Latn
French fr Latn
Irish ga Latn
Galician gl Latn
Hindi hi Deva
Croatian hr Latn
Haitian Creole ht Latn
Hungarian hu Latn
Indonesian id Latn
Icelandic is Latn
Italian it Latn
Hebrew iw Hebr
Japanese ja Jpan
Javanese jv Latn
Kazakh kk Cyrl
Korean ko Kore
Kyrgyz ky Cyrl
Latin la Latn
Lithuanian lt Latn
Latvian lv Latn
Macedonian mk Cyrl
Mongolian mn Cyrl
Marathi mr Deva
Malay ms Latn
Maltese mt Latn
Nepali ne Deva
Dutch nl Latn
Norwegian no Latn
Polish pl Latn
Pashto ps Arab
Portuguese (Portugal & Brazil) pt Latn
Romanian ro Latn
Russian ru Cyrl
Russian (Petrine Orthography) ru-PETR1708 Cyrl
Sanskrit sa Deva
Slovak sk Latn
Slovenian sl Latn
Albanian sq Latn
Serbian sr Cyrl
Swedish sv Latn
Swahili sw Latn
Tagalog tl Latn
Turkish tr Latn
Ukrainian uk Cyrl
Urdu ur Arab
Uzbek uz Latn
Uzbek (Cyrillic) uz-Cyrl Cyrl
Vietnamese vi Latn
Yiddish yi Hebr
Chinese (Simplified) zh-Hans Hani
Chinese (Traditional) zh-Hant Hani
Zulu zu Latn
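Most tags in the table above are a bare primary language subtag such as fil or ceb. A few carry more information. The tags az-Cyrl, uz-Cyrl, zh-Hans, and zh-Hant add a four-letter script subtag, and ru-PETR1708 adds a variant subtag. The following Python sketch splits such tags into their parts using the subtag shapes defined for BCP-47 (RFC 5646). It is a simplification that rejects extlang, extension, and private-use subtags. LanguageTag and parse_tag are hypothetical names, not part of any Document AI library.

```python
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]
    variants: tuple[str, ...]


def parse_tag(tag: str) -> LanguageTag:
    """Split a simple BCP-47 tag into language, script, region, and variants.

    Subtags must appear in that order. Extlang, extension, and private-use
    subtags raise ValueError. The original letter case is preserved.
    """
    language, *rest = tag.split("-")
    script: Optional[str] = None
    region: Optional[str] = None
    variants: list[str] = []
    for part in rest:
        is_script = len(part) == 4 and part.isalpha()
        is_region = (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit())
        is_variant = part.isalnum() and (
            5 <= len(part) <= 8 or (len(part) == 4 and part[0].isdigit())
        )
        if is_script and script is None and region is None and not variants:
            script = part
        elif is_region and region is None and not variants:
            region = part
        elif is_variant:
            variants.append(part)
        else:
            raise ValueError(f"unsupported subtag {part!r} in {tag!r}")
    return LanguageTag(language, script, region, tuple(variants))


if __name__ == "__main__":
    for t in ("az-Cyrl", "ru-PETR1708", "zh-Hant", "fil"):
        print(t, "->", parse_tag(t))
```

Comparing only the language field would treat zh-Hans and zh-Hant as the same language. The Custom Document Extractor lists the two scripts as separate entries, however, so whether that shortcut is appropriate depends on the processor.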
Custom Document Classifier
Language Name BCP 47 Tag Script Handwriting supported
English en Latn
Custom Document Splitter
Language Name BCP 47 Tag Script Handwriting supported
Afrikaans af Latn
Arabic ar Arab
Azerbaijani az Latn
Azerbaijani (Cyrillic) az-Cyrl Cyrl
Belarusian be Cyrl
Bulgarian bg Cyrl
Bosnian bs Latn
Catalan ca Latn
Cebuano ceb Latn
Czech cs Latn
Welsh cy Latn
Danish da Latn
German de Latn
Greek el Grek
English en Latn
Esperanto eo Latn
Spanish es Latn
Estonian et Latn
Basque eu Latn
Persian fa Arab
Finnish fi Latn
Filipino fil Latn
French fr Latn
Irish ga Latn
Galician gl Latn
Hindi hi Deva
Croatian hr Latn
Haitian Creole ht Latn
Hungarian hu Latn
Indonesian id Latn
Icelandic is Latn
Italian it Latn
Hebrew iw Hebr
Japanese ja Jpan
Javanese jv Latn
Kazakh kk Cyrl
Korean ko Kore
Kyrgyz ky Cyrl
Latin la Latn
Lithuanian lt Latn
Latvian lv Latn
Macedonian mk Cyrl
Mongolian mn Cyrl
Marathi mr Deva
Malay ms Latn
Maltese mt Latn
Nepali ne Deva
Dutch nl Latn
Norwegian no Latn
Polish pl Latn
Pashto ps Arab
Portuguese (Portugal & Brazil) pt Latn
Romanian ro Latn
Russian ru Cyrl
Russian (Petrine Orthography) ru-PETR1708 Cyrl
Sanskrit sa Deva
Slovak sk Latn
Slovenian sl Latn
Albanian sq Latn
Serbian sr Cyrl
Swedish sv Latn
Swahili sw Latn
Tagalog tl Latn
Turkish tr Latn
Ukrainian uk Cyrl
Urdu ur Arab
Uzbek uz Latn
Uzbek (Cyrillic) uz-Cyrl Cyrl
Vietnamese vi Latn
Yiddish yi Hebr
Chinese (Simplified) zh-Hans Hani
Chinese (Traditional) zh-Hant Hani
Zulu zu Latn
Summarizer
Language Name BCP 47 Tag Script Handwriting supported
English en Latn