Language support

The Document AI API's text recognition (OCR) feature can detect a wide variety of languages, including multiple languages within a single document.

Languages detected by the Document AI API are returned in the Document object in the detectedLanguages field.
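As a minimal sketch of reading this field, the snippet below walks a Document that has already been parsed from JSON into a Python dict. It assumes that each entry under pages carries a detectedLanguages list whose items have languageCode and confidence keys; that shape, the helper name collect_detected_languages, and the sample values are illustrative assumptions, so check them against the API reference before relying on them.

```python
from typing import Dict


def collect_detected_languages(document: dict) -> Dict[str, float]:
    """Return the highest confidence seen for each detected language code."""
    best: Dict[str, float] = {}
    # Assumed shape: document["pages"][i]["detectedLanguages"][j]
    for page in document.get("pages", []):
        for lang in page.get("detectedLanguages", []):
            code = lang.get("languageCode")
            if not code:
                continue  # skip entries without a language code
            confidence = float(lang.get("confidence", 0.0))
            best[code] = max(best.get(code, 0.0), confidence)
    return best


# Hypothetical response fragment, for illustration only.
sample_document = {
    "pages": [
        {"detectedLanguages": [{"languageCode": "en", "confidence": 0.97}]},
        {
            "detectedLanguages": [
                {"languageCode": "fr", "confidence": 0.62},
                {"languageCode": "en", "confidence": 0.31},
            ]
        },
    ]
}

print(collect_detected_languages(sample_document))
```

Keeping the maximum confidence per code is one possible way to summarize a multi-page, multi-language document; other summaries (for example, per-page results) may suit a given workflow better.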

Each language code is typically a BCP-47 identifier. It can take the form language-region, where language is the primary language and the optional region (usually a country identifier) selects a particular dialect, as in en-GB. A tag can also carry a script subtag instead of, or in addition to, a region: for example, Chinese can be represented as Simplified Chinese, as written in the People's Republic of China (zh-Hans), or Traditional Chinese, as written in Taiwan (zh-Hant).
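To make the structure of these codes concrete, here is a simplified sketch that splits a code into its language, script, and region parts. It is not a full BCP-47 parser: it only recognizes a four-letter script subtag and a two-letter or three-digit region subtag, and it ignores everything else (such as the variant in ru-PETR1708). The names LanguageTag and split_language_tag are hypothetical.

```python
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def split_language_tag(tag: str) -> LanguageTag:
    """Split a tag such as 'zh-Hant-HK' into (language, script, region)."""
    parts = tag.split("-")
    language = parts[0].lower()
    script: Optional[str] = None
    region: Optional[str] = None
    for part in parts[1:]:
        is_script = len(part) == 4 and part.isalpha()
        is_region = (len(part) == 2 and part.isalpha()) or (
            len(part) == 3 and part.isdigit()
        )
        if is_script and script is None and region is None:
            script = part.title()  # scripts are conventionally title case
        elif is_region and region is None:
            region = part.upper()  # regions are conventionally upper case
        # Any other subtag (variants, extensions) is ignored in this sketch.
    return LanguageTag(language, script, region)


for example in ("en-GB", "zh-Hant-HK", "sr-Latn", "es-419"):
    print(example, split_language_tag(example))
```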

Document OCR processor language support

Currently, English is the only language supported for Document OCR functionality.

There are three levels of language support in OCR functionality:

  1. Supported languages are those we prioritize and regularly evaluate performance against.
  2. Experimental languages are those under active development but not regularly evaluated against.
  3. Mapped languages are those supported by mapping them to another language code or to a general character recognizer. For example, "en-GB" is supported, but it is not treated any differently from "en" for the purposes of recognizing text. We make a best effort to return the correct mapped language code in the Entity locale field, but mapped languages are more likely than fully supported or experimentally supported languages to be misidentified as a similar language.
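The three levels can be modeled as a simple lookup. The sketch below is illustrative only: the function name support_level and the three sets are hypothetical, and each set holds just a small excerpt of the codes listed in the tables on this page.

```python
# Small excerpts of the tables on this page, not the full lists.
SUPPORTED = {"af", "ar", "de", "en", "es", "fr", "ja", "zh"}
EXPERIMENTAL = {"am", "cy", "eu", "la", "sw", "zu"}
MAPPED = {"en-GB", "fr-CA", "pt-PT", "es-419", "zh-Hans", "zh-Hant"}


def support_level(code: str) -> str:
    """Classify a language code into one of the three support levels."""
    if code in SUPPORTED:
        return "supported"
    if code in EXPERIMENTAL:
        return "experimental"
    if code in MAPPED:
        return "mapped"
    return "unknown"  # not present in this excerpt


for code in ("en", "cy", "en-GB", "xx"):
    print(code, support_level(code))
```

A caller could use a check like this to decide how much to trust a detected language, since mapped languages are the most likely to be misidentified.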

Supported languages

The following languages are prioritized and regularly evaluated.

| Language | Language (English name) | languageHints code | Script / notes |
| --- | --- | --- | --- |
| Afrikaans | Afrikaans | af | Latn |
| shqip | Albanian | sq | Latn |
| العربية | Arabic | ar | Arab; Modern Standard |
| Հայ | Armenian | hy | Armn |
| беларускі | Belorussian | be | Cyrl |
| বাংলা | Bengali | bn | Beng |
| български | Bulgarian | bg | Cyrl |
| Català | Catalan | ca | Latn |
| 普通话 | Chinese | zh | Hans/Hant |
| Hrvatski | Croatian | hr | Latn |
| Čeština | Czech | cs | Latn |
| Dansk | Danish | da | Latn |
| Nederlands | Dutch | nl | Latn |
| English | English | en | Latn; American |
| Eesti keel | Estonian | et | Latn |
| Filipino | Filipino | fil (or tl) | Latn |
| Suomi | Finnish | fi | Latn |
| Français | French | fr | Latn; European |
| Deutsch | German | de | Latn |
| Ελληνικά | Greek | el | Grek |
| ગુજરાતી | Gujarati | gu | Gujr |
| עברית | Hebrew | iw | Hebr |
| हिन्दी | Hindi | hi | Deva |
| Magyar | Hungarian | hu | Latn |
| Íslenska | Icelandic | is | Latn |
| Bahasa Indonesia | Indonesian | id | Latn |
| Italiano | Italian | it | Latn |
| 日本語 | Japanese | ja | Jpan |
| ಕನ್ನಡ | Kannada | kn | Knda |
| ភាសាខ្មែរ | Khmer | km | Khmr |
| 한국어 | Korean | ko | Kore |
| ລາວ | Lao | lo | Laoo |
| Latviešu | Latvian | lv | Latn |
| Lietuvių | Lithuanian | lt | Latn |
| Македонски | Macedonian | mk | Cyrl |
| Bahasa Melayu | Malay | ms | Latn |
| മലയാളം | Malayalam | ml | Mlym |
| मराठी | Marathi | mr | Deva |
| नेपाली | Nepali | ne | Deva |
| Norsk | Norwegian | no | Latn; Bokmål |
| فارسی | Persian | fa | Arab |
| Polski | Polish | pl | Latn |
| Português | Portuguese | pt | Latn; Brazilian |
| ਪੰਜਾਬੀ | Punjabi | pa | Guru; Gurmukhi |
| Română | Romanian | ro | Latn |
| Русский | Russian | ru | Cyrl |
| Русский (старая орфография) | Russian | ru-PETR1708 | Cyrl; Old Orthography |
| Српски | Serbian | sr | Cyrl & Latn |
| Српски (латиница) | Serbian | sr-Latn | Latn |
| Slovenčina | Slovak | sk | Latn |
| Slovenščina | Slovenian | sl | Latn |
| Español | Spanish | es | Latn; European |
| Svenska | Swedish | sv | Latn |
| தமிழ் | Tamil | ta | Taml |
| తెలుగు | Telugu | te | Telu |
| ไทย | Thai | th | Thai |
| Türkçe | Turkish | tr | Latn |
| Українська | Ukrainian | uk | Cyrl |
| Tiếng Việt | Vietnamese | vi | Latn |
| Yiddish | Yiddish | yi | Hebr |
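The codes in the languageHints column are meant to be passed to the OCR processor as hints. The sketch below builds the relevant part of a request body as a plain dict. The nesting used here (ocrConfig, then hints, then languageHints) is an assumption about the request shape and the helper name build_process_options is hypothetical; confirm the exact field names in the API reference before sending a request.

```python
import json
from typing import List


def build_process_options(language_hints: List[str]) -> dict:
    """Build an assumed processOptions fragment carrying language hints."""
    # Assumed field layout: processOptions.ocrConfig.hints.languageHints
    return {"ocrConfig": {"hints": {"languageHints": list(language_hints)}}}


# Hint that the document is mostly English with some French.
process_options = build_process_options(["en", "fr"])
print(json.dumps(process_options, indent=2))
```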

Experimental languages

The following languages are under active development and not yet regularly evaluated against.

| Language | Language (English name) | languageHints code | Script / notes |
| --- | --- | --- | --- |
| አማርኛ | Amharic | am | Ethi |
| Αρχαία ελληνικά | Ancient Greek | grc | Grek |
| অসমীয়া | Assamese | as | Beng |
| Azərbaycan | Azerbaijani | az | Latn |
| Azərbaycan (qədim yazı) | Azerbaijani | az-Cyrl | Cyrl; old orthography |
| Euskara | Basque | eu | Latn |
| Bosanski | Bosnian | bs | Latn |
| မြန်မာ | Burmese | my | Mymr |
| Cebuano | Cebuano | ceb | Latn |
| ᏣᎳᎩ ᎦᏬᏂᎯᏍᏗ | Cherokee | chr | Cher |
| dhivehi, dhivehi-bas | Dhivehi | dv | Thaa |
| རྫོང་ཁ | Dzongkha | dz | Tibt |
| Esperanto | Esperanto | eo | Latn |
| Galego | Galician | gl | Latn |
| ქართული | Georgian | ka | Geor |
| Kreyòl Ayisyen | Haitian Creole | ht | Latn |
| Gaeilge | Irish | ga | Latn |
| Jawa | Javanese | jv | Latn |
| Қазақ | Kazakh | kk | Cyrl |
| Kirghiz | Kirghiz | ky | Cyrl |
| Latine | Latin | la | Latn |
| Malti | Maltese | mt | Latn |
| Монгол | Mongolian | mn | Cyrl |
| ଓଡ଼ିଆ | Oriya | or | Orya |
| پښتو | Pashto | ps | Arab |
| संस्कृतम् | Sanskrit | sa | Deva |
| සිංහල | Sinhala | si | Sinh |
| Swahili | Swahili | sw | Latn |
| leššānā Suryāyā | Syriac | syr | Syrc |
| བོད་སྐད་ | Tibetan | bo | Tibt |
| ትግርኛ | Tigrinya | ti | Ethi |
| اردو | Urdu | ur | Arab |
| oʻzbekcha | Uzbek | uz | Latn; Latin |
| oʻzbekcha | Uzbek | uz-Cyrl | Cyrl; old orthography |
| Cymraeg | Welsh | cy | Latn |
| IsiZulu | Zulu | zu | Latn |

Mapped languages

The following languages are mapped to another language code or mapped to a general character recognizer.

| Language | Language (English name) | languageHints code | Script / notes | Mapped to |
| --- | --- | --- | --- | --- |
| بهسا اچيه | Acehnese | ace | Latn | Latin script model |
| Lwo | Acholi | ach | Latn | Latin script model |
| Dangme | Adangme | ada | Latn | Latin script model |
| Akan | Akan | ak | Latn | Latin script model |
| Anicinâbemowin | Algonquian | alg | Latn | Latin script model |
| Mapudungu | Araucanian/Mapuche | arn | Latn | Latin script model |
| Asturianu | Asturian | ast | Latn | Latin script model |
| Dene | Athabaskan | ath | Latn | Latin script model |
| Aymar aru | Aymara | ay | Latn | Latin script model |
| Bhāṣa Bali | Balinese | ban | Latn | Latin script model |
| Bamanankan | Bambara | bm | Latn | Latin script model |
| Narrow Bantu | Bantu | bnt | Latn | Latin script model |
| башҡорт теле | Bashkir | ba | Cyrl | Cyrillic script model |
| Toba–Batak | Batak | btk | Latn | Latin script model |
| Chibemba | Bemba | bem | Latn | Latin script model |
| Bikol Naga | Bikol | bik | Latn | Latin script model |
| Bichelamar | Bislama | bi | Latn | Latin script model |
| Brezhoneg | Breton | br | Latn | Latin script model |
| нохчийн мотт / noxçiyn mott | Chechen | ce | Cyrl | Cyrillic script model |
| 汉语 | Chinese | zh-Hans | Hans; Simplified; Mandarin | zh |
| 漢語 | Chinese | zh-Hant | Hant; Traditional; Mandarin | zh |
| 普通話 | Chinese | zh-Hant-HK | Hant; Mandarin; Hong Kong | zh |
| Chahta' | Choctaw | cho | Latn | Latin script model |
| Чӑвашла | Chuvash | cv | Cyrl | Cyrillic script model |
| Cree–Montagnais–Naskapi | Cree | cr | Latn | Latin script model |
| Mvskoke | Creek | mus | Latn | Latin script model |
| qırımtatar tili, къырымтатар тили | Crimean Tatar | crh | Latn | Cyrillic script model |
| Dakhótiyapi, Dakȟótiyapi | Dakota | dak | Latn | Latin script model |
| Douala | Duala | dua | Latn | Latin script model |
| Ikɔ Efik | Efik | efi | Latn | Latin script model |
| English (British) | English | en-GB | Latn; British | en |
| Èʋegbe | Ewe | ee | Latn | Latin script model |
| føroyskt mál | Faroese | fo | Latn | Latin script model |
| Na Vosa Vakaviti | Fijian | fj | Latn | Latin script model |
| fɔ̀ngbè | Fon | fon | Latn | Latin script model |
| Français canadien | French | fr-CA | Latn; Canadian | fr |
| Fulani, Fulah, Peul | Fulah | ff | Latn | Latin script model |
| | Ga | gaa | Latn | Latin script model |
| Luganda | Ganda | lg | Latn | Latin script model |
| Basa Gayo | Gayo | gay | Latn | Latin script model |
| Kiribati | Gilbertese | gil | Latn | Latin script model |
| Gothic | Gothic | got | Latn | Latin script model |
| Guaraní | Guarani | gn | Latn | Latin script model |
| Harshen/Halshen Hausa هَرْشَن هَوْسَ | Hausa | ha | Latn | Latin script model |
| ʻŌlelo Hawaiʻi | Hawaiian | haw | Latn | Latin script model |
| Otjiherero | Herero | hz | Latn | Latin script model |
| Ilonggo | Hiligaynon | hil | Latn | Latin script model |
| Jaku Iban | Iban | iba | Latn | Latin script model |
| Asụsụ Igbo | Igbo | ig | Latn | Latin script model |
| Ilokano | Iloko | ilo | Latn | Latin script model |
| Taqbaylit | Kabyle | kab | Latn | Latin script model |
| Jingpho | Kachin | kac | Latn | Latin script model |
| Kalaallisut | Kalaallisut | kl | Latn | Latin script model |
| Kikamba | Kamba | kam | Latn | Latin script model |
| Kanuri | Kanuri | kr | Latn | Latin script model |
| Qaraqalpaq tili, Қарақалпақ тили, قاراقالپاق تىلى | Kara-Kalpak | kaa | Cyrl/Latn | Cyrillic script model |
| Ka Ktien Khasi | Khasi | kha | Latn | Latin script model |
| Gĩkũyũ | Kikuyu | ki | Latn | Latin script model |
| Kinyarwanda | Kinyarwanda | rw | Latn | Latin script model |
| коми кыв | Komi | kv | Cyrl | Cyrillic script model |
| Kikongo | Kongo | kg | Latn | Latin script model |
| Kosraean | Kosraean | kos | Latn | Latin script model |
| Oshikwanyama | Kuanyama | kj | Latn | Latin script model |
| Ngala | Lingala | ln | Latn | Latin script model |
| Plattdütsch, Plattdeutsch, Nedersaksisch | Low German | nds | Latn | Latin script model |
| siLozi | Lozi | loz | Latn | Latin script model |
| Kiluba | Luba-Katanga | lu | Latn | Latin script model |
| Dholuo | Luo | luo | Latn | Latin script model |
| Madhura, Basa Mathura, بَهاسَ مَدورا | Madurese | mad | Latn | Latin script model |
| Malagasy | Malagasy | mg | Latn | Latin script model |
| Mandinka, لغة مندنكا | Mandingo | man | Latn | Latin script model |
| Gaelg, Gailck | Manx | gv | Latn | Latin script model |
| Te reo Māori | Maori | mi | Latn | Latin script model |
| Ebon | Marshallese | mh | Latn | Latin script model |
| Mɛnde yia | Mende | men | Latn | Latin script model |
| Middle English | Middle English | enm | Latn | Latin script model |
| Mittelhochdeutsch | Middle High German | gmh | Latn | Latin script model |
| Baso Minangkabau, باسو مينڠكاباو | Minangkabau | min | Latn | Latin script model |
| Kanienʼkéha | Mohawk | moh | Latn | Latin script model |
| Nkundu | Mongo | lol | Latn | Latin script model |
| Nāhuatl | Nahuatl | nah | Latn | Latin script model |
| Diné bizaad | Navajo | nv | Latn | Latin script model |
| Ndonga | Ndonga | ng | Latn | Latin script model |
| ko e vagahau Niuē | Niuean | niu | Latn | Latin script model |
| Zimbabwe Ndebele | North Ndebele | nd | Latn | Latin script model |
| Sesotho sa Leboa | Northern Sotho | nso | Latn | Latin script model |
| Chichewa, Chinyanja | Nyanja | ny | Latn | Latin script model |
| Runyankore | Nyankole | nyn | Latn | Latin script model |
| Chitonga | Nyasa Tonga | tog | Latn | Latin script model |
| Appolo | Nzima | nzi | Latn | Latin script model |
| Occitan, lenga d'òc, provençal | Occitan | oc | Latn | Latin script model |
| Anishinaabemowin, ᐊᓂᔑᓈᐯᒧᐎᓐ | Ojibwa | oj | Latn | Latin script model |
| Ænglisc, Englisc, Anglisc | Old English | ang | Latn | Latin script model |
| Franceis, François, Romanz | Old French | fro | Latn | Latin script model |
| Diutisk, Althochdeutsch | Old High German | goh | Latn | Latin script model |
| Dǫnsk tunga | Old Norse | non | Latn | Latin script model |
| Occitan ancian | Old Provencal | pro | Latn | Latin script model |
| ирон ӕвзаг | Ossetic | os | Cyrl | Cyrillic script model |
| Kapampangan | Pampanga | pam | Latn | Latin script model |
| Salitan Pangasinan | Pangasinan | pag | Latn | Latin script model |
| Papiamentu | Papiamento | pap | Latn | Latin script model |
| Português (Portugal) | Portuguese | pt-PT | Latn; European | pt |
| Kechua / Runa Simi | Quechua | qu | Latn | Latin script model |
| Rumantsch | Romansh | rm | Latn | Latin script model |
| Romani čhib | Romany | rom | Latn | Latin script model |
| Ikirundi | Rundi | rn | Latn | Latin script model |
| Sakha | Sakha | sah | Cyrl | Cyrillic script model |
| Gagana faʻa Sāmoa | Samoan | sm | Latn | Latin script model |
| yângâ tî sängö | Sango | sg | Latn | Latin script model |
| (Braid) Scots, Lallans, Doric | Scots | sco | Latn | Latin script model |
| Gàidhlig | Scottish Gaelic | gd | Latn | Latin script model |
| chiShona | Shona | sn | Latn | Latin script model |
| Songhay | Songhai | son | Latn | Latin script model |
| Sesotho | Southern Sotho | st | Latn | Latin script model |
| Español (Latinoamérica) | Spanish | es-419 | Latn; Latin American | es |
| ᮘᮞ ᮞᮥᮔ᮪ᮓ, Basa Sunda | Sundanese | su | Latn | Latin script model |
| siSwati | Swati | ss | Latn | Latin script model |
| Reo Tahiti | Tahitian | ty | Latn | Latin script model |
| тоҷикӣ | Tajik | tg | Cyrl | Cyrillic script model |
| татар теле | Tatar | tt | Cyrl/Latn | Cyrillic script model |
| KʌThemnɛ | Temne | tem | Latn | Latin script model |
| lea faka-Tonga | Tongan | to | Latn | Latin script model |
| Xitsonga | Tsonga | ts | Latn | Latin script model |
| Setswana | Tswana | tn | Latn | Latin script model |
| Türkmençe | Turkmen | tk | Latn | Cyrillic script model |
| удмурт кыл | Udmurt | udm | Cyrl | Cyrillic script model |
| Tshivenḓa | Venda | ve | Latn | Latin script model |
| Vod | Votic | vot | Cyrl/Latn | Cyrillic script model |
| Frysk | Western Frisian | fy | Latn | Latin script model |
| Wolof | Wolof | wo | Latn | Latin script model |
| isiXhosa | Xhosa | xh | Latn | Latin script model |
| Èdè Yorùbá | Yoruba | yo | Latn | Latin script model |
| Diidxazá | Zapotec | zap | Latn | Latin script model |
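As a rough illustration of how the "Mapped to" column can be used, the sketch below resolves a language code to the recognizer that handles it. The dictionary contains only a handful of rows copied from the table above, and the names MAPPED_TO and effective_recognizer are hypothetical; codes that are not in the dictionary are returned unchanged.

```python
# A few rows from the mapped languages table (code -> "Mapped to" value).
MAPPED_TO = {
    "en-GB": "en",
    "fr-CA": "fr",
    "pt-PT": "pt",
    "es-419": "es",
    "zh-Hans": "zh",
    "zh-Hant": "zh",
    "zh-Hant-HK": "zh",
    "br": "Latin script model",
    "haw": "Latin script model",
    "ba": "Cyrillic script model",
    "tg": "Cyrillic script model",
}


def effective_recognizer(code: str) -> str:
    """Return the language code or script model that handles `code`."""
    # Codes outside this excerpt are assumed to be handled as themselves.
    return MAPPED_TO.get(code, code)


for code in ("en-GB", "zh-Hant-HK", "br", "ba", "de"):
    print(code, "->", effective_recognizer(code))
```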

Other processor language support

Currently, English is the only language supported by processors other than Document OCR.