OCR Language Support

Cloud Vision API's text recognition feature is able to detect a wide variety of languages and can detect multiple languages within a single image.

A language hint is not required, but you can provide one if the service has trouble detecting the language used in your image.

With the release of Handwriting OCR GA, images with handwriting no longer require a handwriting languageHints flag when using DOCUMENT_TEXT_DETECTION.

Optional language hints are specified within a request's ImageContext as a list of languageHints for a TEXT_DETECTION or DOCUMENT_TEXT_DETECTION request.
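As an illustration of where the hints sit in a request, the following Python sketch builds a JSON request body with an optional languageHints list inside imageContext. The helper name build_annotate_request and the bucket path are hypothetical, and the surrounding field layout (requests, image.source.imageUri, features) is assumed to follow the images:annotate REST format; check the API reference before relying on it.

```python
import json


def build_annotate_request(image_uri,
                           feature_type="DOCUMENT_TEXT_DETECTION",
                           language_hints=None):
    """Build a request body dict; this helper is illustrative, not part of any SDK."""
    if feature_type not in ("TEXT_DETECTION", "DOCUMENT_TEXT_DETECTION"):
        raise ValueError("language hints apply to TEXT_DETECTION and "
                         "DOCUMENT_TEXT_DETECTION only")

    request = {
        "image": {"source": {"imageUri": image_uri}},
        "features": [{"type": feature_type}],
    }
    # Hints are optional: leave imageContext out entirely so the service
    # falls back to automatic language detection.
    if language_hints:
        request["imageContext"] = {"languageHints": list(language_hints)}
    return {"requests": [request]}


if __name__ == "__main__":
    # "gs://my-bucket/receipt.png" is a placeholder path.
    body = build_annotate_request("gs://my-bucket/receipt.png",
                                  language_hints=["en"])
    print(json.dumps(body, indent=2))
```

Calling the helper without language_hints produces a body with no imageContext key, which matches the case where the hint is left blank.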

Each language hint is typically a BCP-47 identifier. It can take the form language-region, where language is the primary language and the optional region (usually a country identifier) selects a particular dialect. The second subtag can also identify a script: for example, Chinese can be given as Simplified Chinese as written in the People's Republic of China (zh-Hans) or Traditional Chinese as written in Taiwan (zh-Hant).
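To make the structure of these identifiers concrete, here is a minimal Python sketch that splits a code into its primary language and optional script or region subtags. It is a deliberate simplification of BCP-47 (it ignores variants, extensions, and private-use subtags), and the function name split_language_tag is hypothetical.

```python
def split_language_tag(tag):
    """Split a simple BCP-47 tag into language, script, and region parts.

    Simplified: only recognizes a 4-letter script subtag and a 2-letter
    or 3-digit region subtag; any other subtags are ignored.
    """
    parts = tag.split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None}
    for part in parts[1:]:
        is_script = len(part) == 4 and part.isalpha()
        is_region = (len(part) == 2 and part.isalpha()) or \
                    (len(part) == 3 and part.isdigit())
        if is_script and result["script"] is None and result["region"] is None:
            result["script"] = part.title()   # conventional casing, e.g. "Hans"
        elif is_region and result["region"] is None:
            result["region"] = part.upper()   # conventional casing, e.g. "GB"
    return result


if __name__ == "__main__":
    for code in ("en", "en-GB", "zh-Hans", "zh-Hant"):
        print(code, split_language_tag(code))
```

Under these rules, "en-GB" yields a region of "GB", while "zh-Hans" and "zh-Hant" yield script subtags rather than regions.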

There are three levels of language support in the text recognition feature:

  1. Supported languages are those we prioritize and regularly evaluate performance against.
  2. Experimental languages are those under active development but not yet regularly evaluated.
  3. Mapped languages are those supported by mapping them to another language code or to a general character recognizer. For example, "en-GB" is supported, but it is not treated any differently than "en" for the purposes of recognizing text. We make a best effort to return the correct mapped language code in the Entity locale field, but mapped languages are more likely than fully supported or experimentally supported languages to be misidentified as a similar language.
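The mapping described in the third level can be pictured as a simple lookup. The Python sketch below is purely illustrative: the table MAPPED_LANGUAGES and the function resolve_hint are hypothetical, the only entry is the "en-GB" example given above, and the service's actual mapping logic is not documented here.

```python
# Hypothetical lookup table. "en-GB" -> "en" is the only pair stated in the
# text; a real table would be filled from the mapped-languages list.
MAPPED_LANGUAGES = {"en-GB": "en"}


def resolve_hint(code, mapped=MAPPED_LANGUAGES):
    """Return the code a mapped hint would be recognized as, else the code itself."""
    return mapped.get(code, code)


if __name__ == "__main__":
    for hint in ("en-GB", "en"):
        print(hint, "->", resolve_hint(hint))
```

Both hints resolve to "en" in this sketch, which mirrors the statement that "en-GB" is not treated differently from "en" when recognizing text.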

The list of languages (with associated languageHint codes) supported by TEXT_DETECTION and DOCUMENT_TEXT_DETECTION is shown below.

If the language hint is left blank, we will attempt to auto-detect the most appropriate language. TEXT_DETECTION will auto-detect only a subset of the supported languages, while DOCUMENT_TEXT_DETECTION will auto-detect the full set of supported languages.

Supported languages

The following languages are prioritized and regularly evaluated.

Experimental languages

The following languages are under active development and not yet regularly evaluated.

Mapped languages

The following languages are mapped to another language code or mapped to a general character recognizer.