Optical character recognition (OCR)

The Vision API can detect and extract text from images. There are two annotation features that support OCR:

  • TEXT_DETECTION detects and extracts text from any image. For example, a photograph might contain a street sign or traffic sign. The JSON includes the entire extracted string, as well as individual words, and their bounding boxes.

  • DOCUMENT_TEXT_DETECTION also extracts text from an image, but the response is optimized for dense text and documents. The JSON includes page, block, paragraph, word, and break information.

Specifying the language

Both types of OCR requests support one or more languageHints that specify the language of any text in the image. However, in most cases, an empty value yields the best results since it enables automatic language detection. For languages based on the Latin alphabet, setting languageHints is not needed. In rare cases, when the language of the text in the image is known, setting a hint will help get better results (although it will be a significant hindrance if the hint is wrong). Text detection returns an error if one or more of the specified languages is not one of the supported languages.

Text detection requests

To make an OCR request, make a POST request and provide the appropriate request body:

POST https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY
{
  "requests": [
    {
      "image": {
        "content": "/9j/7QBEUGhvdG9zaG9...base64-encoded-image-content...fXNWzvDEeYxxxzj/Coa6Bax//Z"
      },
      "features": [
        {
          "type": "TEXT_DETECTION"
        }
      ]
    }
  ]
}

For document text detection, substitute "type": "DOCUMENT_TEXT_DETECTION" in the request above.

Images can be passed in one of three ways: as a base64-encoded string (shown above); as a Google Cloud Storage URI; or as a web URI. See Making requests for more information.

See the AnnotateImageRequest reference documentation for more information on configuring the request body.

Code samples

For samples in a number of programming languages, see:

Text detection responses

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format.

A TEXT_DETECTION response includes the detected phrase, its bounding box, and individual words and their bounding boxes:

{
  "responses": [
    {
      "textAnnotations": [
        {
          "locale": "en",
          "description": "ABBEY\nROAD NW8\nCITY OF WESTMINSTER\n",
          "boundingPoly": {
            "vertices": [
              {
                "x": 45,
                "y": 43
              },
              {
                "x": 269,
                "y": 43
              },
              {
                "x": 269,
                "y": 178
              },
              {
                "x": 45,
                "y": 178
              }
            ]
          }
        },
        {
          "description": "ABBEY",
          "boundingPoly": {
            "vertices": [
              {
                "x": 45,
                "y": 50
              },
              {
                "x": 181,
                "y": 43
              },
              {
                "x": 183,
                "y": 80
              },
              {
                "x": 47,
                "y": 87
              }
            ]
          }
        },
        {
          "description": "ROAD",
          "boundingPoly": {
            "vertices": [
              {
                "x": 48,
                "y": 96
              },
              {
                "x": 155,
                "y": 96
              },
              {
                "x": 155,
                "y": 132
              },
              {
                "x": 48,
                "y": 132
              }
            ]
          }
        },
        {
          "description": "NW8",
          "boundingPoly": {
            "vertices": [
              {
                "x": 182,
                "y": 95
              },
              {
                "x": 269,
                "y": 95
              },
              {
                "x": 269,
                "y": 130
              },
              {
                "x": 182,
                "y": 130
              }
            ]
          }
        },
        {
          "description": "CITY",
          "boundingPoly": {
            "vertices": [
              {
                "x": 51,
                "y": 162
              },
              {
                "x": 85,
                "y": 161
              },
              {
                "x": 85,
                "y": 177
              },
              {
                "x": 51,
                "y": 178
              }
            ]
          }
        },
        {
          "description": "OF",
          "boundingPoly": {
            "vertices": [
              {
                "x": 95,
                "y": 162
              },
              {
                "x": 111,
                "y": 162
              },
              {
                "x": 111,
                "y": 176
              },
              {
                "x": 95,
                "y": 176
              }
            ]
          }
        },
        {
          "description": "WESTMINSTER",
          "boundingPoly": {
            "vertices": [
              {
                "x": 124,
                "y": 162
              },
              {
                "x": 249,
                "y": 160
              },
              {
                "x": 249,
                "y": 174
              },
              {
                "x": 124,
                "y": 176
              }
            ]
          }
        }
      ]
    }
  ]
}

A DOCUMENT_TEXT_DETECTION response includes additional layout information, such as page, block, paragraph, word, and break information. (The sample below uses a simplified example; the response from a dense document is too long to display on this page.)

{
  "responses": [
    {
      "textAnnotations": [
        {
          "locale": "en",
          "description": "O Google Cloud Platform\n",
          "boundingPoly": {
            "vertices": [
              {
                "x": 14, "y": 11
              },
              {
                "x": 279, "y": 11
              },
              {
                "x": 279, "y": 37
              },
              {
                "x": 14, "y": 37
              }
            ]
          }
        },
      ],
      "fullTextAnnotation": {
        "pages": [
          {
            "property": {
              "detectedLanguages": [
                {
                  "languageCode": "en"
                }
              ]
            },
            "width": 281,
            "height": 44,
            "blocks": [
              {
                "property": {
                  "detectedLanguages": [
                    {
                      "languageCode": "en"
                    }
                  ]
                },
                "boundingBox": {
                  "vertices": [
                    {
                      "x": 14, "y": 11
                    },
                    {
                      "x": 279, "y": 11
                    },
                    {
                      "x": 279, "y": 37
                    },
                    {
                      "x": 14, "y": 37
                    }
                  ]
                },
                "paragraphs": [
                  {
                    "property": {
                      "detectedLanguages": [
                        {
                          "languageCode": "en"
                        }
                      ]
                    },
                    "boundingBox": {
                      "vertices": [
                        {
                          "x": 14, "y": 11
                        },
                        {
                          "x": 279, "y": 11
                        },
                        {
                          "x": 279, "y": 37
                        },
                        {
                          "x": 14, "y": 37
                        }
                      ]
                    },
                    "words": [
                      {
                        "property": {
                          "detectedLanguages": [
                            {
                              "languageCode": "en"
                            }
                          ]
                        },
                        "boundingBox": {
                          "vertices": [
                            {
                              "x": 14, "y": 11
                            },
                            {
                              "x": 23, "y": 11
                            },
                            {
                              "x": 23, "y": 37
                            },
                            {
                              "x": 14, "y": 37
                            }
                          ]
                        },
                        "symbols": [
                          {
                            "property": {
                              "detectedLanguages": [
                                {
                                  "languageCode": "en"
                                }
                              ],
                              "detectedBreak": {
                                "type": "SPACE"
                              }
                            },
                            "boundingBox": {
                              "vertices": [
                                {
                                  "x": 14, "y": 11
                                },
                                {
                                  "x": 23, "y": 11
                                },
                                {
                                  "x": 23, "y": 37
                                },
                                {
                                  "x": 14, "y": 37
                                }
                              ]
                            },
                            "text": "O"
                          }
                        ]
                      },
                    ]
                  }
                ],
                "blockType": "TEXT"
              }
            ]
          }
        ],
        "text": "Google Cloud Platform\n"
      }
    }
  ]
}

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...

Google Cloud Vision API Documentation