The Vision API can detect and extract text from images. There are two annotation features that support optical character recognition (OCR):
- TEXT_DETECTION detects and extracts text from any image. For example, a photograph might contain a street sign or traffic sign. The JSON includes the entire extracted string, as well as individual words and their bounding boxes.
- DOCUMENT_TEXT_DETECTION also extracts text from an image, but the response is optimized for dense text and documents. The JSON includes page, block, paragraph, word, and break information.
Specifying the language
Both types of OCR requests support one or more languageHints that specify
the language of any text in the image. However, in most cases, an empty value
yields the best results since it enables automatic language detection. For
languages based on the Latin alphabet, setting languageHints is not needed.
In rare cases, when the language of the text in the image is known, setting a
hint can produce better results (although it can be a significant hindrance
if the hint is wrong). Text detection returns an error if one or more of the
specified languages is not one of the
supported languages.
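For example, a request that hints that the text in the image is Italian might look like the following. The languageHints field belongs to the request's imageContext; the base64 image content is elided here:
{
  "requests": [
    {
      "image": {
        "content": "...base64-encoded-image-content..."
      },
      "features": [
        {
          "type": "TEXT_DETECTION"
        }
      ],
      "imageContext": {
        "languageHints": ["it"]
      }
    }
  ]
}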
Text detection requests
To make an OCR request, send a POST request to the images:annotate endpoint
with the appropriate request body:
POST https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY
{
"requests": [
{
"image": {
"content": "/9j/7QBEUGhvdG9zaG9...base64-encoded-image-content...fXNWzvDEeYxxxzj/Coa6Bax//Z"
},
"features": [
{
"type": "TEXT_DETECTION"
}
]
}
]
}
For document text detection, substitute "type": "DOCUMENT_TEXT_DETECTION" in
the request above.
Images can be passed in one of three ways: as a base64-encoded string (shown above); as a Google Cloud Storage URI; or as a web URI. See Making requests for more information.
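For example, to reference an image stored in Google Cloud Storage rather than embedding its bytes, the image element of the request might replace the content field with a source field; the bucket and file names below are placeholders:
"image": {
  "source": {
    "gcsImageUri": "gs://YOUR_BUCKET/YOUR_IMAGE.jpg"
  }
}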
See the AnnotateImageRequest
reference documentation for more information on configuring the request body.
Code samples
Samples are available in a number of programming languages through the Vision API client libraries.
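As one illustration, here is a minimal Python sketch using the google-cloud-vision client library. It assumes the package is installed and that Application Default Credentials are configured; the filename sign.jpg is a placeholder:
from google.cloud import vision

# Create a client; authentication uses Application Default Credentials.
client = vision.ImageAnnotatorClient()

# Read the image to annotate; "sign.jpg" is a placeholder path.
with open("sign.jpg", "rb") as f:
    image = vision.Image(content=f.read())

# Run TEXT_DETECTION; use client.document_text_detection(image=image)
# for DOCUMENT_TEXT_DETECTION instead.
response = client.text_detection(image=image)
if response.error.message:
    raise RuntimeError(response.error.message)

# The first annotation contains the entire extracted string.
print(response.text_annotations[0].description)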
Text detection responses
If the request is successful, the server returns a 200 OK HTTP status code and
the response in JSON format.
A TEXT_DETECTION response includes the detected phrase, its bounding box,
and individual words and their bounding boxes:
{
"responses": [
{
"textAnnotations": [
{
"locale": "en",
"description": "ABBEY\nROAD NW8\nCITY OF WESTMINSTER\n",
"boundingPoly": {
"vertices": [
{
"x": 45,
"y": 43
},
{
"x": 269,
"y": 43
},
{
"x": 269,
"y": 178
},
{
"x": 45,
"y": 178
}
]
}
},
{
"description": "ABBEY",
"boundingPoly": {
"vertices": [
{
"x": 45,
"y": 50
},
{
"x": 181,
"y": 43
},
{
"x": 183,
"y": 80
},
{
"x": 47,
"y": 87
}
]
}
},
{
"description": "ROAD",
"boundingPoly": {
"vertices": [
{
"x": 48,
"y": 96
},
{
"x": 155,
"y": 96
},
{
"x": 155,
"y": 132
},
{
"x": 48,
"y": 132
}
]
}
},
{
"description": "NW8",
"boundingPoly": {
"vertices": [
{
"x": 182,
"y": 95
},
{
"x": 269,
"y": 95
},
{
"x": 269,
"y": 130
},
{
"x": 182,
"y": 130
}
]
}
},
{
"description": "CITY",
"boundingPoly": {
"vertices": [
{
"x": 51,
"y": 162
},
{
"x": 85,
"y": 161
},
{
"x": 85,
"y": 177
},
{
"x": 51,
"y": 178
}
]
}
},
{
"description": "OF",
"boundingPoly": {
"vertices": [
{
"x": 95,
"y": 162
},
{
"x": 111,
"y": 162
},
{
"x": 111,
"y": 176
},
{
"x": 95,
"y": 176
}
]
}
},
{
"description": "WESTMINSTER",
"boundingPoly": {
"vertices": [
{
"x": 124,
"y": 162
},
{
"x": 249,
"y": 160
},
{
"x": 249,
"y": 174
},
{
"x": 124,
"y": 176
}
]
}
}
]
}
]
}
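To work with this response programmatically, note that the first textAnnotations element carries the entire extracted string, while each later element is an individual word. A minimal Python sketch, assuming the JSON above is held in a string named response_json:
import json

response = json.loads(response_json)
annotations = response["responses"][0]["textAnnotations"]

# The first annotation is the full extracted string.
print(annotations[0]["description"])

# The remaining annotations are individual words with bounding boxes.
# Coordinates equal to 0 may be omitted from the JSON, hence .get().
for word in annotations[1:]:
    xs = [v.get("x", 0) for v in word["boundingPoly"]["vertices"]]
    ys = [v.get("y", 0) for v in word["boundingPoly"]["vertices"]]
    print(word["description"], (min(xs), min(ys)), (max(xs), max(ys)))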
A DOCUMENT_TEXT_DETECTION response includes additional layout information,
such as page, block, paragraph, word, and break information. (The sample below
uses a simplified example; the response from a dense document is too long to
display on this page.)
{
"responses": [
{
"textAnnotations": [
{
"locale": "en",
"description": "O Google Cloud Platform\n",
"boundingPoly": {
"vertices": [
{
"x": 14, "y": 11
},
{
"x": 279, "y": 11
},
{
"x": 279, "y": 37
},
{
"x": 14, "y": 37
}
]
}
}
],
"fullTextAnnotation": {
"pages": [
{
"property": {
"detectedLanguages": [
{
"languageCode": "en"
}
]
},
"width": 281,
"height": 44,
"blocks": [
{
"property": {
"detectedLanguages": [
{
"languageCode": "en"
}
]
},
"boundingBox": {
"vertices": [
{
"x": 14, "y": 11
},
{
"x": 279, "y": 11
},
{
"x": 279, "y": 37
},
{
"x": 14, "y": 37
}
]
},
"paragraphs": [
{
"property": {
"detectedLanguages": [
{
"languageCode": "en"
}
]
},
"boundingBox": {
"vertices": [
{
"x": 14, "y": 11
},
{
"x": 279, "y": 11
},
{
"x": 279, "y": 37
},
{
"x": 14, "y": 37
}
]
},
"words": [
{
"property": {
"detectedLanguages": [
{
"languageCode": "en"
}
]
},
"boundingBox": {
"vertices": [
{
"x": 14, "y": 11
},
{
"x": 23, "y": 11
},
{
"x": 23, "y": 37
},
{
"x": 14, "y": 37
}
]
},
"symbols": [
{
"property": {
"detectedLanguages": [
{
"languageCode": "en"
}
],
"detectedBreak": {
"type": "SPACE"
}
},
"boundingBox": {
"vertices": [
{
"x": 14, "y": 11
},
{
"x": 23, "y": 11
},
{
"x": 23, "y": 37
},
{
"x": 14, "y": 37
}
]
},
"text": "O"
}
]
}
]
}
],
"blockType": "TEXT"
}
]
}
],
"text": "Google Cloud Platform\n"
}
}
]
}
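The fullTextAnnotation hierarchy can be walked from pages down through blocks, paragraphs, and words to individual symbols. A minimal Python sketch, continuing from the parsed response dict used above:
full = response["responses"][0]["fullTextAnnotation"]

for page in full["pages"]:
    for block in page["blocks"]:
        for paragraph in block["paragraphs"]:
            for word in paragraph["words"]:
                # Each word is a list of single-character symbols;
                # join their text fields to recover the word.
                print("".join(s["text"] for s in word["symbols"]))

# The assembled text is also available directly.
print(full["text"])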