Detect text in images

Optical Character Recognition (OCR)

The Vision API can detect and extract text from images. There are two annotation features that support optical character recognition (OCR):

  • TEXT_DETECTION detects and extracts text from any image. For example, a photograph might contain a street sign or traffic sign. The JSON includes the entire extracted string, as well as individual words, and their bounding boxes.

    Road sign image

  • DOCUMENT_TEXT_DETECTION also extracts text from an image, but the response is optimized for dense text and documents. The JSON includes page, block, paragraph, word, and break information.

    Dense image with annotations

    Learn more about DOCUMENT_TEXT_DETECTION for handwriting extraction and text extraction from files (PDF/TIFF).

Text detection requests

Set up your GCP project and authentication

Detect text in a local image

The Vision API can perform feature detection on a local image file by sending the contents of the image file as a base64 encoded string in the body of your request.

REST & CMD LINE

Before using any of the request data below, make the following replacements:

  • base64-encoded-image: The base64 representation (ASCII string) of your binary image data. This string should look similar to the following string:
    • /9j/4QAYRXhpZgAA...9tAVx/zDQDlGxn//2Q==
    Visit the base64 encode topic for more information.

HTTP method and URL:

POST https://vision.googleapis.com/v1/images:annotate

Request JSON body:

{
  "requests": [
    {
      "image": {
        "content": "base64-encoded-image"
      },
      "features": [
        {
          "type": "TEXT_DETECTION"
        }
      ]
    }
  ]
}

To send your request, choose one of these options:

curl

Save the request body in a file called request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://vision.googleapis.com/v1/images:annotate

PowerShell

Save the request body in a file called request.json, and execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://vision.googleapis.com/v1/images:annotate" | Select-Object -Expand Content

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format.

A TEXT_DETECTION response includes the detected phrase, its bounding box, and individual words and their bounding boxes.

C#

Before trying this sample, follow the C# setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API C# API reference documentation .

// Load an image from a local file.
var image = Image.FromFile(filePath);
var client = ImageAnnotatorClient.Create();
var response = client.DetectText(image);
foreach (var annotation in response)
{
    if (annotation.Description != null)
        Console.WriteLine(annotation.Description);
}

Go

Before trying this sample, follow the Go setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Go API reference documentation .


// detectText gets text from the Vision API for an image at the given file path.
func detectText(w io.Writer, file string) error {
	ctx := context.Background()

	client, err := vision.NewImageAnnotatorClient(ctx)
	if err != nil {
		return err
	}

	f, err := os.Open(file)
	if err != nil {
		return err
	}
	defer f.Close()

	image, err := vision.NewImageFromReader(f)
	if err != nil {
		return err
	}
	annotations, err := client.DetectTexts(ctx, image, nil, 10)
	if err != nil {
		return err
	}

	if len(annotations) == 0 {
		fmt.Fprintln(w, "No text found.")
	} else {
		fmt.Fprintln(w, "Text:")
		for _, annotation := range annotations {
			fmt.Fprintf(w, "%q\n", annotation.Description)
		}
	}

	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Java API reference documentation.

public static void detectText(String filePath, PrintStream out) throws Exception, IOException {
  List<AnnotateImageRequest> requests = new ArrayList<>();

  ByteString imgBytes = ByteString.readFrom(new FileInputStream(filePath));

  Image img = Image.newBuilder().setContent(imgBytes).build();
  Feature feat = Feature.newBuilder().setType(Type.TEXT_DETECTION).build();
  AnnotateImageRequest request =
      AnnotateImageRequest.newBuilder().addFeatures(feat).setImage(img).build();
  requests.add(request);

  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    BatchAnnotateImagesResponse response = client.batchAnnotateImages(requests);
    List<AnnotateImageResponse> responses = response.getResponsesList();

    for (AnnotateImageResponse res : responses) {
      if (res.hasError()) {
        out.printf("Error: %s\n", res.getError().getMessage());
        return;
      }

      // For full list of available annotations, see http://g.co/cloud/vision/docs
      for (EntityAnnotation annotation : res.getTextAnnotationsList()) {
        out.printf("Text: %s\n", annotation.getDescription());
        out.printf("Position : %s\n", annotation.getBoundingPoly());
      }
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Node.js API reference documentation .

const vision = require('@google-cloud/vision');

// Creates a client
const client = new vision.ImageAnnotatorClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const fileName = 'Local image file, e.g. /path/to/image.png';

// Performs text detection on the local file
const [result] = await client.textDetection(fileName);
const detections = result.textAnnotations;
console.log('Text:');
detections.forEach(text => console.log(text));

PHP

Before trying this sample, follow the PHP setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API PHP API reference documentation .

namespace Google\Cloud\Samples\Vision;

use Google\Cloud\Vision\V1\ImageAnnotatorClient;

// $path = 'path/to/your/image.jpg';

function detect_text($path)
{
    $imageAnnotator = new ImageAnnotatorClient();

    # annotate the image
    $image = file_get_contents($path);
    $response = $imageAnnotator->textDetection($image);
    $texts = $response->getTextAnnotations();

    printf('%d texts found:' . PHP_EOL, count($texts));
    foreach ($texts as $text) {
        print($text->getDescription() . PHP_EOL);

        # get bounds
        $vertices = $text->getBoundingPoly()->getVertices();
        $bounds = [];
        foreach ($vertices as $vertex) {
            $bounds[] = sprintf('(%d,%d)', $vertex->getX(), $vertex->getY());
        }
        print('Bounds: ' . join(', ', $bounds) . PHP_EOL);
    }

    $imageAnnotator->close();
}

Python

Before trying this sample, follow the Python setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Python API reference documentation .

def detect_text(path):
    """Detects text in the file."""
    from google.cloud import vision
    import io
    client = vision.ImageAnnotatorClient()

    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    image = vision.types.Image(content=content)

    response = client.text_detection(image=image)
    texts = response.text_annotations
    print('Texts:')

    for text in texts:
        print('\n"{}"'.format(text.description))

        vertices = (['({},{})'.format(vertex.x, vertex.y)
                    for vertex in text.bounding_poly.vertices])

        print('bounds: {}'.format(','.join(vertices)))

Ruby

Before trying this sample, follow the Ruby setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Ruby API reference documentation .

# image_path = "Path to local image file, eg. './image.png'"

require "google/cloud/vision"

image_annotator = Google::Cloud::Vision::ImageAnnotator.new

response = image_annotator.text_detection(
  image:       image_path,
  max_results: 1 # optional, defaults to 10
)

response.responses.each do |res|
  res.text_annotations.each do |text|
    puts text.description
  end
end

Detect text in a remote image

For your convenience, the Vision API can perform feature detection directly on an image file located in Google Cloud Storage or on the Web without the need to send the contents of the image file in the body of your request.

REST & CMD LINE

Before using any of the request data below, make the following replacements:

  • cloud-storage-image-uri: the path to a valid image file in a Cloud Storage bucket. You must at least have read privileges to the file. Example:
    • gs://cloud-samples-data/vision/ocr/sign.jpg

HTTP method and URL:

POST https://vision.googleapis.com/v1/images:annotate

Request JSON body:

{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "cloud-storage-image-uri"
        }
       },
       "features": [
         {
           "type": "TEXT_DETECTION"
         }
       ]
    }
  ]
}

To send your request, choose one of these options:

curl

Save the request body in a file called request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://vision.googleapis.com/v1/images:annotate

PowerShell

Save the request body in a file called request.json, and execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://vision.googleapis.com/v1/images:annotate" | Select-Object -Expand Content

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format.

A TEXT_DETECTION response includes the detected phrase, its bounding box, and individual words and their bounding boxes.

C#

Before trying this sample, follow the C# setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API C# API reference documentation .

// Specify a Google Cloud Storage uri for the image
// or a publicly accessible HTTP or HTTPS uri.
var image = Image.FromUri(uri);
var client = ImageAnnotatorClient.Create();
var response = client.DetectText(image);
foreach (var annotation in response)
{
    if (annotation.Description != null)
        Console.WriteLine(annotation.Description);
}

Go

Before trying this sample, follow the Go setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Go API reference documentation .


// detectText gets text from the Vision API for an image at the given file path.
func detectTextURI(w io.Writer, file string) error {
	ctx := context.Background()

	client, err := vision.NewImageAnnotatorClient(ctx)
	if err != nil {
		return err
	}

	image := vision.NewImageFromURI(file)
	annotations, err := client.DetectTexts(ctx, image, nil, 10)
	if err != nil {
		return err
	}

	if len(annotations) == 0 {
		fmt.Fprintln(w, "No text found.")
	} else {
		fmt.Fprintln(w, "Text:")
		for _, annotation := range annotations {
			fmt.Fprintf(w, "%q\n", annotation.Description)
		}
	}

	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Java API reference documentation.

public static void detectTextGcs(String gcsPath, PrintStream out) throws Exception, IOException {
  List<AnnotateImageRequest> requests = new ArrayList<>();

  ImageSource imgSource = ImageSource.newBuilder().setGcsImageUri(gcsPath).build();
  Image img = Image.newBuilder().setSource(imgSource).build();
  Feature feat = Feature.newBuilder().setType(Type.TEXT_DETECTION).build();
  AnnotateImageRequest request =
      AnnotateImageRequest.newBuilder().addFeatures(feat).setImage(img).build();
  requests.add(request);

  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    BatchAnnotateImagesResponse response = client.batchAnnotateImages(requests);
    List<AnnotateImageResponse> responses = response.getResponsesList();

    for (AnnotateImageResponse res : responses) {
      if (res.hasError()) {
        out.printf("Error: %s\n", res.getError().getMessage());
        return;
      }

      // For full list of available annotations, see http://g.co/cloud/vision/docs
      for (EntityAnnotation annotation : res.getTextAnnotationsList()) {
        out.printf("Text: %s\n", annotation.getDescription());
        out.printf("Position : %s\n", annotation.getBoundingPoly());
      }
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Node.js API reference documentation .

// Imports the Google Cloud client libraries
const vision = require('@google-cloud/vision');

// Creates a client
const client = new vision.ImageAnnotatorClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const bucketName = 'Bucket where the file resides, e.g. my-bucket';
// const fileName = 'Path to file within bucket, e.g. path/to/image.png';

// Performs text detection on the gcs file
const [result] = await client.textDetection(`gs://${bucketName}/${fileName}`);
const detections = result.textAnnotations;
console.log('Text:');
detections.forEach(text => console.log(text));

PHP

Before trying this sample, follow the PHP setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API PHP API reference documentation .

namespace Google\Cloud\Samples\Vision;

use Google\Cloud\Vision\V1\ImageAnnotatorClient;

// $path = 'gs://path/to/your/image.jpg'

function detect_text_gcs($path)
{
    $imageAnnotator = new ImageAnnotatorClient();

    # annotate the image
    $response = $imageAnnotator->textDetection($path);
    $texts = $response->getTextAnnotations();

    printf('%d texts found:' . PHP_EOL, count($texts));
    foreach ($texts as $text) {
        print($text->getDescription() . PHP_EOL);

        # get bounds
        $vertices = $text->getBoundingPoly()->getVertices();
        $bounds = [];
        foreach ($vertices as $vertex) {
            $bounds[] = sprintf('(%d,%d)', $vertex->getX(), $vertex->getY());
        }
        print('Bounds: ' . join(', ', $bounds) . PHP_EOL);
    }

    if ($error = $response->getError()) {
        print('API Error: ' . $error->getMessage() . PHP_EOL);
    }

    $imageAnnotator->close();
}

Python

Before trying this sample, follow the Python setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Python API reference documentation .

def detect_text_uri(uri):
    """Detects text in the file located in Google Cloud Storage or on the Web.
    """
    from google.cloud import vision
    client = vision.ImageAnnotatorClient()
    image = vision.types.Image()
    image.source.image_uri = uri

    response = client.text_detection(image=image)
    texts = response.text_annotations
    print('Texts:')

    for text in texts:
        print('\n"{}"'.format(text.description))

        vertices = (['({},{})'.format(vertex.x, vertex.y)
                    for vertex in text.bounding_poly.vertices])

        print('bounds: {}'.format(','.join(vertices)))

Ruby

Before trying this sample, follow the Ruby setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Ruby API reference documentation .

# image_path = "Google Cloud Storage URI, eg. 'gs://my-bucket/image.png'"

require "google/cloud/vision"

image_annotator = Google::Cloud::Vision::ImageAnnotator.new

response = image_annotator.text_detection(
  image:       image_path,
  max_results: 1 # optional, defaults to 10
)

response.responses.each do |res|
  res.text_annotations.each do |text|
    puts text.description
  end
end

gcloud command

To perform text detection, use the gcloud ml vision detect-text command as shown in the following example:

gcloud ml vision detect-text gs://cloud-samples-data/vision/ocr/sign.jpg

Specify the language (optional)

Both types of OCR requests support one or more languageHints that specify the language of any text in the image. However, in most cases, an empty value yields the best results since it enables automatic language detection. For languages based on the Latin alphabet, setting languageHints is not needed. In rare cases, when the language of the text in the image is known, setting a hint helps get better results (although it can be a significant hindrance if the hint is wrong). Text detection returns an error if one or more of the specified languages is not one of the supported languages.

If you choose to provide a language hint, modify the body of your request (request.json file) to provide the string of one of the supported languages in the imageContext.languageHints field as shown below:

{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "image-url"
        }
      },
      "features": [
        {
          "type": "DOCUMENT_TEXT_DETECTION"
        }
      ],
      "imageContext": {
        "languageHints": ["en-t-i0-handwrit"]
      }
    }
  ]
}
How do language hints work?

The languageHint format follows the BCP47 language code formatting guidelines. The BCP47 specified format is as follows:

language ["-" script] ["-" region] *("-" variant) *("-" extension) ["-" privateuse].

For example, the language hint "en-t-i0-handwrit" specifies English language (en), transform extension singleton (t), input method engine transform extension code (i0), and handwriting transform code (handwrit). This roughly says the language is "English transformed from handwriting." You do not need to specify a script code because Latn is implied by the "en" language.

Multi-regional support

You can now specify continent-level data storage and OCR processing. The following regions are currently supported:

  • us: USA country only
  • eu: The European Union

Locations

Cloud Vision offers you some control over where the resources for your project are stored and processed. In particular, you can configure Cloud Vision to store and process your data only in the European Union.

By default Cloud Vision stores and processes resources in a Global location, which means that Cloud Vision doesn't guarantee that your resources will remain within a particular location or region. If you choose the European Union location, Google will store your data and process it only in the European Union. You and your users can access the data from any location.

Setting the location using the API

Cloud Vision supports both a global API endpoint (vision.googleapis.com) and a European Union endpoint (eu-vision.googleapis.com). To store and process your data in the European Union only, use the URI eu-vision.googleapis.com in place of vision.googleapis.com for your REST API calls:

  • https://eu-vision.googleapis.com/v1/images:annotate
  • https://eu-vision.googleapis.com/v1/images:asyncBatchAnnotate
  • https://eu-vision.googleapis.com/v1/files:annotate
  • https://eu-vision.googleapis.com/v1/files:asyncBatchAnnotate

Setting the location using the client libraries

The Vision API client libraries accesses the global API endpoint (vision.googleapis.com) by default. To store and process your data in the European Union only, you need to explicitly set the endpoint (eu-vision.googleapis.com). The code samples below show how to configure this setting.

REST & CMD LINE

Before using any of the request data below, make the following replacements:

  • cloud-storage-image-uri: the path to a valid image file in a Cloud Storage bucket. You must at least have read privileges to the file. Example:
    • gs://cloud-samples-data/vision/ocr/sign.jpg

HTTP method and URL:

POST https://eu-vision.googleapis.com/v1/images:annotate

Request JSON body:

{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "cloud-storage-image-uri"
        }
       },
       "features": [
         {
           "type": "TEXT_DETECTION"
         }
       ]
    }
  ]
}

To send your request, choose one of these options:

curl

Save the request body in a file called request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://eu-vision.googleapis.com/v1/images:annotate

PowerShell

Save the request body in a file called request.json, and execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://eu-vision.googleapis.com/v1/images:annotate" | Select-Object -Expand Content

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format.

A TEXT_DETECTION response includes the detected phrase, its bounding box, and individual words and their bounding boxes.

C#

// Instantiate a client connected to the 'eu' location.
var client = new ImageAnnotatorClientBuilder
{
    Endpoint = new ServiceEndpoint("eu-vision.googleapis.com")
}.Build();

Java

Before trying this sample, follow the Java setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Java API reference documentation.

ImageAnnotatorSettings settings =
    ImageAnnotatorSettings.newBuilder().setEndpoint("eu-vision.googleapis.com:443").build();

// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests. After completing all of your requests, call
// the "close" method on the client to safely clean up any remaining background resources.
ImageAnnotatorClient client = ImageAnnotatorClient.create(settings);

Node.js

// Imports the Google Cloud client library
const vision = require('@google-cloud/vision');

// Specifies the location of the api endpoint
const clientOptions = {apiEndpoint: 'eu-vision.googleapis.com'};

// Creates a client
const client = new vision.ImageAnnotatorClient(clientOptions);

PHP

# set endpoint
$options = ['api_endpoint' => $endpoint];

$client = new ImageAnnotatorClient($options);

Python

from google.cloud import vision

client_options = {'api_endpoint': 'eu-vision.googleapis.com'}

client = vision.ImageAnnotatorClient(client_options=client_options)

Ruby

require "google/cloud/vision"

# Specify the service address at client construction.
image_annotator = Google::Cloud::Vision::ImageAnnotator.new service_address: "eu-vision.googleapis.com"

response = image_annotator.text_detection(
  image:       "gs://cloud-samples-data/vision/text/screen.jpg",
  max_results: 1
)

response.responses.each do |res|
  res.text_annotations.each do |text|
    puts text.description
  end
end

Try it

Try text detection and document text detection below. You can use the image specified already (gs://cloud-samples-data/vision/ocr/sign.jpg) by clicking Execute, or you can specify your own image in its place.

To try document text detection, update the value of type to DOCUMENT_TEXT_DETECTION.

Road sign image

Was this page helpful? Let us know how we did:

Send feedback about...

Cloud Vision API Documentation
Need help? Visit our support page.