Detect handwriting in images

Handwriting detection with Optical Character Recognition (OCR)

The Vision API can detect and extract text from images:

  • DOCUMENT_TEXT_DETECTION extracts text from an image (or file); the response is optimized for dense text and documents. The JSON includes page, block, paragraph, word, and break information.

    One specific use of DOCUMENT_TEXT_DETECTION is to detect handwriting in an image.

    handwriting image

Document text detection requests

Set up your GCP project and authentication

Detect document text in a local image

The Vision API can perform feature detection on a local image file by sending the contents of the image file as a base64 encoded string in the body of your request.

Command-line

To make a handwriting detection request using curl from the Linux or MacOS command line, make a POST request to the https://vision.googleapis.com/v1/images:annotate endpoint and specify DOCUMENT_TEXT_DETECTION as the value of features.type, as shown in the following example:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json" \
https://vision.googleapis.com/v1/images:annotate -d "{
  'requests': [
    {
      'image': {
        'content': '/9j/7QBEUGhvdG9zaG9...base64-encoded-image-content...fXNWzvDEeYxxxzj/Coa6Bax//Z'
      },
      'features': [
        {
          'type': 'DOCUMENT_TEXT_DETECTION'
        }
      ]
    }
  ]
}"

See the AnnotateImageRequest reference documentation for more information on configuring the request body.

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format.

C#

Before trying this sample, follow the C# setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API C# API reference documentation .

// Load an image from a local file.
var image = Image.FromFile(filePath);
var client = ImageAnnotatorClient.Create();
var response = client.DetectDocumentText(image);
foreach (var page in response.Pages)
{
    foreach (var block in page.Blocks)
    {
        foreach (var paragraph in block.Paragraphs)
        {
            Console.WriteLine(string.Join("\n", paragraph.Words));
        }
    }
}

Go

Before trying this sample, follow the Go setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Go API reference documentation .

// detectDocumentText gets the full document text from the Vision API for an image at the given file path.
func detectDocumentText(w io.Writer, file string) error {
	ctx := context.Background()

	client, err := vision.NewImageAnnotatorClient(ctx)
	if err != nil {
		return err
	}

	f, err := os.Open(file)
	if err != nil {
		return err
	}
	defer f.Close()

	image, err := vision.NewImageFromReader(f)
	if err != nil {
		return err
	}
	annotation, err := client.DetectDocumentText(ctx, image, nil)
	if err != nil {
		return err
	}

	if annotation == nil {
		fmt.Fprintln(w, "No text found.")
	} else {
		fmt.Fprintln(w, "Document Text:")
		fmt.Fprintf(w, "%q\n", annotation.Text)

		fmt.Fprintln(w, "Pages:")
		for _, page := range annotation.Pages {
			fmt.Fprintf(w, "\tConfidence: %f, Width: %d, Height: %d\n", page.Confidence, page.Width, page.Height)
			fmt.Fprintln(w, "\tBlocks:")
			for _, block := range page.Blocks {
				fmt.Fprintf(w, "\t\tConfidence: %f, Block type: %v\n", block.Confidence, block.BlockType)
				fmt.Fprintln(w, "\t\tParagraphs:")
				for _, paragraph := range block.Paragraphs {
					fmt.Fprintf(w, "\t\t\tConfidence: %f", paragraph.Confidence)
					fmt.Fprintln(w, "\t\t\tWords:")
					for _, word := range paragraph.Words {
						symbols := make([]string, len(word.Symbols))
						for i, s := range word.Symbols {
							symbols[i] = s.Text
						}
						wordText := strings.Join(symbols, "")
						fmt.Fprintf(w, "\t\t\t\tConfidence: %f, Symbols: %s\n", word.Confidence, wordText)
					}
				}
			}
		}
	}

	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Java API reference documentation .

public static void detectDocumentText(String filePath, PrintStream out) throws Exception,
     IOException {
  List<AnnotateImageRequest> requests = new ArrayList<>();

  ByteString imgBytes = ByteString.readFrom(new FileInputStream(filePath));

  Image img = Image.newBuilder().setContent(imgBytes).build();
  Feature feat = Feature.newBuilder().setType(Type.DOCUMENT_TEXT_DETECTION).build();
  AnnotateImageRequest request =
      AnnotateImageRequest.newBuilder().addFeatures(feat).setImage(img).build();
  requests.add(request);

  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    BatchAnnotateImagesResponse response = client.batchAnnotateImages(requests);
    List<AnnotateImageResponse> responses = response.getResponsesList();
    client.close();

    for (AnnotateImageResponse res : responses) {
      if (res.hasError()) {
        out.printf("Error: %s\n", res.getError().getMessage());
        return;
      }

      // For full list of available annotations, see http://g.co/cloud/vision/docs
      TextAnnotation annotation = res.getFullTextAnnotation();
      for (Page page: annotation.getPagesList()) {
        String pageText = "";
        for (Block block : page.getBlocksList()) {
          String blockText = "";
          for (Paragraph para : block.getParagraphsList()) {
            String paraText = "";
            for (Word word: para.getWordsList()) {
              String wordText = "";
              for (Symbol symbol: word.getSymbolsList()) {
                wordText = wordText + symbol.getText();
                out.format("Symbol text: %s (confidence: %f)\n", symbol.getText(),
                    symbol.getConfidence());
              }
              out.format("Word text: %s (confidence: %f)\n\n", wordText, word.getConfidence());
              paraText = String.format("%s %s", paraText, wordText);
            }
            // Output Example using Paragraph:
            out.println("\nParagraph: \n" + paraText);
            out.format("Paragraph Confidence: %f\n", para.getConfidence());
            blockText = blockText + paraText;
          }
          pageText = pageText + blockText;
        }
      }
      out.println("\nComplete annotation:");
      out.println(annotation.getText());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Node.js API reference documentation .

// Imports the Google Cloud client library
const vision = require('@google-cloud/vision');

// Creates a client
const client = new vision.ImageAnnotatorClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const fileName = 'Local image file, e.g. /path/to/image.png';

// Read a local image as a text document
const [result] = await client.documentTextDetection(fileName);
const fullTextAnnotation = result.fullTextAnnotation;
console.log(`Full text: ${fullTextAnnotation.text}`);
fullTextAnnotation.pages.forEach(page => {
  page.blocks.forEach(block => {
    console.log(`Block confidence: ${block.confidence}`);
    block.paragraphs.forEach(paragraph => {
      console.log(`Paragraph confidence: ${paragraph.confidence}`);
      paragraph.words.forEach(word => {
        const wordText = word.symbols.map(s => s.text).join('');
        console.log(`Word text: ${wordText}`);
        console.log(`Word confidence: ${word.confidence}`);
        word.symbols.forEach(symbol => {
          console.log(`Symbol text: ${symbol.text}`);
          console.log(`Symbol confidence: ${symbol.confidence}`);
        });
      });
    });
  });
});

PHP

Before trying this sample, follow the PHP setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API PHP API reference documentation .

namespace Google\Cloud\Samples\Vision;

use Google\Cloud\Vision\V1\ImageAnnotatorClient;

// $path = 'path/to/your/image.jpg'

function detect_document_text($path)
{
    $imageAnnotator = new ImageAnnotatorClient();

    # annotate the image
    $image = file_get_contents($path);
    $response = $imageAnnotator->documentTextDetection($image);
    $annotation = $response->getFullTextAnnotation();

    # print out detailed and structured information about document text
    if ($annotation) {
        foreach ($annotation->getPages() as $page) {
            foreach ($page->getBlocks() as $block) {
                $block_text = '';
                foreach ($block->getParagraphs() as $paragraph) {
                    foreach ($paragraph->getWords() as $word) {
                        foreach ($word->getSymbols() as $symbol) {
                            $block_text .= $symbol->getText();
                        }
                        $block_text .= ' ';
                    }
                    $block_text .= "\n";
                }
                printf('Block content: %s', $block_text);
                printf('Block confidence: %f' . PHP_EOL,
                    $block->getConfidence());

                # get bounds
                $vertices = $block->getBoundingBox()->getVertices();
                $bounds = [];
                foreach ($vertices as $vertex) {
                    $bounds[] = sprintf('(%d,%d)', $vertex->getX(),
                        $vertex->getY());
                }
                print('Bounds: ' . join(', ',$bounds) . PHP_EOL);
                print(PHP_EOL);
            }
        }
    } else {
        print('No text found' . PHP_EOL);
    }

    $imageAnnotator->close();
}

Python

Before trying this sample, follow the Python setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Python API reference documentation .

def detect_document(path):
    """Detects document features in an image."""
    from google.cloud import vision
    client = vision.ImageAnnotatorClient()

    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    image = vision.types.Image(content=content)

    response = client.document_text_detection(image=image)

    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            print('\nBlock confidence: {}\n'.format(block.confidence))

            for paragraph in block.paragraphs:
                print('Paragraph confidence: {}'.format(
                    paragraph.confidence))

                for word in paragraph.words:
                    word_text = ''.join([
                        symbol.text for symbol in word.symbols
                    ])
                    print('Word text: {} (confidence: {})'.format(
                        word_text, word.confidence))

                    for symbol in word.symbols:
                        print('\tSymbol: {} (confidence: {})'.format(
                            symbol.text, symbol.confidence))

Ruby

Before trying this sample, follow the Ruby setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Ruby API reference documentation .

# image_path = "Path to local image file, eg. './image.png'"

require "google/cloud/vision"

image_annotator = Google::Cloud::Vision::ImageAnnotator.new

response = image_annotator.document_text_detection image: image_path

text = ""
response.responses.each do |res|
  res.text_annotations.each do |annotation|
    text << annotation.description
  end
end

puts text

Detect document text in a remote image

For your convenience, the Vision API can perform feature detection directly on an image file located in Google Cloud Storage or on the Web without the need to send the contents of the image file in the body of your request.

Command-line

To make a handwriting detection request using curl from the Linux or MacOS command line, make a POST request to the https://vision.googleapis.com/v1/images:annotate endpoint and specify DOCUMENT_TEXT_DETECTION as the value of features.type, as shown in the following example:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
https://vision.googleapis.com/v1/images:annotate -d "{
  'requests': [
    {
      'image': {
        'source': {
          'imageUri': 'gs://vision-api-handwriting-ocr-bucket/handwriting_image.png'
        }
      },
      'features': [
        {
          'type': 'DOCUMENT_TEXT_DETECTION'
        }
      ]
    }
  ]
}"

Images can be passed in one of three ways: as a base64-encoded string (shown above); as a Google Cloud Storage URI; or as a web URI. See Making requests for more information.

See the AnnotateImageRequest reference documentation for more information on configuring the request body.

C#

Before trying this sample, follow the C# setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API C# API reference documentation .

// Specify a Google Cloud Storage uri for the image
// or a publicly accessible HTTP or HTTPS uri.
var image = Image.FromUri(uri);
var client = ImageAnnotatorClient.Create();
var response = client.DetectDocumentText(image);
foreach (var page in response.Pages)
{
    foreach (var block in page.Blocks)
    {
        foreach (var paragraph in block.Paragraphs)
        {
            Console.WriteLine(string.Join("\n", paragraph.Words));
        }
    }
}

Go

Before trying this sample, follow the Go setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Go API reference documentation .

// detectDocumentText gets the full document text from the Vision API for an image at the given file path.
func detectDocumentTextURI(w io.Writer, file string) error {
	ctx := context.Background()

	client, err := vision.NewImageAnnotatorClient(ctx)
	if err != nil {
		return err
	}

	image := vision.NewImageFromURI(file)
	annotation, err := client.DetectDocumentText(ctx, image, nil)
	if err != nil {
		return err
	}

	if annotation == nil {
		fmt.Fprintln(w, "No text found.")
	} else {
		fmt.Fprintln(w, "Document Text:")
		fmt.Fprintf(w, "%q\n", annotation.Text)

		fmt.Fprintln(w, "Pages:")
		for _, page := range annotation.Pages {
			fmt.Fprintf(w, "\tConfidence: %f, Width: %d, Height: %d\n", page.Confidence, page.Width, page.Height)
			fmt.Fprintln(w, "\tBlocks:")
			for _, block := range page.Blocks {
				fmt.Fprintf(w, "\t\tConfidence: %f, Block type: %v\n", block.Confidence, block.BlockType)
				fmt.Fprintln(w, "\t\tParagraphs:")
				for _, paragraph := range block.Paragraphs {
					fmt.Fprintf(w, "\t\t\tConfidence: %f", paragraph.Confidence)
					fmt.Fprintln(w, "\t\t\tWords:")
					for _, word := range paragraph.Words {
						symbols := make([]string, len(word.Symbols))
						for i, s := range word.Symbols {
							symbols[i] = s.Text
						}
						wordText := strings.Join(symbols, "")
						fmt.Fprintf(w, "\t\t\t\tConfidence: %f, Symbols: %s\n", word.Confidence, wordText)
					}
				}
			}
		}
	}

	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Java API reference documentation .

public static void detectDocumentTextGcs(String gcsPath, PrintStream out) throws Exception,
    IOException {
  List<AnnotateImageRequest> requests = new ArrayList<>();

  ImageSource imgSource = ImageSource.newBuilder().setGcsImageUri(gcsPath).build();
  Image img = Image.newBuilder().setSource(imgSource).build();
  Feature feat = Feature.newBuilder().setType(Type.DOCUMENT_TEXT_DETECTION).build();
  AnnotateImageRequest request =
      AnnotateImageRequest.newBuilder().addFeatures(feat).setImage(img).build();
  requests.add(request);

  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    BatchAnnotateImagesResponse response = client.batchAnnotateImages(requests);
    List<AnnotateImageResponse> responses = response.getResponsesList();
    client.close();

    for (AnnotateImageResponse res : responses) {
      if (res.hasError()) {
        out.printf("Error: %s\n", res.getError().getMessage());
        return;
      }
      // For full list of available annotations, see http://g.co/cloud/vision/docs
      TextAnnotation annotation = res.getFullTextAnnotation();
      for (Page page: annotation.getPagesList()) {
        String pageText = "";
        for (Block block : page.getBlocksList()) {
          String blockText = "";
          for (Paragraph para : block.getParagraphsList()) {
            String paraText = "";
            for (Word word: para.getWordsList()) {
              String wordText = "";
              for (Symbol symbol: word.getSymbolsList()) {
                wordText = wordText + symbol.getText();
                out.format("Symbol text: %s (confidence: %f)\n", symbol.getText(),
                    symbol.getConfidence());
              }
              out.format("Word text: %s (confidence: %f)\n\n", wordText, word.getConfidence());
              paraText = String.format("%s %s", paraText, wordText);
            }
            // Output Example using Paragraph:
            out.println("\nParagraph: \n" + paraText);
            out.format("Paragraph Confidence: %f\n", para.getConfidence());
            blockText = blockText + paraText;
          }
          pageText = pageText + blockText;
        }
      }
      out.println("\nComplete annotation:");
      out.println(annotation.getText());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Node.js API reference documentation .

// Imports the Google Cloud client libraries
const vision = require('@google-cloud/vision');

// Creates a client
const client = new vision.ImageAnnotatorClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const bucketName = 'Bucket where the file resides, e.g. my-bucket';
// const fileName = 'Path to file within bucket, e.g. path/to/image.png';

// Read a remote image as a text document
const [result] = await client.documentTextDetection(
  `gs://${bucketName}/${fileName}`
);
const fullTextAnnotation = result.fullTextAnnotation;
console.log(fullTextAnnotation.text);

PHP

Before trying this sample, follow the PHP setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API PHP API reference documentation .

namespace Google\Cloud\Samples\Vision;

use Google\Cloud\Vision\V1\ImageAnnotatorClient;

// $path = 'gs://path/to/your/image.jpg'

function detect_document_text_gcs($path)
{
    $imageAnnotator = new ImageAnnotatorClient();

    # annotate the image
    $response = $imageAnnotator->documentTextDetection($path);
    $annotation = $response->getFullTextAnnotation();

    # print out detailed and structured information about document text
    if ($annotation) {
        foreach ($annotation->getPages() as $page) {
            foreach ($page->getBlocks() as $block) {
                $block_text = '';
                foreach ($block->getParagraphs() as $paragraph) {
                    foreach ($paragraph->getWords() as $word) {
                        foreach ($word->getSymbols() as $symbol) {
                            $block_text .= $symbol->getText();
                        }
                        $block_text .= ' ';
                    }
                    $block_text .= "\n";
                }
                printf('Block content: %s', $block_text);
                printf('Block confidence: %f' . PHP_EOL,
                    $block->getConfidence());

                # get bounds
                $vertices = $block->getBoundingBox()->getVertices();
                $bounds = [];
                foreach ($vertices as $vertex) {
                    $bounds[] = sprintf('(%d,%d)', $vertex->getX(),
                        $vertex->getY());
                }
                print('Bounds: ' . join(', ',$bounds) . PHP_EOL);

                print(PHP_EOL);
            }
        }
    } else {
        print('No text found' . PHP_EOL);
    }

    $imageAnnotator->close();
}

Python

Before trying this sample, follow the Python setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Python API reference documentation .

def detect_document_uri(uri):
    """Detects document features in the file located in Google Cloud
    Storage."""
    from google.cloud import vision
    client = vision.ImageAnnotatorClient()
    image = vision.types.Image()
    image.source.image_uri = uri

    response = client.document_text_detection(image=image)

    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            print('\nBlock confidence: {}\n'.format(block.confidence))

            for paragraph in block.paragraphs:
                print('Paragraph confidence: {}'.format(
                    paragraph.confidence))

                for word in paragraph.words:
                    word_text = ''.join([
                        symbol.text for symbol in word.symbols
                    ])
                    print('Word text: {} (confidence: {})'.format(
                        word_text, word.confidence))

                    for symbol in word.symbols:
                        print('\tSymbol: {} (confidence: {})'.format(
                            symbol.text, symbol.confidence))

Ruby

Before trying this sample, follow the Ruby setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Ruby API reference documentation .

# image_path = "Google Cloud Storage URI, eg. 'gs://my-bucket/image.png'"

require "google/cloud/vision"

image_annotator = Google::Cloud::Vision::ImageAnnotator.new

response = image_annotator.document_text_detection image: image_path

text = ""
response.responses.each do |res|
  res.text_annotations.each do |annotation|
    text << annotation.description
  end
end

puts text

PowerShell

To make a handwriting detection request using Windows PowerShell, make a POST request to the https://vision.googleapis.com/v1/images:annotate endpoint and specify DOCUMENT_TEXT_DETECTION as the value of features.type, as shown in the following example:

$cred = gcloud auth application-default print-access-token
$headers = @{ Authorization = "Bearer $cred" }

Invoke-WebRequest `
  -Method Post `
  -Headers $headers `
  -ContentType: "application/json; charset=utf-8" `
  -Body "{
      'requests': [
        {
          'image': {
            'source': {
              'imageUri': 'gs://vision-api-handwriting-ocr-bucket/handwriting_image.png'
            }
          },
          'features': [
            {
              'type': 'DOCUMENT_TEXT_DETECTION'
            }
          ]
        }
      ]
    }" `
  -Uri "https://vision.googleapis.com/v1/images:annotate" | Select-Object -Expand Content

Images can be passed in one of three ways: as a base64-encoded string; as a Google Cloud Storage URI; or as a publicly-accessible HTTPS or HTTP URL. See Making requests for more information.

See the AnnotateImageRequest reference documentation for more information on configuring the request body.

GCLOUD COMMAND

To perform handwriting detection, use the gcloud ml vision detect-document command as shown in the following example:

gcloud ml vision detect-document gs://vision-api-handwriting-ocr-bucket/handwriting_image.png

Specify the language (optional)

Both types of OCR requests support one or more languageHints that specify the language of any text in the image. However, in most cases, an empty value yields the best results since it enables automatic language detection. For languages based on the Latin alphabet, setting languageHints is not needed. In rare cases, when the language of the text in the image is known, setting a hint helps get better results (although it can be a significant hindrance if the hint is wrong). Text detection returns an error if one or more of the specified languages is not one of the supported languages.

If you choose to provide a language hint, provide the string of one of the supported languages in the imageContext.languageHints field as shown below:

curl -X POST \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    -H "Content-Type: application/json" \
    "https://vision.googleapis.com/v1/files:asyncBatchAnnotate" -d "{
      "requests": [
        {
          "image": {
            "source": {
              "imageUri": "image-url"
            }
          },
          "features": [
            {
              "type": "DOCUMENT_TEXT_DETECTION"
            }
          ],
          "imageContext": {
            "languageHints": ["en-t-i0-handwrit"]
          }
        }
      ]
    }"
How do language hints work?

The languageHint format follows the BCP47 language code formatting guidelines. The BCP47 specified format is as follows:

language ["-" script] ["-" region] *("-" variant) *("-" extension) ["-" privateuse].

For example, the language hint "en-t-i0-handwrit" specifies English language (en), transform extension singleton (t), input method engine transform extension code (i0), and handwriting transform code (handwrit). This roughly says the language is "English transformed from handwriting." You do not need to specify a script code because Latn is implied by the "en" language.

Try it

Try text detection and document text detection below. You can use the image specified already (gs://vision-api-handwriting-ocr-bucket/handwriting_image.png) by clicking Execute, or you can specify your own image in its place.

handwriting image

Was this page helpful? Let us know how we did:

Send feedback about...

Cloud Vision API
Need help? Visit our support page.