Detect handwriting in images

Handwriting detection with Optical Character Recognition (OCR)

The Vision API can detect and extract text from images:

  • DOCUMENT_TEXT_DETECTION extracts text from an image (or file); the response is optimized for dense text and documents. The JSON includes page, block, paragraph, word, and break information.

    One specific use of DOCUMENT_TEXT_DETECTION is to detect handwriting in an image.

    handwriting image

Document text detection requests

Set up your GCP project and authentication

Detect document text in a local image

The Vision API can perform feature detection on a local image file by sending the contents of the image file as a base64 encoded string in the body of your request.

REST & CMD LINE

Before using any of the request data below, make the following replacements:

  • base64-encoded-image: the base64 representation (ASCII string) of your binary image data. This string should look similar to the following string:
    • /9j/4QAYRXhpZgAA...9tAVx/zDQDlGxn//2Q==
    Visit the base64 encode topic for more information.

HTTP method and URL:

POST https://vision.googleapis.com/v1/images:annotate

Request JSON body:

{
  "requests": [
    {
      "image": {
        "content": "base64-encoded-image"
      },
      "features": [
        {
          "type": "DOCUMENT_TEXT_DETECTION"
        }
      ]
    }
  ]
}

To send your request, choose one of these options:

curl

Save the request body in a file called request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://vision.googleapis.com/v1/images:annotate

PowerShell

Save the request body in a file called request.json, and execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ Authorization = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://vision.googleapis.com/v1/images:annotate" | Select-Object -Expand Content

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format.

C#

Before trying this sample, follow the C# setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API C# API reference documentation .

// Load an image from a local file.
var image = Image.FromFile(filePath);
var client = ImageAnnotatorClient.Create();
var response = client.DetectDocumentText(image);
foreach (var page in response.Pages)
{
    foreach (var block in page.Blocks)
    {
        foreach (var paragraph in block.Paragraphs)
        {
            Console.WriteLine(string.Join("\n", paragraph.Words));
        }
    }
}

Go

Before trying this sample, follow the Go setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Go API reference documentation .


// detectDocumentText gets the full document text from the Vision API for an image at the given file path.
func detectDocumentText(w io.Writer, file string) error {
	ctx := context.Background()

	client, err := vision.NewImageAnnotatorClient(ctx)
	if err != nil {
		return err
	}

	f, err := os.Open(file)
	if err != nil {
		return err
	}
	defer f.Close()

	image, err := vision.NewImageFromReader(f)
	if err != nil {
		return err
	}
	annotation, err := client.DetectDocumentText(ctx, image, nil)
	if err != nil {
		return err
	}

	if annotation == nil {
		fmt.Fprintln(w, "No text found.")
	} else {
		fmt.Fprintln(w, "Document Text:")
		fmt.Fprintf(w, "%q\n", annotation.Text)

		fmt.Fprintln(w, "Pages:")
		for _, page := range annotation.Pages {
			fmt.Fprintf(w, "\tConfidence: %f, Width: %d, Height: %d\n", page.Confidence, page.Width, page.Height)
			fmt.Fprintln(w, "\tBlocks:")
			for _, block := range page.Blocks {
				fmt.Fprintf(w, "\t\tConfidence: %f, Block type: %v\n", block.Confidence, block.BlockType)
				fmt.Fprintln(w, "\t\tParagraphs:")
				for _, paragraph := range block.Paragraphs {
					fmt.Fprintf(w, "\t\t\tConfidence: %f", paragraph.Confidence)
					fmt.Fprintln(w, "\t\t\tWords:")
					for _, word := range paragraph.Words {
						symbols := make([]string, len(word.Symbols))
						for i, s := range word.Symbols {
							symbols[i] = s.Text
						}
						wordText := strings.Join(symbols, "")
						fmt.Fprintf(w, "\t\t\t\tConfidence: %f, Symbols: %s\n", word.Confidence, wordText)
					}
				}
			}
		}
	}

	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Java API reference documentation.

public static void detectDocumentText(String filePath, PrintStream out) throws Exception,
     IOException {
  List<AnnotateImageRequest> requests = new ArrayList<>();

  ByteString imgBytes = ByteString.readFrom(new FileInputStream(filePath));

  Image img = Image.newBuilder().setContent(imgBytes).build();
  Feature feat = Feature.newBuilder().setType(Type.DOCUMENT_TEXT_DETECTION).build();
  AnnotateImageRequest request =
      AnnotateImageRequest.newBuilder().addFeatures(feat).setImage(img).build();
  requests.add(request);

  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    BatchAnnotateImagesResponse response = client.batchAnnotateImages(requests);
    List<AnnotateImageResponse> responses = response.getResponsesList();
    client.close();

    for (AnnotateImageResponse res : responses) {
      if (res.hasError()) {
        out.printf("Error: %s\n", res.getError().getMessage());
        return;
      }

      // For full list of available annotations, see http://g.co/cloud/vision/docs
      TextAnnotation annotation = res.getFullTextAnnotation();
      for (Page page: annotation.getPagesList()) {
        String pageText = "";
        for (Block block : page.getBlocksList()) {
          String blockText = "";
          for (Paragraph para : block.getParagraphsList()) {
            String paraText = "";
            for (Word word: para.getWordsList()) {
              String wordText = "";
              for (Symbol symbol: word.getSymbolsList()) {
                wordText = wordText + symbol.getText();
                out.format("Symbol text: %s (confidence: %f)\n", symbol.getText(),
                    symbol.getConfidence());
              }
              out.format("Word text: %s (confidence: %f)\n\n", wordText, word.getConfidence());
              paraText = String.format("%s %s", paraText, wordText);
            }
            // Output Example using Paragraph:
            out.println("\nParagraph: \n" + paraText);
            out.format("Paragraph Confidence: %f\n", para.getConfidence());
            blockText = blockText + paraText;
          }
          pageText = pageText + blockText;
        }
      }
      out.println("\nComplete annotation:");
      out.println(annotation.getText());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Node.js API reference documentation .


// Imports the Google Cloud client library
const vision = require('@google-cloud/vision');

// Creates a client
const client = new vision.ImageAnnotatorClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const fileName = 'Local image file, e.g. /path/to/image.png';

// Read a local image as a text document
const [result] = await client.documentTextDetection(fileName);
const fullTextAnnotation = result.fullTextAnnotation;
console.log(`Full text: ${fullTextAnnotation.text}`);
fullTextAnnotation.pages.forEach(page => {
  page.blocks.forEach(block => {
    console.log(`Block confidence: ${block.confidence}`);
    block.paragraphs.forEach(paragraph => {
      console.log(`Paragraph confidence: ${paragraph.confidence}`);
      paragraph.words.forEach(word => {
        const wordText = word.symbols.map(s => s.text).join('');
        console.log(`Word text: ${wordText}`);
        console.log(`Word confidence: ${word.confidence}`);
        word.symbols.forEach(symbol => {
          console.log(`Symbol text: ${symbol.text}`);
          console.log(`Symbol confidence: ${symbol.confidence}`);
        });
      });
    });
  });
});

PHP

Before trying this sample, follow the PHP setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API PHP API reference documentation .

namespace Google\Cloud\Samples\Vision;

use Google\Cloud\Vision\V1\ImageAnnotatorClient;

// $path = 'path/to/your/image.jpg'

function detect_document_text($path)
{
    $imageAnnotator = new ImageAnnotatorClient();

    # annotate the image
    $image = file_get_contents($path);
    $response = $imageAnnotator->documentTextDetection($image);
    $annotation = $response->getFullTextAnnotation();

    # print out detailed and structured information about document text
    if ($annotation) {
        foreach ($annotation->getPages() as $page) {
            foreach ($page->getBlocks() as $block) {
                $block_text = '';
                foreach ($block->getParagraphs() as $paragraph) {
                    foreach ($paragraph->getWords() as $word) {
                        foreach ($word->getSymbols() as $symbol) {
                            $block_text .= $symbol->getText();
                        }
                        $block_text .= ' ';
                    }
                    $block_text .= "\n";
                }
                printf('Block content: %s', $block_text);
                printf('Block confidence: %f' . PHP_EOL,
                    $block->getConfidence());

                # get bounds
                $vertices = $block->getBoundingBox()->getVertices();
                $bounds = [];
                foreach ($vertices as $vertex) {
                    $bounds[] = sprintf('(%d,%d)', $vertex->getX(),
                        $vertex->getY());
                }
                print('Bounds: ' . join(', ',$bounds) . PHP_EOL);
                print(PHP_EOL);
            }
        }
    } else {
        print('No text found' . PHP_EOL);
    }

    $imageAnnotator->close();
}

Python

Before trying this sample, follow the Python setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Python API reference documentation .

def detect_document(path):
    """Detects document features in an image."""
    from google.cloud import vision
    import io
    client = vision.ImageAnnotatorClient()

    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    image = vision.types.Image(content=content)

    response = client.document_text_detection(image=image)

    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            print('\nBlock confidence: {}\n'.format(block.confidence))

            for paragraph in block.paragraphs:
                print('Paragraph confidence: {}'.format(
                    paragraph.confidence))

                for word in paragraph.words:
                    word_text = ''.join([
                        symbol.text for symbol in word.symbols
                    ])
                    print('Word text: {} (confidence: {})'.format(
                        word_text, word.confidence))

                    for symbol in word.symbols:
                        print('\tSymbol: {} (confidence: {})'.format(
                            symbol.text, symbol.confidence))

Ruby

Before trying this sample, follow the Ruby setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Ruby API reference documentation .

# image_path = "Path to local image file, eg. './image.png'"

require "google/cloud/vision"

image_annotator = Google::Cloud::Vision::ImageAnnotator.new

response = image_annotator.document_text_detection image: image_path

text = ""
response.responses.each do |res|
  res.text_annotations.each do |annotation|
    text << annotation.description
  end
end

puts text

Detect document text in a remote image

For your convenience, the Vision API can perform feature detection directly on an image file located in Google Cloud Storage or on the Web without the need to send the contents of the image file in the body of your request.

REST & CMD LINE

Before using any of the request data below, make the following replacements:

  • cloud-storage-image-uri: the path to a valid image file in a Google Cloud Storage bucket. You must at least have read priveleges to the file. Example:
    • gs://vision-api-handwriting-ocr-bucket/handwriting_image.png

HTTP method and URL:

POST https://vision.googleapis.com/v1/images:annotate

Request JSON body:

{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "cloud-storage-image-uri"
        }
       },
       "features": [
         {
           "type": "DOCUMENT_TEXT_DETECTION"
         }
       ]
    }
  ]
}

To send your request, choose one of these options:

curl

Save the request body in a file called request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://vision.googleapis.com/v1/images:annotate

PowerShell

Save the request body in a file called request.json, and execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ Authorization = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://vision.googleapis.com/v1/images:annotate" | Select-Object -Expand Content

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format.

C#

Before trying this sample, follow the C# setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API C# API reference documentation .

// Specify a Google Cloud Storage uri for the image
// or a publicly accessible HTTP or HTTPS uri.
var image = Image.FromUri(uri);
var client = ImageAnnotatorClient.Create();
var response = client.DetectDocumentText(image);
foreach (var page in response.Pages)
{
    foreach (var block in page.Blocks)
    {
        foreach (var paragraph in block.Paragraphs)
        {
            Console.WriteLine(string.Join("\n", paragraph.Words));
        }
    }
}

Go

Before trying this sample, follow the Go setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Go API reference documentation .


// detectDocumentText gets the full document text from the Vision API for an image at the given file path.
func detectDocumentTextURI(w io.Writer, file string) error {
	ctx := context.Background()

	client, err := vision.NewImageAnnotatorClient(ctx)
	if err != nil {
		return err
	}

	image := vision.NewImageFromURI(file)
	annotation, err := client.DetectDocumentText(ctx, image, nil)
	if err != nil {
		return err
	}

	if annotation == nil {
		fmt.Fprintln(w, "No text found.")
	} else {
		fmt.Fprintln(w, "Document Text:")
		fmt.Fprintf(w, "%q\n", annotation.Text)

		fmt.Fprintln(w, "Pages:")
		for _, page := range annotation.Pages {
			fmt.Fprintf(w, "\tConfidence: %f, Width: %d, Height: %d\n", page.Confidence, page.Width, page.Height)
			fmt.Fprintln(w, "\tBlocks:")
			for _, block := range page.Blocks {
				fmt.Fprintf(w, "\t\tConfidence: %f, Block type: %v\n", block.Confidence, block.BlockType)
				fmt.Fprintln(w, "\t\tParagraphs:")
				for _, paragraph := range block.Paragraphs {
					fmt.Fprintf(w, "\t\t\tConfidence: %f", paragraph.Confidence)
					fmt.Fprintln(w, "\t\t\tWords:")
					for _, word := range paragraph.Words {
						symbols := make([]string, len(word.Symbols))
						for i, s := range word.Symbols {
							symbols[i] = s.Text
						}
						wordText := strings.Join(symbols, "")
						fmt.Fprintf(w, "\t\t\t\tConfidence: %f, Symbols: %s\n", word.Confidence, wordText)
					}
				}
			}
		}
	}

	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the Vision API Quickstart Using Client Libraries. For more information, see the Vision API Java API reference documentation.

public static void detectDocumentTextGcs(String gcsPath, PrintStream out) throws Exception,
    IOException {
  List<AnnotateImageRequest> requests = new ArrayList<>();

  ImageSource imgSource = ImageSource.newBuilder().setGcsImageUri(gcsPath).build();
  Image img = Image.newBuilder().setSource(imgSource).build();
  Feature feat = Feature.newBuilder().setType(Type.DOCUMENT_TEXT_DETECTION).build();
  AnnotateImageRequest request =
      AnnotateImageRequest.newBuilder().addFeatures(feat).setImage(img).build();
  requests.add(request);

  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    BatchAnnotateImagesResponse response = client.batchAnnotateImages(requests);
    List<AnnotateImageResponse> responses = response.getResponsesList();
    client.close();

    for (AnnotateImageResponse res : responses) {
      if (res.hasError()) {
        out.printf("Error: %s\n", res.getError().getMessage());
        return;
      }
      // For full list of available annotations, see http://g.co/cloud/vision/docs
      TextAnnotation annotation = res.getFullTextAnnotation();
      for (Page page: annotation.getPagesList()) {
        String pageText = "";
        for (Block block : page.getBlocksList()) {
          String blockText = "";
          for (Paragraph para : block.getParagraphsList()) {
            String paraText = "";
            for (Word word: para.getWordsList()) {
              String wordText = "";
              for (Symbol symbol: word.getSymbolsList()) {
                wordText = wordText + symbol.getText();
                out.format("Symbol text: %s (confidence: %f)\n", symbol.getText(),
                    symbol.getConfidence());
              }
              out.format("Word text: %s (confidence: %f)\n\n", wordText, word.getConfidence());
              paraText = String.format("%s %s", paraText, wordText);
            }
            // Output Example using Paragraph:
            out.println("\nParagraph: \n" + paraText);
            out.format("Paragraph Confidence: %f\n", para.getConfidence());
            blockText = blockText + paraText;
          }
          pageText = pageText + blockText;
        }
      }
      out.println("\nComplete annotation:");
      out.println(annotation.getText());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Node.js API reference documentation .


// Imports the Google Cloud client libraries
const vision = require('@google-cloud/vision');

// Creates a client
const client = new vision.ImageAnnotatorClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const bucketName = 'Bucket where the file resides, e.g. my-bucket';
// const fileName = 'Path to file within bucket, e.g. path/to/image.png';

// Read a remote image as a text document
const [result] = await client.documentTextDetection(
  `gs://${bucketName}/${fileName}`
);
const fullTextAnnotation = result.fullTextAnnotation;
console.log(fullTextAnnotation.text);

PHP

Before trying this sample, follow the PHP setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API PHP API reference documentation .

namespace Google\Cloud\Samples\Vision;

use Google\Cloud\Vision\V1\ImageAnnotatorClient;

// $path = 'gs://path/to/your/image.jpg'

function detect_document_text_gcs($path)
{
    $imageAnnotator = new ImageAnnotatorClient();

    # annotate the image
    $response = $imageAnnotator->documentTextDetection($path);
    $annotation = $response->getFullTextAnnotation();

    # print out detailed and structured information about document text
    if ($annotation) {
        foreach ($annotation->getPages() as $page) {
            foreach ($page->getBlocks() as $block) {
                $block_text = '';
                foreach ($block->getParagraphs() as $paragraph) {
                    foreach ($paragraph->getWords() as $word) {
                        foreach ($word->getSymbols() as $symbol) {
                            $block_text .= $symbol->getText();
                        }
                        $block_text .= ' ';
                    }
                    $block_text .= "\n";
                }
                printf('Block content: %s', $block_text);
                printf('Block confidence: %f' . PHP_EOL,
                    $block->getConfidence());

                # get bounds
                $vertices = $block->getBoundingBox()->getVertices();
                $bounds = [];
                foreach ($vertices as $vertex) {
                    $bounds[] = sprintf('(%d,%d)', $vertex->getX(),
                        $vertex->getY());
                }
                print('Bounds: ' . join(', ',$bounds) . PHP_EOL);

                print(PHP_EOL);
            }
        }
    } else {
        print('No text found' . PHP_EOL);
    }

    $imageAnnotator->close();
}

Python

Before trying this sample, follow the Python setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Python API reference documentation .

def detect_document_uri(uri):
    """Detects document features in the file located in Google Cloud
    Storage."""
    from google.cloud import vision
    client = vision.ImageAnnotatorClient()
    image = vision.types.Image()
    image.source.image_uri = uri

    response = client.document_text_detection(image=image)

    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            print('\nBlock confidence: {}\n'.format(block.confidence))

            for paragraph in block.paragraphs:
                print('Paragraph confidence: {}'.format(
                    paragraph.confidence))

                for word in paragraph.words:
                    word_text = ''.join([
                        symbol.text for symbol in word.symbols
                    ])
                    print('Word text: {} (confidence: {})'.format(
                        word_text, word.confidence))

                    for symbol in word.symbols:
                        print('\tSymbol: {} (confidence: {})'.format(
                            symbol.text, symbol.confidence))

Ruby

Before trying this sample, follow the Ruby setup instructions in the Vision API Quickstart Using Client Libraries . For more information, see the Vision API Ruby API reference documentation .

# image_path = "Google Cloud Storage URI, eg. 'gs://my-bucket/image.png'"

require "google/cloud/vision"

image_annotator = Google::Cloud::Vision::ImageAnnotator.new

response = image_annotator.document_text_detection image: image_path

text = ""
response.responses.each do |res|
  res.text_annotations.each do |annotation|
    text << annotation.description
  end
end

puts text

GCLOUD COMMAND

To perform handwriting detection, use the gcloud ml vision detect-document command as shown in the following example:

gcloud ml vision detect-document gs://vision-api-handwriting-ocr-bucket/handwriting_image.png

Specify the language (optional)

Both types of OCR requests support one or more languageHints that specify the language of any text in the image. However, in most cases, an empty value yields the best results since it enables automatic language detection. For languages based on the Latin alphabet, setting languageHints is not needed. In rare cases, when the language of the text in the image is known, setting a hint helps get better results (although it can be a significant hindrance if the hint is wrong). Text detection returns an error if one or more of the specified languages is not one of the supported languages.

If you choose to provide a language hint, modify the body of your request (request.json file) to provide the string of one of the supported languages in the imageContext.languageHints field as shown below:

{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "image-url"
        }
      },
      "features": [
        {
          "type": "DOCUMENT_TEXT_DETECTION"
        }
      ],
      "imageContext": {
        "languageHints": ["en-t-i0-handwrit"]
      }
    }
  ]
}
How do language hints work?

The languageHint format follows the BCP47 language code formatting guidelines. The BCP47 specified format is as follows:

language ["-" script] ["-" region] *("-" variant) *("-" extension) ["-" privateuse].

For example, the language hint "en-t-i0-handwrit" specifies English language (en), transform extension singleton (t), input method engine transform extension code (i0), and handwriting transform code (handwrit). This roughly says the language is "English transformed from handwriting." You do not need to specify a script code because Latn is implied by the "en" language.

Try it

Try text detection and document text detection below. You can use the image specified already (gs://vision-api-handwriting-ocr-bucket/handwriting_image.png) by clicking Execute, or you can specify your own image in its place.

handwriting image

Hai trovato utile questa pagina? Facci sapere cosa ne pensi:

Invia feedback per...

Cloud Vision API
Hai bisogno di assistenza? Visita la nostra pagina di assistenza.