The Vision API can detect any Vision API feature from PDF and TIFF files stored in Cloud Storage.
Feature detection from PDF and TIFF must be requested using the files:asyncBatchAnnotate function, which performs an offline (asynchronous) request and provides its status using the operations resources.
Output from a PDF/TIFF request is written to a JSON file created in the specified Cloud Storage bucket.
Limitations
The Vision API accepts PDF/TIFF files up to 2000 pages. Larger files will return an error.
Authentication
API keys are not supported for files:asyncBatchAnnotate requests. See Using a service account for instructions on authenticating with a service account.
The account used for authentication must have access to the Cloud Storage bucket that you specify for the output (roles/editor or roles/storage.objectCreator or above).
You can use an API key to query the status of the operation; see Using an API key for instructions.
Feature detection requests
Currently PDF/TIFF document detection is only available for files stored in Cloud Storage buckets. Response JSON files are similarly saved to a Cloud Storage bucket.
Command-line
To perform PDF/TIFF document text detection, make a POST request and provide the appropriate request body:
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  https://vision.googleapis.com/v1/files:asyncBatchAnnotate \
  -d '{
    "requests": [
      {
        "inputConfig": {
          "gcsSource": {
            "uri": "gs://your-source-bucket-name/folder/multi-page-file.pdf"
          },
          "mimeType": "application/pdf"
        },
        "features": [
          {
            "type": "DOCUMENT_TEXT_DETECTION"
          }
        ],
        "outputConfig": {
          "gcsDestination": {
            "uri": "gs://your-bucket-name/folder/"
          },
          "batchSize": 1
        }
      }
    ]
  }'
Where:
inputConfig - replaces the image field used in other Vision API requests. It contains two child fields:
  gcsSource.uri - the Cloud Storage URI of the PDF or TIFF file (accessible to the user or service account making the request)
  mimeType - one of the accepted file types: application/pdf or image/tiff.
outputConfig - specifies output details. It contains two child fields:
  gcsDestination.uri - a valid Cloud Storage URI. The bucket must be writeable by the user or service account making the request. The filename will be output-x-to-y, where x and y represent the PDF/TIFF page numbers included in that output file. If the file exists, its contents will be overwritten.
  batchSize - specifies how many pages of output should be included in each output JSON file.
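The fields described above can also be assembled programmatically. The following is a minimal Python sketch, not an official client: the function names build_request_body and expected_output_names are hypothetical, and the .json suffix on the output names is an assumption inferred from the example file name output-1-to-1.json rather than something the field description states.

```python
def build_request_body(source_uri, destination_uri,
                       mime_type="application/pdf", batch_size=1):
    """Build the JSON-serializable body for a files:asyncBatchAnnotate request."""
    # Only the two MIME types named in this guide are accepted.
    if mime_type not in ("application/pdf", "image/tiff"):
        raise ValueError(f"unsupported mimeType: {mime_type}")
    return {
        "requests": [
            {
                "inputConfig": {
                    "gcsSource": {"uri": source_uri},
                    "mimeType": mime_type,
                },
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
                "outputConfig": {
                    "gcsDestination": {"uri": destination_uri},
                    "batchSize": batch_size,
                },
            }
        ]
    }


def expected_output_names(page_count, batch_size):
    """List the output-x-to-y names for a file with page_count pages.

    Assumption: each output file carries a .json suffix, as in the
    example name output-1-to-1.json.
    """
    names = []
    for first in range(1, page_count + 1, batch_size):
        last = min(first + batch_size - 1, page_count)
        names.append(f"output-{first}-to-{last}.json")
    return names
```

Passing the result of build_request_body through json.dumps yields a body equivalent to the one in the curl command. With a five-page file and a batchSize of 2, expected_output_names would list three files, the last of which covers only page 5.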
Response:
A successful asyncBatchAnnotate request returns a response with a single name field:
{ "name": "projects/usable-auth-library/operations/1efec2285bd442df" }
This name represents a long-running operation with an associated ID (for example, 1efec2285bd442df), which can be queried using the v1.operations API.
To retrieve your Vision annotation response, send a GET request to the v1.operations endpoint, passing the operation ID in the URL.
curl -X GET \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  https://vision.googleapis.com/v1/operations/1efec2285bd442df
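The operation ID in the URL is the last path segment of the name field returned by the initial request. As a small illustration, the Python sketch below derives the status URL from that name. The helper names operation_id and operation_url are hypothetical, and the base URL is taken from the curl command above.

```python
def operation_id(name):
    """Return the trailing ID segment of an operation name."""
    return name.rstrip("/").rsplit("/", 1)[-1]


def operation_url(name, base="https://vision.googleapis.com/v1"):
    """Build the status URL used in the GET request for an operation name."""
    return f"{base}/operations/{operation_id(name)}"
```

For the example name shown earlier, operation_id yields 1efec2285bd442df, which matches the ID used in the GET request.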
If the operation is in progress:
{
  "name": "operations/1efec2285bd442df",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.OperationMetadata",
    "state": "RUNNING",
    "createTime": "2019-05-15T21:10:08.401917049Z",
    "updateTime": "2019-05-15T21:10:33.700763554Z"
  }
}
Once the operation has completed, the state shows as DONE and your results are written to the Cloud Storage location you specified:
{
  "name": "operations/1efec2285bd442df",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.OperationMetadata",
    "state": "DONE",
    "createTime": "2019-05-15T20:56:30.622473785Z",
    "updateTime": "2019-05-15T20:56:41.666379749Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.AsyncBatchAnnotateFilesResponse",
    "responses": [
      {
        "outputConfig": {
          "gcsDestination": {
            "uri": "gs://your-bucket-name/folder/"
          },
          "batchSize": 1
        }
      }
    ]
  }
}
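Because the request is asynchronous, a client typically repeats the status query until the done field appears. The following Python sketch shows one way to structure that loop. It is an illustration under assumptions: fetch is a hypothetical zero-argument callable that returns the operation resource as a dict (for example, the parsed JSON from the GET request), and the interval and timeout defaults are arbitrary choices, not values documented for the API.

```python
import time


def wait_for_operation(fetch, interval=5.0, timeout=420.0, sleep=time.sleep):
    """Poll an operation until it reports done, then return it.

    fetch is a hypothetical callable supplied by the caller; it should
    return the operation resource as a dict each time it is called.
    """
    deadline = time.monotonic() + timeout
    while True:
        operation = fetch()
        if operation.get("done"):
            # A finished long-running operation carries either an
            # error or a response.
            if "error" in operation:
                raise RuntimeError(f"operation failed: {operation['error']}")
            return operation
        if time.monotonic() >= deadline:
            raise TimeoutError("operation did not finish in time")
        sleep(interval)
```

Taking fetch and sleep as parameters keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.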
The JSON in your output file is similar to that of an image's document text detection request, with the addition of a context field showing the location of the PDF or TIFF that was specified and the number of pages in the file:
output-1-to-1.json
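Once an output file has been downloaded, its text can be pulled out page by page. The Python sketch below assumes a particular layout, because the file contents are not reproduced here: a top-level responses array whose entries carry fullTextAnnotation.text and context.pageNumber. Those field names follow the document text detection response format and should be checked against a real output file. The function name pages_text is hypothetical.

```python
import json


def pages_text(output_json):
    """Return (page_number, text) tuples from one output JSON document.

    Assumed layout: {"responses": [{"fullTextAnnotation": {"text": ...},
    "context": {"uri": ..., "pageNumber": ...}}, ...]}.
    """
    data = json.loads(output_json)
    pages = []
    for response in data.get("responses", []):
        context = response.get("context", {})
        annotation = response.get("fullTextAnnotation", {})
        # Pages without detected text yield an empty string.
        pages.append((context.get("pageNumber"), annotation.get("text", "")))
    return pages
```

With batchSize set to 1, each output file would hold a single entry, so the returned list has one tuple per file.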
Go
Before trying this sample, follow the Go setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Go API reference documentation.
To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
Before trying this sample, follow the Java setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Java API reference documentation.
To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Node.js API reference documentation.
To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
Before trying this sample, follow the Python setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Python API reference documentation.
To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
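As a rough sketch of the same request made through the Python client library, the function below assumes that the google-cloud-vision package is installed and that Application Default Credentials are configured. The function name async_detect_document, its default arguments, and the timeout value are illustrative choices, not requirements of the API.

```python
def async_detect_document(gcs_source_uri, gcs_destination_uri,
                          mime_type="application/pdf", batch_size=1,
                          timeout=420):
    """Run document text detection on a PDF or TIFF stored in Cloud Storage."""
    # Imported inside the function so this module loads even when the
    # google-cloud-vision package is not installed.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    feature = vision.Feature(type_=vision.Feature.Type.DOCUMENT_TEXT_DETECTION)

    input_config = vision.InputConfig(
        gcs_source=vision.GcsSource(uri=gcs_source_uri),
        mime_type=mime_type,
    )
    output_config = vision.OutputConfig(
        gcs_destination=vision.GcsDestination(uri=gcs_destination_uri),
        batch_size=batch_size,
    )
    request = vision.AsyncAnnotateFileRequest(
        features=[feature],
        input_config=input_config,
        output_config=output_config,
    )

    # Starts the long-running operation, then blocks until it finishes
    # or the timeout (in seconds) elapses.
    operation = client.async_batch_annotate_files(requests=[request])
    return operation.result(timeout=timeout)
```

A call such as async_detect_document("gs://your-source-bucket-name/folder/multi-page-file.pdf", "gs://your-bucket-name/folder/") mirrors the command-line request. The annotation results themselves are written to the destination bucket rather than returned by the call.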
gcloud
The gcloud command you use depends on the file type.
To perform PDF text detection, use the gcloud ml vision detect-text-pdf command as shown in the following example:
gcloud ml vision detect-text-pdf gs://my_bucket/input_file gs://my_bucket/out_put_prefix
To perform TIFF text detection, use the gcloud ml vision detect-text-tiff command as shown in the following example:
gcloud ml vision detect-text-tiff gs://my_bucket/input_file gs://my_bucket/out_put_prefix
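When scripting these commands, the choice between the two can be automated. The Python sketch below picks the subcommand from the file extension, which is an assumption made for illustration: the gcloud tool itself does not require any particular extension, and the helper name gcloud_detect_command is hypothetical.

```python
import os


def gcloud_detect_command(input_uri, output_prefix):
    """Return the gcloud argument list for a PDF or TIFF input URI."""
    # Assumption: the file type can be inferred from the extension.
    extension = os.path.splitext(input_uri)[1].lower()
    if extension == ".pdf":
        subcommand = "detect-text-pdf"
    elif extension in (".tif", ".tiff"):
        subcommand = "detect-text-tiff"
    else:
        raise ValueError(f"cannot infer file type from: {input_uri}")
    return ["gcloud", "ml", "vision", subcommand, input_uri, output_prefix]
```

The returned list can be passed to subprocess.run to execute the command.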
Additional languages
C#: Please follow the C# setup instructions on the client libraries page and then visit the Vision reference documentation for .NET.
PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Vision reference documentation for PHP.
Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Vision reference documentation for Ruby.