
Detect text in files (PDF/TIFF)

The Vision API can detect and transcribe text from PDF and TIFF files stored in Cloud Storage.

Text detection in PDF and TIFF documents must be requested with the files:asyncBatchAnnotate function, which performs an offline (asynchronous) request and reports its status through the operations resources.

The output of a PDF or TIFF request is written to a JSON file created in the specified Cloud Storage bucket.

Limitations

The Vision API accepts PDF or TIFF files of up to 2,000 pages. Larger files return an error.

Authentication

API keys are not supported for files:asyncBatchAnnotate requests. See Using a service account for instructions on authenticating with a service account.

The account used for authentication must have access to the Cloud Storage bucket that you specify for the output (roles/editor, roles/storage.objectCreator, or above).

You can use an API key to query the status of the operation. See Using an API key for instructions.

Document text detection requests

Currently, PDF or TIFF document detection is only available for files stored in Cloud Storage buckets. The response JSON files are likewise saved to a Cloud Storage bucket.

Page from the 2010 US census PDF: `gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf`, ***Source***: United States Census Bureau.

Note: This feature returns results with normalizedVertices [0,1] rather than real pixel values (vertices).
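As a minimal sketch of what the note above implies, normalized coordinates can be scaled back to page units by multiplying by the page width and height reported in the output. The helper name `to_page_units` and the sample values are hypothetical; the sketch also assumes that a coordinate missing from the JSON stands for 0.

```python
def to_page_units(normalized_vertices, page_width, page_height):
    """Scale normalizedVertices (each coordinate in [0, 1]) to page units."""
    return [
        # Assumption: an omitted coordinate is treated as 0.
        (v.get("x", 0.0) * page_width, v.get("y", 0.0) * page_height)
        for v in normalized_vertices
    ]


# Hypothetical values for illustration only.
corners = to_page_units(
    [{"x": 0.25, "y": 0.5}, {"x": 1.0, "y": 1.0}],
    page_width=612,
    page_height=792,
)
print(corners)  # [(153.0, 396.0), (612.0, 792.0)]
```

For PDF input, the width and height in the output describe the page itself, so the scaled values are page units rather than pixels of a rendered image.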

REST

Before using any of the request data below, make the following replacements:

CLOUD_STORAGE_BUCKET: A Cloud Storage bucket or directory in which to save output files, expressed in the following form:
- gs://bucket/directory/
The requesting user must have write permission to the bucket.
CLOUD_STORAGE_FILE_URI: The path to a valid file (PDF/TIFF) in a Cloud Storage bucket. You must have at least read privileges on the file. Example:
- ```
gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf
```
FEATURE_TYPE: A valid feature type. For files:asyncBatchAnnotate requests, you can use the following feature types:
- DOCUMENT_TEXT_DETECTION
- TEXT_DETECTION
PROJECT_ID: Your Google Cloud project ID.

Field-specific considerations:

inputConfig: Replaces the image field used in other Vision API requests. It contains two child fields:
- gcsSource.uri: The Google Cloud Storage URI of the PDF or TIFF file (accessible to the user or service account making the request).
- mimeType: One of the accepted file types, application/pdf or image/tiff.
outputConfig: Specifies the output details. It contains two child fields:
- gcsDestination.uri: A valid Google Cloud Storage URI. The user or service account making the request must be able to write to the bucket. The file name will be output-x-to-y, where x and y are the page numbers of the PDF or TIFF file that are included in that output file. If the file already exists, its contents are overwritten.
- batchSize: Specifies how many pages of output to include in each output JSON file.
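The fields above can be assembled programmatically. The following is a minimal sketch, not an official helper: `build_request` and `expected_output_names` are hypothetical names, the mapping from file extension to mimeType is an assumption made for convenience, and the `.json` suffix on output names is inferred from the output-1-to-1.json example in this page.

```python
import json

# Assumption: infer the mimeType from the file extension of the source URI.
MIME_TYPES = {".pdf": "application/pdf", ".tif": "image/tiff", ".tiff": "image/tiff"}


def build_request(source_uri, destination_uri,
                  feature_type="DOCUMENT_TEXT_DETECTION", batch_size=1):
    """Build the JSON body for a files:asyncBatchAnnotate request."""
    suffix = "." + source_uri.rsplit(".", 1)[-1].lower()
    if suffix not in MIME_TYPES:
        raise ValueError(f"unsupported file type: {source_uri}")
    return {
        "requests": [
            {
                "inputConfig": {
                    "gcsSource": {"uri": source_uri},
                    "mimeType": MIME_TYPES[suffix],
                },
                "features": [{"type": feature_type}],
                "outputConfig": {
                    "gcsDestination": {"uri": destination_uri},
                    "batchSize": batch_size,
                },
            }
        ]
    }


def expected_output_names(total_pages, batch_size):
    """List the output-x-to-y file names for a document of total_pages pages."""
    names = []
    for first in range(1, total_pages + 1, batch_size):
        last = min(first + batch_size - 1, total_pages)
        names.append(f"output-{first}-to-{last}.json")
    return names


body = build_request(
    "gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf",
    "gs://bucket/directory/",
)
print(json.dumps(body, indent=2))
print(expected_output_names(total_pages=5, batch_size=2))
```

With a batchSize of 2, a five-page document would produce three output files under this naming scheme, the last one holding a single page.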

HTTP method and URL:

POST https://vision.googleapis.com/v1/files:asyncBatchAnnotate

Request JSON body:

{
  "requests":[
    {
      "inputConfig": {
        "gcsSource": {
          "uri": "CLOUD_STORAGE_FILE_URI"
        },
        "mimeType": "application/pdf"
      },
      "features": [
        {
          "type": "FEATURE_TYPE"
        }
      ],
      "outputConfig": {
        "gcsDestination": {
          "uri": "CLOUD_STORAGE_BUCKET"
        },
        "batchSize": 1
      }
    }
  ]
}

To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you in to the gcloud CLI. To check the currently active account, run gcloud auth list.

Save the request body in a file named request.json, and run the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_ID" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://vision.googleapis.com/v1/files:asyncBatchAnnotate"

PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. To check the currently active account, run gcloud auth list.

Save the request body in a file named request.json, and run the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://vision.googleapis.com/v1/files:asyncBatchAnnotate" | Select-Object -Expand Content
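The same request can be prepared from Python with only the standard library. This is a sketch under stated assumptions rather than the documented client-library path: `build_http_request` is a hypothetical helper, and the access token is assumed to have been obtained separately (for example from gcloud auth print-access-token).

```python
import json
import urllib.request

ENDPOINT = "https://vision.googleapis.com/v1/files:asyncBatchAnnotate"


def build_http_request(body, access_token, project_id):
    """Prepare (but do not send) the POST request shown in the curl example."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "x-goog-user-project": project_id,
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


# Placeholder values; replace them before sending the request.
request = build_http_request({"requests": []}, "ACCESS_TOKEN", "PROJECT_ID")
print(request.get_method(), request.full_url)
```

Passing the prepared object to urllib.request.urlopen would send it; the sketch stops short of that so it can be inspected without network access or credentials.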

Response:

A successful asyncBatchAnnotate request returns a response with a single name field:

{
  "name": "projects/usable-auth-library/operations/1efec2285bd442df"
}

This name represents a long-running operation with an associated ID (for example, 1efec2285bd442df), which can be queried using the v1.operations API.

To retrieve your Vision annotation response, send a GET request to the v1.operations endpoint, passing the operation ID in the URL:

GET https://vision.googleapis.com/v1/operations/operation-id

For example:

curl -X GET -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json" \
https://vision.googleapis.com/v1/projects/project-id/locations/location-id/operations/1efec2285bd442df
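A minimal sketch of how the operation name maps to a status URL. The helper names are hypothetical, and building the URL by appending the returned name to the v1 path is an assumption based on the URL forms shown above.

```python
def operation_id(name):
    """Return the trailing ID of an operation name."""
    return name.rsplit("/", 1)[-1]


def operation_url(name, host="vision.googleapis.com"):
    """Build the GET URL used to poll a long-running operation."""
    # Assumption: the operation name is used as the resource path under /v1/.
    return f"https://{host}/v1/{name}"


name = "projects/usable-auth-library/operations/1efec2285bd442df"
print(operation_id(name))   # 1efec2285bd442df
print(operation_url(name))
```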

If the operation is still in progress, the response looks like this:

{
  "name": "operations/1efec2285bd442df",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.OperationMetadata",
    "state": "RUNNING",
    "createTime": "2019-05-15T21:10:08.401917049Z",
    "updateTime": "2019-05-15T21:10:33.700763554Z"
  }
}

Once the operation has completed, the state is shown as DONE and the results are written to the Google Cloud Storage location that you specified:

{
  "name": "operations/1efec2285bd442df",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.OperationMetadata",
    "state": "DONE",
    "createTime": "2019-05-15T20:56:30.622473785Z",
    "updateTime": "2019-05-15T20:56:41.666379749Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.AsyncBatchAnnotateFilesResponse",
    "responses": [
      {
        "outputConfig": {
          "gcsDestination": {
            "uri": "gs://your-bucket-name/folder/"
          },
          "batchSize": 1
        }
      }
    ]
  }
}
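The two responses above can be told apart in code by checking the done field. The following sketch uses a hypothetical helper, `output_uris`, and assumes the operation JSON has already been fetched and parsed into a dictionary.

```python
def output_uris(operation):
    """Return the output locations of a finished operation, or [] if not done."""
    if not operation.get("done", False):
        return []
    responses = operation.get("response", {}).get("responses", [])
    return [
        r["outputConfig"]["gcsDestination"]["uri"]
        for r in responses
        if "outputConfig" in r
    ]


running = {
    "name": "operations/1efec2285bd442df",
    "metadata": {"state": "RUNNING"},
}
finished = {
    "name": "operations/1efec2285bd442df",
    "metadata": {"state": "DONE"},
    "done": True,
    "response": {
        "responses": [
            {
                "outputConfig": {
                    "gcsDestination": {"uri": "gs://your-bucket-name/folder/"},
                    "batchSize": 1,
                }
            }
        ]
    },
}
print(output_uris(running))   # []
print(output_uris(finished))  # ['gs://your-bucket-name/folder/']
```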

The JSON in your output file is similar to that of an image's [document text detection request](/vision/docs/ocr), with an additional context field that shows the location of the PDF or TIFF that was specified and the page number within the file:

output-1-to-1.json


{
  "inputConfig": {
    "gcsSource": {
      "uri": "gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf"
    },
    "mimeType": "application/pdf"
  },
  "responses": [
    {
      "fullTextAnnotation": {
        "pages": [
          {
            "property": {
              "detectedLanguages": [
                {
                  "languageCode": "en",
                  "confidence": 0.94
                }
              ]
            },
            "width": 612,
            "height": 792,
            "blocks": [
              {
                "boundingBox": {
                  "normalizedVertices": [
                    {
                      "x": 0.1209749,
                      "y": 0.147498
                    },
                    ...
                    {
                      "x": 0.1229097,
                      "y": 0.1199495
                    }
                  ]
                },
                "paragraphs": [
                  {
                    ...
                    "words": [
                      {
                        ...
                        "symbols": [
                          {
                            ...
                            "text": "C",
                            "confidence": 0.99
                          },
                          {
                            "property": {
                              "detectedLanguages": [
                                {
                                  "languageCode": "en"
                                }
                              ]
                            },
                            "text": "O",
                            "confidence": 0.99
                          },
                          ...
                        ]
                      }
                    ]
                  }
                ]
              }
            ]
          }
        ],
        "text": "CONTENTS\n.\n1-1\nII-1\nIII-1\nList of Statistical Tables...
        \nHow to Use This Census Report.\nTable Finding Guide .\nUser
        Notes .......\nStatistical Tables.........\nAppendixes
        \nA Geographic Terms and Concepts .........\nB Definitions of
        Subject Characteristics.\nData Collection and Processing Procedures...
        \nQuestionnaire. .........\nE Maps .................\nF Operational
        Overview and Accuracy of the Data......\nG Residence Rule and Residence
        Situations for the 2010 Census of the United States...
        \n... <www.census.gov\n/prod/cen2010/cph-1-a.pdf>.\nContents\n"
      },
      "context": {
        "uri": "gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf",
        "pageNumber": 1
      }
    }
  ]
}
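Once an output file has been downloaded and parsed, the page number and text can be read from each response. This is a small sketch with a hypothetical helper, `page_texts`, and a shortened stand-in for the output shown above.

```python
def page_texts(output_file):
    """Return (pageNumber, text) pairs from a parsed output-x-to-y JSON file."""
    pages = []
    for response in output_file.get("responses", []):
        number = response.get("context", {}).get("pageNumber")
        text = response.get("fullTextAnnotation", {}).get("text", "")
        pages.append((number, text))
    return pages


# Shortened stand-in for the contents of output-1-to-1.json.
sample = {
    "responses": [
        {
            "fullTextAnnotation": {"text": "CONTENTS\n"},
            "context": {
                "uri": "gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf",
                "pageNumber": 1,
            },
        }
    ]
}
for number, text in page_texts(sample):
    print(number, repr(text))
```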

Go

Before trying this sample, follow the Go setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Go API reference documentation.

To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


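// Import paths assume the v2 module of the Go client library.
import (
	"context"
	"fmt"
	"io"

	vision "cloud.google.com/go/vision/v2/apiv1"
	"cloud.google.com/go/vision/v2/apiv1/visionpb"
)

// detectAsyncDocumentURI performs Optical Character Recognition (OCR) on a
// PDF file stored in GCS.
func detectAsyncDocumentURI(w io.Writer, gcsSourceURI, gcsDestinationURI string) error {
	ctx := context.Background()

	client, err := vision.NewImageAnnotatorClient(ctx)
	if err != nil {
		return err
	}
	// Release the client's resources when the function returns.
	defer client.Close()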

	request := &visionpb.AsyncBatchAnnotateFilesRequest{
		Requests: []*visionpb.AsyncAnnotateFileRequest{
			{
				Features: []*visionpb.Feature{
					{
						Type: visionpb.Feature_DOCUMENT_TEXT_DETECTION,
					},
				},
				InputConfig: &visionpb.InputConfig{
					GcsSource: &visionpb.GcsSource{Uri: gcsSourceURI},
					// Supported MimeTypes are: "application/pdf" and "image/tiff".
					MimeType: "application/pdf",
				},
				OutputConfig: &visionpb.OutputConfig{
					GcsDestination: &visionpb.GcsDestination{Uri: gcsDestinationURI},
					// How many pages should be grouped into each json output file.
					BatchSize: 2,
				},
			},
		},
	}

	operation, err := client.AsyncBatchAnnotateFiles(ctx, request)
	if err != nil {
		return err
	}

	fmt.Fprintf(w, "Waiting for the operation to finish.")

	resp, err := operation.Wait(ctx)
	if err != nil {
		return err
	}

	fmt.Fprintf(w, "%v", resp)

	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the Vision API quickstart using client libraries. For more information, see the Vision API Java reference documentation.

/**
 * Performs document text OCR with PDF/TIFF as source files on Google Cloud Storage.
 *
 * @param gcsSourcePath The path to the remote file on Google Cloud Storage to detect document
 *     text on.
 * @param gcsDestinationPath The path to the remote file on Google Cloud Storage to store the
 *     results on.
 * @throws Exception on errors while closing the client.
 */
public static void detectDocumentsGcs(String gcsSourcePath, String gcsDestinationPath)
    throws Exception {

  // Initialize client that will be used to send requests. This client only needs to be created
  // once, and can be reused for multiple requests. After completing all of your requests, call
  // the "close" method on the client to safely clean up any remaining background resources.
  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    List<AsyncAnnotateFileRequest> requests = new ArrayList<>();

    // Set the GCS source path for the remote file.
    GcsSource gcsSource = GcsSource.newBuilder().setUri(gcsSourcePath).build();

    // Create the configuration with the specified MIME (Multipurpose Internet Mail Extensions)
    // types
    InputConfig inputConfig =
        InputConfig.newBuilder()
            .setMimeType(
                "application/pdf") // Supported MimeTypes: "application/pdf", "image/tiff"
            .setGcsSource(gcsSource)
            .build();

    // Set the GCS destination path for where to save the results.
    GcsDestination gcsDestination =
        GcsDestination.newBuilder().setUri(gcsDestinationPath).build();

    // Create the configuration for the output with the batch size.
    // The batch size sets how many pages should be grouped into each JSON output file.
    OutputConfig outputConfig =
        OutputConfig.newBuilder().setBatchSize(2).setGcsDestination(gcsDestination).build();

    // Select the Feature required by the vision API
    Feature feature = Feature.newBuilder().setType(Feature.Type.DOCUMENT_TEXT_DETECTION).build();

    // Build the OCR request
    AsyncAnnotateFileRequest request =
        AsyncAnnotateFileRequest.newBuilder()
            .addFeatures(feature)
            .setInputConfig(inputConfig)
            .setOutputConfig(outputConfig)
            .build();

    requests.add(request);

    // Perform the OCR request
    OperationFuture<AsyncBatchAnnotateFilesResponse, OperationMetadata> response =
        client.asyncBatchAnnotateFilesAsync(requests);

    System.out.println("Waiting for the operation to finish.");

    // Wait for the request to finish. (The result is not used, since the API saves the result to
    // the specified location on GCS.)
    List<AsyncAnnotateFileResponse> result =
        response.get(180, TimeUnit.SECONDS).getResponsesList();

    // Once the request has completed and the output has been
    // written to GCS, we can list all the output files.
    Storage storage = StorageOptions.getDefaultInstance().getService();

    // Get the destination location from the gcsDestinationPath
    Pattern pattern = Pattern.compile("gs://([^/]+)/(.+)");
    Matcher matcher = pattern.matcher(gcsDestinationPath);

    if (matcher.find()) {
      String bucketName = matcher.group(1);
      String prefix = matcher.group(2);

      // Get the list of objects with the given prefix from the GCS bucket
      Bucket bucket = storage.get(bucketName);
      com.google.api.gax.paging.Page<Blob> pageList = bucket.list(BlobListOption.prefix(prefix));

      Blob firstOutputFile = null;

      // List objects with the given prefix.
      System.out.println("Output files:");
      for (Blob blob : pageList.iterateAll()) {
        System.out.println(blob.getName());

        // Process the first output file from GCS.
        // Since we specified batch size = 2, the first response contains
        // the first two pages of the input file.
        if (firstOutputFile == null) {
          firstOutputFile = blob;
        }
      }

      // Get the contents of the file and convert the JSON contents to an AnnotateFileResponse
      // object. If the Blob is small read all its content in one request
      // (Note: the file is a .json file)
      // Storage guide: https://cloud.google.com/storage/docs/downloading-objects
      String jsonContents = new String(firstOutputFile.getContent());
      Builder builder = AnnotateFileResponse.newBuilder();
      JsonFormat.parser().merge(jsonContents, builder);

      // Build the AnnotateFileResponse object
      AnnotateFileResponse annotateFileResponse = builder.build();

      // Parse through the object to get the actual response for the first page of the input file.
      AnnotateImageResponse annotateImageResponse = annotateFileResponse.getResponses(0);

      // Here we print the full text from the first page.
      // The response contains more information:
      // annotation/pages/blocks/paragraphs/words/symbols
      // including confidence score and bounding boxes
      System.out.format("%nText: %s%n", annotateImageResponse.getFullTextAnnotation().getText());
    } else {
      System.out.println("No MATCH");
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Node.js API reference documentation.


// Imports the Google Cloud client libraries
const vision = require('@google-cloud/vision').v1;

// Creates a client
const client = new vision.ImageAnnotatorClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// Bucket where the file resides
// const bucketName = 'my-bucket';
// Path to PDF file within bucket
// const fileName = 'path/to/document.pdf';
// The folder to store the results
// const outputPrefix = 'results'

const gcsSourceUri = `gs://${bucketName}/${fileName}`;
const gcsDestinationUri = `gs://${bucketName}/${outputPrefix}/`;

const inputConfig = {
  // Supported mime_types are: 'application/pdf' and 'image/tiff'
  mimeType: 'application/pdf',
  gcsSource: {
    uri: gcsSourceUri,
  },
};
const outputConfig = {
  gcsDestination: {
    uri: gcsDestinationUri,
  },
};
const features = [{type: 'DOCUMENT_TEXT_DETECTION'}];
const request = {
  requests: [
    {
      inputConfig: inputConfig,
      features: features,
      outputConfig: outputConfig,
    },
  ],
};

const [operation] = await client.asyncBatchAnnotateFiles(request);
const [filesResponse] = await operation.promise();
const destinationUri =
  filesResponse.responses[0].outputConfig.gcsDestination.uri;
console.log('Json saved to: ' + destinationUri);

Python

Before trying this sample, follow the Python setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Python API reference documentation.

def async_detect_document(gcs_source_uri, gcs_destination_uri):
    """OCR with PDF/TIFF as source files on GCS"""
    import json
    import re
    from google.cloud import vision
    from google.cloud import storage

    # Supported mime_types are: 'application/pdf' and 'image/tiff'
    mime_type = "application/pdf"

    # How many pages should be grouped into each json output file.
    batch_size = 2

    client = vision.ImageAnnotatorClient()

    feature = vision.Feature(type_=vision.Feature.Type.DOCUMENT_TEXT_DETECTION)

    gcs_source = vision.GcsSource(uri=gcs_source_uri)
    input_config = vision.InputConfig(gcs_source=gcs_source, mime_type=mime_type)

    gcs_destination = vision.GcsDestination(uri=gcs_destination_uri)
    output_config = vision.OutputConfig(
        gcs_destination=gcs_destination, batch_size=batch_size
    )

    async_request = vision.AsyncAnnotateFileRequest(
        features=[feature], input_config=input_config, output_config=output_config
    )

    operation = client.async_batch_annotate_files(requests=[async_request])

    print("Waiting for the operation to finish.")
    operation.result(timeout=420)

    # Once the request has completed and the output has been
    # written to GCS, we can list all the output files.
    storage_client = storage.Client()

    match = re.match(r"gs://([^/]+)/(.+)", gcs_destination_uri)
    bucket_name = match.group(1)
    prefix = match.group(2)

    bucket = storage_client.get_bucket(bucket_name)

    # List objects with the given prefix, filtering out folders.
    blob_list = [
        blob
        for blob in list(bucket.list_blobs(prefix=prefix))
        if not blob.name.endswith("/")
    ]
    print("Output files:")
    for blob in blob_list:
        print(blob.name)

    # Process the first output file from GCS.
    # Since we specified batch_size=2, the first response contains
    # the first two pages of the input file.
    output = blob_list[0]

    json_string = output.download_as_bytes().decode("utf-8")
    response = json.loads(json_string)

    # The actual response for the first page of the input file.
    first_page_response = response["responses"][0]
    annotation = first_page_response["fullTextAnnotation"]

    # Here we print the full text from the first page.
    # The response contains more information:
    # annotation/pages/blocks/paragraphs/words/symbols
    # including confidence scores and bounding boxes
    print("Full text:\n")
    print(annotation["text"])

gcloud

The gcloud command you use depends on the file type.

To perform PDF text detection, use the gcloud ml vision detect-text-pdf command as shown in the following example:
```
gcloud ml vision detect-text-pdf gs://my_bucket/input_file  gs://my_bucket/out_put_prefix
```
To perform TIFF text detection, use the gcloud ml vision detect-text-tiff command as shown in the following example:
```
gcloud ml vision detect-text-tiff gs://my_bucket/input_file  gs://my_bucket/out_put_prefix
```
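Because the command depends on the file type, a script can pick it from the input file's extension. The sketch below is illustrative only: `gcloud_command` is a hypothetical helper, and choosing by extension is an assumption, since the source only states that the command depends on the file type.

```python
def gcloud_command(input_uri, output_prefix):
    """Return the gcloud argument list for PDF or TIFF text detection."""
    lowered = input_uri.lower()
    if lowered.endswith(".pdf"):
        subcommand = "detect-text-pdf"
    elif lowered.endswith((".tif", ".tiff")):
        subcommand = "detect-text-tiff"
    else:
        raise ValueError(f"expected a PDF or TIFF file: {input_uri}")
    return ["gcloud", "ml", "vision", subcommand, input_uri, output_prefix]


# Hypothetical bucket and object names.
command = gcloud_command("gs://my_bucket/scan.tiff", "gs://my_bucket/out_put_prefix")
print(" ".join(command))
```

The returned list is in the form accepted by subprocess.run, which avoids shell quoting issues with bucket paths.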

Additional languages

C#: Follow the C# setup instructions on the client libraries page, and then visit the Vision reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page, and then visit the Vision reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page, and then visit the Vision reference documentation for Ruby.

Multi-regional support

Currently, this applies only to the OCR feature (TEXT_DETECTION or DOCUMENT_TEXT_DETECTION types).

You can now specify continent-level data storage and OCR processing. The following regions are currently supported:

us: USA country only
eu: The European Union

Locations

Cloud Vision offers you some control over where the resources for your project are stored and processed. In particular, you can configure Cloud Vision to store and process your data only in the European Union.

By default, Cloud Vision stores and processes resources in a global location, which means that Cloud Vision doesn't guarantee that your resources remain within a particular region or location. If you choose the European Union location, Google stores your data and processes it only in that location. You and your users can access the data from any location.

Setting the location using the API

The Vision API supports a global API endpoint (vision.googleapis.com) and also two region-based endpoints: a European Union endpoint (eu-vision.googleapis.com) and a United States endpoint (us-vision.googleapis.com). Use these endpoints for region-specific processing. For example, to store and process your data in the European Union only, use the URI eu-vision.googleapis.com in place of vision.googleapis.com for your REST API calls:

https://eu-vision.googleapis.com/v1/projects/PROJECT_ID/locations/eu/images:annotate
https://eu-vision.googleapis.com/v1/projects/PROJECT_ID/locations/eu/images:asyncBatchAnnotate
https://eu-vision.googleapis.com/v1/projects/PROJECT_ID/locations/eu/files:annotate
https://eu-vision.googleapis.com/v1/projects/PROJECT_ID/locations/eu/files:asyncBatchAnnotate

To store and process your data in the United States only, use the US endpoint (us-vision.googleapis.com) with the preceding methods.
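The regional URL pattern above can be captured in a small helper. This is a sketch with a hypothetical function name, `regional_url`; the list of regions is limited to the two named in this page.

```python
SUPPORTED_REGIONS = ("us", "eu")


def regional_url(region_id, project_id, method="files:asyncBatchAnnotate"):
    """Build a region-specific Vision API URL for the given method."""
    if region_id not in SUPPORTED_REGIONS:
        raise ValueError(f"unsupported region: {region_id}")
    return (
        f"https://{region_id}-vision.googleapis.com/v1/"
        f"projects/{project_id}/locations/{region_id}/{method}"
    )


print(regional_url("eu", "PROJECT_ID"))
print(regional_url("us", "PROJECT_ID", method="images:annotate"))
```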

Setting the location using the client libraries

The Vision API client libraries access the global API endpoint (vision.googleapis.com) by default. To store and process your data in the European Union only, you need to set the endpoint (eu-vision.googleapis.com) explicitly. The following code samples show how to configure this setting.

Note: This feature returns results with normalizedVertices [0,1] rather than real pixel values (vertices).

REST

Before using any of the request data below, make the following replacements:

REGION_ID: One of the valid regional location identifiers:
- us: USA country only
- eu: The European Union
CLOUD_STORAGE_IMAGE_URI: The path to a valid file (PDF/TIFF) in a Cloud Storage bucket. You must have at least read privileges on the file. Example:
- ```
gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf
```
CLOUD_STORAGE_BUCKET: A Cloud Storage bucket or directory in which to save output files, expressed in the following form:
- gs://bucket/directory/
The requesting user must have write permission to the bucket.
FEATURE_TYPE: A valid feature type. For files:asyncBatchAnnotate requests, you can use the following feature types:
- DOCUMENT_TEXT_DETECTION
- TEXT_DETECTION
PROJECT_ID: Your Google Cloud project ID.

Field-specific considerations:

inputConfig: Replaces the image field used in other Vision API requests. It contains two child fields:
- gcsSource.uri: The Google Cloud Storage URI of the PDF or TIFF file (accessible to the user or service account making the request).
- mimeType: One of the accepted file types, application/pdf or image/tiff.
outputConfig: Specifies the output details. It contains two child fields:
- gcsDestination.uri: A valid Google Cloud Storage URI. The user or service account making the request must be able to write to the bucket. The file name will be output-x-to-y, where x and y are the page numbers of the PDF or TIFF file that are included in that output file. If the file already exists, its contents are overwritten.
- batchSize: Specifies how many pages of output to include in each output JSON file.

HTTP method and URL:

POST https://REGION_ID-vision.googleapis.com/v1/projects/PROJECT_ID/locations/REGION_ID/files:asyncBatchAnnotate

Request JSON body:

{
  "requests":[
    {
      "inputConfig": {
        "gcsSource": {
          "uri": "CLOUD_STORAGE_IMAGE_URI"
        },
        "mimeType": "application/pdf"
      },
      "features": [
        {
          "type": "FEATURE_TYPE"
        }
      ],
      "outputConfig": {
        "gcsDestination": {
          "uri": "CLOUD_STORAGE_BUCKET"
        },
        "batchSize": 1
      }
    }
  ]
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and run the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_ID" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://REGION_ID-vision.googleapis.com/v1/projects/PROJECT_ID/locations/REGION_ID/files:asyncBatchAnnotate"

PowerShell

Save the request body in a file named request.json, and run the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://REGION_ID-vision.googleapis.com/v1/projects/PROJECT_ID/locations/REGION_ID/files:asyncBatchAnnotate" | Select-Object -Expand Content

Response:

A successful asyncBatchAnnotate request returns a response with a single name field:

{
  "name": "projects/usable-auth-library/operations/1efec2285bd442df"
}

This name represents a long-running operation with an associated ID (for example, 1efec2285bd442df), which can be queried using the v1.operations API.

To retrieve your Vision annotation response, send a GET request to the v1.operations endpoint, passing the operation ID in the URL:

GET https://vision.googleapis.com/v1/operations/operation-id

For example:

curl -X GET -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json" \
https://vision.googleapis.com/v1/projects/project-id/locations/location-id/operations/1efec2285bd442df

If the operation is still in progress, the response looks like this:

{
  "name": "operations/1efec2285bd442df",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.OperationMetadata",
    "state": "RUNNING",
    "createTime": "2019-05-15T21:10:08.401917049Z",
    "updateTime": "2019-05-15T21:10:33.700763554Z"
  }
}

Once the operation has completed, the state is shown as DONE and the results are written to the Google Cloud Storage location that you specified:

{
  "name": "operations/1efec2285bd442df",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.OperationMetadata",
    "state": "DONE",
    "createTime": "2019-05-15T20:56:30.622473785Z",
    "updateTime": "2019-05-15T20:56:41.666379749Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.AsyncBatchAnnotateFilesResponse",
    "responses": [
      {
        "outputConfig": {
          "gcsDestination": {
            "uri": "gs://your-bucket-name/folder/"
          },
          "batchSize": 1
        }
      }
    ]
  }
}

The JSON in your output file is similar to that of an image's document text detection response if you used the DOCUMENT_TEXT_DETECTION feature, or to that of a text detection response if you used the TEXT_DETECTION feature. The output has an additional context field that shows the location of the PDF or TIFF that was specified and the page number within the file:

output-1-to-1.json


{
  "inputConfig": {
    "gcsSource": {
      "uri": "gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf"
    },
    "mimeType": "application/pdf"
  },
  "responses": [
    {
      "fullTextAnnotation": {
        "pages": [
          {
            "property": {
              "detectedLanguages": [
                {
                  "languageCode": "en",
                  "confidence": 0.94
                }
              ]
            },
            "width": 612,
            "height": 792,
            "blocks": [
              {
                "boundingBox": {
                  "normalizedVertices": [
                    {
                      "x": 0.1209749,
                      "y": 0.147498
                    },
                    ...
                    {
                      "x": 0.1229097,
                      "y": 0.1199495
                    }
                  ]
                },
                "paragraphs": [
                  {
                    ...
                    "words": [
                      {
                        ...
                        "symbols": [
                          {
                            ...
                            "text": "C",
                            "confidence": 0.99
                          },
                          {
                            "property": {
                              "detectedLanguages": [
                                {
                                  "languageCode": "en"
                                }
                              ]
                            },
                            "text": "O",
                            "confidence": 0.99
                          },
                          ...
                        ]
                      }
                    ]
                  }
                ]
              }
            ]
          }
        ],
        "text": "CONTENTS\n.\n1-1\nII-1\nIII-1\nList of Statistical Tables...
        \nHow to Use This Census Report.\nTable Finding Guide .\nUser
        Notes .......\nStatistical Tables.........\nAppendixes
        \nA Geographic Terms and Concepts .........\nB Definitions of
        Subject Characteristics.\nData Collection and Processing Procedures...
        \nQuestionnaire. .........\nE Maps .................\nF Operational
        Overview and Accuracy of the Data......\nG Residence Rule and Residence
        Situations for the 2010 Census of the United States...
        \n... <www.census.gov\n/prod/cen2010/cph-1-a.pdf>.\nContents\n"
      },
      "context": {
        "uri": "gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf",
        "pageNumber": 1
      }
    }
  ]
}

Go

import (
	"context"
	"fmt"

	vision "cloud.google.com/go/vision/apiv1"
	"google.golang.org/api/option"
)

// setEndpoint changes your endpoint.
func setEndpoint(endpoint string) error {
	// endpoint := "eu-vision.googleapis.com:443"

	ctx := context.Background()
	client, err := vision.NewImageAnnotatorClient(ctx, option.WithEndpoint(endpoint))
	if err != nil {
		return fmt.Errorf("NewImageAnnotatorClient: %w", err)
	}
	defer client.Close()

	return nil
}

Java

ImageAnnotatorSettings settings =
    ImageAnnotatorSettings.newBuilder().setEndpoint("eu-vision.googleapis.com:443").build();

// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests. After completing all of your requests, call
// the "close" method on the client to safely clean up any remaining background resources.
ImageAnnotatorClient client = ImageAnnotatorClient.create(settings);

Node.js

// Imports the Google Cloud client library
const vision = require('@google-cloud/vision');

async function setEndpoint() {
  // Specifies the location of the api endpoint
  const clientOptions = {apiEndpoint: 'eu-vision.googleapis.com'};

  // Creates a client
  const client = new vision.ImageAnnotatorClient(clientOptions);

  // Performs text detection on the image file
  const [result] = await client.textDetection('./resources/wakeupcat.jpg');
  const labels = result.textAnnotations;
  console.log('Text:');
  labels.forEach(label => console.log(label.description));
}
setEndpoint();

Python

from google.cloud import vision

client_options = {"api_endpoint": "eu-vision.googleapis.com"}

client = vision.ImageAnnotatorClient(client_options=client_options)
