Enterprise Document OCR

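As part of Document AI, Enterprise Document OCR lets you detect and extract text and layout information from a wide range of documents. Its configurable features let you tailor the system to specific document-processing requirements.

Overview

You can use Enterprise Document OCR for tasks such as data entry based on algorithms or machine learning, and for improving and verifying data accuracy. You can also use Enterprise Document OCR to handle tasks such as the following:

Digitizing text: Extract text and layout data from documents for search, rule-based document-processing pipelines, or custom model creation.
Using large language model applications: Combine the contextual understanding of LLMs with the text and layout extraction capabilities of OCR to automate questions and answers. Derive insights from your data and streamline workflows.
Archiving: Digitize paper documents into machine-readable text to improve document accessibility.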

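Choosing the best OCR for your use case

Solution	Product	Description	Use case
Document AI	Enterprise Document OCR	A specialized model for document use cases. Advanced features include image quality scores, language hints, and rotation correction.	Recommended when extracting text from documents. Use cases include PDFs, scanned documents as images, and Microsoft DocX files.
Document AI	OCR add-ons	Premium features for specific requirements. Only compatible with Enterprise Document OCR version 2.0 and later.	You need to detect and recognize math formulas, receive font style information, or enable checkbox extraction.
Cloud Vision API	Text detection	A globally available REST API based on the Google Cloud standard OCR model. The default quota is 1,800 requests per minute.	General text-extraction use cases that require low latency and high capacity.
Cloud Vision	OCR Google Distributed Cloud (Deprecated)	A Google Cloud Marketplace application that can be deployed as a container to a GKE cluster using GKE Enterprise.	To meet data residency or compliance requirements.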

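Detection and extraction

Enterprise Document OCR can detect blocks, paragraphs, lines, words, and symbols in PDFs and images, and can deskew documents for better accuracy.

Supported layout detection and extraction attributes:

Printed text	Handwriting	Paragraph	Block	Line	Word	Symbol level	Page number
Default	Default	Default	Default	Default	Default	Configurable	Default

Configurable Enterprise Document OCR features include the following:

Extract embedded or native text from digital PDFs: This feature extracts text and symbols exactly as they appear in the source document, even for rotated text, extreme font sizes or styles, and partially hidden text.
Rotation correction: Use Enterprise Document OCR to preprocess document images and correct rotation issues that can affect extraction quality or processing.
Image quality score: Receive quality metrics that can help with document routing. The image quality score provides page-level quality metrics in eight dimensions, including blurriness, smaller-than-usual fonts, and glare.
Specify page range: Specify the range of pages in an input document for OCR. This saves spending and processing time on unneeded pages.
Language detection: Detect the languages used in the extracted text.
Language and handwriting hints: Improve accuracy by providing the OCR model with a language or handwriting hint based on the known characteristics of your dataset.

To learn how to enable OCR configurations, see Enable OCR configurations.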

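OCR add-ons

Enterprise Document OCR offers optional analysis capabilities that can be enabled on individual processing requests as needed.

The following add-ons are available in the Stable versions pretrained-ocr-v2.0-2023-06-02 and pretrained-ocr-v2.1-2024-08-07, and in the Release Candidate version pretrained-ocr-v2.1.1-2025-01-31.

Math OCR: Identify mathematical formulas in documents and extract them in LaTeX format.
Checkbox extraction: Detect checkboxes and extract their status (marked or unmarked) in the Enterprise Document OCR response.
Font style detection: Identify word-level font properties, including font type, font style, handwriting, weight, and color.

To learn how to enable the listed add-ons, see Enable OCR add-ons.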

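Supported file formats

Enterprise Document OCR supports the PDF, GIF, TIFF, JPEG, PNG, BMP, and WebP file formats. For more information, see Supported files.

Enterprise Document OCR also supports DocX files of up to 15 pages in synchronous requests and up to 30 pages in asynchronous requests. DocX support is in private preview. To request access, submit the DocX Support Request Form.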

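Advanced versioning

Advanced versioning is in Preview. Upgrades to the underlying AI/ML OCR models might change OCR behavior. If you require strict consistency, use a frozen model version to pin behavior to an existing OCR model for up to 18 months. This ensures that the same image yields the same OCR result. See the table of processor versions.

Processor versions

The following processor versions are compatible with this feature. For more information, see Managing processor versions.

Version ID	Release channel	Description
`pretrained-ocr-v1.0-2020-09-23`	Stable	Not recommended for use. Discontinued in the United States (US) and the European Union (EU) starting April 30, 2025.
`pretrained-ocr-v1.1-2022-09-12`	Stable	Not recommended for use. Discontinued in the United States (US) and the European Union (EU) starting April 30, 2025.
`pretrained-ocr-v1.2-2022-11-10`	Stable	Frozen model version of v1.0: the model files, configurations, and binaries of a version snapshot are frozen in a container image for up to 18 months.
`pretrained-ocr-v2.0-2023-06-02`	Stable	Production-ready model specialized for document use cases. Includes access to all OCR add-ons.
`pretrained-ocr-v2.1-2024-08-07`	Stable	The main improvements in v2.1 are better printed-text recognition, more accurate checkbox detection, and more accurate reading order.
`pretrained-ocr-v2.1.1-2025-01-31`	Release candidate	v2.1.1 is similar to v2.1 and is available in all regions except `US`, `EU`, and `asia-southeast1`.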

Use Enterprise Document OCR to process documents

This quickstart introduces Enterprise Document OCR. It shows how to optimize document OCR results for your workflow by enabling or disabling the available OCR configurations.

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Go to project selector

Make sure that billing is enabled for your Google Cloud project.

Enable the Document AI API.

Enable the API

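Create an Enterprise Document OCR processor

First, create an Enterprise Document OCR processor. For more information, see Creating and managing processors.

OCR configurations

You can enable any OCR configuration by setting the corresponding field in ProcessOptions.ocrConfig in the ProcessDocumentRequest or BatchProcessDocumentsRequest.

For more information, see Send a processing request.

Image quality analysis

Intelligent document quality analysis uses machine learning to assess the quality of a document based on the readability of its content. The assessment is returned as a quality score in the range [0, 1], where 1 means perfect quality. If the detected quality score is lower than 0.5, a list of negative quality reasons (sorted by likelihood) is also returned. A likelihood greater than 0.5 is considered a positive detection.

If a document is considered defective, the API returns the following eight document defect types: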

quality/defect_blurry
quality/defect_noisy
quality/defect_dark
quality/defect_faint
quality/defect_text_too_small
quality/defect_document_cutoff
quality/defect_text_cutoff
quality/defect_glare

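Document quality analysis currently has some limitations:

It can return false-positive detections for digital documents that have no defects. The feature works best on scanned or photographed documents.
Glare defects are local. Their presence might not hinder the overall readability of the document.

Input

Enable this feature by setting ProcessOptions.ocrConfig.enableImageQualityScores to true in the processing request. It adds latency comparable to OCR processing to the process call.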

  {
    "rawDocument": {
      "mimeType": "MIME_TYPE",
      "content": "IMAGE_CONTENT"
    },
    "processOptions": {
      "ocrConfig": {
        "enableImageQualityScores": true
      }
    }
  }
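
The request body above can also be assembled in code. The following is a minimal sketch using only the Python standard library. `build_process_request` is a hypothetical helper name, not part of any Document AI client library, and the placeholder bytes stand in for real file content.

```python
import base64
import json


def build_process_request(content: bytes, mime_type: str, **ocr_config) -> dict:
    """Return a process-request body as a plain dict (hypothetical helper).

    The keys mirror the JSON request shown above; any keyword arguments are
    copied into `processOptions.ocrConfig` unchanged.
    """
    return {
        "rawDocument": {
            "mimeType": mime_type,
            # JSON cannot carry raw bytes, so the content is base64-encoded.
            "content": base64.b64encode(content).decode("ascii"),
        },
        "processOptions": {"ocrConfig": dict(ocr_config)},
    }


body = build_process_request(
    b"%PDF-1.4 placeholder bytes",  # stand-in for real file content
    "application/pdf",
    enableImageQualityScores=True,
)
print(json.dumps(body, indent=2))
```

Because the OCR options are passed as keyword arguments, the same helper can build the request bodies for the other configurations on this page.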

Output

The defect detection results appear in Document.pages[].imageQualityScores.

  {
    "pages": [
      {
        "imageQualityScores": {
          "qualityScore": 0.7811847,
          "detectedDefects": [
            {
              "type": "quality/defect_document_cutoff",
              "confidence": 1.0
            },
            {
              "type": "quality/defect_glare",
              "confidence": 0.97849524
            },
            {
              "type": "quality/defect_text_cutoff",
              "confidence": 0.5
            }
          ]
        }
      }
    ]
  }

For a full output example, see the sample processor output.
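
To act on these scores, a client can keep only the defects whose confidence exceeds the 0.5 detection threshold described earlier. The following is a minimal sketch that works on the JSON response as a plain dict. `positive_defects` is a hypothetical helper name, and the sample data is copied from the output above.

```python
from typing import List

# Page data copied from the sample output above.
page = {
    "imageQualityScores": {
        "qualityScore": 0.7811847,
        "detectedDefects": [
            {"type": "quality/defect_document_cutoff", "confidence": 1.0},
            {"type": "quality/defect_glare", "confidence": 0.97849524},
            {"type": "quality/defect_text_cutoff", "confidence": 0.5},
        ],
    }
}


def positive_defects(page: dict, threshold: float = 0.5) -> List[str]:
    """Return defect types whose confidence is strictly above the threshold."""
    scores = page.get("imageQualityScores", {})
    return [
        defect["type"]
        for defect in scores.get("detectedDefects", [])
        if defect.get("confidence", 0.0) > threshold
    ]


defects = positive_defects(page)
print(defects)
# ['quality/defect_document_cutoff', 'quality/defect_glare']
```

The text-cutoff defect is excluded because its confidence is exactly 0.5, which is not greater than the threshold.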

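Language hints

The OCR processor supports language hints that you define to improve OCR engine performance. Applying a language hint lets OCR optimize for the selected languages instead of an inferred language.

Input

Enable this feature by setting ProcessOptions.ocrConfig.hints.languageHints[] to a list of BCP-47 language codes.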

  {
    "rawDocument": {
      "mimeType": "MIME_TYPE",
      "content": "IMAGE_CONTENT"
    },
    "processOptions": {
      "ocrConfig": {
        "hints": {
          "languageHints": ["en", "es"]
        }
      }
    }
  }

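For a full output example, see the sample processor output.

Symbol detection

Populate data at the symbol (individual character) level in the document response.

Input

Enable this feature by setting ProcessOptions.ocrConfig.enableSymbol to true in the processing request.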

  {
    "rawDocument": {
      "mimeType": "MIME_TYPE",
      "content": "IMAGE_CONTENT"
    },
    "processOptions": {
      "ocrConfig": {
        "enableSymbol": true
      }
    }
  }

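Output

When this feature is enabled, the Document.pages[].symbols[] field is populated.

For a full output example, see the sample processor output.

Built-in PDF parsing

Extract embedded text from digital PDF files. When this feature is enabled, the built-in digital PDF model is used automatically for digital text, and the optical OCR model is used automatically for non-digital text. You receive the two text results merged together.

Input

Enable this feature by setting ProcessOptions.ocrConfig.enableNativePdfParsing to true in the processing request.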

  {
    "rawDocument": {
      "mimeType": "MIME_TYPE",
      "content": "IMAGE_CONTENT"
    },
    "processOptions": {
      "ocrConfig": {
        "enableNativePdfParsing": true
      }
    }
  }

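Character-in-the-box detection

By default, Enterprise Document OCR has a detector enabled that improves text-extraction quality for characters that sit inside boxes. Here is an example:

[Image: enterprise-document-ocr-1]

If you experience OCR quality issues with characters inside boxes, you can disable this detector.

Input

Disable this feature by setting ProcessOptions.ocrConfig.disableCharacterBoxesDetection to true in the processing request.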

  {
    "rawDocument": {
      "mimeType": "MIME_TYPE",
      "content": "IMAGE_CONTENT"
    },
    "processOptions": {
      "ocrConfig": {
        "disableCharacterBoxesDetection": true
      }
    }
  }

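Legacy layout

If you need a heuristic layout detection algorithm, you can enable legacy layout, which serves as an alternative to the current ML-based layout detection algorithm. This is not the recommended configuration. You can choose the layout algorithm that best suits your document workflow.

Input

Enable this feature by setting ProcessOptions.ocrConfig.advancedOcrOptions to ["legacy_layout"] in the processing request.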

  {
    "rawDocument": {
      "mimeType": "MIME_TYPE",
      "content": "IMAGE_CONTENT"
    },
    "processOptions": {
      "ocrConfig": {
        "advancedOcrOptions": ["legacy_layout"]
      }
    }
  }

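Specify a page range

By default, OCR extracts text and layout information from every page in the document. You can select specific page numbers or page ranges and extract text from only those pages.

There are three ways to configure this in ProcessOptions:

To process only the second and fifth pages: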

  {
    "individualPageSelector": {"pages": [2, 5]}
  }

To process only the first three pages:

  {
    "fromStart": 3
  }

To process only the last four pages:

  {
    "fromEnd": 4
  }

In the response, each Document.pages[].pageNumber corresponds to the same page specified in the request.
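
The three page-selection options can also be generated programmatically. The following is a minimal sketch, not part of the Document AI client library. `page_selection` is a hypothetical helper, and it assumes the three options are mutually exclusive, so it rejects calls that set none or more than one of them.

```python
from typing import Optional, Sequence


def page_selection(
    pages: Optional[Sequence[int]] = None,
    from_start: Optional[int] = None,
    from_end: Optional[int] = None,
) -> dict:
    """Return the ProcessOptions fragment for exactly one selection mode."""
    options: dict = {}
    if pages is not None:
        options["individualPageSelector"] = {"pages": list(pages)}
    if from_start is not None:
        options["fromStart"] = from_start
    if from_end is not None:
        options["fromEnd"] = from_end
    # Assumption: the API accepts only one page-range option per request.
    if len(options) != 1:
        raise ValueError("Set exactly one of pages, from_start, or from_end")
    return options


print(page_selection(pages=[2, 5]))  # second and fifth pages
print(page_selection(from_start=3))  # first three pages
print(page_selection(from_end=4))    # last four pages
```

The returned fragment is meant to be merged into the processOptions object of a request, next to ocrConfig.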

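Use OCR add-ons

These optional Enterprise Document OCR analysis capabilities can be enabled on individual processing requests as needed.

Math OCR

Math OCR detects, recognizes, and extracts formulas, such as mathematical equations, and returns them as LaTeX along with bounding box coordinates.

Here is an example of the LaTeX representation:

Image detected
Converted to LaTeX

Input

Enable this feature by setting ProcessOptions.ocrConfig.premiumFeatures.enableMathOcr to true in the processing request.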

  {
    "rawDocument": {
      "mimeType": "MIME_TYPE",
      "content": "IMAGE_CONTENT"
    },
    "processOptions": {
      "ocrConfig": {
        "premiumFeatures": {
          "enableMathOcr": true
        }
      }
    }
  }

Output

The Math OCR output appears in Document.pages[].visualElements[] with "type": "math_formula".

"visualElements": [
  {
    "layout": {
      "textAnchor": {
        "textSegments": [
          {
            "endIndex": "46"
          }
        ]
      },
      "confidence": 1,
      "boundingPoly": {
        "normalizedVertices": [
          {
            "x": 0.14662756,
            "y": 0.27891156
          },
          {
            "x": 0.9032258,
            "y": 0.27891156
          },
          {
            "x": 0.9032258,
            "y": 0.8027211
          },
          {
            "x": 0.14662756,
            "y": 0.8027211
          }
        ]
      },
      "orientation": "PAGE_UP"
    },
    "type": "math_formula"
  }
]

You can view the full Document JSON output at this link.
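
The textAnchor in a visual element points into the document-level text rather than carrying the text itself. The following is a minimal sketch of resolving it from the JSON response. `anchor_text` is a hypothetical helper, the document text is made up for illustration, and the sketch assumes that an omitted startIndex means 0 and that indices arrive as strings, as in the sample above.

```python
def anchor_text(layout: dict, document_text: str) -> str:
    """Join the text slices referenced by a layout's text segments."""
    parts = []
    for segment in layout.get("textAnchor", {}).get("textSegments", []):
        # Assumption: a missing startIndex means the segment starts at 0.
        start = int(segment.get("startIndex", 0))
        end = int(segment["endIndex"])
        parts.append(document_text[start:end])
    return "".join(parts)


# Hypothetical document text and layout, for illustration only.
document_text = "E = mc^2\nremaining page text"
layout = {"textAnchor": {"textSegments": [{"endIndex": "8"}]}}

formula = anchor_text(layout, document_text)
print(formula)  # E = mc^2
```

This mirrors the layout_to_text function in the Python sample on this page, but operates on the raw JSON instead of client-library objects.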

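Selection mark extraction

When this feature is enabled, the model attempts to extract all checkboxes and radio buttons in the document, along with their bounding box coordinates.

Input

Enable this feature by setting ProcessOptions.ocrConfig.premiumFeatures.enableSelectionMarkDetection to true in the processing request.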

  {
    "rawDocument": {
      "mimeType": "MIME_TYPE",
      "content": "IMAGE_CONTENT"
    },
    "processOptions": {
      "ocrConfig": {
        "premiumFeatures": {
          "enableSelectionMarkDetection": true
        }
      }
    }
  }

Output

The checkbox output appears in Document.pages[].visualElements[] with "type": "unfilled_checkbox" or "type": "filled_checkbox".

"visualElements": [
  {
    "layout": {
      "confidence": 0.89363575,
      "boundingPoly": {
        "vertices": [
          {
            "x": 11,
            "y": 24
          },
          {
            "x": 37,
            "y": 24
          },
          {
            "x": 37,
            "y": 56
          },
          {
            "x": 11,
            "y": 56
          }
        ],
        "normalizedVertices": [
          {
            "x": 0.017488075,
            "y": 0.38709676
          },
          {
            "x": 0.05882353,
            "y": 0.38709676
          },
          {
            "x": 0.05882353,
            "y": 0.9032258
          },
          {
            "x": 0.017488075,
            "y": 0.9032258
          }
        ]
      }
    },
    "type": "unfilled_checkbox"
  },
  {
    "layout": {
      "confidence": 0.9148201,
      "boundingPoly": ...
    },
    "type": "filled_checkbox"
  }
],

You can view the full Document JSON output at this link.
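
The normalizedVertices values are fractions of the page width and height, so they can be scaled back to pixel coordinates. The following is a minimal sketch. `to_pixels` is a hypothetical helper, and the 629 x 62 pixel page size is an assumption inferred from the ratio between vertices and normalizedVertices in the sample above. In practice, the size would come from the page dimensions in the response.

```python
from typing import List, Tuple


def to_pixels(
    normalized_vertices: List[dict], page_width: float, page_height: float
) -> List[Tuple[int, int]]:
    """Scale normalized (0..1) vertices to integer pixel coordinates."""
    return [
        (
            round(vertex.get("x", 0.0) * page_width),
            round(vertex.get("y", 0.0) * page_height),
        )
        for vertex in normalized_vertices
    ]


# Normalized vertices of the first checkbox in the sample output above.
normalized = [
    {"x": 0.017488075, "y": 0.38709676},
    {"x": 0.05882353, "y": 0.38709676},
    {"x": 0.05882353, "y": 0.9032258},
    {"x": 0.017488075, "y": 0.9032258},
]

# Assumed page size in pixels (inferred, not stated in the response excerpt).
pixels = to_pixels(normalized, page_width=629, page_height=62)
print(pixels)  # [(11, 24), (37, 24), (37, 56), (11, 56)]
```

The result matches the vertices array of the same checkbox, which suggests the two arrays describe the same box in different units.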

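Font style detection

With font style detection enabled, Enterprise Document OCR extracts font attributes, which you can use to improve post-processing.

At the token (word) level, the following attributes are detected:

Handwriting detection
Font style
Font size
Font type
Font color
Font weight
Letter spacing
Bold
Italic
Underlined
Text color (RGBa)
Background color (RGBa)

Input

Enable this feature by setting ProcessOptions.ocrConfig.premiumFeatures.computeStyleInfo to true in the processing request.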

  {
    "rawDocument": {
      "mimeType": "MIME_TYPE",
      "content": "IMAGE_CONTENT"
    },
    "processOptions": {
      "ocrConfig": {
        "premiumFeatures": {
          "computeStyleInfo": true
        }
      }
    }
  }

Output

The font style output appears in Document.pages[].tokens[].styleInfo with the type StyleInfo.

"tokens": [
  {
    "styleInfo": {
      "fontSize": 3,
      "pixelFontSize": 13,
      "fontType": "SANS_SERIF",
      "bold": true,
      "fontWeight": 564,
      "textColor": {
        "red": 0.16862746,
        "green": 0.16862746,
        "blue": 0.16862746
      },
      "backgroundColor": {
        "red": 0.98039216,
        "green": 0.9882353,
        "blue": 0.99215686
      }
    }
  },
  ...
]

You can view the full Document JSON output at this link.
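
The textColor and backgroundColor values in the sample are fractional channels, which post-processing code often needs as hex colors. The following is a minimal sketch. `color_to_hex` is a hypothetical helper, and it assumes each channel lies in the range 0 to 1 and that a missing channel means 0.

```python
def color_to_hex(color: dict) -> str:
    """Convert a color with 0..1 float channels to a #rrggbb string."""
    channels = (
        color.get("red", 0.0),
        color.get("green", 0.0),
        color.get("blue", 0.0),
    )
    # Scale each channel to 0..255 and format it as two hex digits.
    return "#" + "".join(f"{round(channel * 255):02x}" for channel in channels)


# Color values copied from the sample styleInfo output above.
text_color = {"red": 0.16862746, "green": 0.16862746, "blue": 0.16862746}
background_color = {"red": 0.98039216, "green": 0.9882353, "blue": 0.99215686}

print(color_to_hex(text_color))        # #2b2b2b
print(color_to_hex(background_color))  # #fafcfd
```

For the sample token, this yields dark gray text on a near-white background.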

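Convert document objects to the Vision AI API format

The Document AI Toolbox includes a tool that converts the Document AI API Document format to the Vision AI AnnotateFileResponse format, so that you can compare responses between the Document OCR processor and the Vision AI API.

Known discrepancies between the Vision AI API response and the Document AI API response and converter:

The Vision AI API response populates only vertices for image requests and only normalized_vertices for PDF requests. The Document AI response and the converter populate both vertices and normalized_vertices.
The Vision AI API response populates detected_break on the last symbol of a word. The Document AI API response and the converter populate detected_break on the word and on the last symbol of the word.
The Vision AI API response always populates the symbols fields. By default, the Document AI response does not populate the symbols fields. To have the symbols fields populated in the Document AI response and the converter, enable the enable_symbol feature as described earlier on this page.

Code samples

The following code samples show how to send a processing request that enables OCR configurations and add-ons, and then read the fields and print them to the terminal.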

REST

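Before using any of the request data, make the following replacements:

LOCATION: the location of your processor, for example:
- us - United States
- eu - European Union
PROJECT_ID: your Google Cloud project ID.
PROCESSOR_ID: the ID of your custom processor.
PROCESSOR_VERSION: the processor version identifier. For more information, see Select a processor version. For example:
- pretrained-TYPE-vX.X-YYYY-MM-DD
- stable
- rc
skipHumanReview: a boolean that disables human review. Supported only by Human-in-the-Loop processors.
- true - skips human review.
- false - enables human review (default).
MIME_TYPE†: one of the valid MIME type options.
IMAGE_CONTENT†: valid inline document content, represented as a stream of bytes. For JSON representations, this is the base64 encoding (ASCII string) of your binary image data. The string looks similar to the following:
- /9j/4QAYRXhpZgAA...9tAVx/zDQDlGxn//2Q==
For more information, see the Base64 encoding topic.
FIELD_MASK: specifies which fields to include in the Document output. This is a comma-separated list of fully qualified field names in FieldMask format.
- Example: text,entities,pages.pageNumber
OCR configurations
- ENABLE_NATIVE_PDF_PARSING: (Boolean) extracts embedded text from PDFs, if available.
- ENABLE_IMAGE_QUALITY_SCORES: (Boolean) enables intelligent document quality scores.
- ENABLE_SYMBOL: (Boolean) includes symbol (character) OCR information.
- DISABLE_CHARACTER_BOXES_DETECTION: (Boolean) turns off the character box detector in the OCR engine.
- LANGUAGE_HINTS: a list of BCP-47 language codes to use for OCR.
- ADVANCED_OCR_OPTIONS: a list of advanced OCR options that further fine-tune OCR behavior. Current valid values are:
  - legacy_layout: a heuristic layout detection algorithm that serves as an alternative to the current ML-based layout detection algorithm.
Premium OCR add-ons
- ENABLE_SELECTION_MARK_DETECTION: (Boolean) turns on the selection mark detector in the OCR engine.
- COMPUTE_STYLE_INFO: (Boolean) turns on the font identification model and returns font style information.
- ENABLE_MATH_OCR: (Boolean) turns on the model that can extract LaTeX math formulas.
INDIVIDUAL_PAGES: a list of individual pages to process.
- Alternatively, provide the fromStart or fromEnd field to process a specific number of pages from the beginning or the end of the document.

† This content can also be specified using base64-encoded content in the inlineDocument object.

HTTP method and URL: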

POST https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/processors/PROCESSOR_ID/processorVersions/PROCESSOR_VERSION:process

Request JSON body:

{
  "skipHumanReview": skipHumanReview,
  "rawDocument": {
    "mimeType": "MIME_TYPE",
    "content": "IMAGE_CONTENT"
  },
  "fieldMask": "FIELD_MASK",
  "processOptions": {
    "ocrConfig": {
      "enableNativePdfParsing": ENABLE_NATIVE_PDF_PARSING,
      "enableImageQualityScores": ENABLE_IMAGE_QUALITY_SCORES,
      "enableSymbol": ENABLE_SYMBOL,
      "disableCharacterBoxesDetection": DISABLE_CHARACTER_BOXES_DETECTION,
      "hints": {
        "languageHints": [
          "LANGUAGE_HINTS"
        ]
      },
      "advancedOcrOptions": ["ADVANCED_OCR_OPTIONS"],
      "premiumFeatures": {
        "enableSelectionMarkDetection": ENABLE_SELECTION_MARK_DETECTION,
        "computeStyleInfo": COMPUTE_STYLE_INFO,
        "enableMathOcr": ENABLE_MATH_OCR
      }
    },
    "individualPageSelector": {
      "pages": [INDIVIDUAL_PAGES]
    }
  }
}

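To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and run the following command: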

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/processors/PROCESSOR_ID/processorVersions/PROCESSOR_VERSION:process"

PowerShell

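Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and run the following command: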

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/processors/PROCESSOR_ID/processorVersions/PROCESSOR_VERSION:process" | Select-Object -Expand Content

If the request succeeds, the server returns a 200 OK HTTP status code and the response in JSON format. The response body contains an instance of Document.
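
The same request can be prepared from Python with only the standard library. The following is a minimal sketch that builds the request object without sending it. `build_request` is a hypothetical helper, all argument values shown are placeholders, and the access token is assumed to be obtained separately (for example, from gcloud auth print-access-token).

```python
import json
import urllib.request


def build_request(
    location: str,
    project_id: str,
    processor_id: str,
    processor_version: str,
    body: dict,
    access_token: str,
) -> urllib.request.Request:
    """Build (but do not send) the POST request for the :process method."""
    url = (
        f"https://{location}-documentai.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"processors/{processor_id}/processorVersions/{processor_version}:process"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


# Placeholder values for illustration only.
request = build_request(
    "us", "PROJECT_ID", "PROCESSOR_ID", "rc",
    body={"processOptions": {"ocrConfig": {"enableSymbol": True}}},
    access_token="ACCESS_TOKEN",
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it. That step is left out here because it requires valid credentials and an existing processor.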

Python

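For more information, see the Document AI Python API reference documentation.

To authenticate to Document AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.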


from typing import Optional, Sequence

from google.api_core.client_options import ClientOptions
from google.cloud import documentai

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_PROCESSOR_LOCATION" # Format is "us" or "eu"
# processor_id = "YOUR_PROCESSOR_ID" # Create processor before running sample
# processor_version = "rc" # Refer to https://cloud.google.com/document-ai/docs/manage-processor-versions for more information
# file_path = "/path/to/local/pdf"
# mime_type = "application/pdf" # Refer to https://cloud.google.com/document-ai/docs/file-types for supported file types


def process_document_ocr_sample(
    project_id: str,
    location: str,
    processor_id: str,
    processor_version: str,
    file_path: str,
    mime_type: str,
) -> None:
    # Optional: Additional configurations for Document OCR Processor.
    # For more information: https://cloud.google.com/document-ai/docs/enterprise-document-ocr
    process_options = documentai.ProcessOptions(
        ocr_config=documentai.OcrConfig(
            enable_native_pdf_parsing=True,
            enable_image_quality_scores=True,
            enable_symbol=True,
            # OCR Add Ons https://cloud.google.com/document-ai/docs/ocr-add-ons
            premium_features=documentai.OcrConfig.PremiumFeatures(
                compute_style_info=True,
                enable_math_ocr=False,  # Enable to use Math OCR Model
                enable_selection_mark_detection=True,
            ),
        )
    )
    # Online processing request to Document AI
    document = process_document(
        project_id,
        location,
        processor_id,
        processor_version,
        file_path,
        mime_type,
        process_options=process_options,
    )

    text = document.text
    print(f"Full document text: {text}\n")
    print(f"There are {len(document.pages)} page(s) in this document.\n")

    for page in document.pages:
        print(f"Page {page.page_number}:")
        print_page_dimensions(page.dimension)
        print_detected_languages(page.detected_languages)

        print_blocks(page.blocks, text)
        print_paragraphs(page.paragraphs, text)
        print_lines(page.lines, text)
        print_tokens(page.tokens, text)

        if page.symbols:
            print_symbols(page.symbols, text)

        if page.image_quality_scores:
            print_image_quality_scores(page.image_quality_scores)

        if page.visual_elements:
            print_visual_elements(page.visual_elements, text)


def print_page_dimensions(dimension: documentai.Document.Page.Dimension) -> None:
    print(f"    Width: {str(dimension.width)}")
    print(f"    Height: {str(dimension.height)}")


def print_detected_languages(
    detected_languages: Sequence[documentai.Document.Page.DetectedLanguage],
) -> None:
    print("    Detected languages:")
    for lang in detected_languages:
        print(f"        {lang.language_code} ({lang.confidence:.1%} confidence)")


def print_blocks(blocks: Sequence[documentai.Document.Page.Block], text: str) -> None:
    print(f"    {len(blocks)} blocks detected:")
    first_block_text = layout_to_text(blocks[0].layout, text)
    print(f"        First text block: {repr(first_block_text)}")
    last_block_text = layout_to_text(blocks[-1].layout, text)
    print(f"        Last text block: {repr(last_block_text)}")


def print_paragraphs(
    paragraphs: Sequence[documentai.Document.Page.Paragraph], text: str
) -> None:
    print(f"    {len(paragraphs)} paragraphs detected:")
    first_paragraph_text = layout_to_text(paragraphs[0].layout, text)
    print(f"        First paragraph text: {repr(first_paragraph_text)}")
    last_paragraph_text = layout_to_text(paragraphs[-1].layout, text)
    print(f"        Last paragraph text: {repr(last_paragraph_text)}")


def print_lines(lines: Sequence[documentai.Document.Page.Line], text: str) -> None:
    print(f"    {len(lines)} lines detected:")
    first_line_text = layout_to_text(lines[0].layout, text)
    print(f"        First line text: {repr(first_line_text)}")
    last_line_text = layout_to_text(lines[-1].layout, text)
    print(f"        Last line text: {repr(last_line_text)}")


def print_tokens(tokens: Sequence[documentai.Document.Page.Token], text: str) -> None:
    print(f"    {len(tokens)} tokens detected:")
    first_token_text = layout_to_text(tokens[0].layout, text)
    first_token_break_type = tokens[0].detected_break.type_.name
    print(f"        First token text: {repr(first_token_text)}")
    print(f"        First token break type: {repr(first_token_break_type)}")
    if tokens[0].style_info:
        print_style_info(tokens[0].style_info)

    last_token_text = layout_to_text(tokens[-1].layout, text)
    last_token_break_type = tokens[-1].detected_break.type_.name
    print(f"        Last token text: {repr(last_token_text)}")
    print(f"        Last token break type: {repr(last_token_break_type)}")
    if tokens[-1].style_info:
        print_style_info(tokens[-1].style_info)


def print_symbols(
    symbols: Sequence[documentai.Document.Page.Symbol], text: str
) -> None:
    print(f"    {len(symbols)} symbols detected:")
    first_symbol_text = layout_to_text(symbols[0].layout, text)
    print(f"        First symbol text: {repr(first_symbol_text)}")
    last_symbol_text = layout_to_text(symbols[-1].layout, text)
    print(f"        Last symbol text: {repr(last_symbol_text)}")


def print_image_quality_scores(
    image_quality_scores: documentai.Document.Page.ImageQualityScores,
) -> None:
    print(f"    Quality score: {image_quality_scores.quality_score:.1%}")
    print("    Detected defects:")

    for detected_defect in image_quality_scores.detected_defects:
        print(f"        {detected_defect.type_}: {detected_defect.confidence:.1%}")


def print_style_info(style_info: documentai.Document.Page.Token.StyleInfo) -> None:
    """
    Only supported in `pretrained-ocr-v2.0-2023-06-02` and later versions
    """
    print(f"           Font Size: {style_info.font_size}pt")
    print(f"           Font Type: {style_info.font_type}")
    print(f"           Bold: {style_info.bold}")
    print(f"           Italic: {style_info.italic}")
    print(f"           Underlined: {style_info.underlined}")
    print(f"           Handwritten: {style_info.handwritten}")
    print(
        f"           Text Color (RGBa): {style_info.text_color.red}, {style_info.text_color.green}, {style_info.text_color.blue}, {style_info.text_color.alpha}"
    )


def print_visual_elements(
    visual_elements: Sequence[documentai.Document.Page.VisualElement], text: str
) -> None:
    """
    Only supported in `pretrained-ocr-v2.0-2023-06-02` and later versions
    """
    checkboxes = [x for x in visual_elements if "checkbox" in x.type]
    math_symbols = [x for x in visual_elements if x.type == "math_formula"]

    if checkboxes:
        print(f"    {len(checkboxes)} checkboxes detected:")
        print(f"        First checkbox: {repr(checkboxes[0].type)}")
        print(f"        Last checkbox: {repr(checkboxes[-1].type)}")

    if math_symbols:
        print(f"    {len(math_symbols)} math symbols detected:")
        first_math_symbol_text = layout_to_text(math_symbols[0].layout, text)
        print(f"        First math symbol: {repr(first_math_symbol_text)}")




def process_document(
    project_id: str,
    location: str,
    processor_id: str,
    processor_version: str,
    file_path: str,
    mime_type: str,
    process_options: Optional[documentai.ProcessOptions] = None,
) -> documentai.Document:
    # You must set the `api_endpoint` if you use a location other than "us".
    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-documentai.googleapis.com"
        )
    )

    # The full resource name of the processor version, e.g.:
    # `projects/{project_id}/locations/{location}/processors/{processor_id}/processorVersions/{processor_version_id}`
    # You must create a processor before running this sample.
    name = client.processor_version_path(
        project_id, location, processor_id, processor_version
    )

    # Read the file into memory
    with open(file_path, "rb") as image:
        image_content = image.read()

    # Configure the process request
    request = documentai.ProcessRequest(
        name=name,
        raw_document=documentai.RawDocument(content=image_content, mime_type=mime_type),
        # Only supported for Document OCR processor
        process_options=process_options,
    )

    result = client.process_document(request=request)

    # For a full list of `Document` object attributes, reference this page:
    # https://cloud.google.com/document-ai/docs/reference/rest/v1/Document
    return result.document




def layout_to_text(layout: documentai.Document.Page.Layout, text: str) -> str:
    """
    Document AI identifies text in different parts of the document by their
    offsets in the entirety of the document's text. This function converts
    offsets to a string.
    """
    # If a text segment spans several lines, it will
    # be stored in different text segments.
    return "".join(
        text[int(segment.start_index) : int(segment.end_index)]
        for segment in layout.text_anchor.text_segments
    )

What's next

Review the processors list.
Separate documents into readable chunks with Layout Parser.
Create a custom classifier.