Class TextAnnotation (1.0.2)

TextAnnotation contains a structured representation of OCR-extracted text. The hierarchy of an OCR-extracted text structure is: TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol. Each structural component, starting from Page, may also have its own properties. Properties describe detected languages, breaks, and so on. Refer to the TextAnnotation.TextProperty message definition below for more detail.

text: UTF-8 text detected on the pages.
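The nesting described above can be walked with one loop per level. The sketch below is an illustration only: it uses hypothetical dataclass stand-ins that mirror the field names `pages`, `blocks`, `paragraphs`, `words`, and `symbols` rather than the library's own message classes, and `iter_words` and `make_word` are hypothetical helpers. The same loops should apply to a real response with those field names, but that is not verified here.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


# Hypothetical stand-ins mirroring the nested fields of the OCR result.
@dataclass
class Symbol:
    text: str


@dataclass
class Word:
    symbols: List[Symbol] = field(default_factory=list)


@dataclass
class Paragraph:
    words: List[Word] = field(default_factory=list)


@dataclass
class Block:
    paragraphs: List[Paragraph] = field(default_factory=list)


@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)


@dataclass
class TextAnnotation:
    pages: List[Page] = field(default_factory=list)
    text: str = ""  # UTF-8 text detected on the pages


def iter_words(annotation: TextAnnotation) -> Iterator[str]:
    """Walk Page -> Block -> Paragraph -> Word and yield each word's text."""
    for page in annotation.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    # A word's text is the concatenation of its symbols.
                    yield "".join(symbol.text for symbol in word.symbols)


def make_word(s: str) -> Word:
    """Hypothetical helper: build a Word with one Symbol per character."""
    return Word(symbols=[Symbol(ch) for ch in s])


annotation = TextAnnotation(
    pages=[Page(blocks=[Block(paragraphs=[
        Paragraph(words=[make_word("Hello"), make_word("OCR")]),
    ])])],
    text="Hello OCR",
)
print(list(iter_words(annotation)))  # ['Hello', 'OCR']
```

Words are rebuilt from symbols here rather than read from `text`, because the hierarchy is what carries per-component properties such as breaks and languages.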

Classes

DetectedBreak

Detected start or end of a structural component.

is_prefix: True if the break precedes the element.
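A break can be turned into a separator and placed before or after its element according to `is_prefix`. The following is a minimal sketch under stated assumptions: the `BreakType` values are modeled on the Vision API's break-type enum, the `SEPARATORS` mapping is a hypothetical rendering choice, and the `detected_break` field directly on `Symbol` is a hypothetical flattening (in the message definitions the break sits inside the element's TextProperty).

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class BreakType(IntEnum):
    # Values modeled on the Vision API's break-type enum.
    UNKNOWN = 0
    SPACE = 1
    SURE_SPACE = 2
    EOL_SURE_SPACE = 3
    HYPHEN = 4
    LINE_BREAK = 5


@dataclass
class DetectedBreak:
    type: BreakType = BreakType.UNKNOWN
    is_prefix: bool = False  # True if the break precedes the element


@dataclass
class Symbol:
    text: str
    # Hypothetical flattening of the symbol's property.detected_break.
    detected_break: Optional[DetectedBreak] = None


# Hypothetical rendering choice; unmapped types (e.g. HYPHEN) become "".
SEPARATORS = {
    BreakType.SPACE: " ",
    BreakType.SURE_SPACE: " ",
    BreakType.EOL_SURE_SPACE: "\n",
    BreakType.LINE_BREAK: "\n",
}


def join_symbols(symbols: List[Symbol]) -> str:
    """Concatenate symbols, inserting separators for detected breaks."""
    parts: List[str] = []
    for sym in symbols:
        brk = sym.detected_break
        sep = SEPARATORS.get(brk.type, "") if brk else ""
        if brk and brk.is_prefix:
            parts.extend([sep, sym.text])  # break comes before the symbol
        else:
            parts.extend([sym.text, sep])  # break comes after the symbol
    return "".join(parts)


symbols = [
    Symbol("H"),
    Symbol("i", DetectedBreak(BreakType.SPACE)),
    Symbol("t"), Symbol("h"), Symbol("e"), Symbol("r"),
    Symbol("e", DetectedBreak(BreakType.LINE_BREAK)),
]
print(repr(join_symbols(symbols)))  # 'Hi there\n'
```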

DetectedLanguage

Detected language for a structural component.

confidence: Confidence of the detected language, in the range [0, 1].
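Because each detected language carries a confidence in [0, 1], a component's most likely language can be chosen by taking the maximum. This is a sketch only: `DetectedLanguage` and `TextProperty` below are hypothetical stand-ins with `language_code`, `confidence`, and `detected_languages` fields, and `best_language` with its `min_confidence` threshold is a hypothetical helper, not part of the library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DetectedLanguage:
    language_code: str  # e.g. "en"
    confidence: float   # in the range [0, 1]


@dataclass
class TextProperty:
    detected_languages: List[DetectedLanguage] = field(default_factory=list)


def best_language(prop: TextProperty, min_confidence: float = 0.0) -> Optional[str]:
    """Return the most confident language code, or None if none qualifies."""
    candidates = [
        lang for lang in prop.detected_languages
        if lang.confidence >= min_confidence
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda lang: lang.confidence).language_code


prop = TextProperty(detected_languages=[
    DetectedLanguage("en", 0.9),
    DetectedLanguage("fr", 0.3),
])
print(best_language(prop))  # en
print(best_language(prop, min_confidence=0.95))  # None
```

Returning `None` rather than a default code leaves the caller to decide how to treat components whose languages are missing or uncertain.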

TextProperty

Additional information detected on the structural component.

detected_break: Detected start or end of a text segment.