Document splitter behavior

General splitter behavior

A splitter identifies where the input document should be split and reports a confidence score for each proposed split. It does not split the document for you. The Document AI API returns a Document JSON object in which each proposed split is represented as an entry in the entities field. Any additional information in the output depends on the specific type of splitter.

Output examples:

Document

{
  "text": "page1 page2 page3",
  "entities": [
    {
      "type": "",
      "confidence": 0.9,
      "text_anchor": {
        "text_segments": [
          {
            "start_index": 0,
            "end_index": 12
          }
        ]
      },
      "page_anchor": {
        "page_refs": [
          {
            "page": 0
          },
          {
            "page": 1
          }
        ]
      }
    },
    {
      "type": "",
      "confidence": 0.8,
      "text_anchor": {
        "text_segments": [
          {
            "start_index": 12,
            "end_index": 18
          }
        ]
      },
      "page_anchor": {
        "page_refs": [
          {
            "page": 2
          }
        ]
      }
    }
  ]
}
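
Because the splitter only describes the splits, any actual splitting is left to the caller. The following is a minimal Python sketch of how the entities in the example above could be turned into per-split page lists and text. It uses only the standard library. The function name summarize_splits and the DOCUMENT_JSON constant are hypothetical names chosen for illustration, not part of the Document AI API. It also assumes that a missing page or start_index means 0 and that index values may arrive as strings, so it converts them with int().

```python
import json

# Sample splitter output in the shape shown above (illustrative values).
DOCUMENT_JSON = """
{
  "text": "page1 page2 page3",
  "entities": [
    {"type": "", "confidence": 0.9,
     "text_anchor": {"text_segments": [{"start_index": 0, "end_index": 12}]},
     "page_anchor": {"page_refs": [{"page": 0}, {"page": 1}]}},
    {"type": "", "confidence": 0.8,
     "text_anchor": {"text_segments": [{"start_index": 12, "end_index": 18}]},
     "page_anchor": {"page_refs": [{"page": 2}]}}
  ]
}
"""


def summarize_splits(document):
    """Return one summary dict per split entity in a Document dict.

    Hypothetical helper: it reads only the fields shown in the example.
    """
    full_text = document.get("text", "")
    splits = []
    for entity in document.get("entities", []):
        segments = entity.get("text_anchor", {}).get("text_segments", [])
        if isinstance(segments, dict):
            # Tolerate a single segment object instead of a list.
            segments = [segments]
        page_refs = entity.get("page_anchor", {}).get("page_refs", [])
        splits.append({
            "type": entity.get("type", ""),
            "confidence": entity.get("confidence"),
            # Assumption: an omitted "page" or "start_index" means 0.
            "pages": [int(ref.get("page", 0)) for ref in page_refs],
            # Join the text covered by every segment of this split.
            "text": "".join(
                full_text[int(seg.get("start_index", 0)):int(seg.get("end_index", 0))]
                for seg in segments
            ),
        })
    return splits


if __name__ == "__main__":
    for split in summarize_splits(json.loads(DOCUMENT_JSON)):
        print(split["pages"], split["confidence"], repr(split["text"]))
```

For the sample document, the first split covers pages 0 and 1 and the second covers page 2. The page lists can then be passed to whatever PDF or image tool you use to write each split to its own file. You may also want to compare the confidence value against a threshold of your choosing before acting on a split.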