Document Entity Extraction

Extract a set of entities from a document and return them in JSON.

You can query a model directly and test the results returned when using different parameter values with the Cloud console, or by calling the Vertex AI API directly.

System instructions
As an expert in document entity extraction, you parse documents to identify and organize specific entities from diverse sources into structured formats, following detailed guidelines for clarity and completeness.

Prompt Freeform

earnings_statement.pdf

You are a document entity extraction specialist. Given a document, your task is to extract the text value of the following entities:
{
 "earning_item": [
  {
   "earning_rate": "",
   "earning_hours": "",
   "earning_type": "",
   "earning_this_period": ""
  }
 ],
 "direct_deposit_item": [
  {
   "direct_deposit": "",
   "employee_account_number": ""
  }
 ],
 "current_deduction": "",
 "ytd_deduction": "",
 "employee_id": "",
 "employee_name": "",
 "employer_name": "",
 "employer_address": "",
 "federal_additional_tax": "",
 "federal_allowance": "",
 "federal_marital_status": "",
 "gross_earnings": "",
 "gross_earnings_ytd": "",
 "net_pay": "",
 "net_pay_ytd": "",
 "ssn": "",
 "pay_date": "",
 "pay_period_end": "",
 "pay_period_start": "",
 "state_additional_tax": "",
 "state_allowance": "",
 "state_marital_status": "",
 "tax_item": [
  {
   "tax_this_period": "",
   "tax_type": "",
   "tax_ytd": ""
  }
 ]
}

  • The JSON schema must be followed during the extraction.
  • The values must only include text strings found in the document.
  • Generate null for missing entities.

Response

{
"earning_item": [
{
"earning_rate": "20",
"earning_hours": "80",
"earning_type": "regular pay",
"earning_this_period": "1,600.00"
}
],
"direct_deposit_item": [],
"current_deduction": "160.00",
"ytd_deduction": "1,920.00",
"employee_id": "123456",
"employee_name": "Janet Doe",
"employer_name": "The Greatest Company LLC",
"employer_address": "176 Imaginary Ave\nCambridge, ΜΑ 02138",
"federal_additional_tax": null,
"federal_allowance": null,
"federal_marital_status": null,
"gross_earnings": "1,600.00",
"gross_earnings_ytd": "19,200.00",
"net_pay": "1,060.80",
"net_pay_ytd": "12,729.60",
"ssn": "XXX-XX-1234",
"pay_date": "12/15/17",
"pay_period_end": "12/12/17",
"pay_period_start": "11/10/17",
"state_additional_tax": null,
"state_allowance": null,
"state_marital_status": null,
"tax_item": [
{
"tax_this_period": "20.80",
"tax_type": "FICA MED TAX",
"tax_ytd": "249.60"
},
{
"tax_this_period": "99.20",
"tax_type": "FICA SS TAX",
"tax_ytd": "1190.40"
},
{
"tax_this_period": "179.20",
"tax_type": "FED TAX",
"tax_ytd": "2150.40"
},
{
"tax_this_period": "80.00",
"tax_type": "MA ST TAX",
"tax_ytd": "960.00"
}
]
}
Model: gemini-1.5-flash-001
Temperature: 1
Max output tokens: 8192
TopK: 40
TopP: 0.95