Toolbox - Output table to Dataframe or CSV
Export tables from a processed document (or document shards) to a pandas DataFrame, then save them as CSV, HTML, or Markdown files.
Explore further
---------------

For detailed documentation that includes this code sample, see the following:

- [Document AI Toolbox client libraries](/document-ai/docs/toolbox)
- [Handle processing response](/document-ai/docs/handle-response)

Code sample
-----------

### Python

For more information, see the
[Document AI Python API reference documentation](/python/docs/reference/documentai/latest).

To authenticate to Document AI, set up Application Default Credentials.
For more information, see
[Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).

    from google.cloud.documentai_toolbox import document

    # TODO(developer): Uncomment these variables before running the sample.
    # Given a local document.proto or sharded document.proto in path
    # document_path = "path/to/local/document.json"
    # output_file_prefix = "output/table"


    def table_sample(document_path: str, output_file_prefix: str) -> None:
        wrapped_document = document.Document.from_document_path(document_path=document_path)

        print("Tables in Document")
        for page in wrapped_document.pages:
            for table_index, table in enumerate(page.tables):
                # Convert table to Pandas Dataframe
                # Refer to https://pandas.pydata.org/docs/reference/frame.html for all supported methods
                df = table.to_dataframe()
                print(df)

                output_filename = f"{output_file_prefix}-{page.page_number}-{table_index}"

                # Write Dataframe to CSV file
                df.to_csv(f"{output_filename}.csv", index=False)

                # Write Dataframe to HTML file
                df.to_html(f"{output_filename}.html", index=False)

                # Write Dataframe to Markdown file
                df.to_markdown(f"{output_filename}.md", index=False)

What's next
-----------

To search and filter code samples for other Google Cloud products, see the
[Google Cloud sample browser](/docs/samples?product=documentai).

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
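The sample writes each table to `{output_file_prefix}-{page_number}-{table_index}` with a `.csv`, `.html`, or `.md` suffix. With a prefix such as `output/table`, the `output` directory has to exist before the first write, because pandas file writers generally do not create missing parent directories. The following is a minimal sketch of one way to prepare those paths. `table_output_paths` is a hypothetical helper written for illustration and is not part of Document AI Toolbox or pandas; it uses only the Python standard library.

```python
from pathlib import Path
from typing import Dict


def table_output_paths(
    output_file_prefix: str, page_number: int, table_index: int
) -> Dict[str, Path]:
    """Return CSV, HTML, and Markdown paths for one table.

    Hypothetical helper (an assumption, not a library API). It mirrors the
    "{prefix}-{page_number}-{table_index}" naming used in the sample and
    creates the parent directory if it is missing.
    """
    stem = f"{output_file_prefix}-{page_number}-{table_index}"

    # Create the parent directory (for example "output/") so that later
    # writes do not fail on a missing directory.
    Path(stem).parent.mkdir(parents=True, exist_ok=True)

    # Append each suffix as plain text so dots inside the prefix are preserved.
    return {ext: Path(f"{stem}.{ext}") for ext in ("csv", "html", "md")}
```

Inside the sample's inner loop, this could be used as `paths = table_output_paths(output_file_prefix, page.page_number, table_index)` followed by `df.to_csv(paths["csv"], index=False)`, since the pandas writers accept path objects. Note that `DataFrame.to_markdown` depends on the optional `tabulate` package, so the Markdown export needs it installed.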