Making of a Jump Start Solution: Generative AI for document summarization
Yvonne Li
ML Engineer - DevRel
At Google Cloud Next, we announced Jump Start Solutions. Each Jump Start Solution is a simple-to-deploy architecture and application that makes it faster to get started with Google Cloud. These pre-built solutions also include interactive tutorials and a guide to teach you all about the products used and help you learn how to modify the solution for your use case.
Let's look deeper at one of the Jump Start Solutions: Generative AI Document Summarization. Yvonne Li from the engineering team answered questions and shared her insights into the factors that influenced the design, the challenges they faced, and what she recommends for learners who want to modify this solution.
[Interviewer] What types of use cases and problems can the Jump Start Solution address?
[Yvonne] Here is a classic scenario this Jump Start Solution is designed to help with.
Many large enterprises have countless documents stored as PDFs. Whenever employees need to locate data, they must visually scan through files. This process can be frustrating and time-consuming for employees and costly for the company.
Generative AI Document Summarization leverages Vertex AI generative AI large language models (LLMs) to process and summarize documents on demand.
[Interviewer] What should folks expect to learn or be able to do once they've deployed this Jump Start Solution?
[Yvonne] By deploying the Generative AI Document Summarization solution, you will be able to:
- Understand how the Generative AI Document Summarization application works.
- Deploy an application that orchestrates the document summarization process.
- Trigger a pipeline with a PDF upload and view a generated summary.
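The upload trigger in the last bullet can be pictured as a small event handler. What follows is a minimal sketch, not the solution's actual code. It assumes the function receives the uploaded object's metadata as a dictionary with `bucket`, `name`, and `contentType` fields; `should_summarize` and `handle_upload` are hypothetical names chosen for illustration.

```python
from typing import Optional


def should_summarize(event_data: dict) -> bool:
    """Return True when the uploaded object looks like a PDF."""
    name = event_data.get("name", "")
    content_type = event_data.get("contentType", "")
    # Accept either an explicit PDF content type or a .pdf file extension.
    return content_type == "application/pdf" or name.lower().endswith(".pdf")


def handle_upload(event_data: dict) -> Optional[str]:
    """Return the gs:// URI of the file to summarize, or None to skip it."""
    if not should_summarize(event_data):
        return None
    return f"gs://{event_data['bucket']}/{event_data['name']}"
```

In a deployed function, the returned URI would be handed to the text extraction and summarization steps; here the handler only decides whether an upload should start the pipeline.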
[Interviewer] Why did you pick the architecture, frameworks, and languages you did?
[Yvonne] We chose the Vertex AI PaLM API because it supported our use case of accepting and summarizing ad hoc user submissions.
For this Jump Start Solution, we picked Cloud Functions as the process runner over Cloud Run. Here are a few reasons why:
- Simplicity: Cloud Functions are more straightforward to write than Cloud Run services. You only need to write your business logic; the platform handles things like web requests, which makes it easier for developers to get started with the Jump Start Solution.
- Cost: Cloud Functions can be more cost-effective than Cloud Run for workloads that do not require many resources. This is because Cloud Functions billing is driven by the number of invocations and the compute time each one uses, while Cloud Run bills for the CPU and memory allocated to container instances.
As for the language, we chose Python as it is a popular choice with data scientists and ML practitioners. The Python SDK made it very easy to work with the PaLM API.
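To give a feel for how little code a summarization call can take in Python, here is a minimal sketch under stated assumptions. The model is passed in as any object with a `predict(prompt, **params)` method whose result has a `.text` attribute, which is our understanding of how the Vertex AI SDK's text generation model behaves. The prompt wording, `build_prompt`, `summarize`, and the default parameter values are hypothetical and are not taken from the solution's code.

```python
from typing import Protocol


class TextModel(Protocol):
    """Any model whose predict() returns an object with a .text attribute."""

    def predict(self, prompt: str, **params): ...


def build_prompt(document_text: str) -> str:
    """Wrap the document text in a simple summarization instruction."""
    return (
        "Provide a short summary of the following text.\n\n"
        f"Text: {document_text.strip()}\n\n"
        "Summary:"
    )


def summarize(
    model: TextModel,
    document_text: str,
    max_output_tokens: int = 256,   # illustrative default, not a recommendation
    temperature: float = 0.2,       # illustrative default, not a recommendation
) -> str:
    """Send the prompt to the model and return the cleaned-up summary text."""
    response = model.predict(
        build_prompt(document_text),
        max_output_tokens=max_output_tokens,
        temperature=temperature,
    )
    return response.text.strip()
```

Passing the model in as a parameter keeps the function easy to test with a stand-in object and makes it simple to swap models later.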
[Interviewer] Did you run into any interesting challenges? How did you overcome them?
[Yvonne] We ran into issues with document preprocessing because the input content varies so widely. Data cleaning is a real pain point: passing the raw text from an optical character recognition (OCR) scan directly into an LLM does not produce an informative result.
In the current solution, we assume the input file is similar to a research paper. It is divided into sections, and we extract and preprocess those sections using a deliberate heuristic.
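A section-splitting heuristic of this kind might look like the sketch below. It is an illustration only, not the heuristic the solution uses: the heading list in `SECTION_NAMES`, the regular expression, and the `split_sections` function are all assumptions, and real OCR output would need more cleanup than this.

```python
import re
from typing import Dict, List

# Hypothetical heading names; a real corpus would need its own list.
SECTION_NAMES = ("abstract", "introduction", "methods", "results", "discussion", "conclusion")

# Matches a line that is only a heading, optionally numbered ("1. Introduction")
# and optionally plural ("Conclusions").
_HEADING = re.compile(
    r"^\s*(?:\d+\.?\s+)?(" + "|".join(SECTION_NAMES) + r")s?\s*$",
    re.IGNORECASE,
)


def split_sections(text: str) -> Dict[str, str]:
    """Group non-empty lines under the most recent recognized heading."""
    sections: Dict[str, List[str]] = {}
    current = "other"  # text before the first heading (title, authors, ...)
    for line in text.splitlines():
        match = _HEADING.match(line)
        if match:
            current = match.group(1).lower()
            sections.setdefault(current, [])
        elif line.strip():
            sections.setdefault(current, []).append(line.strip())
    return {name: " ".join(lines) for name, lines in sections.items()}
```

With the text grouped this way, sections such as the abstract and conclusion can be passed to the model on their own instead of the whole OCR output.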
[Interviewer] What changes would you make, and what would you add if you were going to take this solution to production?
[Yvonne] I would change the data preparation process that runs before data is ingested into PaLM 2. Currently, we assume that the input PDF is similar to a research paper, and we manually split and clean the data into subsections such as an abstract, a conclusion, and others. However, in real-world scenarios, you may need to adapt this process to fit your specific data needs.
For more consistently structured input PDFs — say, like forms — a better option would be to use Document AI.
[Interviewer] What pleasantly surprised you?
[Yvonne] The cost of this Jump Start Solution (including calling the PaLM model) is reasonable, considering the application's capabilities and the resources required to run it. But be careful; the cost of this application depends on the size of the input PDF file.
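To see how input size feeds into cost, here is a rough back-of-the-envelope sketch. It assumes the model charge scales with the number of input and output characters; the function name `estimate_model_cost` is hypothetical, and the rates are placeholders supplied by the caller, so check the current pricing page for real values.

```python
def estimate_model_cost(
    input_chars: int,
    output_chars: int,
    input_rate_per_1k: float,
    output_rate_per_1k: float,
) -> float:
    """Estimate the model charge, in the same currency unit as the rates.

    Rates are per 1,000 characters and must come from the caller; no real
    prices are built into this function.
    """
    if input_chars < 0 or output_chars < 0:
        raise ValueError("character counts must be non-negative")
    input_cost = (input_chars / 1000) * input_rate_per_1k
    output_cost = (output_chars / 1000) * output_rate_per_1k
    return input_cost + output_cost
```

Because summaries are short, the input term usually dominates under this assumption, which is why a much larger PDF translates into a noticeably larger charge. Other charges, such as OCR, storage, and function invocations, are not covered by this estimate.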
[Interviewer] If folks want to learn more, are there any additional resources you recommend?
[Yvonne] People who want to learn more about Generative AI can check out the Generative AI for Developers Learning Path.
To try out the Document Summarization with Generative AI Jump Start Solution, you can deploy it from the solution catalog. You can also read the guide or look at its code on GitHub.