Using Artificial Intelligence to pull summarized content from large document collections

Overview

The Digital Transformation Hub at Cal Poly (DxHub), in partnership with Amazon Web Services (AWS), has collaborated with the Data Lab team at the World Bank to create and launch an innovative AI-powered chatbot named ‘Pluto’. This chatbot distills over 75 years of developmental knowledge encapsulated within thousands of publicly accessible reports. Pluto enables development teams to efficiently access advice and insights derived from previous development projects, such as those focused on wastewater and water treatment. For instance, by querying Pluto with a request like “give me the top 3 lessons learned on clean water projects in Africa”, users receive synthesized, high-quality responses along with links to the source documents for in-depth exploration. This approach maximizes the strengths of AI technology by leveraging easier search mechanisms and the summarization of aggregate data while mitigating hallucinations.

Problem

Historically, governmental project teams in countries receiving World Bank loans have faced challenges in accessing comprehensive lessons from past development projects. The vast public documentation available was cumbersome to navigate, making it a daunting task to extract relevant lessons and advice applicable to new initiatives. The solution being developed aims to overcome this barrier by offering an easy-to-use, interactive interface. This interface is built atop an intelligent search engine that probes the extensive archives, records, and data of the World Bank, streamlining access to invaluable insights. For the purpose of the pilot, the team worked with a subset of data that focused on water quality projects.

Innovation in action

The team applied the Amazon Working Backwards process, first clearly framing the problem and the users to solve for, then interviewing multiple prospective users to assess their pain points when setting up new development projects. This work laid the foundation for the design of the tool and ensured alignment with the needs of real-world development experts. The World Bank team envisioned the MVP (minimum viable product) of the solution by writing a fictitious, future-oriented press release that detailed the launch of the solution while clearly describing the need and the new customer experience that would be created. From there, the team quickly got to work on the design and implementation details of the solution. In an exciting twist to the innovation process, the team received Early Access (EA) to the new AI-enabling Amazon Bedrock service and its marketplace of large language models (LLMs), specifically Anthropic’s Claude model, which powers the prototype.

Solution

To accomplish the vision, the team began by using the publicly available World Bank APIs to pull public documents and metadata attributes into Amazon Simple Storage Service (S3). The Pluto chatbot then processes user queries through Amazon Kendra, leveraging its semantic ranking capabilities and the document attributes provided by the API to enhance relevance. By searching the more than 4,000 public documents that were transferred to S3, the chatbot gets back a ranked list of relevant documents and excerpt text. Using a Retrieval Augmented Generation (RAG) workflow, the document contents are passed to Amazon Bedrock. Amazon Bedrock provides access to 6 different foundation models via an API call, all within the privacy and security of your own account. The RAG workflow allows pertinent document contents to be summarized into natural language responses by Anthropic’s Claude v2 LLM, accessed through Amazon Bedrock. This process also supports continued user interaction and clarification queries. Importantly, all responses are backed by links to the source documents, which gives users confidence in the source and greatly reduces the hallucinations that LLMs can be prone to. The World Bank team has leveraged this pilot to continue exploring how best to use AI to help its clients discover knowledge more easily.
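The retrieve-then-summarize flow described above can be illustrated with a minimal Python sketch. This is not the Pluto implementation: the names `Excerpt`, `build_prompt`, and `answer`, the prompt wording, and the `top_k` parameter are all assumptions made for illustration. The search step and the model call are passed in as plain functions, so in a real deployment they would wrap Amazon Kendra and Amazon Bedrock respectively; here they are left abstract so the sketch runs without any AWS account.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Excerpt:
    """One search hit: document title, source link, and excerpt text (hypothetical shape)."""
    title: str
    url: str
    text: str


def build_prompt(question: str, excerpts: List[Excerpt]) -> str:
    """Number each excerpt so the model can cite sources as [1], [2], ..."""
    context = "\n\n".join(
        f"[{i}] {e.title}\n{e.text}" for i, e in enumerate(excerpts, start=1)
    )
    return (
        "Answer the question using only the excerpts below. "
        "Cite excerpts by number, and say so if they do not contain the answer.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )


def answer(
    question: str,
    retrieve: Callable[[str], List[Excerpt]],  # assumed to wrap the search service
    generate: Callable[[str], str],            # assumed to wrap the LLM call
    top_k: int = 3,
) -> str:
    """Retrieve ranked excerpts, ask the model to summarize them, append source links."""
    excerpts = retrieve(question)[:top_k]
    reply = generate(build_prompt(question, excerpts))
    sources = "\n".join(
        f"[{i}] {e.url}" for i, e in enumerate(excerpts, start=1)
    )
    return f"{reply}\n\nSources:\n{sources}"
```

Restricting the model to the retrieved excerpts and always appending the source links is what lets a user check each claim against the original report, which is the design choice the paragraph above credits with reducing hallucinations.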

About the DxHub

The Cal Poly Digital Transformation Hub (DxHub) is a strategic relationship with Amazon Web Services (AWS) and is the world’s first cloud innovation center supported by AWS on a university campus. The primary goal of the DxHub is to provide real-world problem-solving experiences to students by immersing them in the application of proven innovation methods in combination with the latest technologies to solve important challenges in the public sector. The challenges being addressed cover a wide variety of topics including homelessness, evidence-based policing, digital literacy, virtual cybersecurity laboratories and many others. The DxHub leverages the deep subject matter expertise of government, education and non-profit organizations to clearly understand the customers affected by public sector challenges and to develop solutions that meet those customers’ needs.