Transforming organizational knowledge management with AI-powered multi-modal search for Internet2's cloud community

Overview

The Cal Poly Digital Transformation Hub (DxHub), powered by Amazon Web Services (AWS) and part of the AWS Cloud Innovation Centers (CIC) program, collaborated with the Internet2 NET+ cloud service program to transform how subscribers of their Cloud Infrastructure Community Program (CICP) access and use their growing collections of meeting content and educational materials. 

The NET+ cloud service program team at Internet2, a nonprofit, member-driven technology community serving U.S. research and higher education institutions, needed an intelligent solution to make hundreds of documents, video recordings, presentations, and audio files searchable through natural-language queries. 

The result was a multimodal Retrieval-Augmented Generation (RAG) chatbot system that changes the way community members discover and interact with information.  

The solution processes multiple file types — including PDFs, MP4 videos, audio files, PowerPoints, and Google Slides — using intelligent content prioritization and deep linking to deliver fast, accurate, and source-linked answers. 

Problem

Internet2’s CICP produces a massive volume of educational and technical content through monthly meetings, conferences, and collaborative discussions. Over the years, this work has generated thousands of files, meeting transcripts, presentation slides, PDFs, and video recordings, each filled with valuable institutional knowledge. 

However, this information was scattered across numerous Google Drive folders, Confluence pages, and repositories, making it difficult for members to efficiently locate specific content. Traditional search tools required users to manually dig through long recordings or navigate dozens of documents just to find a single discussion, code example, or policy reference. 

Members often couldn’t recall which meeting or document contained the information they needed, leading to repeated questions, duplicated work, and underutilized community expertise. The challenge was compounded by the wide range of file types, each requiring different approaches for indexing and search, as well as the need to balance access to subscriber-only and public content.  

The Internet2 NET+ cloud service team needed a secure, scalable, and intelligent system capable of unifying these distributed materials into a single knowledge base. The solution had to interpret natural-language questions, deliver precise, context-aware answers, and link users directly to timestamps or document sections for full traceability, all while maintaining data privacy and preserving the richness of the community’s multiple technical perspectives.

Innovation In Action

The Cal Poly Digital Transformation Hub (DxHub) team developed a sophisticated AI-powered search system that transforms how Internet2’s CICP subscribers interact with years of collective knowledge. Using Generative AI and a Retrieval-Augmented Generation (RAG) pipeline built on AWS, the chatbot enables users to ask natural-language questions and instantly receive comprehensive, trustworthy answers drawn from across the community’s vast content library. Members can simply type questions like “How do I configure AWS Control Tower?” and receive detailed responses synthesized from multiple sources—meeting transcripts, presentation slides, and technical documents—with direct links back to the original materials. 

A standout feature of the system is its deep linking capability, which provides timestamp-specific references to video content and page-level links to PDF documents, allowing users to jump directly to the exact point at which a topic is discussed. This level of precision makes it dramatically faster and easier to surface relevant insights from hours of video recordings or hundreds of pages of documentation. 

To power this experience, the chatbot leverages AWS Lambda and Amazon S3 for continuous ingestion and content updates, Amazon Titan Text Embeddings V2 and Amazon OpenSearch Service for semantic retrieval, and Amazon Bedrock (Anthropic Claude 3.5 Sonnet) for reasoning and response generation. AWS Transcribe converts audio and video into searchable transcripts, while Amazon Textract extracts text from slides and embedded images, ensuring every data type contributes to the knowledge base. 

The system intelligently handles conflicting viewpoints by presenting multiple perspectives instead of blending them, maintaining what Bob Flynn, Program Manager of Cloud Infrastructure & Platform Services at Internet2, called “the richness of the knowledge of the community.” It also includes chat history and context retention, enabling users to ask follow-up questions naturally within the same session, while a feedback mechanism continuously improves response quality over time. 

As Tim Manik, Cloud Solutions Architect at Internet2, remarked, “The work that both of you have done in the past two weeks is incredibly impressive.” Together, these innovations created a living, multi-modal knowledge assistant that preserves institutional expertise, deepens engagement, and redefines how Internet2’s CICP subscribers discover and share information.

Technical Solution

The solution leverages a serverless architecture built entirely on AWS managed services, ensuring scalability, security, and reliability. At its core, the system uses AWS Lambda for document processing and query handling, Amazon S3 for secure content storage, and Amazon OpenSearch Service for intelligent document retrieval and ranking. 

A multi-modal ingestion framework processes a wide variety of file types using intelligent routing and decision-tree logic that prioritizes the highest-quality sources when multiple formats are available. Amazon DynamoDB maintains chat history and user context, enabling continuity across sessions, while advanced retrieval methodologies, including document falloff techniques, are used to refine and optimize response quality.  

The architecture also enforces subscriber and non-subscriber access controls through metadata attributes, ensuring that users only see content appropriate to their membership level while maintaining the system’s comprehensive search capabilities. 

For user interaction, the platform includes both a demonstration interface and an embeddable React-based web component designed for seamless integration into Internet2’s Confluence pages. This approach provides a consistent, intuitive experience while establishing a framework that enables future opportunities for scalability.

Next Steps

Following successful testing and refinement phases, the Internet2 NET+ cloud service team plans to deploy the chatbot system to its full CICP community, with ongoing enhancements based on user feedback and usage patterns. The system will continue to expand its support for document types and refine its content prioritization algorithms to provide even more accurate responses. Future development includes enhanced automation for content harvesting from Google Drive repositories and expanded integration capabilities with Internet2’s existing digital infrastructure.  

The success of this implementation positions the solution as a model for other research and education organizations seeking to unlock the value of their institutional knowledge repositories.  

Jan Day from AWS expressed enthusiasm for the project’s impact, stating: “This is awesome. I’m really excited. Kudos to the team.”  

Organizations interested in implementing similar AI-powered knowledge management solutions can engage with the Cal Poly Digital Transformation Hub to explore how multi-modal RAG systems can transform their information accessibility and community engagement. 

Student Spotlight

Nick Riley

Software Developer

Supporting Documents

Source Code All of the code and assets developed during the course of creating the prototype.

About the DxHub

The Cal Poly Digital Transformation Hub (DxHub) is a strategic relationship with Amazon Web Services (AWS) and is the world’s first cloud innovation center supported by AWS on a University campus. The primary goal of the DxHub is to provide students with real-world problem-solving experiences by immersing them in the application of proven innovation methods in combination with the latest technologies to solve important challenges in the public sector. The challenges being addressed cover a wide variety of topics including homelessness, evidence-based policing, digital literacy, virtual cybersecurity laboratories and many others. The DxHub leverages the deep subject matter expertise of government, education, and non-profit organizations to clearly understand the customers affected by public sector challenges and develop solutions that meet the customer needs.