Background
For a software engineer, developing a new feature requires reading a lot of relevant material: previous pull requests, the code involved, and guidance on how to get started. However, searching for a relevant PR has itself proved to be a chore, as developers need to go through many irrelevant PRs to find the one (or more) that is relevant to the feature they’re developing. Hence, we developed GitGlimpse to serve you the information relevant to the feature you’re building, based on the query you submit to the system.
Presentation
System Architecture
This project implements a search system that helps developers understand and navigate large codebases through historical pull request data. The architecture consists of five interconnected components working in sequence. The Data Collection & Processing Pipeline gathers pull request information from repositories and preprocesses it, transforming the raw data into searchable documents. The Dual Retrieval System takes a hybrid approach that combines keyword-based search with semantic similarity, so it returns both exact matches and contextually relevant results. Query Processing & Enhancement uses artificial intelligence to refine user queries, expanding them with relevant technical terms and context to improve search accuracy. Document Summarization processes the retrieved documents in parallel, distilling key information about code changes, patterns, and best practices. Finally, the Response Generation component synthesizes the summaries into a contextual answer that guides developers through the codebase's patterns and conventions. Together, these components make it easier for developers to find relevant information and best practices in a large codebase's history.
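To make the retrieval and summarization stages more concrete, here is a minimal Python sketch of how a hybrid ranker and a parallel summarization step could fit together. It is not GitGlimpse's actual implementation: the function names, the `alpha` weighting, the character-trigram similarity (a toy stand-in for a learned embedding model), the first-sentence summarizer (a placeholder for a model call), and the sample PR descriptions are all assumptions made for illustration. Data collection and query enhancement are left out.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(query, doc):
    """Fraction of distinct query terms that appear verbatim in the document."""
    q_terms = set(tokenize(query))
    if not q_terms:
        return 0.0
    return len(q_terms & set(tokenize(doc))) / len(q_terms)


def trigram_vector(text):
    """Character-trigram counts: a toy stand-in for a learned embedding."""
    s = " ".join(tokenize(text))
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_search(query, docs, alpha=0.5, top_k=2):
    """Rank docs by alpha * keyword score + (1 - alpha) * similarity score."""
    q_vec = trigram_vector(query)
    scored = [
        (alpha * keyword_score(query, d) + (1 - alpha) * cosine(q_vec, trigram_vector(d)), d)
        for d in docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def summarize(doc):
    """Placeholder summarizer: keeps the first sentence only."""
    return doc.split(".")[0].strip()


def answer(query, docs):
    """Retrieve, summarize the hits in parallel, and join the summaries."""
    hits = hybrid_search(query, docs)
    with ThreadPoolExecutor() as pool:
        summaries = list(pool.map(summarize, hits))  # map preserves input order
    return "\n".join(f"- {s}" for s in summaries)


# Hypothetical PR descriptions, invented for illustration only.
PRS = [
    "Add dark mode toggle to settings page. Uses a theme context provider.",
    "Fix pagination bug in table component. Off-by-one in page index.",
    "Refactor theme provider to support dark mode colors. Adds tokens.",
]

if __name__ == "__main__":
    print(answer("dark mode theme", PRS))
```

In this sketch, `alpha` controls the balance between the two signals: values near 1 favor exact term matches, while values near 0 favor the similarity score. A thread pool suits the summarization step if each summary is an I/O-bound call to an external model, which is an assumption here.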
GitHub Project
git-glimpse
NathanAW24 • Updated Mar 25, 2025