How Do We Use LLMs For Code Search?
Developers need fast, precise ways to find code that matches a task or concept. AI can make search feel like asking a teammate instead of guessing filenames or symbols.
What Are We Trying To Achieve?
- Ask questions in natural language and get relevant functions, classes, or patterns
- Search across languages and repositories
- Return snippets with file paths and line ranges
- Explain why a match is relevant and offer usage examples
System Overview
A practical AI code search system has three loops:
- Indexing loop – parse repos, split code into units, create embeddings, store vectors and metadata.
- Query loop – rewrite the user query, run hybrid retrieval (lexical + vector), re-rank, and return matches.
- Answer loop – feed the top results to an LLM for summarization, examples, and next-step guidance, with strict “don’t-make-stuff-up” prompting.
Indexing Pipeline (Concrete Steps)
- Repo intake
  - Pull source from main and active branches.
  - Respect `.gitignore` and exclude vendor, build, and minified assets.
- Code splitting
  - Prefer semantic chunks: one function or method per chunk; fall back to small text windows (e.g., 100–200 lines with overlap) when parsing fails.
  - Extract metadata per chunk: language, file path, symbol name, start/end lines, docstrings, imports.
- Static structure
  - Build a symbol table with references and callers.
  - Capture an AST digest (node types, identifiers) to aid structural matching.
- Embeddings
  - Use a code-aware embedding model.
  - Create vectors for each chunk and a separate vector for the docstring/comment-only view.
  - Normalize vectors; store them in a vector index such as FAISS or a similar ANN store.
- Lexical sidecar
  - Build a keyword/BM25 index (filenames, identifiers, comments).
  - Keep n‑gram and regex support for exact symbol or API lookups.
- Storage
  - Save `{id, repo, branch, path, lang, symbol, lines, vector, text, doc_vector, imports, callers}` per chunk (see the sketch after this list).
  - Reindex incrementally on commit using git hooks or CI.
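To make the loop concrete, here is a minimal sketch of function-level chunking, embedding, and vector indexing for Python files. It assumes `sentence-transformers` and `faiss` are installed; the embedding model name is a placeholder for whichever code-aware model you choose, and the metadata fields are a subset of the record above.

```python
# Minimal indexing sketch (illustrative): function-level chunks -> embeddings -> FAISS.
import ast
from pathlib import Path

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

def chunk_python_file(path: Path):
    """Yield one chunk per function/method, with basic metadata."""
    source = path.read_text(encoding="utf-8", errors="ignore")
    tree = ast.parse(source)
    lines = source.splitlines()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            yield {
                "path": str(path),
                "lang": "python",
                "symbol": node.name,
                "lines": (node.lineno, node.end_lineno),
                "doc": ast.get_docstring(node) or "",
                "text": "\n".join(lines[node.lineno - 1 : node.end_lineno]),
            }

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder; pick a code-aware model

chunks = [c for p in Path("repo/").rglob("*.py") for c in chunk_python_file(p)]
vectors = model.encode([c["text"] for c in chunks], normalize_embeddings=True)

# Inner product over normalized vectors == cosine similarity.
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(np.asarray(vectors, dtype="float32"))
# Keep `chunks` (the metadata records) alongside the index, keyed by row id.
```

A real pipeline would wrap the `ast.parse` call in a try/except and fall back to fixed-size windows, as noted above, and would persist the records rather than keep them in memory.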
Query Pipeline (What Runs On Each Search)
- Query rewriting with an LLM
  - Expand the user query into:
    - synonyms/aliases (`dict merge`, `map update`)
    - language constraints (“Python only”)
    - structural hints (function, class, interface, test)
    - optional regex or API names if present
- Hybrid retrieval
  - Run lexical search and vector search in parallel (a minimal sketch follows this list).
  - Take the union of top‑k results from both (e.g., 200 total).
- Semantic re-ranking
  - Use a cross-encoder or an LLM “judge” prompt to score each `(query, snippet)` pair.
  - Add features to the score: path match, language match, recent edits, popularity, call graph proximity.
- Diversification
  - Apply maximal marginal relevance (MMR) so results are not near-duplicates.
- Result packaging
  - Return: snippet, file path, line range, why-it-matches note, quick usage example.
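Here is a hedged sketch of the retrieval and diversification steps. It reuses `model`, `index`, `chunks`, and `vectors` from the indexing sketch and adds a BM25 sidecar via `rank_bm25`; the fusion weights and the MMR trade-off are illustrative defaults, not tuned values.

```python
# Hybrid retrieval sketch (illustrative): BM25 + vector search, fused score, MMR diversification.
import numpy as np
from rank_bm25 import BM25Okapi

bm25 = BM25Okapi([c["text"].lower().split() for c in chunks])

def hybrid_search(query: str, k: int = 200, final_k: int = 10, mmr_lambda: float = 0.7):
    # Lexical and vector retrieval; take the union of top-k candidates from both.
    lex_scores = np.asarray(bm25.get_scores(query.lower().split()))
    q = model.encode([query], normalize_embeddings=True).astype("float32")
    _, vec_ids = index.search(q, k)
    candidates = sorted(
        set(np.argsort(lex_scores)[-k:].tolist()) | {i for i in vec_ids[0].tolist() if i >= 0}
    )

    # Fused relevance: normalized BM25 plus cosine similarity (the 0.4/0.6 weights are a guess).
    lex = lex_scores[candidates] / (lex_scores.max() + 1e-9)
    cos = vectors[candidates] @ q[0]
    relevance = 0.4 * lex + 0.6 * cos

    # Maximal marginal relevance: penalize snippets too similar to ones already picked.
    picked, remaining = [], list(range(len(candidates)))
    while remaining and len(picked) < final_k:
        def mmr(i):
            redundancy = max((float(vectors[candidates[i]] @ vectors[candidates[j]]) for j in picked), default=0.0)
            return mmr_lambda * relevance[i] - (1 - mmr_lambda) * redundancy
        best = max(remaining, key=mmr)
        picked.append(best)
        remaining.remove(best)
    return [chunks[candidates[i]] for i in picked]
```

In a full system, the cross-encoder or LLM-judge re-ranking would run between the fused scoring and the MMR step, and the extra features listed above (path match, recency, call graph proximity) would be folded into `relevance` before diversification.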
Answer Loop (Turning Results Into Help)
Feed the top snippets and metadata to an LLM with strict instructions:
- Cite file paths and lines for every claim.
- Prefer code directly from the repo; do not invent APIs.
- If confidence is low, ask a clarifying question or show multiple candidates.
- Offer a minimal working example using only retrieved snippets.
RAG prompt sketch
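A minimal, illustrative version of that prompt (the wording and placeholders are assumptions, not a fixed template):

```text
You are a code search assistant. Answer ONLY from the snippets below.

Question: {user_query}

Snippets:
{for each result: path, line range, code}

Rules:
- Cite the file path and line range for every claim.
- Use only APIs that appear in the snippets; do not invent functions or arguments.
- If the snippets do not answer the question, say "not found" and ask one clarifying question.
- If helpful, show a minimal working example composed only of retrieved code.
```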
Practical Recipes
Natural-Language → Code
- “merge two dicts without mutation in Python”
  - Query rewrite adds: `copy`, `dict`, `update`, the `|` operator, `Mapping`
  - Structural filter: functions returning a new mapping
  - Results re-ranked with preference for pure functions and tests referencing them
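The rewritten query that the retriever actually runs might look roughly like this (the field names are illustrative assumptions):

```python
# Illustrative output of the query-rewrite step for the dict-merge example.
rewritten = {
    "original": "merge two dicts without mutation in Python",
    "aliases": ["dict merge", "map update", "copy", "update", "| operator", "Mapping"],
    "language": "python",
    "structure": ["function"],
    "regex": r"def \w*merge\w*\(",  # hypothetical exact-lookup hint
}
```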
Code → Code (reverse lookup)
- Paste a call site; ask for its definition or similar implementations.
- Embed the pasted code and run vector search to find near neighbors across languages.
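Reusing the index from the sketches above, code-to-code lookup is just query-by-example: the pasted snippet is embedded instead of a sentence (the call site below is hypothetical).

```python
# Code -> code lookup sketch: embed a pasted call site and find near neighbors.
pasted = "merged = merge_configs(base_cfg, override_cfg)"  # hypothetical call site

q = model.encode([pasted], normalize_embeddings=True).astype("float32")
_, ids = index.search(q, 5)
for i in ids[0]:
    if i >= 0:
        hit = chunks[i]
        print(hit["path"], hit["lines"], hit["symbol"])
```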
API-migration search
- Input: old API call; ask for code that uses a replacement API.
- Lexical side finds call sites; vector side surfaces adapter functions and tests.
- LLM generates a patch sketch referencing actual files/lines.
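One hedged way to wire this up with the same components: a regex on the lexical side enumerates old call sites, while a natural-language vector query surfaces the replacement (the specific APIs here are hypothetical examples).

```python
# API-migration sketch: regex finds old call sites, vector search finds replacement code.
import re

old_call = re.compile(r"requests\.get\(")  # hypothetical old API
call_sites = [c for c in chunks if old_call.search(c["text"])]

q = model.encode(["async HTTP GET helper using httpx"], normalize_embeddings=True).astype("float32")
_, ids = index.search(q, 5)
replacements = [chunks[i] for i in ids[0] if i >= 0]

# Hand both lists to the LLM so the patch sketch cites real files and line ranges.
```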
Snippet Scoring Heuristics That Help
- Path prior: prefer `src/` over `examples/`, and prefer non‑deprecated directories.
- Freshness: recently touched code gets a small boost.
- Test linkage: snippets referenced in tests rank higher.
- Comment density: helpful docstrings increase the score slightly.
- Call graph: functions widely referenced near the query context climb the list.
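These heuristics compose naturally into a small re-scoring function layered on top of the re-ranker output. The weights below are illustrative starting points, and the extra metadata fields (`last_modified`, `referenced_by_tests`, `comment_density`, `caller_count`) are assumed additions to the stored chunk record.

```python
# Heuristic re-scoring sketch: small, additive boosts on top of the semantic relevance score.
import time

def heuristic_boost(chunk: dict, now: float | None = None) -> float:
    now = now or time.time()
    boost = 0.0
    path = chunk["path"]
    if path.startswith("src/"):
        boost += 0.10                                            # path prior
    if "examples/" in path or "deprecated" in path:
        boost -= 0.10
    if now - chunk.get("last_modified", 0) < 30 * 24 * 3600:
        boost += 0.05                                            # freshness
    if chunk.get("referenced_by_tests"):
        boost += 0.10                                            # test linkage
    boost += min(chunk.get("comment_density", 0.0), 0.3) * 0.1   # docstrings help a little
    boost += min(chunk.get("caller_count", 0), 20) * 0.005       # call-graph popularity
    return boost

# final_score = rerank_score + heuristic_boost(chunk)
```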
Prompt Patterns You Can Reuse
Query rewrite
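An illustrative version (the phrasing and JSON keys are assumptions):

```text
Rewrite the developer's code-search query for retrieval.

Query: {user_query}

Return JSON with:
- "aliases": synonyms and related identifiers/APIs
- "language": a language filter if the query implies one, else null
- "structure": any of ["function", "class", "interface", "test"]
- "regex": an exact-match pattern if an API or symbol name is given, else null

Do not add concepts that are not implied by the query.
```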
Judge (re-ranker)
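An illustrative version (the scale and output format are assumptions):

```text
You are ranking code snippets for the query: {user_query}

For each snippet (path, lines, code), return a score from 0 to 3:
3 = directly implements or answers the query
2 = closely related (caller, test, or near-duplicate of the answer)
1 = same topic but not usable as an answer
0 = unrelated

Judge only from the snippet text; do not assume code that is not shown.
Return JSON: [{"id": ..., "score": ..., "reason": "<one short sentence>"}]
```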
Evaluating Quality
- Top‑k recall: does the correct file appear in the top 10/50?
- MRR / nDCG: ranking quality for ground‑truth query→file pairs.
- Time‑to‑first‑useful‑click: user-centric metric.
- Abstention rate: frequency of honest “not found” responses.
- Error audits: sample failures where the system pointed to wrong code.
Create a small gold set: real tickets, PR review questions, and onboarding tasks; map each to the expected files/lines.
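A tiny evaluation harness over such a gold set might compute recall@k and MRR like this (the gold-set format is an assumption):

```python
# Evaluation sketch: recall@k and MRR over a gold set of (query, expected file) pairs.
def evaluate(gold: list[dict], search_fn, k: int = 10):
    """gold items look like {"query": ..., "expected_path": ...} (assumed format)."""
    hits, reciprocal_ranks = 0, []
    for item in gold:
        results = search_fn(item["query"])[:k]          # e.g., hybrid_search from above
        paths = [r["path"] for r in results]
        if item["expected_path"] in paths:
            hits += 1
            reciprocal_ranks.append(1.0 / (paths.index(item["expected_path"]) + 1))
        else:
            reciprocal_ranks.append(0.0)
    return {
        f"recall@{k}": hits / len(gold),
        "mrr": sum(reciprocal_ranks) / len(gold),
    }
```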
Privacy And Security
- Keep embedding and indexing on infrastructure you control for private repos.
- Strip secrets and large binaries from the index.
- Respect license boundaries when mixing public and private code.
- Log queries and clicks with redaction; never store raw prompts that include credentials.
Common Pitfalls
- Chunks too large, hiding the target function in noise.
- Index drift from stale branches; run scheduled refreshes.
- Re-ranker absent or weak, causing noisy top results.
- Prompts that allow hallucinated APIs; add strict rules and abstain logic.
- Ignoring structural signals (AST, call graph, tests) that could break ties.
Quick Start Checklist
- Parse repos and split into function-level chunks.
- Build embeddings and a FAISS index; build a BM25 index too.
- Add metadata: path, language, symbol, lines, tests, imports.
- Implement LLM query rewrite and hybrid retrieval.
- Re-rank with a cross-encoder or LLM judge; diversify results.
- Wrap the top snippets in a RAG prompt with strict “no invention” rules.
- Track top‑k recall, MRR, time‑to‑first‑click; iterate on chunking and prompts.
- Add CI hooks for incremental indexing and stale-index alerts.
AI code search shines when it blends structure (AST, symbols), statistics (embeddings), and dialogue (prompts that reward honesty). Start small on one repo, tune chunking and re-ranking, then scale to the rest of your codebase.