
How Do We Use LLMs For Code Search?

Finding specific pieces of code in large codebases can be a challenging and time-consuming task. Traditional search methods often rely on keyword matching, which might not be effective when the exact terms are unknown or when searching for code snippets that perform a particular function. Artificial Intelligence (AI) offers new ways to improve code search, making it more efficient and accurate. This article explains how AI can be applied to code search, the benefits it brings, and some practical approaches to implement it.


Developers need fast, precise ways to find code that matches a task or concept. AI can make search feel like asking a teammate instead of guessing filenames or symbols.

What Are We Trying To Achieve?

  • Ask questions in natural language and get relevant functions, classes, or patterns
  • Search across languages and repositories
  • Return snippets with file paths and line ranges
  • Explain why a match is relevant and offer usage examples

System Overview

A practical AI code search system has three loops:

  1. Indexing loop – parse repos, split code into units, create embeddings, store vectors and metadata.
  2. Query loop – rewrite the user query, run hybrid retrieval (lexical + vector), re-rank, and return matches.
  3. Answer loop – feed the top results to an LLM for summarization, examples, and next-step guidance, with strict “don’t-make-stuff-up” prompting.

Indexing Pipeline (Concrete Steps)

  1. Repo intake

    • Pull source from main and active branches.
    • Respect .gitignore and exclude vendor, build, and minified assets.
  2. Code splitting

    • Prefer semantic chunks: one function or method per chunk; fall back to small text windows (e.g., 100–200 lines with overlap) when parsing fails (a minimal chunking sketch follows this list).
    • Extract metadata per chunk: language, file path, symbol name, start/end lines, docstrings, imports.
  3. Static structure

    • Build a symbol table with references and callers.
    • Capture an AST digest (node types, identifiers) to aid structural matching.
  4. Embeddings

    • Use a code-aware embedding model.
    • Create vectors for each chunk and a separate vector for the docstring/comment-only view.
    • Normalize vectors; store in a vector index such as FAISS or a similar ANN store (an indexing sketch follows this list).
  5. Lexical sidecar

    • Build a keyword/BM25 index (filenames, identifiers, comments).
    • Keep n‑gram and regex support for exact symbol or API lookups.
  6. Storage

    • Save: {id, repo, branch, path, lang, symbol, lines, vector, text, doc_vector, imports, callers}.
    • Incremental reindex on commit using git hooks or CI.
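
As a concrete sketch of the splitting step, the snippet below uses Python's built-in ast module to cut one source file into one chunk per function or method, carrying the metadata fields listed above. It is illustrative only: production systems usually rely on a multi-language parser such as tree-sitter, and the Chunk fields are an assumed schema, not a required one.

import ast
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Chunk:
    path: str
    lang: str
    symbol: str
    start_line: int
    end_line: int
    text: str
    docstring: str

def split_python_file(path: str) -> list[Chunk]:
    # One chunk per function or method; callers can fall back to fixed-size
    # windows with overlap when parsing fails.
    source = Path(path).read_text(encoding="utf-8")
    chunks = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            chunks.append(Chunk(
                path=path,
                lang="python",
                symbol=node.name,
                start_line=node.lineno,
                end_line=node.end_lineno,
                text=ast.get_source_segment(source, node) or "",
                docstring=ast.get_docstring(node) or "",
            ))
    return chunks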
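
The embedding and dual-index step can then build on those chunks. This sketch assumes three commonly used libraries (sentence-transformers, faiss, rank_bm25); the model name is a placeholder rather than a recommendation, and any code-aware embedding model can take its place.

import faiss
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

# Placeholder model; swap in the code-aware embedding model you actually use.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def build_indexes(chunks):
    texts = [c.text for c in chunks]

    # Vector index: normalized embeddings, so inner product equals cosine similarity.
    vectors = model.encode(texts, normalize_embeddings=True)
    vector_index = faiss.IndexFlatIP(vectors.shape[1])
    vector_index.add(np.asarray(vectors, dtype="float32"))

    # Lexical sidecar: BM25 over whitespace-tokenized chunk text
    # (identifiers, comments, docstrings).
    bm25 = BM25Okapi([t.split() for t in texts])

    return vector_index, bm25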

Query Pipeline (Concrete Steps)

  1. Query rewriting with an LLM

    • Expand the user query into:

      • synonyms/aliases (dict merge, map update)
      • language constraints (“Python only”)
      • structural hints (function, class, interface, test)
      • optional regex or API names if present
  2. Hybrid retrieval

    • Run lexical search and vector search in parallel.
    • Take the union of top‑k results from both (e.g., 200 total); a retrieval sketch follows this list.
  3. Semantic re-ranking

    • Use a cross-encoder or an LLM “judge” prompt to score each (query, snippet) pair.
    • Add features to the score: path match, language match, recent edits, popularity, call graph proximity.
  4. Diversification

    • Apply maximal marginal relevance (MMR) so results are not near-duplicates; an MMR sketch follows this list.
  5. Result packaging

    • Return: snippet, file path, line range, why-it-matches note, quick usage example.
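
Continuing from the indexes built earlier, a minimal hybrid retrieval sketch looks like this; the candidate counts are arbitrary defaults, and the returned set is what you would hand to the re-ranker rather than a final ordering.

import numpy as np

def hybrid_retrieve(query, chunks, vector_index, bm25, model, k=100):
    # Vector side: embed the (rewritten) query and take its k nearest chunks.
    q_vec = model.encode([query], normalize_embeddings=True).astype("float32")
    _, vec_ids = vector_index.search(q_vec, k)

    # Lexical side: BM25 scores over the same corpus.
    bm25_scores = bm25.get_scores(query.split())
    lex_ids = np.argsort(bm25_scores)[::-1][:k]

    # Union of both candidate sets; a cross-encoder or LLM judge re-ranks next.
    candidate_ids = {i for i in vec_ids[0].tolist() if i != -1} | set(lex_ids.tolist())
    return [chunks[i] for i in candidate_ids]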
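
Diversification can be a short greedy pass over the re-ranked candidates. Here is a minimal MMR sketch, assuming unit-normalized vectors; the 0.7 trade-off between relevance and redundancy is only a starting point.

import numpy as np

def mmr(query_vec, cand_vecs, cand_ids, top_n=10, lambda_=0.7):
    # Greedily pick items that are relevant to the query but not
    # near-duplicates of results already selected.
    relevance = cand_vecs @ query_vec          # cosine similarity to the query
    selected, remaining = [], list(range(len(cand_ids)))
    while remaining and len(selected) < top_n:
        if not selected:
            best = max(remaining, key=lambda i: relevance[i])
        else:
            chosen = cand_vecs[selected]
            best = max(
                remaining,
                key=lambda i: lambda_ * relevance[i]
                - (1 - lambda_) * float(np.max(chosen @ cand_vecs[i])),
            )
        selected.append(best)
        remaining.remove(best)
    return [cand_ids[i] for i in selected]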

Answer Loop (Turning Results Into Help)

Feed the top snippets and metadata to an LLM with strict instructions:

  • Cite file paths and lines for every claim.
  • Prefer code directly from the repo; do not invent APIs.
  • If confidence is low, ask a clarifying question or show multiple candidates.
  • Offer a minimal working example using only retrieved snippets.

RAG prompt sketch

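One possible shape for that instruction block, with illustrative wording and placeholder fields in braces:

  You are a code search assistant. Answer ONLY from the snippets below.
  Cite the file path and line range for every claim you make.
  Do not invent functions, classes, or APIs that do not appear in the snippets.
  If the snippets do not answer the question, say so and ask one clarifying question.

  Question: {user_query}

  Snippets:
  [1] {path}:{start_line}-{end_line}
  {code}
  [2] ...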

Practical Recipes

Natural-Language → Code

  • “merge two dicts without mutation in Python”

    • Query rewrite adds: copy, dict, update, | operator, Mapping
    • Structural filter: functions returning a new mapping
    • Results re-ranked with preference for pure functions and tests referencing them

Code → Code (reverse lookup)

  • Paste a call site; ask for its definition or similar implementations.
  • Embed the pasted code and run vector search to find near neighbors across languages.

Old API → New API (migration)

  • Input: old API call; ask for code that uses a replacement API.
  • Lexical side finds call sites; vector side surfaces adapter functions and tests.
  • LLM generates a patch sketch referencing actual files/lines.
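
A reverse lookup reuses the same vector index, with the pasted code standing in for the query text. A short sketch, assuming the model and FAISS index built during indexing:

def find_similar(pasted_code, chunks, vector_index, model, k=5):
    # Embed the pasted snippet and look up its nearest neighbors, which tend
    # to be the definition, copies, and similar implementations.
    q = model.encode([pasted_code], normalize_embeddings=True).astype("float32")
    scores, ids = vector_index.search(q, k)
    return [(chunks[i].path, chunks[i].start_line, float(s))
            for i, s in zip(ids[0], scores[0]) if i != -1]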

Snippet Scoring Heuristics That Help

  • Path prior: prefer src/ over examples/, prefer non‑deprecated directories.
  • Freshness: recently touched code gets a small boost.
  • Test linkage: snippets referenced in tests rank higher.
  • Comment density: helpful docstrings increase the score slightly.
  • Call graph: functions widely referenced near the query context climb the list.
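
In practice these heuristics become small additive adjustments on top of the semantic score. The weights below are illustrative assumptions, not tuned values.

def adjust_score(base_score, chunk, referenced_in_tests, days_since_edit):
    score = base_score
    if chunk.path.startswith("src/"):
        score += 0.05                  # path prior
    if "deprecated" in chunk.path:
        score -= 0.10
    if days_since_edit < 30:
        score += 0.03                  # freshness boost
    if referenced_in_tests:
        score += 0.05                  # test linkage
    if chunk.docstring:
        score += 0.02                  # rough proxy for comment density
    return score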

Prompt Patterns You Can Reuse

Query rewrite

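An illustrative version; the JSON field names are assumptions, so keep whatever schema your retriever expects:

  Rewrite the developer's search query for code retrieval. Return JSON with:
    "keywords": identifiers, API names, and synonyms likely to appear in code
    "language": the programming language if stated or implied, else null
    "structure": one of function, class, interface, test, or null
    "regex": an exact-match pattern if the query names a specific symbol, else null

  Query: {user_query}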

Judge (re-ranker)

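A simple judge prompt along these lines works as a re-ranking signal; the 0-3 scale is just one option:

  You are ranking code search results. Given the query and one snippet,
  score how well the snippet answers the query:
    3 = directly implements or defines what was asked
    2 = closely related (caller, wrapper, test, or near-identical logic)
    1 = same topic but not usable as an answer
    0 = unrelated
  Return only the number.

  Query: {query}
  Snippet ({path}:{start_line}-{end_line}):
  {code}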

Evaluating Quality

  • Top‑k recall: does the correct file appear in the top 10/50?
  • MRR / nDCG: ranking quality for ground‑truth query→file pairs.
  • Time‑to‑first‑useful‑click: user-centric metric.
  • Abstention rate: frequency of honest “not found” responses.
  • Error audits: sample failures where the system pointed to wrong code.

Create a small gold set: real tickets, PR review questions, and onboarding tasks; map each to the expected files/lines.
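
With a gold set in hand, top-k recall and MRR reduce to a few lines. A minimal sketch, assuming each gold item maps a query to the set of acceptable file paths and the system returns a ranked list of paths:

def recall_at_k(ranked_paths, expected_paths, k=10):
    return float(any(p in expected_paths for p in ranked_paths[:k]))

def mean_reciprocal_rank(runs):
    # runs: list of (ranked_paths, expected_paths) pairs from the gold set
    total = 0.0
    for ranked, expected in runs:
        rank = next((i + 1 for i, p in enumerate(ranked) if p in expected), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(runs)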

Privacy And Security

  • Keep embedding and indexing on infrastructure you control for private repos.
  • Strip secrets and large binaries from the index.
  • Respect license boundaries when mixing public and private code.
  • Log queries and clicks with redaction; never store raw prompts that include credentials.

Common Pitfalls

  • Chunks too large, hiding the target function in noise.
  • Index drift from stale branches; run scheduled refreshes.
  • Re-ranker absent or weak, causing noisy top results.
  • Prompts that allow hallucinated APIs; add strict rules and abstain logic.
  • Ignoring structural signals (AST, call graph, tests) that could break ties.

Quick Start Checklist

  1. Parse repos and split into function-level chunks.
  2. Build embeddings and a FAISS index; build a BM25 index too.
  3. Add metadata: path, language, symbol, lines, tests, imports.
  4. Implement LLM query rewrite and hybrid retrieval.
  5. Re-rank with a cross-encoder or LLM judge; diversify results.
  6. Wrap the top snippets in a RAG prompt with strict “no invention” rules.
  7. Track top‑k recall, MRR, time‑to‑first‑click; iterate on chunking and prompts.
  8. Add CI hooks for incremental indexing and stale-index alerts.

AI code search shines when it blends structure (AST, symbols), statistics (embeddings), and dialogue (prompts that reward honesty). Start small on one repo, tune chunking and re-ranking, then scale to the rest of your codebase.
