How do I ask an AI to web search for me?
Most LLM products with “browse” or “online” modes hide their web search stack behind proprietary infrastructure. That’s convenient, but it also means:
- You have no control over which search engine or sources are used.
- You can’t customize ranking, filtering, or post‑processing.
- You’re locked into a specific vendor’s behavior and pricing.
By building your own web search integration, you can:
- Swap between Google, Bing, SerpAPI, Tavily, Parallel, or your own crawler.
- Enforce domain allow‑lists/deny‑lists and security rules.
- Tune how much and what kind of web content the model sees.
The core idea is simple: the LLM doesn’t “browse” the web by itself. Instead, it calls a tool you define—typically named something like web_search—and your backend handles the actual HTTP requests and data cleaning.
High-Level Architecture
At a high level, a custom web search workflow looks like this:
- The user asks a question.
- The LLM decides whether it needs fresh, external information.
- If it does, it calls a `web_search` tool with a query string.
- Your backend receives this tool call, runs your own search logic (API, crawler, etc.), and returns structured results.
- The LLM reads those results and writes a grounded answer.
In other words, the LLM becomes an orchestrator: it decides when to search and what to search, but your code controls how and where the search actually happens.
Step 1: Choose a Web Search Backend
You first need a way to actually search the web. You have three main options.
1. Use a commercial search API
These are search engines exposed via HTTP+JSON. Common examples include:
- Google Custom Search API
- Bing Web Search API
- Metasearch and “for LLMs” APIs (SerpAPI, Tavily, Parallel Search, Firecrawl Websearch, etc.)
Advantages:
- They handle crawling, ranking, deduplication, and language support.
- You just send a query string and get back structured results.
For most projects, this is the easiest and most robust starting point.
2. Use an LLM‑optimized search API
Some providers focus specifically on LLM use‑cases and return:
- Clean snippets instead of heavy HTML.
- Additional metadata like scores, categories, or suggested queries.
This can reduce how much processing you have to do before passing data into the model.
3. Build your own crawler and index
This is only worth it if you:
- Need to search a specialized corpus at scale.
- Need full control over ranking and freshness.
- Are willing to maintain crawling and indexing infrastructure.
If you’re just starting, avoid this route and use an existing API.
Step 2: Implement a web_search Function in Your Backend
Once you pick a search provider, write a simple function that:
- Accepts a query string (and an optional limit like `max_results`).
- Calls the search API over HTTP.
- Normalizes results into a consistent structure (title, snippet, URL).
Here’s a conceptual Python example (you’d adapt the endpoint and fields):
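The sketch below uses only Python's standard library and a hypothetical provider endpoint; the URL, the `key`/`num` query parameters, and the response field names (`results`, `title`, `snippet`, `url`) are assumptions you would replace with your provider's actual API:

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint and key -- replace with your provider's real values.
SEARCH_ENDPOINT = "https://api.example-search.com/search"
API_KEY = "YOUR_API_KEY"


def normalize_results(payload, max_results=5):
    """Map a provider-specific JSON payload to a consistent structure."""
    normalized = []
    for item in payload.get("results", [])[:max_results]:
        normalized.append({
            "title": item.get("title", ""),
            "snippet": item.get("snippet", ""),
            "url": item.get("url", ""),
        })
    return normalized


def web_search(query, max_results=5):
    """Run a web search and return a list of {title, snippet, url} dicts."""
    params = urllib.parse.urlencode(
        {"q": query, "key": API_KEY, "num": max_results}
    )
    with urllib.request.urlopen(f"{SEARCH_ENDPOINT}?{params}", timeout=10) as resp:
        payload = json.load(resp)
    return normalize_results(payload, max_results)
```

Keeping the normalization in its own function makes it easy to swap providers later: only the HTTP call and the field mapping change, while everything downstream keeps seeing the same `{title, snippet, url}` shape.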
You now have a reusable building block that can be called by any part of your system—not just the LLM.
If you want deeper grounding, you can extend this by:
- Fetching the HTML at the top N URLs.
- Extracting text (e.g., with a boilerplate remover).
- Summarizing or chunking content before feeding it to the model.
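The extraction step can be sketched with the standard library's `html.parser`; this is a deliberately simple visible-text collector, where a real project would typically use a dedicated boilerplate remover instead:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping script/style blocks."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```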
Step 3: Expose web_search as a Tool to the LLM
Modern LLMs support some form of tool or function calling. You describe your tool’s name, purpose, and input schema, then the model can decide when to use it.
A typical tool description looks like this (conceptually):
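Assuming an OpenAI-style function-calling format (the exact field names vary by provider, so check your model's documentation), a tool definition might look like:

```python
# OpenAI-style tool definition; other providers use similar JSON schemas.
tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": (
                "Search the web for up-to-date information. "
                "Use this when the answer may depend on recent events."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query, phrased like a search engine query.",
                    },
                    "max_results": {
                        "type": "integer",
                        "description": "Maximum number of results to return.",
                    },
                },
                "required": ["query"],
            },
        },
    }
]
```

The `description` fields matter: they are the only information the model has about when and how to use the tool, so write them as instructions to the model, not as documentation for humans.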
Your interaction pattern then becomes:
- Send user messages plus this `tools` list to the model.
- Inspect the model's response:
  - If it returns a normal answer, you're done.
  - If it returns a tool call named `web_search`, parse the arguments.
- Call your backend `web_search()` with those arguments.
- Send the tool result back to the model as an additional message.
- Ask the model to generate a final answer using the tool output.
This separates decision‑making (the LLM) from execution (your code).
Step 4: Build a Simple Agent Loop
Let’s pull these ideas together into a minimal “agent loop” in pseudo‑Python:
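Here is a runnable sketch of that loop with the model call stubbed out. The `call_model` stub, the message format, and the inline `web_search` are illustrative assumptions to show the control flow; in practice you would swap in your real LLM client and the `web_search` from Step 2:

```python
import json


def web_search(query, max_results=5):
    # Stub: replace with your real search backend from Step 2.
    return [{"title": "Example", "snippet": f"Results for: {query}",
             "url": "https://example.com"}]


def call_model(messages, tools):
    # Stub: replace with your LLM client. This fake model issues one
    # tool call, then answers once tool results are in the conversation.
    if any(m["role"] == "tool" for m in messages):
        return {"role": "assistant",
                "content": "Grounded answer based on search results."}
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {"name": "web_search",
             "arguments": {"query": messages[-1]["content"]}}
        ],
    }


def run_agent(user_question, tools=None, max_steps=5):
    messages = [{"role": "user", "content": user_question}]
    for _ in range(max_steps):
        reply = call_model(messages, tools)
        messages.append(reply)
        tool_calls = reply.get("tool_calls")
        if not tool_calls:
            return reply["content"]  # normal answer: we're done
        for call in tool_calls:
            if call["name"] == "web_search":
                results = web_search(**call["arguments"])
                # Feed structured results back as a tool message.
                messages.append({"role": "tool",
                                 "content": json.dumps(results)})
    return "Step limit reached."
```

Note the `max_steps` cap: without it, a model that keeps issuing tool calls could loop indefinitely and run up your search and token costs.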
This pattern gives you a lot of flexibility:
- To change search providers, you only edit `web_search()`.
- To add more tools (database lookups, internal APIs, etc.), you extend the tool list and handler.
- To implement guardrails, you intercept and sanitize tool outputs before they go back into `messages`.
Step 5: Adapting to Different Types of LLMs
How you wire this up depends on what “new LLM” you’re using.
Hosted LLMs with native tool calling
Many hosted models already support a tools/functions interface. In that case:
- Use their documented schema for tool descriptions.
- Register your `web_search` tool with the model.
- Implement the agent loop as in the earlier example.
The main work is in your web_search implementation and how you format tool output messages.
Local models via serving frameworks
If you run a local model via a server or framework that supports tool calling, the overall pattern is almost identical. You still:
- Define a JSON schema for tools.
- Parse tool calls from the model’s output.
- Execute them in your backend.
The only thing that changes is how you send prompts and receive responses.
Bare models without tool support
If your model doesn’t support tools at all, you can still do this with a ReAct‑style protocol:
- In your system prompt, teach the model to write commands like:
  `SEARCH[what is the latest news about X?]`
- When your backend sees `SEARCH[...]` in the model's text:
  - Extract the query.
  - Call `web_search(query)`.
  - Append the results to the conversation as a new message, e.g. "Search results: … (snippets, URLs, etc.)"
- Ask the model to continue, now that it sees the search results.
It’s more manual than built‑in tools, but the logic is the same.
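The parsing side of this protocol is a few lines of code. The `SEARCH[...]` command name below matches the convention taught in the system prompt above; use whatever token you actually put in your prompt:

```python
import re

# Matches the SEARCH[...] convention taught in the system prompt.
SEARCH_PATTERN = re.compile(r"SEARCH\[(.+?)\]", re.DOTALL)


def extract_search_query(model_text):
    """Return the first search query in the model's output, or None."""
    match = SEARCH_PATTERN.search(model_text)
    return match.group(1).strip() if match else None


def format_search_results(results):
    """Render results as a plain-text message the model can read."""
    lines = ["Search results:"]
    for i, r in enumerate(results, 1):
        lines.append(f"{i}. {r['title']} - {r['snippet']} ({r['url']})")
    return "\n".join(lines)
```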
Step 6: Useful Enhancements
Once the basic web search integration works, you can iterate on quality and cost.
Some practical upgrades:
- Reranking results: Use an embedding model to rerank snippets by semantic similarity to the user's question before passing them to the LLM.
- Content trimming: Long web pages quickly blow up your context window. Summarize or chunk content, and only send the most relevant excerpts.
- Caching and rate‑limiting: Cache search results for frequent queries and set hard limits on searches per conversation or per user.
- Domain control: Restrict the model to trusted sources, either by allow‑list (e.g., docs, reputable news sites) or deny‑list.
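The reranking idea can be sketched as cosine similarity over embeddings. The `embed` function here is a stand-in bag-of-words vectorizer so the example runs without a model; in practice you would call a real embedding model and keep the rest unchanged:

```python
import math
from collections import Counter


def embed(text):
    # Stand-in embedding: a bag-of-words term-frequency vector.
    # In practice, call a real embedding model here instead.
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(question, results, top_k=3):
    """Sort search results by similarity to the question, keep the top k."""
    q_vec = embed(question)
    scored = [(cosine(q_vec, embed(r["snippet"])), r) for r in results]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored[:top_k]]
```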
Putting It All Together
To give your AI agent custom web search instead of relying on a platform’s built‑ins:
- Pick a web search backend (API or your own index).
- Implement a `web_search(query, max_results)` function in your code.
- Expose it as a tool/function to your LLM with a clear schema.
- Write an agent loop that:
- Lets the LLM request web search.
- Executes the search.
- Feeds results back for a final, grounded answer.
- Iterate with reranking, caching, and domain restrictions as your use‑case grows.