How do I ask an AI to web search for me?
Most LLM products with “browse” or “online” modes hide their web search stack behind proprietary infrastructure. That’s convenient, but it also means:
- You have no control over which search engine or sources are used.
- You can’t customize ranking, filtering, or post‑processing.
- You’re locked into a specific vendor’s behavior and pricing.
By building your own web search integration, you can:
- Swap between Google, Bing, SerpAPI, Tavily, Parallel, or your own crawler.
- Enforce domain allow‑lists/deny‑lists and security rules.
- Tune how much and what kind of web content the model sees.
The core idea is simple: the LLM doesn’t “browse” the web by itself. Instead, it calls a tool you define—typically named something like web_search—and your backend handles the actual HTTP requests and data cleaning.
High-Level Architecture
At a high level, a custom web search workflow looks like this:
- The user asks a question.
- The LLM decides whether it needs fresh, external information.
- If it does, it calls a `web_search` tool with a query string.
- Your backend receives this tool call, runs your own search logic (API, crawler, etc.), and returns structured results.
- The LLM reads those results and writes a grounded answer.
In other words, the LLM becomes an orchestrator: it decides when to search and what to search, but your code controls how and where the search actually happens.
Step 1: Choose a Web Search Backend
You first need a way to actually search the web. You have three main options.
1. Use a commercial search API
These are search engines exposed via HTTP+JSON. Common examples include:
- Google Custom Search API
- Bing Web Search API
- Metasearch and “for LLMs” APIs (SerpAPI, Tavily, Parallel Search, Firecrawl Websearch, etc.)
Advantages:
- They handle crawling, ranking, deduplication, and language support.
- You just send a query string and get back structured results.
For most projects, this is the easiest and most robust starting point.
2. Use an LLM‑optimized search API
Some providers focus specifically on LLM use‑cases and return:
- Clean snippets instead of heavy HTML.
- Additional metadata like scores, categories, or suggested queries.
This can reduce how much processing you have to do before passing data into the model.
3. Build your own crawler and index
This is only worth it if you:
- Need to search a specialized corpus at scale.
- Need full control over ranking and freshness.
- Are willing to maintain crawling and indexing infrastructure.
If you’re just starting, avoid this route and use an existing API.
Step 2: Implement a web_search Function in Your Backend
Once you pick a search provider, write a simple function that:
- Accepts a query string (and an optional limit like `max_results`).
- Calls the search API over HTTP.
- Normalizes results into a consistent structure (title, snippet, URL).
Here’s a conceptual Python example (you’d adapt the endpoint and fields):
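The sketch below uses only Python's standard library and a hypothetical provider endpoint; the URL, the `key`/`num` query parameters, and the response field names (`results`, `title`, `snippet`, `url`) are assumptions you would replace with your provider's actual API:

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint and key -- replace with your provider's real values.
SEARCH_ENDPOINT = "https://api.example-search.com/search"
API_KEY = "YOUR_API_KEY"


def normalize_results(payload, max_results=5):
    """Map a provider-specific JSON payload to a consistent structure."""
    normalized = []
    for item in payload.get("results", [])[:max_results]:
        normalized.append({
            "title": item.get("title", ""),
            "snippet": item.get("snippet", ""),
            "url": item.get("url", ""),
        })
    return normalized


def web_search(query, max_results=5):
    """Run a web search and return a list of {title, snippet, url} dicts."""
    params = urllib.parse.urlencode(
        {"q": query, "key": API_KEY, "num": max_results}
    )
    with urllib.request.urlopen(f"{SEARCH_ENDPOINT}?{params}", timeout=10) as resp:
        payload = json.load(resp)
    return normalize_results(payload, max_results)
```

Keeping the normalization in its own function makes it easy to swap providers later: only the HTTP call and the field mapping change, while everything downstream keeps seeing the same `{title, snippet, url}` shape.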
You now have a reusable building block that can be called by any part of your system—not just the LLM.
If you want deeper grounding, you can extend this by:
- Fetching the HTML at the top N URLs.
- Extracting text (e.g., with a boilerplate remover).
- Summarizing or chunking content before feeding it to the model.
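The extraction step can be sketched with the standard library's `html.parser`; this is a deliberately simple visible-text collector, where a real project would typically use a dedicated boilerplate remover instead:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping script/style blocks."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```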
Step 3: Expose web_search as a Tool to the LLM
Modern LLMs support some form of tool or function calling. You describe your tool’s name, purpose, and input schema, then the model can decide when to use it.
A typical tool description looks like this (conceptually):
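Assuming an OpenAI-style function-calling format (the exact field names vary by provider, so check your model's documentation), a tool definition might look like:

```python
# OpenAI-style tool definition; other providers use similar JSON schemas.
tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": (
                "Search the web for up-to-date information. "
                "Use this when the answer may depend on recent events."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query, phrased like a search engine query.",
                    },
                    "max_results": {
                        "type": "integer",
                        "description": "Maximum number of results to return.",
                    },
                },
                "required": ["query"],
            },
        },
    }
]
```

The `description` fields matter: they are the only information the model has about when and how to use the tool, so write them as instructions to the model, not as documentation for humans.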
Your interaction pattern then becomes:
- Send user messages plus this `tools` list to the model.
- Inspect the model's response:
  - If it returns a normal answer, you're done.
  - If it returns a tool call named `web_search`, parse the arguments.
- Call your backend `web_search()` with those arguments.
- Send the tool result back to the model as an additional message.
- Ask the model to generate a final answer using the tool output.
This separates decision‑making (the LLM) from execution (your code).
Step 4: Build a Simple Agent Loop
Let’s pull these ideas together into a minimal “agent loop” in pseudo‑Python:
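Here is a runnable sketch of that loop with the model call stubbed out. The `call_model` stub, the message format, and the inline `web_search` are illustrative assumptions to show the control flow; in practice you would swap in your real LLM client and the `web_search` from Step 2:

```python
import json


def web_search(query, max_results=5):
    # Stub: replace with your real search backend from Step 2.
    return [{"title": "Example", "snippet": f"Results for: {query}",
             "url": "https://example.com"}]


def call_model(messages, tools):
    # Stub: replace with your LLM client. This fake model issues one
    # tool call, then answers once tool results are in the conversation.
    if any(m["role"] == "tool" for m in messages):
        return {"role": "assistant",
                "content": "Grounded answer based on search results."}
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {"name": "web_search",
             "arguments": {"query": messages[-1]["content"]}}
        ],
    }


def run_agent(user_question, tools=None, max_steps=5):
    messages = [{"role": "user", "content": user_question}]
    for _ in range(max_steps):
        reply = call_model(messages, tools)
        messages.append(reply)
        tool_calls = reply.get("tool_calls")
        if not tool_calls:
            return reply["content"]  # normal answer: we're done
        for call in tool_calls:
            if call["name"] == "web_search":
                results = web_search(**call["arguments"])
                # Feed structured results back as a tool message.
                messages.append({"role": "tool",
                                 "content": json.dumps(results)})
    return "Step limit reached."
```

Note the `max_steps` cap: without it, a model that keeps issuing tool calls could loop indefinitely and run up your search and token costs.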
This pattern gives you a lot of flexibility:
- To change search providers, you only edit `web_search()`.
- To add more tools (database lookups, internal APIs, etc.), you extend the tool list and handler.
- To implement guardrails, you intercept and sanitize tool outputs before they go back into `messages`.
Step 5: Adapting to Different Types of LLMs
How you wire this up depends on what “new LLM” you’re using.
Hosted LLMs with native tool calling
Many hosted models already support a tools/functions interface. In that case:
- Use their documented schema for tool descriptions.
- Register your `web_search` tool with the model.
- Implement the agent loop as in the earlier example.
The main work is in your web_search implementation and how you format tool output messages.
Local models via serving frameworks
If you run a local model via a server or framework that supports tool calling, the overall pattern is almost identical. You still:
- Define a JSON schema for tools.
- Parse tool calls from the model’s output.
- Execute them in your backend.
The only thing that changes is how you send prompts and receive responses.
Bare models without tool support
If your model doesn’t support tools at all, you can still do this with a ReAct‑style protocol:
- In your system prompt, teach the model to write commands like:
  `SEARCH[what is the latest news about X?]`
- When your backend sees `SEARCH[...]` in the model's text:
  - Extract the query.
  - Call `web_search(query)`.
  - Append the results to the conversation as a new message, e.g. "Search results: … (snippets, URLs, etc.)"
- Ask the model to continue, now that it sees the search results.
It’s more manual than built‑in tools, but the logic is the same.
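The parsing side of this protocol is a few lines of code. The `SEARCH[...]` command name below matches the convention taught in the system prompt above; use whatever token you actually put in your prompt:

```python
import re

# Matches the SEARCH[...] convention taught in the system prompt.
SEARCH_PATTERN = re.compile(r"SEARCH\[(.+?)\]", re.DOTALL)


def extract_search_query(model_text):
    """Return the first search query in the model's output, or None."""
    match = SEARCH_PATTERN.search(model_text)
    return match.group(1).strip() if match else None


def format_search_results(results):
    """Render results as a plain-text message the model can read."""
    lines = ["Search results:"]
    for i, r in enumerate(results, 1):
        lines.append(f"{i}. {r['title']} - {r['snippet']} ({r['url']})")
    return "\n".join(lines)
```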
Step 6: Useful Enhancements
Once the basic web search integration works, you can iterate on quality and cost.
Some practical upgrades:
- Reranking results: Use an embedding model to rerank snippets by semantic similarity to the user's question before passing them to the LLM.
- Content trimming: Long web pages quickly blow up your context window. Summarize or chunk content, and only send the most relevant excerpts.
- Caching and rate‑limiting: Cache search results for frequent queries and set hard limits on searches per conversation or per user.
- Domain control: Restrict the model to trusted sources, either by allow‑list (e.g., docs, reputable news sites) or deny‑list.
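The reranking idea can be sketched as cosine similarity over embeddings. The `embed` function here is a stand-in bag-of-words vectorizer so the example runs without a model; in practice you would call a real embedding model and keep the rest unchanged:

```python
import math
from collections import Counter


def embed(text):
    # Stand-in embedding: a bag-of-words term-frequency vector.
    # In practice, call a real embedding model here instead.
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(question, results, top_k=3):
    """Sort search results by similarity to the question, keep the top k."""
    q_vec = embed(question)
    scored = [(cosine(q_vec, embed(r["snippet"])), r) for r in results]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored[:top_k]]
```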
Putting It All Together
To give your AI agent custom web search instead of relying on a platform’s built‑ins:
- Pick a web search backend (API or your own index).
- Implement a `web_search(query, max_results)` function in your code.
- Expose it as a tool/function to your LLM with a clear schema.
- Write an agent loop that:
- Lets the LLM request web search.
- Executes the search.
- Feeds results back for a final, grounded answer.
- Iterate with reranking, caching, and domain restrictions as your use‑case grows.