Which regex turns plain-text URLs into clickable HTML links?

Turning plain-text links into clickable HTML anchors is one of those tasks that looks simple until you run into punctuation, parentheses, query strings, and edge cases like already-linked URLs. Regex can help a lot, but the “best” pattern depends on what you consider a valid URL, what inputs you expect, and how safe you need the output to be.

Published on March 21, 2026

What you’re trying to match (and what you’re not)

A practical “linkify” regex usually targets a few common forms:

  • URLs with a scheme: https://example.com, http://example.com/path?x=1#frag
  • URLs that start with www.: www.example.com
  • Email addresses (optional): user@example.com

Many teams avoid supporting naked domains like example.com because they create false positives (version1.2.3, some.file.txt). If you must support naked domains, handle it carefully and accept that you may link text you didn’t mean to.
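To see how quickly naked-domain matching goes wrong, here is a small sketch. The pattern `NAKED_RE` and the helper `find_naked` are hypothetical names made up for this illustration; the pattern is a deliberately naive "dot-separated labels" matcher, not a recommendation.

```python
import re

# Hypothetical naive pattern for "naked" domains: dot-separated labels, no scheme.
NAKED_RE = re.compile(r"\b[a-z0-9-]+(?:\.[a-z0-9-]+)+\b", re.IGNORECASE)

def find_naked(text: str) -> list[str]:
    """Return everything that merely looks like a domain."""
    return NAKED_RE.findall(text)

print(find_naked("Upgrade to version1.2.3, open some.file.txt, then visit example.com"))
# ['version1.2.3', 'some.file.txt', 'example.com']
```

Only the last hit is a real link target, which is why requiring a scheme or a www. prefix is the safer default.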

Also decide early: should you linkify inside existing HTML? If your input may already contain <a href="...">, regex-only approaches can double-wrap links and break markup. If there’s any chance of HTML, parse it first and only linkify text nodes.

A solid baseline regex for http/https

If you only want to match explicit http:// and https:// URLs, this is a good starting point:

Regex:

    \bhttps?://[^\s<>"']+[^\s<>"'.,;:!?)]

What it does:

  • \bhttps?:// matches http:// or https:// at a word boundary.
  • [^\s<>"']+ eats characters until whitespace or a character that often ends an attribute or tag.
  • The final character class [^\s<>"'.,;:!?)] tries to avoid grabbing trailing punctuation like . or ) that’s often adjacent in prose.

This pattern won’t be perfect for every case, but it covers most URLs found in text.
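As a quick check of the three pieces described above (scheme, greedy run, final character that is not trailing punctuation), here is a minimal Python sketch. The names `URL_RE` and `find_urls` are assumptions chosen for this example.

```python
import re

# Scheme, then a greedy run of non-delimiter characters, then one final
# character that is not common trailing punctuation.
URL_RE = re.compile(r"""\bhttps?://[^\s<>"']+[^\s<>"'.,;:!?)]""")

def find_urls(text: str) -> list[str]:
    """Return every http/https URL found in plain text."""
    return URL_RE.findall(text)

sample = "See https://example.com/path?x=1#frag, or (http://example.com)."
print(find_urls(sample))
# ['https://example.com/path?x=1#frag', 'http://example.com']
```

The comma, the closing parenthesis, and the final period are all left out because the regex backtracks until the last character passes the final class.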

Quick replacement pattern (HTML anchor)

In many regex engines you can replace the match with:

Html:

    <a href="$&">$&</a>

$& means “the entire match” (some engines use \0 or $0; Python's re module uses \g<0>). Adjust based on your language.
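In Python, the same replacement could look like the sketch below. `linkify_basic` is a hypothetical helper name, and the sketch assumes the input is trusted plain text: it does no HTML escaping at all.

```python
import re

URL_RE = re.compile(r"""\bhttps?://[^\s<>"']+[^\s<>"'.,;:!?)]""")

def linkify_basic(text: str) -> str:
    """Wrap each matched URL in an anchor; no escaping is applied."""
    # \g<0> refers to the entire match in Python replacement strings.
    return URL_RE.sub(r'<a href="\g<0>">\g<0></a>', text)

print(linkify_basic("Visit https://example.com."))
# Visit <a href="https://example.com">https://example.com</a>.
```

Because nothing is escaped here, a URL containing & lands in the HTML as-is, so treat this as a demonstration of the replacement syntax rather than production output.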

People often type www.example.com without https://. You can match those too, and prepend a scheme in the href.

A common approach is to use two alternatives: one for scheme URLs, one for www.

Regex:

    \b(?:https?://|www\.)[^\s<>"']+[^\s<>"'.,;:!?)]

Then, in replacement logic, if the match starts with www., use https:// in the href. Many languages allow conditional replacements only via code, so you typically do this with a function:

  • visible text: the match as-is
  • href: match if it already starts with http, otherwise https:// + match

If you must do it in a pure regex replacement, the result is engine-dependent, messy, and not portable, so code is usually cleaner.
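A replacement function along those lines might look like this in Python. The names `LINK_RE`, `_to_anchor`, and `linkify_with_www` are assumptions for the sketch, and escaping is again left out to keep the focus on the href logic.

```python
import re

# Two alternatives: an explicit http/https scheme, or a bare "www." prefix.
LINK_RE = re.compile(r"""\b(?:https?://|www\.)[^\s<>"']+[^\s<>"'.,;:!?)]""")

def _to_anchor(match: re.Match) -> str:
    shown = match.group(0)  # visible text: the match as-is
    if shown.startswith(("http://", "https://")):
        href = shown
    else:
        href = "https://" + shown  # www. links get a scheme in the href only
    return f'<a href="{href}">{shown}</a>'

def linkify_with_www(text: str) -> str:
    return LINK_RE.sub(_to_anchor, text)

print(linkify_with_www("Go to www.example.com now"))
# Go to <a href="https://www.example.com">www.example.com</a> now
```

Passing a function to `re.sub` keeps the regex simple and puts the conditional where it is easy to read and test.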

Handling parentheses and trailing punctuation better

Text frequently includes URLs wrapped in parentheses:

  • (https://example.com/path)
  • See https://en.wikipedia.org/wiki/Title_(something).

A regex that refuses to end with ) helps, but it can also incorrectly strip a legitimate closing parenthesis that belongs to the URL. A more careful strategy:

  1. Match broadly.
  2. Trim trailing punctuation in post-processing: .,;:!? and sometimes ) if it’s unmatched.

In code, after a broad match like:

Regex:

    \b(?:https?://|www\.)[^\s<>"']+

You can strip trailing punctuation with a small loop:

  • While the last character is in .,;:!? remove it.
  • If the last character is ) and the URL has more ) than (, remove it. (This heuristic handles the “wrapped in parentheses” case while keeping balanced URL parentheses.)

Regex alone can’t easily count balanced parentheses, so this hybrid approach tends to behave better.
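The trimming loop described above can be sketched in a few lines of Python. `trim_trailing`, `BROAD_RE`, and `find_trimmed` are hypothetical names; the parenthesis rule is the same counting heuristic from the list, not a full balance check.

```python
import re

BROAD_RE = re.compile(r"""\b(?:https?://|www\.)[^\s<>"']+""")

def trim_trailing(url: str) -> str:
    """Strip sentence punctuation and unmatched ')' from the end of a URL."""
    while url:
        last = url[-1]
        if last in ".,;:!?":
            url = url[:-1]
        elif last == ")" and url.count(")") > url.count("("):
            url = url[:-1]  # more closers than openers: treat as prose
        else:
            break
    return url

def find_trimmed(text: str) -> list[str]:
    return [trim_trailing(m) for m in BROAD_RE.findall(text)]

text = "(https://example.com/path) and https://en.wikipedia.org/wiki/Title_(something)."
print(find_trimmed(text))
# ['https://example.com/path', 'https://en.wikipedia.org/wiki/Title_(something)']
```

The wrapped URL loses its stray closing parenthesis, while the Wikipedia-style URL keeps its balanced pair and only drops the final period.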

Avoiding matches inside HTML attributes

If the input might contain HTML, the safest method is: parse HTML, walk text nodes, linkify only their text. If you still want a regex-only guardrail for plain text that might include fragments like <a href="...">, you can reduce collateral damage by rejecting matches preceded by =" or similar, but this is fragile.

A simple defensive pattern for plain text contexts is to treat < and > as boundaries (already shown in the classes above). This prevents the match from bleeding into tags, but won’t stop double-linking inside attributes.
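For the "parse first, linkify text nodes" route, here is a rough sketch using Python's standard html.parser module. The class `TextNodeLinkifier` and the helpers `linkify_text` and `linkify_html` are names invented for this example. It assumes small, reasonably well-formed fragments without script or style elements (their contents would be escaped), and it drops comments and doctype declarations.

```python
import re
from html import escape
from html.parser import HTMLParser

URL_RE = re.compile(r"""\bhttps?://[^\s<>"']+[^\s<>"'.,;:!?)]""")

def linkify_text(text: str) -> str:
    """Escape a plain-text run and wrap any URLs in it with anchors."""
    parts, pos = [], 0
    for m in URL_RE.finditer(text):
        url = m.group(0)
        parts.append(escape(text[pos:m.start()], quote=False))
        parts.append(f'<a href="{escape(url)}">{escape(url, quote=False)}</a>')
        pos = m.end()
    parts.append(escape(text[pos:], quote=False))
    return "".join(parts)

class TextNodeLinkifier(HTMLParser):
    """Re-emits the markup it is fed, linkifying only text outside <a>."""

    def __init__(self) -> None:
        super().__init__()  # default convert_charrefs=True: text arrives unescaped
        self.out = []
        self.anchor_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1
        self.out.append(self.get_starttag_text())  # original tag, verbatim

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth:
            self.anchor_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.anchor_depth:
            self.out.append(escape(data, quote=False))  # already linked
        else:
            self.out.append(linkify_text(data))

def linkify_html(markup: str) -> str:
    parser = TextNodeLinkifier()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

Given `<p>See https://example.com and <a href="https://kept.example">https://kept.example</a>.</p>`, only the first URL is wrapped; the one already inside an anchor, and the one inside the href attribute, pass through unchanged.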

Email addresses (optional)

If you want to turn email addresses into mailto: links:

Regex:

    \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b

Use case-insensitive mode. Replacement:

Html:

    <a href="mailto:$&">$&</a>

Be cautious with trailing punctuation again (emails at end of sentence). The \b helps, but commas and periods still appear right after an email in prose. You may want to apply the same punctuation-trimming rule as for URLs.
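A Python sketch of email linkification, assuming a commonly used "good enough" address pattern. `EMAIL_RE` and `linkify_emails` are hypothetical names, and the pattern is a pragmatic matcher rather than a validator. Text outside the matches is left untouched and unescaped.

```python
import re
from html import escape

# Uppercase-only classes, so the pattern relies on case-insensitive mode.
EMAIL_RE = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.IGNORECASE)

def linkify_emails(text: str) -> str:
    def to_mailto(match: re.Match) -> str:
        addr = match.group(0)
        return f'<a href="mailto:{escape(addr)}">{escape(addr, quote=False)}</a>'
    return EMAIL_RE.sub(to_mailto, text)

print(linkify_emails("Write to user@example.com."))
# Write to <a href="mailto:user@example.com">user@example.com</a>.
```

With this particular pattern the sentence-ending period stays outside the link, because the match has to end in at least two letters followed by a word boundary.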

A practical “one regex” option (URLs only)

If you want one reasonably safe pattern that catches http(s) and www and avoids common trailing punctuation:

Regex:

    \b(?:https?://|www\.)[^\s<>"']+[^\s<>"'.,;:!?)]

Notes:

  • It allows many characters that appear in real URLs: ?, #, &, %, =, /, -, _, and the dot.
  • It avoids grabbing obvious closers at the end.
  • It still won’t validate the domain; it’s a linker, not a validator.

Output safety: escaping and allowed protocols

When you convert text into HTML, treat the URL as untrusted input:

  • Escape the visible text to avoid injecting HTML.
  • Escape the attribute value too (quotes matter).
  • Restrict protocols. If you accept arbitrary schemes, someone can input javascript:alert(1) and you’ll build a dangerous link. Many linkifiers allow only http, https, and maybe mailto.

If you’re auto-prepending https:// to www. links, that also helps avoid weird schemes.
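The three rules above could be combined in one helper, sketched here with Python's standard library. `ALLOWED_SCHEMES` and `safe_anchor` are names assumed for this example; the allowlist mirrors the http, https, and mailto set mentioned above.

```python
from html import escape
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https", "mailto"}

def safe_anchor(url: str) -> str:
    """Return an escaped anchor for allowed schemes, else escaped plain text."""
    try:
        scheme = urlsplit(url).scheme.lower()
    except ValueError:  # urlsplit rejects some malformed inputs
        scheme = ""
    if scheme not in ALLOWED_SCHEMES:
        return escape(url)  # render as text, never as a link
    # escape() also converts quotes, so the value is safe inside href="...".
    return f'<a href="{escape(url)}">{escape(url)}</a>'

print(safe_anchor("javascript:alert(1)"))
# javascript:alert(1)
print(safe_anchor("https://example.com/?a=1&b=2"))
# <a href="https://example.com/?a=1&amp;b=2">https://example.com/?a=1&amp;b=2</a>
```

Falling back to escaped text for anything outside the allowlist means an unexpected scheme degrades to harmless output instead of a clickable link.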

Suggested approach: regex + small cleanup

For most apps, the best results come from a two-step routine:

  1. Use a broad-but-reasonable regex to find candidates:
    • \b(?:https?:\/\/|www\.)[^\s<>"']+
  2. Post-process each match:
    • trim trailing punctuation
    • fix href by adding https:// for www.
    • escape output properly
    • optionally add rel="noopener noreferrer" and target="_blank" if you open new tabs

Regex gets you 90% of the way; a little code handles the human writing patterns that regex alone tends to fumble.
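Putting the two steps together, a compact Python version might look like the sketch below. The function names (`linkify`, `_trim`) and the `new_tab` flag are assumptions for illustration, and the input is assumed to be plain text rather than HTML.

```python
import re
from html import escape

CANDIDATE_RE = re.compile(r"""\b(?:https?://|www\.)[^\s<>"']+""")

def _trim(url: str) -> str:
    """Drop trailing punctuation and unmatched closing parentheses."""
    while url:
        if url[-1] in ".,;:!?":
            url = url[:-1]
        elif url[-1] == ")" and url.count(")") > url.count("("):
            url = url[:-1]
        else:
            break
    return url

def linkify(text: str, new_tab: bool = False) -> str:
    extra = ' target="_blank" rel="noopener noreferrer"' if new_tab else ""
    parts, pos = [], 0
    for m in CANDIDATE_RE.finditer(text):
        url = _trim(m.group(0))
        if not CANDIDATE_RE.fullmatch(url):
            continue  # trimming left nothing linkable; keep it as text
        href = url if url.startswith(("http://", "https://")) else "https://" + url
        parts.append(escape(text[pos:m.start()], quote=False))
        parts.append(f'<a href="{escape(href)}"{extra}>{escape(url, quote=False)}</a>')
        pos = m.start() + len(url)  # trimmed characters fall back into the text
    parts.append(escape(text[pos:], quote=False))
    return "".join(parts)

print(linkify("See (www.example.com/a?x=1&y=2), ok?"))
# See (<a href="https://www.example.com/a?x=1&amp;y=2">www.example.com/a?x=1&amp;y=2</a>), ok?
```

Everything outside the links is escaped too, so the result can be dropped into a page as-is under the plain-text assumption.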
