Which Regex Turns Plain-text URLs into Clickable HTML Links?
Turning plain-text links into clickable HTML anchors is one of those tasks that looks simple until you run into punctuation, parentheses, query strings, and edge cases like already-linked URLs. Regex can help a lot, but the “best” pattern depends on what you consider a valid URL, what inputs you expect, and how safe you need the output to be.
What you’re trying to match (and what you’re not)
A practical “linkify” regex usually targets a few common forms:
- URLs with a scheme: https://example.com, http://example.com/path?x=1#frag
- URLs that start with www.: www.example.com
- Email addresses (optional): [email protected]
Many teams avoid supporting naked domains like example.com because they create false positives (version1.2.3, some.file.txt). If you must support naked domains, handle it carefully and accept that you may link text you didn’t mean to.
Also decide early: should you linkify inside existing HTML? If your input may already contain <a href="...">, regex-only approaches can double-wrap links and break markup. If there’s any chance of HTML, parse it first and only linkify text nodes.
A solid baseline regex for http/https
If you only want to match explicit http:// and https:// URLs, this is a good starting point:
\bhttps?://[^\s<>"']+[^\s<>"'.,;:!?)]
What it does:
- \bhttps?:// matches http:// or https:// at a word boundary.
- [^\s<>"']+ eats characters until whitespace or a character that often ends an attribute or tag.
- The final character class [^\s<>"'.,;:!?)] tries to avoid grabbing trailing punctuation like . or ) that’s often adjacent in prose.
This pattern won’t be perfect for every case, but it covers most URLs found in text.
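To see how those pieces behave together, here is a minimal Python sketch. The pattern is assembled from the three parts described above; the sample sentence and the name `URL_RE` are illustrative assumptions, not part of any library.

```python
import re

# Assembled from the three parts described above: scheme, greedy body,
# and a final character that must not be trailing punctuation.
URL_RE = re.compile(r"""\bhttps?://[^\s<>"']+[^\s<>"'.,;:!?)]""")

text = "Docs at https://example.com/path?x=1#frag, mirror (http://example.org)."

# The greedy body first swallows the trailing "," and ").", then backtracks
# until the last character satisfies the final class.
print(URL_RE.findall(text))
# ['https://example.com/path?x=1#frag', 'http://example.org']
```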
Quick replacement pattern (HTML anchor)
In many regex engines you can replace the match with:
<a href="$&">$&</a>
$& means “the entire match” (some engines use \0 or $0). Adjust based on your language.
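In Python, for instance, the whole-match reference is written `\g<0>`. The sketch below assumes the baseline http/https pattern and a hypothetical helper name `linkify_simple`; it does no escaping, so it is only suitable for text you already trust.

```python
import re

URL_RE = re.compile(r"""\bhttps?://[^\s<>"']+[^\s<>"'.,;:!?)]""")

def linkify_simple(text: str) -> str:
    """Wrap each http/https URL in an anchor. No HTML escaping is done here."""
    # \g<0> is Python's spelling of "the entire match".
    return URL_RE.sub(r'<a href="\g<0>">\g<0></a>', text)

print(linkify_simple("Read https://example.com/docs."))
# Read <a href="https://example.com/docs">https://example.com/docs</a>.
```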
Supporting www. links (and adding a scheme)
People often type www.example.com without https://. You can match those too, and prepend a scheme in the href.
A common approach is to use two alternatives: one for scheme URLs, one for www.
\b(?:https?:\/\/|www\.)[^\s<>"']+[^\s<>"'.,;:!?)]
Then, in replacement logic, if the match starts with www., use https:// in the href. Many languages allow conditional replacements only via code, so you typically do this with a function:
- visible text: the match as-is
- href: match if it already starts with http, otherwise https:// + match
If you must do it in pure regex replacement (engine-dependent), it gets messy and not portable, so code is usually cleaner.
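A function-based replacement might look like the following Python sketch. The names `LINK_RE`, `_to_anchor`, and `linkify_with_www` are hypothetical, and defaulting to https:// for www. links is an assumption you may want to revisit for hosts that only serve plain http.

```python
import re

# One alternation covers both explicit schemes and bare "www." prefixes.
LINK_RE = re.compile(r"""\b(?:https?://|www\.)[^\s<>"']+[^\s<>"'.,;:!?)]""")

def _to_anchor(match: re.Match) -> str:
    shown = match.group(0)  # visible text: the match as-is
    has_scheme = shown.startswith(("http://", "https://"))
    href = shown if has_scheme else "https://" + shown
    return f'<a href="{href}">{shown}</a>'

def linkify_with_www(text: str) -> str:
    """Linkify scheme and www. URLs; escaping is out of scope for this sketch."""
    return LINK_RE.sub(_to_anchor, text)

print(linkify_with_www("Visit www.example.com today"))
# Visit <a href="https://www.example.com">www.example.com</a> today
```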
Handling parentheses and trailing punctuation better
Text frequently includes URLs wrapped in parentheses:
- (https://example.com/path)
- See https://en.wikipedia.org/wiki/Title_(something).
A regex that refuses to end with ) helps, but it can also incorrectly strip a legitimate closing parenthesis that belongs to the URL. A more careful strategy:
- Match broadly.
- Trim trailing punctuation in post-processing: .,;:!? and sometimes ) if it’s unmatched.
In code, after a broad match like:
\b(?:https?:\/\/|www\.)[^\s<>"']+
you can strip trailing punctuation with a small loop:
- While the last character is in .,;:!? remove it.
- If the last character is ) and the URL has more ) than (, remove it. (This heuristic handles the “wrapped in parentheses” case while keeping balanced URL parentheses.)
Regex alone can’t easily count balanced parentheses, so this hybrid approach tends to behave better.
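The hybrid approach can be sketched in Python as follows. The helper name `trim_url` is hypothetical, and the sketch repeats both rules until neither applies, so that endings like ")." and ".)" are both handled; that repetition is an assumption about how the loop should behave.

```python
import re

# Broad candidate pattern: no attempt to exclude trailing punctuation here.
BROAD_RE = re.compile(r"""\b(?:https?://|www\.)[^\s<>"']+""")
TRAILING = ".,;:!?"

def trim_url(url: str) -> str:
    """Strip sentence punctuation and unmatched closing parentheses."""
    while url:
        last = url[-1]
        if last in TRAILING:
            url = url[:-1]
        elif last == ")" and url.count(")") > url.count("("):
            # More closers than openers: the ")" belongs to the prose.
            url = url[:-1]
        else:
            break
    return url

text = "(https://example.com/path) See https://en.wikipedia.org/wiki/Title_(something)."
print([trim_url(m.group(0)) for m in BROAD_RE.finditer(text)])
# ['https://example.com/path', 'https://en.wikipedia.org/wiki/Title_(something)']
```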
Avoiding matches inside HTML attributes
If the input might contain HTML, the safest method is: parse HTML, walk text nodes, linkify only their text. If you still want a regex-only guardrail for plain text that might include fragments like <a href="...">, you can reduce collateral damage by rejecting matches preceded by =" or similar, but this is fragile.
A simple defensive pattern for plain text contexts is to treat < and > as boundaries (already shown in the classes above). This prevents the match from bleeding into tags, but won’t stop double-linking inside attributes.
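One way to sketch the parse-first approach is with `html.parser.HTMLParser` from Python's standard library: re-emit tags unchanged and linkify only text that is not already inside an `<a>` element. The names `TextNodeLinkifier` and `linkify_html` are hypothetical. This sketch does not preserve comments or doctypes and does not special-case `<script>` or `<style>` content, so a production version would need more handlers or a full HTML library.

```python
import html
import re
from html.parser import HTMLParser

URL_RE = re.compile(r"""\bhttps?://[^\s<>"']+[^\s<>"'.,;:!?)]""")

class TextNodeLinkifier(HTMLParser):
    """Re-emit markup, linkifying URLs only in text outside <a> elements."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.anchor_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1
        self.out.append(self.get_starttag_text())  # original tag text

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth:
            self.anchor_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references arrive decoded, so text is re-escaped on output.
        if self.anchor_depth:
            self.out.append(html.escape(data, quote=False))
            return
        pos = 0
        for m in URL_RE.finditer(data):
            url = m.group(0)
            self.out.append(html.escape(data[pos:m.start()], quote=False))
            self.out.append(
                f'<a href="{html.escape(url, quote=True)}">'
                f"{html.escape(url, quote=False)}</a>"
            )
            pos = m.end()
        self.out.append(html.escape(data[pos:], quote=False))

def linkify_html(markup: str) -> str:
    parser = TextNodeLinkifier()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)

print(linkify_html('See https://example.com and <a href="https://x.org">https://x.org</a>'))
```

The URL in the plain text gets wrapped, while the one that is already an anchor (both its href attribute and its link text) is passed through untouched.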
Email addresses (optional)
If you want to turn email addresses into mailto: links:
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b
Use case-insensitive mode. Replacement:
<a href="mailto:$&">$&</a>
Be cautious with trailing punctuation again (emails at end of sentence). The \b helps, but commas and periods still appear right after an email in prose. You may want to apply the same punctuation-trimming rule as for URLs.
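As a rough illustration in Python, the sketch below uses a widely seen simplified email pattern; it is an assumption, not a full implementation of the email address grammar, and the sample address and the name `linkify_emails` are made up for the example.

```python
import re

# Simplified email pattern, matched case-insensitively. The character classes
# exclude <, >, &, and quotes, so the match is safe to place in an attribute.
EMAIL_RE = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.IGNORECASE)

def linkify_emails(text: str) -> str:
    """Wrap email addresses in mailto: anchors; other text is left untouched."""
    return EMAIL_RE.sub(r'<a href="mailto:\g<0>">\g<0></a>', text)

print(linkify_emails("Write to jane.doe@example.com, or call."))
# Write to <a href="mailto:jane.doe@example.com">jane.doe@example.com</a>, or call.
```

Because this pattern must end on letters followed by a word boundary, the comma after the address stays outside the link.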
A practical “one regex” option (URLs only)
If you want one reasonably safe pattern that catches http(s) and www and avoids common trailing punctuation:
\b(?:https?:\/\/|www\.)[^\s<>"']+[^\s<>"'.,;:!?)]
Notes:
- It allows many characters that appear in real URLs: ?, #, &, %, =, /, -, _, and the dot.
- It avoids grabbing obvious closers at the end.
- It still won’t validate the domain; it’s a linker, not a validator.
Output safety: escaping and allowed protocols
When you convert text into HTML, treat the URL as untrusted input:
- Escape the visible text to avoid injecting HTML.
- Escape the attribute value too (quotes matter).
- Restrict protocols. If you accept arbitrary schemes, someone can input javascript:alert(1) and you’ll build a dangerous link. Many linkifiers allow only http, https, and maybe mailto.
If you’re auto-prepending https:// to www. links, that also helps avoid weird schemes.
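These three rules can be sketched in Python with `html.escape` and `urllib.parse.urlsplit`. The allowlist contents and the helper name `safe_anchor` are assumptions for illustration; the fallback of returning escaped plain text for disallowed schemes is one possible policy among several.

```python
import html
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https", "mailto"}  # assumed allowlist

def safe_anchor(url: str) -> str:
    """Return an <a> tag for allowed schemes, otherwise escaped plain text."""
    shown = html.escape(url, quote=False)  # visible text
    try:
        scheme = urlsplit(url).scheme.lower()
    except ValueError:  # e.g. malformed bracketed hosts
        return shown
    if scheme not in ALLOWED_SCHEMES:
        return shown
    href = html.escape(url, quote=True)  # attribute value: quotes matter
    return f'<a href="{href}">{shown}</a>'

print(safe_anchor("https://example.com/?a=1&b=2"))
# <a href="https://example.com/?a=1&amp;b=2">https://example.com/?a=1&amp;b=2</a>
print(safe_anchor("javascript:alert(1)"))
# javascript:alert(1)
```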
Suggested approach: regex + small cleanup
For most apps, the best results come from a two-step routine:
- Use a broad-but-reasonable regex to find candidates:
  \b(?:https?:\/\/|www\.)[^\s<>"']+
- Post-process each match:
  - trim trailing punctuation
  - fix href by adding https:// for www.
  - escape output properly
  - optionally add rel="noopener noreferrer" and target="_blank" if you open new tabs
Regex gets you 90% of the way; a little code handles the human writing patterns that regex alone tends to fumble.
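Putting the two steps together, a minimal Python sketch might look like this. It assumes plain-text input (not HTML); the names `linkify`, `_trim`, and the `new_tab` parameter are hypothetical.

```python
import html
import re

# Step 1: broad candidate pattern; the prefix is captured for the href fix.
CANDIDATE_RE = re.compile(r"""\b(https?://|www\.)[^\s<>"']+""")
TRAILING = ".,;:!?"

def _trim(url: str) -> str:
    """Drop trailing punctuation and unmatched closing parentheses."""
    while url:
        if url[-1] in TRAILING:
            url = url[:-1]
        elif url[-1] == ")" and url.count(")") > url.count("("):
            url = url[:-1]
        else:
            break
    return url

def linkify(text: str, new_tab: bool = False) -> str:
    """Escape plain text and wrap http(s)/www URLs in anchors."""
    out, pos = [], 0
    attrs = ' target="_blank" rel="noopener noreferrer"' if new_tab else ""
    for m in CANDIDATE_RE.finditer(text):
        # Step 2: post-process the candidate.
        url = _trim(m.group(0))
        if len(url) <= len(m.group(1)):
            continue  # nothing left after the prefix, e.g. "www.."
        href = "https://" + url if m.group(1) == "www." else url
        out.append(html.escape(text[pos:m.start()], quote=False))
        out.append(
            f'<a href="{html.escape(href, quote=True)}"{attrs}>'
            f"{html.escape(url, quote=False)}</a>"
        )
        pos = m.start() + len(url)  # trimmed characters stay in the prose
    out.append(html.escape(text[pos:], quote=False))
    return "".join(out)

print(linkify("See (www.example.com/a?b=1&c=2)."))
# See (<a href="https://www.example.com/a?b=1&amp;c=2">www.example.com/a?b=1&amp;c=2</a>).
```

Because the pattern only admits http://, https://, and www. prefixes, the protocol allowlist is enforced by the match itself; adding mailto: support would need a separate pass.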












