// an honest tool
The Real-World Email Regex
Most email-regex advice is a tour of RFC 5322 edge cases that never occur in real life. Here is the regex you should actually use, backed by analysis of 5,280,739 real addresses, plus a live tester so you can prove it on your own inputs.
^[A-Za-z0-9](?:[A-Za-z0-9-]|\.(?!\.)){0,62}[A-Za-z0-9]@(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,63}$
This pattern reaches better than five-nines accuracy against consumer email addresses in the wild. Below you can test it, see the data behind it, and build the full RFC-compliant version chunk by chunk.
Live tester
Type or paste an address. The matched substring is split into its capture groups, and the verdict updates as you type. Flip the speed: Realistic runs the practical pattern that covers 99.999% of real addresses; Ludicrous runs the full RFC 5321 compliance attempt, IPv4 and IPv6 address literals included. This switch drives the tester and the test suite below.
The email regex in every language
The realistic pattern, ready to paste, in the language you are working in. Same rule everywhere: ASCII alphanumeric, single dots, and dashes in the mailbox, then a DNS domain. A few languages need real care: Go has no lookahead, and in PHP, C#, and Ruby a bare $ does not pin the match to the very end of the string. The snippets below account for that.
JavaScript
const EMAIL = /^[A-Za-z0-9](?:[A-Za-z0-9-]|\.(?!\.)){0,62}[A-Za-z0-9]@(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,63}$/;
EMAIL.test(value); // true / false
Python
import re
EMAIL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]|\.(?!\.)){0,62}[A-Za-z0-9]@(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,63}")
bool(EMAIL.fullmatch(value)) # fullmatch anchors both ends
Java
import java.util.regex.Pattern;
static final Pattern EMAIL = Pattern.compile(
"^[A-Za-z0-9](?:[A-Za-z0-9-]|\\.(?!\\.)){0,62}[A-Za-z0-9]@(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\\.)+[A-Za-z]{2,63}$");
EMAIL.matcher(value).matches();
Go
// Go's regexp (RE2) does not support (?!...). Forbid double dots by
// structure instead: alphanumeric runs separated by single . or -.
// Domain labels allow runs of hyphens so punycode (xn--...) labels still match.
var email = regexp.MustCompile(`^[A-Za-z0-9]+(?:[.-][A-Za-z0-9]+)*@(?:[A-Za-z0-9]+(?:-+[A-Za-z0-9]+)*\.)+[A-Za-z]{2,63}$`)
email.MatchString(value)
PHP
// In PCRE a bare $ also matches before a trailing newline. End with \z instead.
$re = '/^[A-Za-z0-9](?:[A-Za-z0-9-]|\.(?!\.)){0,62}[A-Za-z0-9]@(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,63}\z/';
(bool) preg_match($re, $value);
C#
using System.Text.RegularExpressions;
// In .NET a bare $ also matches before a trailing newline. End with \z instead.
Regex.IsMatch(value,
@"^[A-Za-z0-9](?:[A-Za-z0-9-]|\.(?!\.)){0,62}[A-Za-z0-9]@(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,63}\z");
Ruby
# In Ruby ^ and $ match line boundaries, so a newline can sneak an
# injection past them. Anchor with \A and \z instead.
EMAIL = /\A[A-Za-z0-9](?:[A-Za-z0-9-]|\.(?!\.)){0,62}[A-Za-z0-9]@(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,63}\z/
!!(value =~ EMAIL)
Whichever language you use, the regex proves shape, not existence. Pair it with a verification email before you trust an address.
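To see what "shape, not existence" means in practice, here is a minimal Python sketch that runs the realistic pattern over a handful of inputs. The helper name looks_like_email and the sample addresses are illustrative assumptions, not part of the original tool. Only the first sample is accepted. The last one shows a side effect worth knowing: as written, the pattern requires at least two mailbox characters.

```python
import re

# The realistic pattern from this page, unchanged. fullmatch() supplies the
# anchoring, so no ^ or $ is needed.
EMAIL = re.compile(
    r"[A-Za-z0-9](?:[A-Za-z0-9-]|\.(?!\.)){0,62}[A-Za-z0-9]"
    r"@(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,63}"
)


def looks_like_email(value: str) -> bool:
    """Shape check only; it says nothing about whether the mailbox exists."""
    return EMAIL.fullmatch(value) is not None


# Hypothetical sample inputs, chosen to hit one rule each.
for sample in [
    "jane.doe@example.com",    # plain address
    "jane..doe@example.com",   # double dot in the mailbox
    ".jane@example.com",       # leading dot
    "jane@example",            # no dot in the domain
    "jane+tag@example.com",    # plus sign is outside the realistic set
    "jane@example.com\n",      # trailing newline
    "a@example.com",           # one-character mailbox
]:
    print(repr(sample), looks_like_email(sample))
```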
RFC 5321 takes precedence over RFC 5322
The famous argument that email validation by regex is futile rests on RFC 5322, which treats an email address as a generic message header. Under that reading, parenthesised comments and folded whitespace are legal even inside a domain name, and the canonical "futility" test suite marks addresses with comments and Unicode as valid.
This conflates two standards. Because SMTP is universal for email transmission, no examination of address formatting is complete without RFC 5321, and RFC 5321 is explicit: the domain portion of an address is restricted, for SMTP purposes, to a sequence of letters, digits, and hyphens drawn from the ASCII character set. RFC 5321 has the final word, and it makes the problem far simpler, especially for bounding string length.
| Aspect | RFC 5322 (message header) | RFC 5321 (SMTP address) |
|---|---|---|
| Domain characters | Comments, whitespace, broad character set | ASCII letters, digits, hyphens only |
| Comments in parentheses | Legal, even in the domain | Not part of the deliverable address |
| Folded whitespace | Legal (multi-line headers) | Not applicable to delivery |
| Practical relevance | Theoretical edge cases | What actually gets delivered |
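The length bounds mentioned above can be checked without any regex at all. The sketch below uses the size limits from RFC 5321 section 4.5.3.1 (64 octets for the local part, 255 for the domain, 256 for a path including its angle brackets) plus the 63-octet DNS label limit. The function name within_smtp_limits is hypothetical, and the code assumes ASCII input so that characters and octets coincide.

```python
MAX_LOCAL_PART = 64  # RFC 5321 section 4.5.3.1.1
MAX_DOMAIN = 255     # RFC 5321 section 4.5.3.1.2
MAX_PATH = 256       # RFC 5321 section 4.5.3.1.3, counts the enclosing "<" and ">"
MAX_LABEL = 63       # DNS label limit (RFC 1035)


def within_smtp_limits(address: str) -> bool:
    """Length checks only. Assumes ASCII input, so len() counts octets."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    if len(address) + 2 > MAX_PATH:  # "<" + address + ">"
        return False
    if len(local) > MAX_LOCAL_PART or len(domain) > MAX_DOMAIN:
        return False
    # Every dot-separated label must be non-empty and at most 63 octets.
    return all(0 < len(label) <= MAX_LABEL for label in domain.split("."))
```

Run it before the regex if you want a cheap early exit on oversized input.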
The data: email addresses in the wild
The right measure for the practical programmer is not the standard but what real mailbox providers actually permit. Here is what the data says.
82% → 97% → ~100%
According to Adam Z. Wasserman's analysis of 5,280,739 real email addresses, 82% contain only ASCII alphanumeric characters, 97% contain only ASCII alphanumeric characters plus dots, and approximately 100% contain only ASCII alphanumeric characters, dots, and dashes. In other words, a further 15% add only dots, and a final 3% add only dashes.
Sample drawn from 115 million accounts. 99% confidence level, 0.055% margin of error for the general population of internet email. Source: primary analysis by Adam Z. Wasserman.
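The source does not say how the margin of error was derived. One plausible reconstruction, offered here as an assumption, is the usual normal approximation for a proportion at worst-case p = 0.5 with a finite population correction. With the sample and population sizes quoted above (taking "115 million" as exactly 115,000,000) it lands at about 0.055%.

```python
import math


def margin_of_error(sample_size: int, population_size: int,
                    z: float = 2.576, p: float = 0.5) -> float:
    """Margin of error for a proportion, with finite population correction.

    z = 2.576 is the two-sided 99% normal quantile; p = 0.5 is the worst case.
    """
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    correction = math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * correction


moe = margin_of_error(5_280_739, 115_000_000)
print(f"{moe:.3%}")
```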
Consumer email: character distribution
Rare features in the same sample: underscores 0.00072% (38 addresses), plus signs 0.00051% (27 addresses), Unicode 0.00002% (one single address). Assuming ASCII alphanumeric, dots, and dashes gives better than five-nines accuracy for consumer email.
Business email: the top ten hosts
Across 6,771,269 companies using 91 hosting solutions, a Pareto distribution holds: 95.19% of mailboxes sit with just ten providers. The largest three all permit only ASCII letters, numbers, and dots when creating a mailbox.
Business-host data from Datanyze.
When do you actually need this?
Pick what you are building. The honest answer is often "do not lean on the regex alone."
Log-redaction playground
The real use case for email regex is mining and anonymising large volumes of unstructured text: referrer logs, exports, dumps. Paste some text below and redact every address in one pass. This is the kind of job an in-house team can declare impossible and a single compiled regex can finish on a laptop in minutes.
The browser version above is a JavaScript adaptation. It also handles the URL-encoded %40 form of @ while skipping image filenames such as imagefile@2x.png, the same collision the original log-mining job had to dodge.
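For readers who want the same idea outside the browser, here is a minimal Python sketch. It is not the original log-mining code. It reuses the realistic pattern without anchors, accepts either @ or %40 as the delimiter, and leaves a match alone when it ends in an image extension. The names redact_emails and IMAGE_SUFFIXES, the placeholder text, and the short extension list are all assumptions made for illustration.

```python
import re

# Search-mode adaptation of the realistic pattern: no anchors, and the
# delimiter may be a literal "@" or its URL-encoded form "%40".
EMAIL_IN_TEXT = re.compile(
    r"[A-Za-z0-9](?:[A-Za-z0-9-]|\.(?!\.)){0,62}[A-Za-z0-9]"
    r"(?:@|%40)"
    r"(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,63}"
)

# Hypothetical, deliberately short list; extend it for your own logs.
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def redact_emails(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every address-shaped match, except image names like x@2x.png."""
    def replace(match: re.Match) -> str:
        candidate = match.group(0)
        if candidate.lower().endswith(IMAGE_SUFFIXES):
            return candidate  # retina-style filename, not an address
        return placeholder

    return EMAIL_IN_TEXT.sub(replace, text)


line = "GET /?u=bob%40example.com ref=alice@example.org img=imagefile@2x.png"
print(redact_emails(line))
```

Filtering in the callback keeps the pattern itself readable. A negative lookahead for the extensions would also work, at the cost of a longer regex.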
The annotated cookbook
An email address is a mailbox, an @ delimiter, and a domain. Below are composable chunks for each part. Toggle the ones you want and watch the assembled pattern build up. Each chunk carries the annotation explaining what rule it enforces.
Balancing parentheses: the open challenge
The cookbook leaves one rule unsolved: parenthesised comments are legal only when the parentheses are balanced, and so are quoted strings. Can standard regex enforce that? The honest answer is layered, because it depends entirely on what you count as "standard regex."
Pure regular expressions provably cannot
Balanced parentheses to arbitrary depth is the Dyck language, which is context-free but not regular. The pumping lemma proves no finite automaton can do it: a true regular expression has no memory to count how deep it has gone. If "standard" means formally regular, this is a hard no, and it is a theorem rather than a failure of cleverness.
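For readers who want the argument rather than the citation, the standard pumping-lemma proof can be sketched as follows. The symbols D (the language of balanced parentheses) and p (the pumping length) are notation introduced here.

```latex
Suppose $D$ is regular with pumping length $p$, and take
\[
  w = \texttt{(}^{\,p}\,\texttt{)}^{\,p} \in D, \qquad |w| = 2p \ge p .
\]
The pumping lemma gives a split $w = xyz$ with $|xy| \le p$ and $|y| \ge 1$,
so $y$ lies entirely inside the opening run: $y = \texttt{(}^{\,k}$ for some $k \ge 1$.
Pumping once more yields
\[
  x y^{2} z = \texttt{(}^{\,p+k}\,\texttt{)}^{\,p} \notin D ,
\]
because it has $k$ unmatched opening parentheses. This contradicts the lemma,
so $D$ is not regular.
```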
But real engines are not regular, and three of them can
PCRE, Perl, and Python (third-party regex module): recursion
(?<paren>\((?:[^()]|(?&paren))*\))
(?&paren) recurses into the named group, so it matches nesting to any depth. Python's standard-library re does not support this; the third-party regex module does.
.NET: balancing groups
^(?:[^()]|\((?<d>)|\)(?<-d>))*(?(d)(?!))$
(?<d>) pushes on an open paren, (?<-d>) pops on a close (and fails on an unmatched closer), and (?(d)(?!)) fails at the end if anything is still open.
Any engine, including JavaScript: unrolled to a fixed depth
\((?:[^()]|\((?:[^()])*\))*\)
Each extra level of nesting is one more nested copy; the pattern above handles two levels. The result is genuinely regular because the depth is finite. JavaScript has no recursion and no balancing groups, so this unrolling is the only regex option in the browser.
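Writing the nested copies by hand gets tedious past two levels, so the unrolling can be generated. The sketch below builds the pattern for any fixed depth using Python's standard-library re. The function name unrolled_parens is a hypothetical helper, and at depth 2 it reproduces the pattern shown above character for character.

```python
import re


def unrolled_parens(depth: int) -> str:
    """Regex source matching one parenthesised group nested at most `depth` deep."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    pattern = r"\((?:[^()])*\)"  # depth 1: no nested parentheses allowed
    for _ in range(depth - 1):
        # Wrap the previous level as one more alternative inside a new pair.
        pattern = r"\((?:[^()]|" + pattern + r")*\)"
    return pattern


two_levels = re.compile(unrolled_parens(2))
print(bool(two_levels.fullmatch("(a(b)c)")))    # nesting depth 2
print(bool(two_levels.fullmatch("(a(b(c)))")))  # nesting depth 3
```

The pattern grows linearly with the depth, which is the price of staying inside what every engine supports.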
The pragmatic answer: count, do not match
Balance-checking is the textbook case where regex is the wrong tool and a one-pass counter is trivially correct, linear time, and engine-independent. And since the data showed parenthesised comments are statistically near zero in the wild, this check rarely earns its keep at all.
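// Returns true when every "(" has a matching ")" and no ")" closes early.
function parensBalanced(s) {
  var depth = 0;
  for (var i = 0; i < s.length; i++) {
    var c = s[i];
    if (c === "\\") { i++; continue; } // skip escaped char
    else if (c === "(") depth++;
    else if (c === ")" && --depth < 0) return false; // closer with nothing open
  }
  return depth === 0; // anything still open means unbalanced
}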
Run the test suite
These cases include the contentious ones from the original "futility" suite plus a few of our own. The runner applies both live patterns to every address, and the column matching the current speed is highlighted, so you can see exactly where realistic and ludicrous part ways.
| Address | Realistic | Ludicrous | Note |
|---|---|---|---|
Ludicrous speed
The realistic regex stops at DNS domains because that is what 99.999% of real addresses use. But RFC 5321 also permits address literals: a bracketed IPv4 address, or the full zoo of IPv6 forms. Going to ludicrous speed means actually matching all of them. Unlike the RFC 5322 Full Monty below, this one has no POSIX classes, so it runs natively in your browser. Flip the switch and every validator on the page upgrades to it.
Try john@[192.168.1.1] or user@[IPv6:2001:db8::1] in the tester while on ludicrous: they pass here and fail on realistic. The cost of going ludicrous is a pattern that is dramatically larger, harder to read, and slower, in exchange for matching address forms that essentially never appear in real signups. It is the regex equivalent of strapping a bigger engine to a car you only ever drive to the shops.
Want to go even faster? RFC 5322 takes you beyond ludicrous, straight to plaid. That one is in the Full Monty below, and it cannot even run in a browser.
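If you ever do need address literals outside the browser, a parser is an alternative to the ludicrous regex. The sketch below is a swapped-in technique, not the pattern used on this page: it hands the bracketed part to Python's standard-library ipaddress module. The function name is_address_literal is hypothetical. Treating the IPv6: tag case-insensitively and rejecting zone IDs are assumptions, and RFC 5321's general address literals are not handled.

```python
import ipaddress


def is_address_literal(domain: str) -> bool:
    """True for "[192.168.1.1]" or "[IPv6:2001:db8::1]" style domains."""
    if not (domain.startswith("[") and domain.endswith("]")):
        return False
    inner = domain[1:-1]
    try:
        if inner[:5].lower() == "ipv6:":
            if "%" in inner:
                return False  # ipaddress may accept zone IDs; not wanted here
            ipaddress.IPv6Address(inner[5:])
        else:
            ipaddress.IPv4Address(inner)
    except ValueError:  # AddressValueError is a ValueError subclass
        return False
    return True


print(is_address_literal("[192.168.1.1]"))
print(is_address_literal("[IPv6:2001:db8::1]"))
```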
The Full Monty: gone to plaid
Beyond ludicrous lies RFC 5322 itself. For completeness, here is the assembled pattern with named subgroups for the mailbox, single-dot rule, folded whitespace, the @ delimiter, DNS domains, and IPv4 and IPv6 address literals. It uses POSIX character classes and a folded-whitespace construct that the JavaScript regex engine does not support, so it is shown here as copyable Python. Python's standard-library re does not understand POSIX classes such as [[:alnum:]] either, so the snippet imports the third-party regex module under the name re.
import regex as re  # third-party module (pip install regex); stdlib re lacks [[:alnum:]]
# Mailbox + @ delimiter + domain (DNS, IPv4, IPv6 literals).
# Drop the leading ^ if you are searching inside a longer string
# rather than validating a whole string.
EMAIL = re.compile(r"""
^(?P<mailbox>(
[a-zA-Z0-9+!\#$%&'*\-/=?_{}|~]
| (?P<singleDot>(?<!\.)(?<!^)\.(?!\.))
){1,64})
\s?(?P<atSign>(?<!-)(?<!\.)@(?!@))
(?P<domain>
(?P<dns>[[:alnum:]]([[:alnum:]\-]{0,63}\.){1,24}[[:alnum:]\-]{1,63}[[:alnum:]])
| (?P<IPv4>\[((?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\])
)$
""", re.VERBOSE)
Open the complete pattern and test data on Regex101. The full version also enumerates every IPv6 and IPv6-mapped-IPv4 form (full, compressed, and the four edge forms each), named so each subgroup can be picked apart.
No regex engine survives the jump.
Frequently asked questions
Can you validate an email address with regex?
Yes, for practical purposes. RFC 5321 restricts the domain to ASCII letters, digits, and hyphens, which is straightforward to match. The mailbox has more latitude, but real-world data shows that matching ASCII alphanumeric characters, dots, and dashes covers better than 99.999% of consumer addresses. What regex cannot tell you is whether the address actually exists. For that, send a verification link.
Should I use RFC 5321 or RFC 5322 for email validation?
RFC 5321. It governs SMTP, which is how email is actually transmitted, and it restricts the domain to ASCII letters, digits, and hyphens. RFC 5322 describes generic message headers and permits comments and folded whitespace that never appear in deliverable addresses.
What email regex should I actually use?
For signup, contact forms, and log parsing, match ASCII alphanumeric characters, single dots, and dashes in the mailbox, an @, and a DNS-parsable domain. That is the practical default at the top of this page. Confirm existence with a fire-and-forget verification link rather than trusting the string alone.
Why not just match every RFC 5322 edge case?
Because those cases almost never occur. In 5.28 million addresses, exactly one used Unicode and only 27 used a plus sign. Spending your time on quoted strings and parenthesised comments is optimising for inputs that do not exist. Ask what the simplest thing that could possibly work is, then build that.
Can regex match balanced parentheses?
Not with a formally regular expression: balanced parentheses form the Dyck language, which is context-free, not regular, and the pumping lemma proves no finite automaton can count nesting depth. Real engines that go beyond regular languages can: PCRE and Perl via recursion ((?&name)), and .NET via balancing groups. JavaScript has neither, so in the browser you must either unroll to a fixed depth or, better, count with a one-pass linear scan. See balancing parentheses above.
Is a plus sign valid in an email address?
Yes. A plus sign is legal in the mailbox under RFC 5321, and Gmail and others use it for sub-addressing (you+tag@gmail.com). It is rare in real data (about 0.00051% of addresses), so the strict realistic pattern omits it; if you support plus-addressing, add + to the mailbox character class, or switch the tester to ludicrous speed to see it accepted.
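As a minimal sketch of that change in Python: the only difference from the realistic pattern is the + added to the interior mailbox class, so a plus sign is allowed between the first and last characters but not at either end. The name EMAIL_WITH_PLUS is an assumption for illustration.

```python
import re

# Realistic pattern with "+" added to the interior mailbox class only.
EMAIL_WITH_PLUS = re.compile(
    r"[A-Za-z0-9](?:[A-Za-z0-9+-]|\.(?!\.)){0,62}[A-Za-z0-9]"
    r"@(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,63}"
)

print(bool(EMAIL_WITH_PLUS.fullmatch("you+tag@gmail.com")))  # plus inside the mailbox
print(bool(EMAIL_WITH_PLUS.fullmatch("+tag@gmail.com")))     # plus as first character
```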
Are email addresses case sensitive?
The domain is case insensitive. The mailbox is technically case sensitive per RFC 5321, but in practice nearly every provider treats it case insensitively. Accept both upper and lower case when validating; when storing and comparing, lowercase the domain and, pragmatically, the whole address.
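The storage advice can be sketched in a few lines of Python. The function name normalize_for_storage and its fold_mailbox switch are hypothetical. The default follows the pragmatic route of lowercasing everything, and passing fold_mailbox=False keeps the mailbox exactly as typed.

```python
def normalize_for_storage(address: str, fold_mailbox: bool = True) -> str:
    """Lowercase the domain always, and the mailbox unless told otherwise.

    Assumes the address has already passed a shape check.
    """
    mailbox, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("address has no @ delimiter")
    if fold_mailbox:
        mailbox = mailbox.lower()
    return f"{mailbox}@{domain.lower()}"


print(normalize_for_storage("Jane.Doe@Example.COM"))
print(normalize_for_storage("Jane.Doe@Example.COM", fold_mailbox=False))
```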
What is the best email validation regex in JavaScript?
Use the realistic pattern and call .test(). It matches ASCII alphanumeric, single dots, and dashes, which covers better than 99.999% of real addresses. Copy it (and the Python, Java, Go, PHP, C#, and Ruby versions) from the email regex in every language above.
How do I redact emails from large log files quickly?
Compile a single regex and apply it across the file in one pass rather than iterating character by character. Try it in the redaction playground above. A compiled pattern can process hundreds of millions of lines on a laptop in minutes.