The Domain You Saw Wasn't the Domain You Got: Homograph Attacks Explained

May 13, 2026 9 min read Haven Team

In 2017, a security researcher registered the domain аррӏе.com and produced a working SSL certificate for it. Read it carefully — every character is Cyrillic, not Latin. In most browsers at the time, it rendered indistinguishably from apple.com. The class of attack hasn't gone away; it's just gotten quieter.


Domain names are supposed to be the human-readable layer of the internet's addressing system. They're how you check whether you've actually landed on your bank's site instead of an imitation. That trust assumption is more fragile than most users realize, because of a quirk of how internationalized domain names are encoded.

The vulnerability is called an IDN homograph attack — or more colloquially, Punycode phishing. Understanding it requires a brief tour through how non-Latin characters got into DNS in the first place.

How IDN Came to Exist

DNS was originally specified to handle a restricted character set: letters a-z, digits 0-9, and hyphen — the so-called LDH rule. That worked fine for English speakers and badly for everyone else. By the early 2000s, the IETF recognized the need to support domain names in scripts like Chinese, Arabic, Cyrillic, and Devanagari.

The solution, first standardized in RFC 3490 (2003) and revised in RFC 5891 (2010), was clever. Rather than change the DNS protocol itself, IDN defines an encoding layer. Domain names containing non-ASCII characters are translated, at the application boundary, into an ASCII-compatible form: each label is encoded with Punycode (RFC 3492) and given the xn-- prefix. The actual DNS lookup happens with the Punycode form. The display happens with the Unicode form.

For example, the domain москва.рф (Moscow.rf in Cyrillic) becomes xn--80adxhks.xn--p1ai in Punycode. Both forms refer to the same record in DNS, but only the Punycode form actually travels over the wire.
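
The round trip between the two forms can be reproduced in a few lines. This is a minimal sketch using Python's built-in "idna" codec, which implements the older RFC 3490 rules rather than the 2010 revision; the helper names to_ascii and to_unicode are ours, chosen for illustration.

```python
def to_ascii(hostname: str) -> str:
    """Encode a Unicode hostname to its ASCII (xn--) form, label by label."""
    # The built-in "idna" codec follows IDNA 2003 (RFC 3490). It is enough
    # for illustration but is not a full IDNA 2008 implementation.
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Decode an ASCII (xn--) hostname back to its Unicode display form."""
    return hostname.encode("ascii").decode("idna")


wire_form = to_ascii("москва.рф")
print(wire_form)              # xn--80adxhks.xn--p1ai
print(to_unicode(wire_form))  # москва.рф
```

A resolver only ever sees the first printed value; the second is what an application chooses to show the user.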

The Attack

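The attack works because Unicode contains many characters that are visually indistinguishable from Latin letters but are technically different code points. A few examples: Cyrillic а (U+0430) and Latin a (U+0061); Cyrillic е (U+0435) and Latin e (U+0065); Cyrillic о (U+043E) and Latin o (U+006F); Cyrillic р (U+0440) and Latin p (U+0070); Cyrillic с (U+0441) and Latin c (U+0063).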

An attacker registers a domain like аpple.com where the first character is Cyrillic. In Punycode, this becomes xn--pple-43d.com, clearly a different domain from apple.com. But when displayed in Unicode form, the two look identical.
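
The difference is invisible on screen but easy to see programmatically. The sketch below, again leaning on Python's built-in "idna" codec (IDNA 2003 rules), lists the code points of the lookalike label and prints its encoded form; the describe helper is a name introduced here for illustration.

```python
import unicodedata

real = "apple"
fake = "\u0430pple"  # first letter is CYRILLIC SMALL LETTER A, not Latin "a"


def describe(label: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for every character in a label."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in label]


for code_point, name in describe(fake):
    print(code_point, name)
# The first line printed is: U+0430 CYRILLIC SMALL LETTER A

print(real == fake)                        # False
print(fake.encode("idna").decode("ascii"))  # xn--pple-43d
```

String comparison and the encoded form both expose the substitution immediately; only the rendered glyphs hide it.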

The 2017 demonstration by researcher Xudong Zheng went further: a domain composed entirely of Cyrillic letters that happen to look exactly like the Latin letters in "apple." Modern browsers responded to that demonstration with stricter display policies, but the underlying attack class remains live.

Why this is worse than typo-squatting

Typo-squatted domains like arnazon.com (rn instead of m) at least look slightly off if you read carefully. Homograph domains can be pixel-perfect identical to the original. There is no version of "reading carefully" that defends you.

What Browsers Do About It

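Modern browsers implement a series of heuristics to decide when to display a domain in Unicode form versus its raw Punycode form. Roughly, the general logic is to show the Unicode form only when a label's characters come from a single script (or a combination of scripts that a language normally uses together) and the label is not a lookalike of a well-known domain; otherwise the browser falls back to Punycode.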

These policies vary by browser and by version. Chrome's policy is documented in the Chromium project's IDN policy document and has been tightened multiple times. Firefox exposes the network.IDN_show_punycode preference, which forces Punycode display when enabled; Safari has its own internal logic. Mobile browsers, especially in-app browsers, are often less restrictive than desktop versions.
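
To give a feel for one heuristic in this family, here is a rough sketch of a mixed-script check. It is not any browser's actual policy: it approximates a character's script by the first word of its Unicode name, which is an assumption made to keep the example within the standard library (real implementations use the Unicode Script property), and the function names are ours.

```python
import unicodedata


def scripts_in_label(label: str) -> set[str]:
    """Return the set of scripts used by the letters in one domain label.

    Assumption: the first word of a character's Unicode name ("LATIN",
    "CYRILLIC", "GREEK", ...) stands in for its script.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():  # digits and hyphens are treated as script-neutral
            scripts.add(unicodedata.name(ch).split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """True when a label draws letters from more than one script."""
    return len(scripts_in_label(label)) > 1


print(is_mixed_script("apple"))                         # False
print(is_mixed_script("\u0430pple"))                    # True: Cyrillic + Latin
print(is_mixed_script("\u0430\u0440\u0440\u04cf\u0435"))  # False: all Cyrillic
```

The last line shows why a mixed-script check alone is not enough: the all-Cyrillic lookalike from the 2017 demonstration uses a single script, so catching it requires an additional comparison against known domains.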

The TLD complication

Some TLD registries have their own restrictions on what scripts they accept. The .com registry, for instance, permits IDN registrations under a restrictive policy. Some country-code TLDs accept any script. Some refuse mixed scripts. The level of homograph protection you get depends partially on which TLD you're looking at.

Where Homograph Attacks Actually Land

Browser address bars get most of the attention, but they're not where homograph attacks tend to succeed in practice. The riskier surfaces are:

Defenses That Actually Work

| Defense | Effectiveness |
| --- | --- |
| Browser IDN display policies | Strong against pure-Cyrillic and mixed-script domains in major browsers |
| Password managers' domain matching | Excellent. Password managers match on exact Punycode form, not visual rendering. If your password manager doesn't auto-fill, that's a warning sign. |
| Hardware security keys (WebAuthn) | Excellent. Origin binding means the key won't authenticate to a different domain regardless of visual similarity. |
| "Just read the URL carefully" | Fails by design. That's exactly the assumption homograph attacks break. |
| Certificate transparency monitoring | Useful for brand protection (your company can watch for homograph registrations of its trademarks) but not for end-user defense |
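
The password-manager defense works because the lookup key is the encoded hostname, never the rendered one. A minimal sketch of that idea follows, assuming a hypothetical in-memory credential store (SAVED_LOGINS) and helper names invented for this example; real password managers apply more elaborate origin matching.

```python
from typing import Optional

# Hypothetical credential store, keyed by the ASCII form of the hostname.
SAVED_LOGINS = {"apple.com": "alice@example.com"}


def ascii_hostname(hostname: str) -> str:
    """Normalize a hostname to lowercase ASCII using the built-in IDNA 2003 codec."""
    return hostname.lower().encode("idna").decode("ascii")


def login_for(hostname: str) -> Optional[str]:
    """Return the saved login only when the encoded hostname matches exactly."""
    return SAVED_LOGINS.get(ascii_hostname(hostname))


print(login_for("apple.com"))        # alice@example.com
print(login_for("\u0430pple.com"))   # None: the key is xn--pple-43d.com
```

The lookalike never reaches the stored entry, which is why a missing auto-fill prompt on a familiar-looking site deserves suspicion.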

Practical Recommendations

For individual users:

For organizations:

The general lesson is older than IDN: any time a system separates "how something is stored or transmitted" from "how it is displayed to a human," the gap becomes attackable. Binding identity to the stored form, as password-manager domain matching, hardware security keys, and certificate pinning do, is the only reliable defense, because those mechanisms operate on the stored form, not the display form.

Where Haven Fits

Haven's identity model uses Matrix-style IDs (@haven_username:havenmessenger.com) for chat, which are subject to the same homograph risks as any other text identifier. Our defenses are the standard ones: passkey-based authentication for the account itself, signature-key verification for contact identity, and a deliberate UI choice to display contact handles in a way that surfaces non-ASCII characters explicitly rather than relying on font rendering to mask them.
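
To illustrate what surfacing non-ASCII characters explicitly can look like, here is a small sketch. It is not Haven's implementation; the function name and the bracketed U+XXXX notation are assumptions chosen for the example.

```python
def annotate_non_ascii(handle: str) -> str:
    """Replace every non-ASCII character with a visible [U+XXXX] marker."""
    return "".join(
        ch if ch.isascii() else f"[U+{ord(ch):04X}]"
        for ch in handle
    )


print(annotate_non_ascii("@alice:havenmessenger.com"))
# @alice:havenmessenger.com
print(annotate_non_ascii("@\u0430lice:havenmessenger.com"))
# @[U+0430]lice:havenmessenger.com
```

An all-ASCII handle passes through untouched, while a lookalike is visibly marked no matter which font renders it.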

Related reading: TOFU key verification covers a parallel problem in cryptographic identity binding.