You can strip every byte of metadata from a document, route it through Tor, post it from a burner account, and never sign your name — and still be identified by nothing but the words. The way you deploy commas, the function words you lean on without noticing, the average length of your sentences: these form a fingerprint you carry into every text you write. The field that reads that fingerprint is called stylometry, and it has been quietly unmasking anonymous authors for a very long time.

For anyone who relies on pseudonymity — whistleblowers, dissidents, researchers publishing controversial work — stylometry is the deanonymization threat that survives perfect operational security everywhere else. You can fix a leaked IP. You cannot easily stop sounding like yourself.

What a writing fingerprint is made of

The counterintuitive part is that the most identifying features of your writing are the ones you pay the least attention to. Content words — nouns and topics — are easy to control and easy to fake. The giveaways are the structural habits below the level of conscious choice:

Why function words betray you. A determined author can change topic, vocabulary, and tone at will. What they almost never manage is to change how frequently they reach for "rather," "thus," or "actually" — these are produced automatically, below deliberate control. That stability is exactly what makes them a reliable signal across documents.
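To make the idea concrete, here is a minimal sketch of how such a fingerprint might be measured. The `FUNCTION_WORDS` list and the `style_profile` function are illustrative assumptions, not taken from any particular stylometry tool; real analyses use far larger feature sets and more careful tokenization.

```python
import re
from collections import Counter

# Illustrative marker list only (an assumption for this sketch);
# published studies select their function words much more carefully.
FUNCTION_WORDS = ["the", "of", "and", "to", "that", "rather", "thus", "actually"]


def style_profile(text: str) -> dict[str, float]:
    """Measure a few topic-independent habits in a piece of text.

    Returns the rate per 1,000 words of each marker word, the comma
    rate per 1,000 words, and the mean sentence length in words.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        raise ValueError("text contains no words")
    total = len(words)
    counts = Counter(words)

    # Rates rather than raw counts, so texts of different length compare.
    profile = {w: 1000 * counts[w] / total for w in FUNCTION_WORDS}
    profile["commas_per_1000_words"] = 1000 * text.count(",") / total

    # Crude sentence split on terminal punctuation; good enough for a sketch.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    profile["mean_sentence_length"] = total / len(sentences)
    return profile
```

Run on two texts by the same person about unrelated subjects, a profile like this tends to stay closer than the topics would suggest, which is the property the rest of this article depends on.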

A technique older than the internet

Stylometry predates computers entirely. Its landmark demonstration came in the 1960s, when statisticians Frederick Mosteller and David Wallace settled a long-standing historical dispute over the authorship of twelve of the Federalist Papers. By analyzing the rates of unremarkable function words, they attributed the contested essays to James Madison rather than Alexander Hamilton — a conclusion historians have largely accepted ever since. The lesson was set decades before anyone worried about online anonymity: identity hides in the boring words.

What modern stylometry can do

Computation turned a historical curiosity into a scalable capability. In a widely cited 2012 study, researchers demonstrated authorship identification at internet scale: working with writing samples from 100,000 candidate bloggers, their classifiers picked out the correct author in roughly a fifth of cases, far beyond chance. The same logic extends past prose: a 2015 line of research showed that programmers can be de-anonymized from their source code, and even from compiled binaries, because coding style survives in a program's structure and not only in its naming and layout.
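The core attribution step can be sketched in a few lines. This is a toy nearest-profile matcher, assuming a tiny hypothetical marker list (`MARKERS`) and cosine similarity as the distance measure. It is not the method used in the studies mentioned above, which rely on much richer features and trained classifiers.

```python
import math
import re
from collections import Counter

# Hypothetical marker words chosen for illustration only.
MARKERS = ["the", "of", "and", "to", "upon", "while", "rather", "thus"]


def rates(text: str) -> list[float]:
    """Relative frequency of each marker word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    n = max(len(words), 1)  # avoid division by zero on empty input
    return [counts[w] / n for w in MARKERS]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(anonymous_text: str, known_samples: dict[str, str]):
    """Rank candidate authors by how closely their known writing
    matches the anonymous text, most similar first."""
    target = rates(anonymous_text)
    scored = [
        (cosine(target, rates(sample)), author)
        for author, sample in known_samples.items()
    ]
    return [(author, score) for score, author in sorted(scored, reverse=True)]
```

Note what the function needs in order to run at all: a set of named candidates and samples of their known writing. Without a suspect list to compare against, there is nothing to rank.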

"Anonymity is not the absence of a name. It is the absence of any feature that links your work to you — and your style is a feature you can't take off."

The arrival of large language models cuts both ways. They sharpen attribution by extracting subtler stylistic signals from less text, and they offer a new defense by rewriting prose into a neutral or borrowed voice. Which side benefits more depends entirely on resources — and the well-resourced side is rarely the lone pseudonymous author.

Defending against stylometric attack

Adversarial stylometry — deliberately defeating authorship analysis — is hard, and honesty requires saying so. There are three recognized strategies, each with costs:

Strategy               | How it works                   | Weakness
Obfuscation            | Consciously alter your habits  | Exhausting; leaks under pressure
Imitation              | Mimic another author's style   | Hard to sustain convincingly
Translation round-trip | Machine-translate out and back | Garbles meaning; new artifacts

Tools built by academic privacy labs exist to flag your most identifying features so you can blunt them, but all of these methods degrade your writing and none is foolproof. The most robust defense is also the simplest and the most demanding: write less under the pseudonym, and never let the same persona accumulate a large corpus. Stylometry needs text to work; a few hundred words is far harder to attribute than a few thousand.
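A rough calculation shows why volume matters so much. The sketch below assumes a deliberately simplified model in which every word is an independent draw, so the count of a given function word is binomial. Real text is not that tidy, and `rate_standard_error` is a hypothetical helper written for this illustration, but the scaling it shows is the important part.

```python
import math


def rate_standard_error(rate: float, n_words: int) -> float:
    """Approximate standard error of an observed word rate.

    Simplifying assumption: each of the n_words is an independent draw
    that is the target word with probability `rate` (binomial model).
    """
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return math.sqrt(rate * (1.0 - rate) / n_words)


# A word the author uses about once every 200 words (0.5%).
for n in (300, 3000, 30000):
    se = rate_standard_error(0.005, n)
    print(f"{n:>6} words: 0.5% +/- {100 * se:.2f} percentage points")
```

Under this model the uncertainty shrinks only with the square root of the sample size. At 300 words the error on a 0.5% habit is nearly as large as the habit itself. An analyst needs about a hundred times more text to cut that error by a factor of ten, which is why a small corpus is hard to attribute and a large one is not.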

Where stylometry fits in your threat model

For the overwhelming majority of people, stylometry is not a concern worth losing sleep over — it is expensive, requires a corpus, and needs a plausible suspect to compare against. It becomes serious only under a specific combination: you publish pseudonymously, the content is sensitive enough to motivate a well-resourced adversary, and that adversary has samples of your known writing to match against.

If that describes you, stylometry belongs in your threat model alongside the network-level concerns in our whistleblower OPSEC guide and the broader picture in managing your OSINT footprint. Strong encryption protects the contents of what you send and metadata defenses protect the fact that you sent it — but neither touches the fingerprint inside the words themselves. That last layer is the one most people don't know to defend.
