In 2017, a security researcher named Hanno Böck uploaded a series of packages to PyPI with names like urllib, bzip, and setuptool — each one a near-miss of a popular library. The packages did nothing malicious; they just phoned home and counted. Over a few weeks, those typo packages were downloaded tens of thousands of times. Many of those downloads came from inside what looked like real CI systems and developer machines.
The lesson was uncomfortable: an entire class of attack was sitting open in one of the most widely used software supply chains on earth, and the cost to attackers was approximately zero.
What Typosquatting Actually Is
A typosquat is a package uploaded to a registry with a name designed to be confused with a legitimate one. The variations are predictable:
- Dropped or substituted character: requets for requests, colorma for colorama.
- Singular vs plural: request vs requests.
- Hyphen vs underscore: python-dateutil vs python_dateutil (PyPI normalizes these two spellings to the same name, so this variant only works on registries that treat them as distinct).
- Scoped vs unscoped: @lodash/lodash vs lodash on npm.
- Dropped separator: cross-env vs crossenv.
- Brand reuse: legitimate-looking names that resemble a corporate or framework brand (azure-storage-uploader when no such official package exists).
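Because the variations are predictable, they can also be checked mechanically. The following is a minimal sketch, not a production detector: it assumes a small hand-written list of popular names (POPULAR is illustrative) and flags a requested name that is not on the list but sits within one edit of an entry once separators are ignored. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Illustrative list only; a real check would load far more names.
POPULAR = ["requests", "colorama", "cross-env", "python-dateutil"]


def edit_distance(a: str, b: str) -> int:
    """Edit distance counting insert, delete, substitute, and adjacent swap."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Adjacent transposition, e.g. "ab" typed as "ba".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def canonical(name: str) -> str:
    """Lowercase and drop separators so cross-env and crossenv compare equal."""
    return "".join(ch for ch in name.lower() if ch not in "-_.")


def near_misses(
    candidate: str,
    popular: Iterable[str] = POPULAR,
    max_distance: int = 1,
) -> List[Tuple[str, int]]:
    """Return (known_name, distance) pairs the candidate could be confused with."""
    known_names = list(popular)
    if candidate in known_names:
        return []  # exact match: this is the package itself
    hits = []
    for known in known_names:
        dist = edit_distance(canonical(candidate), canonical(known))
        if dist <= max_distance:
            hits.append((known, dist))
    return hits


if __name__ == "__main__":
    print(near_misses("requets"))   # [('requests', 1)]
    print(near_misses("crossenv"))  # [('cross-env', 0)]
```

A check like this will produce false positives on legitimately similar names, so it is better suited to prompting a second look than to blocking installs outright.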
The attack succeeds whenever a developer, a CI system, or a copy-pasted README contains the wrong name. The wrong name resolves to the attacker's package. The attacker's setup.py, package.json install hook, or post-install script executes — on a developer laptop, in a CI runner, or in production if the package is shipped into a container image.
What the Attacker Gets
Many package managers run package-supplied code at install time. That gives a malicious typosquat:
- Read access to whatever environment variables exist on the machine, including cloud credentials such as AWS keys, GitHub tokens, and Slack webhook URLs.
- Read and write access to the developer's home directory: SSH keys, browser profiles, password manager databases.
- Network egress, often unrestricted — exfiltration to any URL.
- The chance to inject malicious code into the project itself, so it persists after install.
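To get a feel for the first item, it can help to audit what an install script would actually see. The sketch below is a defensive check under stated assumptions: the SECRET_HINT pattern and the function name are illustrative, and matching on variable names is only a heuristic. It reports names, never values.

```python
import os
import re
from typing import List, Mapping, Optional

# Heuristic pattern (an assumption, not a standard): names that often hold secrets.
SECRET_HINT = re.compile(
    r"(TOKEN|SECRET|PASSWORD|PASSWD|API_?KEY|ACCESS_KEY|CREDENTIAL|WEBHOOK)",
    re.IGNORECASE,
)


def exposed_secret_names(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """List non-empty environment variables whose names look like credentials.

    Any code that runs at install time inherits the same environment,
    so every name returned here would be readable by an install hook.
    """
    env = os.environ if env is None else env
    return sorted(
        name for name, value in env.items()
        if value and SECRET_HINT.search(name)
    )


if __name__ == "__main__":
    for name in exposed_secret_names():
        print(f"visible to install scripts: {name}")
```

Running this in the same shell or CI step that performs installs shows which credentials could be moved to a later, narrower step.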
The 2018 event-stream incident, though technically a maintainer-takeover attack rather than a typosquat, illustrated what's possible. A maliciously published version of a popular npm package included code targeting a specific Bitcoin wallet application, attempting to exfiltrate private keys from environments that built that application. The package was a transitive dependency, so most affected projects had no direct relationship with it.
Dependency Confusion: The Even Worse Variant
In 2021, researcher Alex Birsan published a write-up demonstrating dependency confusion, an attack closely related to typosquatting that doesn't require a typo at all.
Many companies use internal package names (e.g. my-corp-auth-lib) hosted on private registries. Their package managers, when faced with a package name, often search both private and public registries — and prefer the higher version number, wherever it comes from.
Birsan uploaded packages with internal-sounding names — names he'd seen in leaked manifests on GitHub or in public job postings — to the public PyPI and npm registries, with very high version numbers. Builds at Microsoft, Apple, PayPal, Tesla, Yelp, Uber, and several dozen other companies pulled his packages instead of their internal versions. He earned over $130,000 in bounties for what was essentially a name-collision attack.
Package managers were designed to resolve names to the newest version available. They were not designed to answer the question, "is this package from the source we expect?" The default trust model assumes name uniqueness across a single global namespace — an assumption that doesn't survive contact with private registries and corporate naming.
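The difference between "newest version anywhere" and "newest version from the expected source" can be shown with a toy resolver. This is a simplified model, not how any particular package manager is implemented: the registry contents, the source_for mapping, and the function names are all hypothetical, and versions are assumed to be plain dotted integers.

```python
from typing import Dict, List, Optional, Tuple

Registries = Dict[str, Dict[str, List[str]]]  # registry name -> package -> versions


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a dotted numeric version such as '1.3.0' (a simplification)."""
    return tuple(int(part) for part in text.split("."))


def resolve_naive(name: str, registries: Registries) -> Optional[Tuple[str, str]]:
    """Pick the highest version found in any registry, wherever it comes from."""
    candidates = [
        (parse_version(version), version, registry)
        for registry, index in registries.items()
        for version in index.get(name, [])
    ]
    if not candidates:
        return None
    _, version, registry = max(candidates)
    return registry, version


def resolve_pinned(
    name: str, registries: Registries, source_for: Dict[str, str]
) -> Optional[Tuple[str, str]]:
    """Only consult the registry this name is expected to come from."""
    registry = source_for.get(name, "public")
    versions = registries.get(registry, {}).get(name, [])
    if not versions:
        return None
    return registry, max(versions, key=parse_version)


if __name__ == "__main__":
    registries = {
        "private": {"my-corp-auth-lib": ["1.2.0", "1.3.0"]},
        # An outsider has published the same name publicly with a huge version.
        "public": {"my-corp-auth-lib": ["99.0.0"], "requests": ["2.31.0"]},
    }
    print(resolve_naive("my-corp-auth-lib", registries))
    # ('public', '99.0.0')
    print(resolve_pinned("my-corp-auth-lib", registries, {"my-corp-auth-lib": "private"}))
    # ('private', '1.3.0')
```

The pinned resolver never compares versions across registries, so a high public version number has no effect on an internal name.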
Why It Persists
Registries have invested in detection. npm and PyPI both run automated scanners that flag suspicious packages based on names, install-time behavior, and reputation. Many obvious typosquats are taken down within days. But a few realities make complete prevention hard:
The defender's gap
For every legitimate package, there are dozens of plausibly confusable names. Defending all of them preemptively is impractical, and registries are reluctant to lock down namespace policy in ways that would frustrate legitimate developers.
The economic asymmetry
Uploading a malicious package is cheap. Detection, takedown, and remediation are expensive. The attacker only needs one or two installs to succeed; the defender needs every install to be safe.
The supply chain is deep
The average modern application transitively depends on hundreds, sometimes thousands, of packages. Even a security-conscious developer cannot review them all. A typosquat in a transitive dependency is invisible to the project's direct contributors.
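A small traversal makes the gap between direct and transitive dependencies concrete. The graph below is entirely made up for illustration, and the function name is hypothetical; a real project would build the graph from its lockfile.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def transitive_dependencies(
    graph: Dict[str, List[str]], direct: Iterable[str]
) -> Set[str]:
    """Breadth-first walk returning every package reachable from the direct deps."""
    seen: Set[str] = set()
    queue = deque(direct)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already visited; also guards against cycles
        seen.add(name)
        queue.extend(graph.get(name, []))
    return seen


if __name__ == "__main__":
    # Hypothetical dependency graph: package -> packages it depends on.
    graph = {
        "web-framework": ["router", "template-engine"],
        "router": ["path-matcher"],
        "template-engine": ["string-utils"],
        "string-utils": ["left-padder"],
    }
    direct = ["web-framework"]
    everything = transitive_dependencies(graph, direct)
    hidden = sorted(everything - set(direct))
    print(f"{len(direct)} direct, {len(hidden)} pulled in indirectly: {hidden}")
```

Every name in the indirect set is a place where a confusable package could land without appearing in the project's own manifest.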
What Actually Defends Against It
| Defense | Approach |
|---|---|
| Lockfiles | package-lock.json, poetry.lock, Cargo.lock: pin every transitive dependency to exact versions and hashes. Once committed, the same names always resolve to the same content. |
| Hash verification | pip's --require-hashes, or npm ci against a lockfile with integrity fields: reject any package whose content doesn't match the expected hash, even if the version matches. npm audit signatures additionally checks registry signatures. |
| Private registry mirroring | Proxy a known set of dependencies through a private registry that an attacker can't publish to. Builds only resolve against the mirror. |
| Scope reservations | For npm, register a scope for your organization (@your-org/) so all internal packages live in a namespace only your team can publish to. |
| Dependency review | GitHub's Dependabot, Snyk, OSV-Scanner, Socket.dev — automated review of every dependency change that flags new packages, suspicious patterns, and known malicious uploads. |
| Disable install scripts | npm install --ignore-scripts stops package post-install hooks from running. Breaks some legitimate packages; closes the most direct attack channel. |
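The hash-verification row reduces to a very small check. The sketch below shows the idea using Python's standard library; it is not how pip or npm implement it, and the function names and sample bytes are illustrative.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the artifact bytes as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """Accept the artifact only if its digest matches the pinned value."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sha256_hex(data), expected_sha256.lower())


if __name__ == "__main__":
    # Stand-in bytes; in practice this is the downloaded wheel or tarball.
    original = b"contents of the release that was reviewed"
    pinned = sha256_hex(original)  # recorded in the lockfile at review time

    print(verify_artifact(original, pinned))                  # True
    print(verify_artifact(b"same version, new bytes", pinned))  # False
```

The version string plays no part in the decision: only the bytes do, which is why a republished or substituted package fails the check even when its name and version look right.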
A Note on Reproducible Builds
Reproducible builds are a longer-term answer to the same family of problems. If the same source produces a byte-identical artifact every time, regardless of who builds it, then any divergence from the expected hash is a signal that something has been tampered with — including a typosquatted dependency that replaced legitimate code.
Reproducibility doesn't prevent the initial compromise, but it makes detection vastly cheaper. Combined with signing and provenance tooling like SLSA, it shifts the supply chain from "we trust the registry" to "we verify the artifact end to end."
The Practical Habits
For individual developers and small teams:
- Commit lockfiles, and treat lockfile diffs in code review as security-sensitive.
- Pin versions exactly in production manifests; don't use unconstrained ^ ranges for security-sensitive code.
- When you copy an install command from a tutorial, sanity-check the package name against its homepage or the registry's official page.
- Use npm install --ignore-scripts when bringing in a brand-new package for the first time, and inspect what's inside.
- If your IDE autocompletes a package name that doesn't quite look right, slow down. Many typosquats land precisely because an autocomplete picked the wrong entry.
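Treating lockfile diffs as security-sensitive is easier when new package names are surfaced explicitly. The following sketch assumes the package-lock.json layout that has a top-level "packages" map keyed by node_modules paths; the function names are hypothetical, and when one name is installed at several versions only one of them is kept.

```python
import json
from typing import Dict, List


def locked_packages(lock_text: str) -> Dict[str, str]:
    """Map package name -> locked version from a package-lock.json string."""
    data = json.loads(lock_text)
    result: Dict[str, str] = {}
    for path, meta in data.get("packages", {}).items():
        if not path:
            continue  # the empty key describes the root project itself
        # "node_modules/a/node_modules/@scope/b" -> "@scope/b"
        name = path.rsplit("node_modules/", 1)[-1]
        result[name] = meta.get("version", "")
    return result


def new_packages(old_lock: str, new_lock: str) -> List[str]:
    """Names present in the new lockfile that the old lockfile never mentioned."""
    before = locked_packages(old_lock)
    after = locked_packages(new_lock)
    return sorted(set(after) - set(before))


if __name__ == "__main__":
    old = json.dumps({"packages": {
        "": {"name": "app"},
        "node_modules/cross-env": {"version": "7.0.3"},
    }})
    new = json.dumps({"packages": {
        "": {"name": "app"},
        "node_modules/cross-env": {"version": "7.0.3"},
        "node_modules/crossenv": {"version": "1.0.0"},
    }})
    print(new_packages(old, new))  # ['crossenv']
```

A reviewer who sees a short list of newly introduced names has a realistic chance of noticing one that looks like a near-miss of something already in the project.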
The Quiet Lesson
Package registries are among the most trusted infrastructure in modern software development, and they were not designed with active adversarial publishing in mind. Each registry has accumulated mitigations, but the fundamental model (globally unique names, code that runs at install time, fetching the newest version) was set decades ago and is difficult to change without breaking existing workflows.
Typosquatting persists because it exploits the model itself, not any specific implementation bug. Until the model evolves — toward verified provenance, signed releases, and namespaces that map cleanly to organizational identity — the burden remains on every developer and every CI system to be a little more careful than the tools demand.