Security Analysis Of AMD Predictive Store Forwarding
AMD “Zen3” processors feature a new technology called Predictive Store Forwarding (PSF). PSF is a hardware-based micro-architectural optimization designed to improve the performance of code execution by predicting dependencies between loads and stores. Like technologies such as branch prediction, with PSF the processor “guesses” what the result of a load is likely to be, and speculatively executes subsequent instructions. In the event that the processor incorrectly speculated on the result of the load, it is designed to detect this and flush the incorrect results from the CPU pipeline.
Security research in recent years has examined the security implications of incorrect CPU speculation and how in some cases it may lead to side channel attacks. For instance, conditional branch speculation, indirect branch speculation, and store bypass speculation have been demonstrated to have the potential to be used in side-channel attacks (e.g., Spectre v1, v2, and v4 respectively).
Speculating The Entire X86-64 Instruction Set In Seconds With This One Weird Trick
As cheesy as the title sounds, I promise it cannot beat the cheesiness of the technique I’ll be telling you about in this post. The morning I saw Mark Ermolov’s tweet about the undocumented instruction reading from/writing to the CRBUS, I had a bit of free time in my hands and I knew I had to find out the opcode so I started theory-crafting right away. After a few hours of staring at numbers, I ended up coming up with a method of discovering practically every instruction in the processor using a side(?)-channel. It’s an interesting method involving even more interesting components of the processor so I figured I might as well write about it, so here it goes.
Lord of the Ring(s): Side Channel Attacks on the CPU On-Chip Ring Interconnect Are Practical
We introduce the first microarchitectural side channel attacks that leverage contention on the CPU ring interconnect. There are two challenges that make it uniquely difficult to exploit this channel. First, little is known about the ring interconnect’s functioning and architecture. Second, information that can be learned by an attacker through ring contention is noisy by nature and has coarse spatial granularity. To address the first challenge, we perform a thorough reverse engineering of the sophisticated protocols that handle communication on the ring interconnect. With this knowledge, we build a cross-core covert channel over the ring interconnect with a capacity of over 4 Mbps from a single thread, the largest to date for a cross-core channel not relying on shared memory. To address the second challenge, we leverage the fine-grained temporal patterns of ring contention to infer a victim program’s secrets. We demonstrate our attack by extracting key bits from vulnerable EdDSA and RSA implementations, as well as inferring the precise timing of keystrokes typed by a victim user.
Leaking silhouettes of cross-origin images
This is a writeup of a vulnerability I found in Chromium and Firefox that could allow a malicious page to read some parts of an image located on an origin it is not supposed to be able to access. Although technically interesting, it is quite limited in scope—I am not aware of any major websites it could’ve been used against. As of November 17th, 2020, the vulnerability has been fixed in the most recent versions of both browsers.
The time that it takes CanvasRenderingContext2D.drawImage to draw a pixel depends on whether it is fully transparent, opaque, or semi-transparent. By timing a bunch of calls to drawImage, we can reliably infer the transparency of each pixel in a cross-origin image, which is enough to, for example, read text on a transparent background, like this:
PLATYPUS With Great Power comes Great Leakage
With classical power side-channel attacks, an adversary typically attaches an oscilloscope to monitor the energy consumption of a device. Since Intel Sandy Bridge CPUs, the Intel Running Average Power Limit (RAPL) interface allows monitoring and controlling the power consumption of the CPU and DRAM in software. Hence, the CPU basically comes with its own power meter. With the current implementation of the Linux driver, every unprivileged user has access to its measurements.
Using PLATYPUS, we demonstrate that we can observe variations in the power consumption to distinguish different instructions and different Hamming weights of operands and memory loads, allowing inference of loaded values. PLATYPUS can further infer intra-cacheline control flow of applications, break KASLR, leak AES-NI keys from Intel SGX enclaves and the Linux kernel, and establish a timing-independent covert channel.
With SGX, Intel released a security feature to create isolated environments, so-called enclaves, that are secure even if the operating system is compromised. In our work, we combine PLATYPUS with precise execution control of SGX-Step. As a result, we overcome the hurdle of the limited measuring capabilities of Intel RAPL by repeatedly executing single instructions inside the SGX enclave. Using this technique, we recover RSA keys processed by mbed TLS from an SGX enclave.
Learning from LadderLeak: Is ECDSA Broken?
The paper authors were able to optimize existing attacks exploiting one-bit leakages against 192-bit and 160-bit elliptic curves. They were further able to exploit leakages of less than one bit in the same curves.
We’re used to discrete quantities in computer science, but you can leak less than one bit of information in the case of side-channels.
If “less than one bit” sounds strange, that’s probably our fault for always rounding up to the nearest bit when we express costs in computer science.
TEMPEST@Home - Finding Radio Frequency Side Channels
As the test procedures in the TEMPEST standards are rudely made unavailable to us as they are considered “classified” we have to do the next best thing and make up our own. This article aims to make barely acceptable analogies about how radios work and show that you really don’t need that much in terms of know-how and equipment to find and take advantage of leaky radio signals. Towards the end, we will apply what we have learned to find a signal that can exfiltrate data out of a radio-less and air-gapped desktop workstation through a wall and 50ft away.
AWS re:Invent 2019: Speculation & leakage: Timing side channels & multi-tenant computing
In January 2018, the world learned about Spectre and Meltdown, a new class of issues that affects virtually all modern CPUs via nearly imperceptible changes to their micro-architectural states and can result in full access to physical RAM or leaking of state between threads, processes, or guests. In this session, we examine one of these side-channel attacks in detail and explore the implications for multi-tenant computing. We discuss AWS design decisions and what AWS does to protect your instances, containers, and function invocations. Finally, we discuss what the future looks like in the presence of this new class of issue.
This is a good recap. Specific defenses starts around 42:00.
TRRespass: Exploiting the Many Sides of Target Row Refresh
Well, after two years of rigorous research, looking inside what is implemented inside CPUs and DDR4 chips using novel reverse engineering techniques, we can tell you that we do not live in a Rowhammer-free world. And we will not for the better part of this decade. Turns out while the old hammering techniques no longer work, once we understand the exact nature of these mitigations inside modern DDR4 chips, using new hammering patterns it is trivial to again trigger plenty of new bit flips. Yet again, these results show the perils of lack of transparency and security-by-obscurity. This is especially problematic since unlike software vulnerabilities, we cannot fix these hardware bit flips post-production.
LVI - Hijacking Transient Execution with Load Value Injection
LVI is a new class of transient-execution attacks exploiting microarchitectural flaws in modern processors to inject attacker data into a victim program and steal sensitive data and keys from Intel SGX, a secure vault in Intel processors for your personal data.
LVI turns previous data extraction attacks around, like Meltdown, Foreshadow, ZombieLoad, RIDL and Fallout, and defeats all existing mitigations. Instead of directly leaking data from the victim to the attacker, we proceed in the opposite direction: we smuggle — “inject” — the attacker’s data through hidden processor buffers into a victim program and hijack transient execution to acquire sensitive information, such as the victim’s fingerprints or passwords.
Take A Way: Exploring the Security Implications of AMD’s Cache Way Predictors
In this paper, we are the first to exploit the cache way predictor. We reverse-engineered AMD’s L1D cache way predictor in microarchitectures from 2011 to 2019, resulting in two new attack techniques. With Collide+Probe, an attacker can monitor a victim’s memory accesses without knowledge of physical addresses or shared memory when time-sharing a logical core. With Load+ Reload, we exploit the way predictor to obtain highly-accurate memory-access traces of victims on the same physical core. While Load+Reload relies on shared memory, it does not invalidate the cache line, allowing stealthier attacks that do not induce any last-level-cache evictions.
KASLR: Break It, Fix It, Repeat
Escaping the Chrome Sandbox with RIDL
Vulnerabilities that leak cross process memory can be exploited to escape the Chrome sandbox. An attacker is still required to compromise the renderer prior to mounting this attack. To protect against attacks on affected CPUs make sure your microcode is up to date and disable hyper-threading (HT).
This is a pretty clear write-up and comes with a nice footnote:
When I started working on this I was surprised that it’s still exploitable even though the vulnerabilities have been public for a while. If you read guidance on the topic, they will usually talk about how these vulnerabilities have been mitigated if your OS is up to date with a note that you should disable hyper threading to protect yourself fully. The focus on mitigations certainly gave me a false sense that the vulnerabilities have been addressed and I think these articles could be more clear on the impact of leaving hyper threading enabled.
Speculative Load Hardening
While several approaches are being actively pursued to mitigate specific branches and/or loads inside especially risky software (most notably various OS kernels), these approaches require manual and/or static analysis aided auditing of code and explicit source changes to apply the mitigation. They are unlikely to scale well to large applications. We are proposing a comprehensive mitigation approach that would apply automatically across an entire program rather than through manual changes to the code. While this is likely to have a high performance cost, some applications may be in a good position to take this performance / security tradeoff.
SafeSide is a project to understand and mitigate software-observable side-channels: information leaks between software domains caused by implementation details outside the software abstraction.
CPU Introspection: Intel Load Port Snooping
We’re going to go into a unique technique for observing and sequencing all load port traffic on Intel processors. By using a CPU vulnerability from the MDS set of vulnerabilities, specifically multi-architectural load port data sampling (MLPDS, CVE-2018-12127), we are able to observe values which fly by on the load ports. Since (to my knowledge) all loads must end up going through load ports, regardless of requestor, origin, or caching, this means in theory, all contents of loads ever performed can be observed. By using a creative scanning technique we’re able to not only view “random” loads as they go by, but sequence loads to determine the ordering and timing of them.
We’ll go through some examples demonstrating that this technique can be used to view all loads as they are performed on a cycle-by-cycle basis. We’ll look into an interesting case of the micro-architecture updating accessed and dirty bits using a microcode assist. These are invisible loads dispatched on the CPU on behalf of the user when a page is accessed for the first time.
OpenSSH Key Shielding
On June 21, 2019, support for SSH key shielding was introduced into the OpenBSD tree, from which the OpenSSH releases are derived. SSH key shielding is a measure intended to protect private keys in RAM against attacks that abuse bugs in speculative execution that current CPUs exhibit. This functionality has been part of OpenSSH since the 8.1 release. SSH private keys are now being held in memory in a shielded form; keys are only unshielded when they are used and re‐shielded as soon as they are no longer in active use. When a key is shielded, it is encrypted in memory with AES‐256‐CTR; this is how it works:
Modern processors are being pushed to perform faster than ever before - and with this comes increases in heat and power consumption. To manage this, many chip manufacturers allow frequency and voltage to be adjusted as and when needed. But more than that, they offer the user the opportunity to modify the frequency and voltage through priviledged software interfaces. With Plundervolt we showed that these software interfaces can be exploited to undermine the system’s security. We were able to corrupt the integrity of Intel SGX on Intel Core processors by controling the voltage when executing enclave computations. This means that even Intel SGX’s memory encryption/authentication technology cannot protect against Plundervolt.
Not sure anyone should care about SGX anymore, all things considered, but for completeness, here’s another one.
TAA and other RIDL issues
On Nov 12, 2019, we (VUSec) disclose TSX Asynchronous Abort (TAA), a new speculation-based vulnerability in Intel CPUs as well as other MDS-related issues, as described in our new RIDL addendum. In reality, this is no new vulnerability. We disclosed TAA (and other issues) as part of our original RIDL submission to Intel in Sep 2018. Unfortunately, the Intel PSIRT team missed our submitted proof-of-concept exploits (PoCs), and as a result, the original MDS mitigations released in May 2019 only partially addressed RIDL. You can read the full story below.
On July 3, 2019, we finally learned that, to our surprise, the Intel PSIRT team had missed the PoCs from our Sep 29 submission, despite having awarded a bounty for it, explaining why Intel had failed to address - or even publicly acknowledge - many RIDL-class vulnerabilities on May 14, 2019.
When you have so many problems you’re paying out bounties without knowing what for...
TPM—Fail TPM meets Timing and Lattice Attacks
We discovered timing leakage on Intel firmware-based TPM (fTPM) as well as in STMicroelectronics’ TPM chip. Both exhibit secret-dependent execution times during cryptographic signature generation. While the key should remain safely inside the TPM hardware, we show how this information allows an attacker to recover 256-bit private keys from digital signature schemes based on elliptic curves.
This research shows that even rigorous testing as required by Common Criteria certification is not flawless and may miss attacks that have explicitly been checked for. The STMicroelectronics TPM chip is Common Criteria certified at EAL4+ for the TPM protection profiles and FIPS 140-2 certified at level 2, while the Intel TPM is certified according to FIPS 140-2. However, the certification has failed to protect the product against an attack that is considered by the protection profile.