AWS re:Invent 2019: Speculation & leakage: Timing side channels & multi-tenant computing
> In January 2018, the world learned about Spectre and Meltdown, a new class of issues that affects virtually all modern CPUs via nearly imperceptible changes to their micro-architectural states and can result in full access to physical RAM or leaking of state between threads, processes, or guests. In this session, we examine one of these side-channel attacks in detail and explore the implications for multi-tenant computing. We discuss AWS design decisions and what AWS does to protect your instances, containers, and function invocations. Finally, we discuss what the future looks like in the presence of this new class of issue.
This is a good recap. Specific defenses starts around 42:00.
LVI - Hijacking Transient Execution with Load Value Injection
> LVI is a new class of transient-execution attacks exploiting microarchitectural flaws in modern processors to inject attacker data into a victim program and steal sensitive data and keys from Intel SGX, a secure vault in Intel processors for your personal data.
> LVI turns previous data extraction attacks around, like Meltdown, Foreshadow, ZombieLoad, RIDL and Fallout, and defeats all existing mitigations. Instead of directly leaking data from the victim to the attacker, we proceed in the opposite direction: we smuggle — “inject” — the attacker’s data through hidden processor buffers into a victim program and hijack transient execution to acquire sensitive information, such as the victim’s fingerprints or passwords.
Take A Way: Exploring the Security Implications of AMD’s Cache Way Predictors
> In this paper, we are the first to exploit the cache way predictor. We reverse-engineered AMD’s L1D cache way predictor in microarchitectures from 2011 to 2019, resulting in two new attack techniques. With Collide+Probe, an attacker can monitor a victim’s memory accesses without knowledge of physical addresses or shared memory when time-sharing a logical core. With Load+ Reload, we exploit the way predictor to obtain highly-accurate memory-access traces of victims on the same physical core. While Load+Reload relies on shared memory, it does not invalidate the cache line, allowing stealthier attacks that do not induce any last-level-cache evictions.
KASLR: Break It, Fix It, Repeat
Escaping the Chrome Sandbox with RIDL
> Vulnerabilities that leak cross process memory can be exploited to escape the Chrome sandbox. An attacker is still required to compromise the renderer prior to mounting this attack. To protect against attacks on affected CPUs make sure your microcode is up to date and disable hyper-threading (HT).
This is a pretty clear write-up and comes with a nice footnote:
> When I started working on this I was surprised that it’s still exploitable even though the vulnerabilities have been public for a while. If you read guidance on the topic, they will usually talk about how these vulnerabilities have been mitigated if your OS is up to date with a note that you should disable hyper threading to protect yourself fully. The focus on mitigations certainly gave me a false sense that the vulnerabilities have been addressed and I think these articles could be more clear on the impact of leaving hyper threading enabled.
A Deep Dive Into Samsung's TrustZone
> After a general introduction on the ARM TrustZone and a focus on Qualcomm’s implementation, this new series of articles will discuss and detail the implementation developed by Samsung and Trustonic.
> These blog posts are a follow up to the conference Breaking Samsung’s ARM TrustZone that was given at BlackHat USA this summer. While an event such as this one is a great opportunity to present a subject we have been working on, many details have to be overlooked to fit the 50-minute format. This blog post, and the following ones, will explain all the details that were missing from the presentation as well as release the different tools mentioned in the talk and developed along the way.
Gathering Intel on Intel AVX-512 Transitions
> This is a post about AVX and AVX-512 related frequency scaling. Now, something more than nothing has been written about this already, including cautionary tales of performance loss and some broad guidelines, so do we really need to add to the pile?
> Perhaps not, but I’m doing it anyway. My angle is a lower level look, almost microscopic really, at the specific transition behaviors. One would hope that this will lead to specific, quantitative advice about exactly when various instruction types are likely to pay off, but (spoiler) I didn’t make it there in this post.
CPU Introspection: Intel Load Port Snooping
> We’re going to go into a unique technique for observing and sequencing all load port traffic on Intel processors. By using a CPU vulnerability from the MDS set of vulnerabilities, specifically multi-architectural load port data sampling (MLPDS, CVE-2018-12127), we are able to observe values which fly by on the load ports. Since (to my knowledge) all loads must end up going through load ports, regardless of requestor, origin, or caching, this means in theory, all contents of loads ever performed can be observed. By using a creative scanning technique we’re able to not only view “random” loads as they go by, but sequence loads to determine the ordering and timing of them.
> We’ll go through some examples demonstrating that this technique can be used to view all loads as they are performed on a cycle-by-cycle basis. We’ll look into an interesting case of the micro-architecture updating accessed and dirty bits using a microcode assist. These are invisible loads dispatched on the CPU on behalf of the user when a page is accessed for the first time.
A new cycle-stepped 6502 CPU emulator
> I wrote a new version of my 6502/6510 emulator in the last weeks which can be stepped forward in clock cycles instead of full instructions.
> Pointer authentication is a technology which offers strong probabilistic protection against exploiting a broad class of memory bugs to take control of program execution. When adopted consistently in a language ABI, it provides a form of relatively fine-grained control flow integrity (CFI) check that resists both return-oriented programming (ROP) and jump-oriented programming (JOP) attacks.
> While pointer authentication can be implemented purely in software, direct hardware support (e.g. as provided by ARMv8.3) can dramatically lower the execution speed and code size costs. Similarly, while pointer authentication can be implemented on any architecture, taking advantage of the (typically) excess addressing range of a target with 64-bit pointers minimizes the impact on memory performance and can allow interoperation with existing code (by disabling pointer authentication dynamically). This document will generally attempt to present the pointer authentication feature independent of any hardware implementation or ABI. Considerations that are implementation-specific are clearly identified throughout.
> Modern processors are being pushed to perform faster than ever before - and with this comes increases in heat and power consumption. To manage this, many chip manufacturers allow frequency and voltage to be adjusted as and when needed. But more than that, they offer the user the opportunity to modify the frequency and voltage through priviledged software interfaces. With Plundervolt we showed that these software interfaces can be exploited to undermine the system’s security. We were able to corrupt the integrity of Intel SGX on Intel Core processors by controling the voltage when executing enclave computations. This means that even Intel SGX’s memory encryption/authentication technology cannot protect against Plundervolt.
Not sure anyone should care about SGX anymore, all things considered, but for completeness, here’s another one.
microwatt - A tiny Open POWER ISA softcore written in VHDL 2008
> You can try out Microwatt/Micropython without hardware by using the ghdl simulator. If you want to build directly for a hardware target board, see below.
TAA and other RIDL issues
> On Nov 12, 2019, we (VUSec) disclose TSX Asynchronous Abort (TAA), a new speculation-based vulnerability in Intel CPUs as well as other MDS-related issues, as described in our new RIDL addendum. In reality, this is no new vulnerability. We disclosed TAA (and other issues) as part of our original RIDL submission to Intel in Sep 2018. Unfortunately, the Intel PSIRT team missed our submitted proof-of-concept exploits (PoCs), and as a result, the original MDS mitigations released in May 2019 only partially addressed RIDL. You can read the full story below.
> On July 3, 2019, we finally learned that, to our surprise, the Intel PSIRT team had missed the PoCs from our Sep 29 submission, despite having awarded a bounty for it, explaining why Intel had failed to address - or even publicly acknowledge - many RIDL-class vulnerabilities on May 14, 2019.
When you have so many problems you’re paying out bounties without knowing what for...
How a months-old AMD microcode bug destroyed my weekend
> Unfortunately, unpatched Ryzen 3000 says “yes” to the CPUID 01H call, sets the carry bit indicating it has successfully created the most artisanal, organic high-quality random number possible... and gives you a 0xFFFFFFFF for the “random” number, every single time.
> Unfortunately, after successfully applying the update and rebooting again, I realized my error—yes, Asus showed a later date for the BIOS, but the actual version was the same as the one I already had—3.2.0. My CPU still thought 0xFFFFFFFF was the randomest number ever, always, no matter what.
> At this point, I began to get paranoid—systemd had already quietly worked around the bug. But with most applications just quietly ignoring the problem, how would I know if it ever had been patched? What if two years later, I was still vulnerable to stack-smashing that I shouldn’t have been, due to ASLR that wasn’t actually randomizing?
Another entry for the bad workarounds file.
How "special register groups" invaded computer dictionaries for decades
> Half a century ago, the puzzling phrase “special register groups” started showing up in definitions of “CPU”, and it is still there. In this blog post, I uncover how special register groups went from an obscure feature in the Honeywell 800 mainframe to appearing in the Washington Post.
Tethered jailbreaks are back
> checkm8 exploits the Boot ROM to allow anyone with physical control of a phone to run arbitrary code. The Boot ROM, also called the Secure ROM, is the first code that executes when an iPhone is powered on and cannot be changed, because it’s “burned in” to the iPhone’s hardware. The Boot ROM initializes the system and eventually passes control to the kernel. It’s the root of trust for the trusted boot chain of iOS and verifies the integrity of the next stage of the boot process before passing execution control.
Detailed writeup: https://habr.com/en/company/dsec/blog/472762/
CPU Adventure – Unknown CPU Reversing
> We reverse-engineered a program written for a completely custom, unknown CPU architecture, without any documentation for the CPU (no emulator, no ISA reference, nothing) in the span of ten hours.
Position Independent Code (PIC) in shared libraries
> This article explained what position independent code is, and how it helps create shared libraries with shareable read-only text sections. There are some tradeoffs when choosing between PIC and its alternative (load-time relocation), and the eventual outcome really depends on a lot of factors, like the CPU architecture on which the program is going to run.
Go compiler intrinsics
> Over the years there have been various proposals for an inline assembly syntax similar to gcc’s asm(...) directive. None have been accepted by the Go team. Instead, Go has added intrinsic functions1.
> An intrinsic function is Go code written in regular Go. These functions are known the the Go compiler which contains replacements which it can substitute during compilation.
Where do interrupts happen?
> For a simple 1-wide in-order, non-pipelined CPU the answer might be as simple as: the CPU is interrupted either before or after instruction that is currently running2. For anything more complicated it’s not going to be easy. On a modern out-of-order processor there may be hundreds of instructions in-flight at any time, some waiting to execute, a dozen or more currently executing, and others waiting to retire. From all these choices, which instruction will be chosen as the victim?