APRR - Access Protection ReRouting
> Almost a year ago I did a write-up on KTRR, first introduced in Apple’s A10 chip series. Now over the course of the last year, there has been a good bit of talk as well as confusion about the new mitigations shipped with Apple’s A12. One big change, PAC, has already been torn down in detail by Brandon Azad, so I’m gonna leave that out here. What’s left to cover is more than just APRR, but APRR is certainly the biggest chunk, hence the title of this post.
> APRR is a pretty cool feature, even if parts of it are kinda broke. What I really like about it (besides the fact that it is an efficient and elegant solution to switching privileges) is that it untangles EL1 and EL0 memory permissions, giving you more flexibility than a standard ARMv8 implementation. What I don’t like though is that it has clearly been designed as a lockdown feature, allowing you only to take permissions away rather than freely remap them.
> It’s also evident that Apple is really fond of post-exploit mitigations, or just mitigations in general. And on one hand, getting control over the physical address space is a good bit harder now. But on the other hand, Apple’s stacking of mitigations is taking a problematic turn when adding new mitigations actively creates vulnerabilities now.
> For two years I’ve been driving myself crazy trying to figure out the source of a driver problem on OpenBSD: interrupts never arrived for certain touchpad devices. A couple weeks ago, I put out a public plea asking for help in case any non-OpenBSD developers recognized the problem, but while debugging an unrelated issue over the weekend, I finally solved it. It’s been a long journey and it’s a technical tale, but here it is.
Diving deep into the AML.
7 Days To Virtualization: A Series On Hypervisor Development
Extending the Kernel with Built-in Kernel Headers
> Kernel headers are usually unavailable on the target where these BPF tracing programs need to be dynamically compiled and run. That is certainly the case with Android, which runs on billions of devices. It is not practical to ship custom kernel headers for every device. My solution to the problem is to embed the kernel headers within the kernel image itself and make it available through the sysfs virtual filesystem (usually mounted at /sys) as a compressed archive file (/sys/kernel/kheaders.tar.xz). This archive can be uncompressed as needed to a temporary directory. This simple change guarantees that the headers are always shipped with the running kernel.
I feel this is the wrong solution, but interesting nevertheless.
> Let’s talk about files! Most developers seem to think that files are easy.
> In this talk, we’re going to look at how file systems differ from each other and other issues we might encounter when writing to files. We’re going to look at the file “stack“, starting at the top with the file API, moving down to the filesystem, and then moving down to disk.
Provide protection against starvation of the ll/sc loops when accessing userpace.
> Casueword(9) on ll/sc architectures must be prepared for userspace constantly modifying the same cache line as containing the CAS word, and not loop infinitely. Otherwise, rogue userspace livelock kernel.
It’s Time for a Modern Synthesis Kernel
> The promise of kernel-mode runtime code generation is that we can have very fast, feature-rich operating systems by, for example, not including code implementing generic read() and write() system calls, but rather synthesizing code for these operations each time a file is opened. The idea is that at file open time, the OS has a lot of information that it can use to generate highly specialized code, eliding code paths that are provably not going to execute.
The convoy phenomenon
> The duration of a lock is the average number of instructions executed while the lock is held. The execution interval of a lock is the average number of instructions executed between successive requests for that lock by a process. The collision cross section of the lock is the fraction of time it is granted, i.e., the lock grant probability.
> Most of us are stuck with a pre-emptive scheduler (i.e., general purpose operating system with virtual memory). Hence convoys will occur. The problem is to make them evaporate quickly when they do occur rather than have them persist forever.
Reading the Manual for ENIAC, the World’s First Electronic Computer
> It seems like the machine was temperamental. For example, it warns that the DC power should never be turned on without first turning the operation switch to “continuous.”
> “Failure to follow this rule causes certain DC fuses to blow, -240 and -415 in particular.”
> But the consequences are even worse if you opened the DC fuse cabinet when the D.C. power was turned on. “This not only exposes a person to voltage differences of around 1,500 volts but the person may be burned by flying pieces of molten fuse wire” (if one of the fuse cases suddenly blew). In fact, the ENIAC was actually designed with a door switch shunt that prevented it from operating if one of its panel doors was open, “since removing the doors exposes dangerous voltage.” But this feature could be bypassed by holding the door switch shunt in its closed position.
Automatic Exploitation of Fully Randomized Executables
> We present Marten, a new end to end system for automatically discovering, exploiting, and combining information leakage and buffer overflow vulnerabilities to derandomize and exploit remote, fully randomized processes. Results from two case studies high- light Marten’s ability to generate short, robust ROP chain exploits that bypass address space layout randomization and other modern defenses to download and execute injected code selected by an attacker.
What is WofCompressedData?
> The documentation for wofapi.h says merely “This header is used by Data Access and Storage.” For more information, it refers you to another web page that contains no additional information. WOF stands for Windows Overlay Filter, which is a nice name that doesn’t really tell you much about what it does or what it’s for.
> Changing the native NTFS file compression would be a disk format breaking change, which is not something taken lightly. Doing it as a filter provides much more flexibility. The downside is that if you mount the volume on a system that doesn’t support the Windows Overlay Filter, all you see is an empty file. Fortunately, WOF is used only for system-installed files, and if you are mounting the volume onto another system, it’s probably for data recovery purposes, so you’re interested in user data, not system files.
In-DRAM Bulk Bitwise Execution Engine
> Many applications heavily use bitwise operations on large bitvectors as part of their computation. In existing systems, performing such bulk bitwise operations requires the processor to transfer a large amount of data on the memory channel, thereby consuming high latency, memory bandwidth, and energy. In this paper, we describe Ambit, a recently-proposed mechanism to perform bulk bitwise operations completely inside main memory. Ambit exploits the internal organization and analog operation of DRAM-based memory to achieve low cost, high performance, and low energy. Ambit exposes a new bulk bitwise execution model to the host processor. Evaluations show that Ambit significantly improves the performance of several applications that use bulk bitwise operations, including databases.
RowHammer: A Retrospective
> In this article, we comprehensively survey the scientific literature on RowHammer-based attacks as well as mitigation techniques to prevent RowHammer. We also discuss what other related vulnerabilities may be lurking in DRAM and other types of memories, e.g., NAND flash memory or Phase Change Memory, that can potentially threaten the foundations of secure systems, as the memory technologies scale to higher densities. We conclude by describing and advocating a principled approach to memory reliability and security research that can enable us to better anticipate and prevent such vulnerabilities.
Serenity Operating System
> Serenity is a love letter to ‘90s user interfaces, with a custom Unix-like core. It flatters with sincerity by stealing beautiful ideas from various other systems.
Software-defined far memory in warehouse scale computers
If each thread’s TEB is referenced by the fs selector, does that mean that the 80386 is limited to 1024 threads?
> No, it doesn’t, because nobody said that the distinct values had to be different simultaneously.
Improvements in forking, threading, and signal code
> I am improving signaling code in the NetBSD kernel, covering corner cases with regression tests, and improving the documentation. I’ve been working at the level of sytems calls (syscalls): forking, threading, handling these with GDB, and tracing syscalls. Some work happens behind the scenes as I support the work of Michal Gorny on LLDB/ptrace features.
Looking for Entropy in All the Wrong Places
> Imagine we’re writing a C program and we need some random numbers.
Assessing Unikernel Security
> Unikernels are small, specialized, single-address-space machine images constructed by treating component applications and drivers like libraries and compiling them, along with a kernel and a thin OS layer, into a single binary blob. Proponents of unikernels claim that their smaller codebase and lack of excess services make them more efficient and secure than full-OS virtual machines and containers. We surveyed two major unikernels, Rumprun and IncludeOS, and found that this was decidedly not the case: unikernels, which in many ways resemble embedded systems, appear to have a similarly minimal level of security. Features like ASLR, W^X, stack canaries, heap integrity checks and more are either completely absent or seriously flawed. If an application running on such a system contains a memory corruption vulnerability, it is often possible for attackers to gain code execution, even in cases where the application’s source and binary are unknown. Furthermore, because the application and the kernel run together as a single process, an attacker who compromises a unikernel can immediately exploit functionality that would require privilege escalation on a regular OS, e.g. arbitrary packet I/O. We demonstrate such attacks on both Rumprun and IncludeOS unikernels, and recommend measures to mitigate them.
From Zero to NVMM
> Six months ago, I told myself I would write a small hypervisor for an old x86 AMD CPU I had. Just to learn more about virtualization, and see how far I could go alone on my spare time. Today, it turns out that I’ve gone as far as implementing a full, fast and flexible virtualization stack for NetBSD. I’d like to present here some aspects of it.