Non-POSIX file systems
Operating systems and file systems have traditionally been developed hand in hand. They impose mutual constraints on each other. Today we have two major leaders in file system semantics: Windows and POSIX. They are very close to each other when compared to the full set of possibilities. Interesting things happened before POSIX monopolized file system semantics.
One Byte to rule them all
For the last several years, nearly all iOS kernel exploits have followed the same high-level flow: memory corruption and fake Mach ports are used to gain access to the kernel task port, which provides an ideal kernel read/write primitive to userspace. Recent iOS kernel exploit mitigations like PAC and zone_require seem geared towards breaking the canonical techniques seen over and over again to achieve this exploit flow. But the fact that so many iOS kernel exploits look identical from a high level begs questions: Is targeting the kernel task port really the best exploit flow? Or has the convergence on this strategy obscured other, perhaps more interesting, techniques? And are existing iOS kernel mitigations equally effective against other, previously unseen exploit flows?
In this blog post, I’ll describe a new iOS kernel exploitation technique that turns a one-byte controlled heap overflow directly into a read/write primitive for arbitrary physical addresses, all while completely sidestepping current mitigations such as KASLR, PAC, and zone_require. By reading a special hardware register, it’s possible to locate the kernel in physical memory and build a kernel read/write primitive without a fake kernel task port. I’ll conclude by discussing how effective various iOS mitigations were or could be at blocking this technique and by musing on the state-of-the-art of iOS kernel exploitation. You can find the proof-of-concept code here.
The core of Apple is PPL: Breaking the XNU kernel's kernel
While doing research for the one-byte exploit technique, I considered several ways it might be possible to bypass Apple’s Page Protection Layer (PPL) using just a physical address mapping primitive, that is, before obtaining kernel read/write or defeating PAC. Given that PPL is even more privileged than the rest of the XNU kernel, the idea of compromising PPL “before” XNU was appealing. In the end, though, I wasn’t able to think of a way to break PPL using the physical mapping primitive alone.
However, it’s not the Project Zero way to leave any mitigation unbroken. So, having exhausted my search for design flaws, I returned to the ever-faithful technique of memory corruption. Sure enough, decompiling a few PPL functions in IDA was sufficient to find some memory corruption.
Make system(3) and popen(3) use posix_spawn(3) internally
After 1 week of reading POSIX and writing code, 2 weeks of coding and another 1.5 weeks of bugfixes I have successfully implemented posix_spawn in usage in system(3) and popen(3) internally.
Unix's design issue of device numbers being in stat() results for files
Sometimes, you will hear the view that Unix’s design is without significant issues, especially the ‘pure’ design of Research Unix (before people who didn’t really understand Unix like Berkeley and corporate AT&T got their hands on it). Unfortunately that is not the case, and there are some areas where Research Unix made decisions that still haunt us to this day. For reasons beyond the scope of this entry, today’s example is that part of the file attributes that you get from stat() system call and its friends is the ‘device number’ of the filesystem the file is on.
I think it’s a bit exaggerated to say this is an issue that haunts us. More like a historical note.
Memory Ordering in Modern Microprocessors
KVM host in a few lines of code
KVM is a virtualization technology that comes with the Linux kernel. In other words, it allows you to run multiple virtual machines (VMs) on a single Linux VM host. VMs in this case are known as guests. If you ever used QEMU or VirtualBox on Linux - you know what KVM is capable of.
But how does it work under the hood?
Ice Lake Store Elimination
We have found that the store elimination optimization originally uncovered on Skylake client is still present in Ice Lake and is roughly twice as effective in our fill benchmarks. Elimination of 96% L2 writebacks (to L3) and L3 writebacks (to RAM) was observed, compared to 50% to 60% on Skylake. We found speedups of up to 45% in the L3 region and speedups of about 25% in RAM, compared to improvements of less than 20% in Skylake.
But there’s a lot of investigation work to get there.
What Outranks Thread Priority?
This investigation started, as so many of mine do, with me minding my own business, not looking for trouble. In this case all I was doing was opening my laptop lid and trying to log on. The first few times that this resulted in a twenty-second delay I ignored the problem, hoping that it would go away. The next few times I thought about investigating, but performance problems that occur before you have even logged on are trickier to solve, and I was feeling lazy. When I noticed that I was avoiding closing my laptop because I dreaded the all-too-frequent delays when opening it I realized it was time to get serious.
A lot of effort for a rather unsatisfactory conclusion, but I won’t spoil the surprise.
How are Unix pipes implemented?
This article is about how pipes are implemented the Unix kernel. I was a little disappointed that a recent article titled “How do Unix pipes work?” was not about the internals, and curious enough to go digging in some old sources to try to answer the question.
A "living" Linux process with no memory
This code gets a list of all memory maps from /proc/self/maps, then creates a new executable map where it jits some code that calls munmap() on each of the maps it just got, and finally on the map it’s on. This is just a quick example with no portability in mind, so the source code contains the actual bytes that would be emitted by a x64 compiler. After unmapping the final map, where the jit code lies, there’s no new instruction to execute and a segfault is raised.
Hypervisor Necromancy; Reanimating Kernel Protectors
In this (rather long) article we will be investigating methods to emulate proprietary hypervisors under QEMU, which will allow researchers to interact with them in a controlled manner and debug them. Specifically, we will be presenting a minimal framework developed to bootstrap Samsung S8+ proprietary hypervisor as a demonstration, providing details and insights on key concepts on ARM low level development and virtualization extensions for interested readers to create their own frameworks and Actually Compile And Boot them ;). Finally, we will be investigating fuzzing implementations under this setup.
TRRespass: Exploiting the Many Sides of Target Row Refresh
Well, after two years of rigorous research, looking inside what is implemented inside CPUs and DDR4 chips using novel reverse engineering techniques, we can tell you that we do not live in a Rowhammer-free world. And we will not for the better part of this decade. Turns out while the old hammering techniques no longer work, once we understand the exact nature of these mitigations inside modern DDR4 chips, using new hammering patterns it is trivial to again trigger plenty of new bit flips. Yet again, these results show the perils of lack of transparency and security-by-obscurity. This is especially problematic since unlike software vulnerabilities, we cannot fix these hardware bit flips post-production.
Escaping the Chrome Sandbox with RIDL
Vulnerabilities that leak cross process memory can be exploited to escape the Chrome sandbox. An attacker is still required to compromise the renderer prior to mounting this attack. To protect against attacks on affected CPUs make sure your microcode is up to date and disable hyper-threading (HT).
This is a pretty clear write-up and comes with a nice footnote:
When I started working on this I was surprised that it’s still exploitable even though the vulnerabilities have been public for a while. If you read guidance on the topic, they will usually talk about how these vulnerabilities have been mitigated if your OS is up to date with a note that you should disable hyper threading to protect yourself fully. The focus on mitigations certainly gave me a false sense that the vulnerabilities have been addressed and I think these articles could be more clear on the impact of leaving hyper threading enabled.
Avoiding gaps in IOMMU protection at boot
But setting things up in the OS isn’t sufficient. If an attacker is able to trigger arbitrary DMA before the OS has started then they can tamper with the system firmware or your bootloader and modify the kernel before it even starts running. So ideally you want your firmware to set up the IOMMU before it even enables any external devices, and newer firmware should actually do this automatically. It sounds like the problem is solved.
This post explains how to implement heap allocators from scratch. It presents and discusses different allocator designs, including bump allocation, linked list allocation, and fixed-size block allocation. For each of the three designs, we will create a basic implementation that can be used for our kernel.
A Deep Dive Into Samsung's TrustZone
After a general introduction on the ARM TrustZone and a focus on Qualcomm’s implementation, this new series of articles will discuss and detail the implementation developed by Samsung and Trustonic.
These blog posts are a follow up to the conference Breaking Samsung’s ARM TrustZone that was given at BlackHat USA this summer. While an event such as this one is a great opportunity to present a subject we have been working on, many details have to be overlooked to fit the 50-minute format. This blog post, and the following ones, will explain all the details that were missing from the presentation as well as release the different tools mentioned in the talk and developed along the way.
The Year Ahead
There are a few conferences from 2019 that I didn’t manage to get to last year (notably CCS, SOCC, and NeurIPS) which are still on my plate. And then I’ve pulled together this initial ‘watch list’ for the coming year.
Purgeable Memory Allocations for Linux
It allocates anonymous pages using mmap(2). When the allocation is “unlocked” — i.e. the process isn’t actively using it — its pages are marked with MADV_FREE so that the kernel can reclaim them at any time. To lock the allocation so that the process can safely make use of them, the MADV_FREE is canceled. This is all a little trickier than it sounds, and that’s the subject of this article.
CPU Introspection: Intel Load Port Snooping
We’re going to go into a unique technique for observing and sequencing all load port traffic on Intel processors. By using a CPU vulnerability from the MDS set of vulnerabilities, specifically multi-architectural load port data sampling (MLPDS, CVE-2018-12127), we are able to observe values which fly by on the load ports. Since (to my knowledge) all loads must end up going through load ports, regardless of requestor, origin, or caching, this means in theory, all contents of loads ever performed can be observed. By using a creative scanning technique we’re able to not only view “random” loads as they go by, but sequence loads to determine the ordering and timing of them.
We’ll go through some examples demonstrating that this technique can be used to view all loads as they are performed on a cycle-by-cycle basis. We’ll look into an interesting case of the micro-architecture updating accessed and dirty bits using a microcode assist. These are invisible loads dispatched on the CPU on behalf of the user when a page is accessed for the first time.