The multi-generational LRU
One of the key tasks assigned to the memory-management subsystem is to optimize the system’s use of the available memory; that means pushing out pages containing unused data so that they can be put to better use elsewhere. Predicting which pages will be accessed in the near future is a tricky task, and the kernel has evolved a number of mechanisms designed to improve its chances of guessing right. But the kernel not only often gets it wrong, it also can expend a lot of CPU time to make the incorrect choice. The multi-generational LRU patch set posted by Yu Zhao is an attempt to improve that situation.
Improving texture atlas allocation in WebRender
This is a longer version of the piece I published in the mozilla gfx team blog where I focus on the atlas allocation algorithms. In this one I’ll go into more details about the process and methodology behind these improvements. The first part is about the making of guillotiere, a crate that I first released in March 2019. In the second part we’ll have a look at more recent work building upon what I did with guillotiere, to improve texture memory usage in WebRender/Firefox.
What they don’t tell you about demand paging in school
This post details my adventures with the Linux virtual memory subsystem, and my discovery of a creative way to taunt the OOM (out of memory) killer by accumulating memory in the kernel, rather than in userspace.
Good look at practical realities.
.NET Memory Performance Analysis
This document aims to help folks who develop applications in .NET with how to think about memory performance analysis and finding the right approaches to perform such analysis if they need to. In this context .NET includes .NET Framework and .NET Core. In order to get the latest memory improvements in both the garbage collector and the rest of the framework I strongly encourage you to be on .NET Core if you are not already, because that’s where the active development happens.
When I was writing this document I intended to introduce concepts like concurrent GC or pinning as needed by the explanation of the analysis. So as you read it, you’ll gradually come across them. If you already kind of knew what they are and are looking for explanation on a specific concept here are the links to them
The core of Apple is PPL: Breaking the XNU kernel's kernel
While doing research for the one-byte exploit technique, I considered several ways it might be possible to bypass Apple’s Page Protection Layer (PPL) using just a physical address mapping primitive, that is, before obtaining kernel read/write or defeating PAC. Given that PPL is even more privileged than the rest of the XNU kernel, the idea of compromising PPL “before” XNU was appealing. In the end, though, I wasn’t able to think of a way to break PPL using the physical mapping primitive alone.
However, it’s not the Project Zero way to leave any mitigation unbroken. So, having exhausted my search for design flaws, I returned to the ever-faithful technique of memory corruption. Sure enough, decompiling a few PPL functions in IDA was sufficient to find some memory corruption.
Major Bug in glibc is Killing Applications With a Memory Limit
malloc() preallocates large chunks of memory, per thread. This is meant as a performance optimization, to reduce memory contention in highly threaded applications. On a typical physical server, dual Xeon CPU with a terabyte of RAM. The core count is easily 40 or above. 10 cores * 2 CPU * 2 for hyper threading. This means a preallocation of up to 20 GB of RAM in the process.
15 years later: Remote Code Execution in qmail (CVE-2005-1513)
In 2005, three vulnerabilities were discovered in qmail but were never fixed because they were believed to be unexploitable in a default installation. We recently re-discovered these vulnerabilities and were able to exploit one of them remotely in a default installation.
Exploring munmap() on page zero and on unmapped address space
The difference between Linux and FreeBSD is in what they consider to be ‘outside the valid range for the address space of a process’. FreeBSD evidently considers page zero (and probably low memory in general) to always be outside this range, and thus munmap() fails. Linux does not; while it doesn’t normally let you mmap() memory in that area, for good reasons, it is not intrinsically outside the address space. If I’m reading the Linux kernel code correctly, no low address range is ever considered invalid, only address ranges that cross above the top of user space.
Tale of two hypervisor bugs - Escaping from FreeBSD bhyve
VM escape has become a popular topic of discussion over the last few years. A good amount of research on this topic has been published for various hypervisors like VMware, QEMU, VirtualBox, Xen and Hyper-V. Bhyve is a hypervisor for FreeBSD supporting hardware-assisted virtualization. This paper details the exploitation of two bugs in bhyve - FreeBSD-SA-16:32.bhyve  (VGA emulation heap overflow) and CVE-2018-17160  (Firmware Configuration device bss buffer overflow) and some generic techniques which could be used for exploiting other bhyve bugs. Further, the paper also discusses sandbox escapes using PCI device passthrough, and Control-Flow Integrity bypasses in HardenedBSD 12-CURRENT
A "living" Linux process with no memory
This code gets a list of all memory maps from /proc/self/maps, then creates a new executable map where it jits some code that calls munmap() on each of the maps it just got, and finally on the map it’s on. This is just a quick example with no portability in mind, so the source code contains the actual bytes that would be emitted by a x64 compiler. After unmapping the final map, where the jit code lies, there’s no new instruction to execute and a segfault is raised.
This post explains how to implement heap allocators from scratch. It presents and discusses different allocator designs, including bump allocation, linked list allocation, and fixed-size block allocation. For each of the three designs, we will create a basic implementation that can be used for our kernel.
The historical significance of the Burgermaster drive-in restaurant
In Windows 3.0, the data segment that recorded the locations of all the other data segments was named the BurgerMaster.
The Burgermaster restaurant was so important that Bill Gates’s secretary kept it on speed dial. In fact, it wasn’t just on speed dial for Bill Gates’s secretary. It was a company-wide speed dial number. You could call them to order a burger, walk next door, and your order would be ready and waiting for you.
Purgeable Memory Allocations for Linux
It allocates anonymous pages using mmap(2). When the allocation is “unlocked” — i.e. the process isn’t actively using it — its pages are marked with MADV_FREE so that the kernel can reclaim them at any time. To lock the allocation so that the process can safely make use of them, the MADV_FREE is canceled. This is all a little trickier than it sounds, and that’s the subject of this article.
Local Privilege Escalation in OpenBSD's dynamic loader (CVE-2019-19726)
1a/ we set the LD_LIBRARY_PATH environment variable to one single dot (the current working directory) and approximately ARG_MAX colons (the maximum number of bytes for the argument and environment list); as described in man ld.so:
1b/ we set the RLIMIT_DATA resource limit to ARG_MAX * sizeof(char *) (2MB on amd64, 1MB on i386); as described in man setrlimit:
children_tcache writeup and tcache overview
This article is intended for the people who already have some knowledge about heap exploitation. If you already know some heap attacks on glibc<2.26 it’ll be fully understandable to you. But if you don’t, don’t worry - I’ve tried to make this post approachable for everyone with just basic knowledge. If you really know nothing about the topic, I recommend heap-exploitation.
Tcache is an internal mechanism responsible for heap management. It was introduced in glibc 2.26 in the year 2017. It’s objective is to speed up the heap management. Older algorithms are not removed, but they are still used sometimes - for example for bigger chunks, or when an appropriate tcache bin is full. But heap exploitation with this mechanism is a lot easier due to a lack of heap integrity checks.
FreeBSD'fy ZFS zlib zalloc/zfree callbacks
The previous code came from OpenSolaris, which in my understanding require allocation size to be known to free memory. To store that size previous code allocated additional 8 byte header. But I have noticed that zlib with present settings allocates 64KB context buffers for each call, that could be efficiently cached by UMA, but addition of those 8 bytes makes them fall back to physical RAM allocations, that cause huge overhead and lock congestion on small blocks. Since FreeBSD’s free() does not have the size argument, switching to it solves the problem, increasing write speed to ZVOLs with 4KB block size and GZIP compression on my 40-threads test system from ~60MB/s to ~600MB/s.
unfork(2) is the inverse of fork(2). sort of.
By combining userfaultfd with process_vm_readv, any userspace application can obtain a copy-on-write mapping (with some limitations) of memory it never owned. All it needs is ptrace privileges, which is to say, having the same uid usually works.
Legitimate Use of Variable Length Arrays
If the correct solution ultimately didn’t use a VLA, then what good are they? In general, VLAs not useful. They’re time bombs. VLAs are nearly always the wrong choice. You must be careul to check that they don’t exceed some safe maximum, and there’s no reason not to always use the maximum.
There is one convenient, useful, and safe form of VLAs: a pointer to a VLA. It’s convenient and useful because it makes some expressions simpler. It’s safe because there’s no arbitrary stack allocation.
This is a neat trick, but feels pretty close to peril.
And a trick using alloca: https://nullprogram.com/blog/2019/10/28/
How a double-free bug in WhatsApp turns to RCE
In this blog post, I’m going to share about a double-free vulnerability that I discovered in WhatsApp for Android, and how I turned it into an RCE.
Double-free vulnerability in DDGifSlurp in decoding.c in libpl_droidsonroids_gif
Tethered jailbreaks are back
checkm8 exploits the Boot ROM to allow anyone with physical control of a phone to run arbitrary code. The Boot ROM, also called the Secure ROM, is the first code that executes when an iPhone is powered on and cannot be changed, because it’s “burned in” to the iPhone’s hardware. The Boot ROM initializes the system and eventually passes control to the kernel. It’s the root of trust for the trusted boot chain of iOS and verifies the integrity of the next stage of the boot process before passing execution control.
Detailed writeup: https://habr.com/en/company/dsec/blog/472762/