Parsing Protobuf at 2+GB/s: How I Learned To Love Tail Calls in C
While tail calls are usually associated with a functional programming style, I am interested in them purely for performance reasons. It turns out that in some cases we can use tail calls to get better code out of the compiler than would otherwise be possible—at least given current compiler technology—without dropping to assembly.
signed char lotte
“signed char lotte” is a computer program written by Brian Westley and the winner of the “Best Layout” award in the 1990 International Obfuscated C Code Contest. The cleverness of the text is staggering. Superficially it reads as an epistolary exchange between two (possibly former) lovers, Charlotte and Charlie. At the same, it is an executable piece of code whose action is thematically related to its story.
Uncovering a 24-year-old bug in the Linux Kernel
When one side’s receive buffer (Recv-Q) fills up (in this case because the rsync process is doing disk I/O at a speed slower than the network’s), it will send out a zero window advertisement, which will put that direction of the connection on hold. When buffer space eventually frees up, the kernel will send an unsolicited window update with a non-zero window size, and the data transfer continues. To be safe, just in case this unsolicited window update is lost, the other end will regularly poll the connection state using the so-called Zero Window Probes (the persist mode we are seeing here).
Apparently, the bug was in the bulk receiver fast-path, a code path that skips most of the expensive, strict TCP processing to optimize for the common case of bulk data reception. This is a significant optimization, outlined 28 years ago² by Van Jacobson in his “TCP receive in 30 instructions” email. Apparently the Linux implementation did not update snd_wl1 while in the receiver fast path. If a connection uses the fast path for too long, snd_wl1 will fall so far behind that ack_seq will wrap around with respect to it. And if this happens while the receive window is zero, there is no way to re-open the window, as demonstrated above. What’s more, this bug had been present in Linux since v2.1.8, dating back to 1996!
donut.c without a math library
My little donut.c has been making the rounds again, after being featured in a couple YouTube videos (e.g., Lex Fridman and Joma Tech). If I had known how much attention this code would get over the years, I would have spent more time on it.
Pointers Are Complicated II, or: We need better language specs
Below, I will show a series of three compiler transformations that each seem “intuitively justified”, but when taken together they lead to a clearly incorrect result. I will use LLVM for these examples, but the goal is not to pick on LLVM—other compilers suffer from similar issues. The goal is to convince you that to build a correct compiler for languages permitting unsafe pointer manipulation such as C, C++, or Rust, we need to take IR semantics (and specifically provenance) more seriously. I use LLVM for the examples because it is particularly easy to study with its single, extensively-documented IR that a lot of infrastructure evolved around. Let’s get started!
Exploring mullender.c - A deep dive into the first IOCCC winner
I will discuss the code, how I got such old and obscure code to run, as well as include snippets from my conversations with one of the original authors (who was very helpful in figuring some of this out). If all that wasn’t enough I managed to obtain the original PDP and VAX source code, it will be hosted here with permission. I want to give a huge thank you to Sjoerd Mullender and Don Libes for their assistance and permission in reproducing some of this material.
A 35-year-old bug in patch found in efforts to restore 29 year old 2.11BSD
Larry Wall posted patch 1.3 to mod.sources on May 8, 1985. A number of versions followed over the years. It’s been a faithful alley for a long, long time. I’ve never had a problem with patch until I embarked on the 2.11BSD restoration project. In going over the logs very carefully, I’ve discovered a bug that bites this effort twice. It’s quite interesting to use 27 year old patches to find this bug while restoring a 29 year old OS...
How can CharUpper and CharLower guarantee that the uppercase version of a string is the same length as the lowercase version?
Ray Tracing In Notepad.exe At 30 FPS
A few months back, there was a post on Reddit (link), which described a game that used an open source clone of Notepad to handle all its input and rendering. While reading about it, I had the thought that it would be really cool to see something similar that worked with stock Windows Notepad. Then I spent way too much of my free time doing exactly that.
I ended up making a Snake game and a small ray tracer that use stock Notepad for all input and rendering tasks, and got to learn about DLL Injection, API Hooking and Memory Scanning along the way. It seemed like writing up the stuff I learned might make for an interesting read, and give me a chance to show off the dumb stuff I built at the same time, so that’s what these next couple blog posts will be about.
15 years later: Remote Code Execution in qmail (CVE-2005-1513)
In 2005, three vulnerabilities were discovered in qmail but were never fixed because they were believed to be unexploitable in a default installation. We recently re-discovered these vulnerabilities and were able to exploit one of them remotely in a default installation.
Tale of two hypervisor bugs - Escaping from FreeBSD bhyve
VM escape has become a popular topic of discussion over the last few years. A good amount of research on this topic has been published for various hypervisors like VMware, QEMU, VirtualBox, Xen and Hyper-V. Bhyve is a hypervisor for FreeBSD supporting hardware-assisted virtualization. This paper details the exploitation of two bugs in bhyve - FreeBSD-SA-16:32.bhyve  (VGA emulation heap overflow) and CVE-2018-17160  (Firmware Configuration device bss buffer overflow) and some generic techniques which could be used for exploiting other bhyve bugs. Further, the paper also discusses sandbox escapes using PCI device passthrough, and Control-Flow Integrity bypasses in HardenedBSD 12-CURRENT
How are Unix pipes implemented?
This article is about how pipes are implemented the Unix kernel. I was a little disappointed that a recent article titled “How do Unix pipes work?” was not about the internals, and curious enough to go digging in some old sources to try to answer the question.
Patching nVidia GPU driver for hot-unplug on Linux
Anyway, I was kind of annoyed of rebooting every time it happens, so I decided to reboot a few more dozen times instead while patching the driver. This has indeed worked, and left me with something similar to a functional hot-unplug, mildly crippled by the fact that nvidia-modeset is a completely opaque blob that keeps some internal state and tries to act on it, getting stuck when it tries to do something to the now-missing eGPU.
Speculative Load Hardening
While several approaches are being actively pursued to mitigate specific branches and/or loads inside especially risky software (most notably various OS kernels), these approaches require manual and/or static analysis aided auditing of code and explicit source changes to apply the mitigation. They are unlikely to scale well to large applications. We are proposing a comprehensive mitigation approach that would apply automatically across an entire program rather than through manual changes to the code. While this is likely to have a high performance cost, some applications may be in a good position to take this performance / security tradeoff.
Translating Quake 3 into Rust
The Rust-loving team at Immunant has been hard at work on C2Rust, a migration framework that takes the drudgery out of migrating to Rust. Our goal is to make safety improvements to the translated Rust automatically where we can, and help the programmer do the same where we cannot. First, however, we have to build a rock-solid translator that gets people up and running in Rust. Testing on small CLI programs gets old eventually, so we decided to try translating Quake 3 into Rust. After a couple of days, we were likely the first people to ever play Quake3 in Rust!
A new cycle-stepped 6502 CPU emulator
I wrote a new version of my 6502/6510 emulator in the last weeks which can be stepped forward in clock cycles instead of full instructions.
Pointer authentication is a technology which offers strong probabilistic protection against exploiting a broad class of memory bugs to take control of program execution. When adopted consistently in a language ABI, it provides a form of relatively fine-grained control flow integrity (CFI) check that resists both return-oriented programming (ROP) and jump-oriented programming (JOP) attacks.
While pointer authentication can be implemented purely in software, direct hardware support (e.g. as provided by ARMv8.3) can dramatically lower the execution speed and code size costs. Similarly, while pointer authentication can be implemented on any architecture, taking advantage of the (typically) excess addressing range of a target with 64-bit pointers minimizes the impact on memory performance and can allow interoperation with existing code (by disabling pointer authentication dynamically). This document will generally attempt to present the pointer authentication feature independent of any hardware implementation or ABI. Considerations that are implementation-specific are clearly identified throughout.
Chunking Optimizations: Let the Knife Do the Work
“Letting the knife do the work” means writing a correct program and lifting unnecessary constraints so that the compiler can use whatever chunk size is appropriate for the target.
Addressing of AF_INET, AF_INET6 and AF_UNIX sockets
A freshly created socket isn’t very useful. We have to tell it to either listen for incoming data, or connect to a remote peer. To achieve anything useful we need to perform a syscall dance, which involves either bind() or connect() or both.
And some notes about the DNS resolver rabbit hole.
Clang format tanks performance
Let’s benchmark toupper implementations.
Actually, I don’t really care about toupper much at all, but I was writing a different post and needed a peg to hang my narrative hat on, and hey toupper seems like a nice harmless benchmark. Despite my effort to choose something which should be totally straightforward and not sidetrack me, this weird thing popped out.