On Modern Hardware the Min-Max Heap beats a Binary Heap
The heap is a data structure that I use all the time and that others somehow use rarely. (I once had a coworker tell me that he knew some code was mine because it used a heap) Recently I was writing code that could really benefit from using a heap (as most code can) but I needed to be able to pop items from both ends. So I read up on double-ended priority queues and how to implement them. These are rare, but the most common implementation is the “Interval Heap” that can be explained quickly, has clean code and is only slightly slower than a binary heap. But there is an alternative called the “Min-Max Heap” that doesn’t have pretty code, but it has shorter dependency chains, which is important on modern hardware. As a result it often ends up faster than a binary heap, even though it allows you to pop from both ends. Which means there might be no reason to ever use a binary heap again.
Exploring mullender.c - A deep dive into the first IOCCC winner
I will discuss the code, how I got such old and obscure code to run, as well as include snippets from my conversations with one of the original authors (who was very helpful in figuring some of this out). If all that wasn’t enough I managed to obtain the original PDP and VAX source code, it will be hosted here with permission. I want to give a huge thank you to Sjoerd Mullender and Don Libes for their assistance and permission in reproducing some of this material.
Implementing traceroute in Go
This tool is very useful to inspect network paths and solve problems. But aside from that, this tool is extremely interesting and its actual implementation is pretty simple.
Under the Hood of a Simple DNS Server
For this post, I will talk mostly about the details of implementing a DNS server that follows the original two RFCs that laid out the spec: 1034 and 1035.
How to contact Google SRE: Dropping a shell in cloud SQL
Google Cloud SQL is a fully managed relational database service. Customers can deploy a SQL, PostgreSQL or MySQL server which is secured, monitored and updated by Google. More demanding users can easily scale, replicate or configure high-availability. By doing so users can focus on working with the database, instead of dealing with all the previously mentioned complex tasks. Cloud SQL databases are accessible by using the applicable command line utilities or from any application hosted around the world. This write-up covers vulnerabilities that we have discovered in the MySQL versions 5.6 and 5.7 of Cloud SQL.
A 35-year-old bug in patch found in efforts to restore 29 year old 2.11BSD
Larry Wall posted patch 1.3 to mod.sources on May 8, 1985. A number of versions followed over the years. It’s been a faithful alley for a long, long time. I’ve never had a problem with patch until I embarked on the 2.11BSD restoration project. In going over the logs very carefully, I’ve discovered a bug that bites this effort twice. It’s quite interesting to use 27 year old patches to find this bug while restoring a 29 year old OS...
Implementing a Type-safe printf in Rust
I show how to use heterogeneous lists and traits to implement a type-safe printf in Rust. These mechanisms can ensure that two variadic argument lists share important properties, like the number of format string holes matches the number of printf arguments.
How Go 1.15 improved converting small integer values to interfaces
The answer turns out to be pretty straightforward, and is in Go CL 216401 (merged in this commit, which may be easier to read). The Go runtime has a special static array of the first 256 integers (0 to 255), and when it would normally have to allocate memory to store an integer on the heap as part of converting it to an interface, it first checks to see if it can just return a pointer to the appropriate element in the array instead. This kind of static allocation of frequently used values is common in languages with lots of dynamic allocation; Python does something similar for small integers, for example (which can sometimes surprise you).
Proposal: Register-based Go calling convention
We propose switching the Go ABI from its current stack-based calling convention to a register-based calling convention. Preliminary experiments indicate this will achieve at least a 5–10% throughput improvement across a range of applications. This will remain backwards compatible with existing assembly code that assumes Go’s current stack-based calling convention through Go’s multiple ABI mechanism.
This also presents a very nice overview of existing calling conventions.
"Rust does not have a stable ABI"
Or more exactly, why does this happen, and why do people perceive it as a problem?
One Byte to rule them all
For the last several years, nearly all iOS kernel exploits have followed the same high-level flow: memory corruption and fake Mach ports are used to gain access to the kernel task port, which provides an ideal kernel read/write primitive to userspace. Recent iOS kernel exploit mitigations like PAC and zone_require seem geared towards breaking the canonical techniques seen over and over again to achieve this exploit flow. But the fact that so many iOS kernel exploits look identical from a high level begs questions: Is targeting the kernel task port really the best exploit flow? Or has the convergence on this strategy obscured other, perhaps more interesting, techniques? And are existing iOS kernel mitigations equally effective against other, previously unseen exploit flows?
In this blog post, I’ll describe a new iOS kernel exploitation technique that turns a one-byte controlled heap overflow directly into a read/write primitive for arbitrary physical addresses, all while completely sidestepping current mitigations such as KASLR, PAC, and zone_require. By reading a special hardware register, it’s possible to locate the kernel in physical memory and build a kernel read/write primitive without a fake kernel task port. I’ll conclude by discussing how effective various iOS mitigations were or could be at blocking this technique and by musing on the state-of-the-art of iOS kernel exploitation. You can find the proof-of-concept code here.
The core of Apple is PPL: Breaking the XNU kernel's kernel
While doing research for the one-byte exploit technique, I considered several ways it might be possible to bypass Apple’s Page Protection Layer (PPL) using just a physical address mapping primitive, that is, before obtaining kernel read/write or defeating PAC. Given that PPL is even more privileged than the rest of the XNU kernel, the idea of compromising PPL “before” XNU was appealing. In the end, though, I wasn’t able to think of a way to break PPL using the physical mapping primitive alone.
However, it’s not the Project Zero way to leave any mitigation unbroken. So, having exhausted my search for design flaws, I returned to the ever-faithful technique of memory corruption. Sure enough, decompiling a few PPL functions in IDA was sufficient to find some memory corruption.
PopCount on ARM64 in Go Assembler
Apropos of Apple’s ARM announcment, I thought I might write up a post on a recent bit of code I wrote that specifically looks at ARM64, and its benchmarks on various hardware. I’ve been implementing some compact data structures for a project. One of the CPU hotspots for the implementation is the need to run a quick population count across a potentially large bit of memory.
SAT solver on top of regex matcher
A SAT problem is an NP-problem, while regex matching is not. However, a quite popular regex ‘backreferences’ extension extends regex matching to a (hard) NP-problem.
Stupid std::tuple tricks: Getting started
I still believe it would be possible to build a high quality editor based on the original design. But I also believe that this would be quite a complex system, and require significantly more work than necessary.
A few good ideas and observations could be mined out of this post.
Why is there a "V" in SIGSEGV Segmentation Fault?
My program received a SIGSEGV signal and crashed with “Segmentation Fault” message. Where does the “V” come from?
Memory Ordering in Modern Microprocessors
Dumbindent: When 93% of the Time was Spent in Clang-Format
Summary: The Wuffs compiler outputs C code. When compiling its standard library, over 93% of the time (2.680 out of 2.855 seconds) was spent formatting that C code with clang-format. dumbindent is a new command-line tool (and Go package) that formats C code. Its output is not as ‘pretty’, but it can be over 80 times faster than clang-format (0.008 versus 0.668 seconds to format 12k lines of C code).
Latency in Asynchronous Python
This week I was debugging a misbehaving Python program that makes significant use of Python’s asyncio. The program would eventually take very long periods of time to respond to network requests. My first suspicion was a CPU-heavy coroutine hogging the thread, preventing the socket coroutines from running, but an inspection with pdb showed this wasn’t the case. Instead, the program’s author had made a couple of fundamental mistakes using asyncio. Let’s discuss them using small examples.