What went wrong with the libdispatch. A tale of caution for the future of concurrency.
The future was multithreading and we had to use the libdispatch to get there. So we did.
As we went down that rabbit hole, things got progressively worse.
What they don’t tell you about demand paging in school
This post details my adventures with the Linux virtual memory subsystem, and my discovery of a creative way to taunt the OOM (out of memory) killer by accumulating memory in the kernel, rather than in userspace.
Good look at practical realities.
Floating Point in the Browser, Part 3: When x+y=x
That is, if you add a small number to a large number then if the small number is “too small” then the large number may (in the default/sane round-to-nearest mode) stay at the same value.
Because of this the loop spins endlessly and the push command runs until the array hits the size limits. If there were no size limits then the push command would keep running until the entire machine ran out of memory, so, yay?
Rust after the honeymoon
So Rust is going really well for us at Oxide, but for the moment I want to focus on more personal things — reasons that I personally have enjoyed implementing in Rust. These run the gamut: some are tiny but beautiful details that allow me to indulge in the pleasure of the craft; some are much more profound features that represent important advances in the state of the art; and some are bodies of software developed by the Rust community, notable as much for their reflection of who is attracted to Rust (and why) as for the artifacts themselves.
Dissecting Lemire’s nearly divisionless random
The idea was simple, I’ve always felt that code readability is undervalued so I figured I’d put cold hard cash up. I announced a $1,000 pot, divided into $500, $300, and $200 prizes for the most readable implementations of Daniel Lemire’s nearly divisionless algorithm for selecting a random number from an interval. I now have winners to announce and congratulate, and they’re in this blog post, but there’s more to this story.
Performance of Elixir's System.get_env/0 Function
At work I was debugging a performance issue in one of our Elixir applications and stumbled across the strange implementation of Elixir’s System.get_env/0 function. In this blog post I’ll show how it caused performance issues for the application I was debugging and I’ll also propose a better implementation of the function. I’ll conclude by explaining why the better implementation is not used yet.
On Modern Hardware the Min-Max Heap beats a Binary Heap
The heap is a data structure that I use all the time and that others somehow use rarely. (I once had a coworker tell me that he knew some code was mine because it used a heap) Recently I was writing code that could really benefit from using a heap (as most code can) but I needed to be able to pop items from both ends. So I read up on double-ended priority queues and how to implement them. These are rare, but the most common implementation is the “Interval Heap” that can be explained quickly, has clean code and is only slightly slower than a binary heap. But there is an alternative called the “Min-Max Heap” that doesn’t have pretty code, but it has shorter dependency chains, which is important on modern hardware. As a result it often ends up faster than a binary heap, even though it allows you to pop from both ends. Which means there might be no reason to ever use a binary heap again.
Exploring mullender.c - A deep dive into the first IOCCC winner
I will discuss the code, how I got such old and obscure code to run, as well as include snippets from my conversations with one of the original authors (who was very helpful in figuring some of this out). If all that wasn’t enough I managed to obtain the original PDP and VAX source code, it will be hosted here with permission. I want to give a huge thank you to Sjoerd Mullender and Don Libes for their assistance and permission in reproducing some of this material.
Implementing traceroute in Go
This tool is very useful to inspect network paths and solve problems. But aside from that, this tool is extremely interesting and its actual implementation is pretty simple.
Under the Hood of a Simple DNS Server
For this post, I will talk mostly about the details of implementing a DNS server that follows the original two RFCs that laid out the spec: 1034 and 1035.
How to contact Google SRE: Dropping a shell in cloud SQL
Google Cloud SQL is a fully managed relational database service. Customers can deploy a SQL, PostgreSQL or MySQL server which is secured, monitored and updated by Google. More demanding users can easily scale, replicate or configure high-availability. By doing so users can focus on working with the database, instead of dealing with all the previously mentioned complex tasks. Cloud SQL databases are accessible by using the applicable command line utilities or from any application hosted around the world. This write-up covers vulnerabilities that we have discovered in the MySQL versions 5.6 and 5.7 of Cloud SQL.
A 35-year-old bug in patch found in efforts to restore 29 year old 2.11BSD
Larry Wall posted patch 1.3 to mod.sources on May 8, 1985. A number of versions followed over the years. It’s been a faithful alley for a long, long time. I’ve never had a problem with patch until I embarked on the 2.11BSD restoration project. In going over the logs very carefully, I’ve discovered a bug that bites this effort twice. It’s quite interesting to use 27 year old patches to find this bug while restoring a 29 year old OS...
Implementing a Type-safe printf in Rust
I show how to use heterogeneous lists and traits to implement a type-safe printf in Rust. These mechanisms can ensure that two variadic argument lists share important properties, like the number of format string holes matches the number of printf arguments.
How Go 1.15 improved converting small integer values to interfaces
The answer turns out to be pretty straightforward, and is in Go CL 216401 (merged in this commit, which may be easier to read). The Go runtime has a special static array of the first 256 integers (0 to 255), and when it would normally have to allocate memory to store an integer on the heap as part of converting it to an interface, it first checks to see if it can just return a pointer to the appropriate element in the array instead. This kind of static allocation of frequently used values is common in languages with lots of dynamic allocation; Python does something similar for small integers, for example (which can sometimes surprise you).
Proposal: Register-based Go calling convention
We propose switching the Go ABI from its current stack-based calling convention to a register-based calling convention. Preliminary experiments indicate this will achieve at least a 5–10% throughput improvement across a range of applications. This will remain backwards compatible with existing assembly code that assumes Go’s current stack-based calling convention through Go’s multiple ABI mechanism.
This also presents a very nice overview of existing calling conventions.
"Rust does not have a stable ABI"
Or more exactly, why does this happen, and why do people perceive it as a problem?
One Byte to rule them all
For the last several years, nearly all iOS kernel exploits have followed the same high-level flow: memory corruption and fake Mach ports are used to gain access to the kernel task port, which provides an ideal kernel read/write primitive to userspace. Recent iOS kernel exploit mitigations like PAC and zone_require seem geared towards breaking the canonical techniques seen over and over again to achieve this exploit flow. But the fact that so many iOS kernel exploits look identical from a high level begs questions: Is targeting the kernel task port really the best exploit flow? Or has the convergence on this strategy obscured other, perhaps more interesting, techniques? And are existing iOS kernel mitigations equally effective against other, previously unseen exploit flows?
In this blog post, I’ll describe a new iOS kernel exploitation technique that turns a one-byte controlled heap overflow directly into a read/write primitive for arbitrary physical addresses, all while completely sidestepping current mitigations such as KASLR, PAC, and zone_require. By reading a special hardware register, it’s possible to locate the kernel in physical memory and build a kernel read/write primitive without a fake kernel task port. I’ll conclude by discussing how effective various iOS mitigations were or could be at blocking this technique and by musing on the state-of-the-art of iOS kernel exploitation. You can find the proof-of-concept code here.
The core of Apple is PPL: Breaking the XNU kernel's kernel
While doing research for the one-byte exploit technique, I considered several ways it might be possible to bypass Apple’s Page Protection Layer (PPL) using just a physical address mapping primitive, that is, before obtaining kernel read/write or defeating PAC. Given that PPL is even more privileged than the rest of the XNU kernel, the idea of compromising PPL “before” XNU was appealing. In the end, though, I wasn’t able to think of a way to break PPL using the physical mapping primitive alone.
However, it’s not the Project Zero way to leave any mitigation unbroken. So, having exhausted my search for design flaws, I returned to the ever-faithful technique of memory corruption. Sure enough, decompiling a few PPL functions in IDA was sufficient to find some memory corruption.
PopCount on ARM64 in Go Assembler
Apropos of Apple’s ARM announcment, I thought I might write up a post on a recent bit of code I wrote that specifically looks at ARM64, and its benchmarks on various hardware. I’ve been implementing some compact data structures for a project. One of the CPU hotspots for the implementation is the need to run a quick population count across a potentially large bit of memory.
SAT solver on top of regex matcher
A SAT problem is an NP-problem, while regex matching is not. However, a quite popular regex ‘backreferences’ extension extends regex matching to a (hard) NP-problem.