How to Make Rust Leak Memory (Also: How to Make It Stop)
Of course you can leak memory, even in Rust. For even medium-sized long-running applications, lots of graphs from a good memory profiler can make life better. And they’ll probably help you find the memory leak too.
Porting Zelda Classic to the Web
I spent the last two months (roughly ~150 hours) porting Zelda Classic to run in a web browser.
I hope my efforts result in Zelda Classic reaching a larger audience. It’s been challenging work, far outside my comfort zone of web development, and I’ve learned a lot about WebAssembly, CMake and multithreading. Along the way, I discovered bugs across multiple projects and did due diligence in fixing (or just reporting) them when I could, and even proposed a change to the HTML spec.
Weird how there’s bugs everywhere one looks.
The case of the failed exchange of the vtable slot
This shell extension is trying to detour the operating system, and it failed. (Note that Windows does not support apps detouring the operating system. This shell extension has exited into unsupported territory.)
Changing std::sort at Google’s Scale and Beyond
We are changing std::sort in LLVM’s libcxx. That’s a long story of what it took us to get there and all possible consequences, bugs you might encounter with examples from open source. We provide some benchmarks, perspective, why we did this in the first place and what it cost us with exciting ideas from Hyrum’s Law to reinforcement learning. All changes went into open source and thus I can freely talk about all of them.
This article is split into 3 parts, the first is history with all details of recent (and not so) past of sorting in C++ standard libraries. Second part is about what it takes to switch from one sorting algorithm to another with various bugs. The final one is about the implementation we have chosen with all optimizations we have done.
CVE-2022-21449: Psychic Signatures in Java
One side of the equation is r and the other side is multiplied by r and a value derived from s. So it would obviously be a really bad thing if r and s were both 0, because then you’d be checking that 0 = 0 ⨉ [a bunch of stuff], which will be true regardless of the value of [a bunch of stuff]! And that bunch of stuff is the important bits like the message and the public key. This is why the very first check in the ECDSA verification algorithm is to ensure that r and s are both >= 1.
Guess which check Java forgot?
An unexpected Redis sandbox escape affecting only Debian, Ubuntu, and other derivatives
This post describes how I broke the Redis sandbox, but only for Debian and Debian-derived Linux distributions. Upstream Redis is not affected. That makes it a Debian vulnerability, not a Redis one. The culprit, if you will, is dynamic linking, but there will be more on that later.
Why Keyboard Shortcuts don't work on non-US Layouts and how Devs could fix it
This is most annoying when the most important keyboard shortcuts are inaccessible. A very common shortcut is / for accessing search functionality. Unfortunately, there is no /-key on most international layouts. Adding modifiers to produce this key with your layout rarely helps. For example, on my German layout, / is produced via Shift+7. Most web applications will ignore this. Similarly painful is when Electron apps use [ and ] for navigating backwards and forwards.
If you use a US layout, you might be surprised to hear about these problems. But rest assured, they are not new and I am not the only one who is affected. We are at a point where it is easy to find users complaining about this for almost any popular web application.
Uncovering a 24-year-old bug in the Linux Kernel
When one side’s receive buffer (Recv-Q) fills up (in this case because the rsync process is doing disk I/O at a speed slower than the network’s), it will send out a zero window advertisement, which will put that direction of the connection on hold. When buffer space eventually frees up, the kernel will send an unsolicited window update with a non-zero window size, and the data transfer continues. To be safe, just in case this unsolicited window update is lost, the other end will regularly poll the connection state using the so-called Zero Window Probes (the persist mode we are seeing here).
Apparently, the bug was in the bulk receiver fast-path, a code path that skips most of the expensive, strict TCP processing to optimize for the common case of bulk data reception. This is a significant optimization, outlined 28 years ago² by Van Jacobson in his “TCP receive in 30 instructions” email. Apparently the Linux implementation did not update snd_wl1 while in the receiver fast path. If a connection uses the fast path for too long, snd_wl1 will fall so far behind that ack_seq will wrap around with respect to it. And if this happens while the receive window is zero, there is no way to re-open the window, as demonstrated above. What’s more, this bug had been present in Linux since v2.1.8, dating back to 1996!
Push some big numbers through your system and look for bugs
Why does this matter? Okay, let’s say you have a JSON message where you pass around the unique ID of some object in your system. Let’s further say that your system “mints” IDs out of a 64 bit number space, and it spreads them around, so large numbers can turn up every now and then. What happens when you finally get an object ID with a value of 1152921504606846976 and put it into a message?
node.example.com Is An IP Address
This takes a bit to get to the punchline, but man, good old duck typing for the win.
It turns out that, under certain conditions, the ipaddress module can create IPv6 addresses from raw bytes. My assumption is that it offers this behavior as a convenient way to parse IP addresses from data fresh off the wire.
Does node.example.com meet those certain conditions? You bet it does. Because we’re using Python 2 it’s just bytes and it happens to be 16 characters long.
The Easy Ones – Three Bugs Hiding in the Open
If everyone on a project spends all of their time heads-down working on the features and known bugs then there are probably some easy bugs hiding in plain sight. Take some time to look through the logs, clean up compiler warnings (although, really, if you have compiler warnings you need to rethink your life choices), and spend a few minutes running a profiler. Extra points if you add custom logging, enable some new warnings, or use a profiler that nobody else does.
Fixing a 3+ year old bug in NVIDIA GeForce Experience
So the issue is such: If you have a joystick plugged in, and the GeForce Experience overlay enabled, your display will not sleep. If you unplug the joystick, the display sleeps. If you disable the overlay, the display sleeps. You can have one or the other - but not both. People hadn’t just tracked the issue down - people tracked it down 3 years ago!
But now for the deep dive disassembly to find and fix the bug. Solid work.
Floating Point in the Browser, Part 3: When x+y=x
That is, if you add a small number to a large number then if the small number is “too small” then the large number may (in the default/sane round-to-nearest mode) stay at the same value.
Because of this the loop spins endlessly and the push command runs until the array hits the size limits. If there were no size limits then the push command would keep running until the entire machine ran out of memory, so, yay?
The Watchdog Hydra
Ammar checked, and sure enough, his code was sending hundreds of thousands of requests per second. It didn’t take him long to figure out why: requests from the watchdog were failing with a 500 error, so it called the login method. The login method had been succeeding, so another watchdog got scheduled. Thirty seconds later, that failed, as did all the previously scheduled watchdogs, which all called login again. Which, on success, scheduled a fresh round of watchdogs. Every thirty seconds, the number of scheduled calls doubled. Before long, Ammar’s code was DoSing the API.
A Massive Leak
“Memory leaks are impossible in a garbage collected language!” is one of my favorite lies. It feels true, but it isn’t. Sure, it’s much harder to make them, and they’re usually much easier to track down, but you can still create a memory leak. Most times, it’s when you create objects, dump them into a data structure, and never empty that data structure. Usually, it’s just a matter of finding out what object references are still being held. Usually.
A few months ago, I discovered a new variation on that theme. I was working on a C# application that was leaking memory faster than bad waterway engineering in the Imperial Valley.
Here's why your Samsung Blu-ray player bricked itself: It downloaded an XML config file that broke the firmware
This file, when fetched and saved to the device’s flash storage and processed by the equipment, crashed the system software and force a reboot. Upon reboot, the player parsed the XML file again from its flash storage, crashed and rebooted again. And so on, and so on, and so on. Crucially, the XML file would be parsed before a new one could be fetched from the internet, so once the bad configuration file was fetched and stored by these particular Samsung Blu-ray players in the field, they were bricked.
Major Bug in glibc is Killing Applications With a Memory Limit
malloc() preallocates large chunks of memory, per thread. This is meant as a performance optimization, to reduce memory contention in highly threaded applications. On a typical physical server, dual Xeon CPU with a terabyte of RAM. The core count is easily 40 or above. 10 cores * 2 CPU * 2 for hyper threading. This means a preallocation of up to 20 GB of RAM in the process.
Three bugs in the Go MySQL Driver
Adding to this challenge, authzd is deployed to our Kubernetes clusters, where we’ve been experiencing issues with high latencies when opening new TCP connections, something that particularly affects the pooling of connections in the Go MySQL driver. One of the most dangerous lies that programmers tell themselves is that the network is reliable, because, well, most of the time the network is reliable. But when it gets slow or spotty, that’s when things start breaking, and we get to find out the underlying issues in the libraries we take for granted.
Good walkthrough of dealing with some unfriendly bugs.
Hunting a Linux kernel bug
Earlier last year, we identified a firewall misconfiguration which accidentally dropped most network traffic. We expected resetting the firewall configuration to fix the issue, but resetting the firewall configuration exposed a kernel bug!
OpenSMTPD advisory dissected
Qualys contacted by e-mail to tell me they found a vulnerability in OpenSMTPD and would send me the encrypted draft for advisory. Receiving this kind of e-mail when working on a daemon that can’t revoke completely privileges is not a thing you want to read, particularly when you know how efficient they are at spotting a small bug and leveraging into a full-fledged clusterfuck.
Legacy code bad, even when it’s freshly written legacy code.