Parsing Protobuf at 2+GB/s: How I Learned To Love Tail Calls in C
While tail calls are usually associated with a functional programming style, I am interested in them purely for performance reasons. It turns out that in some cases we can use tail calls to get better code out of the compiler than would otherwise be possible—at least given current compiler technology—without dropping to assembly.
Why Keyboard Shortcuts don't work on non-US Layouts and how Devs could fix it
This is most annoying when the most important keyboard shortcuts are inaccessible. A very common shortcut is / for accessing search functionality. Unfortunately, there is no /-key on most international layouts. Adding modifiers to produce this key with your layout rarely helps. For example, on my German layout, / is produced via Shift+7. Most web applications will ignore this. Similarly painful is when Electron apps use [ and ] for navigating backwards and forwards.
If you use a US layout, you might be surprised to hear about these problems. But rest assured, they are not new and I am not the only one who is affected. We are at a point where it is easy to find users complaining about this for almost any popular web application.
Eliminating Data Races in Firefox – A Technical Report
We successfully deployed ThreadSanitizer in the Firefox project to eliminate data races in our remaining C/C++ components. In the process, we found several impactful bugs and can safely say that data races are often underestimated in terms of their impact on program correctness. We recommend that all multithreaded C/C++ projects adopt the ThreadSanitizer tool to enhance code quality.
The multi-generational LRU
One of the key tasks assigned to the memory-management subsystem is to optimize the system’s use of the available memory; that means pushing out pages containing unused data so that they can be put to better use elsewhere. Predicting which pages will be accessed in the near future is a tricky task, and the kernel has evolved a number of mechanisms designed to improve its chances of guessing right. But the kernel not only often gets it wrong, it also can expend a lot of CPU time to make the incorrect choice. The multi-generational LRU patch set posted by Yu Zhao is an attempt to improve that situation.
This Man Thought Opening a TXT File Is Fine, He Thought Wrong. MacOS CVE-2019-8761
This research originated when I realized the default text reader on OSX, TextEdit is used to open files with TXT extension by default. On the interface of TextEdit, it looked like you can do basic customization to your text (you can turn text bold, italic, change color etc...), so I was wondering how a TXT file was storing and parsing this information. It seems it uses RTF format instead of TXT if we add customizations to the text.
Cells Form Into ‘Xenobots’ on Their Own
Embryonic cells can self-assemble into new living forms that don’t resemble the bodies they usually generate, challenging old ideas of what defines an organism.
The end of TenFourFox and what I've learned from it
Counting cycles and instructions on the Apple M1 processor
Recently, one of the readers of my blog (Duc Tri Nguyen) showed me how, inspired by code from Dougall Johnson. Dougall has been doing interesting research on Apple’s processors. As far as I can tell, it is entirely undocumented and could blow up your computer. Thankfully, to access the performance counters, you need administrative access (wheel group). In practice, it means that you could start your instrumented program in a shell using sudo so that your program has, itself, administrative privileges.
To illustrate the approach, I have posted a full C++ project which builds an instrumented benchmark. You need administrative access and an Apple M1 system. I assume you have installed the complete developer kit with command-line utilities provided by Apple.
Substack's UI and 1Password just cost me $2,023
As part of a Zoom call today, I tried to sign up for a $10 monthly subscription on a Substack page to test the user journey. I paid $2,023.
When I’ve clicked my card details in 1Password, it’s entered my expiry year in the hidden, custom subscription amount box (I’m not sure why - is this a 1Password bug?). Because this box has now changed value, the Substack UI has automatically selected this option. I’ve then hit “Subscribe” before I had time to notice and 💸 $2,023.
Don't End The Week With Nothing
I’m a capitalist. A friend of mine is a devoted Marxist. I think we mutually agree that, considering any particular employee, it is in that employee’s personal interest to stop selling hours of labor and start renting access to his accumulated capital as soon as humanly possible.
A lot of day jobs structurally inhibit capital formation. If I were a Marxist I’d say “And this is an intended consequence of Capital’s desire to keep Labor subservient to it”, but I honestly think it’s true even without anybody needing to twirl their mustache.
Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image
We introduce the problem of perpetual view generation—long-range generation of novel views corresponding to an arbitrarily long camera trajectory given a single image. This is a challenging problem that goes far beyond the capabilities of current view synthesis methods, which work for a limited range of viewpoints and quickly degenerate when presented with a large camera motion. Methods designed for video generation also have limited ability to produce long video sequences and are often agnostic to scene geometry. We take a hybrid approach that integrates both geometry and image synthesis in an iterative render, refine, and repeat framework, allowing for long-range generation that cover large distances after hundreds of frames. Our approach can be trained from a set of monocular video sequences without any manual annotation. We propose a dataset of aerial footage of natural coastal scenes, and compare our method with recent view synthesis and conditional video generation baselines, showing that it can generate plausible scenes for much longer time horizons over large camera trajectories compared to existing methods.
Cranelift, Part 3: Correctness in Register Allocation
In this post, I will cover how we worked to ensure correctness in our register allocator, regalloc.rs, by developing a symbolic checker that uses abstract interpretation to prove correctness for a specific register allocation result. By using this checker as a fuzzing oracle, and driving just the register allocator with a focused fuzzing target, we have been able to uncover some very interesting and subtle bugs, and achieve a fairly high confidence in the allocator’s robustness.
Wrecking sandwich traders for fun and profit
However, nothing is risk-free on the blockchain, and exploitative trading strategies such as sandwich trading and front-running actually increase in risk the more the engineer attempts to generalise their ability to capture opportunities.
To illustrate to novice traders the risks of playing in the mempool, I have conducted a demonstration of a new trading alpha I call “Salmonella”, which involves intentionally exploiting the generalised nature of front-running setups. The goal of sandwich trading is to exploit the slippage of unintended victims, so this strategy turns the tables on the exploiters.
SSH and User-mode IP WireGuard
For a couple hundred lines of code (not counting the entire user-mode Linux you’ll be pulling in from gVisor, HEY! Dependencies! What are you gonna do!) you can bring up a new, cryptographically authenticated network, any time you want to, in practically any program.
There really are some fun libraries out there if you want to build something crazy.
Lord of the Ring(s): Side Channel Attacks on the CPU On-Chip Ring Interconnect Are Practical
We introduce the first microarchitectural side channel attacks that leverage contention on the CPU ring interconnect. There are two challenges that make it uniquely difficult to exploit this channel. First, little is known about the ring interconnect’s functioning and architecture. Second, information that can be learned by an attacker through ring contention is noisy by nature and has coarse spatial granularity. To address the first challenge, we perform a thorough reverse engineering of the sophisticated protocols that handle communication on the ring interconnect. With this knowledge, we build a cross-core covert channel over the ring interconnect with a capacity of over 4 Mbps from a single thread, the largest to date for a cross-core channel not relying on shared memory. To address the second challenge, we leverage the fine-grained temporal patterns of ring contention to infer a victim program’s secrets. We demonstrate our attack by extracting key bits from vulnerable EdDSA and RSA implementations, as well as inferring the precise timing of keystrokes typed by a victim user.
The Kilobyte's Gambit
Lotus 1-2-3 reversing
A ton of hacking later, and I do now have a usable driver for dosemu that supports arbitrary resolutions, just look at all those columns!
How I cut GTA Online loading times by 70%
Some debug-stepping later it turns out it’s… JSON!
Of course it is. But a really solid reversing effort. And a nice fix.
An Exploration of JSON Interoperability Vulnerabilities
The same JSON document can be parsed with different values across microservices, leading to a variety of potential security risks. If you prefer a hands-on approach, try the labs and when they scare you, come back and read on.
Bilinear down/upsampling, aligning pixel grids, and that infamous GPU half pixel offset
It’s been more than two decades of me using bilinear texture filtering, a few months since I’ve written about bilinear resampling, but only two days since I discovered a bug of mine related to it. 😅 Similarly, just last week a colleague asked for a very fast implementation of bilinear on a CPU and it caused a series of questions “which kind of bilinear?”.
So I figured it’s an opportunity for another short blog post – on bilinear filtering, but in context of down/upsampling. We will touch here on GPU half pixel offsets, aligning pixel grids, a bug / confusion in Tensorflow, deeper signal processing analysis of what’s going on during bilinear operations, and analysis of the magic of the famous “magic kernel”.