How I implemented MegaTextures on real Nintendo 64 hardware
This showcases a demo of megatextures running on n64 hardware. A “megatexture” for the n64 is really just a normal sized textured by modern standards but with that you can do some prebaked scenes that look like they don’t belong on the n64.
The Internet Worm Program: An Analysis
This report gives a detailed description of the components of the worm program—data and functions. It is based on study of two completely independent reverse-compilations of the worm and a version disassembled to VAX assembly language. Almost no source code is given in the paper because of current concerns about the state of the ‘‘immune system’’ of Internet hosts, but the description should be detailed enough to allow the reader to understand the behavior of the program.
And some modern commentary: https://infosec.exchange/@hovav/110950949212380779
FreeBSD on Firecracker
Experiences porting FreeBSD 14 to run on the Firecracker VMM
Shamir Secret Sharing
It’s 3am. Paul, the head of PayPal database administration carefully enters his elaborate passphrase at a keyboard in a darkened cubicle of 1840 Embarcadero Road in East Palo Alto, for the fifth time. He hits Return. The green-on-black console window instantly displays one line of text: “Sorry, one or more wrong passphrases. Can’t reconstruct the key. Goodbye.”
This is the story of a catastrophic software bug I briefly introduced into the PayPal codebase that almost cost us the company (or so it seemed, in the moment.)
Today, should you try to read up the programmer’s manual (AKA the man page) on getpass, you will find it has been long declared obsolete and replaced with a more intelligent alternative in nearly all flavors of modern Unix.
Raytraced Order Independent Transparency
About a year ago I reviewed a number of Order Independent Transparency (OIT) techniques (part 1, part 2, part 3), each achieving a difference combination of performance, quality and memory requirements. None of them fully solved OIT though and I ended the series wondering what raytraced transparency would look like. Recently I added (some) DXR support to the toy engine and I was curious to see how it would work, so I did a quick implementation.
The implementation was really simple. Since there is no mechanism to sort the nodes of a BLAS/TLAS based on distance from the camera, the ray generation shader keeps tracing rays using the result of the closest hit shader as the origin for the next ray until there is nothing else to hit.
Commander Keen's Adaptive Tile Refresh
I have been reading Doom Guy by John Romero. It is an excellent book which I highly recommend. In the ninth chapter, John describes being hit by lightning upon seeing Adaptive Tile Refresh (ATS). That made me realize I never took the time to understand how this crucial piece of tech powers the Commander Keen (CK) series.
At its heart the problem ATS solves is bandwidth. Writing 320x200 nibbles (32 KiB) per frame is too much for the ISA bus. There is no way to maintain a 60Hz framerate while refreshing the whole screen. If we were to run the following code, which simply fills all banks, it would run at 5 frames per seconds.
What should happen if the processor speculatively executed a vzeroupper, but then discovers that there was a branch misprediction? Well, we will have to revert that operation and put things back the way they were… maybe we can just unset that z-bit?
If we return to the analogy of malloc and free, you can see that it can’t be that simple - that would be like calling free() on a pointer, and then changing your mind!
That would be a use-after-free vulnerability, but there is no such thing as a use-after-free in a CPU… or is there?
Shoot ’em up in style: the making of Gun Trails on Playdate
Enter Playdate. I had wanted to build a shmup for years, but for various reasons—primarily bad scoping—the efforts always sputtered out. This little yellow device could provide the constraints needed, with the added bonus of a programming challenge to hit consistently high framerates.
Let’s abuse a bug in java.lang.String to make some weird Strings. We’ll make “hello world” not start with “hello”, and show that not all empty Strings are equal to each other.
Initializing Large Static Maps in Go
This article discusses the technical details of static initialization for map data in Go binaries, and some alternative strategies for dealing with the performance impacts.
Porting FSR 2 to OpenGL
FSR 2, or FidelityFX Super Resolution 2, is a temporal upscaling (TAAU) algorithm developed by AMD. It is comparable to Nvidia’s DLSS, except it is completely open-source and doesn’t require vendor-specific GPU features (tensor cores) to run.
I’ve been floating the idea of making an OpenGL backend for FSR 2 for a while now. However, only recently have I acquired the motivation to actually do it. I knew that writing a bespoke TAA(U) implementation, let alone a good one, was a task worthy of the gods, so I wanted to defer it to them.
acme.sh runs arbitrary commands from a remote server
Now it became immediately obvious to my why HiCA only supports acme.sh. They are not conforming to ACME at all! (Bugs the heck outa me that they’re using the official ACME logo on their site even though they don’t implement the ACME standard.)
Instead, HiCA is stealthily crafting curl commands and piping the output to bash. acme.sh is (being tricked into?) running arbitrary code from a remote server.
When LIMIT 9 works but LIMIT 10 hangs
So then bytes 3 and 4 should be that 16-bit payload length — and this is where things fall apart. The ws message says we have 126 bytes (00000000 01111110) of payload. That sounds plausible. The undici message says we have 25,888 bytes (01100101 00100000) of payload … in a 222 byte packet? Yeah: this one is fishy.
FAAS in Go with WASM, WASI and Rust
This post is best described as a technology demonstration; it melds together web servers, plugins, WebAssembly, Go, Rust and ABIs. Here’s what it shows:
How to load WASM code with WASI in a Go environment and hook it up to a web server.
How to implement web server plugins in any language that can be compiled to WASM.
How to translate Go programs into WASM that uses WASI.
How to translate Rust programs into WASM that uses WASI.
How to write WAT (WebAssembly Text) code that uses WASI to interact with a non-JS environment.
Beautiful Branchless Binary Search
I read a blog post by Alex Muscar, “Beautiful Binary Search in D“. It describes a binary search called “Shar’s algorithm”. I’d never heard of it and it’s impossible to google, but looking at the algorithm I couldn’t help but think “this is branchless.” And who knew that there could be a branchless binary search? So I did the work to translate it into a algorithm for C++ iterators, no longer requiring one-based indexing or fixed-size arrays.
Dumb bugs: the PCI device that wasn't
So pci_notify() gets called with our VIO device (somehow), and we’re converting that struct device into a struct pci_dev with no error checking. We could solve this particular bug by just checking that our device is actually a PCI device before we proceed - but we’re in a function called pci_notify, we’re expecting a PCI device to come in, so this would just be a bandaid.
A fork() in the road
The received wisdom suggests that Unix’s unusual combination of fork() and exec() for process creation was an inspired design. In this paper, we argue that fork was a clever hack for machines and programs of the 1970s that has long outlived its usefulness and is now a liability. We catalog the ways in which fork is a terrible abstraction for the modern programmer to use, describe how it compromises OS implementations, and propose alternatives.
A world to win - WebAssembly for the rest of us
As it turns out, there is a reason that there is no good Scheme implementation on WebAssembly: the initial version of WebAssembly is a terrible target if your language relies on the presence of a garbage collector. There have been some advances but this observation still applies to the current standardized and deployed versions of WebAssembly. To better understand this issue, let’s dig into the guts of the system to see what the limitations are.
Paving the Road to Vulkan on Asahi Linux
In every modern OS, GPU drivers are split into two parts: a userspace part, and a kernel part. The kernel part is in charge of managing GPU resources and how they are shared between apps, and the userspace part is in charge of converting commands from a graphics API (such as OpenGL or Vulkan) into the hardware commands that the GPU needs to execute.
Between those two parts, there is something called the Userspace API or “UAPI”. This is the interface that they use to communicate between them, and it is specific to each class of GPUs! Since the exact split between userspace and the kernel can vary depending on how each GPU is designed, and since different GPU designs require different bits of data and parameters to be passed between userspace and the kernel, each new GPU driver requires its own UAPI to go along with it.
Discovering one bug after another in the UTF-8 decoding logic in OpenBSD, then going on to fix other aspects of related code.
Still, the debugging process we went through here to discover the cause of the problems in the first place is worth sharing from the beginning, as the code in question was particularly bad with plenty of textbook mistakes. Who knows what you might find in your own investigations elsewhere.