Annoying details of a Z-buffer rasterizer
I wrote a software rasterizer for occlusion culling and hit many small speed bumps along the way. Here I reveal what I’ve learned in the hope of you writing your own with more ease than I did.
How to (and how not to) fix color banding
I love to use soft gradients as backdrops when doing graphics programming, a love started by a Corona Renderer product shot sample scene shared by user romullus and its use of radial gradients to highlight the product. But they are quite horrible from a design standpoint, since they produce awful color banding, also referred to as posterization. Depending on things like screen type, gradient colors, viewing environment, etc., the effect can be sometimes not present at all, yet sometimes painfully obvious.
Real-time dreamy Cloudscapes with Volumetric Raymarching
I spent the past few months diving into the realm of Raymarching and studying some of its applications that may come in handy for future 3D projects, and while I managed to build a pretty diverse set of scenes, all of them consisted of rendering surfaces or solid objects. My blog post on Raymarching covered some of the many impressive capabilities of this rendering technique, and as I mentioned at the end of that post, that was only the tip of the iceberg; there is a lot more we can do with it.
One fascinating aspect of Raymarching I quickly encountered in my study was its capacity to be tweaked to render volumes. Instead of stopping the raymarched loop once the ray hits a surface, we push through and continue the process to sample the inside of an object. That is where my obsession with volumetric clouds started, and I think the countless hours I spent exploring the many Sky Islands in Zelda Tears of the Kingdom contributed a lot to my curiosity to learn more about how they work. I thus studied a lot of Shadertoy scenes leveraging many Volumetric Raymarching techniques to render smoke, clouds, and cloudscapes, which I obviously couldn’t resist giving a try rebuilding myself:
WebGPU Security Technical Report
In this document we outline how WebGPU works through the mind of an attacker, our vulnerability research methodologies, and our thought processes in some of the more difficult research areas. There are many interesting portions of Chrome graphics that we omitted from review to keep scope manageable. While our primary focus was WebGPU, we did explore a few attack surfaces shared by other graphics features. We will interleave background information on WebGPU with descriptions of the important bugs we found. We hope this report will give the security community a deeper understanding of the shape of vulnerabilities we may come to expect with the addition of WebGPU, along with a lens into the vulnerabilities we might encounter in the future.
How I implemented MegaTextures on real Nintendo 64 hardware
This showcases a demo of megatextures running on n64 hardware. A “megatexture” for the n64 is really just a normal sized textured by modern standards but with that you can do some prebaked scenes that look like they don’t belong on the n64.
Analyzing Starfield’s Performance on Nvidia’s 4090 and AMD’s 7900 XTX
We analyzed this scene using Nvidia’s Nsight Graphics and AMD’s Radeon GPU Profiler to get some insight into why Starfield performs the way it does. On the Nvidia side, we covered the last three generations of cards by testing the RTX 4090, RTX 3090, and Titan RTX. On AMD, we tested the RX 7900 XTX. The i9-13900K was used to collect data for all of these GPUs.
Hacking the Book8088 for Better Accuracy
The Book8088 is trying hard to basically be compatible with the original IBM PC, containing some of the same or equivalent chips. It’s natural to want to put it through its paces, and one of the best tests for IBM PC compatibility has to be the 8088MPH demo. If 8088MPH will run we must be operating pretty darn close to the original.
Now, as it turns out, most of the demo does run, albeit in RGBI mode which loses out on all the cool composite artifact color effects. But most notably the famous Kefrens Bars effect does not display - the screen just goes blank. What’s going wrong on the Book8088 vs a real IBM PC 5150?
Raytraced Order Independent Transparency
About a year ago I reviewed a number of Order Independent Transparency (OIT) techniques (part 1, part 2, part 3), each achieving a difference combination of performance, quality and memory requirements. None of them fully solved OIT though and I ended the series wondering what raytraced transparency would look like. Recently I added (some) DXR support to the toy engine and I was curious to see how it would work, so I did a quick implementation.
The implementation was really simple. Since there is no mechanism to sort the nodes of a BLAS/TLAS based on distance from the camera, the ray generation shader keeps tracing rays using the result of the closest hit shader as the origin for the next ray until there is nothing else to hit.
Commander Keen's Adaptive Tile Refresh
I have been reading Doom Guy by John Romero. It is an excellent book which I highly recommend. In the ninth chapter, John describes being hit by lightning upon seeing Adaptive Tile Refresh (ATS). That made me realize I never took the time to understand how this crucial piece of tech powers the Commander Keen (CK) series.
At its heart the problem ATS solves is bandwidth. Writing 320x200 nibbles (32 KiB) per frame is too much for the ISA bus. There is no way to maintain a 60Hz framerate while refreshing the whole screen. If we were to run the following code, which simply fills all banks, it would run at 5 frames per seconds.
Porting FSR 2 to OpenGL
FSR 2, or FidelityFX Super Resolution 2, is a temporal upscaling (TAAU) algorithm developed by AMD. It is comparable to Nvidia’s DLSS, except it is completely open-source and doesn’t require vendor-specific GPU features (tensor cores) to run.
I’ve been floating the idea of making an OpenGL backend for FSR 2 for a while now. However, only recently have I acquired the motivation to actually do it. I knew that writing a bespoke TAA(U) implementation, let alone a good one, was a task worthy of the gods, so I wanted to defer it to them.
MMC2 Magic - How Punch-Out's Graphics Work
Two boxers. No flicker? How is this possible? The MMC2 Mapper chip: Explained
How to draw too many sprites by making one a background, then how to smoothly animate it by side scrolling.
Paving the Road to Vulkan on Asahi Linux
In every modern OS, GPU drivers are split into two parts: a userspace part, and a kernel part. The kernel part is in charge of managing GPU resources and how they are shared between apps, and the userspace part is in charge of converting commands from a graphics API (such as OpenGL or Vulkan) into the hardware commands that the GPU needs to execute.
Between those two parts, there is something called the Userspace API or “UAPI”. This is the interface that they use to communicate between them, and it is specific to each class of GPUs! Since the exact split between userspace and the kernel can vary depending on how each GPU is designed, and since different GPU designs require different bits of data and parameters to be passed between userspace and the kernel, each new GPU driver requires its own UAPI to go along with it.
Exploiting aCropalypse: Recovering Truncated PNGs
Why some GitHub labels are illegible
essentially the text of the label will be colored white if perceived-lightness<0.453 and black otherwise. However, when the perceived-lightness is very close to the threshold, we don’t trigger the min or max and actually get some sort of grey color for the label.
Version 1 of Stable Attribution’s algorithm decodes an image generated by an A.I. model into the most similar examples from the data that the model was trained with. Usually, the image the model creates doesn’t exist in its training data - it’s new - but because of the training process, the most influential images are the most visually similar ones, especially in the details.
Alejandro Jodorowsky’s “Tron”
I was recently shown some frames from a film that I had never heard of: Alejandro Jodorowsky’s 1976 version of “Tron.” The sets were incredible. The actors, unfamiliar to me, looked fantastic in their roles. The costumes and lighting worked together perfectly. The images glowed with an extravagant and psychedelic sensibility that felt distinctly Jodorowskian.
The truth is that these weren’t stills from a long-lost movie. They weren’t photos at all. These evocative, well-composed and tonally immaculate images were generated in seconds with the magic of artificial intelligence.
The “interactive” elements are annoying, but some pretty pictures here.
The Apple GPU and the Impossible Bug
In late 2020, Apple debuted the M1 with Apple’s GPU architecture, AGX, rumoured to be derived from Imagination’s PowerVR series. Since then, we’ve been reverse-engineering AGX and building open source graphics drivers. Last January, I rendered a triangle with my own code, but there has since been a heinous bug lurking: The driver fails to render large amounts of geometry.
The Unreasonable Effectiveness of JPEG: A Signal Processing Approach
The JPEG algorithm is rather complex and in this video, we break down the core parts of the algorithm, specifically color spaces, YCbCr, chroma subsampling, the discrete cosine transform, quantization, and lossless encoding. The majority of the focus is on the mathematical and signal processing insights that lead to advancements in image compression and the big themes in compression as a whole that we can take away from it.
Fixing the entire SM64 Source Code (Insane N64 performance)
Recap of a lot of work, optimizing and rewriting code to squeeze out performance on limited hardware.
Exponentially Better Rotations
If you’ve done any 3D programming, you’ve likely encountered the zoo of techniques and representations used when working with 3D rotations. Some of them are better than others, depending on the situation.