Annoying details of a Z-buffer rasterizer
I wrote a software rasterizer for occlusion culling and hit many small speed bumps along the way. Here I reveal what I’ve learned in the hope of you writing your own with more ease than I did.
Low-level thinking in high-level shading languages 2023
This, and the followup, is a presentation that I recommend as required reading to people wanting to get deeper into shader programming, not just for the knowledge but also the attitude towards shader programming (check compiler output, never assume, always profile). It has been 10 years since it was released though; in those 10 years a lot of things have changed on the GPU/shader model/shader compiler front and not all the suggestions in those presentations are still valid. So I decided to do a refresh with a modern compiler and shader model to see what still holds true and what doesn’t. I will target the RDNA 2 GPU architecture on PC using HLSL, the 6.7 shader model and the DXC compiler (using https://godbolt.org/) in this blog post.
How to (and how not to) fix color banding
I love to use soft gradients as backdrops when doing graphics programming, a love started by a Corona Renderer product shot sample scene shared by user romullus and its use of radial gradients to highlight the product. But they are quite horrible from a design standpoint, since they produce awful color banding, also referred to as posterization. Depending on things like screen type, gradient colors, viewing environment, etc., the effect can be sometimes not present at all, yet sometimes painfully obvious.
Real-time dreamy Cloudscapes with Volumetric Raymarching
I spent the past few months diving into the realm of Raymarching and studying some of its applications that may come in handy for future 3D projects, and while I managed to build a pretty diverse set of scenes, all of them consisted of rendering surfaces or solid objects. My blog post on Raymarching covered some of the many impressive capabilities of this rendering technique, and as I mentioned at the end of that post, that was only the tip of the iceberg; there is a lot more we can do with it.
One fascinating aspect of Raymarching I quickly encountered in my study was its capacity to be tweaked to render volumes. Instead of stopping the raymarched loop once the ray hits a surface, we push through and continue the process to sample the inside of an object. That is where my obsession with volumetric clouds started, and I think the countless hours I spent exploring the many Sky Islands in Zelda Tears of the Kingdom contributed a lot to my curiosity to learn more about how they work. I thus studied a lot of Shadertoy scenes leveraging many Volumetric Raymarching techniques to render smoke, clouds, and cloudscapes, which I obviously couldn’t resist giving a try rebuilding myself:
WebGPU Security Technical Report
In this document we outline how WebGPU works through the mind of an attacker, our vulnerability research methodologies, and our thought processes in some of the more difficult research areas. There are many interesting portions of Chrome graphics that we omitted from review to keep scope manageable. While our primary focus was WebGPU, we did explore a few attack surfaces shared by other graphics features. We will interleave background information on WebGPU with descriptions of the important bugs we found. We hope this report will give the security community a deeper understanding of the shape of vulnerabilities we may come to expect with the addition of WebGPU, along with a lens into the vulnerabilities we might encounter in the future.
How I implemented MegaTextures on real Nintendo 64 hardware
This showcases a demo of megatextures running on n64 hardware. A “megatexture” for the n64 is really just a normal sized textured by modern standards but with that you can do some prebaked scenes that look like they don’t belong on the n64.
Raytraced Order Independent Transparency
About a year ago I reviewed a number of Order Independent Transparency (OIT) techniques (part 1, part 2, part 3), each achieving a difference combination of performance, quality and memory requirements. None of them fully solved OIT though and I ended the series wondering what raytraced transparency would look like. Recently I added (some) DXR support to the toy engine and I was curious to see how it would work, so I did a quick implementation.
The implementation was really simple. Since there is no mechanism to sort the nodes of a BLAS/TLAS based on distance from the camera, the ray generation shader keeps tracing rays using the result of the closest hit shader as the origin for the next ray until there is nothing else to hit.
Porting FSR 2 to OpenGL
FSR 2, or FidelityFX Super Resolution 2, is a temporal upscaling (TAAU) algorithm developed by AMD. It is comparable to Nvidia’s DLSS, except it is completely open-source and doesn’t require vendor-specific GPU features (tensor cores) to run.
I’ve been floating the idea of making an OpenGL backend for FSR 2 for a while now. However, only recently have I acquired the motivation to actually do it. I knew that writing a bespoke TAA(U) implementation, let alone a good one, was a task worthy of the gods, so I wanted to defer it to them.
Paving the Road to Vulkan on Asahi Linux
In every modern OS, GPU drivers are split into two parts: a userspace part, and a kernel part. The kernel part is in charge of managing GPU resources and how they are shared between apps, and the userspace part is in charge of converting commands from a graphics API (such as OpenGL or Vulkan) into the hardware commands that the GPU needs to execute.
Between those two parts, there is something called the Userspace API or “UAPI”. This is the interface that they use to communicate between them, and it is specific to each class of GPUs! Since the exact split between userspace and the kernel can vary depending on how each GPU is designed, and since different GPU designs require different bits of data and parameters to be passed between userspace and the kernel, each new GPU driver requires its own UAPI to go along with it.
The Apple GPU and the Impossible Bug
In late 2020, Apple debuted the M1 with Apple’s GPU architecture, AGX, rumoured to be derived from Imagination’s PowerVR series. Since then, we’ve been reverse-engineering AGX and building open source graphics drivers. Last January, I rendered a triangle with my own code, but there has since been a heinous bug lurking: The driver fails to render large amounts of geometry.
Fixing the entire SM64 Source Code (Insane N64 performance)
Recap of a lot of work, optimizing and rewriting code to squeeze out performance on limited hardware.
Exponentially Better Rotations
If you’ve done any 3D programming, you’ve likely encountered the zoo of techniques and representations used when working with 3D rotations. Some of them are better than others, depending on the situation.
Why are video games graphics (still) a challenge? Productionizing rendering algorithms
This post will cover challenges and aspects of production to consider when creating new rendering / graphics techniques and algorithms – especially in the context of applied research for real time rendering. I will base this on my personal experiences, working on Witcher 2, Assassin’s Creed 4: Black Flag, Far Cry 4, and God of War.
Many of those challenges are easily ignored – they are real problems in production, but not necessarily there only if you only read about those techniques, or if you work on pure research, writing papers, or create tech demos.
I have seen statements like “why is this brilliant research technique X not used in production?” both from gamers, but also from my colleagues with academic background. And there are always some good reasons!
This is quite extensive.
GPU architecture resources
I am often get asked in DMs about how GPUs work. There is a lot of information on GPU architectures online, one can start with these:
Real-Time Ray-Tracing in WebGPU
Note that RTX is not available officially for WebGPU (yet?) and is only available for the Node bindings for WebGPU. Recently I began adapting an unofficial Ray-Tracing extension for Dawn, which is the WebGPU implementation for Chromium. The Ray-Tracing extension is only implemented into the Vulkan backend so far, but a D3D12 implementation is on the Roadmap. You can find my Dawn Fork with Ray-Tracing capabilities here.
Now let me introduce you to the ideas and concepts of the Ray-Tracing extension.
Writing a Texture Painter: Part #1
Many programmers appreciate being able to see their code render something interesting to the screen. For a while I’ve wanted to write a texture painter, where I can import a model, paint colors on it, and then export those textures back to a file. I’m using OpenGL in my code, but I’ll focus on the actual mechanics and less on the language or code.
Signed distance fields
It would be fun, I thought, to be able to specify the desired cross-sections, and have something generate the required 3D shape (if it existed) in real-time.
Dealing with all of the details of creating a mesh with the right vertices etc. sounded painful though. Fortunately, I had been reading recently about a different kind of 3D rendering technique which makes these kind of boolean operations trivial – signed distance fields.
Vulkan Progress Report #5
Another month, another Vulkan progress report! October was a busy month, as most of it was split between working on the new Global Illumination system and Godotcon/GIC in Poland. Despite this, strong progress was made and the new GI system seems pretty much complete.
Godot 3.0 introduced GIProbes. They provide Global Illumination to scenes. They were, however, pretty limited. Only static geometry could provide GI and dynamic objects were ignored. Added to this, changes in light settings had significant frames of delay. Added to a not so great performance and quality, the feature was barely usable as is.
For Godot 4.0, GIProbes will see several significant changes, which will be outlined as follows:
Half The Precision, Twice The Fun: Working With FP16 In HLSL
It turns out that fp16 is still useful for the reasons it was originally useful back in the days of D3D9: it’s a good way to improve throughput on a limited transitor/power budget, and the smaller storage size means that you can store more values in general purpose registers without having your thread occupancy suffer due to register pressure. As of Nvidia’s new Turing architecture (AKA the RTX 2000 series), AMD’s Vega (AKA gfx900, AKA GCN 5) series1 and Intel’s Gen8 architecture (used in Broadwell) fp16 is now back in the desktop world. Which means that us desktop graphics programmers now have to deal with it again. And of course if you’re a mobile developer, it never really left in the first place. But how do you actually use fp16 in your shader code? That’s exactly what this blog will explain!
Dramatically reduced power usage in Firefox 70 on macOS with Core Animation
In Firefox 70 we changed how pixels get to the screen on macOS. This allows us to do less work per frame when only small parts of the screen change. As a result, Firefox 70 drastically reduces the power usage during browsing.
Every Firefox window contains one OpenGL context, which covers the entire window. Firefox 69 was using the API described above. So we were always redrawing the whole window on every change, and the window manager was always copying our entire window to the screen on every change. This turned out to be a problem despite the fact that these draws were fully hardware accelerated.
Core Animation is the name of an Apple framework which lets you create a tree of layers (CALayer). These layers usually contain textures with some pixel content. The layer tree defines the positions, sizes, and order of the layers within the window. Starting with macOS 10.14, all windows use Core Animation by default, as a way to share their rendering with the window manager.