How I implemented MegaTextures on real Nintendo 64 hardware
This showcases a demo of megatextures running on N64 hardware. A “megatexture” for the N64 is really just a normal-sized texture by modern standards, but with it you can do some prebaked scenes that look like they don’t belong on the N64.
Raytraced Order Independent Transparency
About a year ago I reviewed a number of Order Independent Transparency (OIT) techniques (part 1, part 2, part 3), each achieving a different combination of performance, quality and memory requirements. None of them fully solved OIT though and I ended the series wondering what raytraced transparency would look like. Recently I added (some) DXR support to the toy engine and I was curious to see how it would work, so I did a quick implementation.
The implementation was really simple. Since there is no mechanism to sort the nodes of a BLAS/TLAS based on distance from the camera, the ray generation shader keeps tracing rays using the result of the closest hit shader as the origin for the next ray until there is nothing else to hit.
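The loop described above can be sketched on the CPU. This is only an illustration, not the author's shader: the scene is reduced to flat surfaces along a single ray, closest_hit stands in for the closest hit shader, and the front-to-back blend is an assumption, since the excerpt does not say how the hits are combined. The names Surface, closest_hit and trace_transparency are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

class Surface(NamedTuple):
    depth: float                       # distance along the ray
    color: Tuple[float, float, float]  # RGB in [0, 1]
    alpha: float                       # opacity in [0, 1]

def closest_hit(surfaces: Sequence[Surface], origin: float,
                eps: float = 1e-6) -> Optional[Surface]:
    """Stand-in for the closest hit shader: nearest surface beyond origin."""
    candidates = [s for s in surfaces if s.depth > origin + eps]
    return min(candidates, key=lambda s: s.depth) if candidates else None

def trace_transparency(surfaces: Sequence[Surface],
                       background=(0.0, 0.0, 0.0)):
    """Keep re-tracing from the last hit until nothing is left to hit."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    origin = 0.0
    while True:
        hit = closest_hit(surfaces, origin)
        if hit is None:
            break
        # Assumed front-to-back compositing of the new hit.
        for i in range(3):
            color[i] += transmittance * hit.alpha * hit.color[i]
        transmittance *= 1.0 - hit.alpha
        origin = hit.depth  # the hit becomes the origin of the next ray
    for i in range(3):
        color[i] += transmittance * background[i]
    return tuple(color)
```

Because every iteration asks for the nearest remaining hit, the order in which the surfaces are stored does not affect the result, which is the property OIT is after. Two surfaces at exactly the same depth would be treated as one in this simplified version.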
Porting FSR 2 to OpenGL
FSR 2, or FidelityFX Super Resolution 2, is a temporal upscaling (TAAU) algorithm developed by AMD. It is comparable to Nvidia’s DLSS, except it is completely open-source and doesn’t require vendor-specific GPU features (tensor cores) to run.
I’ve been floating the idea of making an OpenGL backend for FSR 2 for a while now. However, only recently have I acquired the motivation to actually do it. I knew that writing a bespoke TAA(U) implementation, let alone a good one, was a task worthy of the gods, so I wanted to defer it to them.
Paving the Road to Vulkan on Asahi Linux
In every modern OS, GPU drivers are split into two parts: a userspace part, and a kernel part. The kernel part is in charge of managing GPU resources and how they are shared between apps, and the userspace part is in charge of converting commands from a graphics API (such as OpenGL or Vulkan) into the hardware commands that the GPU needs to execute.
Between those two parts, there is something called the Userspace API or “UAPI”. This is the interface that they use to communicate between them, and it is specific to each class of GPUs! Since the exact split between userspace and the kernel can vary depending on how each GPU is designed, and since different GPU designs require different bits of data and parameters to be passed between userspace and the kernel, each new GPU driver requires its own UAPI to go along with it.
The Apple GPU and the Impossible Bug
In late 2020, Apple debuted the M1 with Apple’s GPU architecture, AGX, rumoured to be derived from Imagination’s PowerVR series. Since then, we’ve been reverse-engineering AGX and building open source graphics drivers. Last January, I rendered a triangle with my own code, but there has since been a heinous bug lurking: The driver fails to render large amounts of geometry.
Fixing the entire SM64 Source Code (Insane N64 performance)
Recap of a lot of work, optimizing and rewriting code to squeeze out performance on limited hardware.
Exponentially Better Rotations
If you’ve done any 3D programming, you’ve likely encountered the zoo of techniques and representations used when working with 3D rotations. Some of them are better than others, depending on the situation.
Why are video games graphics (still) a challenge? Productionizing rendering algorithms
This post will cover challenges and aspects of production to consider when creating new rendering / graphics techniques and algorithms – especially in the context of applied research for real time rendering. I will base this on my personal experiences, working on Witcher 2, Assassin’s Creed 4: Black Flag, Far Cry 4, and God of War.
Many of those challenges are easily ignored – they are real problems in production, but not necessarily apparent if you only read about those techniques, work on pure research, write papers, or create tech demos.
I have seen statements like “why is this brilliant research technique X not used in production?” both from gamers and from my colleagues with an academic background. And there are always some good reasons!
This is quite extensive.
GPU architecture resources
I often get asked in DMs about how GPUs work. There is a lot of information on GPU architectures online; one can start with these:
Real-Time Ray-Tracing in WebGPU
Note that RTX is not available officially for WebGPU (yet?) and is only available for the Node bindings for WebGPU. Recently I began adapting an unofficial Ray-Tracing extension for Dawn, which is the WebGPU implementation for Chromium. The Ray-Tracing extension is only implemented in the Vulkan backend so far, but a D3D12 implementation is on the roadmap. You can find my Dawn Fork with Ray-Tracing capabilities here.
Now let me introduce you to the ideas and concepts of the Ray-Tracing extension.
Writing a Texture Painter: Part #1
Many programmers appreciate being able to see their code render something interesting to the screen. For a while I’ve wanted to write a texture painter, where I can import a model, paint colors on it, and then export those textures back to a file. I’m using OpenGL in my code, but I’ll focus on the actual mechanics and less on the language or code.
Signed distance fields
It would be fun, I thought, to be able to specify the desired cross-sections, and have something generate the required 3D shape (if it existed) in real-time.
Dealing with all of the details of creating a mesh with the right vertices etc. sounded painful though. Fortunately, I had been reading recently about a different kind of 3D rendering technique which makes these kinds of boolean operations trivial – signed distance fields.
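To see why boolean operations become trivial, here is a minimal sketch using the usual min/max formulation. The helper names (sphere, union, intersection, difference) are made up for this example and are not taken from the article.

```python
import math

def sphere(center, radius):
    """Signed distance to a sphere: negative inside, positive outside."""
    def sdf(p):
        return math.dist(p, center) - radius
    return sdf

def union(a, b):
    # A point is inside the union if it is inside either shape.
    return lambda p: min(a(p), b(p))

def intersection(a, b):
    # A point is inside the intersection only if it is inside both shapes.
    return lambda p: max(a(p), b(p))

def difference(a, b):
    # Keep a, carve out b: inside a and outside b.
    return lambda p: max(a(p), -b(p))

left = sphere((0.0, 0.0, 0.0), 1.0)
right = sphere((1.0, 0.0, 0.0), 1.0)
lens = intersection(left, right)   # overlap of the two spheres
bitten = difference(left, right)   # left sphere with a bite taken out
```

Each combinator is a single min or max, with no mesh or vertices involved. Note that the combined value is in general only a bound on the true distance rather than the exact distance, although its sign still classifies inside and outside correctly.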
Vulkan Progress Report #5
Another month, another Vulkan progress report! October was a busy month, as most of it was split between working on the new Global Illumination system and Godotcon/GIC in Poland. Despite this, strong progress was made and the new GI system seems pretty much complete.
Godot 3.0 introduced GIProbes. They provide Global Illumination to scenes. They were, however, pretty limited. Only static geometry could provide GI and dynamic objects were ignored. On top of this, changes in light settings took a significant number of frames to show up. Combined with not-so-great performance and quality, the feature was barely usable as is.
For Godot 4.0, GIProbes will see several significant changes, which will be outlined as follows:
Half The Precision, Twice The Fun: Working With FP16 In HLSL
It turns out that fp16 is still useful for the reasons it was originally useful back in the days of D3D9: it’s a good way to improve throughput on a limited transistor/power budget, and the smaller storage size means that you can store more values in general purpose registers without having your thread occupancy suffer due to register pressure. As of Nvidia’s new Turing architecture (AKA the RTX 2000 series), AMD’s Vega (AKA gfx900, AKA GCN 5) series and Intel’s Gen8 architecture (used in Broadwell), fp16 is now back in the desktop world. Which means that us desktop graphics programmers now have to deal with it again. And of course if you’re a mobile developer, it never really left in the first place. But how do you actually use fp16 in your shader code? That’s exactly what this blog will explain!
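The storage-versus-precision trade-off can be tried out without a GPU. The sketch below is not HLSL and does not come from the article: it uses Python's struct module, whose "e" format is the IEEE 754 half-precision type, to round-trip values through 16 bits. The helper name to_half is made up for this example.

```python
import struct

HALF_BYTES = struct.calcsize("<e")   # size of one fp16 value
FLOAT_BYTES = struct.calcsize("<f")  # size of one fp32 value

def to_half(value: float) -> float:
    """Round a Python float to the nearest fp16 value and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

# fp16 keeps 11 significant bits, so small fractions and integers
# above 2048 are no longer stored exactly. The largest finite fp16
# value is 65504.
tenth_error = abs(to_half(0.1) - 0.1)
rounded_2049 = to_half(2049.0)
```

Half the bytes per value is where the register and bandwidth savings come from; the rounding of 0.1 and 2049 is the price paid for it.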
Dramatically reduced power usage in Firefox 70 on macOS with Core Animation
In Firefox 70 we changed how pixels get to the screen on macOS. This allows us to do less work per frame when only small parts of the screen change. As a result, Firefox 70 drastically reduces the power usage during browsing.
Every Firefox window contains one OpenGL context, which covers the entire window. Firefox 69 was using the API described above. So we were always redrawing the whole window on every change, and the window manager was always copying our entire window to the screen on every change. This turned out to be a problem despite the fact that these draws were fully hardware accelerated.
Core Animation is the name of an Apple framework which lets you create a tree of layers (CALayer). These layers usually contain textures with some pixel content. The layer tree defines the positions, sizes, and order of the layers within the window. Starting with macOS 10.14, all windows use Core Animation by default, as a way to share their rendering with the window manager.
Hybrid screen-space reflections
As realtime raytracing is slowly, but steadily, gaining traction, a range of opportunities to mix rasterisation-based rendering systems with raytracing are starting to become available: hybrid raytracing where rasterisation is used to provide the hit points for the primary rays, hybrid shadows where shadowmaps are combined with raytracing to achieve smooth or higher detail shadows, hybrid antialiasing where raytracing is used to antialias the edges only, hybrid reflections, where raytracing is used to fill-in the areas that screenspace reflections can’t resolve due to lack of information.
Of these, I found the last one particularly interesting: how well can a limited information lighting technique like SSR be combined with a full-scene aware one like raytracing? I set about exploring this further.
Anime4K - A High-Quality Real Time Anime Upscaler
We present a state-of-the-art high-quality real-time SISR algorithm designed to work with Japanese animation and cartoons that is extremely fast (~3ms with Vega 64 GPU), temporally coherent, simple to implement (~100 lines of code), yet very effective. We find it surprising that this method is not currently used ‘en masse’, since the intuition leading us to this algorithm is very straightforward. Remarkably, the proposed method does not use any machine-learning or statistical approach, and is tailored to content that places importance on well defined lines/edges while tolerating a sacrifice of the finer textures.
Banding in Games: A Noisy Rant
If you use sRGB correctly, you’re doing pretty well - you will generally hardly notice banding (though dark areas remain)
If you are not on a platform where it’s readily available, or you want to get rid of the last issues, the rest of this presentation is for you
Dithering. Lots of dithering.
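As a rough illustration of why dithering helps, the sketch below quantizes a value to 8 bits with and without noise. It assumes plain uniform noise of one code value in width, which is only one of many possible dither patterns and not necessarily what the presentation recommends; all names are made up for this example.

```python
import math
import random

LEVELS = 255  # highest code value of an 8-bit channel

def quantize(value: float) -> float:
    """Round to the nearest 8-bit level; smooth gradients turn into bands."""
    return math.floor(value * LEVELS + 0.5) / LEVELS

def quantize_dithered(value: float, rng: random.Random) -> float:
    """Add noise in [-0.5, 0.5) code values before rounding."""
    noise = rng.random() - 0.5
    code = math.floor(value * LEVELS + noise + 0.5)
    return min(max(code, 0), LEVELS) / LEVELS

def mean_dithered(value: float, samples: int, seed: int = 0) -> float:
    """Average many dithered results, as the eye does over nearby pixels."""
    rng = random.Random(seed)
    total = sum(quantize_dithered(value, rng) for _ in range(samples))
    return total / samples
```

Without noise, every value between two levels snaps to the same code, which is what shows up as a band. With noise, neighbouring pixels land on either side in proportion to the true value, so the local average tracks the original gradient.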
2D Graphics on Modern GPU
I have found that, if you can depend on modern compute capabilities, it seems quite viable to implement 2D rendering directly on GPU, with very promising quality and performance. The prototype I built strongly resembles a software renderer, just running on an outsized multicore GPU with wide SIMD vectors, much more so than rasterization-based pipelines.
Anti-Ghosting with Temporal Anti-Aliasing
We decided on TAA for The Grand Tour Game because it tends to produce a softer, more photorealistic image in both static and moving scenes. FXAA (Fast Approximate Anti-Aliasing) and SMAA (Subpixel Morphological Anti-Aliasing) work well for static scenes, but still produce artifacts for moving scenes. Lumberyard’s deferred lighting pipeline does not support MSAA (Multisample Anti-Aliasing). Like MSAA, TAA uses multiple samples per pixel to provide anti-aliasing. The difference is that with temporal anti-aliasing, the samples are spread across multiple frames. It uses a frame history buffer and a per-pixel velocity buffer to reproject each pixel to gather the additional sample. For each pixel, we use the per-pixel velocity as an offset, as well as the previous frame’s view projection matrix, to determine where to query the frame history buffer. Modifying the camera’s projection matrix with a sub-pixel jitter each frame allows us to produce anti-aliased results even in scenes where there is no camera motion. With fast rotation or linear motion, the history pixel (the sample retrieved from the frame history buffer after pixel reprojection) may correspond to a location with vastly different lighting conditions or to an entirely separate object. This history mismatch, if unaddressed, causes severe ghosting, as shown below.
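The reprojection step can be sketched in one dimension. This is a simplified illustration and not the Lumberyard code: velocities are whole pixels, the view projection matrix is left out, and the neighbourhood clamp used against ghosting is an assumption, a common approach that the excerpt itself does not name. The function name resolve_taa and the blend parameter are hypothetical.

```python
def resolve_taa(current, history, velocity, blend=0.1):
    """Blend the current frame with reprojected history, one scanline.

    current:  this frame's pixel values
    history:  accumulated values from the previous frame
    velocity: per-pixel motion in whole pixels since the previous frame
    blend:    weight of the current sample in the result
    """
    width = len(current)
    resolved = []
    for x in range(width):
        # Use the velocity as an offset to find where this pixel was.
        prev_x = x - velocity[x]
        if 0 <= prev_x < width:
            hist = history[prev_x]
        else:
            hist = current[x]  # no history off screen
        # Assumed anti-ghosting step: clamp the history sample to the
        # range of the current pixel's immediate neighbourhood.
        neighbours = current[max(x - 1, 0):x + 2]
        hist = min(max(hist, min(neighbours)), max(neighbours))
        resolved.append(hist + blend * (current[x] - hist))
    return resolved
```

In a static scene the history already matches the current frame and passes through unchanged. When the history holds a value that no nearby current pixel supports, such as a bright object that has moved away, the clamp pulls it back into range instead of letting it linger as a ghost.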