GPU architecture resources
I am often get asked in DMs about how GPUs work. There is a lot of information on GPU architectures online, one can start with these:
Real-Time Ray-Tracing in WebGPU
Note that RTX is not available officially for WebGPU (yet?) and is only available for the Node bindings for WebGPU. Recently I began adapting an unofficial Ray-Tracing extension for Dawn, which is the WebGPU implementation for Chromium. The Ray-Tracing extension is only implemented into the Vulkan backend so far, but a D3D12 implementation is on the Roadmap. You can find my Dawn Fork with Ray-Tracing capabilities here.
Now let me introduce you to the ideas and concepts of the Ray-Tracing extension.
Writing a Texture Painter: Part #1
Many programmers appreciate being able to see their code render something interesting to the screen. For a while I’ve wanted to write a texture painter, where I can import a model, paint colors on it, and then export those textures back to a file. I’m using OpenGL in my code, but I’ll focus on the actual mechanics and less on the language or code.
Signed distance fields
It would be fun, I thought, to be able to specify the desired cross-sections, and have something generate the required 3D shape (if it existed) in real-time.
Dealing with all of the details of creating a mesh with the right vertices etc. sounded painful though. Fortunately, I had been reading recently about a different kind of 3D rendering technique which makes these kind of boolean operations trivial – signed distance fields.
Vulkan Progress Report #5
Another month, another Vulkan progress report! October was a busy month, as most of it was split between working on the new Global Illumination system and Godotcon/GIC in Poland. Despite this, strong progress was made and the new GI system seems pretty much complete.
Godot 3.0 introduced GIProbes. They provide Global Illumination to scenes. They were, however, pretty limited. Only static geometry could provide GI and dynamic objects were ignored. Added to this, changes in light settings had significant frames of delay. Added to a not so great performance and quality, the feature was barely usable as is.
For Godot 4.0, GIProbes will see several significant changes, which will be outlined as follows:
Half The Precision, Twice The Fun: Working With FP16 In HLSL
It turns out that fp16 is still useful for the reasons it was originally useful back in the days of D3D9: it’s a good way to improve throughput on a limited transitor/power budget, and the smaller storage size means that you can store more values in general purpose registers without having your thread occupancy suffer due to register pressure. As of Nvidia’s new Turing architecture (AKA the RTX 2000 series), AMD’s Vega (AKA gfx900, AKA GCN 5) series1 and Intel’s Gen8 architecture (used in Broadwell) fp16 is now back in the desktop world. Which means that us desktop graphics programmers now have to deal with it again. And of course if you’re a mobile developer, it never really left in the first place. But how do you actually use fp16 in your shader code? That’s exactly what this blog will explain!
Dramatically reduced power usage in Firefox 70 on macOS with Core Animation
In Firefox 70 we changed how pixels get to the screen on macOS. This allows us to do less work per frame when only small parts of the screen change. As a result, Firefox 70 drastically reduces the power usage during browsing.
Every Firefox window contains one OpenGL context, which covers the entire window. Firefox 69 was using the API described above. So we were always redrawing the whole window on every change, and the window manager was always copying our entire window to the screen on every change. This turned out to be a problem despite the fact that these draws were fully hardware accelerated.
Core Animation is the name of an Apple framework which lets you create a tree of layers (CALayer). These layers usually contain textures with some pixel content. The layer tree defines the positions, sizes, and order of the layers within the window. Starting with macOS 10.14, all windows use Core Animation by default, as a way to share their rendering with the window manager.
Hybrid screen-space reflections
As realtime raytracing is slowly, but steadily, gaining traction, a range of opportunities to mix rasteration-based rendering systems with raytracing are starting to become available: hybrid raytracing where rasterisation is used to provide the hit points for the primary rays, hybrid shadows where shadowmaps are combined with raytracing to achieve smooth or higher detail shadows, hybrid antialiasing where raytracing is used to antialias the edges only, hybrid reflections, where raytracing is used to fill-in the areas that screenspace reflections can’t resolve due to lack of information.
Of these, I found the last one particularly interesting: how well can a limited information lighting technique like SSR be combined with a full-scene aware one like raytracing, so I set about exploring this further.
Anime4K - A High-Quality Real Time Anime Upscaler
We present a state-of-the-art high-quality real-time SISR algorithm designed to work with japanese animation and cartoons that is extremely fast (~3ms with Vega 64 GPU), temporally coherent, simple to implement (~100 lines of code), yet very effective. We find it surprising that this method is not currently used ‘en masse’, since the intuition leading us to this algorithm is very straightforward. Remarkably, the proposed method does not use any machine-learning or statistical approach, and is tailored to content that puts importance to well defined lines/edges while tolerates a sacrifice of the finer textures.
Banding in Games: A Noisy Rant
If you use sRGB correctly, you’re doing pretty well - you will generally hardly notice banding (though dark areas remain)
If you are not on a platform where it’s readily available, or you want to get rid of the last issues, the rest of this presentation is for you
Dithering. Lots of dithering.
2D Graphics on Modern GPU
I have found that, if you can depend on modern compute capabilities, it seems quite viable to implement 2D rendering directly on GPU, with very promising quality and performance. The prototype I built strongly resembles a software renderer, just running on an outsized multicore GPU with wide SIMD vectors, much more so than rasterization-based pipelines.
Anti-Ghosting with Temporal Anti-Aliasing
We decided on TAA for The Grand Tour Game because it tends to produce a softer, more photorealistic image in both static and moving scenes. FXAA (Fast Approximate Anti-Aliasing) and SMAA (Subpixel Morphological Anti-Aliasing) work well for static scenes, but still produce artifacts for moving scenes. Lumberyard’s deferred lighting pipeline does not support MSAA (Multisample Anti-Aliasing). Like MSAA, TAA uses multiple samples per pixel to provide anti-aliasing. The difference is that with temporal anti-aliasing, the samples are spread across multiple frames. It uses a frame history buffer and a per-pixel velocity buffer to reproject each pixel to gather the additional sample. For each pixel, we use the per-pixel velocity as an offset, as well as the previous frame’s view projection matrix, to determine where to query the frame history buffer. Modifying the camera’s projection matrix with a sub-pixel jitter each frame allows us to produce anti-aliased results even in scenes where there is no camera motion. With fast rotation or linear motion, the history pixel (the sample retrieved from the frame history buffer after pixel reprojection) may correspond to a location with vastly different lighting conditions or to an entirely separate object. This history mismatch, if unaddressed, causes severe ghosting, as shown below.
Fyne - Cross platform GUI in Go based on Material Design
Fyne is an easy to use UI toolkit and app API written in Go. We use OpenGL (through the go-gl and go-glfw projects) to provide cross platform graphics.
The 1.0 release is now out and we encourage feedback and requests for the next major release :).
Log-spherical Mapping in SDF Raymarching
In this post, I’ll describe a set of techniques for manipulating signed distance fields (SDFs) allowing the creation of self-similar geometries like the flower above. Although these types of geometries have been known and renderable for a long time, I believe the techniques described here offer much unexplored creative potential, since they allow the creation of raymarchable SDFs of self-similar geometries with an infinite level of visual recursion, which can be explored in realtime on the average current-gen graphics card. This is done by crafting distance functions based on the log-spherical mapping, which I’ll explain starting with basic tilings and building up to the recursive shell transformation.
This page hosts the hg_sdf library for building signed distance functions (or more precise: signed distance bounds). Those are a very elegant and flexible representation of geometry that can be rendered or otherwise processed. Roughly, coded SDFs are to triangle meshes or voxels what vector graphics are to pixels.
Volume Rendering with WebGL
In scientific visualization, volume rendering is widely used to visualize 3D scalar fields. These scalar fields are often uniform grids of values, representing, for example, charge density around a molecule, an MRI or CT scan, air flow around an airplane, etc. Volume rendering is a conceptually straightforward method for turning such data into an image: by sampling the data along rays from the eye and assigning a color and transparency to each sample, we can produce useful and beautiful images of such scalar fields (see Figure 1). In a GPU renderer, these 3D scalar fields are stored as 3D textures; however, in WebGL1 3D textures were not supported, requiring additional hacks to emulate them for volume rendering. Recently, WebGL2 added support for 3D textures, allowing for an elegant and fast volume renderer to be implemented entirely in the browser. In this post we’ll discuss the mathematical background for volume rendering, and how it can be implemented in WebGL2 to create an interactive volume renderer entirely in the browser!
Q2VKPT is the first playable game that is entirely raytraced and efficiently simulates fully dynamic lighting in real-time, with the same modern techniques as used in the movie industry (see Disney’s practical guide to path tracing). The recent release of GPUs with raytracing capabilities has opened up entirely new possibilities for the future of game graphics, yet making good use of raytracing is non-trivial. While some games have started to explore improvements in shadow and reflection rendering, Q2VKPT is the first project to implement an efficient unified solution for all types of light transport: direct, scattered, and reflected light (see media). This kind of unification has led to a dramatic increase in both flexibility and productivity in the movie industry. The chance to have the same development in games promises a similar increase in visual fidelity and realism for game graphics in the coming years.
This project is meant to serve as a proof-of-concept for computer graphics research and the game industry alike, and to give enthusiasts a glimpse into the potential future of game graphics. Besides the use of hardware-accelerated raytracing, Q2VKPT mainly gains its efficiency from an adaptive image filtering technique that intelligently tracks changes in the scene illumination to re-use as much information as possible from previous computations.
The Rendering of Rise of the Tomb Raider
Tomb Raider used the Crystal Engine, developed by Crystal Dynamics also used in Deus Ex: Human Revolution. For the sequel a new engine called Foundation was used, previously developed for Lara Croft and the Temple of Osiris (2014). Its rendering can be broadly classified as a tiled light-prepass engine, and we’ll see what that means as we dive in. The engine offers the choice between a DX11 and DX12 renderer; I chose the latter for reasons we’ll see later. I used Renderdoc 1.2 to capture the frame, on a Geforce 980 Ti, and turned on all the bells and whistles.
How to Start Learning Graphics Programming
It is a bit nebulous, without doubt, but graphics programming can be approached from different angles, at different stages and degrees of difficulty to suit one’s experience and knowledge.
The key to learning graphics programming, and any programming for that matter, is in my opinion instant gratification and feedback. You must be able to immediately see the output of your code, with a rapid edit/preview iteration cycle.
Rendered Insecure: GPU Side Channel Attacks are Practical
Under a number of scenarios the GPU can be shared between multiple applications at a fine granularity allowing a spy application to monitor side channels and attempt to infer the behavior of the victim. For example, OpenGL and WebGL send workloads to the GPU at the granularity of a frame, allowing an attacker to interleave the use of the GPU to measure the side-effects of the victim computation through performance counters or other resource tracking APIs. We demonstrate the vulnerability using two applications. First, we show that an OpenGL based spy can fingerprint websites accurately, track user activities within the website, and even infer the keystroke timings for a password text box with high accuracy. The second application demonstrates how a CUDA spy application can derive the internal parameters of a neural network model being used by another CUDA application, illustrating these threats on the cloud.