The Apple GPU and the Impossible Bug
In late 2020, Apple debuted the M1 with Apple’s GPU architecture, AGX, rumoured to be derived from Imagination’s PowerVR series. Since then, we’ve been reverse-engineering AGX and building open source graphics drivers. Last January, I rendered a triangle with my own code, but there has since been a heinous bug lurking: The driver fails to render large amounts of geometry.
The Unreasonable Effectiveness of JPEG: A Signal Processing Approach
The JPEG algorithm is rather complex, and in this video we break down its core parts: color spaces (YCbCr), chroma subsampling, the discrete cosine transform, quantization, and lossless encoding. The focus is mostly on the mathematical and signal processing insights that led to advancements in image compression, and on the big themes in compression as a whole that we can take away from it.
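To make the lossy core concrete, here is a minimal sketch (not a full encoder) of the DCT-plus-quantization step on a single 8×8 block, using the example luminance table from the JPEG spec; real encoders scale that table by a quality factor and follow this step with zig-zag ordering and entropy coding.

```python
# A minimal sketch of JPEG's lossy core on one 8x8 block.
# Q_LUMA is the example luminance table from the JPEG spec (Annex K).
import numpy as np
from scipy.fft import dctn, idctn

Q_LUMA = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize_block(block):
    """block: 8x8 array of luma values in [0, 255]."""
    coeffs = dctn(block.astype(float) - 128, norm="ortho")  # center, then 2D DCT-II
    return np.round(coeffs / Q_LUMA).astype(int)            # most coefficients become 0

def reconstruct_block(q):
    """Undo quantization and the DCT; this is where the loss shows up."""
    return idctn(q * Q_LUMA, norm="ortho") + 128
```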
Fixing the entire SM64 Source Code (Insane N64 performance)
A recap of a lot of work optimizing and rewriting code to squeeze performance out of limited hardware.
Exponentially Better Rotations
If you’ve done any 3D programming, you’ve likely encountered the zoo of techniques and representations used when working with 3D rotations. Some of them are better than others, depending on the situation.
It's always been you, Canvas2D
Admittedly, the API is a bit behind the times when it comes to state-of-the-art 2D drawing. Fortunately, we’ve been hard at work implementing new features in Canvas2D to catch up to CSS, streamline ergonomics, and improve performance.
How Does Perspective Work in Pictures?
Theories of perception and photography tend to be all-or-nothing: either linear perspective and cameras are correct and cameras don’t lie, or there is no objective reality and everything is made up. The reality is clearly far more complex. Our artwork employs all sorts of complex nonlinear structures, and our brains are able to understand and interpret them. Even more confusing, there is some evidence that perspective perception may, in some cases, vary across people with very different cultural backgrounds. Understanding how and why perspective works is a hard problem (and one that I’m working on), as is developing new software tools to make images that easily convey what we want to convey.
The Fastest GIF Does Not Exist
It seems the reason for pushing values like 10ms back up to 100ms originates from a requirement to emulate the slowness of Netscape. The source code comments in both Qt and Firefox cite reducing CPU usage, but since a value of 20ms is supported, I think they should just clamp to 20ms instead. Or don’t clamp the value at all! Modern browsers already render 20ms GIF frames just fine, and I’m not sure the “computers are too slow” argument holds up 30 years later.
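In sketch form (this is not actual browser source, just the behavior the post describes), the two options look like this:

```python
# Thresholds follow the post's framing: tiny delays snap to 100ms today,
# while a simple clamp to the supported 20ms minimum would suffice.
def legacy_delay_ms(delay_ms):
    # Behavior being criticized: values like 10ms get pushed up to 100ms.
    return 100 if delay_ms < 20 else delay_ms

def suggested_delay_ms(delay_ms):
    # The author's suggestion: just clamp to the supported minimum.
    return max(delay_ms, 20)
```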
Open To Conversion
Around this time 30 years ago, two separate working groups were putting the finishing touches on technical standards that would come to reshape the way people observed the world. One technical standard reshaped the way that people used an important piece of office equipment at the time: the fax machine. The other would basically reshape just about everything else, becoming the de facto way that high-quality images and low-quality memes alike are shared on the internet and in professional settings. They took two divergent paths, but they came from the same place: The world of compression standards. The average person has no idea what JBIG, the compression standard most fax machines use, is—but they’ve most assuredly heard about JPEG, which was first publicly released in 1992. The JPEG format is awesome and culture-defining, but this is Tedium, and I am of course more interested in the no-name formats of the world. Today’s Tedium discusses 10 image formats that time forgot. Hope you have the right conversion tool.
Moiré no more
I showed the original typewriter car scan, added my blurred-then-sharpened photo as a pathetic comparison, and asked: what is the latest in demoireing? Is there some new tech that could help me?
But this pales in comparison to the typewriter car photo I wanted to reuse, the one with all the dots, where we can see the FFT immediately betraying their repeated presence:
This sounded like a prank. You’re telling me that a problem I’ve witnessed for decades could be solved with a 1960s algorithm, and I don’t even have to be particularly careful? But I tried it out. I started crudely drawing over the peaks, one by one. Things were weird at the beginning, but then I saw something astonishing – the halftone dots started shrinking:
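For the curious, the whole trick is a classic notch filter: the halftone dots show up as sharp peaks in the 2D FFT, and zeroing small neighborhoods around them is the code equivalent of crudely drawing over them. A rough sketch, with the peak locations picked by hand:

```python
import numpy as np

def notch_filter(gray, peaks, radius=4):
    """gray: 2D image array; peaks: hand-picked (row, col) locations
    of the offending spikes in the centered spectrum."""
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    rows, cols = np.ogrid[:gray.shape[0], :gray.shape[1]]
    for pr, pc in peaks:
        # Zero out a small disc around each peak, as if painting over it.
        spectrum[(rows - pr) ** 2 + (cols - pc) ** 2 <= radius ** 2] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))
```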
High Fidelity Image Generation Using Diffusion Models
Diffusion models, originally proposed in 2015, have seen a recent revival of interest due to their training stability and their promising sample quality on image and audio generation, so they offer potentially favorable trade-offs compared to other types of deep generative models. Diffusion models work by corrupting the training data, progressively adding Gaussian noise and slowly wiping out details until the data becomes pure noise, and then training a neural network to reverse this corruption process. Running the reversed corruption process synthesizes data from pure noise by gradually denoising it until a clean sample is produced. This synthesis procedure can be interpreted as an optimization algorithm that follows the gradient of the data density to produce likely samples.
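The forward (corruption) half of that description has a convenient closed form. A toy sketch with the common linear noise schedule; the neural network that learns the reverse step is omitted:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # linear schedule, a common default
alpha_bar = np.cumprod(1.0 - betas)  # cumulative fraction of signal kept

def corrupt(x0, t, rng=np.random.default_rng(0)):
    """Sample the noisy x_t directly from clean data x0 in one step."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
```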
Two new color spaces for color picking
Picking colors is a common operation in many applications and over the years color pickers have become fairly standardized. Ubiquitous today are color pickers based on HSL and HSV. They are simple transformations of RGB values to alternative coordinates chosen to better correlate with perceptual qualities.
Is their dominance well deserved or would it be possible to create better alternatives? I at least think that this question deserves to be explored and that color picker design should be an active research topic. With this post I hope to contribute to the exploration of what a better color picker could and should be, and hopefully inspire others to do the same!
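To make the “simple transformations” point concrete: HSL and HSV are cheap coordinate changes on RGB rather than perceptual models, and Python’s standard library computes them directly:

```python
import colorsys

r, g, b = 0.2, 0.6, 0.9                   # RGB components in [0, 1]
h, s, v = colorsys.rgb_to_hsv(r, g, b)    # hue, saturation, value
h2, l, s2 = colorsys.rgb_to_hls(r, g, b)  # note: HLS returns (h, l, s)
# Colors with equal HSL lightness can still differ noticeably in
# perceived lightness, which is the gap better color spaces target.
```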
Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image
We introduce the problem of perpetual view generation—long-range generation of novel views corresponding to an arbitrarily long camera trajectory given a single image. This is a challenging problem that goes far beyond the capabilities of current view synthesis methods, which work for a limited range of viewpoints and quickly degenerate when presented with a large camera motion. Methods designed for video generation also have limited ability to produce long video sequences and are often agnostic to scene geometry. We take a hybrid approach that integrates both geometry and image synthesis in an iterative render, refine, and repeat framework, allowing for long-range generation that covers large distances after hundreds of frames. Our approach can be trained from a set of monocular video sequences without any manual annotation. We propose a dataset of aerial footage of natural coastal scenes, and compare our method against recent view synthesis and conditional video generation baselines, showing that it can generate plausible scenes over much longer time horizons and larger camera trajectories than existing methods.
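Schematically, the render-refine-repeat loop reads like this (the function names are hypothetical placeholders, not the paper’s API):

```python
def perpetual_view(image, disparity, camera_path, render, refine):
    """Generate frames along an arbitrarily long camera trajectory."""
    frames = []
    for camera in camera_path:
        warped, warped_disp = render(image, disparity, camera)  # geometry: reproject
        image, disparity = refine(warped, warped_disp)          # synthesis: fill holes
        frames.append(image)                                    # ...and repeat
    return frames
```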
Multimodal Neurons in Artificial Neural Networks
We’ve discovered neurons in CLIP that respond to the same concept whether presented literally, symbolically, or conceptually. This may explain CLIP’s accuracy in classifying surprising visual renditions of concepts, and is also an important step toward understanding the associations and biases that CLIP and similar models learn.
The good, and the bad...
By exploiting the model’s ability to read text robustly, we find that even photographs of hand-written text can often fool the model.
Bilinear down/upsampling, aligning pixel grids, and that infamous GPU half pixel offset
I’ve been using bilinear texture filtering for more than two decades, it’s been a few months since I wrote about bilinear resampling, but only two days since I discovered a bug of mine related to it. 😅 Similarly, just last week a colleague asked for a very fast CPU implementation of bilinear, which immediately raised the question: “which kind of bilinear?”
So I figured it’s an opportunity for another short blog post – on bilinear filtering, but in the context of down/upsampling. We will touch on GPU half-pixel offsets, aligning pixel grids, a bug/confusion in TensorFlow, a deeper signal-processing analysis of what’s going on during bilinear operations, and the magic of the famous “magic kernel”.
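As a taste of the “which kind of bilinear?” question, here is a sketch of one common variant: sampling with the GPU half-pixel convention (texel centers at integer coordinates plus 0.5) and clamp-to-edge addressing:

```python
import numpy as np

def sample_bilinear(img, u, v):
    """img: 2D array; (u, v) in [0, 1]^2, GPU-style normalized coords."""
    h, w = img.shape
    x = u * w - 0.5                    # the half-pixel offset in action
    y = v * h - 0.5
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    fx, fy = x - x0, y - y0            # fractional blend weights
    x0c, x1c = max(x0, 0), min(x0 + 1, w - 1)  # clamp to edge
    y0c, y1c = max(y0, 0), min(y0 + 1, h - 1)
    top    = (1 - fx) * img[y0c, x0c] + fx * img[y0c, x1c]
    bottom = (1 - fx) * img[y1c, x0c] + fx * img[y1c, x1c]
    return (1 - fy) * top + fy * bottom
```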
Improving texture atlas allocation in WebRender
This is a longer version of the piece I published on the Mozilla gfx team blog, where I focused on the atlas allocation algorithms. In this one I’ll go into more detail about the process and methodology behind these improvements. The first part is about the making of guillotiere, a crate I first released in March 2019. In the second part we’ll look at more recent work that builds upon guillotiere to improve texture memory usage in WebRender/Firefox.
Dissecting the Apple M1 GPU
Apple’s latest line of Macs includes their in-house “M1” system-on-chip, featuring a custom GPU. This poses a problem for those of us in the Asahi Linux project who wish to run Linux on our devices, as this custom Apple GPU has neither public documentation nor open source drivers. Some speculate it might descend from PowerVR GPUs, as used in older iPhones, while others believe the GPU to be completely custom. But rumours and speculations are no fun when we can peek under the hood ourselves!
And part II where it really takes off: https://rosenzweig.io/blog/asahi-gpu-part-2.html
Leaking silhouettes of cross-origin images
This is a writeup of a vulnerability I found in Chromium and Firefox that could allow a malicious page to read some parts of an image located on an origin it is not supposed to be able to access. Although technically interesting, it is quite limited in scope—I am not aware of any major websites it could’ve been used against. As of November 17th, 2020, the vulnerability has been fixed in the most recent versions of both browsers.
The time that it takes CanvasRenderingContext2D.drawImage to draw a pixel depends on whether it is fully transparent, opaque, or semi-transparent. By timing a bunch of calls to drawImage, we can reliably infer the transparency of each pixel in a cross-origin image, which is enough to, for example, read text on a transparent background, like this:
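The inference step can be sketched in a few lines (the in-browser timing loop itself is JavaScript and omitted here; the bands below are hypothetical, since in practice they must be calibrated per machine using images the attacker controls):

```python
import statistics

BANDS = [                 # hypothetical calibrated timing bands (ns)
    (50, "transparent"),
    (120, "opaque"),
    (float("inf"), "semi-transparent"),
]

def classify_pixel(draw_times_ns):
    """Classify one pixel from repeated drawImage timings of it."""
    m = statistics.median(draw_times_ns)  # median resists timing noise
    return next(label for limit, label in BANDS if m < limit)
```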
Ditherpunk — The article I wish I had about monochrome image dithering
Why are video games graphics (still) a challenge? Productionizing rendering algorithms
This post will cover challenges and aspects of production to consider when creating new rendering / graphics techniques and algorithms – especially in the context of applied research for real-time rendering. I will base this on my personal experience working on The Witcher 2, Assassin’s Creed 4: Black Flag, Far Cry 4, and God of War.
Many of those challenges are easy to overlook – they are real problems in production, but not necessarily visible if you only read about techniques, work on pure research, write papers, or create tech demos.
I have seen statements like “why is this brilliant research technique X not used in production?” both from gamers and from colleagues with an academic background. And there are always some good reasons!
This is quite extensive.
Cameras and Lenses
Cameras and the lenses inside them may seem a little mystifying. In this blog post I’d like to explain not only how they work, but also how adjusting a few tunable parameters can produce fairly different results:
This is amazing work.