High Fidelity Image Generation Using Diffusion Models
Alternatively, diffusion models, originally proposed in 2015, have seen a recent revival in interest due to their training stability and their promising sample quality results on image and audio generation. Thus, they offer potentially favorable trade-offs compared to other types of deep generative models. Diffusion models work by corrupting the training data by progressively adding Gaussian noise, slowly wiping out details in the data until it becomes pure noise, and then training a neural network to reverse this corruption process. Running this reversed corruption process synthesizes data from pure noise by gradually denoising it until a clean sample is produced. This synthesis procedure can be interpreted as an optimization algorithm that follows the gradient of the data density to produce likely samples.
Two new color spaces for color picking
Picking colors is a common operation in many applications and over the years color pickers have become fairly standardized. Ubiquitous today are color pickers based on HSL and HSV. They are simple transformations of RGB values to alternative coordinates chosen to better correlate with perceptual qualities.
Is their dominance well deserved or would it be possible to create better alternatives? I at least think that this question deserves to be explored and that color picker design should be an active research topic. With this post I hope to contribute to the exploration of what a better color picker could and should be, and hopefully inspire others to do the same!
Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image
We introduce the problem of perpetual view generation—long-range generation of novel views corresponding to an arbitrarily long camera trajectory given a single image. This is a challenging problem that goes far beyond the capabilities of current view synthesis methods, which work for a limited range of viewpoints and quickly degenerate when presented with a large camera motion. Methods designed for video generation also have limited ability to produce long video sequences and are often agnostic to scene geometry. We take a hybrid approach that integrates both geometry and image synthesis in an iterative render, refine, and repeat framework, allowing for long-range generation that cover large distances after hundreds of frames. Our approach can be trained from a set of monocular video sequences without any manual annotation. We propose a dataset of aerial footage of natural coastal scenes, and compare our method with recent view synthesis and conditional video generation baselines, showing that it can generate plausible scenes for much longer time horizons over large camera trajectories compared to existing methods.
Multimodal Neurons in Artificial Neural Networks
We’ve discovered neurons in CLIP that respond to the same concept whether presented literally, symbolically, or conceptually. This may explain CLIP’s accuracy in classifying surprising visual renditions of concepts, and is also an important step toward understanding the associations and biases that CLIP and similar models learn.
The good, and the bad...
By exploiting the model’s ability to read text robustly, we find that even photographs of hand-written text can often fool the model.
Bilinear down/upsampling, aligning pixel grids, and that infamous GPU half pixel offset
It’s been more than two decades of me using bilinear texture filtering, a few months since I’ve written about bilinear resampling, but only two days since I discovered a bug of mine related to it. 😅 Similarly, just last week a colleague asked for a very fast implementation of bilinear on a CPU and it caused a series of questions “which kind of bilinear?”.
So I figured it’s an opportunity for another short blog post – on bilinear filtering, but in context of down/upsampling. We will touch here on GPU half pixel offsets, aligning pixel grids, a bug / confusion in Tensorflow, deeper signal processing analysis of what’s going on during bilinear operations, and analysis of the magic of the famous “magic kernel”.
Improving texture atlas allocation in WebRender
This is a longer version of the piece I published in the mozilla gfx team blog where I focus on the atlas allocation algorithms. In this one I’ll go into more details about the process and methodology behind these improvements. The first part is about the making of guillotiere, a crate that I first released in March 2019. In the second part we’ll have a look at more recent work building upon what I did with guillotiere, to improve texture memory usage in WebRender/Firefox.
Dissecting the Apple M1 GPU
Apple’s latest line of Macs includes their in-house “M1” system-on-chip, featuring a custom GPU. This poses a problem for those of us in the Asahi Linux project who wish to run Linux on our devices, as this custom Apple GPU has neither public documentation nor open source drivers. Some speculate it might descend from PowerVR GPUs, as used in older iPhones, while others believe the GPU to be completely custom. But rumours and speculations are no fun when we can peek under the hood ourselves!
And part II where it really takes off: https://rosenzweig.io/blog/asahi-gpu-part-2.html
Leaking silhouettes of cross-origin images
This is a writeup of a vulnerability I found in Chromium and Firefox that could allow a malicious page to read some parts of an image located on an origin it is not supposed to be able to access. Although technically interesting, it is quite limited in scope—I am not aware of any major websites it could’ve been used against. As of November 17th, 2020, the vulnerability has been fixed in the most recent versions of both browsers.
The time that it takes CanvasRenderingContext2D.drawImage to draw a pixel depends on whether it is fully transparent, opaque, or semi-transparent. By timing a bunch of calls to drawImage, we can reliably infer the transparency of each pixel in a cross-origin image, which is enough to, for example, read text on a transparent background, like this:
Ditherpunk — The article I wish I had about monochrome image dithering
Why are video games graphics (still) a challenge? Productionizing rendering algorithms
This post will cover challenges and aspects of production to consider when creating new rendering / graphics techniques and algorithms – especially in the context of applied research for real time rendering. I will base this on my personal experiences, working on Witcher 2, Assassin’s Creed 4: Black Flag, Far Cry 4, and God of War.
Many of those challenges are easily ignored – they are real problems in production, but not necessarily there only if you only read about those techniques, or if you work on pure research, writing papers, or create tech demos.
I have seen statements like “why is this brilliant research technique X not used in production?” both from gamers, but also from my colleagues with academic background. And there are always some good reasons!
This is quite extensive.
Cameras and Lenses
Cameras and the lenses inside them may seem a little mystifying. In this blog post I’d like to explain not only how they work, but also how adjusting a few tunable parameters can produce fairly different results:
This is amazing work.
AVIF has landed
AVIF is a new image format derived from the keyframes of AV1 video. It’s a royalty-free format, and it’s already supported in Chrome 85 on desktop. Android support will be added soon, Firefox is working on an implementation, and although it took Safari 10 years to add WebP support, I don’t think we’ll see the same delay here, as Apple are a member of the group that created AV1.
Roughly speaking, at an acceptable quality, the WebP is almost half the size of JPEG, and AVIF is under half the size of WebP. I find it incredible that AVIF can do a good job of the image in just 18 kB.
Modernizing the OpenBSD console
At the beginning were text mode consoles. Traditionally, *BSD and Linux on i386 and amd64 used text mode consoles which by default provided 25 rows of 80 columns, the “80x25 mode”. This mode uses a 8x16 font stored in the VGA BIOS (which can be slightly different across vendors).
Rainbow – an attempt to display colour on a B&W monitor
The aim of this project was to display a colour image on a black and white monitor, by overlaying an acetate bayer filter over the monitor and mosaicing a colour image.
Is WebP really better than JPEG?
I think Google’s result of 25-34% smaller files is mostly caused by the fact that they compared their WebP encoder to the JPEG reference implementation, Independent JPEG Group’s cjpeg, not Mozilla’s improved MozJPEG encoder. I decided to run some tests to see how cjpeg, MozJPEG and WebP compare. I also tested the new AVIF format, based on the open AV1 video codec. AVIF support is already in Firefox behind a flag and should be coming soon to Chrome if this ticket is to be believed.
Ray Tracing In Notepad.exe At 30 FPS
A few months back, there was a post on Reddit (link), which described a game that used an open source clone of Notepad to handle all its input and rendering. While reading about it, I had the thought that it would be really cool to see something similar that worked with stock Windows Notepad. Then I spent way too much of my free time doing exactly that.
I ended up making a Snake game and a small ray tracer that use stock Notepad for all input and rendering tasks, and got to learn about DLL Injection, API Hooking and Memory Scanning along the way. It seemed like writing up the stuff I learned might make for an interesting read, and give me a chance to show off the dumb stuff I built at the same time, so that’s what these next couple blog posts will be about.
Animated optical illusions. These are very nice.
Augmented Reality Is Now Mainstream on Instagram
I am alone in my apartment, as always, and I’ve just replaced my left eyeball with an orange springing out of its peel. A mile away, a friend, also home alone, is taking her seat—every seat, actually—at the table in The Last Supper, yelling as the camera pans down the row of disciples and her face replaces that of one man after another. Another friend is watching a mouse dressed as the Pope dance across her kitchen floor. A third is smiling while a strange man wraps his arms around his throat.
GPU architecture resources
I am often get asked in DMs about how GPUs work. There is a lot of information on GPU architectures online, one can start with these:
Bilinear texture filtering – artifacts, alternatives, and frequency domain analysis
In this post we will look at one of the staples of real-time computer graphics – bilinear texture filtering. To catch your interest, I will start with focusing on something that is often referred to as “bilinear artifacts”, trapezoid/star-shaped artifact of bilinear interpolation – what causes them? I will discuss briefly some common bilinear filtering alternatives and how they fix those, link a few of my favorite papers on (fast) image interpolation, and analyze the frequency response of common cheap filters.