Tech’s hottest new job: Prompt Engineer
‘Prompt engineers’ are being hired for their skill in getting AI systems to produce exactly what they want.
Version 1 of Stable Attribution’s algorithm decodes an image generated by an A.I. model into the most similar examples from the model’s training data. Usually, the image the model creates doesn’t exist in its training data (it’s new), but because of the training process, the most influential images are the most visually similar ones, especially in the details.
Alejandro Jodorowsky’s “Tron”
I was recently shown some frames from a film that I had never heard of: Alejandro Jodorowsky’s 1976 version of “Tron.” The sets were incredible. The actors, unfamiliar to me, looked fantastic in their roles. The costumes and lighting worked together perfectly. The images glowed with an extravagant and psychedelic sensibility that felt distinctly Jodorowskian.
The truth is that these weren’t stills from a long-lost movie. They weren’t photos at all. These evocative, well-composed and tonally immaculate images were generated in seconds with the magic of artificial intelligence.
The “interactive” elements are annoying, but some pretty pictures here.
High Fidelity Image Generation Using Diffusion Models
Diffusion models, originally proposed in 2015, have seen a recent revival in interest due to their training stability and their promising sample quality results on image and audio generation. Thus, they offer potentially favorable trade-offs compared to other types of deep generative models. Diffusion models work by corrupting the training data by progressively adding Gaussian noise, slowly wiping out details in the data until it becomes pure noise, and then training a neural network to reverse this corruption process. Running this reversed corruption process synthesizes data from pure noise by gradually denoising it until a clean sample is produced. This synthesis procedure can be interpreted as an optimization algorithm that follows the gradient of the data density to produce likely samples.
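The corrupt-then-denoise loop is easy to picture with a toy forward process. Here's a minimal numpy sketch of the closed-form noising step; the schedule values and signal length are made up for illustration, not taken from any particular paper:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)           # per-step noise schedule
alphas_bar = np.cumprod(1.0 - betas)         # cumulative signal retention

x0 = np.sin(np.linspace(0, 4 * np.pi, 256))  # toy "clean data"

def q_sample(x0, t):
    """Closed-form forward process: x_t = sqrt(a_bar_t)*x0 + sqrt(1-a_bar_t)*eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

x_mid = q_sample(x0, T // 2)   # partially corrupted: signal still visible
x_end = q_sample(x0, T - 1)    # essentially pure Gaussian noise
```

The part the sketch leaves out is the hard part: training a network to predict the noise at each step, so that running the chain backwards from `x_end` denoises its way to a clean sample.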
Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image
We introduce the problem of perpetual view generation—long-range generation of novel views corresponding to an arbitrarily long camera trajectory given a single image. This is a challenging problem that goes far beyond the capabilities of current view synthesis methods, which work for a limited range of viewpoints and quickly degenerate when presented with a large camera motion. Methods designed for video generation also have limited ability to produce long video sequences and are often agnostic to scene geometry. We take a hybrid approach that integrates both geometry and image synthesis in an iterative render, refine, and repeat framework, allowing for long-range generation that covers large distances after hundreds of frames. Our approach can be trained from a set of monocular video sequences without any manual annotation. We propose a dataset of aerial footage of natural coastal scenes, and compare our method with recent view synthesis and conditional video generation baselines, showing that it can generate plausible scenes for much longer time horizons over large camera trajectories compared to existing methods.
Multimodal Neurons in Artificial Neural Networks
We’ve discovered neurons in CLIP that respond to the same concept whether presented literally, symbolically, or conceptually. This may explain CLIP’s accuracy in classifying surprising visual renditions of concepts, and is also an important step toward understanding the associations and biases that CLIP and similar models learn.
The good, and the bad...
By exploiting the model’s ability to read text robustly, we find that even photographs of hand-written text can often fool the model.
Man who tokenized himself on Ethereum becomes AI deepfake
Ethereum’s tokenized man just became a synthetic deepfake—and you can decide what he says for $99.
Augmented Reality Is Now Mainstream on Instagram
I am alone in my apartment, as always, and I’ve just replaced my left eyeball with an orange springing out of its peel. A mile away, a friend, also home alone, is taking her seat—every seat, actually—at the table in The Last Supper, yelling as the camera pans down the row of disciples and her face replaces that of one man after another. Another friend is watching a mouse dressed as the Pope dance across her kitchen floor. A third is smiling while a strange man wraps his arms around his throat.
Dressing for the Surveillance Age
As cities become ever more packed with cameras that always see, public anonymity could disappear. Can stealth streetwear evade electronic eyes?
I liked this article because it at least acknowledged that these countermeasures are only a training data update away from becoming useless.
All about the new ML Super Resolution feature in Pixelmator Pro
To create the ML Super Resolution feature, we used a convolutional neural network. This type of deep neural network reduces raster images and their complex inter-pixel dependencies into a form that is easier to process (i.e. requires less computation) without losing important features (edges, patterns, colors, textures, gradients, and so on). The ML Super Resolution network includes 29 convolutional layers which scan the image and create an over-100-channel-deep version of it that contains a range of identified features. This is then upscaled, post-processed and turned back into a raster image.
Not quite all about it, and there are better references for the technique, but it’s neat to see this trickle down to entry-level photo editing.
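The convolve-into-features-then-upscale idea can be sketched in a few lines of numpy. This is a toy stand-in, nowhere near the 29-layer network described above; the kernel count and the naive loops are purely illustrative:

```python
import numpy as np

def conv2d(img, kernels):
    """Naive 'same' convolution: a (H, W) image -> a (C, H, W) feature stack."""
    H, W = img.shape
    C, kh, kw = kernels.shape
    pad = kh // 2
    padded = np.pad(img, pad)
    out = np.zeros((C, H, W))
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernels[c])
    return out

def upscale2x(features, mix):
    """Mix the feature channels back into one channel, then nearest-neighbour 2x."""
    img = np.tensordot(mix, features, axes=1)      # (C,) . (C, H, W) -> (H, W)
    return img.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(1)
img = rng.random((16, 16))                  # toy low-res input
kernels = rng.standard_normal((8, 3, 3)) / 9  # 8 made-up feature detectors
features = conv2d(img, kernels)
out = upscale2x(features, rng.random(8))    # toy 32x32 "super-resolved" output
```

A real super-resolution network learns the kernels and the mixing from data, and upscales with learned layers rather than nearest-neighbour, but the shape of the pipeline (image → feature channels → upscaled image) is the same.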
The secret-sharer: evaluating and testing unintended memorization in neural networks
This is a really important paper for anyone working with language or generative models, and just in general for anyone interested in understanding some of the broader implications and possible unintended consequences of deep learning. There’s also a lovely sense of the human drama accompanying the discoveries that just creeps through around the edges.
Disclosure of secrets is of particular concern in neural network models that classify or predict sequences of natural language text… even if sensitive or private training data text is very rare, one should assume that well-trained models have paid attention to its precise details… The users of such models may discover—either by accident or on purpose—that entering certain text prefixes causes the models to output surprisingly revealing text completions.
3D Ken Burns Effect from a Single Image
In this paper, we introduce a framework that synthesizes the 3D Ken Burns effect from a single image, supporting both a fully automatic mode and an interactive mode with the user controlling the camera. Our framework first leverages a depth prediction pipeline, which estimates scene depth that is suitable for view synthesis tasks. To address the limitations of existing depth estimation methods such as geometric distortions, semantic distortions, and inaccurate depth boundaries, we develop a semantic-aware neural network for depth prediction, couple its estimate with a segmentation-based depth adjustment process, and employ a refinement neural network that facilitates accurate depth predictions at object boundaries. According to this depth estimate, our framework then maps the input image to a point cloud and synthesizes the resulting video frames by rendering the point cloud from the corresponding camera positions. To address disocclusions while maintaining geometrically and temporally coherent synthesis results, we utilize context-aware color- and depth-inpainting to fill in the missing information in the extreme views of the camera path, thus extending the scene geometry of the point cloud.
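The map-the-image-to-a-point-cloud step is plain pinhole geometry. Here's a toy numpy sketch of back-projecting a depth map and re-projecting it from a dollied camera; the focal length and the flat scene are invented for illustration, and a real renderer would also handle occlusion and hole-filling:

```python
import numpy as np

def unproject(depth, f):
    """Pinhole back-projection: a per-pixel depth map -> a 3-D point cloud."""
    H, W = depth.shape
    cx, cy = (W - 1) / 2.0, (H - 1) / 2.0
    v, u = np.mgrid[0:H, 0:W].astype(float)
    z = depth
    x = (u - cx) * z / f
    y = (v - cy) * z / f
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

def project(points, f, H, W, t):
    """Re-project the cloud from a camera translated by t (a virtual dolly move)."""
    p = points - t
    u = f * p[:, 0] / p[:, 2] + (W - 1) / 2.0
    v = f * p[:, 1] / p[:, 2] + (H - 1) / 2.0
    return u, v

depth = np.full((24, 32), 5.0)                       # toy flat scene 5 units away
cloud = unproject(depth, f=30.0)
u, v = project(cloud, 30.0, 24, 32, t=np.array([0.0, 0.0, 1.0]))  # push in 1 unit
```

Pushing the camera forward magnifies the scene, so points near the frame edge re-project outside it; those newly revealed (disoccluded) regions are exactly what the paper's inpainting step has to fill.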
Turning a MacBook into a Touchscreen Using the Webcam
Our idea was to retrofit a small mirror in front of a MacBook’s built-in webcam, so that the webcam would be looking down at the computer screen at a sharp angle. The camera would be able to see fingers hovering over or touching the screen, and we’d be able to translate the video feed into touch events using computer vision.
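Mapping what the mirrored webcam sees back to screen coordinates is a classic homography fit. Here's a rough numpy sketch of the calibration idea; the corner correspondences are made up, and the real project would also need the fingertip-detection side:

```python
import numpy as np

# Calibration: the four screen corners as seen in the (mirrored) webcam image,
# paired with their true on-screen pixel positions. Values are illustrative.
cam_pts = np.array([[80, 40], [560, 35], [600, 420], [60, 430]], float)
scr_pts = np.array([[0, 0], [1440, 0], [1440, 900], [0, 900]], float)

def fit_homography(src, dst):
    """Solve for the 3x3 homography H with dst ~ H @ src (4-point DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.array(A))
    return Vt[-1].reshape(3, 3)    # null-space vector, reshaped to 3x3

def cam_to_screen(H, pt):
    """Map a detected fingertip (camera coords) to screen coords."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return np.array([x / w, y / w])

H = fit_homography(cam_pts, scr_pts)
touch = cam_to_screen(H, cam_pts[0])   # fingertip at the first corner -> (0, 0)
```

Once the homography is fixed, every detected fingertip position in the video feed becomes a screen coordinate to emit as a touch event.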
The AI of GoldenEye 007
GoldenEye 007: one of the most influential games of all time. A title that defined a generation of console gaming and paved the way forward for first-person shooters in the console market. In this article, I’m winding the clock back over 20 years to learn the secrets of how one of the Nintendo 64’s most beloved titles built friendly and enemy AI that is still held in high regard today.
Natural Adversarial Examples
We introduce natural adversarial examples -- real-world, unmodified, and naturally occurring examples that cause classifier accuracy to significantly degrade. We curate 7,500 natural adversarial examples and release them in an ImageNet classifier test set that we call ImageNet-A. This dataset serves as a new way to measure classifier robustness. Like l_p adversarial examples, ImageNet-A examples successfully transfer to unseen or black-box classifiers. For example, on ImageNet-A a DenseNet-121 obtains around 2% accuracy, an accuracy drop of approximately 90%. Recovering this accuracy is not simple because ImageNet-A examples exploit deep flaws in current classifiers including their over-reliance on color, texture, and background cues. We observe that popular training techniques for improving robustness have little effect, but we show that some architectural changes can enhance robustness to natural adversarial examples. Future research is required to enable robust generalization to this hard ImageNet test set.
Weight Agnostic Neural Networks
Not all neural network architectures are created equal; some perform much better than others for certain tasks. But how important are the weight parameters of a neural network compared to its architecture? In this work, we question to what extent neural network architectures alone, without learning any weight parameters, can encode solutions for a given task. We propose a search method for neural network architectures that can already perform a task without any explicit weight training. To evaluate these networks, we populate the connections with a single shared weight parameter sampled from a uniform random distribution, and measure the expected performance. We demonstrate that our method can find minimal neural network architectures that can perform several reinforcement learning tasks without weight training. In the supervised learning domain, we find architectures that can achieve much higher than chance accuracy on MNIST using random weights.
Some fun demos.
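The evaluation protocol is the interesting bit: every connection shares one weight value, and performance is averaged over samples of that value. Here's a toy numpy sketch of the sweep on a fixed two-layer topology; the topology and the XOR task are my own invented example (this particular symmetric net can't actually solve XOR — it only illustrates the shared-weight evaluation):

```python
import numpy as np

def forward(x, w):
    """Fixed topology, 2 inputs -> 2 hidden (tanh) -> 1 output; every
    connection uses the SAME shared weight value w."""
    h = np.tanh(w * (x @ np.ones((2, 2))))   # hidden layer, shared weight
    return np.tanh(w * h.sum(axis=1))        # output unit, shared weight

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])               # XOR targets

# WANN-style evaluation: score the architecture by its performance
# averaged over a range of shared-weight values (no training at all).
weights = np.linspace(-2.0, 2.0, 9)
losses = [float(np.mean((forward(X, w) - y) ** 2)) for w in weights]
mean_loss = float(np.mean(losses))           # the architecture's fitness signal
```

An architecture search would then mutate the topology and keep variants whose mean loss over the weight sweep improves, so that the structure itself, not any tuned weight, encodes the solution.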
Google Thought My Phone Number Was Facebook’s and It Ruined My Life
As it turns out, if you Googled “Facebook phone number” on your phone earlier this week, you would see my cellphone as the fourth result, and Google has created a “card” that pulled my number out of the article and displayed it directly on the search page in a box. The effect is that it seemed like my phone number was Facebook’s phone number, because that is how Google has trained people to think.
The company behind the $16,000 AI-powered laundry-folding robot has filed for bankruptcy
Backed by companies like Panasonic and Daiwa House, Laundroid had ambitious dreams to be the ultimate wardrobe organizer for the entire household. It had multiple cameras and robotic arms to scan a load of laundry, and used Wi-Fi to connect to a server that would analyze the clothing using AI to figure out the best way to fold it. A companion app was supposed to be able to track every piece of clothing that went through Laundroid, and categorize the clothes by household member. One load of laundry would take a couple of hours to fold, as each T-shirt took about five to ten minutes.
That’s how it was supposed to work in theory, anyway — when I tested it out at CES 2018 with my own T-shirt, the machine ate it up and Laundroid engineers had to work for about 15 minutes to pry it out. The explanation was that its cameras couldn’t recognize my black shirt, only the brightly colored demo shirts they had on hand.
Unsolved research problems vs. real-world threat models
I personally think adversarial examples are highly worth studying, and should inspire serious concern. However, most of the justifications for why exactly they’re worrisome strike me as overly literal.
One: they’re a proof of concept, an incontrovertible demonstration that a certain type of problem exists. As a result of easily finding small-perturbation adversarial examples, we can say with certainty that if the safety of your system depends on the classifier never making obvious mistakes, then that guarantee is false, and your system is unsafe.
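For intuition on just how "easily" small-perturbation adversarial examples are found: against a linear classifier the construction is one line. A toy numpy sketch of an FGSM-style perturbation, with a random classifier and input purely for illustration:

```python
import numpy as np

# Toy linear "classifier": predict sign(w . x). A small step along the
# sign of w flips the prediction -- the essence of an adversarial example.
rng = np.random.default_rng(0)
w = rng.standard_normal(100)   # stand-in for the model's weights
x = rng.standard_normal(100)   # stand-in for a correctly classified input

margin = w @ x
eps = (abs(margin) / np.sum(np.abs(w))) * 1.01   # just past the boundary
x_adv = x - np.sign(margin) * eps * np.sign(w)   # FGSM-style L_inf step

# The prediction flips, yet no coordinate moved by more than eps.
flipped = np.sign(w @ x_adv) != np.sign(margin)
```

In high dimensions `eps` comes out tiny relative to the input, which is why the perturbed image looks identical to a human; deep networks aren't linear, but the same gradient-direction trick works on them in practice.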
The Boombox Incident
In Seinfeld episode #163, “The Slicer”, George has just landed a cushy job at Kruger Industrial Smoothing when he sees himself in the background of a family photo on his new boss’s desk. George hatches a plan to have himself airbrushed out of the photo, and things are going fine until he receives the touched-up print: the clerk has removed Kruger from the family photo instead of George.
The clerk mistook Kruger in the photo for George, since in the picture George had hair but Kruger was bald. Removing the only bald person from the photo was a pretty reasonable thing for the photo store clerk to do. I figured this is something that photo editors have to do frequently, so I decided to automate it.
Now that bald individuals have been identified, they can be removed from the image. OpenCV has an inpainting function, which is really only meant for removing small strokes from an image. The results of applying it here are…not ideal.
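OpenCV's inpainting methods (`cv2.inpaint` with the Telea or Navier-Stokes flags) essentially diffuse surrounding pixels into the masked region, which is why they handle thin strokes but smear large holes. Here's a toy numpy stand-in for that behavior, with an invented gradient image and mask; it happens to do fine on smooth content and fails on anything with structure, which is roughly the failure mode above:

```python
import numpy as np

def naive_inpaint(img, mask, iters=200):
    """Toy diffusion inpainting: repeatedly replace masked pixels with the
    average of their 4-neighbours. Fine for thin strokes; smeary for a
    region the size of a person."""
    out = img.astype(float).copy()
    out[mask] = out[~mask].mean()            # crude initialisation
    for _ in range(iters):
        up    = np.roll(out,  1, axis=0)
        down  = np.roll(out, -1, axis=0)
        left  = np.roll(out,  1, axis=1)
        right = np.roll(out, -1, axis=1)
        avg = (up + down + left + right) / 4.0
        out[mask] = avg[mask]                # only masked pixels change
    return out

img = np.linspace(0, 1, 32)[None, :].repeat(32, axis=0)  # smooth gradient
mask = np.zeros((32, 32), bool)
mask[12:20, 12:20] = True                                # a large hole
filled = naive_inpaint(img, mask)
```

On a smooth gradient the hole converges to the right answer; on a photo of a person standing in front of a detailed background, nothing in the surrounding pixels encodes what should be behind them, so any diffusion-style method produces the smeared results described above.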