A tale of /dev/fd
Many versions of Unix provide a /dev/fd directory to work with open file handles as if they were regular files. As usual, the devil is in the details.
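As a small illustration of the idea (not taken from the linked article), the Python sketch below writes into a pipe and then reads the data back by opening the read end through its /dev/fd path rather than using the descriptor directly. The helper name read_via_dev_fd is made up for this example. It assumes a system where /dev/fd is available and a payload small enough to fit in the pipe buffer.

```python
import os


def read_via_dev_fd(payload: bytes) -> bytes:
    """Write payload into a pipe, then read it back through /dev/fd/<n>."""
    read_fd, write_fd = os.pipe()
    try:
        # Keep the payload small: a pipe only buffers a limited amount of
        # data, and nobody is reading while we write.
        os.write(write_fd, payload)
    finally:
        # Closing the write end lets the reader see end-of-file.
        os.close(write_fd)
    try:
        # Open the already-open descriptor by path, as if it were a file.
        with open(f"/dev/fd/{read_fd}", "rb") as stream:
            return stream.read()
    finally:
        os.close(read_fd)
```

One of the details that differs between systems: on Linux, /dev/fd is a symlink to /proc/self/fd, and opening an entry that refers to a regular file gives a fresh open with its own offset, whereas some other Unix systems treat the open like a dup() of the existing descriptor.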
CVE-2023-38408: Remote Code Execution in OpenSSH's forwarded ssh-agent
While browsing through ssh-agent’s source code, we noticed that a remote attacker, who has access to the remote server where Alice’s ssh-agent is forwarded to, can load (dlopen()) and immediately unload (dlclose()) any shared library in /usr/lib on Alice’s workstation (via her forwarded ssh-agent, if it is compiled with ENABLE_PKCS11, which is the default).
Surprisingly, by chaining four common side effects of shared libraries from official distribution packages, we were able to transform this very limited primitive (the dlopen() and dlclose() of shared libraries from /usr/lib) into a reliable, one-shot remote code execution in ssh-agent (despite ASLR, PIE, and NX). Our best proofs of concept so far exploit default installations of Ubuntu Desktop plus three extra packages from Ubuntu’s “universe” repository. We believe that even better results can be achieved (i.e., some operating systems might be exploitable in their default installation).
The day my ping took countermeasures
While this doesn’t happen too often, a computer clock can be freely adjusted either forward or backward. However, it’s pretty rare for a regular network utility, like ping, to try to manage a situation like this. It’s even less common to call it “taking countermeasures”. I would totally expect ping to just print a nonsensical time value and move on without hesitation.
Ping developers clearly put some thought into that. I wondered how far they went. Did they handle clock changes in both directions? Are the bad measurements excluded from the final statistics? How do they test the software?
Dumb bugs: the PCI device that wasn't
So pci_notify() gets called with our VIO device (somehow), and we’re converting that struct device into a struct pci_dev with no error checking. We could solve this particular bug by just checking that our device is actually a PCI device before we proceed - but we’re in a function called pci_notify, we’re expecting a PCI device to come in, so this would just be a bandaid.
Paving the Road to Vulkan on Asahi Linux
In every modern OS, GPU drivers are split into two parts: a userspace part, and a kernel part. The kernel part is in charge of managing GPU resources and how they are shared between apps, and the userspace part is in charge of converting commands from a graphics API (such as OpenGL or Vulkan) into the hardware commands that the GPU needs to execute.
Between those two parts, there is something called the Userspace API or “UAPI”. This is the interface that they use to communicate between them, and it is specific to each class of GPUs! Since the exact split between userspace and the kernel can vary depending on how each GPU is designed, and since different GPU designs require different bits of data and parameters to be passed between userspace and the kernel, each new GPU driver requires its own UAPI to go along with it.
The Quest for Netflix on Asahi Linux
Thus begins the “do not violate the DMCA challenge 2023”. The goal of this challenge is to figure out how to watch Netflix on Asahi Linux without bypassing or otherwise breaking DRM. You may notice that this article is significantly longer than my 280-character publication on doing the latter, from 2019.
We’re on the home stretch now, right? Right??? Not quite, there is one last showstopper for Asahi users, and it’s a big one: Asahi Linux is built to use 16K page sizes. The Widevine blobs available to us only support 4K pages.
The futex_waitv() syscall and gaming on Linux
The futex_waitv syscall is a new syscall through which a process can wait on multiple futexes. The task wakes up when any futex in the list is awakened. This can be used to implement waiting on multiple locks, wait lists, and so on, without the limitations imposed by using eventfd.
How fast are Linux pipes anyway?
In this post, we will explore how Unix pipes are implemented in Linux by iteratively optimizing a test program that writes and reads data through a pipe.
We will proceed as follows:
A first slow version of our pipe test bench;
How pipes are implemented internally, and why writing and reading from them is slow;
How the vmsplice and splice syscalls let us get around some (but not all!) of the slowness;
A description of Linux paging, leading up to a faster version using huge pages;
The final optimization, replacing polling with busy looping;
Some closing thoughts.
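To give a feel for what a "first slow version" of such a test bench looks like, here is a minimal sketch in Python. It is not the article's program: the function name pipe_throughput and the default sizes are chosen for illustration, and interpreter overhead means the numbers say little about the kernel itself. A writer thread pushes bytes into a pipe with plain write calls while the main thread drains it with plain read calls.

```python
import os
import threading
import time


def pipe_throughput(total_bytes: int = 1 << 24, chunk: int = 1 << 16):
    """Push total_bytes through a pipe; return (bytes_received, bytes_per_second)."""
    read_fd, write_fd = os.pipe()
    buf = memoryview(bytes(chunk))

    def writer() -> None:
        remaining = total_bytes
        try:
            while remaining > 0:
                # os.write may write fewer bytes than requested.
                remaining -= os.write(write_fd, buf[: min(chunk, remaining)])
        finally:
            os.close(write_fd)  # signals end-of-file to the reader

    thread = threading.Thread(target=writer)
    start = time.perf_counter()
    thread.start()
    received = 0
    while True:
        data = os.read(read_fd, chunk)
        if not data:
            break
        received += len(data)
    elapsed = time.perf_counter() - start
    thread.join()
    os.close(read_fd)
    return received, received / elapsed
```

Calling pipe_throughput() returns the byte count and a rough rate. Every chunk is copied into the kernel on write and back out on read.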
Lotus 1-2-3 For Linux
I’ll cut to the chase; through a combination of unlikely discoveries, crazy hacks and the 90s BBS warez scene I’ve been able to port Lotus 1-2-3 natively to Linux – an operating system that literally didn’t exist when 1-2-3 was released!
An unexpected Redis sandbox escape affecting only Debian, Ubuntu, and other derivatives
This post describes how I broke the Redis sandbox, but only for Debian and Debian-derived Linux distributions. Upstream Redis is not affected. That makes it a Debian vulnerability, not a Redis one. The culprit, if you will, is dynamic linking, but there will be more on that later.
The Dirty Pipe Vulnerability
This is the story of CVE-2022-0847, a vulnerability in the Linux kernel since 5.8 which allows overwriting data in arbitrary read-only files. This leads to privilege escalation because unprivileged processes can inject code into root processes.
It all started a year ago with a support ticket about corrupt files. A customer complained that the access logs they downloaded could not be decompressed. And indeed, there was a corrupt log file on one of the log servers; it could be decompressed, but gzip reported a CRC error. I could not explain why it was corrupt, but I assumed the nightly split process had crashed and left a corrupt file behind. I fixed the file’s CRC manually, closed the ticket, and soon forgot about the problem.
Months later, this happened again and yet again. Every time, the file’s contents looked correct, only the CRC at the end of the file was wrong. Now, with several corrupt files, I was able to dig deeper and found a surprising kind of corruption. A pattern emerged.
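For context on "the CRC at the end of the file": a gzip member ends with an 8-byte trailer holding the CRC-32 of the uncompressed data and its length modulo 2^32, both little-endian. The sketch below is an illustration rather than anything from the article, and the helper names are invented. It reads that trailer and compares it with values computed from the known contents.

```python
import gzip
import struct
import zlib


def gzip_trailer(blob: bytes):
    """Return (crc32, isize) stored in the last 8 bytes of a gzip member."""
    return struct.unpack("<II", blob[-8:])


def trailer_matches(blob: bytes, data: bytes) -> bool:
    """Compare the stored trailer with values computed from the real contents."""
    crc, isize = gzip_trailer(blob)
    return crc == zlib.crc32(data) and isize == len(data) % (1 << 32)


data = b"GET /index.html 200\n" * 1000
blob = gzip.compress(data)
# Simulate the symptom: contents intact, one byte of the stored CRC wrong.
damaged = blob[:-8] + bytes([blob[-8] ^ 0xFF]) + blob[-7:]
```

Here trailer_matches(blob, data) holds while trailer_matches(damaged, data) does not, even though the compressed payload in both is identical.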
The multi-generational LRU
One of the key tasks assigned to the memory-management subsystem is to optimize the system’s use of the available memory; that means pushing out pages containing unused data so that they can be put to better use elsewhere. Predicting which pages will be accessed in the near future is a tricky task, and the kernel has evolved a number of mechanisms designed to improve its chances of guessing right. But the kernel not only often gets it wrong, it also can expend a lot of CPU time to make the incorrect choice. The multi-generational LRU patch set posted by Yu Zhao is an attempt to improve that situation.
Uncovering a 24-year-old bug in the Linux Kernel
When one side’s receive buffer (Recv-Q) fills up (in this case because the rsync process is doing disk I/O at a speed slower than the network’s), it will send out a zero window advertisement, which will put that direction of the connection on hold. When buffer space eventually frees up, the kernel will send an unsolicited window update with a non-zero window size, and the data transfer continues. To be safe, just in case this unsolicited window update is lost, the other end will regularly poll the connection state using the so-called Zero Window Probes (the persist mode we are seeing here).
Apparently, the bug was in the bulk receiver fast-path, a code path that skips most of the expensive, strict TCP processing to optimize for the common case of bulk data reception. This is a significant optimization, outlined 28 years ago by Van Jacobson in his “TCP receive in 30 instructions” email. Apparently the Linux implementation did not update snd_wl1 while in the receiver fast path. If a connection uses the fast path for too long, snd_wl1 will fall so far behind that ack_seq will wrap around with respect to it. And if this happens while the receive window is zero, there is no way to re-open the window, as demonstrated above. What’s more, this bug had been present in Linux since v2.1.8, dating back to 1996!
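The wrap-around is easier to see with numbers. TCP sequence numbers are 32-bit and are compared through a signed difference, so "after" only makes sense while the two values are less than 2^31 apart. The sketch below is a simplified model of that comparison, not the kernel's code. The function names and the sample values are made up for illustration.

```python
MOD = 1 << 32  # sequence numbers live in a 32-bit space


def seq_after(a: int, b: int) -> bool:
    """True if a is 'after' b when their difference is read as a signed 32-bit value."""
    return ((b - a) % MOD) >= (1 << 31)


def stale_reference_demo(snd_wl1: int = 1000):
    """Compare a fresh and a very late ack_seq against a never-updated snd_wl1."""
    near = (snd_wl1 + 100) % MOD              # shortly after the last update
    far = (snd_wl1 + (1 << 31) + 100) % MOD   # more than 2 GiB of data later
    return seq_after(near, snd_wl1), seq_after(far, snd_wl1)


print(stale_reference_demo())  # (True, False)
```

Once the stale snd_wl1 is more than 2^31 behind, a genuinely newer ack_seq no longer compares as "after" it, which matches the excerpt's description of a window update that can never be accepted.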
Achieving 11M IOPS & 66 GiB/s IO on a Single ThreadRipper Workstation
In this post I’ll explain how I configured my AMD ThreadRipper Pro workstation with 10 PCIe 4.0 SSDs to achieve 11M IOPS with 4kB random reads and 66 GiB/s throughput with larger IOs - and what bottlenecks & issues I fixed to get there. We’ll look into Linux block I/O internals and their interaction with modern hardware. We’ll use tools & techniques, old and new, for measuring bottlenecks - and other adventures in the kernel I/O stack.
How to make Bash fail badly on Ubuntu 16.04 by typo'ing a command name
The simple thing to say about this is that it only happens on Ubuntu 16.04, not on 18.04 or 20.04, and it happens because Ubuntu’s normal /etc/bash.bashrc defines a command_not_found_handle function that winds up running a helper program to produce this ‘did you mean’ report. The helper program comes from the command-not-found package, which is installed because it’s Recommended by ubuntu-standard.
GNOME has no thumbnails in the file picker (and my toilets are blocked)
The file picker is the pop-up box thingy that appears when you’re opening a file, usually when uploading something online. The GNOME desktop environment uses the file picker package GtkFileChooser. This file picker does not have a thumbnail view. It is broken software. Thumbnails are not a cute little extra, they are essential. This is as bad as a file picker that doesn’t list the name of the files, only their creation date, or inode serial number. It is broken software.
Personally, not a big deal, but fair point.
PAM Bypass: when null(is not)ok
The commit attempts to avoid a timing attack against PAM. Some attacker can know valid user names by timing how quickly PAM returns an error, so the fix is to use an existing user in the system we always validate against to ensure a consistent timing. But which user is always present on a Linux system? root!
The code does not check if root has a valid password set. An invalid user would fail, loop over to root and try to validate. root has no password. It’s blank. We have nullok set. And we have pam_permit.so. The invalid user is authenticated. We have enough information to do a quick POC.
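The chain of events can be modelled in a few lines. This is a toy model under loud assumptions, not PAM's actual code: the function toy_auth, the plain-text "shadow" dictionary and the sample users are all hypothetical and exist only to show how the fallback to root combines with a blank password and nullok.

```python
import hmac


def toy_auth(user: str, password: str, shadow: dict, nullok: bool = True) -> bool:
    """Toy model of the flawed flow described above; not real PAM code."""
    # Timing countermeasure: an unknown user is validated against root
    # instead of failing early.
    target = user if user in shadow else "root"
    stored = shadow[target]
    if stored == "":
        # A blank stored password is accepted when nullok is set.
        return nullok
    return hmac.compare_digest(stored.encode(), password.encode())


# Hypothetical account database: root has a blank password.
shadow = {"root": "", "alice": "s3cret"}
print(toy_auth("no-such-user", "anything", shadow))  # True
```

The nonexistent user is let in because the check silently became "does root's blank password pass?", and with nullok it does.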
1 + 1 = 3.
What they don’t tell you about demand paging in school
This post details my adventures with the Linux virtual memory subsystem, and my discovery of a creative way to taunt the OOM (out of memory) killer by accumulating memory in the kernel, rather than in userspace.
Good look at practical realities.
Major Bug in glibc is Killing Applications With a Memory Limit
malloc() preallocates large chunks of memory, per thread. This is meant as a performance optimization, to reduce memory contention in highly threaded applications. On a typical physical server (dual Xeon CPUs with a terabyte of RAM), the core count is easily 40 or above: 10 cores * 2 CPUs * 2 for hyper-threading. This means a preallocation of up to 20 GB of RAM in the process.
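The excerpt does not spell out how 40 cores turn into 20 GB. A plausible reconstruction, using figures commonly cited for 64-bit glibc (up to 8 arenas per logical core, each reserving up to 64 MiB), is sketched below. Both of those numbers are assumptions brought in for this example, not values stated in the excerpt.

```python
def arena_reservation_gib(
    cores_per_cpu: int = 10,
    cpus: int = 2,
    threads_per_core: int = 2,
    arenas_per_core: int = 8,   # assumed default arena limit per logical core
    arena_mib: int = 64,        # assumed maximum size of one arena heap
) -> float:
    """Upper bound on memory reserved by malloc arenas, in GiB."""
    logical_cores = cores_per_cpu * cpus * threads_per_core
    return logical_cores * arenas_per_core * arena_mib / 1024


print(arena_reservation_gib())  # 20.0
```

With the excerpt's 10 * 2 * 2 = 40 logical cores, that gives 40 * 8 * 64 MiB = 20 GiB, in line with the figure quoted.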
KVM host in a few lines of code
KVM is a virtualization technology that comes with the Linux kernel. In other words, it allows you to run multiple virtual machines (VMs) on a single Linux VM host. VMs in this case are known as guests. If you ever used QEMU or VirtualBox on Linux - you know what KVM is capable of.
But how does it work under the hood?
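As a first taste of the interface (a sketch for illustration, not the article's code, which goes much further), a KVM host starts by opening /dev/kvm and asking for the API version with an ioctl. The helper name kvm_api_version is invented here. The request constant is the value of KVM_GET_API_VERSION from the kernel headers, written out by hand because Python's standard library does not provide it.

```python
import fcntl
import os

KVM_GET_API_VERSION = 0xAE00  # _IO(0xAE, 0x00) in <linux/kvm.h>


def kvm_api_version(path: str = "/dev/kvm"):
    """Return the KVM API version, or None if KVM is not usable here."""
    try:
        fd = os.open(path, os.O_RDWR | os.O_CLOEXEC)
    except OSError:
        # No KVM device, or no permission to open it.
        return None
    try:
        # With no argument, ioctl() returns the integer result of the call.
        return fcntl.ioctl(fd, KVM_GET_API_VERSION)
    except OSError:
        return None
    finally:
        os.close(fd)
```

On a machine with KVM enabled this should report the stable API version, 12. On anything else it returns None. Creating a VM, adding memory and running a vCPU are further ioctls on descriptors obtained from this one.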