The 5-hour CDN
The term “CDN” (“content delivery network“) conjures Google-scale companies managing huge racks of hardware, wrangling hundreds of gigabits per second. But CDNs are just web applications. That’s not how we tend to think of them, but that’s all they are. You can build a functional CDN on an 8-year-old laptop while you’re sitting at a coffee shop. I’m going to talk about what you might come up with if you spend the next five hours building a CDN.
It’s useful to define exactly what a CDN does. A CDN hoovers up files from a central repository (called an origin) and stores copies close to users. Back in the dark ages, the origin was a CDN’s FTP server. These days, origins are just web apps and the CDN functions as a proxy server. So that’s what we’re building: a distributed caching proxy.
SSH and User-mode IP WireGuard
For a couple hundred lines of code (not counting the entire user-mode Linux you’ll be pulling in from gVisor, HEY! Dependencies! What are you gonna do!) you can bring up a new, cryptographically authenticated network, any time you want to, in practically any program.
There really are some fun libraries out there if you want to build something crazy.
Uncovering a 24-year-old bug in the Linux Kernel
When one side’s receive buffer (Recv-Q) fills up (in this case because the rsync process is doing disk I/O at a speed slower than the network’s), it will send out a zero window advertisement, which will put that direction of the connection on hold. When buffer space eventually frees up, the kernel will send an unsolicited window update with a non-zero window size, and the data transfer continues. To be safe, just in case this unsolicited window update is lost, the other end will regularly poll the connection state using the so-called Zero Window Probes (the persist mode we are seeing here).
Apparently, the bug was in the bulk receiver fast-path, a code path that skips most of the expensive, strict TCP processing to optimize for the common case of bulk data reception. This is a significant optimization, outlined 28 years ago² by Van Jacobson in his “TCP receive in 30 instructions” email. Apparently the Linux implementation did not update snd_wl1 while in the receiver fast path. If a connection uses the fast path for too long, snd_wl1 will fall so far behind that ack_seq will wrap around with respect to it. And if this happens while the receive window is zero, there is no way to re-open the window, as demonstrated above. What’s more, this bug had been present in Linux since v2.1.8, dating back to 1996!
KEMTLS: Post-quantum TLS without signatures
KEMTLS, therefore, achieves the same goals as TLS 1.3 (authentication, confidentiality and integrity) in the face of quantum computers. But there’s one small difference compared to the TLS 1.3 handshake. KEMTLS allows the client to send encrypted application data in the second client-to-server TLS message flow when client authentication is not required, and in the third client-to-server TLS message flow when mutual authentication is required. Note that with TLS 1.3, the server is able to send encrypted and authenticated application data in its first response message (although, in most uses of TLS 1.3, this feature is not actually used). With KEMTLS, when client authentication is not required, the client is able to send its first encrypted application data after the same number of handshake round trips as in TLS 1.3.
Intuitively, the handshake signature in TLS 1.3 proves possession of the private key corresponding to the public key certified in the TLS 1.3 server certificate. For these signature schemes, this is the straightforward way to prove possession; another way to prove possession is through key exchanges. By carefully considering the key derivation sequence, a server can decrypt any messages sent by the client only if it holds the private key corresponding to the certified public key. Therefore, implicit authentication is fulfilled. It is worth noting that KEMTLS still relies on signatures by certificate authorities to authenticate the long-term KEM keys.
How NAT traversal works
We covered a lot of ground in our post about How Tailscale Works. However, we glossed over how we can get through NATs (Network Address Translators) and connect your devices directly to each other, no matter what’s standing between them. Let’s talk about that now!
node.example.com Is An IP Address
This takes a bit to get to the punchline, but man, good old duck typing for the win.
It turns out that, under certain conditions, the ipaddress module can create IPv6 addresses from raw bytes. My assumption is that it offers this behavior as a convenient way to parse IP addresses from data fresh off the wire.
Does node.example.com meet those certain conditions? You bet it does. Because we’re using Python 2 it’s just bytes and it happens to be 16 characters long.
NAT Slipstreaming allows an attacker to remotely access any TCP/UDP service bound to a victim machine, bypassing the victim’s NAT/firewall (arbitrary firewall pinhole control), just by the victim visiting a website.
This is neat, although you have to dig in a bit to learn it requires the NAT gateway to do some fancy SIP proxying.
Implementing traceroute in Go
This tool is very useful to inspect network paths and solve problems. But aside from that, this tool is extremely interesting and its actual implementation is pretty simple.
Under the Hood of a Simple DNS Server
For this post, I will talk mostly about the details of implementing a DNS server that follows the original two RFCs that laid out the spec: 1034 and 1035.
Chromium’s impact on root DNS traffic
The root server system is, out of necessity, designed to handle very large amounts of traffic. As we have shown here, under normal operating conditions, half of the traffic originates with a single library function, on a single browser platform, whose sole purpose is to detect DNS interception. Such interception is certainly the exception rather than the norm. In almost any other scenario, this traffic would be indistinguishable from a distributed denial of service (DDoS) attack.
Certificate Transparency: a bird's-eye view
Certificate Transparency (CT) is a still-evolving technology for detecting incorrectly issued certificates on the web. It’s cool and interesting, but complicated. I’ve given talks about CT, I’ve worked on Chrome’s CT implementation, and I’m actively involved in tackling ongoing deployment challenges – even so, I still sometimes lose track of how the pieces fit together. I find it easy to forget how the system defends against particular attacks, or what the purpose of some particular mechanism is.
A Massive Leak
“Memory leaks are impossible in a garbage collected language!” is one of my favorite lies. It feels true, but it isn’t. Sure, it’s much harder to make them, and they’re usually much easier to track down, but you can still create a memory leak. Most times, it’s when you create objects, dump them into a data structure, and never empty that data structure. Usually, it’s just a matter of finding out what object references are still being held. Usually.
A few months ago, I discovered a new variation on that theme. I was working on a C# application that was leaking memory faster than bad waterway engineering in the Imperial Valley.
How CDNs Generate Certificates
Obviously, to do stuff like this, you need to generate certificates. The reasonable way to do that in 2020 is with LetsEncrypt. We do that for our users automatically, but “it just works” makes for a pretty boring writeup, so let’s see how complicated and meandering I can make this.
It’s time to talk about certificate infrastructure.
Path Building vs Path Verifying: Implementation Showdown
In my previous post, I talked about what the issue with Sectigo’s expired root was, from the perspective of the PKI graph, and talked a bit about what makes a good certificate verifier implementation. Unfortunately, despite browsers and commercial OSes mostly handling this issue, the sheer variety of open-source implementations means that there’s a number of not-so-good verifiers out there.
In this post, I’ll dig in a little deeper, looking at specific implementations, and talking about how their strategies either lead to this issue, or avoided this issue but will lead to other issues.
It’s pretty much all terrible, except the parts that are extremely terrible.
Wireless is a trap
I used to be an anti-wire crusader. I hated the clutter of cables, and my tendency to unconsciously chew on them if they got anywhere near my face. But running into bug after tricky wireless bug—mostly while trying to make my video calls work better—I’ve apostasized. The more I’ve learned about wifi, Bluetooth and related protocols, the more I’m convinced that they’re often worse, on net, than wires.
This starts off with the usual, but then there’s some real wtf.
Qt included a component which would poll for networks every 30 seconds whenever a “network access manager” was instantiated, causing pretty much any Qt app using the network to degrade your wifi for ~5 out of every 30 seconds. There were already multiple bug reports for this issue, one of which was declared “closed” by an engineer because they allowed users to use an environment variable to disable the polling.
Fixing the Breakage from the AddTrust External CA Root Expiration
A lot of stuff on the Internet is currently broken on account of a Sectigo root certificate expiring at 10:48:38 UTC today. Generally speaking, this is affecting older, non-browser clients (notably OpenSSL 1.0.x) which talk to TLS servers which serve a Sectigo certificate chain ending in the expired certificate. See also this Twitter thread by Ryan Sleevi.
Vulnerabilities! We’ve got vulnerabilities here! … See? Nobody cares.
Jurassic Park is often (mistakenly) left out of the hacker movie canon. It clearly demonstrated the risk of an insider attack on control systems (Velociraptor rampage, amongst other tragedies…) nearly a decade ahead of the Maroochy sewage incident, it’s the first film I know of with a digital troll (“ah, ah, ah, you didn’t say the magic word!”), and Samuel L. Jackson correctly assesses the possible consequence of a hard reset (namely, everyone dying), resulting in his legendary “Hold on to your butts”. The quotable mayhem is seeded early in the film, when biotech spy Lewis Dodgson gives a sack of money to InGen’s Dennis Nedry to steal some dino DNA. Dodgson’s caricatured OPSEC (complete with trilby and dark glasses) is mocked by Nedry shouting, “Dodgson! Dodgson! We’ve got Dodgson here! See, nobody cares…” Three decades later, this quote still comes to mind* whenever conventional wisdom doesn’t seem to square with observed reality, and today we’re going to apply it to the oft-maligned world of Industrial Control System (ICS) security.
Three bugs in the Go MySQL Driver
Adding to this challenge, authzd is deployed to our Kubernetes clusters, where we’ve been experiencing issues with high latencies when opening new TCP connections, something that particularly affects the pooling of connections in the Go MySQL driver. One of the most dangerous lies that programmers tell themselves is that the network is reliable, because, well, most of the time the network is reliable. But when it gets slow or spotty, that’s when things start breaking, and we get to find out the underlying issues in the libraries we take for granted.
Good walkthrough of dealing with some unfriendly bugs.
Why is This Website Port Scanning me?
Recently, I was tipped off about certain sites performing localhost port scans against visitors, presumably as part of a user fingerprinting and tracking or bot detection. This didn’t sit well with me, so I went about investigating the practice, and it seems many sites are port scanning visitors for dubious reasons.
The case of the missing DNS packets
Troubleshooting is both a science and an art. The first step is to make a hypothesis about why something is behaving in an unexpected way, and then prove whether or not the hypothesis is correct. But before you can formulate a hypothesis, you first need to clearly identify the problem, and express it with precision. If the issue is too vague, then you need to brainstorm in order to narrow down the problem—this is where the “artistic” part of the process comes in.