Snap: a microkernel approach to host networking
> This paper describes the networking stack, Snap, that has been running in production at Google for the last three years+. It’s been clear for a while that software designed explicitly for the data center environment will increasingly want/need to make different design trade-offs to e.g. general-purpose systems software that you might install on your own machines. But wow, I didn’t think we’d be at the point yet where we’d be abandoning TCP/IP! You need a lot of software engineers and the willingness to rewrite a lot of software to entertain that idea.
> The initial distribution of names was via flooding of a common copy of the hosts file. Pretty obviously this does not scale, and the frustrations with this naming model drove much of the design of the DNS. The DNS is a hierarchal name structure, where every nodal point in the namespace can also be a delegation point. A delegation is completely autonomous, in that an entity who is delegated control of a nodal point in the namespace can populate it without reference to any other delegated operator of any other nodal point. The implementation of the matching namespace as a database follows the same structure, in that an authoritative server is responsible for answering all queries that relate to this nodal point in the database. Client systems that query these authoritative services also use a form of hierarchy, but for somewhat different reasons. End systems are usually equipped with a stub resolver service that can be queried by applications. They typically pass all queries to a recursive resolver. The recursive resolver takes on the role of traversing the database structure, resolving names by exposing the delegation points and discovering the authoritative servers for each of these zone delegations. It does so by using the same DNS protocol query and response mechanism as it uses once it finds the terminal zone that can provide the desired answer.
> Ever since then we’ve been testing out these reasons and discovering how we can break these assumptions!
U2F support in OpenSSH
Real-world measurements of structured-lattices and supersingular isogenies in TLS
> This is the third in a series of posts about running experiments on post-quantum confidentiality in TLS. The first detailed experiments that measured the estimated network overhead of three families of post-quantum key exchanges. The second detailed the choices behind a specific structured-lattice scheme. This one gives details of a full, end-to-end measurement of that scheme and a supersingular isogeny scheme, SIKE/p434. This was done in collaboration with Cloudflare, who integrated Microsoft’s SIKE code into BoringSSL for the tests, and ran the server-side of the experiment.
> Because optimised assembly implementations are labour-intensive to write, they were only available/written for AArch64 and x86-64. Because SIKE is computationally expensive, it wasn’t feasible to enable it without an assembly implementation, thus only AArch64 and x86-64 clients were included in the experiment and ARMv7 and x86 clients did not contribute to the results even if they were assigned to one of the experiment groups.
Migrating From Cloudflare
> Okay so here’s the thing: Cloudflare isn’t just the CDN provider for the instance, it is also the domain’s nameserver. That means that it holds all the DNS records that point mastodon.technology to the various IP addresses used for HTTP requests, email, and even public DKIM keys for mail server verification. These DNS settings are really, really important. If they get messed up, everything about the instance can break.
> So I split up the migration from Cloudflare to BunnyCDN into two phases: first migrate the CDN provider, and then migrate the DNS provider. Getting this right is really important, and I mostly did okay, but hopefully you can learn from my experiences.
Fixing up KA9Q-unix, or "neck deep in 30 year old codebases.."
> Anyhoo, I’ve finally been mucking around with AX.25 packet radio. I’ve been wanting to do this since I was a teenager and found out about its existence, but back in high school and .. well, until a few years ago really .. I didn’t have my amateur radio licence. But, now I do, and I’ve done a bunch of other stuff with a bunch of other radios. The main stumbling block? All my devices are either Apple products or run FreeBSD - and none of them have useful AX.25 stacks. The main stacks of choice these days run on Linux, Windows or are a full hardware TNC.
Looking back at the Snowden revelations
> It’s no coincidence that this is a cryptography blog, which means that I’m not concerned with the same things as the general public. That is, I’m not terribly interested in debating the value of whistleblower laws (for some of that, see this excellent Twitter thread by Jake Williams). Instead, when it comes to Snowden’s leaks, I think the question we should be asking ourselves is very different. Namely:
> What did the Snowden leaks tell us about modern surveillance capabilities? And what did we learn about our ability to defend against them?
HTTP Mock – Intercept, debug and mock HTTP
> HTTP Mock is the latest tool in HTTP Toolkit, a suite of beautiful & open-source tools for debugging, testing and building with HTTP(S), on Windows, Linux & Mac.
This does look useful.
Interesting implementation note: https://news.ycombinator.com/item?id=21072087
> The trick is that it starts the application to be intercepted for you, so it can control it a little. It then does some magic to get that specific instance of the application to trust the certificate. There’s a lot going on there, but as an example: Chrome has a --ignore-certificate-errors-spki-list to inject the hashes of extra CAs that can be trusted in this specific Chrome instance. When HTTP Toolkit starts a Chrome process, it adds that command line option, with the hash of your locally generated CA.
The Synchronization of Periodic Routing Messages
> The paper considers a network with many apparently-independent periodic processes and discusses one method by which these processes can inadvertently become synchronized. In particular, we study the synchronization of periodic routing messages, and offer guidelines on how to avoid inadvertent synchronization. Using simulations and analysis, we study the process of synchronization and show that the transition from unsynchronized to synchronized traffic is not one of gradual degradation but is instead a very abrupt ‘phase transition’: in general, the addition of a single router will convert a completely unsynchronized traffic stream into a completely synchronized one. We show that synchronization can be avoided by the addition of randomization to the traffic sources and quantify how much randomization is necessary. In addition, we argue that the inadvertent synchronization of periodic processes is likely to become an increasing problem in computer networks.
Public Suffix List Problems
> This is a collection of thoughts from a maintainer of the Public Suffix List (PSL) about the importance of avoiding new Web Platform features, security, or privacy boundaries assuming the PSL is a good starting point.
> Equally terrifying, however, is how many providers only discovered the existence of the PSL once LE was using it to rate limit - meaning that their users were able to influence cookies and other storage without restriction, until an incidental change (wanting to get more certs) caused the server operator to realize.
In Memoriam: J. C. R. Licklider
Two papers. Man-Computer Symbiosis and The Computer as a Communication Device.
The first argues for interactive systems. The computer can’t be an extension of our mind if it’s not responsive.
The second is a vision for networked communications. It sounds a lot like today, but more optimistic. Where did we go wrong?
HTTP/2 Denial of Service Advisory
> Netflix has discovered several resource exhaustion vectors affecting a variety of third-party HTTP/2 implementations. These attack vectors can be used to launch DoS attacks against servers that support HTTP/2 communication.
Son of Slowloris returns!
> While this added complexity enables some exciting new features, it also raises implementation questions.
Here comes trouble...
> The Security Considerations section of RFC 7540 (see Section 10.5) addresses some of this in a general way. However, unlike the expected “normal” behavior—which is well-documented and which implementations seem to follow very closely—the algorithms and mechanisms for detecting and mitigating “abnormal” behavior are significantly more vague and left as an exercise for the implementer. From a review of various software packages, it appears that this has led to a variety of implementations with a variety of good ideas, but also some weaknesses.
Spying on HTTPS
> While most users probably would have no idea what to make of this, I happened to know what it means– Chrome is warning me that the system configuration has instructed it to leak the secret keys it uses to encrypt and decrypt HTTPS traffic to a stream on the local computer.
Beginner Problems With TCP & The socket Module in Python
> Your operating system will deceive you and re-assemble the string you sock.recv(n) differently from the ones you sock.send(data). But here is the deceptive part. It will work sometimes, but not always. These bugs will be difficult to chase. If you have two programs communicating over TCP via the loopback device in your operating system (the virtual network device with IP 127.0.0.1), then the data does not leave your RAM, and packets are never fragmented to fit into the maximum size of an Ethernet frame or 802.11 WLAN transmission. The data arrives immediately because it’s already there, and the other side gets to read via sock.recv(n) exactly the bytestring you sent over sock.send(data). If you connect to localhost via IPv6, the maximum packet size is 64 kB, and all the packets are already there to be reassembled into a bytestream immediately! But when you try to run the same code over the real Internet, with lag and packet loss, or when you are unlucky with the multitasking/scheduling of your OS, you will either get more data than you expected, leftover data from the last sock.send(data), or incomplete data.
Not strictly a python problem, either.
One core problem with DNSSEC
> One fundamental problem of DNSSEC today is that it suffers from the false positive problem, the same one that security alerts suffer from. In practice today, for almost all people almost all of the time, a DNSSEC failure is not a genuine attack; it is a configuration mistake, and the configuration mistake is almost never on the side making the DNS query. This means that almost all of the time, DNSSEC acts by stopping you from doing something safe that you want to do and further, you can’t fix the DNSSEC problem except by turning off DNSSEC, because it’s someone else’s mistake (in configuration, in operation, or in whatever).
HTTP Desync Attacks: Request Smuggling Reborn
> HTTP requests are traditionally viewed as isolated, standalone entities. In this paper, I’ll explore forgotten techniques for remote, unauthenticated attackers to smash through this isolation and splice their requests into others, through which I was able to play puppeteer with the web infrastructure of numerous commercial and military systems, rain exploits on their visitors, and harvest over $70k in bug bounties.
> The protocol is extremely simple - HTTP requests are simply placed back to back, and the server parses headers to work out where each one ends and the next one starts. This is often confused with HTTP pipelining, which is a rarer subtype that’s not required for the attacks described in this paper. By itself, this is harmless. However, modern websites are composed of chains of systems, all talking over HTTP. This multi-tiered architecture takes HTTP requests from multiple different users and routes them over a single TCP/TLS connection:
The Fully Remote Attack Surface of the iPhone
> We investigated the remote attack surface of the iPhone, and reviewed SMS, MMS, VVM, Email and iMessage. Several tools which can be used to further test these attack surfaces were released. We reported a total of 10 vulnerabilities, all of which have since been fixed. The majority of vulnerabilities occurred in iMessage due to its broad and difficult to enumerate attack surface. Most of this attack surface is not part of normal use, and does not have any benefit to users. Visual Voicemail also had a large and unintuitive attack surface that likely led to a single serious vulnerability being reported in it. Overall, the number and severity of the remote vulnerabilities we found was substantial. Reducing the remote attack surface of the iPhone would likely improve its security.
Better Encrypted Group Chat
> End-to-end encrypted group messaging is also a hard problem to solve. Existing solutions such as Signal, WhatsApp, and iMessage have inherent problems with scaling, which I’ll discuss in detail, that make it infeasible to conduct group chats of more than a few hundred people. The Message Layer Security (MLS) protocol aims to make end-to-end encrypted group chat more efficient while still providing security guarantees like forward secrecy and post-compromise security.
> The primary contribution of molasses has been in detecting errors in the specification and other implementations through unit and interoperability testing. Molasses implements most of MLS draft 6. Why not all of draft 6? There was an error in the spec that made it impossible for members to be added to any group. This broke all the unit tests that create non-trivial groups. Errors like this are hard to catch just by reading the spec; they require some amount of automated digging. Once they are found, the necessary revisions tend to be pretty obvious, and they are swiftly incorporated into the subsequent draft.
Nice work and a very nice explanation of the protocol.
Why file and directory operations are synchronous in NFS
> One simple answer is that the Unix API provides no way to report delayed errors for file and directory operations. If you write() data, it is an accepted part of the Unix API that errors stemming from that write may not be reported until much later, such as when you close() the file. This includes not just ‘IO error’ type errors, but also problems such as ‘out of space’ or ‘disk quota exceeded’; they may only appear and become definite when the system forces the data to be written out. However, there’s no equivalent of close() for things like removing files or renaming them, or making directories; the Unix API assumes that these either succeed or fail on the spot.
Some items from my "reliability list"
> I’ll list some of them here and some of the thinking behind them. Just about everything here has happened at some point in time, and probably has happened more than once... way more than once.
I like a lot of this. Very much.
> On the other hand, if you only need 53 bits of your 64 bit numbers, and enjoy blowing CPU on ridiculously inefficient marshaling and unmarshaling steps, hey, it’s your funeral.