The Hidden Number Problem
> The Hidden Number Problem (HNP) is a problem that poses the question: Are the most signficant bits of a Diffie-Hellman shared key as hard to compute as the entire secret? The original problem was defined in the paper “Hardness of computing the most significant bits of secret keys in Diffie-Hellman and related schemes” by Dan Boneh and Ramarathnam Venkatesan.
> In this paper Boneh and Venkatesan demonstrate that a bounded number of most signifcant bits of a shared secret are as hard to compute as the entire secret itself. They also demonstrate an efficient algorithm for recovering secrets given a significant enough bit leakage. This notebook walks through some of the paper and demonstrates some of the results.
Precision Opportunities for Demanded Bits in LLVM
> A fun thing that optimizing compilers can do is to automatically infer when the full power of some operation is not needed, in which case it may be that the operation can be replaced by a cheaper one. This happens in several different ways; the method we care about today is driven by LLVM’s demanded bits static analysis, whose purpose is to prove that certain bits of an SSA value are irrelevant. For example, if a 32-bit value is truncated to 8 bits, and if that value has no other uses, then the 24 high bits of the original value clearly do not matter.
The Soundness Pledge
> This post is an opportunity to share some thoughts I’ve had about soundness, Rust, and open source community.
> I believe one of the most important contributions of Rust is the cultural ideal of perfect soundness: that code using a sound library, no matter how devious, is unable to trigger undefined behavior (which is often thought of in terms of crashes but can be far more insidious). Any deviation from this is a bug. The Rust language itself clearly subscribes to this ideal, even as it sometimes falls short of attaining it (at this writing, there are 44 I-unsound bugs, the oldest of which is more than 6 years old).
Gathering Intel on Intel AVX-512 Transitions
> This is a post about AVX and AVX-512 related frequency scaling. Now, something more than nothing has been written about this already, including cautionary tales of performance loss and some broad guidelines, so do we really need to add to the pile?
> Perhaps not, but I’m doing it anyway. My angle is a lower level look, almost microscopic really, at the specific transition behaviors. One would hope that this will lead to specific, quantitative advice about exactly when various instruction types are likely to pay off, but (spoiler) I didn’t make it there in this post.
How Go's net.DialContext() stops things when the context is cancelled
> When I started looking into the relevant standard library code I expected to find that things like net.Dialer.DialContext() had special hooks into the runtime’s network poller (netpoller) to do this. This turns out to not be the case; instead dialing uses an interesting and elegant approach that’s open to everyone doing network IO.
> In order to abort an outstanding dial operation if the context is cancelled, the net package simply sets an expired (write) deadline.
Autocomplete as an interface
> I’m used to thinking of autocomplete as a convenience tool that saves you a few keystrokes, but it’s much more than that. Good autocompletion has become a driving factor in which tools I choose. If I were writing a sophisticated user interface today—say, a programming language or a complex application—autocompletion is one of the primary constraints I would design it around. It’s that important.
Real-Time Ray-Tracing in WebGPU
> Note that RTX is not available officially for WebGPU (yet?) and is only available for the Node bindings for WebGPU. Recently I began adapting an unofficial Ray-Tracing extension for Dawn, which is the WebGPU implementation for Chromium. The Ray-Tracing extension is only implemented into the Vulkan backend so far, but a D3D12 implementation is on the Roadmap. You can find my Dawn Fork with Ray-Tracing capabilities here.
> Now let me introduce you to the ideas and concepts of the Ray-Tracing extension.
Mercurial's Journey to and Reflections on Python 3
> Speaking as a maintainer of Mercurial and an avid user of Python, I feel like the experience of making Mercurial work with Python 3 is worth sharing because there are a number of lessons to be learned.
> This post is logically divided into two sections: a mostly factual recount of Mercurial’s Python 3 porting effort and a more opinionated commentary of the transition to Python 3 and the Python language ecosystem as a whole. Those who don’t care about the mechanics of porting a large Python project to Python 3 may want to skip the next section or two
Translating Quake 3 into Rust
> The Rust-loving team at Immunant has been hard at work on C2Rust, a migration framework that takes the drudgery out of migrating to Rust. Our goal is to make safety improvements to the translated Rust automatically where we can, and help the programmer do the same where we cannot. First, however, we have to build a rock-solid translator that gets people up and running in Rust. Testing on small CLI programs gets old eventually, so we decided to try translating Quake 3 into Rust. After a couple of days, we were likely the first people to ever play Quake3 in Rust!
The Polygons Of Another World
> An other choice would be Eric Chahi’s 1991 critically acclaimed” title “Another World”, better known in North America as “Out Of This World” which also happens to be ubiquitous. I would argue it is in fact more interesting to study than DOOM because of its polygon based graphics which are suitable to wild optimizations. In some cases, clever tricks allowed Another World to run on hardware built up to five years prior to the game release.
> This series is a journey through the video-games hardware of the early 90s. From the Amiga 500, Atari ST, IBM PC, Super Nintendo, up to the Sega Genesis. For each machine, I attempted to discover how Another World was implemented. I found an environment made rich by its diversity where the now ubiquitous CPU/GPU did not exist yet. In the process, I discovered the untold stories of seemingly impossible problems heroically solved by lone programmers.
Purgeable Memory Allocations for Linux
> It allocates anonymous pages using mmap(2). When the allocation is “unlocked” — i.e. the process isn’t actively using it — its pages are marked with MADV_FREE so that the kernel can reclaim them at any time. To lock the allocation so that the process can safely make use of them, the MADV_FREE is canceled. This is all a little trickier than it sounds, and that’s the subject of this article.
CPU Introspection: Intel Load Port Snooping
> We’re going to go into a unique technique for observing and sequencing all load port traffic on Intel processors. By using a CPU vulnerability from the MDS set of vulnerabilities, specifically multi-architectural load port data sampling (MLPDS, CVE-2018-12127), we are able to observe values which fly by on the load ports. Since (to my knowledge) all loads must end up going through load ports, regardless of requestor, origin, or caching, this means in theory, all contents of loads ever performed can be observed. By using a creative scanning technique we’re able to not only view “random” loads as they go by, but sequence loads to determine the ordering and timing of them.
> We’ll go through some examples demonstrating that this technique can be used to view all loads as they are performed on a cycle-by-cycle basis. We’ll look into an interesting case of the micro-architecture updating accessed and dirty bits using a microcode assist. These are invisible loads dispatched on the CPU on behalf of the user when a page is accessed for the first time.
Xor Filters: Faster and Smaller Than Bloom Filters
> Among other alternatives, Fan et al. introduced Cuckoo filters which use less space and are faster than Bloom filters. While implementing a Bloom filter is a relatively simple exercise, Cuckoo filters require a bit more engineering.
> Could we do even better while limiting the code to something you can hold in your head?
> It turns out that you can with xor filters. We just published a paper called Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters that will appear in the Journal of Experimental Algorithmics.
Introducing Glush: a robust, human readable, top-down parser compiler
> It’s been 45 years since Stephen Johnson wrote Yacc (Yet another compiler-compiler), a parser generator that made it possible for anyone to write fast, efficient parsers. Yacc, and its many derivatives, quickly became popular and were included in many Unix distributions. You would imagine that in 45 years we would have further perfected the art of creating parsers and would have standardized on a single tool. A lot of progress has been made, but there are still annoyances and problems affecting every tool out there.
This is great, even just for the overview of parsing.
> The CYK algorithm (named after Cocke–Younger–Kasami) is in my opinion of great theoretical importance when it comes to parsing context-free grammars. CYK will parse all context-free parsers in O(n3), including the “simple” grammars that LL/LR can parse in linear time. It accomplishes this by converting parsing into a different problem: CYK shows that parsing context-free languages is equivalent to doing a boolean matrix multiplication. Matrix multiplication can be done naively in cubic time, and as such parsing context-free languages can be done in cubic time. It’s a very satisfying theoretical result, and the actual algorithm is small and easy to understand.
OpenSSH Key Shielding
> On June 21, 2019, support for SSH key shielding was introduced into the OpenBSD tree, from which the OpenSSH releases are derived. SSH key shielding is a measure intended to protect private keys in RAM against attacks that abuse bugs in speculative execution that current CPUs exhibit. This functionality has been part of OpenSSH since the 8.1 release. SSH private keys are now being held in memory in a shielded form; keys are only unshielded when they are used and re‐shielded as soon as they are no longer in active use. When a key is shielded, it is encrypted in memory with AES‐256‐CTR; this is how it works:
Don't use booleans
> Use enums instead.
Stop worrying about blocking: the new async-std runtime, inspired by Go
> async-std is a mature and stable port of the Rust standard library to its new async/await world, designed to make async programming easy, efficient, worry- and error-free.
> Today, we’re introducing the new async-std runtime. It features a lot of improvements, but the main news is that it eliminates a major source of bugs and performance issues in concurrent programs: accidental blocking.
A new cycle-stepped 6502 CPU emulator
> I wrote a new version of my 6502/6510 emulator in the last weeks which can be stepped forward in clock cycles instead of full instructions.
> Pointer authentication is a technology which offers strong probabilistic protection against exploiting a broad class of memory bugs to take control of program execution. When adopted consistently in a language ABI, it provides a form of relatively fine-grained control flow integrity (CFI) check that resists both return-oriented programming (ROP) and jump-oriented programming (JOP) attacks.
> While pointer authentication can be implemented purely in software, direct hardware support (e.g. as provided by ARMv8.3) can dramatically lower the execution speed and code size costs. Similarly, while pointer authentication can be implemented on any architecture, taking advantage of the (typically) excess addressing range of a target with 64-bit pointers minimizes the impact on memory performance and can allow interoperation with existing code (by disabling pointer authentication dynamically). This document will generally attempt to present the pointer authentication feature independent of any hardware implementation or ABI. Considerations that are implementation-specific are clearly identified throughout.
Introducing sqlc - Compile SQL queries to type-safe Go
> sqlc accomplishes all of this by taking a fundamentally different approach: compiling SQL into fully type-safe, idiomatic Go code.