Proposal: Register-based Go calling convention
We propose switching the Go ABI from its current stack-based calling convention to a register-based calling convention. Preliminary experiments indicate this will achieve at least a 5–10% throughput improvement across a range of applications. This will remain backwards compatible with existing assembly code that assumes Go’s current stack-based calling convention through Go’s multiple ABI mechanism.
This also presents a very nice overview of existing calling conventions.
Ensmallening Go binaries by prohibiting comparisons
In this post I’ll dig into what equality, in the context of a Go program, means and why changes like this have a measurable impact on the size of a Go program.
Addendum: thanks to Brad’s prodding, Go 1.15 already has a bunch of improvements by Cherry Zhang and Keith Randall that fix the most egregious of the failures to eliminate unnecessary equality and hash functions (although I suspect it was also to avoid the proliferation of this class of CLs).
firefox's low-latency webassembly compiler
The goals of high throughput and low latency conflict with each other. To get best throughput, a compiler needs to spend time on code motion, register allocation, and instruction selection; to get low latency, that’s exactly what a compiler should not do. Web browsers therefore take a two-pronged approach: they have a compiler optimized for throughput, and a compiler optimized for latency. As a WebAssembly file is being downloaded, it is first compiled by the quick-and-dirty low-latency compiler, with the goal of producing machine code as soon as possible. After that “baseline” compiler has run, the “optimizing” compiler works in the background to produce high-throughput code. The optimizing compiler can take more time because it runs on a separate thread. When the optimizing compiler is done, it replaces the baseline code. (The actual heuristics about whether to do baseline + optimizing (“tiering“) or just to go straight to the optimizing compiler are a bit hairy, but this is a summary.)
This article is about the WebAssembly baseline compiler in Firefox. It’s a surprising bit of code and I learned a few things from it.
Precision Opportunities for Demanded Bits in LLVM
A fun thing that optimizing compilers can do is to automatically infer when the full power of some operation is not needed, in which case it may be that the operation can be replaced by a cheaper one. This happens in several different ways; the method we care about today is driven by LLVM’s demanded bits static analysis, whose purpose is to prove that certain bits of an SSA value are irrelevant. For example, if a 32-bit value is truncated to 8 bits, and if that value has no other uses, then the 24 high bits of the original value clearly do not matter.
Speculative Load Hardening
While several approaches are being actively pursued to mitigate specific branches and/or loads inside especially risky software (most notably various OS kernels), these approaches require manual and/or static analysis aided auditing of code and explicit source changes to apply the mitigation. They are unlikely to scale well to large applications. We are proposing a comprehensive mitigation approach that would apply automatically across an entire program rather than through manual changes to the code. While this is likely to have a high performance cost, some applications may be in a good position to take this performance / security tradeoff.
Unintuitive JSON Parsing
Based on the title, could be just about anything...
The parser will not complain about leading zeros because JSON has no concept of leading zeros.
Unexpectedly a parse error, not a lex error.
Introducing Glush: a robust, human readable, top-down parser compiler
It’s been 45 years since Stephen Johnson wrote Yacc (Yet another compiler-compiler), a parser generator that made it possible for anyone to write fast, efficient parsers. Yacc, and its many derivatives, quickly became popular and were included in many Unix distributions. You would imagine that in 45 years we would have further perfected the art of creating parsers and would have standardized on a single tool. A lot of progress has been made, but there are still annoyances and problems affecting every tool out there.
This is great, even just for the overview of parsing.
The CYK algorithm (named after Cocke–Younger–Kasami) is in my opinion of great theoretical importance when it comes to parsing context-free grammars. CYK will parse all context-free parsers in O(n3), including the “simple” grammars that LL/LR can parse in linear time. It accomplishes this by converting parsing into a different problem: CYK shows that parsing context-free languages is equivalent to doing a boolean matrix multiplication. Matrix multiplication can be done naively in cubic time, and as such parsing context-free languages can be done in cubic time. It’s a very satisfying theoretical result, and the actual algorithm is small and easy to understand.
Pointer authentication is a technology which offers strong probabilistic protection against exploiting a broad class of memory bugs to take control of program execution. When adopted consistently in a language ABI, it provides a form of relatively fine-grained control flow integrity (CFI) check that resists both return-oriented programming (ROP) and jump-oriented programming (JOP) attacks.
While pointer authentication can be implemented purely in software, direct hardware support (e.g. as provided by ARMv8.3) can dramatically lower the execution speed and code size costs. Similarly, while pointer authentication can be implemented on any architecture, taking advantage of the (typically) excess addressing range of a target with 64-bit pointers minimizes the impact on memory performance and can allow interoperation with existing code (by disabling pointer authentication dynamically). This document will generally attempt to present the pointer authentication feature independent of any hardware implementation or ABI. Considerations that are implementation-specific are clearly identified throughout.
So We Don'T Have A Solution For Catalina...Yet
With the release of macOS 10.15 (Catalina), Apple has dropped support for running 32-bit executables and removed the 32-bit versions of system frameworks and libraries. Most Windows applications our users run with CrossOver are 32-bit and CrossOver uses a 32-bit Mac executable, system frameworks, and libraries to run them. This will break with Catalina.
And then comes the fun part:
We have built a modified version of the standard C language compiler for macOS, Clang, to automate many of the changes we need to make to Wine’s behavior without pervasive changes to Wine’s source code.
First, our version of Clang understands both 32- and 64-bit pointers. We are able to control from a broad level down to a detailed level which pointers in Wine’s source code need to be 32-bit and which 64-bit. Any code which substitutes for Windows at the interface with the Windows app has to use 32-bit pointers. On the other hand, the interfaces to the system libraries are always 64-bit.
Chunking Optimizations: Let the Knife Do the Work
“Letting the knife do the work” means writing a correct program and lifting unnecessary constraints so that the compiler can use whatever chunk size is appropriate for the target.
Tearing apart printf()
If ‘Hello World’ is the first program for C students, then printf() is probably the first function. I’ve had to answer questions about printf() many times over the years, so I’ve finally set aside time for an informal writeup.
This wild goose chase is not only a great learning experience, but also an interesting test for the dedicated beginner. Will they come back with an answer? If so, how detailed is it? What IS a good answer?
How Swift Achieved Dynamic Linking Where Rust Couldn't
For those who don’t follow Swift’s development, ABI stability has been one of its most ambitious projects and possibly it’s defining feature, and it finally shipped in Swift 5. The result is something I find endlessly fascinating, because I think Swift has pushed the notion of ABI stability farther than any language without much compromise.
AddressSanitizer (ASan) for Windows with MSVC
We are pleased to announce AddressSanitizer (ASan) support for the MSVC toolset. ASan is a fast memory error detector that can find runtime memory issues such as use-after-free and perform out of bounds checks. Support for sanitizers has been one of our more popular suggestions on Developer Community, and we can now say that we have an experience for ASan on Windows, in addition to our existing support for Linux projects.
MSVC support for ASan is available in our second Preview release of Visual Studio 2019 version 16.4.
Most functional compiler
One-letter variable names abound in IOCCC entries, and for good reason. These tiny pieces of confetti are hard to read, and leave room for more code. Then why not go further and use zero-letter variable names? That is, tacit programming or point-free style.
I had been playing with an algorithm devised by Oleg Kiselyov that effortlessly and efficiently eliminates those pesky variables, leaving behind terms composed from a small set of combinators. No need for lambda lifting or supercombinators.
By adding a handful of lines of mostly parsing code, we get a Haskell compiler, or rather, a compiler that accepts a subset of Haskell sufficiently large to self-host. You might say I wrote a tool for this contest, then ran it on itself to make an entry for it.
And more: https://www.ioccc.org/years.html#2019
Python is not context free
The interesting thing about Python’s syntax is, of course, its use of indentation to indicate program structure.
A nice review of interaction between lexing and parsing.
"Stubs" in the .NET Runtime
‘Stubs’, as they’re known in the runtime (sometimes ‘Thunks’), provide a level of indirection throughout the source code, there’s almost 500 mentions of them!
This post will explore what they are, how they work and why they’re needed.
Closing the gap: cross-language LTO between Rust and C/C++
Link time optimization (LTO) is LLVM’s way of implementing whole-program optimization. Cross-language LTO is a new feature in the Rust compiler that enables LLVM’s link time optimization to be performed across a mixed C/C++/Rust codebase. It is also a feature that beautifully combines two respective strengths of the Rust programming language and the LLVM compiler platform:
Security flaws caused by compiler optimizations
I think I would prefer “enabled” to “caused” since the flaw is in the code. A collection of bug patterns worth considering.
DROB (Dynamic Rewriter and Optimizer of Binary code)
This library implements application-guided rewriting of binary functions at runtime. Binary functions can be optimized and specialized based on runtime information. In contrast to transparent binary optimization, only selected binary functions are rewritten. No metadata (e.g. debug information) is required.
Yaegi is Another Elegant Go Interpreter
Despite being static and strongly typed, Go feels like a dynamic language. The standard library even provides the Go parser used by the compiler and the reflection system to interact dynamically with the runtime. So why not just take the last logical step and finally build a complete Go interpreter?
Programming languages for high level scripting and for low level implementation are usually different. This time, with Go, we have an opportunity to unify both. Imagine all the C/C++/Java fast libraries for Python being written in Python instead. That’s what Yaegi is for Go (or, the reverse). No burden due to syntax switch, no need to rewrite or modify slow code to make it fast, and full access to goroutines, channels, type safety, etc. at script level.