An instruction oddity in the ppc64 (PowerPC 64-bit) architecture
https://utcc.utoronto.ca/~cks/space/blog/tech/PowerPCInstructionOddity [utcc.utoronto.ca]
2023-01-21 19:45
tags:
bugfix
compiler
cpu
programming
turtles
As Raymond Chen notes, ‘or rd, ra, ra’ has the effect of ‘move ra to rd’. Moving a register to itself is a NOP, but several Power versions (the Go code’s comment says Power8, 9, and 10) overload this particular version of a NOP (and some others) to signal that the priority of your hardware thread should be changed by the CPU; in the specific case of ‘or r1, r1, r1’ it drops you to low priority. That leaves us with the mystery of why such an instruction would be used by a compiler, instead of the official NOP (per Raymond Chen, this is ‘or r0, r0, 0’).
As covered in the specific ppc64 diff in the change that introduced this issue, Go wanted to artificially mark a particular runtime function this way (see CL 425396 and Go issue #54332 for more). To do this it needed to touch the stack pointer in a harmless way, which would trigger the toolchain’s weirdness detector. On ppc64, the stack pointer is in r1. So the obvious and natural thing to do is to move r1 to itself, which encodes as ‘or r1, r1, r1’, and which then triggers this special architectural behavior of lowering the priority of that hardware thread. Oops.
https://devblogs.microsoft.com/oldnewthing/20180809-00/?p=99455
https://github.com/golang/go/issues/54332
Building the fastest Lua interpreter.. automatically!
https://sillycross.github.io/2022/11/22/2022-11-22/ [sillycross.github.io]
2022-11-22 23:10
tags:
compiler
jit
lua
perf
programming
I have been working on a research project to make writing VMs easier. The idea arises from the following observation: writing a naive interpreter is not hard (just write a big switch-case), but writing a good interpreter (or JIT compiler) is hard, as it unavoidably involves hand-coding assembly. So why can’t we implement a special compiler to automatically generate a high-performance interpreter (and even the JIT) from “the big switch-case”, or more formally, a semantical description of what each bytecode does?
source: HN
The Applesoft Compiler (TASC): We have the source code, in a sense
https://devblogs.microsoft.com/oldnewthing/20220419-00/?p=106496 [devblogs.microsoft.com]
2022-04-19 22:55
tags:
compiler
mac
programming
retro
Chaining was a common technique when your program got too large to fit into memory all at once, so you broke it into multiple programs that each handed off control to each other.
As the author added features, he kept hitting the Apple ][‘s 48KB RAM limit and was forced to delete all the comments from the code, and when that wasn’t enough, he resorted to shortening all the important variable names to one character.
How to speed up the Rust compiler in April 2022
https://nnethercote.github.io/2022/04/12/how-to-speed-up-the-rust-compiler-in-april-2022.html [nnethercote.github.io]
2022-04-13 20:08
tags:
compiler
development
perf
rust
update
In my last post I introduced the Compiler performance roadmap for 2022. Let’s see how things are progressing.
Along the way I had to undo some optimizations I had added to this code a couple of years ago. Those optimizations turned out to be useful for one kind of expensive macro (with many rules but no metavariables) present in the html5ever benchmark. But such macros aren’t common in practice, and these optimizations were unhelpful for more typical expensive macros, which are recursive, have fewer rules, and use metavariables. This shows the value of a good benchmark suite.
source: L
Generics can make your Go code slower
https://planetscale.com/blog/generics-can-make-your-go-code-slower [planetscale.com]
2022-03-30 18:46
tags:
article
compiler
go
perf
programming
type-system
Go 1.18 is here, and with it, the first release of the long-awaited implementation of Generics is finally ready for production usage. Generics are a frequently requested feature that has been highly contentious throughout the Go community. On the one side, vocal detractors worry about the added complexity. They fear the inescapable evolution of Go towards either a verbose and Enterprisey Java-lite with Generic Factories or, most terrifyingly, a degenerate HaskellScript that replaces ifs with Monads. In all fairness, both these fears may be overblown. On the other side, proponents of generics believe that they are a critical feature to implement clean and reusable code at scale.
This blog post does not take sides in that debate, or advise where and when to use Generics in Go. Instead, this blog post is about the third side of the generics conundrum: It’s about systems engineers who are not excited about generics per se, but about monomorphization and its performance implications. There are dozens of us! Dozens! And we’re all due for some serious disappointment.
Very thorough.
source: HN
PartialExecuter: Reducing WebAssembly size by exploring all executions in LLVM
https://leaningtech.com/reducing-webassembly-size-by-exploring-all-executions-in-llvm/ [leaningtech.com]
2022-03-16 05:11
tags:
compiler
fuzzing
perf
programming
Partial Executer is a brand-new LLVM optimization pass that uses an Interpreter-like engine to prove some code will never be executed, making it safe to eliminate it.
source: HN
Parsing Protobuf at 2+GB/s: How I Learned To Love Tail Calls in C
https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html [blog.reverberate.org]
2021-04-25 19:54
tags:
c
compiler
perf
programming
While tail calls are usually associated with a functional programming style, I am interested in them purely for performance reasons. It turns out that in some cases we can use tail calls to get better code out of the compiler than would otherwise be possible—at least given current compiler technology—without dropping to assembly.
source: HN
Eliminating Data Races in Firefox – A Technical Report
https://hacks.mozilla.org/2021/04/eliminating-data-races-in-firefox-a-technical-report/ [hacks.mozilla.org]
2021-04-07 00:02
tags:
compiler
concurrency
cxx
development
programming
update
We successfully deployed ThreadSanitizer in the Firefox project to eliminate data races in our remaining C/C++ components. In the process, we found several impactful bugs and can safely say that data races are often underestimated in terms of their impact on program correctness. We recommend that all multithreaded C/C++ projects adopt the ThreadSanitizer tool to enhance code quality.
source: HN
Cranelift, Part 3: Correctness in Register Allocation
https://cfallin.org/blog/2021/03/15/cranelift-isel-3/ [cfallin.org]
2021-03-19 23:12
tags:
compiler
compsci
development
fuzzing
jit
programming
rust
testing
In this post, I will cover how we worked to ensure correctness in our register allocator, regalloc.rs, by developing a symbolic checker that uses abstract interpretation to prove correctness for a specific register allocation result. By using this checker as a fuzzing oracle, and driving just the register allocator with a focused fuzzing target, we have been able to uncover some very interesting and subtle bugs, and achieve a fairly high confidence in the allocator’s robustness.
source: HN
Pointers Are Complicated II, or: We need better language specs
https://www.ralfj.de/blog/2020/12/14/provenance.html [www.ralfj.de]
2020-12-14 23:12
tags:
c
compiler
perf
programming
Below, I will show a series of three compiler transformations that each seem “intuitively justified”, but when taken together they lead to a clearly incorrect result. I will use LLVM for these examples, but the goal is not to pick on LLVM—other compilers suffer from similar issues. The goal is to convince you that to build a correct compiler for languages permitting unsafe pointer manipulation such as C, C++, or Rust, we need to take IR semantics (and specifically provenance) more seriously. I use LLVM for the examples because it is particularly easy to study with its single, extensively-documented IR that a lot of infrastructure evolved around. Let’s get started!
source: HN
Rust programming language exploit mitigations
http://rcvalle.blog/2020/09/16/rust-lang-exploit-mitigations/ [rcvalle.blog]
2020-10-02 21:28
tags:
compiler
defense
development
rust
security
This section documents the exploit mitigations applicable to the Rust compiler when building programs for the Linux operating system on the AMD64 architecture and equivalent.
source: L
Zig's New Relationship with LLVM
https://kristoff.it/blog/zig-new-relationship-llvm/ [kristoff.it]
2020-09-30 01:28
tags:
compiler
development
update
zig
In the early days, Zig was but a thin frontend in front of LLVM. This was instrumental for getting started quickly and filling in gaps of Andrew’s knowledge as a compiler developer. Now, the training wheels of the bicycle are coming off, and LLVM is transitioning into an optional component.
The move to a self-hosted compiler for Zig has similar advantages for the core contributors, but it also makes LLVM an optional dependency, increases compilation speed (instead of losing it), and adds an amazing feature for debug builds of your code: incremental compilation with in-place binary patching, another unique Zig feature.
source: L
Proposal: Register-based Go calling convention
https://go.googlesource.com/proposal/+/refs/changes/78/248178/1/design/40724-register-calling.md [go.googlesource.com]
2020-08-17 04:29
tags:
compiler
cpu
go
programming
We propose switching the Go ABI from its current stack-based calling convention to a register-based calling convention. Preliminary experiments indicate this will achieve at least a 5–10% throughput improvement across a range of applications. This will remain backwards compatible with existing assembly code that assumes Go’s current stack-based calling convention through Go’s multiple ABI mechanism.
This also presents a very nice overview of existing calling conventions.
source: L
Ensmallening Go binaries by prohibiting comparisons
https://dave.cheney.net/2020/05/09/ensmallening-go-binaries-by-prohibiting-comparisons [dave.cheney.net]
2020-05-09 18:00
tags:
compiler
go
programming
In this post I’ll dig into what equality, in the context of a Go program, means and why changes like this have a measurable impact on the size of a Go program.
Addendum: thanks to Brad’s prodding, Go 1.15 already has a bunch of improvements by Cherry Zhang and Keith Randall that fix the most egregious of the failures to eliminate unnecessary equality and hash functions (although I suspect it was also to avoid the proliferation of this class of CLs).
firefox's low-latency webassembly compiler
http://wingolog.org/archives/2020/03/25/firefoxs-low-latency-webassembly-compiler [wingolog.org]
2020-03-26 20:45
tags:
browser
compiler
programming
wasm
The goals of high throughput and low latency conflict with each other. To get best throughput, a compiler needs to spend time on code motion, register allocation, and instruction selection; to get low latency, that’s exactly what a compiler should not do. Web browsers therefore take a two-pronged approach: they have a compiler optimized for throughput, and a compiler optimized for latency. As a WebAssembly file is being downloaded, it is first compiled by the quick-and-dirty low-latency compiler, with the goal of producing machine code as soon as possible. After that “baseline” compiler has run, the “optimizing” compiler works in the background to produce high-throughput code. The optimizing compiler can take more time because it runs on a separate thread. When the optimizing compiler is done, it replaces the baseline code. (The actual heuristics about whether to do baseline + optimizing (“tiering“) or just to go straight to the optimizing compiler are a bit hairy, but this is a summary.)
This article is about the WebAssembly baseline compiler in Firefox. It’s a surprising bit of code and I learned a few things from it.
source: HN
Precision Opportunities for Demanded Bits in LLVM
https://blog.regehr.org/archives/1714 [blog.regehr.org]
2020-01-22 22:37
tags:
compiler
perf
programming
A fun thing that optimizing compilers can do is to automatically infer when the full power of some operation is not needed, in which case it may be that the operation can be replaced by a cheaper one. This happens in several different ways; the method we care about today is driven by LLVM’s demanded bits static analysis, whose purpose is to prove that certain bits of an SSA value are irrelevant. For example, if a 32-bit value is truncated to 8 bits, and if that value has no other uses, then the 24 high bits of the original value clearly do not matter.
Speculative Load Hardening
https://llvm.org/docs/SpeculativeLoadHardening.html [llvm.org]
2020-01-09 20:40
tags:
c
compiler
cxx
defense
development
security
sidechannel
While several approaches are being actively pursued to mitigate specific branches and/or loads inside especially risky software (most notably various OS kernels), these approaches require manual and/or static analysis aided auditing of code and explicit source changes to apply the mitigation. They are unlikely to scale well to large applications. We are proposing a comprehensive mitigation approach that would apply automatically across an entire program rather than through manual changes to the code. While this is likely to have a high performance cost, some applications may be in a good position to take this performance / security tradeoff.
Unintuitive JSON Parsing
https://nullprogram.com/blog/2019/12/28/ [nullprogram.com]
2019-12-30 23:30
tags:
compiler
format
javascript
Based on the title, could be just about anything...
The parser will not complain about leading zeros because JSON has no concept of leading zeros.
Unexpectedly a parse error, not a lex error.
Introducing Glush: a robust, human readable, top-down parser compiler
https://www.sanity.io/blog/why-we-wrote-yet-another-parser-compiler [www.sanity.io]
2019-12-18 17:54
tags:
compiler
compsci
programming
release
swtools
text
It’s been 45 years since Stephen Johnson wrote Yacc (Yet another compiler-compiler), a parser generator that made it possible for anyone to write fast, efficient parsers. Yacc, and its many derivatives, quickly became popular and were included in many Unix distributions. You would imagine that in 45 years we would have further perfected the art of creating parsers and would have standardized on a single tool. A lot of progress has been made, but there are still annoyances and problems affecting every tool out there.
This is great, even just for the overview of parsing.
The CYK algorithm (named after Cocke–Younger–Kasami) is in my opinion of great theoretical importance when it comes to parsing context-free grammars. CYK will parse all context-free parsers in O(n3), including the “simple” grammars that LL/LR can parse in linear time. It accomplishes this by converting parsing into a different problem: CYK shows that parsing context-free languages is equivalent to doing a boolean matrix multiplication. Matrix multiplication can be done naively in cubic time, and as such parsing context-free languages can be done in cubic time. It’s a very satisfying theoretical result, and the actual algorithm is small and easy to understand.
source: trivium
Pointer Authentication
https://github.com/apple/llvm-project/blob/apple/master/clang/docs/PointerAuthentication.rst [github.com]
2019-12-15 21:48
tags:
c
compiler
cpu
defense
development
hash
programming
security
Pointer authentication is a technology which offers strong probabilistic protection against exploiting a broad class of memory bugs to take control of program execution. When adopted consistently in a language ABI, it provides a form of relatively fine-grained control flow integrity (CFI) check that resists both return-oriented programming (ROP) and jump-oriented programming (JOP) attacks.
While pointer authentication can be implemented purely in software, direct hardware support (e.g. as provided by ARMv8.3) can dramatically lower the execution speed and code size costs. Similarly, while pointer authentication can be implemented on any architecture, taking advantage of the (typically) excess addressing range of a target with 64-bit pointers minimizes the impact on memory performance and can allow interoperation with existing code (by disabling pointer authentication dynamically). This document will generally attempt to present the pointer authentication feature independent of any hardware implementation or ABI. Considerations that are implementation-specific are clearly identified throughout.
source: HN