Links

tag: compiler

Making the rav1d Video Decoder 1% Faster

https://ohadravid.github.io/posts/2025-05-rav1d-faster/ [ohadravid.github.io]

2025-05-25 00:24

rav1d is a port of dav1d, created by (1) running c2rust on dav1d, (2) incorporating dav1d’s asm-optimized functions, and (3) changing the code to be more Rust-y and safer.

Video decoders are notoriously complex pieces of software, but because we are comparing the performance of two similar deterministic binaries we might be able to avoid a lot of that complexity - with the right tooling.

source: HN

Evolution of Rust compiler errors

https://kobzol.github.io/rust/rustc/2025/05/16/evolution-of-rustc-errors.html [kobzol.github.io]

2025-05-16 22:13

tags: compiler development rust ux

I wrote a script that downloaded all stable Rust releases all the way back to 1.0, executed each stable version of the compiler on a set of small programs containing an error and gathered the compiler standard (error) output.

source: HN

Beating the Fastest Lexer Generator in Rust

https://alic.dev/blog/fast-lexing [alic.dev]

2025-05-09 19:07

tags: compiler perf programming rust text

I was aware of the efficiency of state machine driven lexers, but most generators have one problem: they can’t be arbitrarily generic and consistently optimal at the same time. There will always be some assumptions about your data that are either impossible to express, or outside the scope of the generator’s optimizations. Either way, I was curious to find out how my hand-rolled implementation would fare.

source: L

The Defer Technical Specification: It Is Time

https://thephd.dev/c2y-the-defer-technical-specification-its-time-go-go-go [thephd.dev]

2025-03-19 22:48

tags: c compiler programming standard

Time for me to write this blog post and prepare everyone for the implementation blitz that needs to happen to make defer a success for the C programming language.

source: HN

Constant-Time Code: The Pessimist Case

https://eprint.iacr.org/2025/435 [eprint.iacr.org]

2025-03-08 06:09

tags: compiler cpu crypto paper pdf perf programming turtles

This note discusses the problem of writing cryptographic implementations in software, free of timing-based side-channels, and many ways in which that endeavour can fail in practice. It is a pessimist view: it highlights why such failures are expected to become more common, and how constant-time coding is, or will soon become, infeasible in all generality.

From compiler optimizations to CPU pipelines and register renaming.

0+0 > 0: C++ thread-local storage performance

https://yosefk.com/blog/cxx-thread-local-storage-performance.html [yosefk.com]

2025-02-17 21:29

tags: compiler concurrency cxx library perf programming

We’ll discuss how to make sure that your access to TLS (thread-local storage) is fast. If you’re interested strictly in TLS performance guidelines and don’t care about the details, skip right to the end — but be aware that you’ll be missing out on assembly listings of profound emotional depth, which can shake even a cynical, battle-hardened programmer. If you don’t want to miss out on that — and who would?! — read on, and you shall learn the computer-scientific insight behind the intriguing inequality 0+0 > 0.

source: HN

How do modern compilers choose which variables to put in registers?

https://langdev.stackexchange.com/questions/4325/how-do-modern-compilers-choose-which-variables-to-put-in-registers [langdev.stackexchange.com]

2025-02-17 20:59

tags: compiler cpu programming

This is a very broad subject. The problem of deciding how to map a program with arbitrarily many variables onto a fixed set of registers is known as register allocation, and it has been the subject of much research, study, and engineering effort since the very earliest compilers. One of the canonical approaches, graph coloring, was first proposed in 1981. Countless other approaches and variants have been explored since then, and I cannot hope to cover the full breadth of the topic in a single answer.
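To make the graph coloring idea concrete, here is a minimal sketch. It is a greedy simplification, not the algorithm any production compiler uses, and the interference graph and the names `color_interference_graph`, `interference` and `num_registers` are made up for illustration. Two variables "interfere" when they are live at the same time and so cannot share a register.

```python
def color_interference_graph(interference, num_registers):
    """Greedily assign register numbers so that no two interfering
    variables share one. Returns (assignment, spilled)."""
    assignment = {}
    spilled = []
    # Visit the most constrained variables first (highest degree),
    # breaking ties by name so the result is deterministic.
    order = sorted(interference, key=lambda v: (-len(interference[v]), v))
    for var in order:
        # Registers already taken by neighbours that have been coloured.
        taken = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in range(num_registers) if r not in taken]
        if free:
            assignment[var] = free[0]
        else:
            # No register left: this variable would live in memory.
            spilled.append(var)
    return assignment, spilled


# Hypothetical interference graph: an edge means "live at the same time".
interference = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

assignment, spilled = color_interference_graph(interference, num_registers=3)
print(assignment, spilled)
```

With three registers everything fits. With only two, the triangle a/b/c cannot be coloured and one variable has to be spilled. Real allocators spend most of their effort on choosing what to spill and where.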

source: HN

Don't clobber the frame pointer

https://nsrip.com/posts/clobberfp.html [nsrip.com]

2025-01-05 09:34

tags: bugfix compiler cpu go programming

Recently I diagnosed and fixed two frame pointer unwinding crashes in Go. The root causes were two flavors of the same problem: buggy assembly code clobbered a frame pointer. By “clobbered” I mean wrote over the value without saving & restoring it. One bug clobbered the frame pointer register. The other bug clobbered a frame pointer saved on the stack. This post explains the bugs, talks a bit about ABIs and calling conventions, and makes some recommendations for how to avoid the bugs.

source: L

Blazingly Fast Shadow Stacks for Go

https://blog.felixge.de/blazingly-fast-shadow-stacks-for-go/ [blog.felixge.de]

2024-05-30 07:32

tags: compiler go perf programming

Software shadow stacks could deliver up to 8x faster stack trace capturing in the Go runtime when compared to the frame pointer unwinding that landed in go1.21. This doesn’t mean that this idea should escape from the laboratory right away, but it offers a fun glimpse into a potential future of hardware accelerated stack trace capturing via shadow stacks.

source: HN

An improved chkstk function on Windows

https://nullprogram.com/blog/2024/02/05/ [nullprogram.com]

2024-02-06 23:47

tags: compiler cpu programming windows

If you’ve spent much time developing with Mingw-w64 you’ve likely seen the symbol ___chkstk_ms, perhaps in an error message. It’s a little piece of runtime provided by GCC via libgcc which ensures enough of the stack is committed for the caller’s stack frame. The “function” uses a custom ABI and is implemented in assembly. So is the subject of this article, a slightly improved implementation soon to be included in w64devkit as libchkstk (-lchkstk).

source: L

Running the “Reflections on Trusting Trust” Compiler

https://research.swtch.com/nih [research.swtch.com]

2023-10-26 19:09

tags: c compiler development programming retro security turtles unix

In October 1983, 40 years ago this week, Ken Thompson chose supply chain security as the topic for his Turing award lecture, although the specific term wasn’t used back then. (The field of computer science was still young and small enough that the ACM conference where Ken spoke was the “Annual Conference on Computers.”) Ken’s lecture was later published in Communications of the ACM under the title “Reflections on Trusting Trust.” It is a classic paper, and a short one (3 pages); if you haven’t read it yet, you should. This post will still be here when you get back.

In the lecture, Ken explains in three steps how to modify a C compiler binary to insert a backdoor when compiling the “login” program, leaving no trace in the source code. In this post, we will run the backdoored compiler using Ken’s actual code. But first, a brief summary of the important parts of the lecture.

source: L

Polonius update

https://blog.rust-lang.org/inside-rust/2023/10/06/polonius-update.html [blog.rust-lang.org]

2023-10-08 19:10

tags: compiler compsci programming rust update

Polonius refers to a few things. It is a new formulation of the borrow checker. It is also a specific project that implemented that analysis, based on datalog. Our current plan does not make use of that datalog-based implementation, but uses what we learned implementing it to focus on reimplementing Polonius within rustc.

source: L

An instruction oddity in the ppc64 (PowerPC 64-bit) architecture

https://utcc.utoronto.ca/~cks/space/blog/tech/PowerPCInstructionOddity [utcc.utoronto.ca]

2023-01-21 19:45

tags: bugfix compiler cpu programming turtles

As Raymond Chen notes, ‘or rd, ra, ra’ has the effect of ‘move ra to rd’. Moving a register to itself is a NOP, but several Power versions (the Go code’s comment says Power8, 9, and 10) overload this particular version of a NOP (and some others) to signal that the priority of your hardware thread should be changed by the CPU; in the specific case of ‘or r1, r1, r1’ it drops you to low priority. That leaves us with the mystery of why such an instruction would be used by a compiler, instead of the official NOP (per Raymond Chen, this is ‘ori r0, r0, 0’).

As covered in the specific ppc64 diff in the change that introduced this issue, Go wanted to artificially mark a particular runtime function this way (see CL 425396 and Go issue #54332 for more). To do this it needed to touch the stack pointer in a harmless way, which would trigger the toolchain’s weirdness detector. On ppc64, the stack pointer is in r1. So the obvious and natural thing to do is to move r1 to itself, which encodes as ‘or r1, r1, r1’, and which then triggers this special architectural behavior of lowering the priority of that hardware thread. Oops.
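A small sketch of how the two instructions encode, to show why they are distinct bit patterns to the CPU even though both do nothing to register contents. The helper names `encode_or` and `encode_ori` are mine; the field layout assumed is the standard Power ISA one (X-form `or` with primary opcode 31 and extended opcode 444, D-form `ori` with primary opcode 24).

```python
def encode_or(ra, rs, rb, rc=0):
    """Encode 'or RA, RS, RB' as a 32-bit word.
    X-form: primary opcode 31, extended opcode 444, Rc in the low bit."""
    return (31 << 26) | (rs << 21) | (ra << 16) | (rb << 11) | (444 << 1) | rc


def encode_ori(ra, rs, uimm):
    """Encode 'ori RA, RS, UI' as a 32-bit word.
    D-form: primary opcode 24, 16-bit unsigned immediate."""
    return (24 << 26) | (rs << 21) | (ra << 16) | (uimm & 0xFFFF)


# The register-to-itself move that also acts as a priority hint.
print(hex(encode_or(1, 1, 1)))   # 0x7c210b78
# The conventional no-op.
print(hex(encode_ori(0, 0, 0)))  # 0x60000000
```

The hardware keys the priority change on the specific `or rX, rX, rX` pattern, so "move r1 to itself" is not interchangeable with the plain no-op.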

https://devblogs.microsoft.com/oldnewthing/20180809-00/?p=99455

https://github.com/golang/go/issues/54332

Building the fastest Lua interpreter.. automatically!

https://sillycross.github.io/2022/11/22/2022-11-22/ [sillycross.github.io]

2022-11-22 23:10

tags: compiler jit lua perf programming

I have been working on a research project to make writing VMs easier. The idea arises from the following observation: writing a naive interpreter is not hard (just write a big switch-case), but writing a good interpreter (or JIT compiler) is hard, as it unavoidably involves hand-coding assembly. So why can’t we implement a special compiler to automatically generate a high-performance interpreter (and even the JIT) from “the big switch-case”, or more formally, a semantical description of what each bytecode does?
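For reference, "the big switch-case" looks roughly like the sketch below. The bytecode set is invented for illustration and has nothing to do with Lua's actual bytecode or the author's project. Python has no switch statement, so an if/elif chain stands in for it.

```python
def run(bytecode):
    """Interpret a list of (opcode, operand) pairs on a value stack."""
    stack = []
    pc = 0
    while pc < len(bytecode):
        op, arg = bytecode[pc]
        pc += 1
        # The "big switch": one branch per opcode.
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "JUMP_IF_ZERO":
            # Pop the condition; jump to the absolute index in arg if it is 0.
            if stack.pop() == 0:
                pc = arg
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


# (2 + 3) * 4
program = [
    ("PUSH", 2),
    ("PUSH", 3),
    ("ADD", None),
    ("PUSH", 4),
    ("MUL", None),
    ("HALT", None),
]
print(run(program))  # [20]
```

Every instruction goes through the same dispatch loop and branch chain, which is the overhead a hand-tuned or generated interpreter tries to remove.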

source: HN

The Applesoft Compiler (TASC): We have the source code, in a sense

https://devblogs.microsoft.com/oldnewthing/20220419-00/?p=106496 [devblogs.microsoft.com]

2022-04-19 22:55

tags: compiler mac programming retro

Chaining was a common technique when your program got too large to fit into memory all at once, so you broke it into multiple programs that each handed off control to each other.

As the author added features, he kept hitting the Apple ][’s 48KB RAM limit and was forced to delete all the comments from the code, and when that wasn’t enough, he resorted to shortening all the important variable names to one character.

How to speed up the Rust compiler in April 2022

https://nnethercote.github.io/2022/04/12/how-to-speed-up-the-rust-compiler-in-april-2022.html [nnethercote.github.io]

2022-04-13 20:08

tags: compiler development perf rust update

In my last post I introduced the Compiler performance roadmap for 2022. Let’s see how things are progressing.

Along the way I had to undo some optimizations I had added to this code a couple of years ago. Those optimizations turned out to be useful for one kind of expensive macro (with many rules but no metavariables) present in the html5ever benchmark. But such macros aren’t common in practice, and these optimizations were unhelpful for more typical expensive macros, which are recursive, have fewer rules, and use metavariables. This shows the value of a good benchmark suite.

source: L

Generics can make your Go code slower

https://planetscale.com/blog/generics-can-make-your-go-code-slower [planetscale.com]

2022-03-30 18:46

tags: article compiler go perf programming type-system

Go 1.18 is here, and with it, the first release of the long-awaited implementation of Generics is finally ready for production usage. Generics are a frequently requested feature that has been highly contentious throughout the Go community. On the one side, vocal detractors worry about the added complexity. They fear the inescapable evolution of Go towards either a verbose and Enterprisey Java-lite with Generic Factories or, most terrifyingly, a degenerate HaskellScript that replaces ifs with Monads. In all fairness, both these fears may be overblown. On the other side, proponents of generics believe that they are a critical feature to implement clean and reusable code at scale.

This blog post does not take sides in that debate, or advise where and when to use Generics in Go. Instead, this blog post is about the third side of the generics conundrum: It’s about systems engineers who are not excited about generics per se, but about monomorphization and its performance implications. There are dozens of us! Dozens! And we’re all due for some serious disappointment.

Very thorough.

source: HN

PartialExecuter: Reducing WebAssembly size by exploring all executions in LLVM

https://leaningtech.com/reducing-webassembly-size-by-exploring-all-executions-in-llvm/ [leaningtech.com]

2022-03-16 05:11

tags: compiler fuzzing perf programming

Partial Executer is a brand-new LLVM optimization pass that uses an Interpreter-like engine to prove some code will never be executed, making it safe to eliminate it.

source: HN

Parsing Protobuf at 2+GB/s: How I Learned To Love Tail Calls in C

https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html [blog.reverberate.org]

2021-04-25 19:54

tags: c compiler perf programming

While tail calls are usually associated with a functional programming style, I am interested in them purely for performance reasons. It turns out that in some cases we can use tail calls to get better code out of the compiler than would otherwise be possible—at least given current compiler technology—without dropping to assembly.

source: HN

Eliminating Data Races in Firefox – A Technical Report

https://hacks.mozilla.org/2021/04/eliminating-data-races-in-firefox-a-technical-report/ [hacks.mozilla.org]

2021-04-07 00:02

tags: compiler concurrency cxx development programming update

We successfully deployed ThreadSanitizer in the Firefox project to eliminate data races in our remaining C/C++ components. In the process, we found several impactful bugs and can safely say that data races are often underestimated in terms of their impact on program correctness. We recommend that all multithreaded C/C++ projects adopt the ThreadSanitizer tool to enhance code quality.

source: HN