phkmalloc
https://phk.freebsd.dk/sagas/phkmalloc/ [phk.freebsd.dk]
2025-06-17 21:08
tags:
c
development
malloc
programming
systems
Jason Evans laid jemalloc to rest yesterday, and gave a kind shoutout to my malloc, a.k.a. “phkmalloc”, and it occurred to me that I should write that story down.
source: L
jemalloc Postmortem
https://jasone.github.io/2025/06/12/jemalloc-postmortem/ [jasone.github.io]
2025-06-17 21:07
tags:
c
development
malloc
programming
systems
The jemalloc memory allocator was first conceived in early 2004, and has been in public use for about 20 years now. Thanks to the nature of open source software licensing, jemalloc will remain publicly available indefinitely. But active upstream development has come to an end. This post briefly describes jemalloc’s development phases, each with some success/failure highlights, followed by some retrospective commentary.
source: HN
Pure vs. impure iterators in Go
https://jub0bs.com/posts/2025-05-29-pure-vs-impure-iterators-in-go/ [jub0bs.com]
2025-06-01 01:34
tags:
go
programming
Because iterators are so powerful, they’re likely to mushroom in libraries even beyond Go’s standard library. Therefore, to forestall any confusion in the discourse about iterators, the terminology surrounding them should be as precise as possible.
This passage of the documentation seemingly divides iterators into two categories. I’ll attempt to elucidate them through a couple of examples.
source: HN
parking_lot: ffffffffffffffff...
https://fly.io/blog/parking-lot-ffffffffffffffff/ [fly.io]
2025-05-31 02:27
tags:
bugfix
concurrency
programming
rust
You’re reading a 3,000-word blog post about a single concurrency bug, so my guess is you’re the kind of person who compulsively wants to understand how everything works. That’s fine, but a word of advice: there are things where, if you find yourself learning about them in detail, something has gone wrong.
source: L
The radix 2^51 trick
https://www.chosenplaintext.ca/articles/radix-2-51-trick.html [www.chosenplaintext.ca]
2025-05-31 00:51
tags:
cpu
math
perf
programming
The obvious solution would be to break up each 256-bit number into four 64-bit pieces (commonly referred to as “limbs”).
The first reason is that adc is just slower to execute than a normal add on most popular x86 CPUs. Since adc has a third input (the carry flag), it’s a more complex instruction than add. It’s also used less often than add, so there is less incentive for CPU designers to spend chip area on optimizing adc performance.
The key insight here is that we can use this technique to delay carry propagation until the end. We can’t avoid carry propagation altogether, but we can avoid it temporarily. If we save up the carries that occur during the intermediate additions, we can propagate them all in one go at the end.
source: L
UCSD Pascal In Depth
https://markbessey.blog/2025/04/29/ucsd-pascal-in-depth/ [markbessey.blog]
2025-05-28 05:09
tags:
pascal
programming
retro
series
systems
text
The p-System comes with an editor. It’s a full-screen editor, with some fairly advanced features for the time, like auto-indent, bookmarks, and cut and paste. It’s modal, which is hardly surprising, considering that modal editors were the latest usability improvement of the age, compared to the line-oriented editors of the previous decade.
Also: https://markbessey.blog/2025/04/30/ucsd-pascal-in-depth-2/
Some features of the p-System were really ahead of their time. And then, there is the filesystem. Whenever you set out to create any software, but especially an operating system, which you intend to be aggressively cross-platform, you inevitably run into conflicts between being sophisticated and hitting the lowest common denominator.
Also: https://markbessey.blog/2025/05/08/ucsd-pascal-in-depth-3-n/
But the 1970s were a very different time. So let’s talk about the text file format for the UCSD p-System. This is not just something that applies to the text editor, incidentally. If you declare a file as “text” type in Pascal, it gets the same formatting applied. The formatting is transparently stripped from the file if you send it to the PRINTER: or CONSOLE: device.
Overview: https://markbessey.blog/ucsd-p-system-info/
Also: https://github.com/mbessey/p-system-tools
source: trivium
Making the rav1d Video Decoder 1% Faster
https://ohadravid.github.io/posts/2025-05-rav1d-faster/ [ohadravid.github.io]
2025-05-25 00:24
tags:
c
compiler
perf
programming
rust
rav1d is a port of dav1d, created by (1) running c2rust on dav1d, (2) incorporating dav1d’s asm-optimized functions, and (3) changing the code to be more Rust-y and safer.
Video decoders are notoriously complex pieces of software, but because we are comparing the performance of two similar deterministic binaries we might be able to avoid a lot of that complexity - with the right tooling.
source: HN
Go Scheduler
https://nghiant3223.github.io/2025/04/15/go-scheduler.html [nghiant3223.github.io]
2025-05-21 22:40
tags:
article
concurrency
go
programming
systems
Understanding the Go scheduler is crucial for Go programmers to write efficient concurrent programs. It also helps us become better at troubleshooting performance issues or tuning the performance of our Go programs. In this post, we will explore how the Go scheduler evolved over time, and what happens under the hood when the Go code we write runs.
source: HN
Build your own ResponseWriter: safer HTTP in Go
https://anto.pt/articles/go-http-responsewriter [anto.pt]
2025-05-09 19:14
tags:
go
programming
web
Go’s http.ResponseWriter writes directly to the socket, which can lead to subtle bugs like forgetting to set a status code or accidentally modifying headers too late.
source: L
Beating the Fastest Lexer Generator in Rust
https://alic.dev/blog/fast-lexing [alic.dev]
2025-05-09 19:07
tags:
compiler
perf
programming
rust
text
I was aware of the efficiency of state machine driven lexers, but most generators have one problem: they can’t be arbitrarily generic and consistently optimal at the same time. There will always be some assumptions about your data that are either impossible to express, or outside the scope of the generator’s optimizations. Either way, I was curious to find out how my hand-rolled implementation would fare.
source: L
Write the most clever code you possibly can
https://buttondown.com/hillelwayne/archive/write-the-most-clever-code-you-possibly-can/ [buttondown.com]
2025-05-09 18:55
tags:
development
essay
ideas
programming
How do we make something utterly mundane? By using it and working at the boundaries of our skills. Almost everything I’m “good at” comes from banging my head against it more than is healthy. That suggests a really good reason to write clever code: it’s an excellent form of purposeful practice. Writing clever code forces us to code outside of our comfort zone, developing our skills as software engineers.
source: L
Cheating the Reaper in Go
https://mcyoung.xyz/2025/04/21/go-arenas/ [mcyoung.xyz]
2025-04-21 23:49
tags:
garbage-collection
go
malloc
programming
These things mean that despite Go having a GC, it’s possible to do manual memory management in pure Go and in cooperation with the GC (although without any help from the runtime package). To demonstrate this, we will be building an untyped, garbage-collected arena abstraction in Go which relies on several GC implementation details.
source: HN
Marching Events: What does iCalendar have to do with ray marching?
https://pwy.io/posts/marching-events/ [pwy.io]
2025-04-18 05:31
tags:
format
programming
rust
I’ve found a way of describing occurrences through distance functions. This means that instead of implementing logic for all combinations of frequencies and parameters - as that spooky table from before suggests one might do - we can simply compose a couple of distance functions together.
source: HN
I want a good parallel computer
https://raphlinus.github.io/gpu/2025/03/21/good-parallel-computer.html [raphlinus.github.io]
2025-03-22 17:56
tags:
concurrency
cpu
graphics
hardware
programming
I believe a simpler, more powerful parallel computer is possible, and that there are signs in the historical record. In a slightly alternate universe, we would have those computers now, and be doing the work of designing algorithms and writing programs to run well on them, for a very broad range of tasks.
source: L
The Defer Technical Specification: It Is Time
https://thephd.dev/c2y-the-defer-technical-specification-its-time-go-go-go [thephd.dev]
2025-03-19 22:48
tags:
c
compiler
programming
standard
Time for me to write this blog post and prepare everyone for the implementation blitz that needs to happen to make defer a success for the C programming language.
source: HN
Robust Wavefront OBJ model parsing in C
https://nullprogram.com/blog/2025/03/02/ [nullprogram.com]
2025-03-15 19:25
tags:
c
graphics
programming
Wavefront OBJ is a line-oriented text format for 3D geometry. It’s widely supported by modeling software, easy to parse, and trivial to emit, much like Netpbm for 2D image data. Poke around hobby 3D graphics projects and you’re likely to find a bespoke OBJ parser. These typically load only the project’s own model data, so robustness doesn’t much matter, but they usually have hard limitations and don’t stand up to fuzz testing. This article presents a robust, partial OBJ parser in C with no hard-coded limitations, written from scratch. Like similar articles, it’s not really about OBJ but demonstrating some techniques you’ve probably never seen before.
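A minimal Go sketch of the two robustness properties the excerpt names (no hard-coded limits, errors instead of crashes on malformed input), handling only v and f statements. It is not the article's C code, which uses its own techniques; Vec3, Mesh and parseOBJ are hypothetical names, and fan triangulation of polygons is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Vec3 is one vertex position.
type Vec3 [3]float64

// Mesh holds positions and triangles; triangle indices are 0-based.
type Mesh struct {
	Verts []Vec3
	Tris  [][3]int
}

// parseOBJ reads only "v" and "f" statements and skips everything else.
// Storage grows with append, so there is no fixed cap on vertex count,
// face size, or line length; bad input yields an error rather than a panic.
func parseOBJ(src string) (*Mesh, error) {
	m := &Mesh{}
	for n, line := range strings.Split(src, "\n") {
		fields := strings.Fields(line) // also drops a trailing '\r'
		if len(fields) == 0 {
			continue
		}
		switch fields[0] {
		case "v":
			if len(fields) < 4 {
				return nil, fmt.Errorf("line %d: vertex needs 3 coordinates", n+1)
			}
			var v Vec3
			for i := range v {
				f, err := strconv.ParseFloat(fields[i+1], 64)
				if err != nil {
					return nil, fmt.Errorf("line %d: %w", n+1, err)
				}
				v[i] = f
			}
			m.Verts = append(m.Verts, v)
		case "f":
			if len(fields) < 4 {
				return nil, fmt.Errorf("line %d: face needs at least 3 vertices", n+1)
			}
			idx := make([]int, 0, len(fields)-1)
			for _, ref := range fields[1:] {
				// "7/2/5" style references: keep the position index before '/'.
				s, _, _ := strings.Cut(ref, "/")
				i, err := strconv.Atoi(s)
				if err != nil {
					return nil, fmt.Errorf("line %d: %w", n+1, err)
				}
				switch {
				case i > 0 && i <= len(m.Verts):
					i-- // OBJ indices are 1-based
				case i < 0 && i >= -len(m.Verts):
					i += len(m.Verts) // negative indices count back from the end
				default:
					return nil, fmt.Errorf("line %d: vertex index %d out of range", n+1, i)
				}
				idx = append(idx, i)
			}
			// Split polygons into a triangle fan around the first vertex.
			for k := 1; k+1 < len(idx); k++ {
				m.Tris = append(m.Tris, [3]int{idx[0], idx[k], idx[k+1]})
			}
		}
	}
	return m, nil
}
```

strings.Split keeps the whole file in memory. A streaming variant could read lines with bufio.Reader.ReadString, but not bufio.Scanner with its default buffer, whose 64 KiB token limit is exactly the kind of hard limitation the article warns about.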
Quicksort with Jenkins for Fun and No Profit
https://susam.net/jenkins-quicksort.html [susam.net]
2025-03-14 22:48
tags:
programming
sorting
swtools
turtles
Jenkins supports pipeline scripts written in Groovy as a first-class entity. A pipeline script effectively defines the build job. It can define build properties, build stages, build steps, etc. It can even invoke other build jobs, including itself.
Wait a minute! If a pipeline can invoke itself, can we, perhaps, solve a recursive problem with it? Absolutely! This is precisely what we are going to do in this post. We are going to implement quicksort as a Jenkins pipeline for fun and not a whit of profit!
source: trivium
Constant-Time Code: The Pessimist Case
https://eprint.iacr.org/2025/435 [eprint.iacr.org]
2025-03-08 06:09
tags:
compiler
cpu
crypto
paper
pdf
perf
programming
turtles
This note discusses the problem of writing cryptographic implementations in software, free of timing-based side-channels, and many ways in which that endeavour can fail in practice. It is a pessimist view: it highlights why such failures are expected to become more common, and how constant-time coding is, or will soon become, infeasible in all generality.
From compiler optimizations to CPU pipelines and register renaming.
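To make the term concrete, a minimal Go sketch of the kind of code the note is about. It shows an early-exit comparison whose running time leaks how many leading bytes match, a hand-written branch-free version, and the standard library's crypto/subtle.ConstantTimeCompare. The function names other than ConstantTimeCompare are hypothetical. In line with the note's pessimism, nothing in the language guarantees that the compiler or CPU keeps the second loop's timing independent of the data.

```go
package main

import "crypto/subtle"

// leakyEqual returns at the first mismatching byte, so its running time
// depends on how long a prefix of the secret an attacker has guessed.
func leakyEqual(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// ctEqual touches every byte and folds differences together with OR, so
// the loop has no data-dependent branch (lengths are treated as public).
func ctEqual(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	var diff byte
	for i := range a {
		diff |= a[i] ^ b[i]
	}
	return diff == 0
}

// stdEqual delegates to the standard library's constant-time helper,
// which returns 1 for equal slices and 0 otherwise.
func stdEqual(a, b []byte) bool {
	return subtle.ConstantTimeCompare(a, b) == 1
}
```

All three agree on the result; they differ only in whether the time taken depends on the secret contents, which is exactly the property the note argues is getting harder to guarantee.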
Zen and the Art of Microcode Hacking
https://bughunters.google.com/blog/5424842357473280/zen-and-the-art-of-microcode-hacking [bughunters.google.com]
2025-03-08 06:03
tags:
bios
cpu
exploit
hash
programming
security
systems
In this post, we first discuss the background of what microcode is, why microcode patches exist, why the integrity of microcode is important for security, and how AMD attempts to prevent tampering with microcode. Next, we focus on the microcode patch signature validation process and explain in detail the vulnerability present (using CMAC as a hash function). Finally, we discuss how to use some of the tools we’ve released today which can help researchers reproduce and expand on our work (skip to the Zentool section of this blogpost for a “how to” on writing your own microcode).
source: HN
0+0 > 0: C++ thread-local storage performance
https://yosefk.com/blog/cxx-thread-local-storage-performance.html [yosefk.com]
2025-02-17 21:29
tags:
compiler
concurrency
cxx
library
perf
programming
We’ll discuss how to make sure that your access to TLS (thread-local storage) is fast. If you’re interested strictly in TLS performance guidelines and don’t care about the details, skip right to the end — but be aware that you’ll be missing out on assembly listings of profound emotional depth, which can shake even a cynical, battle-hardened programmer. If you don’t want to miss out on that — and who would?! — read on, and you shall learn the computer-scientific insight behind the intriguing inequality 0+0 > 0.
source: HN