phkmalloc
https://phk.freebsd.dk/sagas/phkmalloc/ [phk.freebsd.dk]
2025-06-17 21:08
tags:
c
development
malloc
programming
systems
Jason Evans laid jemalloc to rest yesterday, and gave a kind shoutout to my malloc, aka “phkmalloc”, and it occurred to me that I should write that story down.
source: L
parking_lot: ffffffffffffffff...
https://fly.io/blog/parking-lot-ffffffffffffffff/ [fly.io]
2025-05-31 02:27
tags:
bugfix
concurrency
programming
rust
You’re reading a 3,000 word blog post about a single concurrency bug, so my guess is you’re the kind of person who compulsively wants to understand how everything works. That’s fine, but a word of advice: there are things where, if you find yourself learning about them in detail, something has gone wrong.
source: L
Bootstrapping HTTP/1.1, HTTP/2, and HTTP/3
https://www.netmeister.org/blog/http-123.html [www.netmeister.org]
2025-05-31 01:09
tags:
browser
networking
standard
web
HTTP/1.1 (RFC2616 and onwards) remains the lowest common denominator that clients and servers need to support, and of course modern stacks will want to use HTTP/2 (RFC9113) and HTTP/3 (RFC9114), but just how do they determine each other’s capabilities and bootstrap their connection?
source: L
The radix 2^51 trick
https://www.chosenplaintext.ca/articles/radix-2-51-trick.html [www.chosenplaintext.ca]
2025-05-31 00:51
tags:
cpu
math
perf
programming
The obvious solution would be to break up each 256-bit number into four 64-bit pieces (commonly referred to as “limbs”).
The first reason is that adc is just slower to execute than a normal add on most popular x86 CPUs. Since adc has a third input (the carry flag), it’s a more complex instruction than add. It’s also used less often than add, so there is less incentive for CPU designers to spend chip area on optimizing adc performance.
The key insight here is that we can use this technique to delay carry propagation until the end. We can’t avoid carry propagation altogether, but we can avoid it temporarily. If we save up the carries that occur during the intermediate additions, we can propagate them all in one go at the end.
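A minimal sketch of the delayed-carry idea in the excerpt above, in Python. The layout is an assumption made for illustration: four 51-bit low limbs plus a top limb that holds the remaining bits of a 256-bit value, each limb imagined as living in a 64-bit register. The helper names (to_limbs, add_limbs, normalize, from_limbs) are hypothetical and not taken from the article.

```python
LIMB_BITS = 51
MASK51 = (1 << LIMB_BITS) - 1
MASK64 = (1 << 64) - 1  # stands in for the width of a 64-bit register


def to_limbs(x):
    """Split x into five limbs, least significant first.

    The four low limbs hold 51 bits each; the top limb keeps whatever is left.
    """
    limbs = [(x >> (LIMB_BITS * i)) & MASK51 for i in range(4)]
    limbs.append(x >> (LIMB_BITS * 4))
    return limbs


def add_limbs(a, b):
    """Add limb by limb with no carry propagation between limbs.

    The spare high bits of each 64-bit limb absorb the overflow for now.
    """
    out = [x + y for x, y in zip(a, b)]
    # In real 64-bit registers this bound is what limits how many
    # additions can be saved up before a normalize step is required.
    assert all(v <= MASK64 for v in out), "limb overflowed 64 bits"
    return out


def normalize(limbs):
    """Propagate all saved-up carries in a single pass, low to high."""
    out = list(limbs)
    for i in range(4):
        carry = out[i] >> LIMB_BITS
        out[i] &= MASK51
        out[i + 1] += carry
    return out


def from_limbs(limbs):
    """Recombine limbs into one integer (valid before or after normalize)."""
    return sum(v << (LIMB_BITS * i) for i, v in enumerate(limbs))
```

Usage: convert each operand with to_limbs, fold them together with add_limbs as many times as the spare bits allow, then call normalize once at the end. Python integers never overflow, so the 64-bit bound is only checked by the assertion rather than enforced by hardware.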
source: L
Iconography of the X Window System: The Boot Stipple
https://matttproud.com/blog/posts/x-window-system-boot-stipple.html [matttproud.com]
2025-05-31 00:45
tags:
graphics
retro
unix
x11
For the uninitiated, what are we looking at? Could it be the Moiré Error from Doom? Well, no. You are looking at (part of) the boot up screen for the X Window System, specifically the pattern it uses as the background of the root window. This pattern is technically called a stipple.
source: L
Remote Prompt Injection in GitLab Duo Leads to Source Code Theft
https://www.legitsecurity.com/blog/remote-prompt-injection-in-gitlab-duo [www.legitsecurity.com]
2025-05-24 23:48
tags:
ai
development
exploit
security
turtles
web
A hidden comment was enough to make GitLab Duo leak private source code and inject untrusted HTML into its responses. GitLab patched the issue, and we’ll walk you through the full attack chain — which demonstrates five vulnerabilities from the 2025 OWASP Top 10 for LLMs.
source: L
Build your own ResponseWriter: safer HTTP in Go
https://anto.pt/articles/go-http-responsewriter [anto.pt]
2025-05-09 19:14
tags:
go
programming
web
Go’s http.ResponseWriter writes directly to the socket, which can lead to subtle bugs like forgetting to set a status code or accidentally modifying headers too late.
source: L
Beating the Fastest Lexer Generator in Rust
https://alic.dev/blog/fast-lexing [alic.dev]
2025-05-09 19:07
tags:
compiler
perf
programming
rust
text
I was aware of the efficiency of state machine driven lexers, but most generators have one problem: they can’t be arbitrarily generic and consistently optimal at the same time. There will always be some assumptions about your data that are either impossible to express, or outside the scope of the generator’s optimizations. Either way, I was curious to find out how my hand-rolled implementation would fare.
source: L
Write the most clever code you possibly can
https://buttondown.com/hillelwayne/archive/write-the-most-clever-code-you-possibly-can/ [buttondown.com]
2025-05-09 18:55
tags:
development
essay
ideas
programming
How do we make something utterly mundane? By using it and working at the boundaries of our skills. Almost everything I’m “good at” comes from banging my head against it more than is healthy. That suggests a really good reason to write clever code: it’s an excellent form of purposeful practice. Writing clever code forces us to code outside of our comfort zone, developing our skills as software engineers.
source: L
runtime: green tea garbage collector
https://github.com/golang/go/issues/73581 [github.com]
2025-05-04 17:19
tags:
beta
garbage-collector
go
The core idea behind the new parallel marking algorithm is simple. Instead of scanning individual objects, the garbage collector scans memory in much larger, contiguous blocks. The shared work queue tracks these coarse blocks instead of individual objects, and the individual objects waiting to be scanned in a block are tracked in that block itself. The core hypothesis is that while a block waits on the queue to be scanned, it will accumulate more objects to be scanned within that block, such that when a block does get dequeued, it’s likely that scanning will be able to scan more than one object in that block. This, in turn, improves locality of memory access, in addition to better amortizing per-scan costs.
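The queueing scheme described in the excerpt above can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not the Go runtime's data structures: objects are plain integer addresses, BLOCK_SIZE is a made-up constant, and the refs mapping (address to referenced addresses) is a hypothetical stand-in for the heap.

```python
from collections import deque

BLOCK_SIZE = 8192  # hypothetical block size in bytes, chosen for illustration


def mark(roots, refs):
    """Mark everything reachable from roots.

    refs maps an object address to the addresses it references.
    Returns the marked set and the number of objects scanned per dequeue.
    """
    marked = set()
    pending = {}     # block id -> addresses in that block waiting to be scanned
    queue = deque()  # work queue of block ids, not of individual objects
    batch_sizes = []

    def shade(addr):
        if addr in marked:
            return
        marked.add(addr)
        block = addr // BLOCK_SIZE
        if block not in pending:
            # First waiting object in this block: enqueue the block once.
            pending[block] = set()
            queue.append(block)
        # Later objects pile up in the block while it waits on the queue.
        pending[block].add(addr)

    for root in roots:
        shade(root)

    while queue:
        block = queue.popleft()
        batch = pending.pop(block)
        batch_sizes.append(len(batch))
        # Scan every waiting object in the block in address order.
        for addr in sorted(batch):
            for child in refs.get(addr, ()):
                shade(child)

    return marked, batch_sizes
```

The batch_sizes list is only there to make the hypothesis observable: whenever an entry is greater than one, a single dequeue scanned several objects from the same block.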
source: L
Unsure Calculator
https://filiph.github.io/unsure/ [filiph.github.io]
2025-04-17 20:57
tags:
ideas
math
visualization
The idea is simple: apart from regular numbers (like 4, 3.14 or 43942), you can also input ranges (like 4~6, 3.1~3.2 or 40000~45000). The character between the two extremes of the range is a tilde (~), a little wave symbol. You can find it on most keyboards, but for convenience, I also included it in the keypad above. The range notation says the following to the calculator: I am not sure about the exact number here, but I am 95% sure it’s somewhere in this range.
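One way to compute with such ranges is Monte Carlo sampling, sketched below in Python. The excerpt only states the "95% sure" reading of a range; treating each range as a normal distribution whose central 95% interval matches the two endpoints is an assumption of this sketch, and the names sample_range and simulate are hypothetical.

```python
import random

Z95 = 1.96  # a normal distribution has about 95% of its mass within 1.96 sigma


def sample_range(lo, hi, rng):
    """Draw one value for the range lo~hi (assumed normally distributed)."""
    mean = (lo + hi) / 2
    sigma = (hi - lo) / (2 * Z95)
    return rng.gauss(mean, sigma)


def simulate(expr, n=20_000, seed=0):
    """Evaluate expr n times and return its empirical 95% interval.

    expr receives a function r(lo, hi) that samples a fresh value
    for the range lo~hi each time it is called.
    """
    rng = random.Random(seed)

    def r(lo, hi):
        return sample_range(lo, hi, rng)

    results = sorted(expr(r) for _ in range(n))
    return results[int(0.025 * n)], results[int(0.975 * n)]
```

Usage: simulate(lambda r: r(4, 6) * r(10, 20)) returns a low and a high bound for the product of the two uncertain inputs. A fixed seed keeps the result reproducible between runs.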
source: L
I want a good parallel computer
https://raphlinus.github.io/gpu/2025/03/21/good-parallel-computer.html [raphlinus.github.io]
2025-03-22 17:56
tags:
concurrency
cpu
graphics
hardware
programming
I believe a simpler, more powerful parallel computer is possible, and that there are signs in the historical record. In a slightly alternate universe, we would have those computers now, and be doing the work of designing algorithms and writing programs to run well on them, for a very broad range of tasks.
source: L
Kerning, the Hard Way
https://home.octetfont.com/blog/kerning-hard.html [home.octetfont.com]
2025-03-14 20:29
tags:
design
graphics
text
It looks a bit like L and T have been clipped, but in fact they’ve been drawn over. Black parts of L overlap the T, and vice versa: black parts of the T overlap L. The effect is what you can see, where L and T share a space, the black bars overlap and are solid, obliterating the reversed out letterforms. So how do I kern this font, if not with GSPOS lookups?
source: L
Can atproto scale down?
https://bsky.bad-example.com/can-atproto-scale-down/ [bsky.bad-example.com]
2025-02-17 21:10
tags:
networking
perf
programming
social
storage
And skipping right to the end, my answer to “can it scale down” is just: “yes!”. Here’s my Raspberry Pi 4b, at home, consuming a few watts and pulling around 20GB of simplified firehose events per day. It’s an AppView indexing all cross-repo references (backlinks) in the AT-mosphere, often up to 1,500 created per second. It’s closing in on one billion backlinks, eating up an old SATA SSD connected over a salvaged USB adapter.
source: L
The hardest working font in Manhattan
https://aresluna.org/the-hardest-working-font-in-manhattan/ [aresluna.org]
2025-02-17 21:05
tags:
article
design
history
photos
text
urban
In 2007, on my first trip to New York City, I grabbed a brand-new DSLR camera and photographed all the fonts I was supposed to love. I admired American Typewriter in all of the I <3 NYC logos, watched Akzidenz Grotesk and Helvetica fighting over the subway signs, and even caught an occasional appearance of the flawlessly-named Gotham, still a year before it skyrocketed in popularity via Barack Obama’s first campaign.
But there was one font I didn’t even notice, even though it was everywhere around me. Last year in New York, I walked over 100 miles and took thousands of photos of one and one font only. The font’s name is Gorton.
source: L
Go 1.24 interactive tour
https://antonz.org/go-1-24/ [antonz.org]
2025-01-15 21:07
tags:
garbage-collection
go
programming
update
Go 1.24 is scheduled for release in February, so it’s a good time to explore what’s new. The official release notes are pretty dry, so I prepared an interactive version with lots of examples showing what has changed and what the new behavior is.
source: L
Justified Text: Better Than Expected?
https://cloudfour.com/thinks/justified-text-better-than-expected/ [cloudfour.com]
2025-01-15 21:06
tags:
design
html
web
I was pleasantly surprised by the results in Chromium browsers at medium and large container widths. Hyphenation seems conservative and readable, yet there are no unsightly gaps or “rivers” between words. Safari and Firefox hyphenate a bit more frequently, but not distractingly so.
source: L
Don't clobber the frame pointer
https://nsrip.com/posts/clobberfp.html [nsrip.com]
2025-01-05 09:34
tags:
bugfix
compiler
cpu
go
programming
Recently I diagnosed and fixed two frame pointer unwinding crashes in Go. The root causes were two flavors of the same problem: buggy assembly code clobbered a frame pointer. By “clobbered” I mean wrote over the value without saving & restoring it. One bug clobbered the frame pointer register. The other bug clobbered a frame pointer saved on the stack. This post explains the bugs, talks a bit about ABIs and calling conventions, and makes some recommendations for how to avoid the bugs.
source: L
The Alder Lake SHLX anomaly
https://tavianator.com/2025/shlx.html [tavianator.com]
2025-01-03 09:54
tags:
benchmark
cpu
perf
programming
It seems like SHLX performs differently depending on how the shift count register is initialized. If you use a 64-bit instruction with an immediate, performance is slow. This is also true for instructions like INC (which is similar to ADD with a 1 immediate). On the other hand, 32-bit instructions, and 64-bit instructions without immediates (even no-op ones), make it fast. All of these ways to initialize RCX lead to 1-cycle latency:
source: L
How I helped fix sleep-wake hangs on Linux with AMD GPUs
https://nyanpasu64.gitlab.io/blog/amdgpu-sleep-wake-hang/ [nyanpasu64.gitlab.io]
2025-01-03 09:52
tags:
bugfix
investigation
linux
malloc
programming
systems
Through some digging, I found that when a desktop enters S3 sleep, the system cuts power to PCIe GPUs, causing their VRAM chips to lose data. To preserve this data, GPU drivers copy VRAM in use to system RAM before the system sleeps, then restore it after the system wakes. However, the Linux amdgpu driver has a bug where, if there is not enough free RAM to store all VRAM in use, the system will run out of memory and crash, instead of moving RAM to disk-based swap.
source: L