0+0 > 0: C++ thread-local storage performance
https://yosefk.com/blog/cxx-thread-local-storage-performance.html [yosefk.com]
2025-02-17 21:29
tags:
compiler
concurrency
cxx
library
perf
programming
We’ll discuss how to make sure that your access to TLS (thread-local storage) is fast. If you’re interested strictly in TLS performance guidelines and don’t care about the details, skip right to the end — but be aware that you’ll be missing out on assembly listings of profound emotional depth, which can shake even a cynical, battle-hardened programmer. If you don’t want to miss out on that — and who would?! — read on, and you shall learn the computer-scientific insight behind the intriguing inequality 0+0 > 0.
source: HN
Can atproto scale down?
https://bsky.bad-example.com/can-atproto-scale-down/ [bsky.bad-example.com]
2025-02-17 21:10
tags:
networking
perf
programming
social
storage
And skipping right to the end, my answer to “can it scale down” is just: “yes!”. Here’s my Raspberry Pi 4b, at home, consuming a few watts and pulling around 20GB of simplified firehose events per day. It’s an AppView indexing all cross-repo references (backlinks) in the AT-mosphere, often up to 1,500 created per second. It’s closing in on one billion backlinks, eating up an old SATA SSD connected over a salvaged USB adapter.
source: L
"A calculator app? Anyone could make that."
https://chadnauseam.com/coding/random/calculator-app [chadnauseam.com]
2025-02-17 21:02
tags:
android
compsci
math
programming
ux
A calculator should show you the result of the mathematical expression you entered. That’s much, much harder than it sounds.
source: HN
How do modern compilers choose which variables to put in registers?
https://langdev.stackexchange.com/questions/4325/how-do-modern-compilers-choose-which-variables-to-put-in-registers [langdev.stackexchange.com]
2025-02-17 20:59
tags:
compiler
cpu
programming
This is a very broad subject. The problem of deciding how to map a program with arbitrarily many variables onto a fixed set of registers is known as register allocation, and it has been the subject of much research, study, and engineering effort since the very earliest compilers. One of the canonical approaches, graph coloring, was first proposed in 1981. Countless other approaches and variants have been explored since then, and I cannot hope to cover the full breadth of the topic in a single answer.
source: HN
Get in loser. We're rewinding the stack.
https://andrews.substack.com/p/get-in-loser-were-rewinding-the-stack [andrews.substack.com]
2025-02-17 20:57
tags:
perl
programming
series
wasm
In my last post, I expressed frustration at how the lack of exnref support in most WebAssembly runtimes made zeroperl effectively unusable. However, complaining alone doesn’t solve problems—if something is broken, fix it. Don’t accept the status quo or let it derail your goals.
Using libsetjmp from the WASI SDK for setjmp/longjmp breaks compatibility across WebAssembly runtimes, so I decided to implement it myself. Binaryen has an Asyncify feature, which provides more than enough functionality to build a setjmp implementation from scratch.
Part of a series on getting perl running in wasm.
https://andrews.substack.com/p/zeroperl-sandboxed-perl-with-webassembly
source: HN
The Art of Dithering and Retro Shading for the Web
https://blog.maximeheckel.com/posts/the-art-of-dithering-and-retro-shading-web/ [blog.maximeheckel.com]
2025-02-03 19:47
tags:
gl
graphics
interactive
programming
visualization
web
I spent the past few months building my personal website from the ground up, finally taking the time to incorporate some 3D work to showcase my shader and WebGL skills. Throughout this work, I got to truly understand the crucial role that post-processing plays in making a scene actually look good, which brought some resolutions to long-term frustrations I had with my past React Three Fiber and shader projects where my vision wouldn’t materialize regardless of the amount of work and care I was putting into them.
Taking the time to build, combine, and experiment with custom post-processing effects gave me an additional creative outlet, and among the many types I got to iterate on, I always had a particular affection for the several “retro” effects I came up with. With subtle details such as dithering, color quantization, or pixelization/CRT RGB cells, they bring a pleasant contrast between the modern web landscape and a long-gone era of technology we 90s/early 2000s kids are sometime longing for.
source: HN
JavaScript Temporal is coming
https://developer.mozilla.org/en-US/blog/javascript-temporal-is-coming/ [developer.mozilla.org]
2025-01-30 20:14
tags:
browser
javascript
library
programming
update
web
Implementations of the new JavaScript Temporal object are starting to be shipped in experimental releases of browsers. This is big news for web developers because working with dates and times in JavaScript will be hugely simplified and modernized.
source: HN
Bilinear down/upsampling, aligning pixel grids, and that infamous GPU half pixel offset
https://bartwronski.com/2021/02/15/bilinear-down-upsampling-pixel-grids-and-that-half-pixel-offset/ [bartwronski.com]
2025-01-27 23:28
tags:
graphics
programming
So I figured it’s an opportunity for another short blog post – on bilinear filtering, but in context of down/upsampling. We will touch here on GPU half pixel offsets, aligning pixel grids, a bug / confusion in Tensorflow, deeper signal processing analysis of what’s going on during bilinear operations, and analysis of the magic of the famous “magic kernel”.
source: HN
Go 1.24 interactive tour
https://antonz.org/go-1-24/ [antonz.org]
2025-01-15 21:07
tags:
garbage-collection
go
programming
update
Go 1.24 is scheduled for release in February, so it’s a good time to explore what’s new. The official release notes are pretty dry, so I prepared an interactive version with lots of examples showing what has changed and what the new behavior is.
source: L
Why is my CPU usage always 100%?
https://www.downtowndougbrown.com/2024/04/why-is-my-cpu-usage-always-100-upgrading-my-chumby-8-kernel-part-9/ [www.downtowndougbrown.com]
2025-01-13 22:14
tags:
bugfix
c
investigation
linux
programming
systems
That’s really weird! Why would top be using all of my CPU? It says 100% usr in the second line. Sometimes the usage showed up as 50% usr and 50% sys. Other times it would show up as 100% sys. And very rarely, it would show 100% idle. In that rare case, top would actually show up with 0% usage as I would expect. The 2.6.28 kernel did not have this problem, so it was something different about my newer kernel.
source: HN
WorstFit: Unveiling Hidden Transformers in Windows ANSI!
https://blog.orange.tw/posts/2025-01-worstfit-unveiling-hidden-transformers-in-windows-ansi/ [blog.orange.tw]
2025-01-10 14:54
tags:
exploit
programming
security
text
turtles
windows
The research unveils a new attack surface in Windows by exploiting Best-Fit, an internal charset conversion feature. Through our work, we successfully transformed this feature into several practical attacks, including Path Traversal, Argument Injection, and even RCE, affecting numerous well-known applications!
source: HN
How to triangulate a polyline with thickness
https://jvernay.fr/en/blog/polyline-triangulation/ [jvernay.fr]
2025-01-05 22:33
tags:
c
gl
graphics
interactive
programming
visualization
To render any geometric figure to a GPU (with OpenGL / Direct3D / Vulkan / ...), they must first be triangulated, i.e. decomposed as a series of triangles. Some figures are trivial to transform into triangles: for instance, a segment with thickness is represented by a rectangle, which can be rendered with two triangles. But a segment strip with thickness (aka. polyline) is not trivial.
Ultimately, this exploration has been a rabbit hole, also partly due to some digressions along the path — let’s prototype with a bare implementation of GeoGebra in vanilla JavaScript — let’s do a WebGL + WASM demo to verify the algorithm works correctly ... 😅 At least, it gives some fancy interactive visuals for this blog post. 😁
source: HN
Don't clobber the frame pointer
https://nsrip.com/posts/clobberfp.html [nsrip.com]
2025-01-05 09:34
tags:
bugfix
compiler
cpu
go
programming
Recently I diagnosed and fixed two frame pointer unwinding crashes in Go. The root causes were two flavors of the same problem: buggy assembly code clobbered a frame pointer. By “clobbered” I mean wrote over the value without saving & restoring it. One bug clobbered the frame pointer register. The other bug clobbered a frame pointer saved on the stack. This post explains the bugs, talks a bit about ABIs and calling conventions, and makes some recommendations for how to avoid the bugs.
source: L
Way too many ways to wait on a child process with a timeout
https://gaultier.github.io/blog/way_too_many_ways_to_wait_for_a_child_process_with_a_timeout.html [gaultier.github.io]
2025-01-04 18:00
tags:
best
c
concurrency
programming
systems
unix
So let’s implement our own that does both! As we’ll see, it’s much less straightforward, and thus more interesting, than I thought. It’s a whirlwind tour through Unix deeps. If you’re interested in systems programming, Operating Systems, multiplexed I/O, data races, weird historical APIs, and all the ways you can shoot yourself in the foot with just a few system calls, you’re in the right place!
Very good.
source: trivium
B-Trees: More Than I Thought I'd Want to Know
https://benjamincongdon.me/blog/2021/08/17/B-Trees-More-Than-I-Thought-Id-Want-to-Know/ [benjamincongdon.me]
2025-01-04 11:26
tags:
compsci
database
programming
storage
systems
In my college Data Structures and Algorithms course, we covered B-Trees, but I didn’t grok why I’d choose to use one. As presented, B-Trees were essentially “better” Binary Search Trees, with some hand-waving done that they had improved performance when used in database applications. I remember needing to memorize a bunch of equations to determine the carrying capacity of a M-degree B-Tree, and a vague understanding of B-Tree lookup/insertion/deletion, but not much else. Which is a shame! They’re interesting structures.
source: HN
5 ways to draw an outline
https://ameye.dev/notes/rendering-outlines/ [ameye.dev]
2025-01-04 11:09
tags:
gl
graphics
programming
Rendering outlines is a technique that is often used in games either for aesthetic reasons or for supporting gameplay by using it for highlights and selections around an object. For example in the game Sable, outlines are used to create a comic-book-like style. In The Last of Us, outlines are used to highlight enemies when the player goes into stealth mode.
source: HN
The Alder Lake SHLX anomaly
https://tavianator.com/2025/shlx.html [tavianator.com]
2025-01-03 09:54
tags:
benchmark
cpu
perf
programming
It seems like SHLX performs differently depending on how the shift count register is initialized. If you use a 64-bit instruction with an immediate, performance is slow. This is also true for instructions like INC (which is similar to ADD with a 1 immediate). On the other hand, 32-bit instructions, and 64-bit instructions without immediates (even no-op ones), make it fast. All of these ways to initialize RCX lead to 1-cycle latency:
source: L
How I helped fix sleep-wake hangs on Linux with AMD GPUs
https://nyanpasu64.gitlab.io/blog/amdgpu-sleep-wake-hang/ [nyanpasu64.gitlab.io]
2025-01-03 09:52
tags:
bugfix
investigation
linux
malloc
programming
systems
Through some digging, I found that when a desktop enters S3 sleep, the system cuts power to PCIe GPUs, causing their VRAM chips to lose data. To preserve this data, GPU drivers copy VRAM in use to system RAM before the system sleeps, then restore it after the system wakes. However the Linux amdgpu driver has a bug where, if there is not enough free RAM to store all VRAM in use, the system will run out of memory and crash, instead of moving RAM to disk-based swap.
source: L
A Tour of WebAuthn
https://www.imperialviolet.org/tourofwebauthn/tourofwebauthn.html [www.imperialviolet.org]
2025-01-03 08:29
tags:
auth
opsec
programming
security
web
Don’t blindly prefer emplace_back to push_back
https://quuxplusone.github.io/blog/2021/03/03/push-back-emplace-back/ [quuxplusone.github.io]
2025-01-03 08:11
tags:
cxx
programming
I recommend sticking with push_back for day-to-day use. You should definitely use emplace_back when you need its particular set of skills — for example, emplace_back is your only option when dealing with a deque<mutex> or other non-movable type — but push_back is the appropriate default.