The case of the application that used thread local storage it never allocated
https://devblogs.microsoft.com/oldnewthing/20221128-00/?p=107456 [devblogs.microsoft.com]
2024-03-15 22:42
tags:
bugfix
concurrency
development
malloc
programming
windows
Upon closer inspection, the real problem was not that the application’s TLS was being corrupted. The problem was that the application was using TLS slots it never allocated, so it was inadvertently using somebody else’s TLS slots as its own. And of course, when the true owner updated the TLS value, the application interpreted that as corruption.
Smashing the state machine: the true potential of web race conditions
https://portswigger.net/research/smashing-the-state-machine [portswigger.net]
2023-08-10 16:24
tags:
concurrency
exploit
networking
security
web
HTTP request processing isn’t atomic - any endpoint might be sending an application through invisible sub-states. This means that with race conditions, everything is multi-step. The single-packet attack solves network jitter, making it as though every attack is on a local system. This exposes vulnerabilities that were previously near-impossible to detect or exploit.
source: L
Breaking java.lang.String
https://wouter.coekaerts.be/2023/breaking-string [wouter.coekaerts.be]
2023-07-11 23:58
tags:
concurrency
java
programming
Let’s abuse a bug in java.lang.String to make some weird Strings. We’ll make “hello world” not start with “hello”, and show that not all empty Strings are equal to each other.
source: HN
Paving the Road to Vulkan on Asahi Linux
https://asahilinux.org/2023/03/road-to-vulkan/ [asahilinux.org]
2023-03-20 18:25
tags:
concurrency
gl
graphics
linux
programming
systems
In every modern OS, GPU drivers are split into two parts: a userspace part, and a kernel part. The kernel part is in charge of managing GPU resources and how they are shared between apps, and the userspace part is in charge of converting commands from a graphics API (such as OpenGL or Vulkan) into the hardware commands that the GPU needs to execute.
Between those two parts, there is something called the Userspace API or “UAPI”. This is the interface that they use to communicate between them, and it is specific to each class of GPUs! Since the exact split between userspace and the kernel can vary depending on how each GPU is designed, and since different GPU designs require different bits of data and parameters to be passed between userspace and the kernel, each new GPU driver requires its own UAPI to go along with it.
source: HN
The futex_waitv() syscall and gaming on Linux
https://www.collabora.com/news-and-blog/blog/2023/02/17/the-futex-waitv-syscall-gaming-on-linux/ [www.collabora.com]
2023-02-17 23:48
tags:
concurrency
gaming
linux
perf
programming
systems
The futex_waitv syscall is a new syscall through which the process can wait for multiple futexes. The task wakes up when any futex in the list is awakened. This can be used to implement wait on multiple locks and wait lists, etc, without the limitations imposed by using eventfd.
source: L
How fast are Linux pipes anyway?
https://mazzo.li/posts/fast-pipes.html [mazzo.li]
2022-06-02 22:56
tags:
concurrency
linux
malloc
perf
programming
systems
In this post, we will explore how Unix pipes are implemented in Linux by iteratively optimizing a test program that writes and reads data through a pipe.
We will proceed as follows:
A first slow version of our pipe test bench;
How pipes are implemented internally, and why writing and reading from them is slow;
How the vmsplice and splice syscalls let us get around some (but not all!) of the slowness;
A description of Linux paging, leading up to a faster version using huge pages;
The final optimization, replacing polling with busy looping;
Some closing thoughts.
source: L
What's new in CPUs since the 80s?
https://danluu.com/new-cpu-features/ [danluu.com]
2022-04-19 17:10
tags:
article
concurrency
cpu
perf
programming
systems
Everything below refers to x86 and linux, unless otherwise indicated. History has a tendency to repeat itself, and a lot of things that were new to x86 were old hat to supercomputing, mainframe, and workstation folks.
x86 chips have picked up a lot of new features and whiz-bang gadgets.
Overall, a pretty good introduction to modern CPUs, performance, and concurrency.
Why Rust mutexes look like they do
http://cliffle.com/blog/rust-mutexes/ [cliffle.com]
2022-04-02 05:25
tags:
concurrency
programming
rust
In the rest of this post I’ll walk through a typical C mutex API, compare with a typical Rust mutex API, and look at what happens if we change the Rust API to resemble C in various ways.
source: HN
Curious lack of sprintf scaling
https://aras-p.info/blog/2022/02/25/Curious-lack-of-sprintf-scaling/ [aras-p.info]
2022-02-25 22:08
tags:
c
concurrency
investigation
mac
perf
programming
Some days ago I noticed that on a Mac, doing snprintf calls from multiple threads shows curious lack of scaling (see tweet). Replacing snprintf with {fmt} library can speed up the OBJ exporter in Blender 3.2 by 3-4 times. This could have been the end of the story, filed under a “eh, sprintf is bad!” drawer, but I started to wonder why it shows this lack of scaling.
source: HN
Beyond the Remake of 'Shadow of the Colossus': A Technical Perspective
https://www.youtube.com/watch?v=fcBZEZWGYek [www.youtube.com]
2022-02-23 06:20
tags:
concurrency
development
gaming
malloc
programming
video
Intro to porting games between platforms, then also a deep walkthrough of a custom allocator libary.
Eliminating Data Races in Firefox – A Technical Report
https://hacks.mozilla.org/2021/04/eliminating-data-races-in-firefox-a-technical-report/ [hacks.mozilla.org]
2021-04-07 00:02
tags:
compiler
concurrency
cxx
development
programming
update
We successfully deployed ThreadSanitizer in the Firefox project to eliminate data races in our remaining C/C++ components. In the process, we found several impactful bugs and can safely say that data races are often underestimated in terms of their impact on program correctness. We recommend that all multithreaded C/C++ projects adopt the ThreadSanitizer tool to enhance code quality.
source: HN
ARM and Lock-Free Programming
https://randomascii.wordpress.com/2020/11/29/arm-and-lock-free-programming/ [randomascii.wordpress.com]
2020-12-11 04:33
tags:
concurrency
cxx
programming
systems
This is intended to be a casual introduction to the perils of lock-free programming (which I last wrote about some fifteen years ago), but also some explanation of why ARM’s weak memory model breaks some code, and why that code was probably broken already. I also want to explain why C++11 made the lock-free situation strictly better (objections to the contrary notwithstanding).
What went wrong with the libdispatch. A tale of caution for the future of concurrency.
https://tclementdev.com/posts/what_went_wrong_with_the_libdispatch.html [tclementdev.com]
2020-11-25 01:48
tags:
concurrency
development
library
mac
programming
The future was multithreading and we had to use the libdispatch to get there. So we did.
As we went down that rabbit hole, things got progressively worse.
source: L
Windows Timer Resolution: The Great Rule Change
https://randomascii.wordpress.com/2020/10/04/windows-timer-resolution-the-great-rule-change/ [randomascii.wordpress.com]
2020-10-11 22:01
tags:
concurrency
systems
update
windows
The behavior of the Windows scheduler changed significantly in Windows 10 2004, in a way that will break a few applications, and there appears to have been no announcement, and the documentation has not been updated. This isn’t the first time this has happened, but this change seems bigger than last time.
The short version is that calls to timeBeginPeriod from one process now affect other processes less than they used to, but there is still an effect.
The Watchdog Hydra
https://thedailywtf.com/articles/the-watchdog-hydra [thedailywtf.com]
2020-09-29 01:23
tags:
auth
bugfix
concurrency
Ammar checked, and sure enough, his code was sending hundreds of thousands of requests per second. It didn’t take him long to figure out why: requests from the watchdog were failing with a 500 error, so it called the login method. The login method had been succeeding, so another watchdog got scheduled. Thirty seconds later, that failed, as did all the previously scheduled watchdogs, which all called login again. Which, on success, scheduled a fresh round of watchdogs. Every thirty seconds, the number of scheduled calls doubled. Before long, Ammar’s code was DoSing the API.
Fun times.
xi-editor retrospective
https://raphlinus.github.io/xi/2020/06/27/xi-retrospective.html [raphlinus.github.io]
2020-07-01 00:55
tags:
compsci
concurrency
development
programming
rust
swtools
text
I still believe it would be possible to build a high quality editor based on the original design. But I also believe that this would be quite a complex system, and require significantly more work than necessary.
A few good ideas and observations could be mined out of this post.
source: L
Memory Ordering in Modern Microprocessors
https://www.linuxjournal.com/article/8211 [www.linuxjournal.com]
2020-06-18 00:17
tags:
concurrency
cpu
programming
systems
Latency in Asynchronous Python
https://nullprogram.com/blog/2020/05/24/ [nullprogram.com]
2020-05-26 22:00
tags:
concurrency
programming
python
This week I was debugging a misbehaving Python program that makes significant use of Python’s asyncio. The program would eventually take very long periods of time to respond to network requests. My first suspicion was a CPU-heavy coroutine hogging the thread, preventing the socket coroutines from running, but an inspection with pdb showed this wasn’t the case. Instead, the program’s author had made a couple of fundamental mistakes using asyncio. Let’s discuss them using small examples.
When Parallel: Pull, Don't Push
https://nullprogram.com/blog/2020/04/30/ [nullprogram.com]
2020-05-01 04:54
tags:
concurrency
programming
I’ve noticed a small pattern across a few of my projects where I had vectorized and parallelized some code. The original algorithm had a “push” approach, the optimized version instead took a “pull” approach. In this article I’ll describe what I mean, though it’s mostly just so I can show off some pretty videos, pictures, and demos.
What Outranks Thread Priority?
https://randomascii.wordpress.com/2020/04/14/what-outranks-thread-priority/ [randomascii.wordpress.com]
2020-04-15 11:45
tags:
concurrency
investigation
perf
systems
turtles
ux
windows
This investigation started, as so many of mine do, with me minding my own business, not looking for trouble. In this case all I was doing was opening my laptop lid and trying to log on. The first few times that this resulted in a twenty-second delay I ignored the problem, hoping that it would go away. The next few times I thought about investigating, but performance problems that occur before you have even logged on are trickier to solve, and I was feeling lazy. When I noticed that I was avoiding closing my laptop because I dreaded the all-too-frequent delays when opening it I realized it was time to get serious.
A lot of effort for a rather unsatisfactory conclusion, but I won’t spoil the surprise.