Thread-safety, torn reads, and the like
> I was on a mail thread today, the topic for which was the meaning—and perhaps lack of comprehensiveness—of the statement: “This type is thread safe.” Similar statements are scattered throughout our product documentation, without any good central explanation of its meaning and any caveats.
Gomium pwn challenge
> By doing the above we create a race where we access the implementation of X with the context of good that means if we “win” then inside the unsafe implementation f from the bad context will be used as a function pointer instead of a pointer to an integer. This will result in calling an arbitrary address (0x1337 in our case) which sounds quite promising.
Exploiting torn reads for fun and profit.
Async-await on stable Rust!
> On this coming Thursday, November 7, async-await syntax hits stable Rust, as part of the 1.39.0 release. This work has been a long time in development -- the key ideas for zero-cost futures, for example, were first proposed by Aaron Turon and Alex Crichton in 2016! -- and we are very proud of the end result. We believe that Async I/O is going to be an increasingly important part of Rust’s story.
Why can’t I create a “Please wait” dialog from a background thread to inform the user that the main UI thread is busy?
> When the dialog box sets the main UI window as its owner, this causes the input queues to become attached, at which point their fates become linked. In particular, the dialog box cannot show itself because doing so requires it to notify the owner window that the owner has lost activation, but that owner window is not responding to messages because it’s off doing the really long operation.
Specific instance of a more general problem. Doing the wrong thing with the wrong thread leads to sadness.
Making the Tokio scheduler 10x faster
> We’ve been hard at work on the next major revision of Tokio, Rust’s asynchronous runtime. Today, a complete rewrite of the scheduler has been submitted as a pull request. The result is huge performance and latency improvements. Some benchmarks saw a 10x speed up! It is always unclear how much these kinds of improvements impact “full stack” use cases, so we’ve also tested how these scheduler improvements impacted use cases like Hyper and Tonic (spoiler: it’s really good).
> In preparation for working on the new scheduler, I spent time searching for resources on scheduler implementations. Besides existing implementations, I did not find much. I also found the source of existing implementations difficult to navigate. To remedy this, I tried to keep Tokio’s new scheduler implementation as clean as possible. I also am writing this detailed article on implementing the scheduler in hope that others in similar positions find it useful.
> The article starts with a high level overview of scheduler design, including work-stealing schedulers. It then gets into the details of specific optimizations made in the new Tokio scheduler.
Async-await hits beta!
> Big news! As of this writing, syntactic support for async-await is available in the Rust beta channel! It will be available in the 1.39 release, which is expected to be released on November 7th, 2019. Once async-await hits stable, that will mark the culmination of a multi-year effort to enable efficient and ergonomic asynchronous I/O in Rust. It will not, however, mark the end of the road: there is still more work to do, both in terms of polish (some of the error messages we get today are, um, not great) and in terms of feature set (async fn in traits, anyone?).
Fighting the Async fragmentation
> This is about some dead ends when trying to fix the problem of Rust’s async networking fragmentation. I haven’t been successful, but I can at least share what I tried and discovered, maybe someone else is having the same bugging feeling so they don’t have to repeat them. Or just maybe some of the approaches would work for some other problems. And because we have a bunch of success stories out there, having some failure stories to balance it doesn’t hurt.
Common Systems Programming Optimizations & Tricks
> Today’s blog post is an overview of some common optimization techniques and neat tricks for doing “systems programming” – whatever that means today. We’ll walk through some methods to make your code run faster, be more efficient, and to squeeze just a little more juice from whatever you got.
Benchmarking Fibers, Threads and Processes
> Awhile back, I set out to look at Fiber performance and how it’s improved in recent Ruby versions. After all, concurrency is one of the three pillars of Ruby 3x3! Also, there have been some major speedups in Ruby’s Fiber class by Samuel Williams.
> It’s not hard to write a microbenchmark for something like Fiber.yield. But it’s harder, and more interesting, to write a benchmark that’s useful and representative.
Announcing composable multi-threaded parallelism in Julia
> Software performance depends more and more on exploiting multiple processor cores. The free lunch from Moore’s Law is still over. Well, we here in the Julia developer community have something of a reputation for caring about performance. In pursuit of it, we have already built a lot of functionality for multi-process, distributed programming and GPUs, but we’ve known for years that we would also need a good story for composable multi-threading. Today we are happy to announce a major new chapter in that story. We are releasing a preview of an entirely new threading interface for Julia programs: general task parallelism, inspired by parallel programming systems like Cilk, Intel Threading Building Blocks (TBB) and Go. Task parallelism is now available in the v1.3.0-alpha release, an early preview of Julia version 1.3.0 likely to be released in a couple months. You can find binaries with this feature on the downloads page, or build the master branch from source.
Ten Years of Erlang
> I wanted to take a bit of time to reflect over most of that decade. In this post, I’ll cover a few things such as hype phases and how this related to Erlang, the ladder of ideas within the language and how that can impact adoption, what changed in my ten years here, and I’ll finish up with what I think Erlang still has to bring to the programming community at large.
Provide protection against starvation of the ll/sc loops when accessing userpace.
> Casueword(9) on ll/sc architectures must be prepared for userspace constantly modifying the same cache line as containing the CAS word, and not loop infinitely. Otherwise, rogue userspace livelock kernel.
The convoy phenomenon
> The duration of a lock is the average number of instructions executed while the lock is held. The execution interval of a lock is the average number of instructions executed between successive requests for that lock by a process. The collision cross section of the lock is the fraction of time it is granted, i.e., the lock grant probability.
> Most of us are stuck with a pre-emptive scheduler (i.e., general purpose operating system with virtual memory). Hence convoys will occur. The problem is to make them evaporate quickly when they do occur rather than have them persist forever.
Trash talk: the Orinoco garbage collector
> Over the past years the V8 garbage collector (GC) has changed a lot. The Orinoco project has taken a sequential, stop-the-world garbage collector and transformed it into a mostly parallel and concurrent collector with incremental fallback.
In-DRAM Bulk Bitwise Execution Engine
> Many applications heavily use bitwise operations on large bitvectors as part of their computation. In existing systems, performing such bulk bitwise operations requires the processor to transfer a large amount of data on the memory channel, thereby consuming high latency, memory bandwidth, and energy. In this paper, we describe Ambit, a recently-proposed mechanism to perform bulk bitwise operations completely inside main memory. Ambit exploits the internal organization and analog operation of DRAM-based memory to achieve low cost, high performance, and low energy. Ambit exposes a new bulk bitwise execution model to the host processor. Evaluations show that Ambit significantly improves the performance of several applications that use bulk bitwise operations, including databases.
If each thread’s TEB is referenced by the fs selector, does that mean that the 80386 is limited to 1024 threads?
> No, it doesn’t, because nobody said that the distinct values had to be different simultaneously.
Understanding real-world concurrency bugs in Go
> We perform the first systematic study on concurrency bugs in real Go programs. We studied six popular Go software [projects] including Docker, Kubernetes, and gRPC. We analyzed 171 concurrency bugs in total, with more than half of them caused by non-traditional, Go-specific problems. Apart from root causes of these bugs, we also studied their fixes, performed experiments to reproduce them, and evaluated them with two publicly-available Go bug detectors.
Improvements in forking, threading, and signal code
> I am improving signaling code in the NetBSD kernel, covering corner cases with regression tests, and improving the documentation. I’ve been working at the level of sytems calls (syscalls): forking, threading, handling these with GDB, and tracing syscalls. Some work happens behind the scenes as I support the work of Michal Gorny on LLDB/ptrace features.
Bad utmp implementations in Glibc and FreeBSD
> I wondered: If the files consist of fixed-sized records, and are readable by regular users, how is consistency maintained? That is – how can a process ensure that, when it updates the database, it doesn’t conflict with another process also attempting to update the database at the same time? Similarly, how can a process reading an entry from the database be sure that it receives a consistent, full record and not a record which has been partially updated? (after all, POSIX allows that a write(2) call can return without having written all the requested bytes, and I’m not aware of Linux or any of the *BSDs documenting that this cannot happen for regular files). Clearly, some kind of locking is needed; a process that wants to write to or read from the database locks it first, performs its operation, and then unlocks the database. Once again, this happens under the hood, in the implementation of the getutent/pututline functions or their equivalents.
Go Concurrency from the Ground Up
> Sometimes the best way to learn something is to build it. This guide will walk you through how to reproduce Go’s concurrency features in another programming language.