Understanding DeepMind's Sorting Algorithm
A few days ago, DeepMind published a blog post talking about a paper they wrote, where they discovered tinier kernels for sorting algorithms. They did this by taking their deep learning wisdom, which they gained by building AlphaGo, and applying it to the discipline of of superoptimization. That piqued my interest, since as a C library author, I’m always looking for opportunities to curate the best stuff. In some ways that’s really the whole purpose of the C library. There are so many functions that we as programmers take for granted, which are the finished product of decades of research, distilled into plain and portable code.
DeepMind earned a fair amount of well-deserved attention for this discovery, but unfortunately they could have done a much better job explaining it.
Speeding up sort performance in Postgres 15
Let’s explore each of the 4 improvements in PostgreSQL 15 that make sort performance go faster:
Change 1: Improvements sorting a single column
Change 2: Reduce memory consumption by using generation memory context
Change 3: Add specialized sort routines for common datatypes
Change 4: Replace polyphase merge algorithm with k-way merge
Changing std::sort at Google’s Scale and Beyond
We are changing std::sort in LLVM’s libcxx. That’s a long story of what it took us to get there and all possible consequences, bugs you might encounter with examples from open source. We provide some benchmarks, perspective, why we did this in the first place and what it cost us with exciting ideas from Hyrum’s Law to reinforcement learning. All changes went into open source and thus I can freely talk about all of them.
This article is split into 3 parts, the first is history with all details of recent (and not so) past of sorting in C++ standard libraries. Second part is about what it takes to switch from one sorting algorithm to another with various bugs. The final one is about the implementation we have chosen with all optimizations we have done.
This Goes to Eleven - Decimating Array.Sort with AVX2
Let’s get in the ring and show what AVX/AVX2 intrinsics can really do for a non-trivial problem, and even discuss potential improvements that future CoreCLR versions could bring to the table.
Everyone needs to sort arrays, once in a while, and many algorithms we take for granted rely on doing so. We think of it as a solved problem and that nothing can be further done about it in 2020, except for waiting for newer, marginally faster machines to pop-up. However, that is not the case, and while I’m not the first to have thoughts about it; or the best at implementing it, if you join me in this rather long journey, we’ll end up with a replacement function for Array.Sort, written in pure C# that outperforms CoreCLR’s C++2 code by a factor north of 10x on most modern Intel CPUs, and north of 11x on my laptop. Sounds interesting? If so, down the rabbit hole we go…
Very well done.
How many ways are there to sort GUIDs? How much time do you have?
A few years ago I heard about an interesting sorting algorithm (invented by the legendary Edsger Dijkstra) called smoothsort with great memory and runtime guarantees. Although it is a comparison sort and thus on average cannot run faster than Ω(n lg n), smoothsort is an adaptive sort, meaning that if the input is somewhat sorted, smoothsort will run in time closer to O(n) than O(n lg n). In the best case, when the input is sorted, smoothsort will run in linear time. Moreover, smoothsort is an in-place sorting algorithm, and requires O(1) auxiliary storage space. Compare this to mergesort, which needs O(n) auxiliary space, or quicksort, which needs O(lg n) space on average and O(n) space in the worst case. In the worst case, smoothsort is an asymptotically optimal O(n lg n) sort. With all these advantages, smoothsort seemed like an excellent sort to code up and add to my archive of interesting code project, and I set out to learn enough about it to build my own implementation.
Unfortunately, I quickly found out that smoothsort is one of the least-documented sorts on the web. Sure, you can find many sites that mention smoothsort or its runtime characteristics, and in a few cases sites that provide an implementation, but hardly any sites explained the full intuition behind how the sort worked or where the runtime guarantees came from. Moreover, Dijkstra’s original paper on smoothsort is extremely difficult to read and gives little intuition behind some of the trickier optimizations. After spending a few days reading over existing sites and doing a bit of my own work, I finally managed to figure out the intuition behind smoothsort, as well as the source of many of the complex optimizations necessary to get smoothsort working in constant space. It turns out that smoothsort is actually a generalization of heapsort using a novel heap data structure. Surprisingly, I haven’t found this structure mentioned anywhere on the web, and this page may be the first time it’s been mentioned online.
On the Worst-Case Complexity of TimSort
TimSort is an intriguing sorting algorithm designed in 2002 for Python, whose worst-case complexity was announced, but not proved until our recent preprint. In fact, there are two slightly different versions of TimSort that are currently implemented in Python and in Java respectively. We propose a pedagogical and insightful proof that the Python version runs in O(n log n). The approach we use in the analysis also applies to the Java version, although not without very involved technical details. As a byproduct of our study, we uncover a bug in the Java implementation that can cause the sorting method to fail during the execution. We also give a proof that Python’s TimSort running time is in O(n + n log rho), where rho is the number of runs (i.e. maximal monotonic sequences), which is quite a natural parameter here and part of the explanation for the good behavior of TimSort on partially sorted inputs.
A sample of brilliance
Jon Bentley’s “Programming Pearls” was a well-loved column in CACM (and also available in book form). Today we’re taking at look at his “Sample of Brilliance” column from 1987, featuring guest contributions from none other then Bob Floyd (whose Turing Award lecture we reviewed yesterday). It’s a chance to see Floyd’s algorithm development skills in action.
No, it is not a compiler error. It is never a compiler error.
But could it be...
A bug in the sort() function. O day and night, but this is wondrous strange!
Render Multimedia in Pure C
All that’s needed are a few functions to render artifacts — lines, shapes, etc. — to an RGB buffer. With a bit of basic sound synthesis, the same concept can be applied to create audio in a separate audio stream — in this case using the simple (but not as simple as Netpbm) WAV format. Put them together and a small, standalone program can create multimedia.
A closer look at the complexity analysis of finding the k’th smallest element in two sorted arrays
Given two sorted arrays of the same length, with unique elements, find the kth smallest element in the combined collection. One solution involves the double binary search. I’ll let you go read the article to see how it works. I’m here to dissect the complexity analysis.
Lowdown Diffing Engine
In this paper, I briefly describe the “diff” engine used in lowdown-diff(1) tool in lowdown.
Lowdown being a markdown parser.
Sorting number strings numerically
So the real title of this post should have been “An order-preserving embedding of the set of integers into the set of lexicographically sorted UTF-8 strings”.
Pattern-defeating quicksort (pdqsort) is a novel sorting algorithm that combines the fast average case of randomized quicksort with the fast worst case of heapsort, while achieving linear time on inputs with certain patterns. pdqsort is an extension and improvement of David Mussers introsort.
The Stack Clash
This is a great writeup, further developing an existing but overlooked exploit technique. Maybe not all the exploits worked, but lots of vulns found, and some countermeasures eluded.
A sorting algorithm one doesn’t see every day.