The convoy phenomenon
> The duration of a lock is the average number of instructions executed while the lock is held. The execution interval of a lock is the average number of instructions executed between successive requests for that lock by a process. The collision cross section of the lock is the fraction of time it is granted, i.e., the lock grant probability.
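Putting those definitions together (my arithmetic, not the paper's text): a process holds the lock for *duration* instructions out of every *interval* instructions between its requests, so the fraction of time the lock is held (and hence the chance that a fresh request collides with it) comes out to roughly

$$\text{collision cross section} \approx \frac{\text{duration}}{\text{interval}}$$

A lock held for 1,000 instructions and requested every 1,000,000 instructions therefore has a cross section of about 0.1%.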
> Most of us are stuck with a pre-emptive scheduler (i.e., general purpose operating system with virtual memory). Hence convoys will occur. The problem is to make them evaporate quickly when they do occur rather than have them persist forever.
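The fix, as I read it, is in the release path: instead of granting the lock FIFO to the next waiter (which preserves the convoy), mark it free and let waiters re-contend when they are actually scheduled. A minimal Go sketch of that discipline, mine rather than the paper's (which long predates Go):

```go
package main

import (
	"runtime"
	"sync/atomic"
)

// A minimal test-and-set lock in the spirit of the fix: on release the
// holder simply marks the lock free rather than handing it to the head
// of a FIFO queue. Waiters re-contend when they are actually running,
// so a descheduled waiter no longer stalls everyone queued behind it,
// and a convoy can evaporate.
type tasLock struct{ held int32 }

func (l *tasLock) acquire() {
	for !atomic.CompareAndSwapInt32(&l.held, 0, 1) {
		runtime.Gosched() // yield rather than spin hot
	}
}

func (l *tasLock) release() {
	atomic.StoreInt32(&l.held, 0) // free the lock; no FIFO hand-off
}

func main() {
	var l tasLock
	l.acquire()
	l.release()
}
```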
Software-defined far memory in warehouse-scale computers
Understanding real-world concurrency bugs in Go
> We perform the first systematic study on concurrency bugs in real Go programs. We studied six popular Go software [projects] including Docker, Kubernetes, and gRPC. We analyzed 171 concurrency bugs in total, with more than half of them caused by non-traditional, Go-specific problems. Apart from root causes of these bugs, we also studied their fixes, performed experiments to reproduce them, and evaluated them with two publicly-available Go bug detectors.
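Many of the Go-specific bugs the study describes involve channel misuse. Below is a distilled, hypothetical example of one recurring shape (not a verbatim bug from the study): a goroutine leaks by blocking forever on an unbuffered send, and a one-slot buffer fixes it.

```go
package main

import "fmt"

// Buggy shape: if the caller gives up (done fires first), the child
// goroutine blocks forever on the unbuffered send -- a goroutine leak.
func leaky(done chan struct{}) {
	ch := make(chan int) // unbuffered: send blocks until received
	go func() {
		ch <- compute() // stranded if nobody ever receives
	}()
	select {
	case v := <-ch:
		fmt.Println(v)
	case <-done: // caller gave up; the goroutine above leaks
	}
}

// Common fix: a buffered channel lets the send complete even when the
// result is abandoned, so the goroutine can always exit.
func fixed(done chan struct{}) {
	ch := make(chan int, 1) // buffer of one: send never blocks
	go func() { ch <- compute() }()
	select {
	case v := <-ch:
		fmt.Println(v)
	case <-done:
	}
}

func compute() int { return 42 }

func main() {
	done := make(chan struct{})
	fixed(done) // prints 42
}
```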
Slim: OS kernel support for a low-overhead container overlay network
> In theory there are four possible modes for container networking: a bridge mode for containers on the same host; host mode in which containers use the IP address of their host network interface; macvlan mode (or similar hardware mechanisms) to give each container its own IP address; and overlay mode in which each container is given its own virtual network interface and each application has its own network namespace.
Cloud programming simplified: a Berkeley view on serverless computing
> Ten years ago Berkeley released the ‘Berkeley view of cloud computing’ paper, predicting that cloud use would accelerate. Today’s paper choice is billed as its logical successor: it predicts that the use of serverless computing will accelerate. More precisely:
> … we predict that serverless computing will grow to dominate the future of cloud computing.
Keeping CALM: when distributed consistency is easy
> When it comes to high performing scalable distributed systems, coordination is a killer. It’s the dominant term in the Universal Scalability Law. When we can avoid or reduce the need for coordination things tend to get simpler and faster. See for example Coordination avoidance in database systems, and more recently the amazing performance of Anna which gives a two-orders-of-magnitude speed-up through coordination elimination. So we should avoid coordination whenever we can.
> So far so good, but when exactly can we avoid coordination? Becoming precise in the answer to that question is what the CALM theorem is all about. You’re probably familiar with Brooks’ distinction between essential complexity and accidental complexity in his ‘No silver bullet’ essay. Here we get to tease apart the distinction between essential coordination, a guarantee that cannot be provided without coordinating, and accidental coordination, coordination that could have been avoided with a more careful design.
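To make the distinction concrete, here is a sketch (mine, not the paper's) of a monotone structure in the CALM sense, a grow-only set: merges are commutative, associative, and idempotent, so replicas can exchange state in any order without coordination.

```go
package main

import "fmt"

// GSet is a grow-only set: its only mutation is Add, so any fact it
// reports ("x is a member") stays true forever. Queries over such
// monotone state never need retracting, which is why replicas can be
// merged in any order, any number of times, coordination-free.
type GSet map[string]struct{}

func (s GSet) Add(x string)           { s[x] = struct{}{} }
func (s GSet) Contains(x string) bool { _, ok := s[x]; return ok }

// Merge folds another replica's state in; set union is commutative,
// associative, and idempotent, so delivery order doesn't matter.
func (s GSet) Merge(other GSet) {
	for x := range other {
		s[x] = struct{}{}
	}
}

func main() {
	a, b := GSet{}, GSet{}
	a.Add("x")
	b.Add("y")
	a.Merge(b)
	fmt.Println(a.Contains("x"), a.Contains("y")) // true true
}
```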
Large scale GAN training for high fidelity natural image synthesis
> I was drawn to this paper to try and find out what’s behind the stunning rate of progress. The large-scale GANs (can I say LS-GAN?) trained here set a new state-of-the-art in class-conditional image synthesis.
> So what’s the secret to LS-GANs’ success? Partly of course it’s just a result of scaling up the models – but interestingly by going wide rather than deep. However, GANs were already notoriously difficult to train (‘unstable’), and scaling things up magnifies the training issues too. So the other part of the innovation here is figuring out how to maintain stability at scale. It’s less one big silver bullet, and more a clever aggregation of techniques from the deep learning parts bin. All in all, it has the feel to me of reaching the upper slopes of an ‘s’-curve such that we might need something new to get us onto the next curve. But hey, with the amazing rates of progress we’ve been seeing I could well be wrong about that.
RobinHood: tail latency aware caching – dynamic reallocation from cache-rich to cache-poor
> RobinHood dynamically allocates cache space to the backends responsible for high request tail latency (cache-poor backends), while stealing space from backends that do not affect the request tail latency (cache-rich backends). In doing so, RobinHood makes compromises that may seem counter-intuitive (e.g., significantly increasing the tail latencies of certain backends).
Making the fast slow to make the slow fast.
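The reallocation loop has a simple shape. The sketch below is my own reconstruction in Go, with invented names and parameters (the paper's controller works from per-interval statistics about which backends block tail requests, and its step sizes and thresholds differ):

```go
package main

import "fmt"

// Backend and TailBlame are illustrative names, not the paper's.
type Backend struct {
	Name      string
	CacheMB   int
	TailBlame float64 // share of tail-latency requests bottlenecked here
}

// reallocate moves one step of cache budget from the backend least
// implicated in the tail (cache-rich) to the most implicated (cache-poor).
func reallocate(backends []Backend, stepMB int) {
	poor, rich := 0, 0
	for i, b := range backends {
		if b.TailBlame > backends[poor].TailBlame {
			poor = i
		}
		if b.TailBlame < backends[rich].TailBlame {
			rich = i
		}
	}
	if poor == rich || backends[rich].CacheMB < stepMB {
		return // nothing to move, or the donor is already drained
	}
	backends[rich].CacheMB -= stepMB // steal from the cache-rich...
	backends[poor].CacheMB += stepMB // ...give to the cache-poor
}

func main() {
	b := []Backend{{"ads", 100, 0.01}, {"search", 100, 0.80}}
	reallocate(b, 10) // in a real system, run once per monitoring interval
	fmt.Println(b)    // ads loses 10MB, search gains 10MB
}
```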
Columnstore and B+ tree – are hybrid physical designs important?
> It is generally understood that columnstores are crucial to achieving high performance for analytic queries and that B+ tree indexes are key to supporting transactional workloads efficiently. However, it is not well understood whether hybrid physical designs – both columnstore and B+ tree indices on the same database and potentially the same table – are important for any of the above workloads.
> Through a series of benchmarks the authors show that hybrid physical designs can result in more than an order of magnitude lower execution costs for many workloads when compared to alternatives using B+ tree-only or columnstore-only. The Database Engine Tuning Advisor (DTA) for SQL Server is extended to analyze and recommend the appropriate indices for a given workload.
Who left open the cookie jar? A comprehensive evaluation of third-party cookie policies
> It’s an evaluation of the defense mechanisms built into browsers (and via extensions / add-ons) that seek to protect against user tracking and cross-site attacks. Testing across 7 browsers and 46 browser extensions, the authors find that for virtually every browser and extension combination there is a way to bypass the intended security policies.
Image-to-image translation with conditional adversarial networks
> It’s time we looked at some machine learning papers again! Over the next few days I’ve selected a few papers that demonstrate the exciting capabilities being developed around images.
> Anyway, back to the research! The common name for the system described in today’s paper is pix2pix. You can find the code and more details online at https://github.com/phillipi/pix2pix. The name ‘pix2pix’ comes from the fact that the network is trained to map from input pictures (images) to output pictures (images), where the output is some translation of the input.
How not to structure your database-backed web applications: a study of performance bugs in the wild
> This is a fascinating study of the problems people get into when using ORMs to handle persistence concerns in their web applications. The authors study real-world applications and distil a catalogue of common performance anti-patterns. There are a bunch of familiar things in the list, and a few that surprised me with the amount of difference they can make. By fixing many of the issues that they find, Yang et al. are able to quantify how many lines of code it takes to address each issue, and what performance improvement the fix delivers.
> Note that fundamentally a lot of the issues stem from the fact that the ‘O’ in ORM could just as easily stand for ‘Opaque.’
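One of the familiar items in any such catalogue is the N+1 query pattern, where the ORM issues one query for a parent list and then another query per row. A hypothetical sketch using Go's database/sql directly (table and column names invented; this stands in for the SQL an ORM would generate):

```go
package queries

import "database/sql"

// Anti-pattern: one query for the list, then one more query per row.
func nPlusOne(db *sql.DB) error {
	rows, err := db.Query(`SELECT id FROM authors`)
	if err != nil {
		return err
	}
	defer rows.Close()
	for rows.Next() {
		var id, n int
		if err := rows.Scan(&id); err != nil {
			return err
		}
		// One extra round trip per author: N additional queries.
		err := db.QueryRow(`SELECT count(*) FROM posts WHERE author_id = ?`, id).Scan(&n)
		if err != nil {
			return err
		}
	}
	return rows.Err()
}

// Fix: a single round trip that lets the database do the work.
func grouped(db *sql.DB) (*sql.Rows, error) {
	return db.Query(`SELECT author_id, count(*) FROM posts GROUP BY author_id`)
}
```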
JavaScript Zero: real JavaScript and zero side-channel attacks
> Chrome Zero is a proof of concept implementation that defends against these attacks. It installs as a Chrome extension and protects functions, properties, and objects that can be exploited to construct attacks. The basic idea is very simple: functions are wrapped with replacement versions that allow injection of a policy. This idea of wrapping functions (and properties with accessor properties, and certain objects with proxy objects) goes by the fancy name of virtual machine layering.
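The wrapping trick itself is easy to see outside JavaScript too. A small Go sketch of the general idea (Chrome Zero operates on JavaScript functions, properties, and proxy objects in the page; the timer-coarsening policy here is just an illustrative choice):

```go
package main

import (
	"fmt"
	"time"
)

// wrap returns a replacement for f that filters its result through a
// policy before the caller ever sees it.
func wrap(f func() time.Time, policy func(time.Time) time.Time) func() time.Time {
	return func() time.Time { return policy(f()) }
}

func main() {
	// Example policy: coarsen timestamps, the kind of restriction
	// side-channel defenses apply to high-resolution timers.
	coarse := func(t time.Time) time.Time { return t.Truncate(100 * time.Millisecond) }
	now := wrap(time.Now, coarse)
	fmt.Println(now()) // resolution capped at 100ms
}
```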
Quantum computing in the NISQ era and beyond
> “Intermediate scale” refers to computers with between 50 and a few hundred qubits. The 50 qubit milestone is significant because that takes us beyond what we can simulate by brute force using the most powerful existing supercomputers. “Noisy” emphasises that we’ll have imperfect control over those qubits. Because of the noise, we expect a limit of about 1000 gates in a circuit – i.e., 1000 fundamental two-qubit operations. Executing a single gate is about 1000 times slower on an ion trap quantum processor than on a superconducting circuit.
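Some back-of-envelope arithmetic (mine, not the paper's) on why 50 qubits marks the brute-force simulation frontier: the state of $n$ qubits is a vector of $2^n$ complex amplitudes, so

$$2^{50} \approx 1.1 \times 10^{15} \text{ amplitudes} \times 16 \text{ bytes each} \approx 18\ \text{PB}$$

of memory just to hold the state, before applying a single gate.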
Regarding the near-future potential of quantum computers.
For a look at the past, Shor’s Algorithm: https://blog.acolyer.org/2018/02/02/polynomial-time-algorithms-for-prime-factorization-and-discrete-logarithms-on-a-quantum-computer/
A sample of brilliance
> Jon Bentley’s “Programming Pearls” was a well-loved column in CACM (and also available in book form). Today we’re taking a look at his “Sample of Brilliance” column from 1987, featuring guest contributions from none other than Bob Floyd (whose Turing Award lecture we reviewed yesterday). It’s a chance to see Floyd’s algorithm development skills in action.
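The column's centrepiece is Floyd's sampling algorithm, which draws m distinct values using exactly m random draws and no rejection-and-retry loop. A Go rendering of it (over [0, n) rather than the 1-based range of the original pseudocode):

```go
package main

import (
	"fmt"
	"math/rand"
)

// floydSample draws m distinct integers uniformly from [0, n).
// Each loop iteration adds exactly one element, so only m random
// numbers are ever generated.
func floydSample(n, m int) map[int]bool {
	s := make(map[int]bool, m)
	for j := n - m; j < n; j++ {
		t := rand.Intn(j + 1) // uniform over [0, j]
		if s[t] {
			s[j] = true // t was already picked: take j itself
		} else {
			s[t] = true
		}
	}
	return s
}

func main() {
	fmt.Println(floydSample(10, 3)) // e.g. map[2:true 5:true 9:true]
}
```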
A practitioner’s guide to reading programming languages papers
> Last week I jokingly said that POPL papers must pass an ‘intellectual intimidation’ threshold in order to be accepted. That’s not true of course, but it is the case that programming languages papers can look especially intimidating to the practitioner (or indeed, the academic working in a different sub-discipline of computer science!). They are full of dense figures with mathematical symbols, and phrases thrown around such as “judgements”, “operational semantics”, and the like.
Linear Haskell: practical linearity in a higher-order polymorphic language
> The paper is focused on two main use cases for linear types. The one that really catches my imagination is encoding the usage protocol of a resource (e.g., you can’t read from a file before it is opened) in types. “Here, linearity does not affect efficiency, but rather eliminates many bugs.” We’ll return to some examples of this later in this post. It seems like there’s a very nice and natural fit to protocols in distributed systems (a bit like we looked at with Disel yesterday).
> The second use case is safe update in place, and the reason we’re interested in that is efficiency – if we know it’s safe to update a value in place we don’t need to worry about making copies, or mutual exclusion mechanisms.
RustBelt: securing the foundations of the Rust programming language
> We’d really like to know whether those claims hold or not, and if so under what conditions, and that’s what today’s paper choice is all about. The authors give a formal and machine checked safety proof for a meaningful subset of Rust. If a library uses unsafe features, this gives a framework to say what verification condition it must satisfy for it to be considered a safe extension to the language.
> Now be warned, this is not an easy paper (it’s POPL, we should expect nothing less – I like to imagine the PC have an ‘intellectual intimidation’ criterion in the review process, and no paper falling below a certain threshold can be accepted!). And even though it runs to 34 pages, you’d really need to consult the even longer technical report to fully understand everything. Fortunately for most of us it suffices to know that the work exists, the results it demonstrates, and the implications for Rust. So those are the aspects I’ll focus on today.
The dynamics of innocent flesh on the bone: code reuse ten years later
> These attacks use static analysis to find available gadgets. In today’s paper, the authors introduce a dynamic approach to finding gadgets based on carefully observing a process as it executes. Newton is a run-time gadget-discovery framework based on constraint driven dynamic taint analysis that “can easily find function call gadgets even in the presence of state-of-the-art code-reuse defenses.”
ffwd: delegation is (much) faster than you think
> As soon as we want to make use of more than one hardware thread to solve a given problem, then the question of coordination rears its head. Today’s paper choice examines mechanisms for providing safe and efficient access to shared variables and data structures across coordinating threads.
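ffwd's answer is delegation: one dedicated server thread owns the shared data outright, and client threads send it operations instead of acquiring locks. Channels make the shape of the idea easy to show in Go, though ffwd itself gets its speed from spinning server threads and carefully laid out cache-line-sized request and response slots, not channels:

```go
package main

import "fmt"

// request is a client's operation plus a channel for the reply.
type request struct {
	delta int
	reply chan int
}

// server is the sole owner of the shared counter: no lock is ever
// taken, and the hot data stays in one thread's cache.
func server(reqs <-chan request) {
	counter := 0
	for r := range reqs {
		counter += r.delta
		r.reply <- counter
	}
}

func main() {
	reqs := make(chan request)
	go server(reqs)
	reply := make(chan int)
	for i := 0; i < 3; i++ {
		reqs <- request{delta: 1, reply: reply}
		fmt.Println(<-reply) // 1, then 2, then 3
	}
}
```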