The Year Ahead
There are a few conferences from 2019 that I didn’t manage to get to last year (notably CCS, SOCC, and NeurIPS) which are still on my plate. And then I’ve pulled together this initial ‘watch list’ for the coming year.
Ironies of automation
The central irony (‘combination of circumstances, the result of which is the direct opposite of what might be expected’) referred to in this paper is that the more we automate, and the more sophisticated we make that automation, the more we become dependent on a highly skilled human operator.
Snap: a microkernel approach to host networking
This paper describes the networking stack, Snap, that has been running in production at Google for the last three years+. It’s been clear for a while that software designed explicitly for the data center environment will increasingly want/need to make different design trade-offs to e.g. general-purpose systems software that you might install on your own machines. But wow, I didn’t think we’d be at the point yet where we’d be abandoning TCP/IP! You need a lot of software engineers and the willingness to rewrite a lot of software to entertain that idea.
An analysis of performance evolution of Linux’s core operations
When you get into the details I found it hard to come away with any strongly actionable takeaways though. Perhaps the most interesting lesson/reminder is this: it takes a lot of effort to tune a Linux kernel. For example:
“Red Hat and Suse normally required 6-18 months to optimise the performance an an upstream Linux kernel before it can be released as an enterprise distribution”, and
“Google’s data center kernel is carefully performance tuned for their workloads. This task is carried out by a team of over 100 engineers, and for each new kernel, the effort can also take 6-18 months.”
50 ways to leak your data: an exploration of apps’ circumvention of the Android permissions system
This paper is a study of Android apps in the wild that leak permission protected data (identifiers which can be used for tracking, and location information), where those apps should not have been able to see such data due to a lack of granted permissions. By detecting such leakage and analysing the responsible apps, the authors uncover a number of covert and side channels in real-world use.
The secret-sharer: evaluating and testing unintended memorization in neural networks
This is a really important paper for anyone working with language or generative models, and just in general for anyone interested in understanding some of the broader implications and possible unintended consequences of deep learning. There’s also a lovely sense of the human drama accompanying the discoveries that just creeps through around the edges.
Disclosure of secrets is of particular concern in neural network models that classify or predict sequences of natural language text… even if sensitive or private training data text is very rare, one should assume that well-trained models have paid attention to its precise details…. The users of such models may discover— either by accident or on purpose— that entering certain text prefixes causes the models to output surprisingly revealing text completions.
The convoy phenomenon
The duration of a lock is the average number of instructions executed while the lock is held. The execution interval of a lock is the average number of instructions executed between successive requests for that lock by a process. The collision cross section of the lock is the fraction of time it is granted, i.e., the lock grant probability.
Most of us are stuck with a pre-emptive scheduler (i.e., general purpose operating system with virtual memory). Hence convoys will occur. The problem is to make them evaporate quickly when they do occur rather than have them persist forever.
Software-defined far memory in warehouse scale computers
Understanding real-world concurrency bugs in Go
We perform the first systematic study on concurrency bugs in real Go programs. We studied six popular Go software [projects] including Docker, Kubernetes, and gRPC. We analyzed 171 concurrency bugs in total, with more than half of them caused by non-traditional, Go-specific problems. Apart from root causes of these bugs, we also studied their fixes, performed experiments to reproduce them, and evaluated them with two publicly-available Go bug detectors.
Slim: OS kernel support for a low-overhead container overlay network
In theory there are four possible modes for container networking: a bridge mode for containers on the same host; host mode in which containers use the IP address of their host network interface; macvlan mode (or similar hardware mechanisms) to give each container its own IP address; and overlay mode in which each container is given its own own virtual network interface and each application has its own network namespace.
Cloud computing simplified: a Berkeley view on serverless computing
Ten years ago Berkeley released the ‘Berkeley view of cloud computing’ paper, predicting that cloud use would accelerate. Today’s paper choice is billed as its logical successor: it predicts that the use of serverless computing will accelerate. More precisely:
… we predict that serverless computing will grow to dominate the future of cloud computing.
Keeping CALM: when distributed consistency is easy
When it comes to high performing scalable distributed systems, coordination is a killer. It’s the dominant term in the Universal Scalability Law. When we can avoid or reduce the need for coordination things tend to get simpler and faster. See for example Coordination avoidance in database systems, and more recently the amazing performance of Anna which gives a two-orders-of-magnitude speed-up through coordination elimination. So we should avoid coordination whenever we can.
So far so good, but when exactly can we avoid coordination? Becoming precise in the answer to that question is what the CALM theorem is all about. You’re probably familiar with Brooks’ distinction between essential complexity and accidental complexity in his ‘No silver bullet’ essay. Here we get to tease apart the distinction between essential coordination, a guarantee that cannot be provided without coordinating, and accidental coordination, coordination that could have been avoided with a more careful design.
Large scale GAN training for high fidelity natural image synthesis
I was drawn to this paper to try and find out what’s behind the stunning rate of progress. The large-scale GANs (can I say LS-GAN?) trained here set a new state-of-the-art in class-conditional image synthesis.
So what’s the secret to LS-GANs success? Partly of course it’s just a result of scaling up the models – but interestingly by going wide rather than deep. However, GANs were already notoriously difficult to train (‘unstable’), and scaling things up magnifies the training issues too. So the other part of the innovation here is figuring out how to maintain stability at scale. It’s less one big silver bullet, and more a clever aggregation of techniques from the deep learning parts bin. All in all, it has the feel to me of reaching the upper slopes of an ‘s’-curve such that we might need something new to get us onto the next curve. But hey, with the amazing rates of progress we’ve been seeing I could well be wrong about that.
RobinHood: tail latency aware caching – dynamic reallocation from cache-rich to cache-poor
RobinHood dynamically allocates cache space to those backends responsible for high request tail latency (cache-poor) backends, while stealing space from backends that do not affect the request tail latency (cache-rich backends). In doing so, Robin Hood makes compromises that may seem counter-intuitive (e.g., significantly increasing the tail latencies of certain backends).
Making the fast slow to make the slow fast.
Columnstore and B+ tree – are hybrid physical designs important?
It is generally understood that columnstores are crucial to achieving high performance for analytic queries and that B+ tree indexes are key to supporting transactional workloads efficiently. However, it is not well understood whether hybrid physical designs – both columnstore and B+ tree indices on the same database and potentially the same table – are important for any of the above workloads.
Through a series of benchmarks the authors show that hybrid physical designs can result in more than an order of magnitude lower execution costs for many workloads when compared to alternatives using B+ tree-only or columnstore-only. The Database Engine Tuning Advisor (DTA) for SQL Server is extended to analyze and recommend the appropriate indices for a given workload.
Who left open the cookie jar? A comprehensive evaluation of third-party cookie policies
It’s an evaluation of the defense mechanisms built into browsers (and via extensions / add-ons) that seek to protect against user tracking and cross-site attacks. Testing across 7 browsers and 46 browser extensions, the authors find that for virtually every browser and extension combination there is a way to bypass the intended security policies.
Image-to-image translation with conditional adversarial networks
It’s time we looked at some machine learning papers again! Over the next few days I’ve selected a few papers that demonstrate the exciting capabilities being developed around images.
Anyway, back to the research! The common name for the system described in today’s paper is pix2pix. You can find the code and more details online at https://github.com/phillipi/pix2pix. The name ‘pix2pix’ comes from that fact that the network is trained to map from input pictures (images) to output pictures (images), where the output is some translation of the input.
How not to structure your database-backed web applications: a study of performance bugs in the wild
This is a fascinating study of the problems people get into when using ORMs to handle persistence concerns in their web applications. The authors study real-world applications and distil a catalogue of common performance anti-patterns. There are a bunch of familiar things in the list, and a few that surprised me with the amount of difference they can make. By fixing many of the issues that they find, Yang et al., are able to quantify how many lines of code it takes to address the issue, and what performance improvement the fix delivers.
Note that fundamentally a lot of the issues stem from the fact that the ‘O’ in ORM could just as easily stand for ‘Opaque.’
Chrome Zero is a proof of concept implementation that defends against these attacks. It installs as a Chrome extension and protects functions, properties, and objects that can be exploited to construct attacks. The basic idea is very simple, functions are wrapped with replacement versions that allow injection of a policy. This idea of wrapping functions (and properties with accessor properties, and certain objects with proxy objects) goes by the fancy name of virtual machine layering.
Quantum computing in the NISQ era and beyond
“Intermediate scale” refers to computers with between 50 and a few hundred qubits. The 50 qubit milestone is significant because that takes us beyond what we can simulate by brute force using the most powerful existing supercomputers. “Noisy” emphasises that we’ll have imperfect control over those qubits. Because of the noise, we expect a limit of about 1000 gates in a circuit – i.e., 1000 fundamental two-qubit operations. Executing a single gate is about 1000 times slower on an ion trap quantum processor than on a superconducting circuit.
Regarding the near future potential of quantum computers.
For a look at the past, Shor’s Algorithm: https://blog.acolyer.org/2018/02/02/polynomial-time-algorithms-for-prime-factorization-and-discrete-logarithms-on-a-quantum-computer/