Big Tech Is Testing You
> Large-scale social experiments are now ubiquitous, and conducted without public scrutiny. Has this new era of experimentation remembered the lessons of the old?
> Physics, chemistry, and medicine have had their revolution. But now, driven by experimentation, a further transformation is in the air. That’s the argument of “The Power of Experiments” (M.I.T.), by Michael Luca and Max H. Bazerman, both professors at the Harvard Business School. When it comes to driving our decisions in a world of data, they say, “the age of experiments is only beginning.”
> Despite being highly privileged and processing untrusted input by design, it is unsandboxed and has poor mitigation coverage. Any vulnerabilities in this process are critical, and easily accessible to remote attackers.
> SafeSide is a project to understand and mitigate software-observable side-channels: information leaks between software domains caused by implementation details outside the software abstraction.
Dynamically scoped variables in Go
> What we want is to be able to access a variable whose declaration is neither global nor local to the function, but somewhere higher in the call stack. This is called dynamic scoping. Go doesn’t support dynamic scoping, but it turns out, for restricted cases, we can fake it.
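The post fakes dynamic scoping with its own trick for restricted cases. For comparison, here is a minimal sketch of a different, classic technique called shallow binding: one current value plus a stack of saved values, restored with `defer` when the binding function returns. The `DynVar` type, its `Bind` and `Get` methods, and the `indent` variable are hypothetical names, and the sketch assumes a single goroutine (there is no locking).

```go
package main

import "fmt"

// DynVar emulates a dynamically scoped variable via shallow binding:
// a single current value plus a stack of previously bound values.
// Assumption: only one goroutine uses it, so there is no locking.
type DynVar[T any] struct {
	current T
	saved   []T
}

// Bind installs v as the current value and returns a function that
// restores the previous value; callers are expected to defer it.
func (d *DynVar[T]) Bind(v T) (restore func()) {
	d.saved = append(d.saved, d.current)
	d.current = v
	return func() {
		last := len(d.saved) - 1
		d.current = d.saved[last]
		d.saved = d.saved[:last]
	}
}

// Get returns the innermost binding currently in effect.
func (d *DynVar[T]) Get() T { return d.current }

// indent is a hypothetical dynamically scoped setting.
var indent = &DynVar[string]{current: ""}

// logLine never receives indent as a parameter; it sees whichever
// binding is active somewhere up the call stack.
func logLine(msg string) string { return indent.Get() + msg }

// nested rebinds indent for its own dynamic extent only.
func nested() string {
	// Bind runs now; the returned restore func runs when nested returns.
	defer indent.Bind(indent.Get() + "  ")()
	return logLine("inner")
}

func main() {
	fmt.Println(logLine("outer"))
	fmt.Println(nested())
	fmt.Println(logLine("outer again"))
}
```

`nested` never passes `indent` down explicitly, yet `logLine` sees the rebinding while `nested` is on the stack, and the old value is back once it returns. In real Go code, threading a `context.Context` through the calls is the usual goroutine-safe way to carry values like this.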
The 3 A.M. Phone Call
> It went to a national security adviser, Zbigniew Brzezinski, who was awakened on 9 November 1979, to be told that the North American Aerospace Defense Command (NORAD), the combined U.S.–Canada military command–was reporting a Soviet missile attack. Just before Brzezinski was about to call President Carter, the NORAD warning turned out to be a false alarm. It was one of those moments in Cold War history when top officials believed they were facing the ultimate threat. The apparent cause? The routine testing of an overworked computer system.
Helping Generative Fuzzers Avoid Looking Only Where the Light is Good
> Using a generative fuzzer — which creates test cases from scratch, rather than mutating a collection of seed inputs — feels to me a lot like being the drunk guy in the joke: we’re looking for bugs that can be triggered by inputs that the generator is likely to generate, because we don’t have an obviously better option, besides doing some hard work in improving the generator. This problem has bothered me for a long time.
Binary symbolic execution with KLEE-Native
> KLEE is a symbolic execution tool that intelligently produces high-coverage test cases by emulating LLVM bitcode in a custom runtime environment. Yet, unlike simpler fuzzers, it’s not a go-to tool for automated bug discovery. Despite constant improvements by the academic community, KLEE remains difficult for bug hunters to adopt. We’re working to bridge this gap!
> My internship produced KLEE-Native: a version of KLEE that can concretely and symbolically execute binaries, model heap memory, reproduce CVEs, and accurately classify different heap bugs. The project is now positioned to explore applications made possible by KLEE-Native’s unique approaches to symbolic execution. We will also be looking into potential execution time speed-ups from different lifting strategies. As with all articles on symbolic execution, KLEE is both the problem and the solution.
Write Fuzzable Code
> Fuzzing is sort of a superpower for locating vulnerabilities and other software defects, but it is often used to find problems baked deeply into already-deployed code. Fuzzing should be done earlier, and moreover developers should spend some effort making their code more amenable to being fuzzed.
> This post is a non-comprehensive, non-orthogonal list of ways that you can write code that fuzzes better. Throughout, I’ll use “fuzzer” to refer to basically any kind of randomized test-case generator, whether mutation-based (afl, libFuzzer, etc.) or generative (jsfunfuzz, Csmith, etc.). Not all advice will apply to every situation, but a lot of it is sound software engineering advice in general. I’ve bold-faced a few points that I think are particularly important.
Design and Evolution of C-Reduce
> Since 2008, my colleagues and I have developed and maintained C-Reduce, a tool for programmatically reducing the size of C and C++ files that trigger compiler bugs. C-Reduce also usually does a credible job reducing test cases in languages other than C and C++; we’ll return to that later.
Part 2: https://blog.regehr.org/archives/1679
Vintage TV Test Patterns
> As you might expect, the BBC test card with the girl and clown has both a backstory and a cult following.
hey - HTTP load generator
> hey is a tiny program that sends some load to a web application.
Increasing coverage of signal semantics in regression tests
> Kernel signal code is a complex maze; it’s very difficult to introduce non-trivial changes without regressions. Over the past month I worked on covering missing elementary scenarios involving the ptrace(2) API. Some of the new tests were marked as expected to succeed, but a number of them are expected to fail.
I ran Cypress (the JS testing tool) exactly one time ever.
> Today I noticed that it put 42,471 files in ~/Library/Caches. 41% of all cache files on my machine are from that one launch. The resource consumption of modern programming tools is just reckless.
> Time to the first reply literally beginning with the words “who cares“: about one hour.
> Some people claim that unit tests make type systems unnecessary: “types are just simple unit tests written for you, and simple unit tests aren’t the important ones”. Other people claim that type systems make unit tests unnecessary: “dynamic languages only need unit tests because they don’t have type systems.” What’s going on here? These can’t both be right. We’ll use this example and a couple others to explore the unknown beliefs that structure our understanding of the world.
Really about our hidden assumptions.
> “Before I was alive I was wrong about this.”
My favorite papers of 2017
The (machine) learning was strong this year.
With the Router, In the Conference Room
> The killer was Cathy, in the issue tracking system, with the snarky bug report.
DeepXplore: automated whitebox testing of deep learning systems
> The state space of deep learning systems is vast. As we’ve seen with adversarial examples, that creates opportunity to deliberately craft inputs that fool a trained network. Forget adversarial examples for a moment though, what about the opportunity for good old-fashioned bugs to hide within that space? Experience with distributed systems tells us that there are likely to be plenty! And that raises an interesting question: how do you test a DNN?
At first glance this seems like more of the same adversarial-example work, fun as that may be, but DeepXplore seems to do a better job of finding real-world scenarios that get misclassified. Nothing malicious, per se, just bad luck.
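DeepXplore's central metric is neuron coverage: the fraction of neurons whose output exceeds a threshold for at least one test input. A minimal sketch, assuming the activations have already been collected (and scaled, if desired) into a matrix with one row per input; `neuronCoverage` is a hypothetical helper, and the paper's exact normalization details may differ.

```go
package main

import "fmt"

// neuronCoverage returns the fraction of neurons whose activation exceeds
// threshold t for at least one input. acts[i][j] is neuron j's activation
// on input i. Assumes all rows have the same length.
func neuronCoverage(acts [][]float64, t float64) float64 {
	if len(acts) == 0 || len(acts[0]) == 0 {
		return 0
	}
	covered := make([]bool, len(acts[0]))
	count := 0
	for _, row := range acts {
		for j, a := range row {
			if a > t && !covered[j] {
				covered[j] = true
				count++
			}
		}
	}
	return float64(count) / float64(len(covered))
}

func main() {
	acts := [][]float64{
		{0.9, 0.1, 0.0, 0.2},
		{0.1, 0.8, 0.0, 0.3},
	}
	fmt.Println(neuronCoverage(acts, 0.5)) // neurons 0 and 1 fire above 0.5: 2 of 4
}
```

Here neuron 2 never fires and neuron 3 stays below the threshold, so inputs that push them above it would raise coverage. DeepXplore searches for such inputs while also looking for cases where several similar models disagree.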
> os-test is a set of test suites for POSIX operating systems designed to make it easy to compare differences between operating systems and to find operating system bugs. It consists of test suites that focus on different operating system areas. This page visualizes the results for the free software POSIX operating systems that are relevant today.