Antithesis of a One-in-a-Million Bug: Taming Demonic Nondeterminism
https://www.cockroachlabs.com/blog/demonic-nondeterminism/ [www.cockroachlabs.com]
2024-03-22 19:46
Bugs are compounded by the number of distinct nodes operating in a distributed system, each providing their own sources of nondeterminism with thread timings, network conditions, hardware, and more. Finding and fixing these bugs requires new approaches to testing and debugging.
Like any emerging technology, the Antithesis platform is not without rough edges. Deterministic replay doesn’t immediately get you a reproduction, particularly across distinct code changes as you might see with a unit or integration test. In our experience, a significant amount of effort was invested in instrumenting the logs, as well as reasoning about injected failure states in order to recover the state machine which reproduces the bug. The rinse-and-repeat cycle means that a modified binary (with new instrumentation) may not always hit the same terminal state. Although, in practice determinism between runs is very high assuming the code changes are localized.
source: HN