How I accidentally took down GitHub Actions
> Commit shorthashes have a major problem: As a repository accumulates a large number of commits, eventually it will contain two commit hashes that start with the same seven characters (and have the same shorthash). After this happens, tools that use shorthashes will start to break because the commit shorthash is ambiguous (it’s no longer a pointer to a single commit). Due to the birthday problem, any repository that has at least 19291 commits is likely to have a pair of ambiguous commits somewhere. So if we waited for the actions/docker repo to have tens of thousands of commits, one of the shorthashes would eventually become ambiguous and break someone’s build.
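The 19291 figure checks out with a quick birthday-bound estimate (my own back-of-the-envelope sketch, not from the post):

```python
import math

# A 7-hex-char shorthash has 16**7 = 2**28 possible values. The birthday
# problem says a collision becomes ~50% likely once you have roughly
# sqrt(2 * N * ln 2) samples.
N = 16 ** 7                               # distinct 7-character prefixes
threshold = math.sqrt(2 * N * math.log(2))
print(round(threshold))                   # ≈ 19291 commits
```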
A fast alternative to the modulo reduction
> Assume that x and N are 32-bit integers, consider the 64-bit product x * N. You have that (x * N) div 2^32 is in the range, and it is a fair map.
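The trick fits in one line; here is a sketch (Python's arbitrary-precision ints stand in for the 64-bit product a C implementation would use):

```python
# Map a uniform 32-bit x into [0, n) with a multiply and a shift,
# avoiding the (much slower) modulo instruction.
def fast_reduce(x: int, n: int) -> int:
    assert 0 <= x < 2**32
    return (x * n) >> 32
```

Unlike `x % n`, the mapping is order-preserving: small inputs land in low buckets and large inputs in high ones, but each bucket still receives an almost equal share of the 32-bit input space.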
A Couple of (Probabilistic) Worst-case Bounds for Robin Hood Linear Probing
> I like to think of Robin Hood hash tables with linear probing as arrays sorted on uniformly distributed keys, with gaps. That makes it clearer that we can use these tables to implement algorithms based on merging sorted streams in bulk, as well as ones that rely on fast point lookups. A key question with randomised data structures is how badly we can expect them to perform.
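A minimal sketch of Robin Hood insertion with linear probing (my own illustrative code, not from the post). The invariant — whichever entry is further from its home slot wins the slot — is exactly what makes the table look like an array sorted on hash value, with gaps:

```python
# Each occupied slot stores (key, home_index). On collision, the entry
# that has probed further ("poorer") displaces the one closer to home.
def rh_insert(table, key):
    n = len(table)
    home = hash(key) % n
    dist = 0
    while True:
        pos = (home + dist) % n
        slot = table[pos]
        if slot is None:
            table[pos] = (key, home)
            return
        other_key, other_home = slot
        other_dist = (pos - other_home) % n
        if other_dist < dist:
            # resident is richer: evict it and keep probing for it instead
            table[pos] = (key, home)
            key, home, dist = other_key, other_home, other_dist
        dist += 1
```

(A real table would also handle resizing and duplicate keys; this only shows the displacement rule.)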
Introducing Ristretto: A High-Performance Go Cache
> With over six months of research and development, we’re proud to announce the initial release of Ristretto: A High Performance, Concurrent, Memory-Bound Go cache. It is contention-proof, scales well and provides consistently high hit-ratios.
Interesting read even if only for the links to prior art and research.
How (not) to sign a JSON object
This covers a lot of ground. I liked this quote, even though there’s much more to the post.
> Canonicalization is a quagnet, which is a term of art in vulnerability research meaning quagmire and vulnerability magnet. You can tell it’s bad just by how hard it is to type ‘canonicalization’.
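The core problem in two hashes (my own toy example): two byte strings that parse to the same JSON object do not hash to the same digest, so "sign the JSON" is meaningless until you pin down one canonical serialization.

```python
import hashlib, json

a = '{"user": "alice", "admin": false}'
b = '{"admin": false, "user": "alice"}'

assert json.loads(a) == json.loads(b)            # same object...

h_a = hashlib.sha256(a.encode()).hexdigest()
h_b = hashlib.sha256(b.encode()).hexdigest()
assert h_a != h_b                                # ...different digests
```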
John the Ripper 1.9.0-jumbo-1
> It’s been 4.5 years and 6000+ jumbo tree commits (not counting JtR core tree commits, nor merge commits) since we released 1.8.0-jumbo-1:
Proof of work algorithm based on random code execution
XXH3 - a new speed-optimized hash algorithm
> I was recently summoned to investigate performance for a bloom filter implementation, requiring the quick generation of 64 pseudo-random bits from small inputs of variable length. XXH64 could fit the bill, but performance on small inputs was never its priority. It’s not completely wasteful either; it pays a bit of attention to short inputs thanks to a small speed module in SMHasher. However, the module itself does the bare minimum, and it was not clear to me what exactly is measured.
> So I decided to create my own benchmark program, as a way to ensure that I understand and control what’s being measured. This was a very interesting journey, leading to surprising discoveries.
> The end result of this investigation is XXH3, a cross-over inspired by many other great hash algorithms, which proves substantially faster than existing variants of xxHash, across basically all dimensions. Let’s detail those dimensions, and give some credit where inspiration is due.
A Deep Dive on RSA Accumulators
> Accumulators have been a topic of interest in academia since 1994. Similarly to a Merkle Tree, they are used to cryptographically commit to the knowledge of a set of data. At a later point in time, membership of a subset of the dataset can be proven by publishing a proof. In Merkle Trees the proof is called a Merkle Branch (or Merkle Proof), and grows logarithmically with the size of the committed data (commit 16 elements, prove inclusion by revealing log_2(16)=4).
> Accumulators on the other hand, allow proving membership in constant size, as well as batching of proofs for multiple elements (which is not a feature of Merkle trees).
> The focus of this post will be on describing the building blocks of RSA Accumulators, how we can construct proofs of (non-)membership as well as batch them across multiple blocks. This particular technique also has applications in UTXO-Based Plasma, and has given birth to the Plasma Prime variant. A lot of effort is being put into designing an accumulator that allows compaction of the UTXO-set in Plasma.
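A toy version of the membership proof (parameters deliberately tiny and insecure; a real accumulator needs a large RSA modulus of unknown factorization, and set elements must first be mapped to primes):

```python
# Accumulator: A = g^(product of all members) mod N.
# Witness for one member: the same exponentiation with that member left out.
N = 3233              # 61 * 53 — illustration only, trivially factorable
g = 2
members = [3, 5, 7]   # already prime, as the scheme requires

acc = g
for p in members:
    acc = pow(acc, p, N)

witness = g
for p in members:
    if p != 5:
        witness = pow(witness, p, N)

# the proof is a single group element, regardless of set size
assert pow(witness, 5, N) == acc
```

Note the contrast with a Merkle proof: the witness here is one number, whether the set has three elements or three million.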
Fast Perfect Hashing Of Integral Types
> Encrypting a padded 32-bit value with AES will not produce a perfect 32-bit hash. AES is only guaranteed to be perfect if the key is 128 bits. Making the AES algorithm produce a perfect hash of, for example, a 32-bit key in its lowest 32 bits requires understanding the internals of the AES algorithm.
> At any rate, you would like to make clusters out of your data, but you only get to look at each item once in isolation. After looking at it you have to decide what cluster it should go to, at that moment, without looking at any other information, or any other items in your dataset. You only get one shot, do not throw it away! How can we accomplish this?
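For a perfect hash of an integral type that doesn't need AES at all (my own sketch, not the post's construction): multiplying by any odd constant modulo 2^32 is a bijection on 32-bit integers, hence a perfect — and invertible — hash.

```python
# Multiplication by an odd constant mod 2**32 is invertible, so it maps
# every 32-bit input to a distinct 32-bit output.
MASK = 0xFFFFFFFF
K = 0x9E3779B1                  # odd multiplier (golden-ratio prime), my pick
K_INV = pow(K, -1, 1 << 32)     # inverse exists because K is odd

def perfect_hash(x: int) -> int:
    return (x * K) & MASK

def unhash(y: int) -> int:
    return (y * K_INV) & MASK

assert unhash(perfect_hash(0xDEADBEEF)) == 0xDEADBEEF
```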
Searching statically-linked vulnerable library functions in executable code
> Software supply chains are increasingly complicated, and it can be hard to detect statically-linked copies of vulnerable third-party libraries in executables. This blog post discusses the technical details of an Apache-licensed open-source library to detect code from other open-source libraries in executables, along with some real-world findings of forked open-source libraries in real-world software.
Cuckoo Breeding Ground - A Better Cuckoo Hash Table
> Perhaps the most significant downside of cuckoo hashing, however, is that it potentially requires checking multiple memory regions randomly distributed throughout the table. In many settings, such random access lookups are expensive, making cuckoo hashing a less compelling alternative. We design a variant of cuckoo hashing that reduces the number of memory regions accessed, increases the load threshold, and remains relatively simple. We do this by choosing some less popular options for cuckoo hashing with a couple of novel ideas.
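For reference, the baseline the post improves on — plain two-table cuckoo insertion with an eviction loop (a sketch of the standard scheme, not the paper's variant):

```python
# Each key lives in exactly one of two candidate slots:
# t0[hash(key) % n] or t1[hash((key, 1)) % n]. On collision we evict the
# resident and re-insert it into its other table, up to max_kicks times.
def cuckoo_insert(t0, t1, key, max_kicks=32):
    n = len(t0)
    for _ in range(max_kicks):
        i = hash(key) % n
        if t0[i] is None:
            t0[i] = key
            return True
        t0[i], key = key, t0[i]          # evict from t0, re-home in t1
        j = hash((key, 1)) % n
        if t1[j] is None:
            t1[j] = key
            return True
        t1[j], key = key, t1[j]          # evict from t1, loop again
    return False  # probable cycle: a real table would rehash or grow
```

A lookup touches at most two random locations — the very cost the post's "breeding ground" variant tries to reduce further.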
GPU & FPGA cracking speeds for bcrypt, sha512crypt, sha256crypt, bsdicrypt scaled for same running time on CPU
Other comments in the discussion are also interesting.
Opening a File After A Hash Was Made and Matched to Known Image of Child Pornography is Not a "Search," Fifth Circuit Rules
> An interesting case applying the private search reconstruction doctrine.
> I suppose this hinges on what the baseline knowledge should be for opening a file. It’s an interesting question. If it is known that a particular hash value corresponds with a particular known image, how do you model what is learned by opening a file that matched that hash? Do you say that the opener of the file already has the knowledge of what that particular image looks like, and that opening the file to see that it is that image really just confirms that it’s a match and doesn’t tell the agent anything else? Or do you model the agent’s knowledge as just being that a file matched with some known image, and that opening the file thus gives the opener more information about what the file looks like?
Prospecting for Hash Functions
> Suppose, for example, I wrote a tool to generate a random hash function definition, then JIT compile it to a native function in memory, then execute that function across various inputs to evaluate its properties. My tool could rapidly repeat this process in a loop until it stumbled upon an incredible hash function the world had never seen. That’s what I actually did. I call it the Hash Prospector:
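The search space is alternating xor-shifts and odd multiplies on a 32-bit word. Here is that shape in Python, using the constants the Hash Prospector repository publishes for its `lowbias32` winner (treat the specific values as illustrative):

```python
# Three xor-shifts interleaved with two odd multiplies. Each step is a
# bijection on 32-bit words, so the composition is too.
def lowbias32(x: int) -> int:
    x &= 0xFFFFFFFF
    x ^= x >> 16
    x = (x * 0x7feb352d) & 0xFFFFFFFF
    x ^= x >> 15
    x = (x * 0x846ca68b) & 0xFFFFFFFF
    x ^= x >> 16
    return x
```

The prospector's job is precisely to pick the shift amounts and multiplier constants in this template so the output bias is minimized.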
The Verge Hack, Explained
> In both cases, this hack presents a strong argument for sticking to things proven to work, and for being wary of overcomplicating things and thereby introducing unnecessary risks when people’s financial assets are involved.
When more is less.
Hash-based Signatures: An Illustrated Primer
> But before I get to all of that — much further below — let me stress that this is not a post about the quantum computing apocalypse, nor is it about the success of cryptography in the 21st century. Instead I’m going to talk about something much more wonky. This post will be about one of the simplest (and coolest!) cryptographic technologies ever developed: hash-based signatures.
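To see how simple the core idea is, here is a Lamport one-time signature in a few lines (my own sketch of the classic scheme the post builds up from; each keypair must sign at most once):

```python
import hashlib, secrets

# Secret key: 256 pairs of random strings. Public key: their hashes.
# To sign, reveal one string per pair, chosen by the message-hash bits.
def keygen():
    sk = [[secrets.token_bytes(32) for _ in range(2)] for _ in range(256)]
    pk = [[hashlib.sha256(s).digest() for s in pair] for pair in sk]
    return sk, pk

def bits(msg):
    d = int.from_bytes(hashlib.sha256(msg).digest(), 'big')
    return [(d >> i) & 1 for i in range(256)]

def sign(sk, msg):
    return [sk[i][b] for i, b in enumerate(bits(msg))]

def verify(pk, msg, sig):
    return all(hashlib.sha256(s).digest() == pk[i][b]
               for i, (s, b) in enumerate(zip(sig, bits(msg))))
```

Security rests only on the hash function's preimage resistance — no number theory a quantum computer could attack — which is the whole appeal.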
yescrypt 1.0.0 - modern KDF and password hashing scheme
> This is to announce the release of yescrypt 1.0.0.
> yescrypt is a password-based key derivation function (KDF) and password hashing scheme. It builds upon Colin Percival’s scrypt and includes classic scrypt, a minor extension of scrypt known as YESCRYPT_WORM (named that for “write once, read [potentially] many [times]”, which is how scrypt works), and the full native yescrypt also known as YESCRYPT_RW (for “read-write”).
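Classic scrypt — the scheme yescrypt extends — is available in Python's standard library (OpenSSL-backed), which makes it easy to experiment with the cost parameters. Values here are illustrative, not a tuning recommendation:

```python
import hashlib

# n: CPU/memory cost (power of two), r: block size, p: parallelism.
# Memory use is roughly 128 * n * r bytes (16 MiB here), so raise maxmem.
key = hashlib.scrypt(b'correct horse', salt=b'per-user salt',
                     n=2**14, r=8, p=1,
                     maxmem=64 * 1024 * 1024, dklen=32)
assert len(key) == 32
```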
The SCRAM Authentication Protocol
A better CRAM-MD5. Interesting to consider, probably would not use in production.
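The interesting part of SCRAM is what the server stores: derived keys, never the password. Here is the credential derivation from RFC 5802 (SHA-256 variant per RFC 7677), as a sketch:

```python
import hashlib, hmac

# SaltedPassword = PBKDF2(password, salt, iterations)
# ClientKey = HMAC(SaltedPassword, "Client Key"); StoredKey = H(ClientKey)
# ServerKey = HMAC(SaltedPassword, "Server Key")
def scram_sha256_keys(password: bytes, salt: bytes, iterations: int):
    salted = hashlib.pbkdf2_hmac('sha256', password, salt, iterations)
    client_key = hmac.new(salted, b'Client Key', hashlib.sha256).digest()
    stored_key = hashlib.sha256(client_key).digest()
    server_key = hmac.new(salted, b'Server Key', hashlib.sha256).digest()
    return stored_key, server_key
```

A stolen StoredKey lets an attacker impersonate the server, but not log in as the client — the property CRAM-MD5 lacks.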