What is the random oracle model and why should you care?
> About eight years ago I set out to write a very informal piece on a specific cryptographic modeling technique called the “random oracle model”. This was way back in the good old days of 2011, which was a more innocent and gentle era of cryptography. Back then nobody foresaw that all of our standard cryptography would turn out to be riddled with bugs; you didn’t have to be reminded that “crypto means cryptography”. People even used Bitcoin to actually buy things.
> That first random oracle post somehow sprouted three sequels, each more ridiculous than the last. I guess at some point I got embarrassed about the whole thing — it’s pretty cheesy, to be honest — so I kind of abandoned it unfinished. And that’s been a major source of regret for me, since I had always planned a fifth, and final post, to cap the whole messy thing off. This was going to be the best of the bunch: the one I wanted to write all along.
The BLAKE3 cryptographic hash function
> BLAKE3 is based on an optimized instance of the established hash function BLAKE2, and on the original Bao tree mode. The BLAKE3 specifications and design rationale are available in the BLAKE3 paper. The current version of Bao implements verified streaming with BLAKE3.
SHA-1 is a Shambles
> We have computed the very first chosen-prefix collision for SHA-1. In a nutshell, this means a complete and practical break of the SHA-1 hash function, with dangerous practical implications if you are still using this hash function. To put it another way: all attacks that are practical on MD5 are now also practical on SHA-1.
Xor Filters: Faster and Smaller Than Bloom Filters
> Among other alternatives, Fan et al. introduced Cuckoo filters which use less space and are faster than Bloom filters. While implementing a Bloom filter is a relatively simple exercise, Cuckoo filters require a bit more engineering.
> Could we do even better while limiting the code to something you can hold in your head?
> It turns out that you can with xor filters. We just published a paper called Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters that will appear in the Journal of Experimental Algorithmics.
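The lookup is the whole charm: hash a key to three slots and one fingerprint, and membership is just fingerprint == B[h0] ^ B[h1] ^ B[h2]. Here’s a rough Python sketch of the peeling construction and lookup — the sha256-based slot derivation, 8-bit fingerprints, and 1.23x sizing are my simplifications of the paper’s scheme, and it assumes distinct keys.

```python
import hashlib

def _derive(key: bytes, seed: int, block_len: int):
    """Hash a key to an 8-bit fingerprint plus one slot index in each of 3 blocks."""
    d = hashlib.sha256(seed.to_bytes(8, "little") + key).digest()
    fp = d[0]
    slots = tuple(
        i * block_len + int.from_bytes(d[1 + 4 * i:5 + 4 * i], "little") % block_len
        for i in range(3)
    )
    return fp, slots

def build(keys):
    """Peel: repeatedly pull out any key that is alone in one of its slots,
    then assign fingerprints in reverse order so each lookup xors to fp."""
    n = len(keys)
    block_len = int(1.23 * n) // 3 + 1      # ~23% slack lets peeling succeed w.h.p.
    size = 3 * block_len
    seed = 0
    while True:
        occupants = [[] for _ in range(size)]
        for k in keys:
            fp, slots = _derive(k, seed, block_len)
            for s in slots:
                occupants[s].append((fp, slots))
        queue = [i for i in range(size) if len(occupants[i]) == 1]
        stack = []
        while queue:
            i = queue.pop()
            if len(occupants[i]) != 1:
                continue
            fp, slots = occupants[i][0]
            stack.append((i, fp, slots))
            for s in slots:                  # remove the peeled key everywhere
                occupants[s].remove((fp, slots))
                if len(occupants[s]) == 1:
                    queue.append(s)
        if len(stack) == n:
            break
        seed += 1                            # peeling failed (rare): retry with new seed
    table = [0] * size
    for i, fp, slots in reversed(stack):
        table[i] = fp
        for s in slots:
            if s != i:
                table[i] ^= table[s]
    return table, seed, block_len

def contains(table, seed, block_len, key: bytes) -> bool:
    """Three reads, two xors; ~1/256 false-positive rate with 8-bit fingerprints."""
    fp, (a, b, c) = _derive(key, seed, block_len)
    return fp == (table[a] ^ table[b] ^ table[c])

table, seed, bl = build([b"alpha", b"beta", b"gamma"])
assert contains(table, seed, bl, b"beta")
```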
Pointer Authentication
> Pointer authentication is a technology which offers strong probabilistic protection against exploiting a broad class of memory bugs to take control of program execution. When adopted consistently in a language ABI, it provides a form of relatively fine-grained control flow integrity (CFI) check that resists both return-oriented programming (ROP) and jump-oriented programming (JOP) attacks.
> While pointer authentication can be implemented purely in software, direct hardware support (e.g. as provided by ARMv8.3) can dramatically lower the execution speed and code size costs. Similarly, while pointer authentication can be implemented on any architecture, taking advantage of the (typically) excess addressing range of a target with 64-bit pointers minimizes the impact on memory performance and can allow interoperation with existing code (by disabling pointer authentication dynamically). This document will generally attempt to present the pointer authentication feature independent of any hardware implementation or ABI. Considerations that are implementation-specific are clearly identified throughout.
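My mental model of the “excess addressing range” point, as a toy: a 64-bit pointer with a 48-bit address space has 16 spare high bits, so a truncated MAC over (pointer, context) can live there. The sketch below is only that conceptual model — the HMAC-SHA-256 choice, key handling, and bit widths are illustrative, not how ARMv8.3’s PAC instructions actually work.

```python
import hmac, hashlib, secrets

ADDR_BITS = 48                 # assumed 48-bit virtual address space (hypothetical)
KEY = secrets.token_bytes(32)  # per-process key; real hardware keeps keys in registers

def _pac(ptr: int, context: int) -> int:
    """Truncated MAC over (pointer, context) -- 16 bits fit in the spare high bits."""
    msg = ptr.to_bytes(8, "little") + context.to_bytes(8, "little")
    return int.from_bytes(hmac.new(KEY, msg, hashlib.sha256).digest()[:2], "little")

def sign(ptr: int, context: int) -> int:
    """Store the authentication code in the pointer's unused top bits."""
    return ptr | (_pac(ptr, context) << ADDR_BITS)

def auth(signed: int, context: int) -> int:
    """Strip and verify the code; a forged pointer passes with chance ~2^-16."""
    ptr = signed & ((1 << ADDR_BITS) - 1)
    if signed >> ADDR_BITS != _pac(ptr, context):
        raise ValueError("pointer authentication failed")
    return ptr

ret_addr = 0x7FFF12345678
signed = sign(ret_addr, context=0xDEAD)  # e.g. a return address, stack pointer as context
assert auth(signed, context=0xDEAD) == ret_addr
```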
How I accidentally took down GitHub Actions
> Commit shorthashes have a major problem: As a repository accumulates a large number of commits, eventually it will contain two commit hashes that start with the same seven characters (and have the same shorthash). After this happens, tools that use shorthashes will start to break because the commit shorthash is ambiguous (it’s no longer a pointer to a single commit). Due to the birthday problem, any repository that has at least 19291 commits is likely to have a pair of ambiguous commits somewhere. So if we waited for the actions/docker repo to have tens of thousands of commits, one of the shorthashes would eventually become ambiguous and break someone’s build.
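The 19291 figure is the standard birthday bound applied to 28-bit (seven hex character) prefixes:

```python
import math

space = 2 ** (7 * 4)   # a 7-character hex shorthash encodes 28 bits

# Birthday approximation: a duplicate becomes more likely than not once the
# number of commits reaches about sqrt(2 * space * ln 2).
print(round(math.sqrt(2 * space * math.log(2))))   # -> 19291
```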
A fast alternative to the modulo reduction
> Assume that x and N are 32-bit integers, and consider the 64-bit product x * N. You have that (x * N) div 2^32 is in the range [0, N), and it is a fair map.
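In code the whole trick is a widening multiply and a shift — in C, `((uint64_t)x * N) >> 32`. A quick Python sanity check of the fairness claim:

```python
import random

def fast_range(x: int, n: int) -> int:
    """Map a uniform 32-bit x into [0, n): one widening multiply, one shift."""
    return (x * n) >> 32

# Fairness sanity check: bucket counts come out near-equal.
counts = [0] * 10
for _ in range(1_000_000):
    counts[fast_range(random.getrandbits(32), 10)] += 1
print(counts)   # each bucket close to 100,000
```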
A Couple of (Probabilistic) Worst-case Bounds for Robin Hood Linear Probing
> I like to think of Robin Hood hash tables with linear probing as arrays sorted on uniformly distributed keys, with gaps. That makes it clearer that we can use these tables to implement algorithms based on merging sorted streams in bulk, as well as ones that rely on fast point lookups. A key question with randomised data structures is how badly we can expect them to perform.
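For reference, the displacement rule that produces the “sorted with gaps” picture: an incoming entry steals the slot of any resident that sits closer to its home bucket, so entries end up ordered by home bucket. A minimal sketch (fixed capacity, distinct keys, no deletion):

```python
class RobinHoodTable:
    """Minimal Robin Hood linear probing: fixed capacity, distinct keys, no deletes."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.slots = [None] * capacity       # each slot is (key, value) or None

    def _home(self, key) -> int:
        return hash(key) % self.capacity

    def insert(self, key, value):
        idx, dist, entry = self._home(key), 0, (key, value)
        while True:
            resident = self.slots[idx]
            if resident is None:
                self.slots[idx] = entry
                return
            their_dist = (idx - self._home(resident[0])) % self.capacity
            if their_dist < dist:            # rob the rich: displace the closer-to-home entry
                self.slots[idx], entry = entry, resident
                dist = their_dist
            idx = (idx + 1) % self.capacity
            dist += 1

    def get(self, key):
        idx, dist = self._home(key), 0
        while True:
            resident = self.slots[idx]
            if resident is None:
                return None
            if (idx - self._home(resident[0])) % self.capacity < dist:
                return None                  # key would have displaced this entry: absent
            if resident[0] == key:
                return resident[1]
            idx = (idx + 1) % self.capacity
            dist += 1
```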
Introducing Ristretto: A High-Performance Go Cache
> With over six months of research and development, we’re proud to announce the initial release of Ristretto: A High Performance, Concurrent, Memory-Bound Go cache. It is contention-proof, scales well and provides consistently high hit-ratios.
Interesting read even if only for the links to prior art and research.
How (not) to sign a JSON object
This covers a lot of ground. I liked this quote, even though there’s much more to the post.
> Canonicalization is a quagnet, which is a term of art in vulnerability research meaning quagmire and vulnerability magnet. You can tell it’s bad just by how hard it is to type ‘canonicalization’.
John the Ripper 1.9.0-jumbo-1
> It’s been 4.5 years and 6000+ jumbo tree commits (not counting JtR core tree commits, nor merge commits) since we released 1.8.0-jumbo-1:
RandomX
> Proof of work algorithm based on random code execution
XXH3 - a new speed-optimized hash algorithm
> I was recently summoned to investigate performance for a bloom filter implementation, which required quickly generating 64 pseudo-random bits from small inputs of variable length. XXH64 could fit the bill, but performance on small inputs was never its priority. It’s not completely neglected either: it pays a bit of attention to short inputs thanks to a small speed module in SMHasher. However, the module itself does the bare minimum, and it was not clear to me what exactly is measured.
> So I decided to create my own benchmark program, as a way to ensure that I understand and control what’s being measured. This was a very interesting journey, leading to surprising discoveries.
> The end result of this investigation is XXH3, a cross-over inspired by many other great hash algorithms, which proves substantially faster than existing variants of xxHash, across basically all dimensions. Let’s detail those dimensions, and give some credit where inspiration is due.
A Deep Dive on RSA Accumulators
> Accumulators have been a topic of interest in academia since 1994. Similarly to a Merkle tree, they are used to cryptographically commit to the knowledge of a set of data. At a later point in time, membership of a subset of that data can be proven by publishing a proof. In Merkle trees the proof is called a Merkle branch (or Merkle proof), and it grows logarithmically with the size of the committed data (commit to 16 elements, prove inclusion by revealing log_2(16) = 4 hashes).
> Accumulators, on the other hand, allow proving membership with a constant-size proof, as well as batching of proofs for multiple elements (which is not a feature of Merkle trees).
> The focus of this post will be on describing the building blocks of RSA Accumulators, how we can construct proofs of (non-)membership as well as batch them across multiple blocks. This particular technique also has applications in UTXO-Based Plasma, and has given birth to the Plasma Prime variant. A lot of effort is being put into designing an accumulator that allows compaction of the UTXO-set in Plasma.
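The core mechanism fits in a few lines: accumulate the set as A = g^(product of member primes) mod N, and the witness for x is the same power with x divided out, so verifying is just checking w^x = A. A toy sketch with insecure parameters (real schemes use a ~2048-bit modulus from a trusted setup or a class group, and hash arbitrary elements to primes first):

```python
# Toy parameters: everything here is tiny and insecure, purely for illustration.
P, Q = 61, 53            # secret factors that nobody may know; toy values
N = P * Q
g = 2

members = [3, 5, 7, 11]  # set elements, already prime for simplicity

exponent = 1
for m in members:
    exponent *= m
A = pow(g, exponent, N)  # the accumulator: one group element for the whole set

# Constant-size membership witness for 7: g raised to the product of all *other* members
w = pow(g, exponent // 7, N)

# Verification is a single exponentiation: w^7 == g^exponent == A
assert pow(w, 7, N) == A
```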
Fast Perfect Hashing Of Integral Types
> Encrypting a padded 32-bit value with AES will not produce a perfect 32-bit hash. AES is only guaranteed to be perfect if the key is 128 bits. Making the AES algorithm produce a perfect hash of, for example, a 32-bit key in its lowest 32 bits requires understanding the internals of the AES algorithm.
> At any rate, you would like to make clusters out of your data, but you only get to look at each item once in isolation. After looking at it you have to decide what cluster it should go to, at that moment, without looking at any other information, or any other items in your dataset. You only get one shot, do not throw it away! How can we accomplish this?
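A cheaper route to the same property, if you’d rather not reason about AES internals: compose invertible steps. Xorshifts and odd-constant multiplies are each bijections mod 2^64, so this standard SplitMix64-style finalizer (not the post’s construction) is a perfect hash of the entire 64-bit domain:

```python
MASK64 = (1 << 64) - 1

def mix64(x: int) -> int:
    """SplitMix64 finalizer: each xorshift and odd-constant multiply is invertible
    mod 2**64, so the composition is a bijection -- a collision-free hash of
    every 64-bit integer."""
    x &= MASK64
    x ^= x >> 30
    x = (x * 0xBF58476D1CE4E5B9) & MASK64
    x ^= x >> 27
    x = (x * 0x94D049BB133111EB) & MASK64
    x ^= x >> 31
    return x

# Bijectivity implies no collisions on distinct inputs:
assert len({mix64(i) for i in range(1 << 16)}) == 1 << 16
```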
Searching statically-linked vulnerable library functions in executable code
> Software supply chains are increasingly complicated, and it can be hard to detect statically-linked copies of vulnerable third-party libraries in executables. This blog post discusses the technical details of an Apache-licensed open-source library to detect code from other open-source libraries in executables, along with some findings of forked open-source libraries in real-world software.
Cuckoo Breeding Ground - A Better Cuckoo Hash Table
> Perhaps the most significant downside of cuckoo hashing, however, is that it potentially requires checking multiple memory regions randomly distributed throughout the table. In many settings, such random access lookups are expensive, making cuckoo hashing a less compelling alternative. We design a variant of cuckoo hashing that reduces the number of memory regions accessed, increases the load threshold, and remains relatively simple. We do this by choosing some less popular options for cuckoo hashing, combined with a couple of novel ideas.
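For contrast, here’s the baseline the paper improves on — plain two-choice cuckoo hashing, where every lookup probes two independently placed buckets, exactly the scattered access pattern the quote complains about. The hash seeding and kick limit are arbitrary choices of mine:

```python
import random

class CuckooTable:
    """Plain two-choice cuckoo hashing: at most two buckets probed per lookup."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buckets = [None] * capacity
        self.seeds = (0x9E3779B9, 0x85EBCA6B)   # arbitrary seeds for two hash functions

    def _slots(self, key):
        return tuple(hash((s, key)) % self.capacity for s in self.seeds)

    def get(self, key):
        for i in self._slots(key):               # two probes, but at random addresses:
            if self.buckets[i] and self.buckets[i][0] == key:
                return self.buckets[i][1]        # this is the cache-miss cost in question
        return None

    def insert(self, key, value, max_kicks: int = 64) -> bool:
        entry = (key, value)
        for _ in range(max_kicks):
            slots = self._slots(entry[0])
            for i in slots:
                if self.buckets[i] is None:
                    self.buckets[i] = entry
                    return True
            i = random.choice(slots)             # both full: evict one and re-place it
            self.buckets[i], entry = entry, self.buckets[i]
        return False                             # give up; a real table rehashes or grows

t = CuckooTable(16)
t.insert(b"key", 42)
assert t.get(b"key") == 42
```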
GPU & FPGA cracking speeds for bcrypt, sha512crypt, sha256crypt, bsdicrypt scaled for same running time on CPU
Other comments in the discussion are also interesting.
Opening a File After A Hash Was Made and Matched to Known Image of Child Pornography is Not a "Search," Fifth Circuit Rules
> An interesting case applying the private search reconstruction doctrine.
> I suppose this hinges on what the baseline knowledge should be for opening a file. It’s an interesting question. If it is known that a particular hash value corresponds with a particular known image, how do you model what is learned by opening a file that matched that hash? Do you say that the opener of the file already has the knowledge of what that particular image looks like, and that opening the file to see that it is that image really just confirms that it’s a match and doesn’t tell the agent anything else? Or do you model the agent’s knowledge as just being that a file matched with some known image, and that opening the file thus gives the opener more information about what the file looks like?