A fast alternative to the modulo reduction
> Assume that x and N are 32-bit integers, consider the 64-bit product x * N. You have that (x * N) div 2^32 is in the range, and it is a fair map.
Faster remainders when the divisor is a constant: beating compilers and libdivide
> Not all instructions on modern processors cost the same. Additions and subtractions are cheaper than multiplications which are themselves cheaper than divisions. For this reason, compilers frequently replace division instructions by multiplications. Roughly speaking, it works in this manner. Suppose that you want to divide a variable n by a constant d. You have that n/d = n * (2N/d) / (2N). The division by a power of two (/ (2N)) can be implemented as a right shift if we are working with unsigned integers, which compiles to single instruction: that is possible because the underlying hardware uses a base 2. Thus if 2N/d has been precomputed, you can compute the division n/d as a multiplication and a shift. Of course, if d is not a power of two, 2N/d cannot be represented as an integer. Yet for N large enough, we can approximate 2N/d by an integer and have the exact computation of the remainder for all possible n within a range. I believe that all optimizing C/C++ compilers know how to pull this trick and it is generally beneficial irrespective of the processor’s architecture.
> Can we do better? It turns out that in some instance, we can beat both the compilers and a library like libdivide.
Don’t make it appear like you are reading your own recent writes
On apparent false aliasing with vector instructions.
Quantifying the performance benefits of Go 1.9 on bitsets
> Go, the programming language initiated at Google, has recently shipped its version 1.9. One big change is the introduction of the math/bits package which offers hardware-accelerated functions to manipulate data.
How many floating-point numbers are in the interval [0,1]?
And how to uniformly pick one.