Faster remainders when the divisor is a constant: beating compilers and libdivide

https://lemire.me/blog/2019/02/08/faster-remainders-when-the-divisor-is-a-constant-beating-compilers-and-libdivide/ [lemire.me]

2019-02-08 22:39

Not all instructions on modern processors cost the same. Additions and subtractions are cheaper than multiplications which are themselves cheaper than divisions. For this reason, compilers frequently replace division instructions by multiplications. Roughly speaking, it works in this manner. Suppose that you want to divide a variable n by a constant d. You have that n/d = n * (2N/d) / (2N). The division by a power of two (/ (2N)) can be implemented as a right shift if we are working with unsigned integers, which compiles to single instruction: that is possible because the underlying hardware uses a base 2. Thus if 2N/d has been precomputed, you can compute the division n/d as a multiplication and a shift. Of course, if d is not a power of two, 2N/d cannot be represented as an integer. Yet for N large enough, we can approximate 2N/d by an integer and have the exact computation of the remainder for all possible n within a range. I believe that all optimizing C/C++ compilers know how to pull this trick and it is generally beneficial irrespective of the processorâ€™s architecture.

Can we do better? It turns out that in some instance, we can beat both the compilers and a library like libdivide.

Paper: https://arxiv.org/abs/1902.01961

source: HN