How to divide by 3 quickly using the NEON instruction set


OK, so as many of you know, NEON is ARM's SIMD (Advanced SIMD) instruction set extension, found on Cortex-A cores such as the A9 and A7. Unlike a general-purpose library such as Ne10, hand-written NEON can be really fast. In many cases it can also be really painful to code in! Integer division isn't provided as a part of the NEON instruction set, and it would be incredibly inefficient to widen an 8 bit unsigned integer up to a 32 bit float solely to multiply by the reciprocal (multiplicative inverse). Fear not! There are ways to divide with NEON while staying relatively efficient. One such method is known as unsigned integer division by constants. The reciprocal of the divisor is worked out in advance as a binary fraction (1/3 is 0.010101... in binary), and the dividend is then multiplied by it using a series of shift right and add instructions. This gives an approximation of the quotient, and a final step uses the remainder to correct it.
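To make the shift-and-add idea concrete before bringing in any vector code, here is a minimal scalar sketch in plain C. The function names are made up for illustration. The correction term (11*r) >> 5 is my assumption about what is needed to make the result exact over the 8 bit range, and the sketch leaves out the q >> 8 step because it always adds zero for inputs below 256.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of shift-and-add division by 3 for one 8 bit value.
   Illustrative names only; not part of NEON or any library. */
uint8_t div3_u8_scalar(uint8_t n)
{
    /* 1/3 = 0.010101... in binary, so sum shifted copies of n */
    uint16_t q = (uint16_t)((n >> 2) + (n >> 4)); /* n * 0.0101 (binary)     */
    q = (uint16_t)(q + (q >> 4));                 /* n * 0.01010101 (binary) */

    /* the estimate never overshoots, so the remainder is non-negative */
    uint16_t r = (uint16_t)(n - q * 3);

    /* 11/32 is close to 1/3; adds floor(r / 3) for the small r seen here */
    return (uint8_t)(q + ((11 * r) >> 5));
}

/* Counts how many of the 256 possible inputs disagree with plain n / 3. */
int div3_u8_mismatches(void)
{
    int bad = 0;
    for (int n = 0; n < 256; ++n) {
        if (div3_u8_scalar((uint8_t)n) != n / 3)
            ++bad;
    }
    return bad;
}
```

Calling div3_u8_mismatches() is a cheap way to check any change to the shifts or the correction term, since the whole input space is only 256 values.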

Here is a pre rolled version of my vdiv3_u8 function, which achieves division by 3 using as much NEON goodness as possible!

#include <arm_neon.h>

uint8x8_t vdiv3_u8(uint8x8_t in){
    // widen each 8 bit lane to 16 bits so the intermediate values fit
    uint16x8_t tmp = vmovl_u8(in);

    // q = (n >> 2) + (n >> 4)   ~ q = n * 0.0101 (binary, approx.)
    uint16x8_t quo = vshrq_n_u16(tmp, 2);
    quo = vaddq_u16(quo, vshrq_n_u16(tmp, 4));
    // q = q + (q >> 4)          ~ q = n * 0.01010101
    quo = vaddq_u16(quo, vshrq_n_u16(quo, 4));
    // q = q + (q >> 8)          ~ q = n * 0.0101010101010101
    // (adds 0 for 8 bit inputs, since q < 256)
    quo = vaddq_u16(quo, vshrq_n_u16(quo, 8));
    // r = n - q*3               (q never overshoots, so r >= 0)
    uint16x8_t rem = vsubq_u16(tmp, vmulq_n_u16(quo, 3));
    // return q + (11*r >> 5)    (11/32 ~ 1/3; 6*r >> 4 overshoots when r == 8, e.g. n = 47)
    tmp = vaddq_u16(quo, vshrq_n_u16(vmulq_n_u16(rem, 11), 5));
    // narrow the 16 bit lanes back down to 8 bits
    in  = vmovn_u16(tmp);
    return in;
}

Note: the major inefficiency here is that our vector of 8 x 8 bit unsigned int values is lengthened to give space for the computation and then shortened again once we have a result.
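One way to shrink that lengthen/shorten overhead might be to fold the whole reciprocal into a single widening multiply: 171/512 is just above 1/3, so (n * 171) >> 9 gives n / 3 for every 8 bit n. This is a different technique from the shift-and-add series above, and the scalar sketch below only checks the arithmetic; the names are hypothetical. On NEON it would presumably map onto vmull_u8 followed by a narrowing shift (vshrn_n_u16 accepts shifts of 1 to 8, so the ninth bit would need one extra vshr_n_u8), though I have not benchmarked that.

```c
#include <assert.h>
#include <stdint.h>

/* n / 3 as one widening multiply and one shift.
   Illustrative name only; not part of NEON or any library. */
uint8_t div3_u8_mul171(uint8_t n)
{
    /* largest product is 255 * 171 = 43605, which fits in 16 bits */
    uint16_t prod = (uint16_t)(n * 171u);
    return (uint8_t)(prod >> 9);
}

/* Counts how many of the 256 possible inputs disagree with plain n / 3. */
int div3_u8_mul171_mismatches(void)
{
    int bad = 0;
    for (int n = 0; n < 256; ++n) {
        if (div3_u8_mul171((uint8_t)n) != n / 3)
            ++bad;
    }
    return bad;
}
```

The trade-off is that the constant only works because the inputs are limited to 8 bits; a wider input range would need a larger multiplier and shift.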

A full list of the NEON instruction set can be found here!
