How to divide by 3 quickly using the NEON instruction set


Ok… so as many of you know, NEON is ARM’s SIMD instruction set extension (Advanced SIMD), found on Cortex-A cores such as the A9 and A7. Unlike Ne10, which is a library of ready-made routines built on top of it, NEON is something you program directly, usually through intrinsics. It can be really fast… In many cases it can also be really painful to code in! Integer division isn’t provided as part of the NEON instruction set, and it would be incredibly inefficient to widen each 8-bit unsigned integer up to a 32-bit float solely to multiply by the reciprocal (multiplicative inverse). Fear not! There are ways to divide with NEON that are still relatively efficient. One such method is known as unsigned integer division by constants. The reciprocal of the divisor is worked out in advance as a binary fraction (1/3 = 0.010101… in binary), and the dividend is multiplied by it using a series of shift-right and add instructions. This gives an approximation of the quotient, which is then corrected using the remainder.
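To make the idea concrete before vectorising it, here is a minimal scalar sketch of the shift-and-add approach for 8-bit inputs. The function names `div3_scalar` and `div3_count_mismatches` are made up for illustration, and the `11*r >> 5` remainder fix-up is my choice for this sketch (11/32 is close to 1/3); treat it as one possible way to do the correction, not the only one.

```c
#include <stdint.h>

/* Scalar model of division by 3 using shifts and adds.
 * Hypothetical helper for illustration only. */
static uint8_t div3_scalar(uint8_t n_in)
{
    uint16_t n = n_in;

    /* q ~ n * 0.0101 (binary), an underestimate of n / 3 */
    uint16_t q = (uint16_t)((n >> 2) + (n >> 4));
    /* q ~ n * 0.01010101 (binary) */
    q = (uint16_t)(q + (q >> 4));

    /* What the estimate missed: n = 3*q + r, with r small */
    uint16_t r = (uint16_t)(n - q * 3);

    /* Add r / 3, approximated as (11 * r) >> 5 for small r */
    return (uint8_t)(q + ((11 * r) >> 5));
}

/* Counts how many 8-bit inputs disagree with the compiler's n / 3. */
static int div3_count_mismatches(void)
{
    int bad = 0;
    for (int n = 0; n <= 255; ++n) {
        if (div3_scalar((uint8_t)n) != n / 3) {
            ++bad;
        }
    }
    return bad;
}
```

Checking all 256 inputs against plain `n / 3` like this is cheap, so it is worth doing whenever the constants or shifts are changed.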

Here is a pre-rolled version of my vdiv3_u8 function, which divides by 3 using as much NEON goodness as possible!

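#include <arm_neon.h>

uint8x8_t vdiv3_u8(uint8x8_t in){
    // widen each 8-bit lane to 16 bits
    uint16x8_t tmp = vmovl_u8(in);

    // q = (n >> 2) + (n >> 4)   ~ q = n * 0.0101 (approx.)
    uint16x8_t quo = vshrq_n_u16(tmp, 2);
    quo = vaddq_u16(quo, vshrq_n_u16(tmp, 4));
    // q = q + (q >> 4)          ~ q = n * 0.01010101
    quo = vaddq_u16(quo, vshrq_n_u16(quo, 4));
    // q = q + (q >> 8)          ~ q = n * 0.0101010101010101
    // (adds 0 for 8-bit inputs, since q < 256; only matters for wider inputs)
    quo = vaddq_u16(quo, vshrq_n_u16(quo, 8));
    // r = n - q*3   (q never overshoots, so r >= 0 and stays small)
    uint16x8_t rem = vsubq_u16(tmp, vmulq_n_u16(quo, 3));
    // return q + (11*r >> 5), i.e. q + r/3; 11/32 matches r/3 for r < 32
    // (6*r >> 4 gives 3 for r = 8, which makes 254/3 come out as 85)
    tmp = vaddq_u16(quo, vshrq_n_u16(vmulq_n_u16(rem, 11), 5));
    // narrow back to 8-bit lanes
    return vmovn_u16(tmp);
}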

Note: the major inefficiency here is that our vector of eight 8-bit unsigned values is widened to 16 bits to give the computation some headroom, and then narrowed again once we have a result.
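Whether that headroom is really needed can be checked in scalar code. The sketch below walks every 8-bit input through the same kind of shift-and-add steps (with an `11*r >> 5` remainder fix-up, which is my assumption for this sketch) and records the largest intermediate value. The helper name `div3_max_intermediate` is hypothetical.

```c
#include <stdint.h>

/* Returns the largest intermediate value produced while dividing
 * every 8-bit n by 3 with shifts and adds.
 * Hypothetical helper for illustration only. */
static unsigned div3_max_intermediate(void)
{
    unsigned max = 0;

    for (unsigned n = 0; n <= 255; ++n) {
        unsigned q1  = (n >> 2) + (n >> 4);   /* first estimate          */
        unsigned q   = q1 + (q1 >> 4);        /* refined estimate        */
        unsigned p   = q * 3;                 /* product for remainder   */
        unsigned r   = n - p;                 /* remainder, wraps if p>n */
        unsigned c   = 11 * r;                /* scaled remainder        */
        unsigned res = q + (c >> 5);          /* final quotient          */

        unsigned vals[] = { q1, q, p, r, c, res };
        for (unsigned i = 0; i < sizeof vals / sizeof vals[0]; ++i) {
            if (vals[i] > max) {
                max = vals[i];
            }
        }
    }
    return max;
}
```

If the returned maximum stays below 256, as it appears to for this sequence, the whole computation might be done directly on 8-bit lanes (for example with vshr_n_u8, vadd_u8, vsub_u8 and vmul_u8 against a vdup_n_u8 constant), skipping the widen and narrow steps. I have not benchmarked that variant, so treat it as an idea to try rather than a proven win.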

A full list of the NEON instruction set can be found here!

Lotfi’s Fuzzy Logic built into kids’ cartoons

It appears that the “Bob the Builder” character “Lofty” may have been based on the father of “Fuzzy Logic”, Lotfi Zadeh.


“Fuzzy logic is a form of many-valued logic or probabilistic logic; it deals with reasoning that is approximate rather than fixed and exact. In contrast with traditional logic theory, where binary sets have two-valued logic, true or false, fuzzy logic variables may have a truth value that ranges in degree between 0 and 1. Fuzzy logic has been extended to handle the concept of partial truth, where the truth value may range between completely true and completely false.”~ Wikipedia


Like the theory, he never has an exact answer; his answers are always probabilistic… along the lines of “yeah… I think so?” (within a 95% confidence interval?).

But seriously, Lotfi Zadeh is a pioneer of computer science, and he is also credited, together with John Ragazzini, with the development of the z-transform.