ARM NEON C++ Cheat Sheet

Newer ARM processors have their own flavor of SIMD instructions called NEON. In my little Android application Arashi, NEON is used a lot to speed up the simulation of particles.

Here is a table explaining some of the NEON intrinsics (declared in <arm_neon.h>) that are used. The "q" variants listed here work on four 32-bit floats at a time:

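C++ NEON functions

| NEON | Explanation | Pseudocode (per lane) |
| --- | --- | --- |
| vdupq_n_f32(a) | New NEON value (a copied into all four lanes) | a |
| vsubq_f32(a, b) | Subtract | a - b |
| vaddq_f32(a, b) | Add | a + b |
| vmulq_f32(a, b) | Multiply | a * b |
| vmlaq_f32(a, b, c) | Multiply and add | a + (b * c) |
| vmlsq_f32(a, b, c) | Multiply and subtract | a - (b * c) |
| vrsqrteq_f32(a) | Reciprocal square root estimate (approximate, low precision) | ~ 1 / sqrt(a) |
| vcgtq_f32(a, b) | Compare greater than (result is a uint32x4_t mask) | a > b ? 0xFFFFFFFF : 0 |
| vcltq_f32(a, b) | Compare less than (result is a uint32x4_t mask) | a < b ? 0xFFFFFFFF : 0 |
| vbslq_f32(mask, a, b) | Select by mask, bit by bit | mask bit set ? bit of a : bit of b |
| vminq_f32(a, b) | Get minimum | a < b ? a : b |
| vmaxq_f32(a, b) | Get maximum | a > b ? a : b |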
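
To show how a few of these fit together, here is a minimal sketch of a particle update step. It is not the actual Arashi code: the function name `step_particles`, its parameters, and the idea of clamping positions to a range are all assumptions made for illustration. It also uses `vld1q_f32` and `vst1q_f32` (load and store four floats), which are not in the table above. The scalar branch is only there so the sketch still builds on non-ARM machines.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

// Hypothetical helper: pos[i] += vel[i] * dt, then clamp pos[i] to [lo, hi].
// Assumes count is a multiple of 4 so no scalar tail loop is needed.
void step_particles(float* pos, const float* vel, float dt,
                    float lo, float hi, std::size_t count)
{
    assert(count % 4 == 0);
#if HAVE_NEON
    const float32x4_t vdt = vdupq_n_f32(dt);  // {dt, dt, dt, dt}
    const float32x4_t vlo = vdupq_n_f32(lo);
    const float32x4_t vhi = vdupq_n_f32(hi);
    for (std::size_t i = 0; i < count; i += 4) {
        float32x4_t p = vld1q_f32(pos + i);   // load four positions
        float32x4_t v = vld1q_f32(vel + i);   // load four velocities
        p = vmlaq_f32(p, v, vdt);             // p + (v * dt)

        // Lanes where p > hi become all-ones in the mask; pick hi there.
        uint32x4_t tooHigh = vcgtq_f32(p, vhi);
        p = vbslq_f32(tooHigh, vhi, p);

        // Same idea for the lower bound.
        uint32x4_t tooLow = vcltq_f32(p, vlo);
        p = vbslq_f32(tooLow, vlo, p);

        vst1q_f32(pos + i, p);                // store four positions
    }
#else
    // Scalar fallback with the same per-lane behaviour.
    for (std::size_t i = 0; i < count; ++i) {
        float p = pos[i] + vel[i] * dt;
        if (p > hi) p = hi;
        if (p < lo) p = lo;
        pos[i] = p;
    }
#endif
}
```

The compare-and-select pair is written out to show how the masks work; for a plain clamp like this one, `vminq_f32` and `vmaxq_f32` would do the same job in two instructions.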