Newer ARM processors have their own flavor of SIMD instructions called NEON. In my little Android application Arashi, NEON is used a lot to speed up the simulation of particles.
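Here is a table explaining some of the NEON intrinsics that are used. The `q` variants listed here work on four 32-bit floats at once, so each pseudocode expression applies to every lane separately: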
NEON | Explanation | Pseudocode |
---|---|---|
vdupq_n_f32(a) | Broadcast a scalar to all lanes | {a, a, a, a} |
vsubq_f32(a, b) | Subtract | a - b |
vaddq_f32(a, b) | Add | a + b |
vmulq_f32(a, b) | Multiply | a * b |
vmlaq_f32(a, b, c) | Multiply and add | a + (b * c) |
vmlsq_f32(a, b, c) | Multiply and subtract | a - (b * c) |
vrsqrteq_f32(a) | Reciprocal square root estimate (low precision) | ≈ 1 / sqrt(a) |
vcgtq_f32(a, b) | Compare greater than (returns an integer mask) | a > b ? 0xFFFFFFFF : 0 |
vcltq_f32(a, b) | Compare less than (returns an integer mask) | a < b ? 0xFFFFFFFF : 0 |
vbslq_f32(mask, a, b) | Bitwise select by mask | for each bit: mask bit set ? a : b |
vminq_f32(a, b) | Get minimum | a < b ? a : b |
vmaxq_f32(a, b) | Get maximum | a > b ? a : b |
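
To show how several of these intrinsics fit together, here is a minimal sketch of a one-axis particle update. It is not Arashi's actual code: the function name `step_particles`, the array layout, and the bounce-at-the-walls behaviour are all assumptions made for illustration. The scalar branch is only there so the example also compiles on non-ARM machines, and the loads and stores (`vld1q_f32`, `vst1q_f32`) plus `vnegq_f32` and `vorrq_u32` are extra intrinsics not listed in the table.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: move n particles along one axis by vel * dt,
 * flip the velocity of any particle that left [lo, hi], and clamp
 * its position back into that range. n must be a multiple of 4. */
void step_particles(float *pos, float *vel, size_t n,
                    float dt, float lo, float hi)
{
    assert(n % 4 == 0);
#if HAVE_NEON
    const float32x4_t vdt = vdupq_n_f32(dt);  /* {dt, dt, dt, dt} */
    const float32x4_t vlo = vdupq_n_f32(lo);
    const float32x4_t vhi = vdupq_n_f32(hi);

    for (size_t i = 0; i < n; i += 4) {
        float32x4_t p = vld1q_f32(pos + i);
        float32x4_t v = vld1q_f32(vel + i);

        p = vmlaq_f32(p, v, vdt);             /* p + (v * dt) */

        /* Lanes that left the range get an all-ones mask. */
        uint32x4_t outside = vorrq_u32(vcgtq_f32(p, vhi),
                                       vcltq_f32(p, vlo));

        /* Pick -v where the mask is set, v elsewhere. */
        v = vbslq_f32(outside, vnegq_f32(v), v);

        /* Clamp the position into [lo, hi]. */
        p = vminq_f32(vmaxq_f32(p, vlo), vhi);

        vst1q_f32(pos + i, p);
        vst1q_f32(vel + i, v);
    }
#else
    /* Scalar fallback with the same per-lane logic. */
    for (size_t i = 0; i < n; ++i) {
        float p = pos[i] + vel[i] * dt;
        if (p > hi || p < lo)
            vel[i] = -vel[i];
        if (p < lo) p = lo;
        if (p > hi) p = hi;
        pos[i] = p;
    }
#endif
}
```

Note that there is no branch inside the NEON loop: the comparison produces a mask and `vbslq_f32` picks between the two candidate values, which is the usual way to express an `if` per lane.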