Newer ARM processors have their own flavor of SIMD instructions called NEON. In my little Android application Arashi, NEON is used a lot to speed up the simulation of particles.

Here is a table explaining some of the NEON intrinsics that are used. Each one works on four 32-bit floats at once, so the pseudocode applies to every lane separately:

NEON | Explanation | Pseudocode (per lane) |
---|---|---|
vdupq_n_f32(a) | New NEON value with the scalar a copied into all four lanes | { a, a, a, a } |
vsubq_f32(a, b) | Subtract | a - b |
vaddq_f32(a, b) | Add | a + b |
vmulq_f32(a, b) | Multiply | a * b |
vmlaq_f32(a, b, c) | Multiply and add | a + (b * c) |
vmlsq_f32(a, b, c) | Multiply and subtract | a - (b * c) |
vrsqrteq_f32(a) | Reciprocal square root estimate (low precision) | approximately 1 / sqrt(a) |
vcgtq_f32(a, b) | Compare greater than, result is a bit mask | a > b ? 0xFFFFFFFF : 0 |
vcltq_f32(a, b) | Compare less than, result is a bit mask | a < b ? 0xFFFFFFFF : 0 |
vbslq_f32(mask, a, b) | Select by mask, bit by bit (use a mask from a compare) | mask != 0 ? a : b |
vminq_f32(a, b) | Get minimum | a < b ? a : b |
vmaxq_f32(a, b) | Get maximum | a > b ? a : b |
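
To show how a few of these fit together, here is a minimal sketch of one particle update step. It is not the actual Arashi code: the function name `step_particles`, the flat `pos`/`vel` arrays and the bounce-off-the-walls rule are all assumptions made up for illustration. Besides the intrinsics from the table it uses `vld1q_f32` and `vst1q_f32` to load and store four floats, `vorrq_u32` to combine two masks, and `vnegq_f32` to negate. A plain C fallback is included so the sketch also compiles on machines without NEON.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper (not from Arashi): move every particle by vel * dt,
   flip the velocity of particles that left [lo, hi], and clamp the
   position back into that range. count must be a multiple of 4. */
void step_particles(float *pos, float *vel, size_t count,
                    float dt, float lo, float hi)
{
    assert(count % 4 == 0);
#if HAVE_NEON
    const float32x4_t vdt = vdupq_n_f32(dt);   /* { dt, dt, dt, dt } */
    const float32x4_t vlo = vdupq_n_f32(lo);
    const float32x4_t vhi = vdupq_n_f32(hi);
    for (size_t i = 0; i < count; i += 4) {
        float32x4_t p = vld1q_f32(pos + i);
        float32x4_t v = vld1q_f32(vel + i);
        p = vmlaq_f32(p, v, vdt);              /* p + (v * dt) */
        /* Lanes that left the box get an all-ones mask. */
        uint32x4_t out = vorrq_u32(vcgtq_f32(p, vhi), vcltq_f32(p, vlo));
        v = vbslq_f32(out, vnegq_f32(v), v);   /* out ? -v : v */
        p = vmaxq_f32(vminq_f32(p, vhi), vlo); /* clamp to [lo, hi] */
        vst1q_f32(pos + i, p);
        vst1q_f32(vel + i, v);
    }
#else
    /* Scalar fallback with the same behavior, one particle at a time. */
    for (size_t i = 0; i < count; ++i) {
        float p = pos[i] + vel[i] * dt;
        if (p > hi || p < lo)
            vel[i] = -vel[i];
        p = p < hi ? p : hi;
        p = p > lo ? p : lo;
        pos[i] = p;
    }
#endif
}
```

Note that there is no branch inside the NEON loop: the compare produces a mask and `vbslq_f32` picks the result per lane, which is the usual way to express an `if` in SIMD code.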