ARM Technical Support Knowledge Articles

# How do I use VRECPE / VRECPEQ for reciprocal estimate?

Applies to: RealView Development Suite (RVDS)

This FAQ shows how to compute a reciprocal estimate in C/C++ using the functions that map to the NEON instructions `VRECPE` / `VRECPEQ`, and demonstrates the accuracy of the estimate with examples and a chart.

1. The `VRECPE` instructions produce an initial estimate of the reciprocal of a number. This estimate is the starting value for the Newton-Raphson iteration that calculates the reciprocal. The input `a` should be in the range `0.5 <= a < 1.0`, and the resulting estimate `x` is in the range `1.0 <= x < 2.0`. The theory is described in the section “Reciprocal estimate and step” of the “ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition”.

NEON provides reciprocal estimate instructions for both floating-point and fixed-point unsigned integer types. The fixed-point unsigned integer instructions provide higher accuracy than the floating-point ones.

The NEON reciprocal estimate functions and their corresponding assembler NEON instructions are:

```
float32x2_t vrecpe_f32(float32x2_t a);   // VRECPE.F32 d0,d0
uint32x2_t  vrecpe_u32(uint32x2_t a);    // VRECPE.U32 d0,d0
float32x4_t vrecpeq_f32(float32x4_t a);  // VRECPE.F32 q0,q0
uint32x4_t  vrecpeq_u32(uint32x4_t a);   // VRECPE.U32 q0,q0
```

2. How to interpret the unsigned integer `VRECPE` instruction:

`uint32x2_t vrecpe_u32(uint32x2_t a); // VRECPE.U32 d0,d0`

The argument of this function is a fixed-point unsigned integer with a scaling factor of `2^-32`. For example, `a = 0x80000000` represents the value `0.5 (1/2)`, `a = 0xc0000000` represents `0.75 (0.75 = 1/2 + 1/4)`, and `a = 0xe0000000` represents `0.875 (0.875 = 1/2 + 1/4 + 1/8)`. With this fixed-point unsigned integer type, the range `0.5 ~ 1.0` is therefore expressed as the values `0x80000000 ~ 0xFFFFFFFF`.

The output data type of the function is a fixed-point unsigned integer with a scaling factor of `2^-31`. For an input in the range `0.5 ~ 1.0`, the result is within `1.0 ~ 2.0 (0x80000000 ~ 0xFFFFFFFF)`, which means the highest bit is always set. For example, the result `0xc0000000` is `1.5` in real value (`0xc0000000 * 2^-31 = 1.5`).
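
The two scalings described above can be checked with a few helper functions. This is a minimal sketch in plain C that needs no NEON hardware; the function names are hypothetical and are not part of `arm_neon.h`.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret a VRECPE.U32 operand: 32 fractional bits, value = raw * 2^-32. */
static double recpe_u32_input_to_real(uint32_t raw)
{
    return (double)raw / 4294967296.0;   /* divide by 2^32 */
}

/* Interpret a VRECPE.U32 result: value = raw * 2^-31. */
static double recpe_u32_output_to_real(uint32_t raw)
{
    return (double)raw / 2147483648.0;   /* divide by 2^31 */
}

/* Encode a real value in [0.5, 1.0) as a VRECPE.U32 operand (truncating). */
static uint32_t real_to_recpe_u32_input(double value)
{
    assert(value >= 0.5 && value < 1.0);
    return (uint32_t)(value * 4294967296.0);
}
```

For example, `recpe_u32_input_to_real(0xe0000000)` gives `0.875`, and `recpe_u32_output_to_real(0x80000000)` gives `1.0`.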

3. The example below compares the floating-point reciprocal initial estimate function `vrecpe_f32()` with a normal reciprocal division. More example code can be found in neon_vrecpe_test.c.

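```
#include <arm_neon.h>

// Difference between a normal division and the estimate, for lane 0 of a.
float recip_difference(float32x2_t a)
{
    float32x2_t x = vrecpe_f32(a);   // initial reciprocal estimate
    return 1.0f / vget_lane_f32(a, 0) - vget_lane_f32(x, 0);
}
```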
For inputs from `0.5` to `1.0` in steps of `0.01`, the error stays within `-0.004 ~ 0.004`, as shown in the attached chart (difference.JPG).

4. Using the Newton-Raphson iteration brings the reciprocal estimate even closer to the true value. In the example code in neon_vrecpe_test.c, the result is already the same as the true value after 2 or 3 iterations.

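The Newton-Raphson iteration, where `d` is the number whose reciprocal is being calculated and `X[n]` is the current estimate:

`X[n+1] = X[n] * (2 - d*X[n])`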
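
The iteration can be tried out with a small scalar sketch. In NEON code the `(2 - d*X[n])` factor is normally produced by the `VRECPS` instruction (`vrecps_f32()` / `vrecpsq_f32()`); the version below uses ordinary C arithmetic instead so that it runs on any machine. The names `recip_refine` and `recip_newton` are hypothetical and do not come from neon_vrecpe_test.c.

```c
#include <assert.h>

/* One Newton-Raphson step: X[n+1] = X[n] * (2 - d * X[n]). */
static float recip_refine(float d, float x)
{
    return x * (2.0f - d * x);
}

/* Refine an initial estimate x0 of 1/d with the given number of steps. */
static float recip_newton(float d, float x0, int steps)
{
    assert(steps >= 0);
    float x = x0;
    for (int i = 0; i < steps; ++i)
        x = recip_refine(d, x);
    return x;
}
```

Starting from a rough estimate of `1.33` for `d = 0.75`, one step gives approximately `1.33332`, and a second step is already very close to `1/0.75` in single precision. The error roughly squares with each step, which is why only a few iterations are needed.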

5. When high accuracy is not required, calculating a reciprocal with the NEON `VRECPE` instructions is much faster than using a normal division instruction, especially when calculating reciprocals for a large data set.
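
A loop over a large array might look like the sketch below. It is an illustration under assumptions, not the code from neon_vrecpe_test.c: the function name `reciprocal_array` is hypothetical, the choice of two refinement steps follows the "2 or 3 iterations" mentioned above, and the `__ARM_NEON` guard with a plain division fallback is added only so that the file also compiles on targets without NEON.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Write an approximation of 1/in[i] to out[i] for every i in [0, n). */
static void reciprocal_array(const float *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL);
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Process four floats per iteration. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t d = vld1q_f32(in + i);
        float32x4_t x = vrecpeq_f32(d);        /* initial estimate        */
        x = vmulq_f32(vrecpsq_f32(d, x), x);   /* x = x * (2 - d*x), once */
        x = vmulq_f32(vrecpsq_f32(d, x), x);   /* and a second time       */
        vst1q_f32(out + i, x);
    }
#endif
    /* Remaining elements (or all of them without NEON): normal division. */
    for (; i < n; ++i)
        out[i] = 1.0f / in[i];
}
```

Whether this is actually faster than division depends on the core and the compiler, so it is worth measuring on the target before relying on it.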

References:

1. 5.10.9 VRECPE and VRSQRTE, NEON and VFP Programming
2. Reciprocal Estimate
3. Reciprocal estimate and step, ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition (DDI0406B)

Attachments: neon_vrecpe_test.c , difference.JPG

Article last edited on: 2010-04-07 10:28:09         