|ARM Technical Support Knowledge Articles|
Applies to: RealView Development Suite (RVDS)
This FAQ describes how to calculate a reciprocal estimate in C/C++ with the NEON intrinsics that map to the VRECPE instruction (for example vrecpeq_f32()), and demonstrates the accuracy of the estimate with examples and a chart.
VRECPE instructions produce an initial estimate of the reciprocal of a number. This estimate is the starting point for the Newton-Raphson iteration that calculates the reciprocal. The input number a should be in the range 0.5 <= a < 1.0, and the resulting estimate x then lies in the range 1.0 <= x < 2.0. The theory is described in the section “Reciprocal estimate and step” of the “ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition”.
NEON provides reciprocal estimate instructions for both floating-point and fixed-point unsigned integer data. The fixed-point unsigned integer instructions provide a more accurate calculation than the floating-point ones.
The NEON reciprocal estimate functions and their corresponding assembler NEON instructions are:
float32x2_t vrecpe_f32(float32x2_t a); // VRECPE.F32 d0,d0
uint32x2_t vrecpe_u32(uint32x2_t a); // VRECPE.U32 d0,d0
float32x4_t vrecpeq_f32(float32x4_t a); // VRECPE.F32 q0,q0
uint32x4_t vrecpeq_u32(uint32x4_t a); // VRECPE.U32 q0,q0
How to interpret the unsigned integer type
uint32x2_t vrecpe_u32(uint32x2_t a); // VRECPE.U32 d0,d0
The argument of this function is a fixed-point unsigned integer with a scaling factor of 2^-32. For example:
a = 0x80000000 represents 0.5 (1/2)
a = 0xC0000000 represents 0.75 (1/2 + 1/4)
a = 0xE0000000 represents 0.875 (1/2 + 1/4 + 1/8)
With this fixed-point type, the input range 0.5 ~ 1.0 is therefore expressed as 0x80000000 ~ 0xFFFFFFFF.
The output of the function is a fixed-point unsigned integer with a scaling factor of 2^-31. For inputs in the range 0.5 ~ 1.0, the result lies within 1.0 ~ 2.0 (0x80000000 ~ 0xFFFFFFFF), which means the highest bit is always set. For example, a result of 0xA0000000 represents 1.25 in real value.
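The two scaling factors can be illustrated with a pair of conversion helpers. This is a minimal sketch in portable C, not code from neon_vrecpe_test.c; the function names recpe_u32_encode_input() and recpe_u32_decode_output() are hypothetical and chosen only for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: encode a real value in [0.5, 1.0) as a VRECPE.U32
   operand, which uses a scaling factor of 2^-32.
   The caller must keep value below 1.0, otherwise the conversion overflows. */
uint32_t recpe_u32_encode_input(double value)
{
    return (uint32_t)(value * 4294967296.0);   /* value * 2^32 */
}

/* Hypothetical helper: decode a VRECPE.U32 result, which uses a scaling
   factor of 2^-31, into a real value in [1.0, 2.0). */
double recpe_u32_decode_output(uint32_t result)
{
    return (double)result / 2147483648.0;      /* result / 2^31 */
}
```

With these helpers, recpe_u32_encode_input(0.75) gives 0xC0000000 and recpe_u32_decode_output(0xA0000000) gives 1.25, matching the examples above.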
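The example below compares the floating-point reciprocal initial estimate with an ordinary division. It uses vrecpe_f32(); vrecpeq_f32() is the four-lane equivalent. More example code can be found in neon_vrecpe_test.c.
#include <stdint.h>
#include <arm_neon.h>
float32x2_t a = vdup_n_f32(0.75f); // both lanes hold the input (0.75 is only an illustrative value)
float32x2_t x = vrecpe_f32(a); // VRECPE.F32 initial estimate of 1/a
float difference = 1.0f / vget_lane_f32(a, 0) - vget_lane_f32(x, 0); // lane 0: true reciprocal minus estimate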
When the input is stepped across the test range in increments of 0.01, the error of the estimate stays within -0.004 ~ 0.004, as shown in the chart that accompanies this article.
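The error range can also be explored on a machine without NEON by using a software model of the estimate. The sketch below is an assumption, not the code behind the chart: recip_estimate_model() is a hypothetical name, its steps follow the reciprocal estimate pseudocode of the ARM Architecture Reference Manual as best recalled here, and the sweep range of 0.50 to 0.99 is assumed from the input range given earlier. Check real hardware before relying on exact values.

```c
/* Assumed software model of the reciprocal estimate for 0.5 <= a < 1.0.
   Hypothetical helper, not a substitute for running VRECPE on hardware. */
double recip_estimate_model(double a)
{
    int q = (int)(a * 512.0);                      /* a in units of 1/512, rounded down */
    double r = 1.0 / (((double)q + 0.5) / 512.0);  /* reciprocal of the interval midpoint */
    int s = (int)(256.0 * r + 0.5);                /* r in units of 1/256, rounded to nearest */
    return (double)s / 256.0;
}

/* Largest |1/a - estimate| for a = 0.50, 0.51, ..., 0.99
   (the sweep range is an assumption made for this sketch). */
double max_estimate_error(void)
{
    double worst = 0.0;
    for (int i = 50; i < 100; ++i) {
        double a = i / 100.0;
        double err = 1.0 / a - recip_estimate_model(a);
        if (err < 0.0)
            err = -err;                            /* absolute value without <math.h> */
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

Under this model the estimate is a multiple of 1/256, so for a = 0.5 it returns 1.99609375 rather than 2.0, an error of about 0.0039.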
Applying the Newton-Raphson iteration brings the reciprocal estimate closer to the true value. In the example code in neon_vrecpe_test.c, the result already matches the true value after 2 or 3 iterations.
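The Newton-Raphson iteration is:
X[n+1] = X[n] * (2 - d * X[n])
where d is the number whose reciprocal is being calculated and X[n] is the estimate after n iterations.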
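The iteration can be sketched in scalar C to show how quickly it converges. The function names recip_newton_step() and recip_refine() are hypothetical, and the starting value is passed in by the caller, standing in for the VRECPE estimate. Because d * X[n+1] = 1 - (1 - d * X[n])^2, the relative error is squared at every step.

```c
/* One Newton-Raphson step towards 1/d: x_next = x * (2 - d * x). */
double recip_newton_step(double d, double x)
{
    return x * (2.0 - d * x);
}

/* Refine an initial estimate x0 of 1/d with the given number of steps.
   Hypothetical helper for illustration; x0 stands in for a VRECPE result. */
double recip_refine(double d, double x0, int steps)
{
    double x = x0;
    for (int i = 0; i < steps; ++i)
        x = recip_newton_step(d, x);
    return x;
}
```

For example, with d = 0.75 and x0 = 1.33203125 the product d * x0 differs from 1 by 2^-10. One step reduces that to 2^-20 and two steps to 2^-40, which is already below single-precision resolution.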
Calculating a reciprocal that does not need high accuracy is much faster with the NEON VRECPE instructions than with an ordinary division, especially when reciprocals are needed for a large data set.
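A possible way to process a large array is sketched below. It is not taken from neon_vrecpe_test.c: reciprocal_array() is a hypothetical name, the choice of two refinement steps is an assumption, and the code falls back to ordinary division when NEON is not available so that it builds on any host. vrecpsq_f32(d, x) returns 2 - d * x, which is the bracketed term of the Newton-Raphson iteration.

```c
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: write 1/src[i] to dst[i] for i in [0, n). */
void reciprocal_array(const float *src, float *dst, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Four lanes per pass: estimate, then two Newton-Raphson refinements. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t d = vld1q_f32(src + i);
        float32x4_t x = vrecpeq_f32(d);           /* VRECPE.F32 initial estimate */
        x = vmulq_f32(x, vrecpsq_f32(d, x));      /* x * (2 - d * x) */
        x = vmulq_f32(x, vrecpsq_f32(d, x));      /* second refinement */
        vst1q_f32(dst + i, x);
    }
#endif
    /* Remaining elements, or every element when NEON is unavailable. */
    for (; i < n; ++i)
        dst[i] = 1.0f / src[i];
}
```

If lower accuracy is acceptable, one or both refinement lines can be removed. Whether that is worthwhile depends on the accuracy the application needs.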
Article last edited on: 2010-04-07 10:28:09