ARM Technical Support Knowledge Articles
Applies to: RealView Development Suite (RVDS)
This FAQ shows how to compute reciprocal estimates in C/C++ with the functions that map to the NEON instructions VRECPE / VRECPEQ, and demonstrates the estimation accuracy with examples and a chart.
The VRECPE instructions produce an initial estimate of the reciprocal of a number, which serves as the starting value for the Newton-Raphson iteration that calculates the reciprocal. The input number a should be in the range 0.5 <= a < 1.0. The resulting reciprocal estimate x is in the range 1.0 <= x < 2.0. The underlying theory is described in the section “Reciprocal estimate and step” of the “ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition”.
NEON provides reciprocal estimate instructions for both floating-point and fixed-point unsigned integer types. The fixed-point unsigned integer instructions provide a higher-accuracy calculation than the floating-point ones.
The NEON reciprocal estimate functions and their corresponding assembler NEON instructions are:
float32x2_t vrecpe_f32(float32x2_t a); // VRECPE.F32 d0,d0
uint32x2_t vrecpe_u32(uint32x2_t a); // VRECPE.U32 d0,d0
float32x4_t vrecpeq_f32(float32x4_t a); // VRECPE.F32 q0,q0
uint32x4_t vrecpeq_u32(uint32x4_t a); // VRECPE.U32 q0,q0
How to understand the unsigned integer VRECPE instruction:
uint32x2_t vrecpe_u32(uint32x2_t a); // VRECPE.U32 d0,d0
The argument of this function is a fixed-point unsigned integer with a scaling factor of 2^-32. For example, a = 0x80000000 means that the value of a is 0.5 (1/2), a = 0xc0000000 means that the value of a is 0.75 (1/2 + 1/4), and a = 0xe0000000 means that the value of a is 0.875 (1/2 + 1/4 + 1/8). With this fixed-point type, the range 0.5 ~ 1.0 is therefore expressed as 0x80000000 ~ 0xFFFFFFFF.
The output data type of the function is a fixed-point unsigned integer with a scaling factor of 2^-31. For an input in the range 0.5 ~ 1.0, the result lies within 1.0 ~ 2.0 (0x80000000 ~ 0xFFFFFFFF), which means the highest bit is always set. For example, the result 0xc0000000 is 1.5 in real value (0xc0000000 x 2^-31 = 1 + 1/2).
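The two scaling factors can be checked with a few lines of portable C. This is only a sketch of the number formats described above; the helper names are hypothetical and are not part of arm_neon.h or of the attached test file.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: input operand of VRECPE.U32, scaling factor 2^-32. */
double vrecpe_u32_input_to_real(uint32_t a)
{
    return (double)a / 4294967296.0;    /* a * 2^-32 */
}

/* Hypothetical helper: result of VRECPE.U32, scaling factor 2^-31. */
double vrecpe_u32_result_to_real(uint32_t x)
{
    return (double)x / 2147483648.0;    /* x * 2^-31 */
}

/* Hypothetical helper: encode a real value in [0.5, 1.0) as an input operand. */
uint32_t real_to_vrecpe_u32_input(double v)
{
    assert(v >= 0.5 && v < 1.0);        /* valid input range of the estimate */
    return (uint32_t)(v * 4294967296.0);
}
```

For instance, vrecpe_u32_input_to_real(0xE0000000) gives 0.875, while the same bit pattern read as a result with vrecpe_u32_result_to_real() gives 1.75, because the result format has one bit fewer after the binary point.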
The example below compares the floating-point reciprocal initial estimate function vrecpe_f32() with a normal reciprocal division. More example code can be found in neon_vrecpe_test.c.
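#include <arm_neon.h>                  // NEON vector types and intrinsics
float32x2_t a = vdup_n_f32(0.75f);     // example input, 0.5 <= a < 1.0, in both lanes
float32x2_t x = vrecpe_f32(a);         // initial reciprocal estimate per lane
float difference = 1.0f / vget_lane_f32(a, 0) - vget_lane_f32(x, 0); // true value minus estimate, lane 0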
For inputs from 0.5 to 1.0 in steps of 0.01, the difference stays within -0.004 ~ 0.004, as the chart in the attachment difference.JPG shows.
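The sweep behind such a chart can be sketched as follows. The function names are hypothetical. On a NEON target the code calls vrecpe_f32(); elsewhere it falls back to a scalar model patterned after the reciprocal estimate pseudocode in the ARM Architecture Reference Manual. That model is an assumption made here so the sketch also builds on a host machine, and it has not been checked against hardware or against neon_vrecpe_test.c.

```c
#include <assert.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: reciprocal estimate of one value a, 0.5 <= a < 1.0. */
float recip_estimate(float a)
{
    assert(a >= 0.5f && a < 1.0f);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Real instruction: duplicate a into both lanes, estimate, read lane 0. */
    return vget_lane_f32(vrecpe_f32(vdup_n_f32(a)), 0);
#else
    /* Assumed scalar model: 9-bit input, result rounded to units of 1/256. */
    int q = (int)(a * 512.0);
    double r = 1.0 / (((double)q + 0.5) / 512.0);
    int s = (int)(256.0 * r + 0.5);
    return (float)((double)s / 256.0);
#endif
}

/* Hypothetical helper: largest |1/a - estimate| for a = start, start+step, ... < end. */
float max_estimate_error(float start, float end, float step)
{
    float worst = 0.0f;
    int i;
    for (i = 0; start + i * step < end; ++i) {
        float a = start + i * step;
        float diff = 1.0f / a - recip_estimate(a);
        if (diff < 0.0f)
            diff = -diff;
        if (diff > worst)
            worst = diff;
    }
    return worst;
}
```

Calling max_estimate_error(0.5f, 1.0f, 0.01f) reproduces the sampling described above and returns the largest magnitude of the difference over the sweep.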
Applying the Newton-Raphson iteration brings the reciprocal estimate even closer to the true value. In the example code in neon_vrecpe_test.c, the result already equals the true value after 2 or 3 iterations.
The Newton-Raphson iteration, where d is the number whose reciprocal is calculated and X[n] is the current estimate:
X[n+1] = X[n] * (2 - d * X[n])
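A minimal scalar sketch of this iteration is given below, written in portable C so it can be tried without NEON hardware. The function names are hypothetical, and the starting value is supplied by the caller rather than taken from VRECPE.

```c
#include <assert.h>

/* One Newton-Raphson step: X[n+1] = X[n] * (2 - d * X[n]). */
float recip_step(float d, float x)
{
    return x * (2.0f - d * x);
}

/* Hypothetical helper: refine an initial estimate x0 of 1/d with the given number of steps. */
float recip_refine(float d, float x0, int steps)
{
    float x = x0;
    int i;
    assert(steps >= 0);
    for (i = 0; i < steps; ++i)
        x = recip_step(d, x);
    return x;
}
```

With d = 0.75 and a starting value of 1.33, recip_refine(0.75f, 1.33f, 2) is already within single-precision rounding of 1/0.75, because each step roughly squares the relative error. On NEON, the factor (2 - d * X[n]) is what the VRECPS instruction (vrecps_f32) computes per lane, so one step can be written as vmul_f32(x, vrecps_f32(d, x)).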
When high accuracy is not required, calculating a reciprocal with the NEON VRECPE instructions is much faster than using a normal division instruction, especially when reciprocals are needed for a large data set.
Attachments: neon_vrecpe_test.c, difference.JPG
Article last edited on: 2010-04-07 10:28:09