ARM Technical Support Knowledge Articles

How do I use VRECPE / VRECPEQ for reciprocal estimate?

Applies to: RealView Development Suite (RVDS)

Answer

This FAQ explains how to compute a reciprocal estimate in C/C++ with the intrinsic functions for the NEON instructions VRECPE / VRECPEQ, and illustrates the accuracy of the estimate with examples and a chart.

  1. The VRECPE instructions produce an initial estimate of the reciprocal of a number, which is then used as the starting point of a Newton-Raphson iteration that calculates the reciprocal. The input number a should be in the range 0.5 <= a < 1.0, and the resulting estimate x is in the range 1.0 <= x < 2.0. The theory is described in the section “Reciprocal estimate and step” of the “ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition”.

    NEON provides reciprocal estimate instructions for both floating-point and fixed-point unsigned integer data. The fixed-point unsigned integer instructions provide a more accurate calculation than the floating-point ones.

    The NEON reciprocal estimate functions and their corresponding assembler NEON instructions are:

          float32x2_t vrecpe_f32(float32x2_t a); // VRECPE.F32 d0,d0
          uint32x2_t vrecpe_u32(uint32x2_t a); // VRECPE.U32 d0,d0
          float32x4_t vrecpeq_f32(float32x4_t a); // VRECPE.F32 q0,q0
          uint32x4_t vrecpeq_u32(uint32x4_t a); // VRECPE.U32 q0,q0

  2. How to interpret the unsigned integer VRECPE instruction:

           uint32x2_t vrecpe_u32(uint32x2_t a); // VRECPE.U32 d0,d0

    The argument of this function is a fixed-point unsigned integer with a scaling factor of 2^-32. For example, a = 0x80000000 represents 0.5 (1/2), a = 0xc0000000 represents 0.75 (1/2 + 1/4), and a = 0xe0000000 represents 0.875 (1/2 + 1/4 + 1/8). With this format, the input range 0.5 to 1.0 is therefore expressed as the values 0x80000000 to 0xFFFFFFFF.

    The output of the function is a fixed-point unsigned integer with a scaling factor of 2^-31. For an input between 0.5 and 1.0, the result lies between 1.0 and 2.0 (0x80000000 to 0xFFFFFFFF), which means the highest bit is always set. For example, a result of 0xc0000000 represents the real value 1.5 (1 + 1/2).
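    The two fixed-point formats described above can be checked with a small conversion sketch in plain C. The helper names recpe_u32_input_to_real() and recpe_u32_output_to_real() are hypothetical and are not part of arm_neon.h; they only apply the scaling factors given in the text.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of arm_neon.h. */

/* Argument of vrecpe_u32: scaling factor 2^-32, so 0x80000000 means 0.5. */
double recpe_u32_input_to_real(uint32_t a)
{
    return (double)a / 4294967296.0;   /* a * 2^-32 */
}

/* Result of vrecpe_u32: scaling factor 2^-31, so 0x80000000 means 1.0. */
double recpe_u32_output_to_real(uint32_t x)
{
    return (double)x / 2147483648.0;   /* x * 2^-31 */
}
```

    For example, recpe_u32_input_to_real(0xe0000000) evaluates to 0.875 and recpe_u32_output_to_real(0x80000000) evaluates to 1.0.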

  3. The example below compares the floating-point reciprocal initial estimate function vrecpe_f32() with a normal reciprocal division. More example code can be found in neon_vrecpe_test.c.

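              #include <arm_neon.h>
              float32x2_t a = vdup_n_f32(0.75f); // example input, 0.5 <= a < 1.0
              float32x2_t x = vrecpe_f32(a); // initial reciprocal estimate
              float difference = 1.0f / vget_lane_f32(a, 0) - vget_lane_f32(x, 0); // error in lane 0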

    Over an input range of 0.5 to 1.0 in steps of 0.01, the error stays within -0.004 to 0.004, as shown in the chart (see attachment difference.JPG).
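    The sweep described above can also be approximated on a machine without NEON. The sketch below is a software model only: recip_estimate_model() is a hypothetical name, and its body follows the recip_estimate() pseudocode of the Architecture Reference Manual as recalled here, so it should be checked against the manual before being relied on. On a NEON target, the model can be replaced by a call to vrecpe_f32().

```c
/* Software model of the 8-bit estimate for 0.5 <= a < 1.0.
   Assumption: mirrors the recip_estimate() pseudocode of the
   Architecture Reference Manual; verify against the manual. */
double recip_estimate_model(double a)
{
    int q = (int)(a * 512.0);               /* a in units of 1/512, rounded down */
    double r = 1.0 / ((q + 0.5) / 512.0);   /* reciprocal of the interval midpoint */
    int s = (int)(256.0 * r + 0.5);         /* r in units of 1/256, rounded to nearest */
    return s / 256.0;
}

/* Largest |1/a - estimate| for a = lo, lo + step, ... while a < hi. */
double max_abs_error(double lo, double hi, double step)
{
    double worst = 0.0;
    for (int i = 0; lo + i * step < hi; i++) {
        double a = lo + i * step;
        double diff = 1.0 / a - recip_estimate_model(a);
        if (diff < 0.0)
            diff = -diff;                   /* absolute value without <math.h> */
        if (diff > worst)
            worst = diff;
    }
    return worst;
}
```

    Calling max_abs_error(0.5, 1.0, 0.01) reproduces the sweep from the text under this model.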


  4. Applying Newton-Raphson iterations brings the reciprocal estimate even closer to the true value. In the example code in neon_vrecpe_test.c, the result already matches the true value after two or three iterations.

    The Newton-Raphson iteration, where d is the input number and X[n] is the current estimate of 1/d:

               X[n+1] = X[n] * (2 - d * X[n])
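    The iteration can be sketched in scalar C to show why so few steps are needed. If e = 1 - d*X[n] is the relative error of the current estimate, one step leaves a relative error of e*e, so the number of correct bits roughly doubles each time. The function names recip_step() and recip_refine() are hypothetical, and the sketch uses double precision rather than NEON registers.

```c
/* One Newton-Raphson step for the reciprocal of d: x * (2 - d * x). */
double recip_step(double d, double x)
{
    return x * (2.0 - d * x);
}

/* Refine an initial estimate x0 of 1/d with the given number of steps. */
double recip_refine(double d, double x0, int steps)
{
    double x = x0;
    for (int i = 0; i < steps; i++)
        x = recip_step(d, x);
    return x;
}
```

    For example, recip_refine(0.5, 511.0 / 256.0, 3) starts from an estimate with about 8 correct bits and refines it towards 2.0.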

  5. For reciprocals that do not need high accuracy, the NEON VRECPE instructions are much faster than a normal division instruction, especially when calculating reciprocals for a large data set.
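    A minimal sketch of processing a data set four floats at a time is shown below. It is not the code from neon_vrecpe_test.c. The function name recip_array() and the HAVE_NEON_RECPE macro are hypothetical; the intrinsics vld1q_f32, vrecpeq_f32, vrecpsq_f32, vmulq_f32 and vst1q_f32 come from arm_neon.h. The sketch assumes the compiler defines __ARM_NEON (or __ARM_NEON__) when NEON is enabled, and falls back to ordinary division for leftover elements or when NEON is not available.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON_RECPE 1
#else
#define HAVE_NEON_RECPE 0
#endif

/* Hypothetical helper: out[i] is approximately 1/in[i] for i < n.
   steps is the number of Newton-Raphson refinements per vector. */
void recip_array(const float *in, float *out, size_t n, int steps)
{
    size_t i = 0;
#if HAVE_NEON_RECPE
    for (; i + 4 <= n; i += 4) {
        float32x4_t d = vld1q_f32(in + i);
        float32x4_t x = vrecpeq_f32(d);            /* initial estimate */
        for (int k = 0; k < steps; k++)
            x = vmulq_f32(x, vrecpsq_f32(d, x));   /* VRECPS gives 2 - d*x */
        vst1q_f32(out + i, x);
    }
#endif
    for (; i < n; i++)                             /* tail, or no NEON at all */
        out[i] = 1.0f / in[i];
}
```

    With steps set to 0 the NEON path returns the raw estimate, which is the fastest and least accurate option; each additional step trades time for accuracy.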


References:

  1. 5.10.9 VRECPE and VRSQRTE, NEON and VFP Programming
  2. Reciprocal Estimate
  3. Reciprocal estimate and step, ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition (DDI0406B)

Attachments: neon_vrecpe_test.c , difference.JPG

Article last edited on: 2010-04-07 10:28:09

Copyright © 2011 ARM Limited. All rights reserved. External (Open), Non-Confidential