5.6.4. Half-precision floating-point number support

Support for half-precision floating-point numbers is provided as an optional extension to the VFPv3 architecture. If no VFPv3 coprocessor is available, or if the VFPv3 coprocessor in use does not implement this extension, half-precision numbers are supported through the floating-point library, fplib.

Half-precision floating-point numbers can only be used when they are enabled with the --fp16_format command-line option. See --fp16_format=format in the Compiler Reference Guide.

Two half-precision floating-point formats are available: ieee and alternative. Both formats use the same basic 16-bit layout. See Figure 5.1.

Figure 5.1. Half-precision floating-point format

Where:

   S (bit[15]):      Sign bit
   E (bits[14:10]):  Biased exponent
   T (bits[9:0]):    Mantissa

The meanings of these fields depend on the format that is selected.
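As a minimal sketch, the three fields can be extracted from a raw 16-bit pattern with shifts and masks. The names `struct half_fields` and `split_half` are hypothetical and are not part of the compiler or fplib. The pattern is held in a `uint16_t` so that the code does not depend on `__fp16` support.

```c
#include <stdint.h>

/* Hypothetical container for the fields of a half-precision pattern. */
struct half_fields {
    unsigned s;  /* bit[15]: sign */
    unsigned e;  /* bits[14:10]: biased exponent, 0..31 */
    unsigned t;  /* bits[9:0]: mantissa field, 0..1023 */
};

/* Split a raw 16-bit pattern into S, E and T. The layout is the same
   for the ieee and alternative formats; only the interpretation differs. */
struct half_fields split_half(uint16_t h)
{
    struct half_fields f;
    f.s = (h >> 15) & 0x1u;
    f.e = (h >> 10) & 0x1Fu;
    f.t = h & 0x3FFu;
    return f;
}
```

For example, `split_half(0xC000)` yields S = 1, E = 16, T = 0, which both formats interpret as -2.0.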

IEEE half-precision

IF E==31:
   IF T==0: Value = Signed infinity
   IF T!=0: Value = NaN
             T[9] determines Quiet or Signalling:
                  0: Signalling NaN
                  1: Quiet NaN
IF 0<E<31:
   Value = (-1)^S × 2^(E-15) × (1 + 2^(-10) × T)

IF E==0:
   IF T==0: Value = Signed zero
   IF T!=0: Value = (-1)^S × 2^(-14) × (0 + 2^(-10) × T)
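The IEEE rules can be checked with a small software decoder. This is a sketch, not the fplib implementation, and the function name `ieee_half_to_double` is hypothetical. It rewrites the normal case as (1024 + T) × 2^(E-25) and the subnormal case as T × 2^(-24), which are algebraically the same as the formulas above and are computed exactly in double precision. NaN payloads are not preserved.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical decoder: IEEE half-precision bit pattern -> double.
   Every half-precision value is exactly representable as a double. */
double ieee_half_to_double(uint16_t h)
{
    unsigned s = (h >> 15) & 0x1u;
    unsigned e = (h >> 10) & 0x1Fu;
    unsigned t = h & 0x3FFu;
    double mag;

    if (e == 31) {
        /* Infinity if T == 0, otherwise NaN (payload discarded). */
        mag = (t == 0) ? INFINITY : NAN;
    } else if (e == 0) {
        /* Zero or subnormal: 2^-14 x (T x 2^-10) = T x 2^-24. */
        mag = (double)t / 16777216.0;
    } else {
        /* Normal: 2^(E-15) x (1 + T x 2^-10) = (1024 + T) x 2^(E-25). */
        mag = (double)(1024u + t) * (double)(1u << e) / 33554432.0;
    }
    return s ? -mag : mag;  /* S == 1 also yields -0.0 for a signed zero */
}
```

For example, 0x3C00 decodes to 1.0, 0x7BFF to 65504 (the largest finite IEEE half-precision value), and 0x0001 to 2^(-24), the smallest subnormal value.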

Alternative half-precision

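IF 0<E<32:
   Value = (-1)^S × 2^(E-15) × (1 + 2^(-10) × T)

IF E==0:
   IF T==0: Value = Signed zero
   IF T!=0: Value = (-1)^S × 2^(-14) × (0 + 2^(-10) × T)

Because E==31 is treated as an ordinary exponent, the alternative format has no encodings for infinity or NaN, but its range of normalized values is larger than that of the IEEE format.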
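A minimal sketch of a decoder for the alternative format, using the hypothetical name `alt_half_to_double`, applies the normalized formula to every nonzero exponent, including E == 31, so it never produces an infinity or NaN. As before, the arithmetic is arranged so that each result is exact in double precision.

```c
#include <stdint.h>

/* Hypothetical decoder: alternative half-precision bit pattern -> double.
   There are no infinities or NaNs; E == 31 is a normal exponent. */
double alt_half_to_double(uint16_t h)
{
    unsigned s = (h >> 15) & 0x1u;
    unsigned e = (h >> 10) & 0x1Fu;
    unsigned t = h & 0x3FFu;
    double mag;

    if (e == 0)
        mag = (double)t / 16777216.0;  /* zero or subnormal: T x 2^-24 */
    else                               /* normal: (1024 + T) x 2^(E-25) */
        mag = (double)(1024u + t) * (double)(1u << e) / 33554432.0;
    return s ? -mag : mag;
}
```

With this decoding, 0x7C00 (an infinity in the IEEE format) is 65536, and 0x7FFF, the largest finite alternative-format value, is 131008.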

Usage restrictions

The following restrictions apply when you use the __fp16 type:

  • When used in a C or C++ expression, an __fp16 value is promoted to single precision. It can subsequently be promoted to double precision if another operand requires it.

  • A single-precision value can be converted directly to __fp16. A double-precision value is converted first to single precision and then to __fp16, which can involve double rounding. This reflects the lack of a direct double-to-half-precision conversion in the ARM architecture.

  • When you compile with --fpmode=fast, no floating-point exceptions are raised when converting to and from half-precision floating-point format.

  • Function formal arguments cannot be of type __fp16. However, a formal argument can be a pointer to __fp16.

  • __fp16 values can be passed as actual function arguments. In this case, they are converted to single-precision values.

  • __fp16 cannot be specified as the return type of a function. However, a pointer to an __fp16 type can be used as a return type.

  • An __fp16 value is converted to a single-precision or double-precision value when used as a return value for a function that returns a float or double.
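The double-rounding effect described in the second restriction can be reproduced in portable C by modelling both conversion paths on raw bit patterns. This is a minimal sketch under stated assumptions: the helper names `pack_half_rne`, `double_to_half_direct`, and `float_to_half` are hypothetical, both conversions use round-to-nearest, ties-to-even, and only values in the normal half-precision range are handled (there is no overflow, subnormal, infinity, or NaN handling). `double_to_half_direct` models a single-rounding conversion that the ARM architecture does not provide, and `float_to_half((float)d)` models the double-to-single-to-half path.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round a normalized significand (implicit leading 1,
   frac_bits fraction bits in frac) to 10 fraction bits using
   round-to-nearest, ties-to-even, then pack an IEEE half pattern.
   Assumes the result lies in the normal half-precision range. */
static uint16_t pack_half_rne(unsigned sign, int exp, uint64_t frac, int frac_bits)
{
    int shift = frac_bits - 10;                   /* bits being discarded */
    uint64_t kept = frac >> shift;
    uint64_t rem = frac & ((UINT64_C(1) << shift) - 1);
    uint64_t halfway = UINT64_C(1) << (shift - 1);

    if (rem > halfway || (rem == halfway && (kept & 1u)))
        kept++;                                   /* round up; ties go to even */
    if (kept == 1024) {                           /* carry into the exponent */
        kept = 0;
        exp++;
    }
    return (uint16_t)((sign << 15) | ((unsigned)(exp + 15) << 10) | (unsigned)kept);
}

/* Single rounding: double -> half directly. */
uint16_t double_to_half_direct(double d)
{
    uint64_t b;
    memcpy(&b, &d, sizeof b);
    return pack_half_rne((unsigned)(b >> 63),
                         (int)((b >> 52) & 0x7FF) - 1023,
                         b & ((UINT64_C(1) << 52) - 1), 52);
}

/* single -> half; applied to (float)d this models double -> single -> half. */
uint16_t float_to_half(float f)
{
    uint32_t b;
    memcpy(&b, &f, sizeof b);
    return pack_half_rne((unsigned)(b >> 31),
                         (int)((b >> 23) & 0xFF) - 127,
                         b & 0x7FFFFFu, 23);
}
```

For x = 1 + 2^(-11) + 2^(-40), which lies just above the midpoint between the half-precision values 1 and 1 + 2^(-10), `double_to_half_direct(x)` returns 0x3C01 (1 + 2^(-10)). Converting to single precision first rounds x to exactly 1 + 2^(-11), a tie that ties-to-even then resolves downward, so `float_to_half((float)x)` returns 0x3C00 (1.0).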

Name mangling

The C++ name mangling for the half-precision data type is specified in the C++ generic ABI. See the C++ ABI for the ARM Architecture.

Copyright © 2002-2010 ARM. All rights reserved. ARM DUI 0205J
Non-Confidential ID101213