5.6.4. Half-precision floating-point number support

Half-precision floating-point numbers are provided as an optional extension to the VFPv3 architecture. If no VFPv3 coprocessor is available, or if the VFPv3 coprocessor in use does not implement this extension, half-precision numbers are supported in software through the floating-point library, fplib.

Half-precision floating-point numbers are available only when selected with the --fp16_format command-line option. See --fp16_format=format in the Compiler Reference Guide.

The half-precision floating-point formats available are ieee and alternative. In both formats, the basic layout of the 16-bit number is the same. See Figure 5.1.

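Figure 5.1. Half-precision floating-point format

 15  14         10 9                    0
+---+-------------+----------------------+
| S |      E      |          T           |
+---+-------------+----------------------+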

Where:

S (bit[15]):       Sign bit
E (bits[14:10]):   Biased exponent
T (bits[9:0]):     Mantissa

The meanings of these fields depend on the format that is selected.

IEEE half-precision

IF E==31:
    IF T==0: Value = Signed infinity
    IF T!=0: Value = NaN
        T[9] determines whether the NaN is quiet or signalling:
            1: Quiet NaN
            0: Signalling NaN
IF 0<E<31:
    Value = $(-1)^{S} \times 2^{E-15} \times (1 + 2^{-10} T)$

IF E==0:
    IF T==0: Value = Signed zero
    IF T!=0: Value = $(-1)^{S} \times 2^{-14} \times (0 + 2^{-10} T)$
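As a cross-check of the IEEE rules above, the following C sketch decodes a raw 16-bit pattern into a float. The function name fp16_ieee_to_float is hypothetical and is not part of fplib or the compiler runtime; it works on a uint16_t bit pattern rather than on the __fp16 type, so it compiles with any C99 compiler. It rewrites each formula as an integer times a power of two, which float represents exactly for every half-precision value.

```c
#include <math.h>    /* INFINITY, NAN */
#include <stdint.h>

/* Hypothetical helper: decode an IEEE half-precision bit pattern. */
float fp16_ieee_to_float(uint16_t h)
{
    unsigned s = (h >> 15) & 0x1u;   /* S: bit[15]     */
    unsigned e = (h >> 10) & 0x1Fu;  /* E: bits[14:10] */
    unsigned t = h & 0x3FFu;         /* T: bits[9:0]   */
    float sign = s ? -1.0f : 1.0f;

    if (e == 31)                     /* infinity or NaN; the NaN payload is not preserved */
        return t == 0 ? sign * INFINITY : NAN;

    if (e == 0)                      /* signed zero or subnormal: 2^-14 * (0 + 2^-10 * T) = T * 2^-24 */
        return sign * ((float)t / 16777216.0f);

    /* 0 < E < 31: 2^(E-15) * (1 + 2^-10 * T) = (1024 + T) * 2^E / 2^25 */
    return sign * ((float)(1024u + t) * (float)(1u << e) / 33554432.0f);
}
```

For example, 0x3C00 decodes to 1.0, 0x7BFF to 65504 (the largest finite IEEE half-precision value), and 0x0001 to 2^-24, the smallest subnormal.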

Alternative half-precision

IF 0<E<32:
    Value = $(-1)^{S} \times 2^{E-15} \times (1 + 2^{-10} T)$

IF E==0:
    IF T==0: Value = Signed zero
    IF T!=0: Value = $(-1)^{S} \times 2^{-14} \times (0 + 2^{-10} T)$
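The alternative format differs from the IEEE format only in that E==31 is an ordinary exponent, so there is no encoding for infinity or NaN and the range extends further. A minimal C sketch of the decoding, under the same assumptions as before (hypothetical function name fp16_alt_to_float, raw uint16_t bit pattern, any C99 compiler):

```c
#include <stdint.h>

/* Hypothetical helper: decode an alternative-format half-precision bit pattern. */
float fp16_alt_to_float(uint16_t h)
{
    unsigned s = (h >> 15) & 0x1u;   /* S: bit[15]     */
    unsigned e = (h >> 10) & 0x1Fu;  /* E: bits[14:10] */
    unsigned t = h & 0x3FFu;         /* T: bits[9:0]   */
    float sign = s ? -1.0f : 1.0f;

    if (e == 0)                      /* signed zero or subnormal: T * 2^-24 */
        return sign * ((float)t / 16777216.0f);

    /* 0 < E < 32, including E == 31: (1024 + T) * 2^E / 2^25 */
    return sign * ((float)(1024u + t) * (float)(1u << e) / 33554432.0f);
}
```

With this rule, 0x7C00 (an IEEE infinity) decodes to 65536, and the largest magnitude, 0x7FFF, decodes to 131008 instead of the IEEE maximum of 65504.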

Usage restrictions

The following restrictions apply when you use the __fp16 type:

• When used in a C or C++ expression, an __fp16 type is promoted to single precision. Subsequent promotion to double precision can occur if required by one of the operands.

• A single-precision value can be converted to __fp16. A double-precision value is first converted to single precision and then to __fp16, which can involve double rounding. This reflects the lack of a direct double-to-16-bit conversion in the ARM architecture.

• When using --fpmode=fast, no floating-point exceptions are raised when converting to and from half-precision floating-point format.

• Function formal arguments cannot be of type __fp16. However, pointers to variables of type __fp16 can be used as function formal argument types.

• __fp16 values can be passed as actual function arguments. In this case, they are converted to single-precision values.

• __fp16 cannot be specified as the return type of a function. However, a pointer to an __fp16 type can be used as a return type.

• An __fp16 value is converted to a single-precision or double-precision value when used as a return value for a function that returns a float or double.
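To illustrate the double-rounding restriction, the following C sketch models round-to-nearest, ties-to-even conversion to IEEE half precision in software and compares a direct double-to-half conversion with the double-to-single-to-half path. Both function names are hypothetical; this is not the compiler's or fplib's conversion routine. For brevity it accepts only finite values whose magnitude lies in the normal half-precision range and returns -1 for anything else, and it assumes the default round-to-nearest mode for the (float) cast.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round a double directly to an IEEE half bit pattern
   (round to nearest, ties to even). Normal-range inputs only; returns -1
   for zero, subnormal results, overflow, infinity and NaN. */
int32_t half_bits_from_double(double x)
{
    uint64_t d;
    memcpy(&d, &x, sizeof d);                       /* reinterpret the double's bits */
    uint32_t sign = (uint32_t)(d >> 63);
    int exp = (int)((d >> 52) & 0x7FF) - 1023;      /* unbiased exponent */
    uint64_t frac = d & ((1ull << 52) - 1);         /* 52 fraction bits */

    if (exp < -14 || exp > 15)                      /* outside the normal half range */
        return -1;

    uint32_t t = (uint32_t)(frac >> 42);            /* keep the top 10 fraction bits */
    uint64_t rem = frac & ((1ull << 42) - 1);       /* the 42 discarded bits */
    uint64_t halfway = 1ull << 41;
    if (rem > halfway || (rem == halfway && (t & 1u)))
        t++;                                        /* round to nearest, ties to even */

    int e = exp + 15;                               /* rebias for half precision */
    if (t == 1024u) {                               /* rounding carried into the exponent */
        t = 0;
        e++;
    }
    if (e >= 31)                                    /* rounded past 65504 */
        return -1;
    return (int32_t)((sign << 15) | ((uint32_t)e << 10) | t);
}

/* Hypothetical helper: the double -> single -> half path. The float -> double
   widening is exact, so only the two narrowing steps round. */
int32_t half_bits_via_float(double x)
{
    float f = (float)x;                             /* first rounding: to single precision */
    return half_bits_from_double((double)f);        /* second rounding: to half precision */
}
```

Take the double value 1 + 2^-11 + 2^-40, which lies just above the midpoint between 1.0 (0x3C00) and the next half-precision value, 1 + 2^-10 (0x3C01). A direct conversion rounds up to 0x3C01. Converting to single precision first discards the 2^-40 term, leaving an exact tie that ties-to-even resolves to 0x3C00.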

Name mangling

The C++ name mangling for the half-precision data type is specified in the C++ generic ABI. See the C++ ABI for the ARM Architecture.

 Copyright © 2002-2010 ARM. All rights reserved. ARM DUI 0205J Non-Confidential ID101213 PDF version