### 6.5.1. Basic data types

ARM floating-point values are stored in one of two data types, single precision and double precision. In this document these are called float and double, which are the names of the corresponding C types.

#### Single precision

A float value is 32 bits wide: a 1-bit sign field (`S`), an 8-bit exponent field (`Exp`), and a 23-bit fraction field (`Frac`). The structure is shown in Figure 6.3.

Figure 6.3. IEEE 754 single-precision floating-point format

The `S` field gives the sign of the number. It is 0 for positive, or 1 for negative.

The `Exp` field gives the exponent of the number, as a power of two. It is biased by `0x7F` (127), meaning that the stored value is the true exponent plus 127. As a result, very small numbers have exponents near zero and very large numbers have exponents near `0xFF` (255).

So, for example:

• if `Exp` = `0x7D` (125), the number is between 0.25 and 0.5 (not including 0.5)

• if `Exp` = `0x7E` (126), the number is between 0.5 and 1.0 (not including 1.0)

• if `Exp` = `0x7F` (127), the number is between 1.0 and 2.0 (not including 2.0)

• if `Exp` = `0x80` (128), the number is between 2.0 and 4.0 (not including 4.0)

• if `Exp` = `0x81` (129), the number is between 4.0 and 8.0 (not including 8.0).
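The field layout can be made concrete with a short C sketch that splits a float into its three fields. This is a minimal sketch that assumes `float` is an IEEE 754 single-precision value with the same byte order as `uint32_t`. The `float_fields` type and `split_float` function are hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical type: the three fields of a single-precision value. */
struct float_fields {
    uint32_t s;    /* sign: 0 = positive, 1 = negative */
    uint32_t exp;  /* biased exponent, 0x00 to 0xFF */
    uint32_t frac; /* 23-bit fraction, without the implicit leading 1 */
};

/* Hypothetical helper: split a float into S, Exp and Frac.
   Assumes float is IEEE 754 single precision, same byte order as uint32_t. */
static struct float_fields split_float(float f)
{
    uint32_t u;
    struct float_fields r;

    memcpy(&u, &f, sizeof u);   /* copy the bit pattern without aliasing problems */
    r.s = u >> 31;              /* bit 31 */
    r.exp = (u >> 23) & 0xFFu;  /* bits 30 to 23 */
    r.frac = u & 0x7FFFFFu;     /* bits 22 to 0 */
    return r;
}
```

For example, `split_float(3.0f).exp` is `0x80` because 3.0 lies between 2.0 and 4.0. Similarly, `split_float(0.3f).exp` is `0x7D` because 0.3 lies between 0.25 and 0.5. Subtracting `0x7F` from `exp` gives the true power of two.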

The `Frac` field gives the fractional part of the number. It usually has an implicit 1 bit in front of it, which is not stored, to save space.

So if `Exp` is `0x7F`, for example:

• if `Frac` = `00000000000000000000000` (binary), the number is 1.0

• if `Frac` = `10000000000000000000000` (binary), the number is 1.5

• if `Frac` = `01000000000000000000000` (binary), the number is 1.25

• if `Frac` = `11000000000000000000000` (binary), the number is 1.75.
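The examples above can be checked by assembling bit patterns from their fields. In hexadecimal, the 23-bit `Frac` values in the list are `0x000000`, `0x400000`, `0x200000` and `0x600000`. This is a minimal sketch that assumes IEEE 754 single precision with the same byte order as `uint32_t`. `make_float` is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a float from its S, Exp and Frac fields.
   Assumes float is IEEE 754 single precision, same byte order as uint32_t. */
static float make_float(uint32_t s, uint32_t exp, uint32_t frac)
{
    uint32_t u = ((s & 1u) << 31) | ((exp & 0xFFu) << 23) | (frac & 0x7FFFFFu);
    float f;

    memcpy(&f, &u, sizeof f);   /* reinterpret the assembled bits as a float */
    return f;
}
```

With `Exp` fixed at `0x7F`, `make_float(0, 0x7F, 0x400000)` gives 1.5f and `make_float(0, 0x7F, 0x600000)` gives 1.75f, matching the list.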

So in general, the numeric value of a bit pattern in this format is given by the formula:

$$(-1)^{S} \times 2^{\mathit{Exp} - \mathtt{0x7F}} \times \left(1 + \mathit{Frac} \times 2^{-23}\right)$$

Numbers stored in this form, with `Exp` in the range `0x01` to `0xFE`, are called normalized numbers.
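The formula can be evaluated directly from a bit pattern. This is a minimal C sketch that computes in `double`, which holds every float value exactly. The helpers `pow2` and `decode_normal_float` are hypothetical, and the decoder is only valid for normalized patterns.

```c
#include <stdint.h>

/* Exact 2^k for the small exponents used here (repeated doubling or halving). */
static double pow2(int k)
{
    double r = 1.0;

    for (; k > 0; k--)
        r *= 2.0;
    for (; k < 0; k++)
        r /= 2.0;
    return r;
}

/* Hypothetical helper: evaluate (-1)^S * 2^(Exp - 0x7F) * (1 + Frac * 2^-23)
   for a normalized bit pattern (Exp from 0x01 to 0xFE). */
static double decode_normal_float(uint32_t bits)
{
    uint32_t s = bits >> 31;
    int exp = (int)((bits >> 23) & 0xFFu);
    uint32_t frac = bits & 0x7FFFFFu;
    double mag = pow2(exp - 0x7F) * (1.0 + frac * pow2(-23));

    return s ? -mag : mag;
}
```

For example, `0x3F400000` has `Exp` = `0x7E` and `Frac` = `0x400000`, so it decodes to $2^{-1} \times 1.5 = 0.75$.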

The minimum and maximum exponent values, 0 and 255, are special cases. Exponent 255 is used to represent infinity and to store Not a Number (NaN) values. Infinity can occur as a result of dividing a nonzero number by zero, or as a result of computing a value that is too large to store in this format. NaN values represent results that have no numeric value, such as 0.0 divided by 0.0, and are also used for special purposes. Infinity is stored by setting `Exp` to 255 and `Frac` to all zeros. If `Exp` is 255 and `Frac` is nonzero, the bit pattern represents a NaN.
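The `Exp` = 255 rules translate directly into bit tests. This minimal C sketch also shows the operations that produce these values. It assumes IEEE 754 single precision and the default non-trapping floating-point behavior, so a divide by zero or an overflow returns a special value instead of stopping the program. All helper names are hypothetical.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers that apply the Exp = 255 rules to a bit pattern. */
static int bits_is_infinity(uint32_t u)
{
    return ((u >> 23) & 0xFFu) == 0xFFu && (u & 0x7FFFFFu) == 0;
}

static int bits_is_nan(uint32_t u)
{
    return ((u >> 23) & 0xFFu) == 0xFFu && (u & 0x7FFFFFu) != 0;
}

static uint32_t float_bits(float f)
{
    uint32_t u;

    memcpy(&u, &f, sizeof u);
    return u;
}

/* Operations that produce the special values (default non-trapping mode assumed). */
static float divide(float a, float b) { return a / b; }
static float times_two(float a) { return a * 2.0f; }
```

Here `divide(1.0f, 0.0f)` yields plus infinity (`0x7F800000`), while `divide(0.0f, 0.0f)` yields a NaN. `times_two(FLT_MAX)` overflows to plus infinity.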

Exponent 0 is used to represent very small numbers in a special way. If `Exp` is zero, then the `Frac` field has no implicit 1 in front of it. This means that the format can store 0.0, by setting both `Exp` and `Frac` to all 0 bits. It also means that numbers too small to store with `Exp >= 1` can still be stored, but with fewer than the ordinary 23 bits of precision. These are called denormals.
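A decoder that also handles `Exp` = 0 makes the denormal case explicit. This minimal C sketch follows IEEE 754, where a denormal is scaled by $2^{-126}$ rather than $2^{-127}$. That is the same power of two as `Exp` = 1, which lets the denormals continue smoothly below the smallest normalized number. The helpers `pow2` and `decode_finite_float` are hypothetical, and only finite patterns (`Exp` below 255) are handled.

```c
#include <stdint.h>

/* Exact 2^k for the small exponents used here (repeated doubling or halving). */
static double pow2(int k)
{
    double r = 1.0;

    for (; k > 0; k--)
        r *= 2.0;
    for (; k < 0; k++)
        r /= 2.0;
    return r;
}

/* Hypothetical helper: value of any finite float bit pattern (Exp != 0xFF). */
static double decode_finite_float(uint32_t bits)
{
    uint32_t s = bits >> 31;
    int exp = (int)((bits >> 23) & 0xFFu);
    double frac = (bits & 0x7FFFFFu) * pow2(-23);   /* Frac * 2^-23, in [0, 1) */
    double mag;

    if (exp == 0)
        mag = frac * pow2(1 - 0x7F);             /* zero or denormal: no implicit 1, scale 2^-126 */
    else
        mag = (1.0 + frac) * pow2(exp - 0x7F);   /* normalized: implicit leading 1 */
    return s ? -mag : mag;
}
```

For example, `0x00400000` has `Exp` = 0 and `Frac` = `0x400000`, so it decodes to $0.5 \times 2^{-126} = 2^{-127}$. This value is below the smallest normalized number, $2^{-126}$.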

#### Double precision

A double value is 64 bits wide: a 1-bit sign field (`S`), an 11-bit exponent field (`Exp`), and a 52-bit fraction field (`Frac`). Figure 6.4 shows its structure.

Figure 6.4. IEEE 754 double-precision floating-point format

As before, `S` is the sign, `Exp` the exponent, and `Frac` the fraction. Most of the discussion of float values remains true, except that:

• The `Exp` field is biased by `0x3FF` (1023) instead of `0x7F`, so numbers between 1.0 and 2.0 have an `Exp` field of `0x3FF`.

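• The `Exp` value used to represent infinity and NaNs is `0x7FF` (2047) instead of `0xFF`.

• The `Frac` field is 52 bits wide instead of 23, so the general formula uses $2^{-52}$ in place of $2^{-23}$.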
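The same field extraction works for double with wider masks. This minimal C sketch assumes `double` is IEEE 754 double precision with the same byte order as `uint64_t`. The `double_fields` type and `split_double` function are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical type: the three fields of a double-precision value. */
struct double_fields {
    uint64_t s;    /* sign: bit 63 */
    uint64_t exp;  /* biased exponent: bits 62 to 52, bias 0x3FF */
    uint64_t frac; /* 52-bit fraction: bits 51 to 0 */
};

/* Hypothetical helper. Assumes IEEE 754 double, same byte order as uint64_t. */
static struct double_fields split_double(double d)
{
    uint64_t u;
    struct double_fields r;

    memcpy(&u, &d, sizeof u);
    r.s = u >> 63;
    r.exp = (u >> 52) & 0x7FFu;
    r.frac = u & UINT64_C(0xFFFFFFFFFFFFF);
    return r;
}
```

For 1.5, `split_double` gives `exp` = `0x3FF` and `frac` = `0x8000000000000`, with only the top fraction bit set. This mirrors the float case with `Exp` = `0x7F`.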

#### Sample values

Some sample float and double bit patterns, together with their mathematical values, are given in Table 6.14 and Table 6.15.

Table 6.14. Sample single-precision floating-point values

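| Float value | S | Exp | Frac | Mathematical value | Notes |
|---|---|---|---|---|---|
| `0x3F800000` | `0` | `0x7F` | `000...000` | `1.0` | - |
| `0xBF800000` | `1` | `0x7F` | `000...000` | `-1.0` | - |
| `0x3F800001` | `0` | `0x7F` | `000...001` | `1.000 000 119` | [1] |
| `0x3F400000` | `0` | `0x7E` | `100...000` | `0.75` | - |
| `0x00800000` | `0` | `0x01` | `000...000` | $1.18 \times 10^{-38}$ | [2] |
| `0x00000001` | `0` | `0x00` | `000...001` | $1.40 \times 10^{-45}$ | [3] |
| `0x7F7FFFFF` | `0` | `0xFE` | `111...111` | $3.40 \times 10^{38}$ | [4] |
| `0x7F800000` | `0` | `0xFF` | `000...000` | Plus infinity | - |
| `0xFF800000` | `1` | `0xFF` | `000...000` | Minus infinity | - |
| `0x00000000` | `0` | `0x00` | `000...000` | `0.0` | [5] |
| `0x7F800001` | `0` | `0xFF` | `000...001` | Signaling NaN | [6] |
| `0x7FC00000` | `0` | `0xFF` | `100...000` | Quiet NaN | [6] |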

[1] The smallest representable number greater than 1.0. The amount by which it exceeds 1.0 is known as the machine epsilon. This is 0.000 000 119 in float, and 0.000 000 000 000 000 222 in double. The machine epsilon gives a rough idea of how many significant figures each format can keep track of: about six or seven decimal digits for float, and fifteen or sixteen for double.

[2] The smallest positive value that can be represented as a normalized number in each format. Smaller numbers can be stored as denormals, but are not held with as much precision.

[3] The smallest positive number that can be distinguished from zero: a denormal with only the lowest bit of `Frac` set. No smaller positive magnitude can be stored in the format.

[4] The largest finite number that can be stored. An addition or multiplication whose exact result is too large to round to this value overflows, and in the default rounding mode it produces infinity.

[5] Zero. Strictly speaking, the table shows plus zero. Zero with a sign bit of 1, minus zero, is treated differently by some operations (for example, dividing 1.0 by minus zero gives minus infinity). However, the comparison operations (for example `==` and `!=`) report that the two zeros are equal.

[6] There are two types of NaN: signaling NaNs and quiet NaNs. Quiet NaNs have a 1 in the most significant bit of `Frac`. Signaling NaNs have a 0 there, with at least one other `Frac` bit set so that the pattern does not represent infinity. The difference is that signaling NaNs cause an exception (see Exceptions) when used as an operand, whereas quiet NaNs do not.
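The rows of Table 6.14 can be cross-checked against the limits that standard C publishes in `<float.h>`. This minimal sketch assumes an IEEE 754 platform with the default rounding mode and without flush-to-zero, so that denormals are preserved. The functions `float_from_bits` and `check_table_6_14` are hypothetical names.

```c
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

static float float_from_bits(uint32_t u)
{
    float f;

    memcpy(&f, &u, sizeof f);   /* reinterpret the bit pattern as a float */
    return f;
}

/* Hypothetical check: returns 1 if this platform reproduces Table 6.14. */
static int check_table_6_14(void)
{
    int ok = 1;

    ok &= float_from_bits(0x3F800000u) == 1.0f;
    ok &= float_from_bits(0xBF800000u) == -1.0f;
    ok &= float_from_bits(0x3F800001u) - 1.0f == FLT_EPSILON;   /* [1] machine epsilon */
    ok &= float_from_bits(0x3F400000u) == 0.75f;
    ok &= float_from_bits(0x00800000u) == FLT_MIN;               /* [2] smallest normalized */
    ok &= float_from_bits(0x00000001u) == 0x1p-149f;             /* [3] smallest denormal */
    ok &= float_from_bits(0x7F7FFFFFu) == FLT_MAX;               /* [4] largest finite */
    ok &= isinf(float_from_bits(0x7F800000u)) && float_from_bits(0x7F800000u) > 0.0f;
    ok &= isinf(float_from_bits(0xFF800000u)) && float_from_bits(0xFF800000u) < 0.0f;
    ok &= float_from_bits(0x80000000u) == float_from_bits(0x00000000u);   /* [5] -0 == +0 */
    ok &= signbit(float_from_bits(0x80000000u)) != 0;                     /* [5] but sign differs */
    ok &= isnan(float_from_bits(0x7F800001u)) && isnan(float_from_bits(0x7FC00000u));   /* [6] */
    return ok;
}
```

`FLT_EPSILON`, `FLT_MIN` and `FLT_MAX` are the standard `<float.h>` names for the quantities in notes [1], [2] and [4].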

Table 6.15. Sample double-precision floating-point values

| Double value | S | Exp | Frac | Mathematical value | Notes |
|---|---|---|---|---|---|
| `0x3FF00000 00000000` | `0` | `0x3FF` | `000...000` | `1.0` | - |
| `0xBFF00000 00000000` | `1` | `0x3FF` | `000...000` | `-1.0` | - |
| `0x3FF00000 00000001` | `0` | `0x3FF` | `000...001` | `1.000 000 000 000 000 222` | [1] |
| `0x3FE80000 00000000` | `0` | `0x3FE` | `100...000` | `0.75` | - |
| `0x00100000 00000000` | `0` | `0x001` | `000...000` | $2.23 \times 10^{-308}$ | [2] |
| `0x00000000 00000001` | `0` | `0x000` | `000...001` | $4.94 \times 10^{-324}$ | [3] |
| `0x7FEFFFFF FFFFFFFF` | `0` | `0x7FE` | `111...111` | $1.80 \times 10^{308}$ | [4] |
| `0x7FF00000 00000000` | `0` | `0x7FF` | `000...000` | Plus infinity | - |
| `0xFFF00000 00000000` | `1` | `0x7FF` | `000...000` | Minus infinity | - |
| `0x00000000 00000000` | `0` | `0x000` | `000...000` | `0.0` | [5] |
| `0x7FF00000 00000001` | `0` | `0x7FF` | `000...001` | Signaling NaN | [6] |
| `0x7FF80000 00000000` | `0` | `0x7FF` | `100...000` | Quiet NaN | [6] |

For notes [1] to [6], see the footnotes to Table 6.14.
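The same cross-check can be run for Table 6.15 using the double-precision limits from `<float.h>`. This minimal sketch assumes IEEE 754 double precision with the same byte order as `uint64_t`, default rounding, and no flush-to-zero. `double_from_bits` and `check_table_6_15` are hypothetical names.

```c
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

static double double_from_bits(uint64_t u)
{
    double d;

    memcpy(&d, &u, sizeof d);   /* reinterpret the 64-bit pattern as a double */
    return d;
}

/* Hypothetical check: returns 1 if this platform reproduces Table 6.15. */
static int check_table_6_15(void)
{
    int ok = 1;

    ok &= double_from_bits(UINT64_C(0x3FF0000000000000)) == 1.0;
    ok &= double_from_bits(UINT64_C(0xBFF0000000000000)) == -1.0;
    ok &= double_from_bits(UINT64_C(0x3FF0000000000001)) - 1.0 == DBL_EPSILON;   /* [1] */
    ok &= double_from_bits(UINT64_C(0x3FE8000000000000)) == 0.75;
    ok &= double_from_bits(UINT64_C(0x0010000000000000)) == DBL_MIN;             /* [2] */
    ok &= double_from_bits(UINT64_C(0x0000000000000001)) == 0x1p-1074;          /* [3] */
    ok &= double_from_bits(UINT64_C(0x7FEFFFFFFFFFFFFF)) == DBL_MAX;             /* [4] */
    ok &= isinf(double_from_bits(UINT64_C(0x7FF0000000000000)));
    ok &= isinf(double_from_bits(UINT64_C(0xFFF0000000000000)));
    ok &= double_from_bits(UINT64_C(0x0000000000000000)) == 0.0;                 /* [5] */
    ok &= isnan(double_from_bits(UINT64_C(0x7FF0000000000001)));                 /* [6] signaling */
    ok &= isnan(double_from_bits(UINT64_C(0x7FF8000000000000)));                 /* [6] quiet */
    return ok;
}
```

The 64-bit patterns are written here without the space that the table uses between the two 32-bit halves.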