
Single precision data type for IEEE 754 arithmetic

A float value is 32 bits wide. The structure is shown in Figure 1.

The `S` field gives the sign of the number. It is 0 for positive, or 1 for negative.

The `Exp` field gives the exponent of the number, as a power of two. It is *biased* by `0x7F` (127), so that very small numbers have exponents near zero and very large numbers have exponents near `0xFF` (255).

So, for example:

- if `Exp` = `0x7D` (125), the number is between 0.25 and 0.5 (not including 0.5)
- if `Exp` = `0x7E` (126), the number is between 0.5 and 1.0 (not including 1.0)
- if `Exp` = `0x7F` (127), the number is between 1.0 and 2.0 (not including 2.0)
- if `Exp` = `0x80` (128), the number is between 2.0 and 4.0 (not including 4.0)
- if `Exp` = `0x81` (129), the number is between 4.0 and 8.0 (not including 8.0).
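The exponent ranges listed above can be checked with a short Python sketch. It assumes the usual IEEE 754 layout (sign in the top bit, then 8 exponent bits, then 23 fraction bits) and uses the standard `struct` module to obtain the bit pattern; the helper names `float_bits` and `sign_and_exponent` are made up for this illustration.

```python
import struct

def float_bits(x):
    """Return the 32-bit single precision pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def sign_and_exponent(x):
    """Extract the S and Exp fields (assumed layout: 1 sign bit, 8 exponent bits)."""
    bits = float_bits(x)
    s = bits >> 31              # top bit
    exp = (bits >> 23) & 0xFF   # next 8 bits, still biased by 0x7F
    return s, exp

for value in (0.25, 0.5, 1.0, 2.0, 4.0, -1.0):
    s, exp = sign_and_exponent(value)
    print(f"{value:>5}: S={s} Exp=0x{exp:02X} ({exp})")
```

Each value sits at the lower end of its range, so 0.25 reports `Exp` = `0x7D` and 4.0 reports `Exp` = `0x81`.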

The `Frac` field gives the fractional part of the number. It usually has an implicit 1 bit on the front that is not stored, to save space.

So if `Exp` is `0x7F`, for example:

- if `Frac` = `00000000000000000000000` (binary), the number is 1.0
- if `Frac` = `10000000000000000000000` (binary), the number is 1.5
- if `Frac` = `01000000000000000000000` (binary), the number is 1.25
- if `Frac` = `11000000000000000000000` (binary), the number is 1.75.
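As a rough illustration, the four `Frac` patterns above can be assembled into complete 32-bit values and reinterpreted as floats. The sketch assumes the fields are packed as `S`, then `Exp`, then `Frac` from the most significant bit down; `make_float` is a hypothetical helper, not part of any library.

```python
import struct

def make_float(s, exp, frac):
    """Pack the three fields into 32 bits and reinterpret them as a float."""
    bits = (s << 31) | (exp << 23) | frac
    return struct.unpack(">f", struct.pack(">I", bits))[0]

patterns = (
    "00000000000000000000000",
    "10000000000000000000000",
    "01000000000000000000000",
    "11000000000000000000000",
)

for pattern in patterns:
    frac = int(pattern, 2)  # 23-bit fraction field
    print(pattern, make_float(0, 0x7F, frac))
```

The top fraction bit contributes 0.5 and the next contributes 0.25, which is why the four patterns give 1.0, 1.5, 1.25 and 1.75.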

So in general, the numeric value of a bit pattern in this format is given by the formula:

$$(-1)^{S} \times 2^{(\mathit{Exp} - \mathtt{0x7F})} \times (1 + \mathit{Frac} \times 2^{-23})$$

Numbers stored in this form are called *normalized* numbers.
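The formula for normalized numbers can be written out directly as a small Python sketch. The function name `decode_normalized` and the choice to reject `Exp` values of 0 and 255 are assumptions made for this example; the field positions again assume the usual layout of 1 sign bit, 8 exponent bits and 23 fraction bits.

```python
import struct

def decode_normalized(bits):
    """Evaluate (-1)^S * 2^(Exp-0x7F) * (1 + Frac * 2^-23) for a 32-bit pattern."""
    s = (bits >> 31) & 1
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0 or exp == 0xFF:
        # Exp values 0 and 255 are special cases and do not follow this formula.
        raise ValueError("not a normalized number")
    return (-1) ** s * 2.0 ** (exp - 0x7F) * (1 + frac * 2.0 ** -23)

# Cross-check against the interpretation made by the struct module.
bits = 0x40490FDB
reference = struct.unpack(">f", struct.pack(">I", bits))[0]
print(decode_normalized(bits) == reference)
```

Because every term is a power of two or a 24-bit integer, the arithmetic is exact in Python's double precision floats, so the comparison with `struct` is an equality test rather than an approximate one.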

The maximum and minimum exponent values, 0 and 255, are special cases. Exponent 255 is used to represent infinity and to store *Not a Number* (NaN) values. Infinity can occur as a result of dividing a nonzero number by zero, or as a result of computing a value that is too large to store in this format. NaN values are used for special purposes. Infinity is stored by setting `Exp` to 255 and `Frac` to all zeros. If `Exp` is 255 and `Frac` is nonzero, the bit pattern represents a NaN.
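The rule for `Exp` = 255 can be sketched as a small classifier. The name `classify` and its string return values are invented for this illustration; the cross-check uses `math.isinf` and `math.isnan` from the Python standard library.

```python
import math
import struct

def classify(bits):
    """Classify a 32-bit pattern using only the Exp and Frac fields."""
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp != 0xFF:
        return "finite"
    return "infinity" if frac == 0 else "NaN"

def as_float(bits):
    """Reinterpret a 32-bit pattern as a float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

for bits in (0x7F800000, 0xFF800000, 0x7FC00000, 0x3F800000):
    value = as_float(bits)
    print(f"0x{bits:08X}: {classify(bits)} "
          f"(isinf={math.isinf(value)}, isnan={math.isnan(value)})")
```

The sign bit is independent of this test, so `0x7F800000` and `0xFF800000` are positive and negative infinity respectively.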

Exponent 0 is used to represent very small numbers in a special way. If `Exp` is zero, then the `Frac` field has no implicit 1 on the front. This means that the format can store 0.0, by setting both `Exp` and `Frac` to all 0 bits. It also means that numbers that are too small to store using `Exp >= 1` are stored with less precision than the ordinary 23 bits. These are called *denormals*.
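A minimal sketch of the `Exp` = 0 case follows. It relies on a detail not stated above: under IEEE 754 the scale factor for these values is 2^-126, the same as for `Exp` = 1, so only the implicit 1 is dropped. The function name `decode_exp_zero` is hypothetical.

```python
import struct

def decode_exp_zero(bits):
    """Evaluate a pattern whose Exp field is 0 (no implicit leading 1)."""
    s = (bits >> 31) & 1
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp != 0:
        raise ValueError("Exp field is not zero")
    # Assumption from IEEE 754: the exponent is fixed at -126 when Exp is 0.
    return (-1) ** s * 2.0 ** -126 * (frac * 2.0 ** -23)

def as_float(bits):
    """Reinterpret a 32-bit pattern as a float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

for bits in (0x00000000, 0x00000001, 0x00400000, 0x007FFFFF):
    print(f"0x{bits:08X}: {decode_exp_zero(bits)!r} "
          f"matches struct: {decode_exp_zero(bits) == as_float(bits)}")
```

The pattern `0x00000001` has a single significant bit, which shows the loss of precision: the smaller the value, the fewer bits of `Frac` carry information.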
*IEEE Standard for Floating-Point Arithmetic* (IEEE 754), 1985 version