Representing Floating-Point Numbers in Binary

"There are two ways of constructing a software design; one way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult." -- C. A. R. Hoare

Background

All floating-point numbers consist of three parts:
  1. A sign - Indicates whether the number is positive (above zero) or negative (below zero).
  2. An exponent - This can be positive (absolute value of the number is at least the base, e.g. 10 or more in decimal), zero (absolute value is at least 1 but below the base), or negative (absolute value of the number is between 0 and 1).
  3. A mantissa - This holds the significant digits (the precision) of the number. It is also called the significand. Usually this is the digit string of the number after it has been normalized in scientific notation; IEEE 754 stores only the fractional portion of the normalized number.
Some examples in decimal (base 10):
     Number          Sign    Mantissa    Exponent
--------------------------------------------------
  3.763  x 10^3       +       3.763          3
  1.2345 x 10^-11     +       1.2345       -11
 -4.45   x 10^5       -       4.45           5
 -2.6795 x 10^-7      -       2.6795        -7
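
The split into sign, mantissa and exponent shown in the table can be sketched in Python with the standard decimal module. The function name decompose is hypothetical, chosen only for illustration; this is a minimal sketch that assumes the input is a valid decimal literal, not a full parser.

```python
from decimal import Decimal

def decompose(text):
    """Split a decimal number into (sign, mantissa, exponent).

    The mantissa is normalized so that exactly one non-zero digit
    precedes the decimal point. Zero is returned with exponent 0.
    """
    value = Decimal(text)
    sign = "-" if value.is_signed() else "+"
    magnitude = abs(value)
    if magnitude == 0:
        return sign, Decimal(0), 0
    exponent = magnitude.adjusted()         # power of 10 of the leading digit
    mantissa = magnitude.scaleb(-exponent)  # move the decimal point
    return sign, mantissa, exponent

for text in ["3763", "1.2345E-11", "-445000", "-2.6795E-7"]:
    print(text, decompose(text))
```

The four inputs are the four numbers from the table, written without the power-of-ten notation where that is convenient.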

Representing the value 12,345 in decimal:

     Number          Sign    Mantissa    Exponent
--------------------------------------------------
 12345    x 10^0      +      12345           0
 1234.5   x 10^1      +      1234.5          1
 123.45   x 10^2      +      123.45          2
 12.345   x 10^3      +      12.345          3
 1.2345   x 10^4      +      1.2345          4
 .12345   x 10^5      +      .12345          5
 .012345  x 10^6      +      .012345         6
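
Every row above names the same value; only the position of the decimal point and the exponent change together. A short sketch using Python's decimal module (the variable names are illustrative) regenerates the rows and checks that each pair multiplies back to 12345:

```python
from decimal import Decimal

value = Decimal("12345")

# Each step moves the decimal point one more place to the left,
# so mantissa * 10**exponent always gives back the same value.
rows = []
for exponent in range(7):
    mantissa = value.scaleb(-exponent)
    rows.append((mantissa, exponent))
    print(f"{mantissa} x 10^{exponent}")

assert all(m * Decimal(10) ** e == value for m, e in rows)
```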
Notes:


IEEE 754

The IEEE (Institute of Electrical and Electronics Engineers) sets the standard for floating-point arithmetic. This standard, IEEE 754, specifies how single-precision (32-bit) and double-precision (64-bit) floating-point numbers are represented and how arithmetic should be carried out on them.
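
To see the representation of an actual value, the bits of a single-precision number can be pulled apart with Python's struct module. This is a sketch that assumes the usual single-precision split of 1 sign bit, 8 exponent bits and 23 mantissa bits (the same grouping used for the bit patterns later in this document); the function name float_fields is hypothetical.

```python
import struct

def float_fields(x):
    """Return the (sign, exponent, mantissa) bit strings of x as a 32-bit float."""
    # Pack x as a big-endian 32-bit float, then reread the same 4 bytes
    # as an unsigned integer so the individual bits can be formatted.
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]

print(float_fields(1.0))
print(float_fields(-2.0))
print(float_fields(0.2))
```

Note that packing with ">f" rounds the Python value (a double) to the nearest single-precision value first.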

32-bit single-precision

Parts:

Notes:

Special numbers (some have many different binary representations):
Zero
0 00000000 00000000000000000000000 = +0 (sign is 0, exponent is 0, mantissa is 0)
1 00000000 00000000000000000000000 = -0 (sign is 1, exponent is 0, mantissa is 0)
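0 00000000 00100000000000000000000 = Dirty zero (sign is 0 or 1, exponent is 0, mantissa is non-zero; IEEE 754 calls this a denormalized, or subnormal, number, a very small value that is not exactly zero)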

INF - Infinity
0 11111111 00000000000000000000000 = +Infinity (sign is 0, exponent is 255, mantissa is 0)
1 11111111 00000000000000000000000 = -Infinity (sign is 1, exponent is 255, mantissa is 0)

NAN - Not a Number
0 11111111 10000100000000000000000 = Quiet NaN (sign is 0 or 1, exponent is 255, mantissa is non-zero, MSB of mantissa is 1)
0 11111111 00100010001001010101010 = Signaling NaN (sign is 0 or 1, exponent is 255, mantissa is non-zero, MSB of mantissa is 0)
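
The rules in parentheses above can be turned into a small classifier. This is a sketch only: the function name classify and the returned labels are chosen for illustration, and the "dirty zero" label follows this document's wording (IEEE 754 calls those values denormalized or subnormal numbers).

```python
def classify(pattern):
    """Classify a 32-bit pattern written as 'S EEEEEEEE MMM...M' (spaces optional)."""
    bits = pattern.replace(" ", "")
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected 32 binary digits")
    sign = "-" if bits[0] == "1" else "+"
    exponent = int(bits[1:9], 2)
    mantissa = int(bits[9:], 2)

    if exponent == 0:
        # All-zero exponent: true zero, or a "dirty zero" (subnormal value).
        return sign + "0" if mantissa == 0 else "dirty zero"
    if exponent == 255:
        if mantissa == 0:
            return sign + "Infinity"
        # The most significant mantissa bit separates quiet from signaling NaNs.
        return "Quiet NaN" if mantissa >> 22 else "Signaling NaN"
    return "normal number"

print(classify("0 11111111 00000000000000000000000"))
print(classify("0 11111111 10000100000000000000000"))
```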

64-bit double-precision

Summary:

Constructing Numbers

Given the decimal number 3045.125:
     3045.125 = (3*1000) + (0*100) + (4*10) + (5*1) + (1/10) + (2/100) + (5/1000)

              = (3*10^3) + (0*10^2) + (4*10^1) + (5*10^0) + (1*10^-1) + (2*10^-2) + (5*10^-3)

              =   3000   +   0     +   40    +   5     +   .1     +   .02    +  .005
Given the binary number 10110111.1011:
10110111.1011 = (1*128) + (0*64) + (1*32) + (1*16) + (0*8) + (1*4) + (1*2) + (1*1) + (1/2) + (0/4) + (1/8) + (1/16)

10110111      = (1*2^7) + (0*2^6) + (1*2^5) + (1*2^4) + (0*2^3) + (1*2^2) + (1*2^1) + (1*2^0)

              =  128   +   0    +   32   +   16   +   0   +    4    +   2    +   1

        .1011 = (1*2^-1) + (0*2^-2) + (1*2^-3) + (1*2^-4)

              =    .5   +    0    +  .125   +  .0625

10110111.1011 (base 2) = 183.6875 (base 10)
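
The positional sum worked out above can be written as a short Python function. The name binary_to_decimal is hypothetical; the sketch assumes the input contains only the digits 0 and 1 and at most one binary point.

```python
def binary_to_decimal(text):
    """Sum the place values of a binary string such as '10110111.1011'."""
    whole, _, fraction = text.partition(".")
    total = 0.0
    # Digits left of the point weigh 2^0, 2^1, 2^2, ... from right to left.
    for position, digit in enumerate(reversed(whole)):
        total += int(digit) * 2 ** position
    # Digits right of the point weigh 2^-1, 2^-2, ... from left to right.
    for position, digit in enumerate(fraction, start=1):
        total += int(digit) * 2 ** -position
    return total

print(binary_to_decimal("10110111.1011"))
```

For short inputs like these, every term is a power of two, so the sum is exact in floating point.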
Binary vs. decimal fraction
             Decimal     Decimal
  Binary     fraction    value
-----------------------------------
  .1           1/2       .5      
  .01          1/4       .25     
  .001         1/8       .125     
  .0001        1/16      .0625       
  .00001       1/32      .03125
  .000001      1/64      .015625
  
  etc...
Examples of binary and decimal equivalents:
   Binary         Decimal (fraction)     Decimal
-------------------------------------------------
    1.1                1 1/2             1.5
    1.101              1 5/8             1.625
  101.001              5 1/8             5.125
 1001.0101             9 5/16            9.3125
 0011.10101            3 21/32           3.65625
Examples of normalizing binary numbers and their associated exponents:
   Binary       Normalized     Exponent (decimal)   Exponent (IEEE 754 binary)
-----------------------------------------------------------------------------
.11011           1.1011              -1                01111110 (126 decimal)
1100.101         1.100101             3                10000010 (130 decimal)
1010.1           1.0101               3                10000010 (130 decimal)
100110           1.00110              5                10000100 (132 decimal)
.00010101        1.0101              -4                01111011 (123 decimal)
1.001            1.001                0                01111111 (127 decimal)
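
The table rows can be reproduced with a few string operations: find the first 1, count how far the binary point has to move, and add the single-precision bias of 127 to the result. This is a sketch; the names normalize and BIAS are chosen for illustration, and the input is assumed to be a non-zero binary string.

```python
BIAS = 127  # single-precision exponent bias, as in the table (0 -> 127)

def normalize(text):
    """Return (normalized form, exponent, biased exponent as 8 bits)."""
    digits = text.replace(".", "")
    # Position of the binary point, counted in digits from the left.
    point = text.index(".") if "." in text else len(text)
    first_one = digits.index("1")  # raises ValueError if the value is zero
    exponent = point - first_one - 1
    fraction = digits[first_one + 1:]
    normalized = "1." + fraction if fraction else "1"
    return normalized, exponent, format(exponent + BIAS, "08b")

for text in [".11011", "1100.101", "1010.1", "100110", ".00010101", "1.001"]:
    print(text, normalize(text))
```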
Points
Reference values:
  Bit                    Decimal              Decimal
Position   Exponent      Fraction             Number
------------------------------------------------------------------
    1        1/2^1             1/2     0.5000000000000000000000000
    2        1/2^2             1/4     0.2500000000000000000000000
    3        1/2^3             1/8     0.1250000000000000000000000
    4        1/2^4            1/16     0.0625000000000000000000000
    5        1/2^5            1/32     0.0312500000000000000000000
    6        1/2^6            1/64     0.0156250000000000000000000
    7        1/2^7           1/128     0.0078125000000000000000000
    8        1/2^8           1/256     0.0039062500000000000000000
    9        1/2^9           1/512     0.0019531250000000000000000
   10        1/2^10         1/1024     0.0009765625000000000000000
   11        1/2^11         1/2048     0.0004882812500000000000000
   12        1/2^12         1/4096     0.0002441406250000000000000
   13        1/2^13         1/8192     0.0001220703125000000000000
   14        1/2^14        1/16384     0.0000610351562500000000000
   15        1/2^15        1/32768     0.0000305175781250000000000
   16        1/2^16        1/65536     0.0000152587890625000000000
   17        1/2^17       1/131072     0.0000076293945312500000000
   18        1/2^18       1/262144     0.0000038146972656250000000
   19        1/2^19       1/524288     0.0000019073486328125000000
   20        1/2^20      1/1048576     0.0000009536743164062500000
   21        1/2^21      1/2097152     0.0000004768371582031250000
   22        1/2^22      1/4194304     0.0000002384185791015625000
   23        1/2^23      1/8388608     0.0000001192092895507812500

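The output from a program that works out the binary fraction for 0.2, one bit position at a time, shows the imperfection clearly. A dashed line means the bit is 0; a subtraction means the bit is 1. A non-zero value is still left over after all 23 bits have been used: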
Calculating bits for: 0.2

 1   -----------------------------
 2   -----------------------------
 3   subtract this: 0.125000000000
         new value: 0.075000002980
 4   subtract this: 0.062500000000
         new value: 0.012500002980
 5   -----------------------------
 6   -----------------------------
 7   subtract this: 0.007812500000
         new value: 0.004687502980
 8   subtract this: 0.003906250000
         new value: 0.000781252980
 9   -----------------------------
10   -----------------------------
11   subtract this: 0.000488281250
         new value: 0.000292971730
12   subtract this: 0.000244140625
         new value: 0.000048831105
13   -----------------------------
14   -----------------------------
15   subtract this: 0.000030517578
         new value: 0.000018313527
16   subtract this: 0.000015258789
         new value: 0.000003054738
17   -----------------------------
18   -----------------------------
19   subtract this: 0.000001907349
         new value: 0.000001147389
20   subtract this: 0.000000953674
         new value: 0.000000193715
21   -----------------------------
22   -----------------------------
23   subtract this: 0.000000119209
         new value: 0.000000074506

binary: 0.00110011001100110011001
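
The original program is not shown, so the following is only a sketch of the same subtract-if-it-fits procedure. It assumes exact rational arithmetic (Python's fractions module) instead of floating-point variables, so its intermediate values will not show the trailing rounding digits visible in the output above; the function name fraction_bits is hypothetical.

```python
from fractions import Fraction

def fraction_bits(value, count=23):
    """Return (bits, remainder) for a value between 0 and 1.

    For each bit position, subtract 1/2^position if it still fits
    (bit is 1); otherwise leave the value alone (bit is 0).
    """
    remaining = Fraction(value)
    bits = ""
    for position in range(1, count + 1):
        weight = Fraction(1, 2 ** position)
        if remaining >= weight:
            bits += "1"
            remaining -= weight
        else:
            bits += "0"
    return bits, remaining

bits, leftover = fraction_bits("0.2")
print("binary: 0." + bits)
print("left over:", leftover)
```

A remainder of zero means the value fits exactly in the given number of bits; for 0.2 the remainder never reaches zero, no matter how many bits are used.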
Binary/Decimal converter (BinConverter.exe)

More examples

Additional Resources: