Floating Point

Register Sizes#

The following shows the register layouts for fixed point (integer), single-precision floating point, and double-precision floating point values. The floating point layouts follow the IEEE 754 specification.

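Register      Bits     Fields (high bit to low bit)
r2 (int)      31..0    2's complement value (32 bits)
float f;      31..0    sign (1 bit), exponent (8 bits), fractional (23 bits)
double f2;    63..0    sign (1 bit), exponent (11 bits), fractional (52 bits)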
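As a rough illustration of these layouts, the sketch below pulls the sign, exponent, and fractional fields out of a value using Python's standard `struct` module. The helper names `float_fields` and `double_fields` are made up for this example, and the exponent is returned in its stored (biased) form.

```python
import struct

def float_fields(x):
    """Return (sign, exponent, fractional) of x stored as a 32-bit float."""
    # Pack as a big-endian float, then reread the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # bit 31
    exponent = (bits >> 23) & 0xFF     # bits 30..23 (stored with a bias of 127)
    fractional = bits & 0x7FFFFF       # bits 22..0
    return sign, exponent, fractional

def double_fields(x):
    """Return (sign, exponent, fractional) of x stored as a 64-bit double."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                  # bit 63
    exponent = (bits >> 52) & 0x7FF    # bits 62..52 (stored with a bias of 1023)
    fractional = bits & ((1 << 52) - 1)  # bits 51..0
    return sign, exponent, fractional

print(float_fields(1.0))    # (0, 127, 0)
print(double_fields(-2.0))  # (1, 1024, 0)
```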

Conversion#

The following shows an example of how to convert between binary and float, approximating 0.2 with a fixed number of fractional bits:

$0.2 \approx \frac{3}{2^4} = \texttt{0011}$
$0.2 \approx \frac{13}{2^6} = \texttt{001101}$

In the first example, there are four fractional bits, which gives a value of 0.1875 (an error of 0.0125). In the second example, six fractional bits are used, which gives a value of 0.203125 (an error of 0.003125). Adding just two fractional bits cuts the error by a factor of four.
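The two approximations above can be reproduced with a short sketch. It assumes the numerator is chosen by rounding to the nearest integer, which matches both examples. The function name `approximate` is hypothetical, and the input is assumed to lie between 0 and 1.

```python
def approximate(value, fractional_bits):
    """Approximate 0 <= value < 1 with the given number of fractional bits."""
    scale = 2 ** fractional_bits
    numerator = round(value * scale)                 # nearest integer numerator
    bits = format(numerator, f"0{fractional_bits}b") # zero-padded binary string
    return numerator, bits, numerator / scale

print(approximate(0.2, 4))  # (3, '0011', 0.1875)
print(approximate(0.2, 6))  # (13, '001101', 0.203125)
```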

An example of binary to decimal/float conversion is shown below:

Binary    Decimal    Conversion
11.0      3.0        $1\cdot2^1+1\cdot2^0$
10.0      2.0        $1\cdot2^1+0\cdot2^0$
1.0       1.0        $1\cdot2^0$
0.1       0.5        $1\cdot2^{-1}$
0.11      0.75       $1\cdot2^{-1}+1\cdot2^{-2}$
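The conversion column can be expressed as a small sketch that sums each digit times its power of two. The function name `binary_to_decimal` is an assumption for illustration, and only unsigned inputs written as plain digit strings are handled.

```python
def binary_to_decimal(text):
    """Convert an unsigned binary string such as '0.11' to its decimal value."""
    whole, _, frac = text.partition(".")
    value = 0.0
    # Digits left of the point have weights 2^0, 2^1, ... from right to left.
    for position, digit in enumerate(reversed(whole)):
        value += int(digit) * 2 ** position
    # Digits right of the point have weights 2^-1, 2^-2, ... from left to right.
    for position, digit in enumerate(frac, start=1):
        value += int(digit) * 2 ** -position
    return value

for text in ["11.0", "10.0", "1.0", "0.1", "0.11"]:
    print(text, binary_to_decimal(text))
```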

Adding Floating Point Instructions to Pipeline#

Floating point operations require different execution units from integer operations. These units have longer latency, so a floating point operation takes more cycles to complete.

For current CPUs (values in cycles):

Operation    Intel    AMD
add          3        3
multiply     5        3
divide       16       13

Latency and Initiation Interval#

Latency Definition

How long an operation takes (the number of cycles between an instruction that produces a result and an instruction that consumes it).

$n_{\textrm{pipeline stages}} = 1 + \textrm{latency}$

The following example has a latency of 3:

fpadd f1,__,__
; some instruction
; some instruction
; some instruction
fpadd f1,__,__
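One way to read the latency definition is as a stall count: if fewer independent instructions than the latency sit between producer and consumer, the consumer has to wait for the difference. The sketch below assumes one instruction issues per cycle and that the result is forwarded as soon as it is ready. `stall_cycles` is a hypothetical helper, not something from the original notes.

```python
def stall_cycles(latency, instructions_between):
    """Cycles the consumer must wait, given how many instructions separate it
    from the producer (assumes one issue per cycle and full forwarding)."""
    return max(0, latency - instructions_between)

print(stall_cycles(3, 3))  # 0: three instructions in between hide the latency
print(stall_cycles(3, 0))  # 3: back-to-back dependent instructions
```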
Initiation Interval Definition

How many cycles must pass between issuing two instructions of the same type.

An initiation interval of 1 means you can send an instruction of that type every clock cycle.

For the following examples, assume:

Operation    Latency (cycles)    Initiation interval (cycles)
add          3                   1
multiply     6                   1
divide       24                  25
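To make the two columns concrete, the sketch below derives the number of pipeline stages from the latency (using the 1 + latency relation given earlier) and lists the earliest cycles at which back-to-back operations of one type can enter their unit. The table values are taken from above. The names `UNITS`, `pipeline_stages`, and `issue_cycles` are illustrative only.

```python
# (latency, initiation interval) in cycles, copied from the table above
UNITS = {"add": (3, 1), "multiply": (6, 1), "divide": (24, 25)}

def pipeline_stages(op):
    """Number of execution stages: 1 + latency."""
    latency, _ = UNITS[op]
    return 1 + latency

def issue_cycles(op, count, first_cycle=1):
    """Earliest cycles at which `count` consecutive ops of one type can start."""
    _, interval = UNITS[op]
    return [first_cycle + i * interval for i in range(count)]

print(pipeline_stages("multiply"))  # 7
print(issue_cycles("add", 3))       # [1, 2, 3]
print(issue_cycles("divide", 3))    # [1, 26, 51]
```

With an initiation interval of 25, a second divide cannot start until the first has left the divide unit, whereas adds and multiplies can start every cycle.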

What the pipeline now looks like:

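IF -> ID -> EX (integer)                          -> MEM -> WB
         -> M1 M2 M3 M4 M5 M6 M7 (FP multiply)    -> MEM -> WB
         -> A1 A2 A3 A4 (FP add)                  -> MEM -> WB
         -> FP DIV (D1 to D25)                    -> MEM -> WB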

Example 1#

How many clock cycles does it take to run this code?

L.D F2, 0 (R2)
ADD.D F6, F2, F1
S.D F6, 0 (R2)

Cycle   1    2    3    4    5    6    7    8    9    10   11
L.D     IF   ID   EX   MEM  WB
ADD.D        IF   ID   ID   A1   A2   A3   A4   MEM  WB
S.D               IF   IF   ID   EX   EX   EX   EX   MEM  WB

Some things to note:

  • At cycle four for ADD.D, there is an extra ID. This is because the add cannot start until the load has completed the MEM stage and F2 is available.
  • S.D waits in EX for F6 from ADD.D. At cycle nine it still cannot enter the MEM stage because ADD.D is using it, so it is stuck in EX for an extra cycle.

Example 2#

How many clock cycles does it take to run this code? Assume that all operations are independent.

MUL.D __,__,__
ADD.D __,__,__
L.D __,__,__
S.D __,__,__

Cycle   1    2    3    4    5    6    7    8    9    10   11
MUL.D   IF   ID   M1   M2   M3   M4   M5   M6   M7   MEM  WB
ADD.D        IF   ID   A1   A2   A3   A4   MEM  WB
L.D               IF   ID   EX   MEM  WB
S.D                    IF   ID   EX   MEM  WB
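The timing in this example can be checked with a small sketch. It assumes fully independent instructions, one fetch per cycle, and no stalls, so it is not a general pipeline simulator. The stage counts come from the pipeline described earlier. `EXEC_STAGES` and `writeback_cycles` are names invented for this illustration.

```python
# Execution stages per instruction: M1..M7, A1..A4, and a single EX
EXEC_STAGES = {"MUL.D": 7, "ADD.D": 4, "L.D": 1, "S.D": 1}

def writeback_cycles(program):
    """Return (op, WB cycle) pairs, assuming no dependencies and no stalls."""
    result = []
    for index, op in enumerate(program):
        fetch = index + 1                        # one instruction fetched per cycle
        # After IF: one ID cycle, the execution stages, then MEM and WB.
        wb = fetch + 1 + EXEC_STAGES[op] + 2
        result.append((op, wb))
    return result

timing = writeback_cycles(["MUL.D", "ADD.D", "L.D", "S.D"])
print(timing)                                    # [('MUL.D', 11), ('ADD.D', 9), ('L.D', 7), ('S.D', 8)]
print(max(cycle for _, cycle in timing))         # 11 cycles in total
print([op for op, _ in sorted(timing, key=lambda t: t[1])])  # completion order
```

The completion order that comes out (L.D, S.D, ADD.D, MUL.D) differs from program order, even though the instructions were fetched in order.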

Problems with this pipeline#

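  1. Instructions could write registers in an order that is different from program order (this is shown in the above example, where L.D reaches WB before ADD.D and MUL.D)
  2. Structural hazards
    1. Instructions may want to use the register file at the same time (for example, two instructions reaching WB in the same cycle)
    2. If you have many divides, they compete for the divide unit, which can only accept a new divide every 25 cycles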

You need to check for hazards. As before, hazard checking is done in the ID stage, but there are now some new hazards to check for. At decode, you know what the instruction is and how many clock cycles it is going to take. From that, you can figure out how many cycles you need to hold the instruction so that the order is correct.

During the decode stage, you check for:

  • structural hazards
  • data hazards
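A minimal sketch of one such decode-time check is shown below. It handles only one structural hazard: two instructions reaching MEM (and therefore WB) in the same cycle. It assumes that every instruction passes through a single MEM stage, that instructions issue in order, and that a conflicting instruction is held in ID until its MEM slot is free. Data hazards and the divide unit are deliberately ignored, and the names `schedule_mem_slots` and `reserved_mem` are hypothetical.

```python
def schedule_mem_slots(program, exec_stages):
    """Return (op, ID cycle, MEM cycle) for each instruction, stalling in ID
    whenever the MEM cycle it would reach is already reserved."""
    reserved_mem = set()   # MEM cycles claimed by earlier instructions
    timeline = []
    id_cycle = 2           # first instruction: IF in cycle 1, ID in cycle 2
    for op in program:
        # MEM follows the last execution stage, which ends at id_cycle + stages.
        while id_cycle + exec_stages[op] + 1 in reserved_mem:
            id_cycle += 1  # hold the instruction in ID for one more cycle
        mem_cycle = id_cycle + exec_stages[op] + 1
        reserved_mem.add(mem_cycle)
        timeline.append((op, id_cycle, mem_cycle))
        id_cycle += 1      # in-order issue: the next decode is at least one cycle later
    return timeline

stages = {"MUL.D": 7, "ADD.D": 4, "L.D": 1, "S.D": 1}
# ADD.D would reach MEM in cycle 10 together with MUL.D, so it waits one cycle.
print(schedule_mem_slots(["MUL.D", "L.D", "L.D", "ADD.D"], stages))
```

The set of reserved cycles stands in for whatever bookkeeping the hardware uses. The point is only that the decision can be made in ID, because the instruction type fixes how many cycles it will take.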

A few more terms:

Forwarding/Bypassing Definition

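Taking a value from one pipeline stage and sending it directly to another stage that needs it, instead of waiting for it to be written to the register file. In the following example, the second ADD.D reads F6, which the first ADD.D produces: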

ADD.D F6, F2, F1
ADD.D F8, F6, F3