Floating Point

Register Sizes

The following image shows registers for fixed point, floating point, and double precision. The floating-point formats follow the IEEE 754 specification.

Conversion

The following example shows how a decimal fraction, here 0.2, is approximated with a fixed number of binary fractional bits:

$0.2\approx\frac{3}{2^4}=\texttt{0011}$

$0.2\approx\frac{13}{2^6}=\texttt{001101}$

In the first example, four fractional bits give a value of 0.1875, an error of 0.0125. In the second example, six fractional bits give a value of 0.203125, an error of 0.003125. Adding just two fractional bits cuts the error by a factor of four.
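The two approximations above can be reproduced with a short sketch. The helper name `to_fixed` is hypothetical, and it assumes the value is rounded to the nearest multiple of $2^{-n}$, which matches the numerators 3 and 13 used here.

```python
def to_fixed(x, frac_bits):
    """Approximate 0 <= x < 1 using `frac_bits` fractional bits (round to nearest)."""
    numerator = round(x * 2**frac_bits)          # integer k in k / 2^frac_bits
    bits = format(numerator, f"0{frac_bits}b")   # the fractional bits as a string
    approx = numerator / 2**frac_bits            # value those bits actually represent
    return numerator, bits, approx


for n in (4, 6):
    numerator, bits, approx = to_fixed(0.2, n)
    print(n, numerator, bits, approx, abs(0.2 - approx))
```

Trying larger values of `n` shows the error shrinking further, although 0.2 never becomes exact because it has no finite binary expansion.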

An example of binary to decimal/float conversion is shown below:

Binary	Decimal	Conversion
`11.0`	`3.0`	$1\cdot2^1+1\cdot2^0$
`10.0`	`2.0`	$1\cdot2^1+0\cdot2^0$
`1.0`	`1.0`	$1\cdot2^0$
`0.1`	`0.5`	$1\cdot2^{-1}$
`0.11`	`0.75`	$1\cdot2^{-1}+1\cdot2^{-2}$
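The positional weights in the table can be applied mechanically. The sketch below is one way to do it; the function name `binary_to_decimal` is hypothetical, and it assumes an unsigned input string with at most one binary point.

```python
def binary_to_decimal(text):
    """Convert an unsigned binary string such as '0.11' to its decimal value."""
    whole, _, frac = text.partition(".")
    value = 0.0
    # Bits left of the point have weights 2^0, 2^1, ... from right to left.
    for power, bit in enumerate(reversed(whole)):
        value += int(bit) * 2**power
    # Bits right of the point have weights 2^-1, 2^-2, ... from left to right.
    for position, bit in enumerate(frac, start=1):
        value += int(bit) * 2**-position
    return value


for text in ("11.0", "10.0", "1.0", "0.1", "0.11"):
    print(text, binary_to_decimal(text))
```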

Adding Floating Point Instructions to Pipeline

Floating point operations require different execution units from integer operations. These units have longer latencies, so floating point instructions take more cycles to complete.

Typical latencies, in cycles, on current CPUs:

	Intel	AMD
`add`	3	3
`multiply`	5	3
`divide`	16	13

Latency and Initiation Interval

Latency Definition

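How long an operation takes: the number of cycles between an instruction that produces a result and an instruction that consumes it.
$n_{\textrm{pipeline stages}}=1+\textrm{latency}$
The following example has a latency of 3, so three other instructions fit between the producing and the consuming `fpadd`:

fpadd f1,__,__   ; produces f1
; some instruction
; some instruction
; some instruction
fpadd __,f1,__   ; consumes f1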

Initiation Interval Definition

The number of cycles that must pass between issuing two operations of the same type.

An initiation interval of 1 means a new instruction of that type can be issued every clock cycle.

For the following examples, assume:

	Latency (cycles)	Initiation Interval (cycles)
`add`	3	1
`multiply`	6	1
`divide`	24	25
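As a rough illustration of how these two numbers are used, the sketch below applies the stage formula from the latency definition and lists the earliest cycles at which back-to-back operations of one type could start. The names `UNITS`, `pipeline_stages`, and `issue_cycles` are hypothetical, and the operations are assumed to be independent of each other.

```python
# Assumed values from the table above: (latency, initiation interval) in cycles.
UNITS = {"add": (3, 1), "multiply": (6, 1), "divide": (24, 25)}


def pipeline_stages(op):
    """Execution stages for `op`, using stages = 1 + latency."""
    latency, _ = UNITS[op]
    return 1 + latency


def issue_cycles(op, count, first_cycle=1):
    """Earliest cycles at which `count` independent `op`s can enter the unit."""
    _, interval = UNITS[op]
    return [first_cycle + i * interval for i in range(count)]


print(pipeline_stages("add"), pipeline_stages("multiply"), pipeline_stages("divide"))
print(issue_cycles("add", 3))
print(issue_cycles("divide", 3))
```

With an initiation interval of 1 the adds start in consecutive cycles, while each divide has to wait 25 cycles for the one before it.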

What the pipeline now looks like:

Example 1

How many clock cycles does it take to run this code?

L.D   F2, 0 (R2)
ADD.D F6, F2, F1
S.D   F6, 0 (R2)

	1	2	3	4	5	6	7	8	9	10	11
`L.D`	`IF`	`ID`	`EX`	`MEM`	`WB`
`ADD.D`		`IF`	`ID`	`ID`	`A1`	`A2`	`A3`	`A4`	`MEM`	`WB`
`S.D`			`IF`	`IF`	`ID`	`EX`	`EX`	`EX`	`EX`	`MEM`	`WB`

Some things to note:

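At cycle 4, ADD.D repeats ID. The add cannot start until the load has completed its MEM stage and the loaded value of F2 is available.
At cycle 4, S.D repeats IF because ADD.D is still occupying ID.
At cycles 7 and 8, S.D waits in EX for ADD.D to produce F6. At cycle 9, S.D still cannot enter MEM because ADD.D is using it, so S.D stays in EX for one more cycle.
The last instruction reaches WB in cycle 11, so the code takes 11 clock cycles.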
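The cycle count in the table can be cross-checked with a small calculation. This is only a sketch: the stall counts are read off the table rather than detected automatically, and the names `Instr` and `writeback_cycle` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    fetch_cycle: int   # cycle in which the instruction first enters IF
    stages: int        # distinct pipeline stages it passes through
    stalls: int        # extra cycles spent repeating a stage


def writeback_cycle(instr):
    """Cycle in which the instruction occupies its final stage (WB)."""
    return instr.fetch_cycle + instr.stages - 1 + instr.stalls


example_1 = [
    Instr("L.D", fetch_cycle=1, stages=5, stalls=0),    # IF ID EX MEM WB
    Instr("ADD.D", fetch_cycle=2, stages=8, stalls=1),  # IF ID A1-A4 MEM WB, one ID stall
    Instr("S.D", fetch_cycle=3, stages=5, stalls=4),    # one IF stall, three EX stalls
]

total_cycles = max(writeback_cycle(i) for i in example_1)
print(total_cycles)
```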

Example 2

How many clock cycles does it take to run this code? Assume that all operations are independent.

MUL.D __,__,__
ADD.D __,__,__
L.D   __,__,__
S.D   __,__,__

	1	2	3	4	5	6	7	8	9	10	11
`MUL.D`	`IF`	`ID`	`M1`	`M2`	`M3`	`M4`	`M5`	`M6`	`M7`	`MEM`	`WB`
`ADD.D`		`IF`	`ID`	`A1`	`A2`	`A3`	`A4`	`MEM`	`WB`
`L.D`			`IF`	`ID`	`EX`	`MEM`	`WB`
`S.D`				`IF`	`ID`	`EX`	`MEM`	`WB`
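Because nothing stalls in this example, the table can be regenerated from the stage lists alone. The sketch below does that and then sorts the instructions by the cycle in which they reach WB. The names `PROGRAM` and `schedule` are hypothetical, and one instruction is assumed to be fetched per cycle.

```python
# Stage sequences assumed from the table above; every instruction is independent,
# so no stalls are modelled.
PROGRAM = [
    ("MUL.D", ["IF", "ID"] + [f"M{i}" for i in range(1, 8)] + ["MEM", "WB"]),
    ("ADD.D", ["IF", "ID", "A1", "A2", "A3", "A4", "MEM", "WB"]),
    ("L.D", ["IF", "ID", "EX", "MEM", "WB"]),
    ("S.D", ["IF", "ID", "EX", "MEM", "WB"]),
]


def schedule(program):
    """Map each instruction to {stage: cycle}, fetching one instruction per cycle."""
    timeline = {}
    for index, (name, stages) in enumerate(program):
        start = index + 1  # the first instruction is fetched in cycle 1
        timeline[name] = {stage: start + offset for offset, stage in enumerate(stages)}
    return timeline


timeline = schedule(PROGRAM)
total_cycles = max(cycles["WB"] for cycles in timeline.values())
writeback_order = sorted(timeline, key=lambda name: timeline[name]["WB"])
print(total_cycles)
print(writeback_order)
```

The code takes 11 cycles, and the instructions reach WB in a different order from the one in which they were fetched: the first instruction, `MUL.D`, finishes last.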

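Problems with this pipeline

Instructions can write registers in an order that differs from program order (the example above shows this: L.D reaches WB before ADD.D and MUL.D)
Structural Hazards
1. Several instructions may want to write to the register file in the same cycle
2. If you have many divides, each one must wait for the divide unit, because a new divide cannot start until the initiation interval has passed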

You need to check for hazards. As before, hazard checking is done in the ID stage, but there are now some new hazards to check for. In ID you know what the instruction is and how many clock cycles it will take, so you can work out how many cycles the instruction must be held to keep the order correct.

During the decode stage, you check for:

structural hazards
data hazards
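As an illustration of the data hazard part of this check, the sketch below finds read-after-write dependences in the code from Example 1. It only reports which instruction depends on which register; deciding how many cycles to stall would also need the latencies. The name `raw_hazards` and the tuple layout `(name, destination, sources)` are hypothetical.

```python
def raw_hazards(program):
    """Return (producer, consumer, register) triples for read-after-write dependences."""
    hazards = []
    last_writer = {}  # register -> name of the most recent instruction writing it
    for name, dest, sources in program:
        for reg in sources:
            if reg in last_writer:
                hazards.append((last_writer[reg], name, reg))
        if dest is not None:
            last_writer[dest] = name
    return hazards


# Example 1: S.D reads F6 (the value to store) and R2 (the address) and writes no register.
example_1 = [
    ("L.D", "F2", ["R2"]),
    ("ADD.D", "F6", ["F2", "F1"]),
    ("S.D", None, ["F6", "R2"]),
]
print(raw_hazards(example_1))
```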

A few more terms:

Forwarding/Bypassing Definition

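Taking a value from one pipeline stage and sending it directly to another stage that needs it, instead of waiting for the value to be written to the register file. In the following code, the second ADD.D needs F6, which the first ADD.D produces: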

ADD.D F6, F2, F1
ADD.D F8, F6, F3
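To get a feel for what forwarding saves, the sketch below counts the stall cycles the second ADD.D needs, with and without forwarding. It assumes the add latency of 3 used in the examples above. The no-forwarding case additionally assumes that the consumer can read F6 in ID during the producer's WB cycle, which these notes do not state. The function name `stalls_for_dependent_add` is hypothetical.

```python
ADD_LATENCY = 3  # assumed add latency from the examples above


def stalls_for_dependent_add(forwarding):
    """Stall cycles for an ADD.D that reads the result of the ADD.D right before it."""
    exec_stages = 1 + ADD_LATENCY          # A1..A4
    producer_last_exec = 2 + exec_stages   # IF=1, ID=2, A1..A4 = cycles 3..6
    producer_wb = producer_last_exec + 2   # MEM=7, WB=8
    consumer_nominal_a1 = 4                # IF=2, ID=3, A1=4 if nothing stalls

    if forwarding:
        # The result goes from the end of A4 straight into the consumer's A1.
        consumer_a1 = producer_last_exec + 1
    else:
        # Assumption: the consumer reads F6 in ID during the producer's WB cycle.
        consumer_a1 = producer_wb + 1
    return consumer_a1 - consumer_nominal_a1


print(stalls_for_dependent_add(forwarding=True))
print(stalls_for_dependent_add(forwarding=False))
```

Under these assumptions, forwarding brings the stall count down to the add latency itself, which is what the latency definition describes.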