









## Speculation

- In Tomasulo's algorithm, once an instruction writes its result, any subsequently issued instructions will find the result in the register file.
- With speculation, the register file is not updated until the instruction commits
  - (that is, until we know definitively that the instruction should execute).
- Thus, the reorder buffer (ROB) supplies operands in the interval between completion of instruction execution and instruction commit.
  - The ROB is a source of operands for instructions, just as reservation stations (RS) provide operands in Tomasulo's algorithm.
  - The ROB extends the architectural registers, like the RS.
- The ROB holds the result between the time the operation associated with the instruction completes and the time the instruction commits.

Fall 08 CSE4201
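A minimal Python sketch of the operand path described above, assuming a simplified machine with no reservation stations and no timing. The names `RobEntry`, `Machine`, `read_operand`, and `commit` are illustrative and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RobEntry:
    dest: str                      # architectural register the instruction writes
    value: Optional[float] = None  # result, filled in when execution completes
    done: bool = False


@dataclass
class Machine:
    regs: dict = field(default_factory=dict)  # architectural register file
    rob: list = field(default_factory=list)   # oldest entry first

    def issue(self, dest):
        entry = RobEntry(dest)
        self.rob.append(entry)
        return entry

    def complete(self, entry, value):
        # Execution finished: the result lives in the ROB, not the register file.
        entry.value, entry.done = value, True

    def read_operand(self, reg):
        # The youngest in-flight writer of reg wins; None means "must wait".
        for entry in reversed(self.rob):
            if entry.dest == reg:
                return entry.value if entry.done else None
        return self.regs[reg]

    def commit(self):
        # In-order commit: only the head of the ROB may update the register file.
        if self.rob and self.rob[0].done:
            head = self.rob.pop(0)
            self.regs[head.dest] = head.value
            return True
        return False


m = Machine(regs={"F0": 1.0})
e = m.issue("F0")
m.complete(e, 5.0)
from_rob = m.read_operand("F0")   # supplied by the ROB
before_commit = m.regs["F0"]      # register file still holds the old value
m.commit()
after_commit = m.regs["F0"]
```

Between `complete` and `commit` the new value is visible only through the ROB, which is the interval the slide refers to.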







## Example

| Label | Opcode | Operands   |
|-------|--------|------------|
| Loop: | LD     | F0,10(R2)  |
|       | ADDD   | F10,F4,F0  |
|       | DIVD   | F2,F10,F6  |
|       | DADD   | R1,R1,-8   |
|       | BNE    | R1,R2,Loop |











| Source instruction | Instruction using result | Latency (clock cycles) |
|--------------------|--------------------------|------------------------|
| FP ALU op          | FP ALU op                | 3                      |
| FP ALU op          | Store double             | 2                      |
| Load double        | FP ALU op                | 1                      |
| Load double        | Store double             | 0                      |

Source loop: `for (i=1000; i>0; i--) x[i] = x[i] + s;`

| Label | Opcode | Operands   |
|-------|--------|------------|
| Loop: | L.D    | F0,0(R1)   |
|       | ADD.D  | F4,F0,F2   |
|       | S.D    | 0(R1),F4   |
|       | DADDUI | R1,R1,#-8  |
|       | BNE    | R1,R2,Loop |
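The latency table can be applied mechanically to the loop to see where an in-order, single-issue pipeline would stall. The Python sketch below is only an illustration: it assumes that any producer/consumer pair not listed in the table (for example the integer add feeding the branch) has latency 0, and the `kind` labels and the function `schedule_in_order` are made up for this example.

```python
# Stall cycles required between a producer and a dependent consumer (from the table).
LATENCY = {
    ("fp_alu", "fp_alu"): 3,
    ("fp_alu", "store"): 2,
    ("load", "fp_alu"): 1,
    ("load", "store"): 0,
}

# (opcode, kind, destination register, source registers)
LOOP = [
    ("L.D",    "load",   "F0", ["R1"]),
    ("ADD.D",  "fp_alu", "F4", ["F0", "F2"]),
    ("S.D",    "store",  None, ["F4", "R1"]),
    ("DADDUI", "int",    "R1", ["R1"]),
    ("BNE",    "branch", None, ["R1", "R2"]),
]


def schedule_in_order(instrs):
    """Return the issue cycle of each instruction (in order, one issue per cycle)."""
    producers = {}  # register -> (producer kind, cycle the producer issued)
    cycles = []
    cycle = 0
    for _, kind, dest, srcs in instrs:
        cycle += 1
        for src in srcs:
            if src in producers:
                prod_kind, prod_cycle = producers[src]
                # Pairs missing from the table are ASSUMED to need no stall.
                latency = LATENCY.get((prod_kind, kind), 0)
                cycle = max(cycle, prod_cycle + latency + 1)
        cycles.append(cycle)
        if dest is not None:
            producers[dest] = (kind, cycle)
    return cycles


issue_cycles = schedule_in_order(LOOP)
stalls = issue_cycles[-1] - len(LOOP)
print(issue_cycles, stalls)  # [1, 3, 6, 7, 8] 3
```

Under these assumptions the load-to-add and add-to-store dependences account for three stall cycles per iteration, which is the overhead that unrolling and scheduling try to hide.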

- Assume the VLIW can schedule two memory references, two FP operations, and one integer operation or branch per clock.

| Memory reference 1 | Memory reference 2 | FP operation 1  | FP operation 2  | Int. op / branch | Clock |
|--------------------|--------------------|-----------------|-----------------|------------------|-------|
| LD F0,0(R1)        | LD F6,-8(R1)       |                 |                 |                  | 1     |
| LD F10,-16(R1)     | LD F14,-24(R1)     |                 |                 |                  | 2     |
| LD F18,-32(R1)     | LD F22,-40(R1)     | ADDD F4,F0,F2   | ADDD F8,F6,F2   |                  | 3     |
| LD F26,-48(R1)     |                    | ADDD F12,F10,F2 | ADDD F16,F14,F2 |                  | 4     |
|                    |                    | ADDD F20,F18,F2 | ADDD F24,F22,F2 |                  | 5     |
| SD 0(R1),F4        | SD -8(R1),F8       | ADDD F28,F26,F2 |                 |                  | 6     |
| SD -16(R1),F12     | SD -24(R1),F16     |                 |                 | DADD R1,R1,#-56  | 7     |
| SD 24(R1),F20      | SD 16(R1),F24      |                 |                 |                  | 8     |
| SD 8(R1),F28       |                    |                 |                 | BNEZ R1,LOOP     | 9     |

7 iterations complete in 9 clocks.
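The schedule above can be summarized with a little arithmetic. The Python sketch below transcribes only the opcode in each occupied slot and derives clocks per iteration and slot utilization; the variable names are illustrative, and the five slots per clock are read off the table's columns.

```python
# One row per clock; each row lists the operations issued in that clock.
SCHEDULE = [
    ["LD", "LD"],
    ["LD", "LD"],
    ["LD", "LD", "ADDD", "ADDD"],
    ["LD", "ADDD", "ADDD"],
    ["ADDD", "ADDD"],
    ["SD", "SD", "ADDD"],
    ["SD", "SD", "DADD"],
    ["SD", "SD"],
    ["SD", "BNEZ"],
]
SLOTS_PER_CLOCK = 5  # 2 memory + 2 FP + 1 integer/branch
ITERATIONS = 7       # the loop body is unrolled 7 times

clocks = len(SCHEDULE)
ops = sum(len(row) for row in SCHEDULE)

clocks_per_iteration = clocks / ITERATIONS
ops_per_clock = ops / clocks
utilization = ops / (clocks * SLOTS_PER_CLOCK)

print(f"{ops} ops in {clocks} clocks")
print(f"{clocks_per_iteration:.2f} clocks per iteration")
print(f"{ops_per_clock:.2f} ops per clock, {utilization:.0%} of slots used")
```

Roughly half of the issue slots stay empty even in this well-unrolled loop, which is the usual cost of a wide, statically scheduled instruction word.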





| Iteration number | Instruction       | Issues at clock cycle number | Executes at clock cycle number | Memory access at clock cycle number | Write CDB at clock cycle number | Comment          |
|------------------|-------------------|------------------------------|--------------------------------|-------------------------------------|---------------------------------|------------------|
| 1                | LD R2,0(R1)       | 1                            | 2                              | 3                                   | 4                               | First issue      |
| 1                | DADDIU R2,R2,#1   | 1                            | 5                              |                                     | 6                               | Wait for LD      |
| 1                | SD R2,0(R1)       | 2                            | 3                              | 7                                   |                                 | Wait for DADDIU  |
| 1                | DADDIU R1,R1,#4   | 2                            | 3                              |                                     | 4                               | Execute directly |
| 1                | BNE R2,R3,LOOP    | 3                            | 7                              |                                     |                                 | Wait for DADDIU  |
| 2                | LD R2,0(R1)       | 4                            | 8                              | 9                                   | 10                              | Wait for BNE     |
| 2                | DADDIU R2,R2,#1   | 4                            | 11                             |                                     | 12                              | Wait for LD      |
| 2                | SD R2,0(R1)       | 5                            | 9                              | 13                                  |                                 | Wait for DADDIU  |
| 2                | DADDIU R1,R1,#4   | 5                            | 8                              |                                     | 9                               | Wait for BNE     |
| 2                | BNE R2,R3,LOOP    | 6                            | 13                             |                                     |                                 | Wait for DADDIU  |
| 3                | LD R2,0(R1)       | 7                            | 14                             | 15                                  | 16                              | Wait for BNE     |
| 3                | DADDIU R2,R2,#1   | 7                            | 17                             |                                     | 18                              | Wait for LD      |
| 3                | SD R2,0(R1)       | 8                            | 15                             | 19                                  |                                 | Wait for DADDIU  |
| 3                | DADDIU R1,R1,#4   | 8                            | 14                             |                                     | 15                              | Wait for BNE     |
| 3                | BNE R2,R3,LOOP    | 9                            | 19                             |                                     |                                 | Wait for DADDIU  |


| Iteration number | Instruction       | Issues at clock number | Executes at clock number | Read access at clock number | Write CDB at clock number | Commits at clock number | Comment           |
|------------------|-------------------|------------------------|--------------------------|-----------------------------|---------------------------|-------------------------|-------------------|
| 1                | LD R2,0(R1)       | 1                      | 2                        | 3                           | 4                         | 5                       | First issue       |
| 1                | DADDIU R2,R2,#1   | 1                      | 5                        |                             | 6                         | 7                       | Wait for LD       |
| 1                | SD R2,0(R1)       | 2                      | 3                        |                             |                           | 7                       | Wait for DADDIU   |
| 1                | DADDIU R1,R1,#4   | 2                      | 3                        |                             | 4                         | 8                       | Commit in order   |
| 1                | BNE R2,R3,LOOP    | 3                      | 7                        |                             |                           | 8                       | Wait for DADDIU   |
| 2                | LD R2,0(R1)       | 4                      | 5                        | 6                           | 7                         | 9                       | No execute delay  |
| 2                | DADDIU R2,R2,#1   | 4                      | 8                        |                             | 9                         | 10                      | Wait for LD       |
| 2                | SD R2,0(R1)       | 5                      | 6                        |                             |                           | 10                      | Wait for DADDIU   |
| 2                | DADDIU R1,R1,#4   | 5                      | 6                        |                             | 7                         | 11                      | Commit in order   |
| 2                | BNE R2,R3,LOOP    | 6                      | 10                       |                             |                           | 11                      | Wait for DADDIU   |
| 3                | LD R2,0(R1)       | 7                      | 8                        | 9                           | 10                        | 12                      | Earliest possible |
| 3                | DADDIU R2,R2,#1   | 7                      | 11                       |                             | 12                        | 13                      | Wait for LD       |
| 3                | SD R2,0(R1)       | 8                      | 9                        |                             |                           | 13                      | Wait for DADDIU   |
| 3                | DADDIU R1,R1,#4   | 8                      | 9                        |                             | 10                        | 14                      | Executes earlier  |
| 3                | BNE R2,R3,LOOP    | 9                      | 13                       |                             |                           | 14                      | Wait for DADDIU   |

The time of issue, execution, and commit for the pipeline with speculation. Note that the LD following the BNE can start execution early, before the branch resolves, because it is speculative.
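The effect of speculation can be read off the two timing tables with a short calculation. The Python sketch below takes the clock at which each iteration's BNE executes, as listed in the tables, and computes the steady-state spacing between iterations; the function name `cycles_per_iteration` is illustrative.

```python
# Clock cycle at which the BNE of iterations 1, 2, 3 executes (from the tables).
BNE_EXEC_WITHOUT_SPECULATION = [7, 13, 19]
BNE_EXEC_WITH_SPECULATION = [7, 10, 13]


def cycles_per_iteration(times):
    """Average gap between the same event in consecutive iterations."""
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


without_spec = cycles_per_iteration(BNE_EXEC_WITHOUT_SPECULATION)
with_spec = cycles_per_iteration(BNE_EXEC_WITH_SPECULATION)
print(without_spec, with_spec)  # 6.0 3.0
```

Without speculation each iteration's load waits for the previous branch, so iterations are spaced 6 clocks apart; with speculation the spacing drops to 3 clocks for these three iterations.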

## Loop-Level Parallelism (LLP)

- Loop-level parallelism (LLP) analysis focuses on whether data accesses in later iterations of a loop are data dependent on data values produced in earlier iterations, and on making loop iterations independent where possible.
- For example, in `for (i=1; i<=1000; i++) x[i] = x[i] + s;` the computation in each iteration is independent of the previous iterations, and the loop is thus parallel. The use of `x[i]` twice is within a single iteration.
- $\Rightarrow$ Thus the loop iterations are parallel (independent of each other).
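One informal way to check the independence claim is to run the iterations in a different order and compare results. The Python sketch below does this for the slide's loop. `running_chain` is a hypothetical contrast case that is not in the slides: it reads `x[i-1]`, so a value is carried from one iteration to the next.

```python
def add_scalar(x, s, order):
    """x[i] = x[i] + s for each i in the given order (the loop from the slide)."""
    out = list(x)
    for i in order:
        out[i] = out[i] + s
    return out


def running_chain(x, s, order):
    """Hypothetical contrast: x[i] = x[i-1] + s depends on the previous iteration."""
    out = list(x)
    for i in order:
        out[i] = out[i - 1] + s
    return out


x = [1, 2, 3, 4, 5]
forward = range(1, len(x))            # i = 1, 2, 3, 4
backward = range(len(x) - 1, 0, -1)   # i = 4, 3, 2, 1

independent = add_scalar(x, 10, forward) == add_scalar(x, 10, backward)
carried = running_chain(x, 10, forward) == running_chain(x, 10, backward)
print(independent, carried)  # True False
```

Order insensitivity on one input is not a proof of parallelism, but it makes the difference between the two access patterns easy to see.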



























Original loop:

| Label | Opcode | Operands   |
|-------|--------|------------|
| Loop: | L.D    | F0,0(R1)   |
|       | ADD.D  | F4,F0,F2   |
|       | S.D    | F4,0(R1)   |
|       | DADDUI | R1,R1,#-8  |
|       | BNE    | R1,R2,LOOP |

Before: unrolled 3 times

| #  | Instruction        |
|----|--------------------|
| 1  | L.D F0,0(R1)       |
| 2  | ADD.D F4,F0,F2     |
| 3  | S.D F4,0(R1)       |
| 4  | L.D F0,-8(R1)      |
| 5  | ADD.D F4,F0,F2     |
| 6  | S.D F4,-8(R1)      |
| 7  | L.D F0,-16(R1)     |
| 8  | ADD.D F4,F0,F2     |
| 9  | S.D F4,-16(R1)     |
| 10 | DADDUI R1,R1,#-24  |
| 11 | BNE R1,R2,LOOP     |

After: software pipelined

| # | Instruction       | Comment         |
|---|-------------------|-----------------|
| 1 | S.D F4,0(R1)      | Stores M[i]     |
| 2 | ADD.D F4,F0,F2    | Adds to M[i-1]  |
| 3 | L.D F0,-16(R1)    | Loads M[i-2]    |
| 4 | DADDUI R1,R1,#-8  |                 |
| 5 | BNE R1,R2,LOOP    |                 |
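To see that the software-pipelined body computes the same result as the original loop, the two can be modeled in Python. This is a sketch under stated assumptions: the start-up and clean-up code around the pipelined loop is not legible in the slide, so the prologue and epilogue below are the author-independent, straightforward choice for a three-stage pipeline, and the names `original`, `software_pipelined`, `f0`, and `f4` are illustrative.

```python
def original(x, s):
    """Load, add, store for one element per iteration, highest index first."""
    out = list(x)
    for i in range(len(out) - 1, -1, -1):
        f0 = out[i]        # L.D
        f4 = f0 + s        # ADD.D
        out[i] = f4        # S.D
    return out


def software_pipelined(x, s):
    """Each loop pass stores M[i], adds for M[i-1], and loads M[i-2]."""
    out = list(x)
    n = len(out)
    assert n >= 2, "this sketch needs at least two elements"

    # Prologue (ASSUMED): fill the pipeline.
    f0 = out[n - 1]
    f4 = f0 + s
    f0 = out[n - 2]

    for i in range(n - 1, 1, -1):
        out[i] = f4        # S.D   stores M[i]
        f4 = f0 + s        # ADD.D adds to M[i-1]
        f0 = out[i - 2]    # L.D   loads M[i-2]

    # Epilogue (ASSUMED): drain the pipeline.
    out[1] = f4
    f4 = f0 + s
    out[0] = f4
    return out


data = [0, 1, 2, 3, 4]
same = original(data, 10) == software_pipelined(data, 10)
print(same)  # True
```

Within one pass of the pipelined loop the store, add, and load belong to three different original iterations, so none of them waits on another in the same pass.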

