|         | COSC4201<br>Multiprocessors                                               |
|---------|---------------------------------------------------------------------------|
|         | Prof. Mokhtar Aboelaze                                                    |
|         | Parts of these slides are taken from Notes by Prof. David Patterson (UCB) |
| Fall 07 | CSE4201                                                                   |





| Processor | Microarchitecture | Fetch /<br>Issue /<br>Execute | FU | Clock<br>Rate<br>(GHz) | Transistors<br>Die size | Power |
|-----------|-------------------|-------------------------------|----|------------------------|-------------------------|-------|
| Intel<br>Pentium 4<br>Extreme | Speculative<br>dynamically<br>scheduled; deeply<br>pipelined; SMT | 3/3/4 | 7 int.<br>1 FP | 3.8 | 125 M<br>122 mm<sup>2</sup> | 115 W |
| AMD<br>Athlon 64<br>FX-57 | Speculative<br>dynamically<br>scheduled | 3/3/4 | 6 int.<br>3 FP | 2.8 | 114 M<br>115 mm<sup>2</sup> | 104 W |
| IBM<br>Power5<br>(1 CPU<br>only) | Speculative<br>dynamically<br>scheduled; SMT;<br>2 CPU cores/chip | 8/4/8 | 6 int.<br>2 FP | 1.9 | 200 M<br>300 mm<sup>2</sup><br>(est.) | 80 W<br>(est.) |
| Intel<br>Itanium 2 | Statically scheduled<br>VLIW-style | 6/5/11 | 9 int.<br>2 FP | 1.6 | 592 M<br>423 mm<sup>2</sup> | 130 W |































| Application | Scaling of computation | Scaling of communication | Scaling of computation-to-communication |
|-------------|------------------------|--------------------------|-----------------------------------------|
| FFT         | $(n \log n)/p$         | $n/p$                    | $\log n$                                |
| LU          | $n/p$                  | $\sqrt{n}/\sqrt{p}$      | $\sqrt{n}/\sqrt{p}$                     |
| Barnes      | $(n \log n)/p$         | $(\sqrt{n} \log n)/\sqrt{p}$ | $\approx \sqrt{n}/\sqrt{p}$         |
| Ocean       | $n/p$                  | $\sqrt{n}/\sqrt{p}$      | $\sqrt{n}/\sqrt{p}$                     |
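
The ratio column can be checked numerically by dividing the computation term by the communication term. The following is a minimal sketch, not part of the original slides: it reads the communication terms for LU, Barnes, and Ocean as quotients (for example $\sqrt{n}/\sqrt{p}$), drops all constant factors, and assumes a base-2 logarithm. The names `SCALING` and `comp_to_comm` are hypothetical.

```python
import math

# Asymptotic terms only: constant factors are omitted, and log base 2 is an
# assumption made for illustration. Each entry is (computation, communication).
SCALING = {
    "FFT": (
        lambda n, p: n * math.log2(n) / p,
        lambda n, p: n / p,
    ),
    "LU": (
        lambda n, p: n / p,
        lambda n, p: math.sqrt(n) / math.sqrt(p),
    ),
    "Barnes": (
        lambda n, p: n * math.log2(n) / p,
        lambda n, p: math.sqrt(n) * math.log2(n) / math.sqrt(p),
    ),
    "Ocean": (
        lambda n, p: n / p,
        lambda n, p: math.sqrt(n) / math.sqrt(p),
    ),
}


def comp_to_comm(app, n, p):
    """Return the computation-to-communication ratio for one application."""
    computation, communication = SCALING[app]
    return computation(n, p) / communication(n, p)


if __name__ == "__main__":
    n = 2 ** 20  # problem size (hypothetical value)
    for p in (16, 64):  # processor counts (hypothetical values)
        for app in SCALING:
            print(f"{app:7s} p={p:3d} ratio={comp_to_comm(app, n, p):10.1f}")
```

Under these assumptions the FFT ratio depends only on $n$, while the ratio for the other three applications shrinks as $p$ grows for a fixed problem size, so communication takes a larger share of the work as processors are added.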























- Normal cache tags can be used for snooping
- Valid bit per block makes invalidation easy
- Read misses are easy, since they rely on snooping
- Writes ⇒ need to know whether any other copies of the block are cached
  - No other copies ⇒ no need to place the write on the bus for a write-back (WB) cache
  - Other copies ⇒ need to place an invalidate on the bus
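
The write rule above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the protocol as specified in the slides: the `Bus` and `Cache` classes are hypothetical, each block carries an extra `shared` bit (an assumption) so a writer knows whether other copies exist, a write to a block that is not cached first fetches it through a read miss, and supplying dirty data to other caches is left out.

```python
class Bus:
    """Hypothetical shared bus that every attached cache snoops."""

    def __init__(self):
        self.caches = []
        self.log = []  # bus transactions in the order they were placed

    def read_miss(self, requester, addr):
        """Place a read miss on the bus; return True if another cache holds addr."""
        self.log.append(("read_miss", requester.name, addr))
        holders = [c for c in self.caches if c is not requester and c.holds(addr)]
        for cache in holders:
            cache.blocks[addr]["shared"] = True  # snooping caches note the new copy
        return bool(holders)

    def invalidate(self, requester, addr):
        """Place an invalidate on the bus; other caches clear their valid bit."""
        self.log.append(("invalidate", requester.name, addr))
        for cache in self.caches:
            if cache is not requester and cache.holds(addr):
                cache.blocks[addr]["valid"] = False


class Cache:
    """Hypothetical write-back cache with valid, shared, and dirty bits per block."""

    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.blocks = {}  # addr -> {"valid": bool, "shared": bool, "dirty": bool}
        bus.caches.append(self)

    def holds(self, addr):
        return self.blocks.get(addr, {}).get("valid", False)

    def read(self, addr):
        if not self.holds(addr):
            shared = self.bus.read_miss(self, addr)
            self.blocks[addr] = {"valid": True, "shared": shared, "dirty": False}

    def write(self, addr):
        if not self.holds(addr):
            self.read(addr)  # simplification: fetch the block before writing
        block = self.blocks[addr]
        if block["shared"]:
            # Other copies exist: an invalidate must go on the bus.
            self.bus.invalidate(self, addr)
            block["shared"] = False
        # No other copies: the write stays local to the write-back cache.
        block["dirty"] = True


if __name__ == "__main__":
    bus = Bus()
    a, b = Cache("A", bus), Cache("B", bus)
    a.read(0x40)
    a.write(0x40)  # A holds the only copy, so nothing is placed on the bus
    b.read(0x40)
    b.write(0x40)  # A also holds a copy, so B places an invalidate on the bus
    for transaction in bus.log:
        print(transaction)
```

In this sketch the first write generates no bus traffic because no other cache holds the block, while the second write triggers an invalidate that clears the valid bit in cache A.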











