Out-of-Order Algorithm
Published:
Scoreboard and Tomasulo algorithms
References
Scoreboard
Tomasulo
Lecture4 CS256 1996 EECS UCB
Scoreboard
Structure
The scoreboard is a centralized control unit.
The scoreboard has three parts:
- instruction status (issue -> read operands -> execution complete -> write result)
- function unit status, one entry per function unit, with 9 fields: Busy, Op, Dest (register NUMBER), 2 Src (register NUMBER), 2 FU producing the sources, 2 Ready flags. If a source register is not ready, the field naming the function unit that produces it is set.
- register result status
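The three tables above can be sketched as plain data structures. This is a minimal sketch, not a reference implementation: the field names `fi`, `fj`, `fk`, `qj`, `qk`, `rj`, `rk` follow the common textbook naming and are an assumption, as are the unit names and the `issue` helper, which only fills in the bookkeeping and does no hazard checking.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FunctionUnitStatus:
    """One row of the function unit status table (9 fields)."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None  # destination register NUMBER
    fj: Optional[str] = None  # source register NUMBERs
    fk: Optional[str] = None
    qj: Optional[str] = None  # function units producing the sources
    qk: Optional[str] = None
    rj: bool = False          # ready flags for the two sources
    rk: bool = False


def issue(fus: Dict[str, FunctionUnitStatus],
          result_status: Dict[str, str],
          unit: str, op: str, dest: str, src1: str, src2: str) -> None:
    """Record an issued instruction (hypothetical helper, no hazard checks)."""
    fu = fus[unit]
    fu.busy, fu.op, fu.fi, fu.fj, fu.fk = True, op, dest, src1, src2
    # A source is pending if some unit is registered as its producer.
    fu.qj = result_status.get(src1)
    fu.qk = result_status.get(src2)
    fu.rj = fu.qj is None
    fu.rk = fu.qk is None
    # Register result status: which unit will write each register.
    result_status[dest] = unit


fus = {"Mult1": FunctionUnitStatus(), "Add": FunctionUnitStatus()}
result_status: Dict[str, str] = {}
issue(fus, result_status, "Mult1", "MUL", "F0", "F2", "F4")
issue(fus, result_status, "Add", "ADD", "F6", "F0", "F8")  # F0 comes from Mult1
```

Note that the table stores only register numbers and producer names; the values themselves stay in the register file.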
Execution features
Structural and WAW hazards (WAW: prevent an early write) are checked at issue. The current instruction is compared against the function unit status (is the required unit busy, and does an active instruction already have the same destination register?). If either hazard exists, the instruction is stalled and not issued.
RAW hazards are checked once the instruction has been issued. The register result status and the corresponding instruction status determine the function unit status fields, more specifically whether each source register is ready or not. If a source is not ready, the instruction waits in the issued state and does not read its operands.
WAR hazards (prevent an early write) are checked before write result. The destination register of the current instruction is compared against the source register fields in the function unit status of previous instructions that have not yet read their operands. If a hazard is detected, the instruction is stalled after execution complete and does not write its result.
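The three checks above can be written as small predicates. This is a sketch under assumptions: the `FU` record, the function names, and the convention that a ready flag means "ready and not yet read" (cleared once the operand has been read) are illustrative choices, not taken from a specific implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FU:
    busy: bool = False
    dest: Optional[str] = None
    src1: Optional[str] = None
    src2: Optional[str] = None
    ready1: bool = False  # source ready and not yet read
    ready2: bool = False


def can_issue(fus: Dict[str, FU], result_status: Dict[str, str],
              unit: str, dest: str) -> bool:
    # Structural hazard: the unit is busy. WAW: an active instruction
    # is already registered as the writer of the same destination.
    return not fus[unit].busy and dest not in result_status


def can_read_operands(fu: FU) -> bool:
    # RAW: wait until both sources are ready.
    return fu.ready1 and fu.ready2


def can_write_result(fus: Dict[str, FU], unit: str) -> bool:
    # WAR: wait while another unit still has to read the old value
    # of our destination register.
    dest = fus[unit].dest
    for name, other in fus.items():
        if name == unit or not other.busy:
            continue
        if (other.src1 == dest and other.ready1) or \
           (other.src2 == dest and other.ready2):
            return False
    return True


# DIV F10, F0, F6 waits for F0; ADD F6, F8, F2 has finished executing.
fus = {
    "Div": FU(True, "F10", "F0", "F6", False, True),
    "Add": FU(True, "F6", "F8", "F2", False, False),
    "Mult": FU(),
}
result_status = {"F10": "Div", "F6": "Add"}

waw_blocked = not can_issue(fus, result_status, "Mult", "F6")
raw_blocked = not can_read_operands(fus["Div"])
war_blocked = not can_write_result(fus, "Add")
```

In this example the ADD cannot write F6 until the DIV has read the old F6, even though the two instructions share no true data dependency.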
Cons
- stalls on WAW and WAR hazards, even though these are false data dependencies
- limited instruction window size
- out-of-order commit, which makes debugging confusing for the programmer when an interrupt or exception is raised
- low function unit utilization
- no forwarding (the register file becomes a bottleneck)
Tomasulo
Structure
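Control and buffers are distributed with the function units, in reservation stations.
Each function unit result is broadcast to all reservation stations.
- 3 instruction statuses (issue -> execution complete -> write result)
In Tomasulo, the reservation stations decouple the read-operand step (register file access) from the instruction status pipeline: operands are captured at issue or from a broadcast, so there is no separate read-operand stage.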
Reservation Component
A reservation station stores register values rather than register numbers, which is a major difference from the scoreboard.
- 8 fields (distributed): Busy, Op, 2 Src (register VALUE), 2 RS producing the sources, 2 Ready flags
- Register result status (centralized): indicates which reservation station will write each register, if one exists.
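To make the difference from the scoreboard concrete, here is a minimal sketch of a reservation station entry and of issue. The field names `vj`, `vk`, `qj`, `qk`, `rj`, `rk`, the station names, and the `issue` helper are assumptions for illustration; the point is that a ready source is copied by VALUE at issue time.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ReservationStation:
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None  # source VALUEs
    vk: Optional[float] = None
    qj: Optional[str] = None    # reservation stations producing the sources
    qk: Optional[str] = None
    rj: bool = False            # ready flags
    rk: bool = False


def issue(stations: Dict[str, ReservationStation],
          regs: Dict[str, float], reg_status: Dict[str, str],
          name: str, op: str, dest: str, src1: str, src2: str) -> None:
    """Issue into a free station (hypothetical helper)."""
    rs = stations[name]
    rs.busy, rs.op = True, op
    rs.qj = reg_status.get(src1)
    rs.qk = reg_status.get(src2)
    rs.rj, rs.rk = rs.qj is None, rs.qk is None
    # Ready sources are copied now; pending ones wait for a broadcast.
    rs.vj = regs[src1] if rs.rj else None
    rs.vk = regs[src2] if rs.rk else None
    # The destination register is renamed to this station.
    reg_status[dest] = name


regs = {"F0": 1.0, "F2": 2.0, "F4": 4.0, "F6": 6.0, "F8": 8.0}
reg_status: Dict[str, str] = {}
stations = {n: ReservationStation() for n in ("Mult1", "Add1", "Add2")}

issue(stations, regs, reg_status, "Mult1", "MUL", "F0", "F2", "F4")
issue(stations, regs, reg_status, "Add1", "ADD", "F6", "F0", "F8")
issue(stations, regs, reg_status, "Add2", "SUB", "F8", "F2", "F4")  # WAR on F8
```

The SUB writes F8 while the earlier ADD still needs the old F8, but `Add1` already holds the value 8.0, so no stall is needed.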
Execution features
Pros
- multiple reservation station entries reduce structural hazards
- register renaming (when an instruction is issued, the values of available source registers are copied directly into the reservation station) keeps WAR and WAW hazards (false data dependencies) from stalling the pipeline
- values are forwarded from one reservation station to another (avoiding the register file bottleneck)
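The forwarding in the last point happens through the result broadcast. A minimal sketch, assuming reservation stations are plain dictionaries and that `broadcast` is a hypothetical helper standing in for one CDB transfer:

```python
from typing import Dict


def broadcast(stations: Dict[str, dict], regs: Dict[str, float],
              reg_status: Dict[str, str], producer: str, value: float) -> None:
    """Deliver one result to every waiting station and register."""
    for rs in stations.values():
        # A station waiting on this producer captures the value directly,
        # without going through the register file.
        if rs["qj"] == producer:
            rs["vj"], rs["qj"] = value, None
        if rs["qk"] == producer:
            rs["vk"], rs["qk"] = value, None
    # Only registers still renamed to this producer are written.
    for reg, owner in list(reg_status.items()):
        if owner == producer:
            regs[reg] = value
            del reg_status[reg]
    stations[producer]["busy"] = False


stations = {
    "Mult1": {"busy": True, "vj": 2.0, "vk": 4.0, "qj": None, "qk": None},
    "Add1": {"busy": True, "vj": None, "vk": 8.0, "qj": "Mult1", "qk": None},
}
regs = {"F0": 1.0}
reg_status = {"F0": "Mult1", "F6": "Add1"}

broadcast(stations, regs, reg_status, "Mult1", 2.0 * 4.0)
```

After the broadcast, `Add1` has both operands and can start executing in the same step that F0 is updated.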
Cons
- resource contention (different function units contend for the CDB when writing results back to the reservation stations)
- hardware area overhead (hardwired ports)
Reorder buffer
The reorder buffer (ROB) was introduced to improve on the previous two algorithms, in particular when branch instructions occur. The ROB ensures in-order commit of instructions that were executed out of order.
Architecture changes when Tomasulo is extended with a ROB
- The CDB is connected to the ROB rather than directly to the reservation stations.
- The ROB stores the write-back values from the function units, and these values can be bypassed to the reservation stations.
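In-order commit can be sketched as a FIFO queue whose head must be finished before anything leaves. This is a simplified model under assumptions: the `ReorderBuffer` class and its method names are hypothetical, and branches, exceptions, and bypassing to reservation stations are left out.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional


@dataclass
class ROBEntry:
    dest: str
    value: Optional[float] = None
    done: bool = False


class ReorderBuffer:
    def __init__(self) -> None:
        self.entries: Deque[ROBEntry] = deque()

    def allocate(self, dest: str) -> ROBEntry:
        """Reserve an entry at issue, in program order."""
        entry = ROBEntry(dest)
        self.entries.append(entry)
        return entry

    def write_back(self, entry: ROBEntry, value: float) -> None:
        """Store a result in the ROB; this may happen out of order."""
        entry.value, entry.done = value, True

    def commit(self, regs: Dict[str, float]) -> List[str]:
        """Retire finished entries from the head only, in program order."""
        committed = []
        while self.entries and self.entries[0].done:
            entry = self.entries.popleft()
            regs[entry.dest] = entry.value
            committed.append(entry.dest)
        return committed


regs: Dict[str, float] = {}
rob = ReorderBuffer()
mul = rob.allocate("F0")   # older instruction
add = rob.allocate("F6")   # younger instruction

rob.write_back(add, 14.0)  # the younger one finishes first
first = rob.commit(regs)   # nothing retires: the head is not done

rob.write_back(mul, 8.0)
second = rob.commit(regs)  # both retire, oldest first
```

Because the register file is only updated at commit, an interrupt or mispredicted branch can discard the unfinished tail of the buffer without leaving partial state behind.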