One goal is threading to hide pipeline depth, combined with an ISA that makes branching cheap. Specifically: absolute branching with the immediate destination in the opcode word (single-word branch/call, no adder), and instructions that include a return flag (making returns free). This enables easy computed branches, both for jump tables and for loops. The loop check can be factored out into a hierarchical call tree:
Do4: Unroll work four times; Return;
Do16: Call Do4; Call Do4; Call Do4; Jump Do4;
Do64: Call Do16; Call Do16; Call Do16; Jump Do16;
... etc ...
Can use a computed branch to jump into the tree for other loop counts.
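The call tree and the computed entry can be checked with a small simulation. This is only a sketch under assumptions: the tuple encoding `(op, target, ret_flag)`, the addresses `DO4`/`DO16`/`DO64`, and the helpers `run` and `entry_for` are all made up for illustration and do not describe any real instruction set. It counts how many units of work execute for a given entry address.

```python
# Toy model of a hypothetical ISA (illustrative only, not a real machine).
# Each instruction is (op, target, ret_flag):
#   WORK does one unit of work, CALL pushes pc+1 and branches to an absolute
#   target, JUMP branches to an absolute target. If ret_flag is set, the
#   instruction also returns after executing, so the return costs nothing extra.
WORK, CALL, JUMP = "work", "call", "jump"

# Assumed absolute addresses of the three routines.
DO4, DO16, DO64 = 0, 4, 8

PROGRAM = (
    # Do4: work unrolled four times, return flag on the last instruction.
    [(WORK, None, False)] * 3 + [(WORK, None, True)]
    # Do16: three calls, then a jump (tail call) into Do4.
    + [(CALL, DO4, False)] * 3 + [(JUMP, DO4, False)]
    # Do64: three calls, then a jump (tail call) into Do16.
    + [(CALL, DO16, False)] * 3 + [(JUMP, DO16, False)]
)


def run(entry):
    """Execute from `entry` and return the number of work units performed."""
    pc, stack, work = entry, [], 0
    while True:
        op, target, ret = PROGRAM[pc]
        if op == WORK:
            work += 1
            pc += 1
        elif op == CALL:
            stack.append(pc + 1)
            pc = target
        else:  # JUMP
            pc = target
        if ret:
            if not stack:
                return work  # returning with an empty stack ends the run
            pc = stack.pop()


def entry_for(base, children):
    """Computed entry: skip leading slots so only `children` (1..4) run."""
    return base + (4 - children)


print(run(DO4), run(DO16), run(DO64))
print(run(entry_for(DO16, 3)), run(entry_for(DO64, 2)))
```

Entering `Do16` one slot late runs three `Do4` bodies (12 units), and entering `Do64` two slots late runs two `Do16` bodies (32 units). In this model there is no loop counter or compare anywhere; the count is fixed entirely by where execution enters the tree.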