Dumping some thoughts on CPU design: one way to design for higher IPC (ILP) with something similar to a dual-stack Forth machine...
This started from the thought that it might be possible to apply an almost zero-operand ISA design to a CPU designed for IPC, packing either 3 or 4 operations into a single 32-bit instruction word. This form of instruction compression has all sorts of advantages: it reduces instruction bandwidth and, as a side effect, reduces the wires and ports required to implement the machine. I started thinking about a fictional implementation of a 32-bit integer-only computer, something which, thanks to a minimal ISA, could in theory be JIT compiled on a conventional CPU via just a lookup table (which would make for very fast emulation). However, in this post I'm deviating from the idea of emulation and throwing in ideas which would be better suited to dedicated hardware (like using per-slot data stacks instead of just registers).
Each instruction is a fixed 32 bits in size.
Each instruction holds 4 slots {0,1,2,3}.
Each slot has an 8-bit opcode.
11111111111111110000000000000000
FEDCBA9876543210FEDCBA9876543210
MSB                          LSB
........................ssssssss 8-bit opcode for slot 0
................tttttttt........ 8-bit opcode for slot 1
........uuuuuuuu................ 8-bit opcode for slot 2
vvvvvvvv........................ 8-bit opcode for slot 3
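Rough C sketch of the lookup-table emulation idea (names made up, branches and literals ignored for the moment): each 32-bit word is just four bytes, and each byte indexes a 256-entry handler table.

#include <stdint.h>

typedef void (*slot_fn)(int slot, uint8_t op); // handler per 8-bit opcode
static slot_fn handlers[256];                  // filled with one handler per opcode

static void exec_word(uint32_t insn)
{
    // Hardware would run the four slots in parallel; an emulator can walk
    // them in order as long as cross-slot reads use prior-clock state.
    for (int slot = 0; slot < 4; slot++) {
        uint8_t op = (uint8_t)(insn >> (8 * slot)); // slot 0 sits in the LSB
        handlers[op](slot, op);                     // one table lookup per slot
    }
}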
Slots share the following 32-bit registers,
return stack pointer : R
instruction pointer : I
Each slot {0,1,2,3} has its own 32-bit registers,
top registers : {S,T,U,V}
2nd registers : {K,L,M,N}
address registers : {W,X,Y,Z}
The "top" is the top of a very short per-slot data stack (which loops around on overflow, and has no memory backing). The "2nd" registers are just the 2nd item on the data stack. This split data and return stack shares a lot of similarities in structure to Color Forth style hardware. The reads of top and address registers of other slots, would see the value from the prior clock. So think of all the slots executing in parallel.
Dedicated address registers provide some interesting properties. It becomes possible to limit the opcode space and the operations applied to them, and to auto-prefetch cache lines into L0 (something much closer than L1) whenever an address register is set. So half the opcode space goes to 32-bit loads and stores, with an immediate range large enough to reach across a complete cache line. The # column below counts how many of the 256 opcode encodings each form takes up. NOTE: addressing is by 32-bit word, not by byte! A load pushes the fetched value on the slot's data stack (so 2nd ends up being the old top). A store pops the value off the slot's data stack (so top ends up being the prior 2nd, etc). Grabbing a 32-bit literal can be done with one opcode "#", which takes the next instruction word as data and advances the instruction pointer past it. There is also a secondary load path which uses a top register as the address. The . is a placeholder for opcode bits I didn't feel like fleshing out.
ENCODING ASM  #  MEANING
-------- ---- -- -------
00ssiiii si@  64 top(%)=[adr(s)+i]
01ssiiii si!  64 [adr(s)+i]=top(%)
1.....ss s@   4  top(%)=[top(s)]
1....... #    1  top(%)=next instruction word
% = current slot
s = 2-bit slot index
i = 4-bit unsigned immediate
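Continuing the sketch, the load/store half of the opcode space could be handled like this (word addressing, field layout straight from the encoding column, memory is just a word array here):

static uint32_t mem[1 << 20]; // word-addressed memory for the sketch

// 00ssiiii  si@ : push mem[adr(s)+i] onto the current slot's data stack
// 01ssiiii  si! : pop the current slot's data stack into mem[adr(s)+i]
static void ldst(int slot, uint8_t op)
{
    int s      = (op >> 4) & 3;       // 2-bit slot index
    uint32_t i = op & 0xF;            // 4-bit unsigned word offset (covers a cache line)
    uint32_t a = adr_of(slot, s) + i; // setting adr(s) is what would trigger the L0 prefetch

    if (((op >> 6) & 3) == 0)         // 00...... : load
        push(slot, mem[a]);
    else                              // 01...... : store
        mem[a] = pop(slot);
}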
Integer operations take advantage of the slot-implied destination and first source, using either another slot's "top" or the current slot's "2nd" as the second source. Using "2nd" consumes the value from the slot's local data stack; using another slot's "top" does not consume the value. This supports cross-slot source reads without taking much opcode space. I left out shifts and other operations, just providing a few examples below,
ENCODING ASM  #  MEANING
-------- ---- -- -------
1.....ss s    4  top(%)=src(s)
1.....ss s+   4  top(%)=top(%)+src(s)
1.....ss s*   4  top(%)=top(%)*src(s)
1.....ss s&   4  top(%)=top(%)&src(s)
1.....ss s|   4  top(%)=top(%)|src(s)
1.....ss s^   4  top(%)=top(%)^src(s)
1....... -    1  top(%)=-top(%)
1....... ~    1  top(%)=~top(%)
% = current slot
s = 2-bit slot index
src(s) := if(s==%) 2nd(%) else top(s)
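Continuing the sketch, src(s) plus a couple of the ALU handlers; the only subtle part is that the current slot's 2nd is consumed while another slot's top is only read:

// src(s) := if(s==%) 2nd(%) else top(s)
static uint32_t src(int slot, int s)
{
    if (s == slot) {
        // 2nd(%) is consumed from the local stack; with a deeper stack the
        // items below it would shift up, here it simply goes stale.
        return cur[slot].snd;
    }
    return prev[s].top; // another slot's top, prior clock, not consumed
}

// Example handlers; destination and first source are the slot's own top.
static void op_add(int slot, uint8_t op) { cur[slot].top += src(slot, op & 3); }
static void op_xor(int slot, uint8_t op) { cur[slot].top ^= src(slot, op & 3); }
static void op_neg(int slot, uint8_t op) { (void)op; cur[slot].top = -cur[slot].top; }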
Address register operations are separate from the standard ALU ops. Supporting the ability to write to another slot's address register unlocks working on address registers from any slot. Setting an address register pops the top off the current slot's local data stack; fetching an address register pushes the value onto it.
ENCODING ASM  #  MEANING
-------- ---- -- -------
1.....ss sP   4  top(%)=adr(s)
1.....ss sP=  4  adr(s)=top(%)
1.....ss sP+= 4  adr(s)=adr(s)+top(%)
% = current slot
s = 2-bit slot index
src(s) := if(s==%) 2nd(%) else top(s)
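And the three address register ops in the same sketch. Whether sP+= also pops the stack is my guess; the table only says it reads top, so I left top unpopped here:

// 1.....ss  sP   : push adr(s) onto the current slot's data stack
// 1.....ss  sP=  : pop the current slot's data stack into adr(s)
// 1.....ss  sP+= : add the current slot's top into adr(s)
static void op_adr_get(int slot, uint8_t op) { push(slot, adr_of(slot, op & 3)); }
static void op_adr_set(int slot, uint8_t op) { cur[op & 3].adr = pop(slot); }      // this is where the L0 prefetch could kick off
static void op_adr_add(int slot, uint8_t op) { cur[op & 3].adr += cur[slot].top; } // reads top without popping, as written in the table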
Branching acts as a terminator of the 4-opcode packed 32-bit instruction word. Slots after a branch become nops, and their space is reused as a literal for the branch itself, giving a 24-bit, 16-bit, 8-bit, or 0-bit displacement depending on which slot holds the branch.
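Sketch of how the displacement falls out of the packing, assuming it is sign extended (not spelled out above): the bytes above the branch's slot form the displacement, so a branch in slot 0 gets 24 bits, slot 1 gets 16, slot 2 gets 8, and slot 3 gets none.

#include <stdint.h>

// Displacement bits are the bytes of the word above the branch's slot.
static int32_t branch_disp(uint32_t insn, int slot)
{
    int bits = 8 * (3 - slot);               // 24, 16, 8 or 0
    if (bits == 0)
        return 0;                            // branch in slot 3: no displacement field
    uint32_t raw = insn >> (8 * (slot + 1)); // bytes above the branch opcode
    return (int32_t)(raw << (32 - bits)) >> (32 - bits); // sign extend
}

// The emulator's word loop would then skip the remaining slots and do
// something like: if (taken) I += branch_disp(insn, slot);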