Quantcast
Channel: Timothy Lottes
Viewing all articles
Browse latest Browse all 434

Thoughts on the Evolution of Processor Design

$
0
0
Feels like the fundamental limiter in the evolution of processor design is the {load,alu,store} design paradigm: the separation of memory and ALU at all scales. A CPU is effectively like having billions of people each with a mailbox to store data, each routing the data to just one single person (out of billions) with a calculator doing computation. As CPUs have evolved, there has only been a tiny increase in the number of people with calculators. Then the GPU enters the timeline, providing a substantial increase in the number of people with calculators, but this increase is still relatively tiny with respect to the number of people routing data to and from the mailboxes. I'm wondering if perhaps all people should just have calculators. Looking at some numbers,

Chip Capacity in Flop per Clock per Transistor
Using numbers from Wikipedia for Fury X,

8601 Gflop/s
8900 Mtransistors
1050 MHz

Capacity for upwards of 8000 flops each clock, but with around 1 million transistors per flop.

Science Fiction Version of the Suggested Paradigm Shift
A completely science fiction version of the suggested paradigm shift might be a chip with 256 MB of memory, divided into 32 million 64-bit cells, with each cell doing one 64-bit uber-operation every 64 clocks (bit per clock), clocked at a relatively low clock rate like 500 Mhz: providing something like 250,000,000,000,000 uber-ops per second. The board composed of 3D stacks of these chips connected by TSVs, stacks connected by interposers (like HBM). Board might have something like 16 stacks, providing 4,000,000,000,000,000 uber-ops per second. The local parallel just-in-time compile step configures cells to work around bad cells, yield problems go away. The mindset used to program the machine is quite different. Data constantly flows around chip as it is filtered by computation. The organization of data constantly changing to adapt to locality of reference. Programs reconfigure parts of the chip at run-time to solve problems. Reconfigure of a cell is basically adjusting which neighborhood connections are inputs to the uber-op, and the properties of the uber-op.


Viewing all articles
Browse latest Browse all 434

Trending Articles