Forth Hardware Thoughts

James Bowman's FPGA-based J1 : Site | PDF | Presentation | Forth Source
Chuck Moore : Arithmetic | Instruction Set | Ether Forth | Problem Oriented Language

GA144
GreenArrays
144 cores
9216 18-bit words of memory
21.3 mm^2 area on 180 nm process
0.65 watts at peak
666 MHz peak instruction rate

At 180 nm, roughly 20 GA144s would fit in large GPU area: 144 cores * 20x = 2880 cores
At 180 nm, roughly 380 GA144s would fit in large GPU 250 watt budget: 144 cores * 380x = 54,720 cores
At 28 nm, assuming 40x smaller area than 180 nm (linear shrink squared, (180/28)^2 ≈ 41), in large GPU die: 144 cores * 20x * 40x = 115,200 cores
115,200 cores * 64 words/core = 7,372,800 18-bit words of memory
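
A quick Python sketch of the same back-of-envelope math (the 20x, 380x, and 40x factors are the rough assumptions above, not measured values):

# Back-of-envelope scaling of GA144-style cores, using the assumptions above.
GA144_CORES = 144
GA144_WORDS_PER_CORE = 64        # 9216 words / 144 cores
AREA_RATIO_VS_LARGE_GPU = 20     # assumed: ~20 GA144s fit in a large 180 nm GPU die
POWER_RATIO_VS_250W = 380        # assumed: ~380 GA144s fit a 250 watt budget (380 * 0.65 W ~= 247 W)
DENSITY_180NM_TO_28NM = 40       # assumed: ~(180/28)^2 ~= 41x area shrink

cores_by_area_180nm  = GA144_CORES * AREA_RATIO_VS_LARGE_GPU        # 2,880
cores_by_power_180nm = GA144_CORES * POWER_RATIO_VS_250W            # 54,720
cores_28nm           = cores_by_area_180nm * DENSITY_180NM_TO_28NM  # 115,200
words_28nm           = cores_28nm * GA144_WORDS_PER_CORE            # 7,372,800 18-bit words

print(cores_by_area_180nm, cores_by_power_180nm, cores_28nm, words_28nm)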

The GA144 runs asynchronously, but has a peak instruction rate roughly 3x higher than GPUs of the 180 nm era (based on Wikipedia numbers). The point of this thought experiment was to roughly imagine how a forth-based machine would scale in an alternative timeline where such machines had been commercially successful. It seems possible to scale to over 100 K cores at 28 nm. These forth cores don't directly compare to GPU cores. For example, a GA144 36-bit multiply result takes 18 +* operations: 115,200/18 = 6,400 multiplies/clock, and forth hardware is designed around rational math instead of floating point. It seems possible that in terms of raw arithmetic the forth machine would be competitive, if problems were solved in a "parallel forth" way. However, in terms of programmable logic, the forth machine would likely be over an order of magnitude faster. Modern machines tend to use area and pipelining to make expensive operations (like multiply-add) run fast, while the GA144 effectively micro-codes them, keeping area low and throughput much higher for inexpensive operations.
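
To make the micro-coded multiply concrete, here is a minimal Python model of a +* style shift-add multiply step (unsigned only, and simplified relative to the real F18A semantics): 18 steps over 18-bit operands produce the 36-bit product in T:A.

WORD_BITS = 18
MASK = (1 << WORD_BITS) - 1

def multiply_step(t, a, s):
    # One simplified +*-style step: conditionally add S into T, then shift T:A right one bit.
    total = t + s if (a & 1) else t                   # up to 19-bit intermediate sum
    a = ((total & 1) << (WORD_BITS - 1)) | (a >> 1)   # carry low bit of T into high bit of A
    t = total >> 1
    return t, a

def multiply(m, n):
    # 18x18 -> 36-bit unsigned multiply via 18 micro-coded steps (result in T:A).
    t, a, s = 0, m & MASK, n & MASK
    for _ in range(WORD_BITS):
        t, a = multiply_step(t, a, s)
    return (t << WORD_BITS) | a

assert multiply(0x3FFFF, 0x3FFFF) == 0x3FFFF * 0x3FFFF
assert multiply(12345, 67890) == 12345 * 67890

# Throughput implication: each full multiply costs 18 steps.
print(115_200 // 18)   # 6,400 multiplies per clock across the imagined 115,200 cores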

The imaginary scaled GA144 memory capacity looks plausible for a high ALU/MEM ratio. Note the GA144 only has 64 words of memory per core. Working this from a different perspective, the Epiphany V has 64 MB of on-chip memory. That 64 MB divided across 256 K forth-sized cores is again only 256 bytes of memory per core (or 64 32-bit words/core). Point being, if one wanted to scale to massive counts of simple cores, memory per core has to be tiny.
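
Spelling out that memory-per-core arithmetic (the 256 K core count is the hypothetical forth-sized reorganization above, not how the Epiphany V is actually organized):

EPIPHANY_V_ON_CHIP_BYTES = 64 * 1024 * 1024   # 64 MB of on-chip memory
HYPOTHETICAL_CORES = 256 * 1024               # 256 K forth-sized cores

bytes_per_core = EPIPHANY_V_ON_CHIP_BYTES // HYPOTHETICAL_CORES
print(bytes_per_core)        # 256 bytes per core
print(bytes_per_core // 4)   # 64 32-bit words per core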

Which brings up the ultimate question: is it possible to practically leverage the order of magnitude increase in performance for simple operations, when one needs to deconstruct every problem into such small tasks?
