One of my personal goals has been to revive the simple style of software engineering practiced before this age of the software complexity explosion: to maintain the spirit of building software for the machines of the era before the late 1980s, but apply it on modern hardware. Another way to look at this process is as an exercise in factoring complexity out of software while maintaining the ability to easily author content with as much complexity as desired. The critical point is that complexity lives in the content, not in the software. This post outlines some ideas I have leveraged over the years which might be helpful to others following a similar path.
This journey starts at the top of the software toolchain. Start by removing the ABI, the linker, the compiler, the assembler, and even the concept of source code as a text file which requires parsing. Start with the baseline: simple languages which work interactively in a few KB of RAM. Chuck Moore's Color Forth is one modern example: a language which lives in a pre-parsed binary form, paired with a simple editor and block-based storage.
Next add back only the complexity required to interface with the host OS and APIs required to do graphics/IO. One such realization of this can be a single file which contains {ELF header, interpreter, source and block data}. The file carves out physical storage for blocks of data. The binary effectively "boots" an alternative operating system which leverages the host OS's APIs instead of the hardware to run on the machine.
In order to keep everything self-contained in one file, that file must be both executing and writable, which is not allowed on many operating systems. The fix for this on Linux, for example, is to have the program first copy its own binary using sendfile, then {fork, setsid, execve the copy, and exit the non-copy}. I like to use '_' at the end of the copy's filename so that it is easy to check the end of the argv[0] string at runtime to see whether the binary is the copy. The binary can then read/write blocks inside the original executable file. Adding more physical storage to the system is as simple as writing extra blocks at the end of the file and adjusting one 64-bit word in the ELF program header. There are some interesting advantages to this kind of setup. Configuration settings can be stored right back into the executable. Installing the program on any machine is a simple copy of one file.
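A minimal sketch of the copy-and-re-exec trick (hypothetical names, error handling trimmed; assumes Linux):

```c
// Sketch of the self-copy re-exec workaround on Linux. The spawned
// copy's name ends in '_', so argv[0] tells the program at startup
// whether it is the copy (free to write to the original file).
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

// Returns 1 when argv0 names the spawned copy (trailing '_').
static int is_copy(const char *argv0) {
    size_t n = strlen(argv0);
    return n > 0 && argv0[n - 1] == '_';
}

// Copy the executable, then replace this process with the copy.
static void respawn_as_copy(const char *self) {
    char copy[4096];
    snprintf(copy, sizeof copy, "%s_", self);
    int src = open(self, O_RDONLY);
    struct stat st;
    fstat(src, &st);
    int dst = open(copy, O_WRONLY | O_CREAT | O_TRUNC, 0755);
    off_t off = 0;
    sendfile(dst, src, &off, st.st_size);   // kernel-side file copy
    close(src);
    close(dst);
    if (fork() == 0) {                      // child becomes the copy
        setsid();                           // detach from the session
        char *argv[] = { copy, 0 };
        execve(copy, argv, 0);
    }
    _exit(0);                               // the non-copy exits
}
```

At startup the program would check `is_copy(argv[0])` and call `respawn_as_copy` only when running as the original.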
The alternative "OS" in this case would include a source interpreter (easily under 1KB) which interprets the source to build the rest of the machine code for the system at run-time. These systems are so small and fast that the full "JIT" is not perceptually noticeable. Likewise the editor programming cycle is fully interactive: bits of the program can be built from inside the program (an effectively zero-latency edit/try cycle). Part of this would include enough to assemble direct machine code to interface via the ABI, dynamic loading, and system calls to get access to the rest of the machine. One advantage of such a system for VR is that the program editor can be usable inside the virtual world (no cycle of removing the VR headset to get back to coding, then putting it back on to test again).
This is a great example of placing complexity into content. The overhead for the "compiled software", the {ELF header, the binary copy workaround, and source interpreter} can be under 2K in size. The "content" in this case is the rest of the system, which is in source ("data"), which is fully editable at run-time.
Some other useful posts,
Minimal x86-64 Elf Header For Dynamic Loading
Minimal Single Symbol ELF For Using GL
Example of Leveraging Brute Force For Vast Simplification
The Practical Assembly post outlines the style of interpreted language I prefer to use. However that post described the language in a text-based source form. A C implementation of that interpreter, including the built-in opcodes, comes to roughly 15K in size, including a gcc-bloated ELF header.
Even in a pre-parsed binary form similar to Color Forth, the interpreter would still require searching at run-time for a symbol-to-address mapping as each symbol is interpreted. Hashing with linear probing is one easy way to implement the search. In practice there are not that many symbols in simple systems; Color Forth, if I remember right, skips hashing and uses a simple linear search (more brute force, easier implementation). But why do that at all?
Given the performance of modern machines, why not just "link" during edit time?
The result is that the source holds the address of the symbol data, so there are no execute-time lookups. However, each time source is changed (insert, delete, move, or copy), the entire source gets "relinked" (symbol address fixups). The cost for 1MB of source, for example, is the time to stream through just 1MB of data. Given modern machine bandwidth, that is trivial.
In this example, source is simply 32-bit words of data where the top 4 bits of each word provide the meaning of the source word (tags) and the lower 28 bits usually provide the address of 64-bit symbol data. Here is an example tag encoding (note "stack" means an extra data stack, not the RSP or "call" stack),
0... - TEXT - 4 characters of ASCII text (ignored by the interpreter).
1000 - CALL - Fetch 64-bit value from 28-bit address and call it.
1001 - PUSH - Fetch 64-bit value from 28-bit address and push it on the stack.
1010 - ADDR - Push the 28-bit address on the stack (address of symbol).
1011 - NUMB - Push the 28-bit address on the stack (hex literal in source).
1100 - POP_ - Pop 64-bit value from stack, store in 28-bit address.
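The whole interpreter for this encoding fits in one small loop. A minimal sketch (hypothetical names; word-indexed data arrays instead of byte addressing, to keep it short):

```c
// Interpret pre-parsed 32-bit source words: top 4 bits are the tag,
// low 28 bits index into the 64-bit symbol data space.
#include <stdint.h>

static uint64_t data[1 << 16];   // 64-bit symbol data space
static uint64_t stack[256];      // the extra data stack
static int sp;

static void interpret(const uint32_t *src, int n) {
    for (int i = 0; i < n; i++) {
        uint32_t w = src[i];
        uint32_t a = w & 0x0fffffff;
        if ((w & 0x80000000u) == 0) continue;  // TEXT: ignored
        switch (w >> 28) {
        case 0x8:                               // CALL
            ((void (*)(void))(uintptr_t)data[a])();
            break;
        case 0x9: stack[sp++] = data[a]; break; // PUSH
        case 0xa: stack[sp++] = a; break;       // ADDR
        case 0xb: stack[sp++] = a; break;       // NUMB
        case 0xc: data[a] = stack[--sp]; break; // POP_
        }
    }
}
```

Note there is no parsing and no symbol lookup anywhere in the loop: the edit-time linking has already resolved every reference to a direct address.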
Unlike Forth, there is no dictionary; or rather, the dictionary is directly inside the source code. Symbol names are placed directly in the source as a pair of TEXT 32-bit words (8 characters total; the pair is 64-bit aligned). Symbol data is stored at a constant offset from that pair of TEXT source words. Effectively each source block has a data block at a constant offset. The binary file would contain {source blocks, and then data blocks in the zero-fill on load}. The editor can find the string name for a symbol by taking the 28-bit address and subtracting the size of the source. As source moves around due to editing, the source words with {CALL, PUSH, ADDR, POP_} tags get their 28-bit address adjusted for symbols which move.
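The relink sweep itself is just one pass over the source. A sketch, assuming the tag encoding above and an edit that moved every symbol at or beyond position `at` by `delta` words (names hypothetical):

```c
// Fix up the 28-bit address field of every source word that
// references a moved symbol. TEXT words (top bit clear) and NUMB
// literals carry no symbol address, so they are skipped.
#include <stdint.h>

static void relink(uint32_t *src, int n, uint32_t at, int32_t delta) {
    for (int i = 0; i < n; i++) {
        uint32_t w = src[i];
        uint32_t tag = w >> 28;
        // CALL, PUSH, ADDR, POP_ all hold symbol addresses.
        if (tag == 0x8 || tag == 0x9 || tag == 0xa || tag == 0xc) {
            uint32_t a = w & 0x0fffffff;
            if (a >= at)
                src[i] = (w & 0xf0000000u) | ((a + delta) & 0x0fffffff);
        }
    }
}
```

One branch per word and purely sequential access, which is why streaming through even a megabyte of source on every edit is not a concern.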