FORTH Dictionary and Word Definitions Structure
published: 26 April 2021 / updated 28 April 2021
To understand how the FORTH dictionary works, you first need to understand the structure of FORTH words.
Definition fields
An established terminology describes how a FORTH system registers words and their execution behavior:
- NAME-field, NFA
A FORTH word usually has a name, which can be used in source text to refer to the other parts associated with that word. The starting address of the NAME-field is called the NFA, the Name-Field Address.
- CODE-field, CFA
CODE-area, Execution. A FORTH word usually has an execution behavior, which is coded in the native instruction set of the local CPU. This is the primary code, and its starting address is usually called the CFA, the Code-Field Address.
- PFA
BODY, Primitives
- LFA
LINK, Threads
A classic FORTH word structure implementation:
The NFA (Name Field Address) field varies in size. It begins with a byte whose most significant bit is 1. The following bits are flags indicating the nature of the word (IMMEDIATE, COMPILE-ONLY ...). The least significant bits give the number of characters in the word's name. Example: "DUP" (3), "WORDS" (5) ... Coding the length on 4 bits limits a name to 15 characters.
On some implementations, one byte is reserved for flags and the next for the name length. This solution makes it possible to have very long word names.
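For the packed single-byte layout described above, the length and the flags can be separated with two masks. The following is a minimal sketch, assuming the bit layout 1 f f f l l l l (marker bit, three flag bits, four length bits). The words hdr>len and hdr>flags are hypothetical helpers, not part of FlashForth or of any standard.

```forth
\ Hypothetical helpers; assumed layout: bit 7 marker, bits 6-4 flags, bits 3-0 length
: hdr>len   ( hdr -- n )     $0f and ;   \ keep the 4 length bits
: hdr>flags ( hdr -- bits )  $70 and ;   \ keep the 3 flag bits

$85 hdr>len .      \ header byte of a 5-letter name: displays 5
$85 hdr>flags .    \ no flag bit set: displays 0
```

Which flag bit means IMMEDIATE or COMPILE-ONLY depends on the implementation, so the sketch only isolates the bits without naming them.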
The LFA (Link Field Address) field is 16 or 32 bits in size depending on the implementation. It points to the NFA of the word that precedes it in the dictionary:
The CFA (Code Field Address) field is 16 or 32 bits in size depending on the implementation. This field points to the executable code of the word.
The PFA (Parameter Field Address) field is of variable size. This field contains compiled FORTH data or code.
Access the fields of a definition
Let us see in practice, on a FlashForth version for the ARDUINO Nano, how to access the different fields of a FORTH word:
' words \ push cfa of "words" on stack
Here, ' words pushes the address of the CFA of the word words onto the data stack. We can see this address simply by popping it off the stack and displaying it:
' words u. 62196 ok<#,ram>
The address 62196 is the CFA of our word WORDS. This address can be executed with execute:
62196 execute
p2+ pc@ @p hi d. ud. d> d< d= d0< d0= dinvert d2* d2/ d- d+ dabs ?dnegate dnegate
s>d rdrop endit next for in, inline repeat while again until begin then else if
zfl pfl xa> >xa x>r dump .s words >pr .id ms ticks r0 s0 latest state bl 2- [']
-@ ; :noname : ] [ does> postpone create cr [char] ihere ( char ' lit abort" ?
abort ?abort? abort prompt quit true false .st inlined immediate shb interpret ' .........
The word c>n converts a CFA address to an NFA address:
' words c>n ok<#,ram> 62190
dup ok<#,ram> 62190 62190
c@ ok<#,ram> 62190 133
$0f and ok<#,ram> 62190 5
swap 1+ swap type \ display "words"
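The interactive steps above can be collected into a single definition. This is only a sketch under the same assumptions as the session (FlashForth, name length in the low 4 bits of the first NFA byte). The name .nfa is hypothetical and is not a FlashForth word.

```forth
\ .nfa is a hypothetical helper: display the name stored at an NFA
: .nfa ( nfa -- )
  dup c@ $0f and    \ nfa len     length from the low 4 bits of the header byte
  swap 1+ swap      \ nfa+1 len   skip the header byte
  type ;            \ display the name characters

' words c>n .nfa    \ should display: words
```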
The word n>c converts an NFA address to a CFA address.
Our FlashForth version on the ARDUINO Nano has no word to convert an NFA or CFA address into an LFA address. Analysis of the FlashForth AVR assembler source shows that you just need to subtract 2 from the NFA to get the LFA of our word:
' words ok<#,ram> 62196
c>n ok<#,ram> 62190
2- ok<#,ram> 62188
@ ok<#,ram> 62084
dup ok<#,ram> 62084 62084
@ ok<#,ram> 62084 16003
$0f and ok<#,ram> 62084 3
swap 1+ swap type >pr ok<#,ram>
Here, in the last line, we reached the word >pr, which is the word that words links to, that is, the word preceding it in the FORTH dictionary.
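The step from one word to its predecessor can be sketched as two small definitions. They assume what the session above shows for this FlashForth AVR build: the link field sits 2 bytes below the NFA and holds the NFA of the preceding word. The names n>l and prev-nfa are hypothetical.

```forth
\ Hypothetical helpers, derived from the session above (FlashForth on AVR assumed)
: n>l      ( nfa -- lfa )   2- ;       \ the link field sits 2 bytes below the NFA
: prev-nfa ( nfa -- nfa' )  n>l @ ;    \ fetch the NFA of the preceding word

' words c>n prev-nfa                   \ NFA of the word defined before words
dup c@ $0f and swap 1+ swap type       \ in the session above this displays >pr
```

Applying prev-nfa repeatedly would walk the dictionary backwards. The sketch stops after one step because the value that terminates the chain is not shown in the session.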
The word >body converts a CFA address to a PFA address.
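As an illustration in standard Forth terms (not verified on FlashForth): for a word defined with create, >body applied to its execution token returns the address of its data field, which is the same address the word pushes when it is executed. The name foo is arbitrary.

```forth
create foo  42 ,      \ foo's parameter field holds one cell containing 42
' foo >body @ .       \ fetch through the PFA: displays 42
foo @ .               \ executing foo pushes the same address: displays 42
```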
FlashForth word structure
Our few experiments suggest that the structure of a FORTH word in FlashForth is close to this:
We will detail this structure by analyzing our word words from a dump of its code:
hex ' words c>n 2- 20 dump
f2ec :84 f2 85 77 6f 72 64 73 77 df e2 db 66 de 5c de ...wordsw...f.\.
f2fc :e8 f6 02 d0 84 df 99 dd 9e de 0c 94 6a 39 ee f2 ............j9..
Highlighting the NFA of words:
f2ec :84 f2 85 77 6f 72 64 73 77 df e2 db 66 de 5c de ...wordsw...f.\.
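The first byte of the NFA, 85, gives the length of the word's name in its low-order 4 bits, here 5. The two bytes before it, 84 f2, form the link field: read as a little-endian 16-bit value they give $f284, that is 62084, the address fetched from the LFA in the session above.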
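The addresses seen in the earlier session can be recomputed from this dump line. The two lines below are a small check, assuming a 16-bit little-endian link field, as the dump suggests, and decimal as the display base.

```forth
decimal
$f2 $100 * $84 + u.    \ link bytes 84 f2 read little-endian: displays 62084
$f2ee 1+ 5 + u.        \ NFA + 1 header byte + 5 name bytes: displays 62196
```

The second result matches the value returned by ' words, which is consistent with the code of the word starting immediately after its name.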
Things get interesting when we analyze the field that follows our NFA:
f2ec :84 f2 85 77 6f 72 64 73 77 df e2 db 66 de 5c de ...wordsw...f.\.
Here is the disassembly of this piece of code:
Mmmmm.... There is no CFA!
In fact, we land directly in the compiled FORTH code. This succession of rcall instructions is explained by the source code of the word words, as defined in the assembly file ff-atmega.asm:
; WORDS -- filter
        fdw     TO_PRINTABLE_L
WORDS_L:
        .db     NFA|5,"words"
        rcall   BL
        rcall   WORD
        rcall   DUP
        rcall   DOLIT
        fdw     kernellink
        rcall   WDS1
        rcall   LATEST_
        rcall   FETCH_A
WDS1:   rcall   CR
        jmp     LIKES
FlashForth does not compile the CFAs of the words used in a definition. Instead, we find short (rcall) or long calls to the code of these words.
This technique eliminates the need for a Forth engine (the inner interpreter). Every relative call occupies the same memory size as a CFA. Execution is considerably faster than in a version of the FORTH language using indirect threading.