FORTH Dictionary and Word Definitions Structure

published: 26 April 2021 / updated 28 April 2021

To understand how the FORTH dictionary works, it helps to first understand the structure of FORTH words.

Definition fields

There is established terminology for describing how a Forth system records words and their execution behavior:

NAME-field, NFA
A Forth word usually has a name, which is used in source text to refer to the other parts associated with that word. The starting address of the NAME-field is called the NFA, the Name-Field Address.
CODE-field, CFA
CODE-area, Execution. A Forth word usually has an execution behavior, coded in the native instruction set of the local CPU. This is the primary code, and its starting address is usually called the CFA, the Code-Field Address.
PFA
BODY, Primitives
LFA
LINK, Threads

A classic FORTH word structure implementation:

The NFA (Name Field Address) field varies in size. It begins with a byte whose most significant bit is 1. The following bits are flags indicating the nature of the word (IMMEDIATE, COMPILE-ONLY ...). The least significant bits give the number of characters in the word's name, for example 3 for "DUP" and 5 for "WORDS". With a 4-bit length field, names are limited to 15 characters.

On some implementations, one byte is reserved for the flags and the next for the length. This makes very long names possible.
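For the packed single-byte layout described above, extracting the name length is a simple mask. The following is a minimal sketch; the helper name name-len is hypothetical, and the 4-bit length field is the one assumed in this description.

```forth
\ name-len is a hypothetical helper, not a standard word.
\ Assumes the packed layout: flags in the high bits, length in the low 4 bits.
: name-len  ( hdr -- n )  $0f and ;

$85 name-len .    \ a header byte with the top bit set and length 5: prints 5
```

The same mask is applied by hand in the FlashForth session later in this article.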

The LFA (Link Field Address) field is 16 or 32 bits in size depending on the implementation. It points to the NFA of the word that precedes it in the dictionary.

The CFA (Code Field Address) field is 16 or 32 bits in size depending on the implementation. This field points to the executable code of the word.

The PFA (Parameter Field Address) field is of variable size. This field contains data or compiled FORTH code.

Access the fields of a definition

Let us see in practice, on a FlashForth version for ARDUINO Nano, how to access the different fields of a FORTH word:

' words     \ push cfa of "words" on stack

Here, ' words pushes the CFA of the word words onto the data stack. We can display this address by popping it off the stack with u.:

' words u. 62196  ok<#,ram>

The address 62196 (f2f4 in hexadecimal) is the CFA of our word words. This address can be passed to execute:

62196 execute
p2+ pc@ @p hi d. ud. d> d< d= d0< d0= dinvert d2* d2/ d- d+ dabs ?dnegate
dnegate s>d rdrop endit next for in, inline repeat while again until begin then
else if zfl pfl xa> >xa x>r dump .s words >pr .id ms ticks r0 s0 latest state bl
2- ['] -@ ; :noname : ] [ does> postpone create cr [char] ihere ( char ' lit
abort" ?abort ?abort? abort prompt quit true false .st inlined immediate shb
interpret '
.........

The word c>n converts a CFA address to NFA:

' words c>n  ok<#,ram> 62190
dup  ok<#,ram> 62190 62190
c@  ok<#,ram> 62190 133
$0f and  ok<#,ram> 62190 5
swap 1+ swap type \ display "words"
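The steps of this session can be wrapped in a single definition. This is a minimal sketch: the name .name is hypothetical (it is not a FlashForth built-in), and it assumes, as in the session above, that the low 4 bits of the header byte hold the length and that the characters start at the next address.

```forth
\ .name is a hypothetical helper; it only repeats the steps typed above.
: .name  ( nfa -- )
  dup c@ $0f and     \ nfa len   : low 4 bits of the header byte
  swap 1+ swap       \ addr len  : first character follows the header byte
  type ;

' words c>n .name    \ displays "words"
```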

The word n>c converts an NFA address to CFA.

Our FlashForth version on ARDUINO Nano has no word to convert an NFA or a CFA into an LFA. Analysis of the FlashForth AVR assembler source shows that you just need to subtract 2 from the NFA to obtain the LFA of our word:

' words  ok<#,ram> 62196
c>n  ok<#,ram> 62190
2-  ok<#,ram> 62188
@  ok<#,ram> 62084
dup  ok<#,ram> 62084 62084
@  ok<#,ram> 62084 16003
$0f and  ok<#,ram> 62084 3
swap 1+ swap type >pr ok<#,ram>

Here, in the last line, we went back to the word >pr, which is the word that precedes words in the linked list of the FORTH dictionary.
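The step from one word to the previous one can be given a name. This is a sketch only: nfa>prev is a hypothetical name, and the 2-byte offset is the one observed in the session above for this FlashForth AVR version, so it may not hold on other targets.

```forth
\ nfa>prev is a hypothetical helper: the link field sits 2 bytes before the
\ NFA and holds the NFA of the previous word (assumption taken from the session).
: nfa>prev  ( nfa -- nfa' )  2- @ ;

' words c>n nfa>prev               \ NFA of the previous word (>pr in the session above)
dup c@ $0f and swap 1+ swap type   \ display its name
```

Applying nfa>prev repeatedly walks the dictionary from the newest definitions toward the oldest, which is the order in which words lists them.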

The word >body converts a CFA address to PFA.

FlashForth word structure

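These few experiments suggest that a FlashForth word is laid out roughly as follows: a 2-byte link field, then the name field (header byte followed by the characters of the name), then the code of the word.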

We will detail this structure by analyzing our word words from the dump of its code:

hex ' words c>n 2- 20 dump
f2ec :84 f2 85 77 6f 72 64 73 77 df e2 db 66 de 5c de ...wordsw...f.\.
f2fc :e8 f6 02 d0 84 df 99 dd 9e de 0c 94 6a 39 ee f2 ............j9..

Highlighting the NFA of words (the bytes 85 77 6f 72 64 73, starting at address f2ee):

f2ec :84 f2 85 77 6f 72 64 73 77 df e2 db 66 de 5c de ...wordsw...f.\.

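The first byte of the NFA, 85, combines the NFA marker bit ($80) with the length of the name in its low four bits, here 5. The two bytes before it, 84 f2, are the link field: read low byte first, they give f284 (62084), the NFA of >pr found earlier.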

Where it starts to get interesting is the analysis of what follows our NFA (the bytes 77 df e2 db 66 de 5c de, starting at address f2f4):

f2ec :84 f2 85 77 6f 72 64 73 77 df e2 db 66 de 5c de ...wordsw...f.\.

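Read as 16-bit words, low byte first, these bytes are df77, dbe2, de66 and de5c. On the AVR, an opcode of the form Dxxx is an rcall instruction, so this piece of code disassembles to four consecutive rcall instructions.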

Mmmmm.... There is no CFA!

In fact, we land directly in the compiled FORTH code. This succession of rcall instructions is explained when we look at the source code of the word words as defined in the assembly file ff-atmega.asm:

; WORDS    -- filter
        fdw     TO_PRINTABLE_L
WORDS_L:
        .db     NFA|5,"words"
        rcall   BL
        rcall   WORD
        rcall   DUP
        rcall   DOLIT
        fdw     kernellink
        rcall   WDS1
        rcall   LATEST_
        rcall   FETCH_A
WDS1:   rcall   CR
        jmp     LIKES

FlashForth does not compile the CFAs of the words used in a definition. Instead, we find short relative calls (rcall), or long calls, to the code of these words. This technique eliminates the need for a Forth engine (an inner interpreter). Each rcall occupies the same amount of memory as a CFA would. Execution is considerably faster than in a version of the FORTH language using indirect threading.
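Since the code starts right after the name, the address returned by ' can be recomputed from the NFA. In the dump, the name field at f2ee occupies 6 bytes (header byte plus 5 characters) and ' words returned 62196, which is f2f4. The sketch below uses a hypothetical helper, nfa>code, and assumes that the name is padded so that the code starts on an even address, as AVR instructions are 16-bit aligned. It is an illustration only; FlashForth already provides n>c for this conversion.

```forth
\ nfa>code is a hypothetical helper, shown only to illustrate the layout.
: nfa>code  ( nfa -- addr )
  dup c@ $0f and    \ nfa len
  + 1+              \ skip the header byte and the characters of the name
  1+ $fffe and ;    \ round up to an even address (assumed padding)

' words c>n nfa>code  ' words  =  .   \ a true flag means the assumption holds for words
```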