Chapter 1 Tokens, Symbols and Labels

What is a token?

Tokens are the basic interface to the internal Fig-Forth engine. Each token is stored as an indirect reference to the machine code that performs a specific function. Because tokens are defined as addresses within the user space, up to 64K of them are possible. All high-level words added to the dictionary space consist of lists of these tokens, along with occasional direct machine code where the need calls for it. During execution each token generates a CALL [address], although the machine code for the CALL itself is not included in the stored value. Token threads that call kernel routines are fully relocatable within the dictionary space: only two word pointers need to be changed to move a list of root functions to another location. (At present this is not true of words that call other dictionary words.)
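As a rough illustration of token threading, the Python sketch below models a thread as a list of 16-bit token values, each standing for an indirect CALL [address] into a kernel routine. The token addresses (0x0100 and so on), the three routines, and the little-endian storage of each token are assumptions for illustration; the real engine executes machine code, not Python functions.

```python
# Model of token threading: a thread is a list of 16-bit token values, and each
# token stands for an indirect CALL [token] into a kernel routine.
# The token addresses below are hypothetical, chosen only for illustration.

def two(stack):
    stack.append(2)

def dup(stack):
    stack.append(stack[-1])

def plus(stack):
    b, a = stack.pop(), stack.pop()
    stack.append((a + b) & 0xFFFF)    # 16-bit wraparound, like a machine word

KERNEL = {0x0100: two, 0x0102: dup, 0x0104: plus}

def store_thread(memory, addr, thread):
    """Write a thread into a byte image as little-endian 16-bit tokens (assumed)."""
    for i, token in enumerate(thread):
        memory[addr + 2 * i:addr + 2 * i + 2] = token.to_bytes(2, "little")

def run_thread(memory, addr, count, stack):
    """Fetch `count` tokens starting at `addr` and call each routine in turn."""
    for i in range(count):
        token = int.from_bytes(memory[addr + 2 * i:addr + 2 * i + 2], "little")
        KERNEL[token](stack)
    return stack

memory = bytearray(0x10000)                                 # one 64K address space
store_thread(memory, 0x2000, [0x0100, 0x0102, 0x0104])      # 2 DUP +
# Copy the same bytes elsewhere; the tokens themselves need no adjustment.
memory[0x3000:0x3006] = memory[0x2000:0x2006]
print(run_thread(memory, 0x3000, 3, []))                    # [4]
```

In this model the copy at 0x3000 runs without modifying any token, because each token is the absolute address of a kernel routine rather than an offset from the thread.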

Special Characters and Reserved Words

As in other languages, Forth uses special characters to trigger actions when they are parsed from the input stream. Most of these are described in the language index; however, the following actions take place internally within the compiler:

Table 1-1. Special Character Actions

Symbol

Condition

a .. z

Lower case letters. Unless enclosed by one of the quotation words, these characters are converted to upper case before the compiler interprets the input, provided the CASELOCK variable is zero.

&

The ampersand character overrides the number base for a single input value: the number immediately following the ampersand is read as a hexadecimal constant. After the number is parsed, the current base is restored to the value it had before the override.

<NULL>

The ASCII null character ends an input line, whether the source is a disk file or the keyboard. When Null is processed, Fig-Forth returns to the keyboard input interpreter.

<CR>

The carriage return character terminates a line of keyboard input and is replaced by Null before interpretation. Carriage return characters should not be included in an input file.

<LF>

The Line-Feed (Control-J) character is always ignored by the keyboard input processor. It should not be used in an input file.

<DEL>

The Delete or Backspace (Control-H) character removes the previous character from the keyboard input stream. It should not be used in an input file.

.

The period character has more than one role, as described in the Variables and Math chapter.
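The keyboard rules in Table 1-1 can be summarized as a small line-editing model. This Python sketch is an approximation under stated assumptions, not the kernel's code: it treats Backspace and Delete alike, approximates "enclosed by one of the quotation words" as text between double-quote characters, and does not model the period.

```python
# Model of the keyboard input rules in Table 1-1.  Simplifications (assumptions):
# quoted text is taken to run between double-quote characters, and the
# period's roles are not modelled.
NUL, BS, LF, CR, DEL = 0x00, 0x08, 0x0A, 0x0D, 0x7F

def fold_case(line):
    """Raise a..z to upper case outside double-quoted text."""
    out, quoted = [], False
    for ch in line:
        if ch == '"':
            quoted = not quoted
        out.append(ch if quoted else ch.upper())
    return "".join(out)

def read_keyboard_line(keys, caselock=0):
    """Apply the Table 1-1 rules to raw keystrokes; return the Null-ended line."""
    buf = []
    for ch in keys:
        code = ord(ch)
        if code == LF:                  # Line-Feed is always ignored
            continue
        if code in (BS, DEL):           # remove the previous character, if any
            if buf:
                buf.pop()
            continue
        if code in (CR, NUL):           # end of the input line
            break
        buf.append(ch)
    line = "".join(buf)
    if caselock == 0:                   # CASELOCK zero: fold to upper case
        line = fold_case(line)
    return line + chr(NUL)              # CR is replaced by Null

print(repr(read_keyboard_line("dup\n .\x7fswap\r")))   # 'DUP SWAP\x00'
```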

Symbols

Each word that has been, or will be, assigned a token value by Forth becomes a symbol for the redirection list or machine code that implements it. On input, words must be delimited by spaces. A word may or may not be acted upon the moment it is parsed from the input stream; words that are acted upon immediately are listed in the dictionary in red letters to indicate their immediate status. Other words, along with any constants in the program source, return their token address or value to the compiler for inclusion in the current definition.

Labels

Fig-Forth uses a set of specialized labels for its operation, some of which are available to the advanced programmer. The basic form of label is the Name Field Address, the label that identifies an entry within the dictionary space. These labels should be unique within the vocabulary being built; if a label is defined more than once, Forth refers only to the most recently created definition in all subsequent references to that word. Word labels must be 1 to 31 ASCII characters long. When the system variable CASELOCK is zero, the letters a to z are converted to their upper case equivalents as the label is added to the dictionary. If CASELOCK is non-zero, the label is stored in its original case, and every word in the input line must then appear in the correct case. Label characters should be those available from the keyboard, although this is a convention rather than a compiler requirement. The following characters cannot be used in labels: Backspace (Control-H), Line-Feed (Control-J), Delete, Carriage Return (Control-M) and Null.
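The label rules and the "latest definition wins" lookup can be sketched in Python as follows. The dictionary here is a plain list searched from the newest entry backward; this is a model of the behavior described above, not Fig-Forth's actual data structure, and the names `make_label` and `find` are illustrative.

```python
# Sketch of the label rules: 1 to 31 ASCII characters, certain control
# characters forbidden, and upper-case folding when CASELOCK is zero.
FORBIDDEN = {0x00, 0x08, 0x0A, 0x0D, 0x7F}   # Null, BS, LF, CR, Delete

def make_label(text, caselock=0):
    if not 1 <= len(text) <= 31:
        raise ValueError("label must be 1 to 31 characters")
    for ch in text:
        if ord(ch) > 0x7F or ord(ch) in FORBIDDEN:
            raise ValueError(f"character {ord(ch):#04x} not allowed in a label")
    return text.upper() if caselock == 0 else text

def find(dictionary, name, caselock=0):
    """Search newest entry first, so the latest definition of a label wins."""
    label = make_label(name, caselock)
    for entry_label, body in reversed(dictionary):
        if entry_label == label:
            return body
    return None

dictionary = []
dictionary.append((make_label("square"), "first definition"))
dictionary.append((make_label("SQUARE"), "second definition"))
print(find(dictionary, "Square"))   # second definition
```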

Specific Labels

Fig-Forth uses 7 specific labels to point into the dictionary and to the parts of a definition; the Name Field Address is one of them. Three of these describe the dictionary space: HERE, which marks the end of the current dictionary space; PAD, which lies 88 bytes beyond HERE; and TIB, the Text Input Buffer. The area between HERE and PAD is used by the compiler while building and running words, and the TIB lies between the Parameter and Control stacks. User programs should take care when accessing these areas.
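Because PAD is defined relative to HERE, it moves whenever the dictionary grows. This short Python sketch, assuming a 16-bit address space and a hypothetical `DictionaryPointer` class, shows that data left at PAD is not at a fixed address once more definitions are compiled.

```python
PAD_OFFSET = 88          # from the text: PAD lies 88 bytes beyond HERE

class DictionaryPointer:
    """Tracks HERE; PAD is derived from it rather than stored separately."""
    def __init__(self, here):
        self.here = here

    def pad(self):
        return (self.here + PAD_OFFSET) & 0xFFFF   # 16-bit address space

    def allot(self, nbytes):
        """Reserve dictionary space, as compiling a definition would."""
        self.here = (self.here + nbytes) & 0xFFFF

dp = DictionaryPointer(0x4000)
before = dp.pad()
dp.allot(10)
print(hex(before), hex(dp.pad()))   # 0x4058 0x4062
```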

The four advanced user labels point to the key elements of a definition. The Tick word (an apostrophe) retrieves the Parameter Field Address (PFA) of a definition, and the words CFA, NFA and LFA then convert the PFA into the other field addresses. CFA adjusts the PFA to the Code Field Address, the location that points to the machine code in the kernel that performs the word's function. NFA adjusts the PFA to the Name Field Address, the label structure created when the word was added to the dictionary. Finally, LFA adjusts the PFA to the Link Field Address, which links the word into the current dictionary search chain. Modifying any of these fields or their contents other than through the compiler is at your own risk.
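To make the PFA conversions concrete, the Python sketch below lays down one dictionary header in a byte image and computes the other field addresses from the PFA. The header layout is an assumption taken from the classic fig-FORTH model (count byte with bit 7 set, name with bit 7 set on its last character, 2-byte link, 2-byte code field); Fig-Forth v2 may arrange its headers differently.

```python
# Assumed header layout, taken from the classic fig-FORTH model
# (Fig-Forth v2 may differ):
#   NFA -> count byte (bit 7 set), then the name; bit 7 also set on the last character
#   LFA -> 2-byte link to the previous word's NFA
#   CFA -> 2-byte code field
#   PFA -> parameter field
def build_header(image, here, name, link, code):
    n = len(name)
    image[here] = 0x80 | n
    for i, ch in enumerate(name):
        image[here + 1 + i] = ord(ch)
    image[here + n] |= 0x80                   # flag the last name character
    lfa_addr = here + n + 1
    image[lfa_addr:lfa_addr + 2] = link.to_bytes(2, "little")
    image[lfa_addr + 2:lfa_addr + 4] = code.to_bytes(2, "little")
    return lfa_addr + 4                       # the PFA

def cfa(pfa):
    return pfa - 2

def lfa(pfa):
    return pfa - 4

def nfa(image, pfa):
    """Step back from the last name character to the flagged count byte."""
    addr = pfa - 5                            # last character of the name
    while True:
        addr -= 1
        if image[addr] & 0x80:
            return addr

image = bytearray(0x10000)
pfa = build_header(image, 0x1000, "DUP", link=0x0FF0, code=0x0200)
print(hex(nfa(image, pfa)), hex(lfa(pfa)), hex(cfa(pfa)), hex(pfa))
# 0x1000 0x1004 0x1006 0x1008
```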

Internal Labels

Finally, while a word is being compiled into the dictionary space, the compiler implements jumps and conditional branches with hidden relative labels, which it places on the current stack as compilation proceeds. These labels are managed almost exclusively by the compiler itself, but in unusual circumstances they may be referenced by other words, or across words, using specific directives.
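The hidden relative labels can be pictured with the following Python sketch of IF ... ELSE ... THEN compilation. Each branch is compiled with an empty offset whose position is pushed on a compile-time stack, and a later word pops it and fills in the relative distance. The branch names 0BRANCH and BRANCH are borrowed from classic fig-FORTH, and counting offsets in cells (classic fig-FORTH counts bytes) is a simplification; Fig-Forth v2's own words may differ.

```python
# Model of the compiler's hidden branch labels.  Assumptions: branch words are
# named 0BRANCH and BRANCH as in classic fig-FORTH, and offsets are counted in
# cells (classic fig-FORTH counts bytes) relative to the offset cell itself.
def compile_words(words):
    code, labels = [], []                 # labels: the hidden compile-time stack
    for w in words:
        if w == "IF":
            code += ["0BRANCH", None]     # placeholder offset, filled in later
            labels.append(len(code) - 1)
        elif w == "ELSE":
            code += ["BRANCH", None]
            else_slot = len(code) - 1
            if_slot = labels.pop()
            code[if_slot] = len(code) - if_slot   # IF jumps past the BRANCH
            labels.append(else_slot)
        elif w == "THEN":
            slot = labels.pop()
            code[slot] = len(code) - slot         # resolve to the current end
        else:
            code.append(w)
    if labels:
        raise SyntaxError("unresolved IF or ELSE")
    return code

def run(code, flag):
    """Execute with one flag for 0BRANCH; return the names of words reached."""
    trace, ip = [], 0
    while ip < len(code):
        token = code[ip]
        if token in ("0BRANCH", "BRANCH"):
            slot = ip + 1
            taken = token == "BRANCH" or flag == 0
            ip = slot + code[slot] if taken else slot + 1
        else:
            trace.append(token)
            ip += 1
    return trace

code = compile_words("IF A ELSE B THEN C".split())
print(code)   # ['0BRANCH', 4, 'A', 'BRANCH', 2, 'B', 'C']
```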

Numbers

Fig-Forth v2 has four kinds of common numbers: byte, word, double word and quad word. Bytes are always treated as unsigned, and are generally either an ASCII character or a string count byte. (See Strings.) Words are signed two's complement integers in the decimal range -32768 to +32767, or addresses within the dictionary and user space. Under special conditions, words also serve as offset pointers to other portions of the machine using direct memory access.

Double word numbers are sign-extended versions of their word counterparts, extending the range to -2^31 to +2^31-1 (-2,147,483,648 to +2,147,483,647). For compatibility with previous versions of Fig-Forth, the number access words store double word numbers in Big Endian format, the opposite of Intel's native little-endian order. The same Big Endian arrangement is used on the parameter stack. Note that every double word value entered from the keyboard or a disk file must include a decimal point.
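The contrast with Intel's native order can be sketched in Python. This assumes "Big Endian" means the high-order 16-bit word is stored first while each word keeps Intel's little-endian byte order, as in classic fig-FORTH; if Fig-Forth v2 instead reverses every byte, the layout differs.

```python
import struct

def double_to_cells(d):
    """Split a signed 32-bit value into (high, low) 16-bit cells."""
    if not -2**31 <= d <= 2**31 - 1:
        raise OverflowError("value outside the double word range")
    u = d & 0xFFFFFFFF                   # two's complement bit pattern
    return u >> 16, u & 0xFFFF

def store_double(d):
    """High cell first (word-level big endian); each cell little endian (assumed)."""
    high, low = double_to_cells(d)
    return struct.pack("<HH", high, low)

def fetch_double(raw):
    high, low = struct.unpack("<HH", raw)
    u = (high << 16) | low
    return u - (1 << 32) if u & 0x80000000 else u   # restore the sign

print(store_double(0x12345678).hex())        # 34127856 (assumed Fig-Forth order)
print(struct.pack("<i", 0x12345678).hex())   # 78563412 (native Intel order)
```

The sign extension mentioned above shows up in the high cell: a small negative value such as -5 has a high cell of 0xFFFF, a small positive one a high cell of 0.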

Quad word numbers are sign-extended versions of their double word counterparts, extending the range to -2^63 to +2^63-1. They are supplied mainly as interim values for double word arithmetic; see Chapter 2 for their storage format. Quad numbers cannot be entered directly from the keyboard or in a program file, except as two properly constructed double word values.
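Building a quad from two doubles can be sketched arithmetically in Python. The assumption here is that dql (see Table 1-2) carries the low 32 bits as an unsigned bit pattern and dqh the signed high 32 bits; the memory layout is left to Chapter 2. The example uses the largest product of two doubles, which is the kind of interim value the text describes.

```python
def quad_from_doubles(dql, dqh):
    """Combine the low 32-bit pattern and the signed high double into one quad."""
    return dqh * 2**32 + (dql & 0xFFFFFFFF)

def doubles_from_quad(q):
    """Split a signed quad into (dql, dqh): unsigned low bits, signed high part."""
    if not -2**63 <= q <= 2**63 - 1:
        raise OverflowError("value outside the quad word range")
    return q & 0xFFFFFFFF, q >> 32   # >> is an arithmetic shift, keeping the sign

q = (2**31 - 1) ** 2                 # largest product of two positive doubles
print(doubles_from_quad(q))          # (1, 1073741823)
```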

All numeric operations in Fig-Forth v2 use a variable number base, which may range from 2 to 36 inclusive in the current build. Bases above 36 are accepted for printing, but no correction is made for ASCII control characters such as <DEL> that may appear as digits in the output. Numbers in such bases cannot be input correctly from the keyboard or a file, because <DEL> is treated as an editing character.
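Why bases above 36 print strange characters can be seen in a Python model of digit conversion. The mapping used here (add 7 to digit values above 9 so that 10 becomes 'A') follows classic fig-FORTH and is an assumption for Fig-Forth v2. Digit 35 is 'Z', digit 36 becomes '[', and digit 72 lands on <DEL> (ASCII 127).

```python
def digit_char(d):
    """Assumed fig-FORTH digit mapping: '0'-'9', then 'A' onward by adding 7."""
    return chr(ord("0") + d + (7 if d > 9 else 0))

def to_base(n, base):
    """Convert a signed integer to digit characters in the given base."""
    if base < 2:
        raise ValueError("base must be at least 2")
    sign, n = ("-", -n) if n < 0 else ("", n)
    digits = []
    while True:
        n, d = divmod(n, base)
        digits.append(digit_char(d))
        if n == 0:
            break
    return sign + "".join(reversed(digits))

print(to_base(255, 16), to_base(35, 36), to_base(36, 37))   # FF Z [
```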

Two words switch between the most common number bases: HEX for base 16 and DECIMAL for base 10. In addition, the input interpreter allows an ampersand (&) to prefix any numeric value, marking that value as hexadecimal. The system variable BASE holds the current number base, that is, the number of distinct values a single digit can take.
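Number input can be sketched as a Python function that honors the one-shot ampersand override and the decimal point that marks a double number. Assumptions: the input has already been folded to upper case (CASELOCK zero), a minus sign follows any ampersand, and the position of the decimal point is not recorded. Because `base` is passed in and never modified, the caller's BASE is unchanged after an `&` value, which mirrors the restore described in Table 1-1.

```python
def digit_value(ch):
    """Inverse of the digit mapping: '0'-'9' then 'A'-'Z' (upper case assumed)."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    return None

def parse_number(token, base):
    """Return (value, is_double), or None if the token is not a number."""
    if token.startswith("&"):          # one-shot hexadecimal override
        token, base = token[1:], 16
    negative = token.startswith("-")   # assumption: sign follows any '&'
    if negative:
        token = token[1:]
    is_double = "." in token           # a decimal point marks a double number
    digits = token.replace(".", "")
    if not digits:
        return None
    value = 0
    for ch in digits:
        d = digit_value(ch)
        if d is None or d >= base:
            return None
        value = value * base + d
    return (-value if negative else value), is_double

BASE = 10
print(parse_number("&FF", BASE), parse_number("12.5", BASE))   # (255, False) (125, True)
```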

Character Strings

As with word names, strings in Fig-Forth v2 consist of ASCII characters entered from the keyboard; unlike word names, however, lower case characters in strings are left unchanged. Because Fig-Forth stores a string as a length byte followed by the text characters, the maximum length of any stored string is 255 bytes. The character restrictions stated under Labels also apply to strings, and a string terminated by a quotation mark cannot itself contain one.
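The length-byte storage and its 255-byte limit can be sketched in Python. The forbidden-character set is the one listed under Labels; the `quote_terminated` flag is a hypothetical parameter standing in for "when required", since only quote-terminated strings must exclude the quotation mark.

```python
FORBIDDEN = {0x00, 0x08, 0x0A, 0x0D, 0x7F}   # the same set as for labels
QUOTE = ord('"')

def make_counted_string(text, quote_terminated=True):
    """Encode text as a count byte followed by its characters, case unchanged."""
    data = text.encode("ascii")              # non-ASCII raises an error
    if len(data) > 255:
        raise ValueError("string longer than 255 characters")
    if any(b in FORBIDDEN for b in data):
        raise ValueError("string contains a reserved control character")
    if quote_terminated and QUOTE in data:
        raise ValueError("a quote-terminated string cannot contain a quote")
    return bytes([len(data)]) + data

def read_counted_string(buf, addr=0):
    """Read the count byte at addr, then that many characters."""
    count = buf[addr]
    return bytes(buf[addr + 1:addr + 1 + count]).decode("ascii")

print(make_counted_string("Hello"))   # b'\x05Hello'
```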

Constant Declarations

Constants are values entered directly in the program source or defined during the compilation phase; each generates either a token or a literal within the current definition. When a literal is compiled, the constant's value immediately follows the literal and occupies as much space as the constant's size requires. At run time the processor treats all tokens as immutable constants, with the single exception of words deferred by redirection.
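A literal followed by a value sized to the constant can be sketched in Python. The token names `LIT` and `DLIT` are hypothetical, as are the choices to store the high cell of a double first and to leave the high cell on top of the stack; these follow the Big Endian arrangement described under Numbers but are assumptions about Fig-Forth v2's encoding.

```python
LIT, DLIT = "LIT", "DLIT"   # hypothetical names for the literal tokens

def compile_literal(value, double=False):
    """Return the cells compiled for a constant: a literal token, then its value."""
    if double:
        u = value & 0xFFFFFFFF
        return [DLIT, u >> 16, u & 0xFFFF]   # high cell first (assumed)
    return [LIT, value & 0xFFFF]

def run(cells):
    """Inner-interpreter model: each literal token pushes the cells after it."""
    stack, ip = [], 0
    while ip < len(cells):
        token = cells[ip]
        if token == LIT:
            stack.append(cells[ip + 1])
            ip += 2
        elif token == DLIT:
            high, low = cells[ip + 1], cells[ip + 2]
            stack += [low, high]              # high cell ends up on top (assumed)
            ip += 3
        else:
            raise ValueError(f"unknown token {token!r}")
    return stack

print(run(compile_literal(5) + compile_literal(70000, double=True)))   # [5, 4464, 1]
```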

Comments

Fig-Forth allows comments within program lines: the comment text is preceded by an opening parenthesis, which is separated from the text by a single space. Text following the opening parenthesis is parsed up to the next closing parenthesis and then discarded. Because of disk access limitations, a comment may be at most 1020 characters long. Comments cannot be nested.
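The comment rules can be modelled in Python as a word splitter that drops parenthesized text. The opening parenthesis only starts a comment when it stands alone as a space-delimited word, and the first closing parenthesis ends it, which is why nesting fails: in the second test below, the text after the inner `)` leaks back into the program.

```python
MAX_COMMENT = 1020   # limit stated in the text

def strip_comments(line):
    """Split a line into words, discarding ( ... ) comments.

    '(' counts as a comment opener only when it stands alone as a word;
    the comment ends at the first ')', so comments cannot nest.
    """
    words, i, n = [], 0, len(line)
    while i < n:
        while i < n and line[i] == " ":       # skip delimiting spaces
            i += 1
        if i >= n:
            break
        j = i
        while j < n and line[j] != " ":       # find the end of this word
            j += 1
        word = line[i:j]
        if word == "(":
            end = line.find(")", j)
            if end == -1:
                raise SyntaxError("comment has no closing parenthesis")
            if end - j - 1 > MAX_COMMENT:
                raise SyntaxError("comment longer than 1020 characters")
            i = end + 1                       # resume after the ')'
        else:
            words.append(word)
            i = j
    return words

print(strip_comments("1 2 ( add them ) +"))   # ['1', '2', '+']
```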

Conventions used

In this manual all code is shown in UPPERCASE, the case to which the compiler converts all defined words before adding them to the vocabulary when CASELOCK is zero. User input is underlined to distinguish the user's actions from the system's responses. Links refer to other sections of this manual. Table 1-2 lists the conventions used in the stack diagrams that comment the programs in this manual and package. (See STACKS.)

Table 1-2. Comment Conventions

Symbol

Meaning

--

Marks the execution of the word being commented: items to its left are consumed, and items to its right are produced.

b

A byte number (upper 8 bits zero).

c

An ASCII or keyboard character (upper 8 bits zero).

f

A logical flag (zero = false, non-zero = true).

n

A signed word number.

d

A signed double word number.

u

An unsigned word number.

q

The signed quotient of a division.

r

The signed remainder of a division.

dl

The lower portion of a double word number.

dh

The higher portion of a double word number.

dq

A signed double quotient.

dr

A signed double remainder.

qd or q

A signed quad number.

ud

An unsigned double word item.

adr or a

A 16 bit address.

off

A 16 bit offset value.

dql

The 32 bit (double) lower portion of a quad number.

dqh

The 32 bit (double) higher portion of a quad number.

seg

A 16 bit segment pointer.

(n) --

The word requires a parameter value when it executes.

--(word)

A space-delimited string is required after the word.

--(string")

A quote-delimited string is required after the word.

Program Lines

Fig-Forth v2 is designed to accept program lines from disk blocks of 16 lines by 64 characters (1024 bytes per block). Disk block files require a special editor, because a block is displayed as fixed-length lines and contains none of the line-ending control characters used by DOS or other text formats. Moving from one block to the next during loading is explicit, requiring the arrow word (-->). An optional convention is to use the first line of each block as a comment describing its contents and the program it belongs to, usually with the date of writing and/or other identifying material. The remaining lines hold the program source text, formatted as the programmer chooses.
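The block geometry can be sketched in Python: 16 lines of 64 characters packed into 1024 bytes with no line-ending characters, cut back into lines for display, plus a hypothetical `load_order` helper that follows the arrow word. Space padding of unused positions and "a block containing --> continues with the next block number" are assumptions made for illustration.

```python
BLOCK_LINES, LINE_WIDTH = 16, 64
BLOCK_SIZE = BLOCK_LINES * LINE_WIDTH        # 1024 bytes

def lines_to_block(lines):
    """Pack up to 16 source lines into one block, space padded, no CR/LF."""
    if len(lines) > BLOCK_LINES or any(len(line) > LINE_WIDTH for line in lines):
        raise ValueError("a block holds at most 16 lines of 64 characters")
    padded = [line.ljust(LINE_WIDTH) for line in lines]
    padded += [" " * LINE_WIDTH] * (BLOCK_LINES - len(lines))
    return "".join(padded).encode("ascii")

def block_to_lines(block):
    """Editor view: cut the block every 64 characters and trim trailing spaces."""
    text = block.decode("ascii")
    return [text[i:i + LINE_WIDTH].rstrip() for i in range(0, BLOCK_SIZE, LINE_WIDTH)]

def load_order(blocks, start):
    """Block numbers read by a load: a block containing --> continues to the next."""
    order, n = [], start
    while True:
        order.append(n)
        if "-->" not in blocks[n].decode("ascii").split():
            return order
        n += 1

blk = lines_to_block(["( DEMO BLOCK )", ": SQUARE DUP * ;"])
print(len(blk), block_to_lines(blk)[:2])   # 1024 ['( DEMO BLOCK )', ': SQUARE DUP * ;']
```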
