The Fig-Forth v2 Compiler offers the programmer a wide variety of options, allowing the construction of compiler directives, error catching and overlay functions, and code generation. In addition, the properties of indirection and piping have been fully implemented, allowing applications to control their internal functions and to change or modify them on the fly. Because each token thread is defined as a list of subroutines to the Fig-Forth internal code engine, the construction of these code fragments and their interpretation can be changed in many complex and varied ways. For the purposes of this chapter it is assumed the user is familiar with all other areas of this manual, and that only the examples and explanations of these advanced functions are required.
The first extended operation of Fig-Forth v2 is the Compiler Directive structure, which instructs the compiler on the manner in which a code segment is constructed. Some of these words are discussed in chapter 5 under Flow Control, with a secondary reference in chapter 2 under Storage Operators for the variable movers and record addressing constructors. These directive commands are often defined as Immediate Function words, such that their execution takes place when the compiler encounters them. Compiler directives are constructed using the following definitions;
CREATE
This word instructs the compiler to create a new dictionary symbol entry, linking that symbol to the current compilation vocabulary. Because this new symbol is assumed to be undefined the compiler automatically sets the SMUDGE bit, preventing the accidental execution or compilation of the new symbol. The programmer should note that the characters for the new symbol must come from the current input stream for the input processor, whether this word is executed as a part of that source screen or a token list function. For example;
: MY-WORD CREATE ... --creation at runtime
CREATE MY-WORD ... --immediate creation
The first definition will accept the next space delimited word from the system console, file block, modem input or command line tail string when the token list MY-WORD is run, building a new symbol out of the parsed word and placing it into the current vocabulary. However, the new symbol will contain no definition of its run-time operation or any parameters, becoming a symbol without a purpose, which is why the compiler sets the smudge bit. New symbols are compared against the current search order and vocabulary listings, generating a warning error if the new symbol already exists. Directive words which call this function should include a valid Code Field Address token and reset the smudge bit at the end of construction. E.g.;
: MY-WORD CREATE | -- build new word |
' DPL CFA @ , | -- borrow copy of variable function code |
0 , | -- add zero data for contents |
SMUDGE ; | -- complete definition |
Optionally, such a definition can be made immediate to cause the action to take place when the defined word is parsed from the input. (In this case the MY-WORD stream of characters, followed by the name of the new symbol.) The above definition would be equivalent to the VARIABLE word.
<BUILDS
This word is used to call the CREATE process given the current input device and the data that comes from it. It also calls WORD, -FIND and NUMBER inside the Forth kernel, and thus will parse out the next space delimited word from the input stream, build a current vocabulary symbol for the given input text, and search for an existing copy of that symbol so that a warning can be sent to the user. In addition, this word marks the beginning of the directions that instruct the compiler how to build the token list structure, to be followed by the function or functions executed when the new symbol is encountered. It therefore acts much like a macro command to the current construction. As an example, this is the definition of the VARIABLE word;
: VARIABLE <BUILDS , DOES> ;
Thus each time the variable word is encountered the compiler first builds a new symbol and then saves a single word to the user space by the comma word. (See DOES> below.) A double variable version would appear as follows;
: 2VARIABLE <BUILDS , , DOES> ;
This operation performs the function as listed above, though now there are two commas contained in the definition, requiring two values from the current parameter stack when the 2VARIABLE word is encountered. Build operations can contain loops, conditionals or calls to other definitions as needed.
DOES>
This word is used to complete a compiler directive as defined under <BUILDS, specifying the steps required of the internal machine when the new symbol or its token is executed. Note that in the interest of user space the compiler does not add the listed tokens following DOES> to the created definition, instead constructing a long jump to the token list that follows DOES> with an internal call. Thus the new symbol behaves very much like a deferred word construction, which can be changed by shifting the address of the relative call.
: WORD-ARRAY <BUILDS 0 DO 0 , LOOP DOES> SWAP 2 * + ;
This example demonstrates the construction method required to build a table of integer variables, which will return the address of any element within the array given the offset at run time. How this operates is as follows;
10 WORD-ARRAY MYDATA OK.
7 5 MYDATA ! 3 MYDATA ? 0 OK.
The first line above calls the build function to create an array of 10 word values, using the internal DO-LOOP sequence of the WORD-ARRAY definition. During this time all values are zeroed as indicated by the zero comma function, then the contents of MYDATA's Code Field Address are changed to point to the location after the DOES>. When MYDATA is run as in the second line, the DOES> construct leaves the address of the Parameter Field for the array on the active stack, where the SWAP 2 * + converts this address to that of the element indexed by the data value given to MYDATA. The typical value of the address returned by the new symbol will be 4 bytes beyond the DOES> code call, such that a Tick operation on the above MYDATA word will not point to the array data. Constructs of this kind may be placed anywhere within the active input stream and may optionally be made immediate if more complex constructions are required.
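The address arithmetic performed by SWAP 2 * + can be modelled outside of Forth. The sketch below is Python rather than Fig-Forth, and is only an illustration: the names word_array and element_address and the base address 0x4000 are hypothetical, and 2-byte cells are assumed because of the 2 * in the definition.

```python
# Python model (not Fig-Forth) of the WORD-ARRAY example.
# Assumes 2-byte cells, as implied by "2 *" in the Forth definition.
CELL = 2

def word_array(count, pfa):
    """Mimic the <BUILDS part: reserve `count` zeroed cells starting at pfa."""
    return {pfa + CELL * i: 0 for i in range(count)}

def element_address(index, pfa):
    """Mimic DOES> SWAP 2 * + : scale the index and add the base address."""
    return pfa + CELL * index

PFA = 0x4000                            # hypothetical parameter field address
memory = word_array(10, PFA)            # 10 WORD-ARRAY MYDATA
memory[element_address(5, PFA)] = 7     # 7 5 MYDATA !
```

As in the Forth definition, nothing in this model checks that the index lies inside the array.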
COMPILE
This word is used in a manner similar to that of <BUILDS and DOES> above, in that it instructs the compiler to add token addresses to the current definition. As before, the COMPILE word cannot be used by itself because it must operate within the compile mode, so more often than not those definitions which contain it will be marked as immediate. To demonstrate this operation the definition below would mimic the dot-quote function;
: "" HIDDEN COMPILE (.") FORTH 22 WORD HERE C@ 1+ ALLOT ; IMMEDIATE
This word will instruct the compiler to include the same kernel token as the standard dot-quote operation, and to save a string given it in the same fashion. How it would be used is;
: TEST "" This is my string!" ... ;
In this case the immediate status starts the "" operation as the compiler separates it from the input stream, where the COMPILE word has instructed the processor to add the token address of (.") to the TEST definition such that it may be executed when the new word is run. Once that is performed quote-quote goes on to parse out the given input string, allocating dictionary space for it before letting the compiler return to its token building function. A review of the final operation will appear precisely like that of dot-quote, though the above definition of quote-quote contains no execution time code of its operation. (Thus it cannot be entered from the keyboard as a direct "" function like ." may be. See STATE below.)
[COMPILE]
This word performs the opposite function of the COMPILE word, causing the compiler to add the token address of the next immediate word into the current definition rather than executing it. As such if the "" as defined above has already been properly defined and is immediate, its inclusion in the current word if required can be forced by the following process;
: WORD-LIKE [COMPILE] "" ... ;
Quite often, this word is required for the purposes of calling upon the vocabulary search process, such as in the definition below;
: SEE [COMPILE] ' CFA ... ;
In this case when the compiler encounters the SEE word the first operation to be performed will be the immediate Tick function, which will parse out the next word from the input stream and look through the current search order to find the location if any for its matching symbol. Thus the function of [COMPILE] is to add the functionality of the immediate word to the current definition, expanding the operation available in the compiler.
LITERAL and DLITERAL
These two words are used to add constants to the currently constructed token list, taking the value either from the present stack if used alone, or from a future stack if compiled with [COMPILE] as part of a defining function. Thus placing a value on the stack and then starting a definition that contains LITERAL will cause the value to be added to the current definition as shown below;
512 : MYWORD LITERAL ... -- adds value immediately as a constant
Or the addition may be postponed until another definition is run;
: MYWORD [COMPILE] IF [COMPILE] LITERAL ... ; IMMEDIATE
512 : NEWWORD MYWORD -- adds value after an IF construction
Such directives may be required for a replacement of the QUIT process by an application, such as creating a new input processor loop. Both words act as a no-operation in the interpret or run mode. These words add the LIT or 2LIT functions of the compiler to the current definition, followed by the word or double-word constant of the term.
BRANCH and 0BRANCH
These words may be added to a token list in any conceivable mix using compiler directives, though each must be followed by the byte-count offset of the resulting jump.
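How a byte-count offset steers the token pointer can be sketched with a small model. The code below is Python, not Fig-Forth, and rests on two assumptions that this chapter does not state: cells are 2 bytes wide, and the offset is counted from the address of the offset cell itself, as in classic fig-Forth models. The function run and the token names A and B are hypothetical.

```python
# Python model (not Fig-Forth) of BRANCH and 0BRANCH inside a token list.
# Assumptions: 2-byte cells, and an offset counted in bytes from the
# address of the offset cell itself.
CELL = 2

def run(thread, stack):
    """Walk a token list and return the names of the ordinary tokens executed."""
    ip = 0                                # byte address of the next cell
    executed = []
    while True:
        token = thread[ip // CELL]
        ip += CELL                        # ip now addresses the following cell
        if token == "BRANCH":
            ip += thread[ip // CELL]      # unconditional jump by the byte offset
        elif token == "0BRANCH":
            if stack.pop() == 0:
                ip += thread[ip // CELL]  # flag was zero: take the jump
            else:
                ip += CELL                # flag was non-zero: skip the offset cell
        elif token == "EXIT":
            return executed
        else:
            executed.append(token)

# Token list laid out like: IF A ELSE B THEN
THREAD = ["0BRANCH", 8, "A", "BRANCH", 4, "B", "EXIT"]
```

With a non-zero flag on the stack the model runs A and jumps past B; with a zero flag it jumps straight to B.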
CODE and ;CODE
These two words are used to define machine sequences within the user space area, CODE for defining a function comprised totally of machine instructions and ;CODE for completing a current definition with the same. Note that both these operations are to be ended with the END-CODE definition, a far function return to the internal clean process. When either of these words is executed the compiler will be forced into a Run Mode, an initial code fragment will be added to the dictionary space, and the kernel will make a branch to the ASM vectored word in an attempt to call the system assembler if it has been loaded. Note that without an END-CODE operation the words containing these functions will not be SMUDGEd. For precise details on the limits and requirements of code fragments, see Appendix C.
The system variable STATE contains the current mode of operation within the Fig-Forth compiler, or whether or not words are executed upon separation from the input data stream. Many of the words discussed elsewhere in this manual can have a direct bearing on the contents of this variable, and several use this variable's contents to determine how they function. As with most operations controlling the compiler, this word is located in the HIDDEN vocabulary.
[
This word is used to set the contents of STATE to zero, forcing the compiler into a Run Mode. All input after parsing this word is assumed to be stream commands directly to the internal engine, and will not be contained inside the current definition unless explicitly commanded. Note that this word is also made immediate, such that commands may be directed to the internal machine within the current compiler action.
]
This word performs the opposite function of the opening bracket listed above, setting the STATE variable to a non-zero value. Words following this function will have their token addresses added to the current definition or at the current dictionary endpoint, and all compiler directives will be executed in their compiling mode of operation. Note that this word is not made immediate; it must be executed by the machine while in a Run Mode.
STATE
This word returns the address of the STATE variable such that it may be tested, a zero indicating the machine is in the Run Mode. Words such as dot-quote call this variable to determine if they should add the enclosed string to the current definition or perform an immediate operation, such as the responses indicated below;
." This is a test!" This is a test! OK.
: S1 ." This is a test!" ; OK.
S1 This is a test! OK.
Such words are called "state smart" in Forth parlance; that is, they are aware of the machine's state as they function. (See example below.)
?COMP and ?EXEC
These words are used by some state smart words to prohibit their function whenever the compiler is not in the state desired. ?COMP for example will generate an error if Fig-Forth is not currently defining a user subroutine, such as using an IF word outside of colon and semi-colon. ?EXEC will generate a similar error if Fig-Forth is not in an execution or run mode, for example using VARIABLE within a colon definition. A specific example;
: "" HIDDEN ?COMP COMPILE (.") FORTH 22 WORD HERE C@ 1+ ALLOT ; IMMEDIATE
This change in the definition example given under COMPILE will generate an error if the user or input stream attempts to call this function when in an execution mode, preventing such accidental use because it does not contain a run time component.
Using the definition displayed above under COMPILE and ?COMP, the code below demonstrates how to construct a state-smart function. This operation is functionally identical to that of dot-quote.
HIDDEN ALSO FORTH | |
: "" STATE @ IF | --test state |
COMPILE (.") 22 WORD | --compiler actions |
HERE C@ 1+ ALLOT | |
ELSE | |
22 WORD HERE | --run time actions |
COUNT TYPE | |
THEN ; IMMEDIATE | --set as directive |
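The decision made by a state-smart word can be reduced to a single test, modelled here in Python rather than Fig-Forth. This is only a sketch: the function quote_quote and the two lists standing in for the dictionary and the console are hypothetical.

```python
# Python model (not Fig-Forth) of a state-smart word such as the "" example.
def quote_quote(state, text, definition, console):
    """Compile the string when STATE is non-zero, otherwise print it at once."""
    if state != 0:
        definition.append('(.")')   # COMPILE (.") : add the run-time token
        definition.append(text)     # the string is stored after the token
    else:
        console.append(text)        # run mode: COUNT TYPE

definition, console = [], []
quote_quote(1, "This is my string!", definition, console)   # inside a colon definition
quote_quote(0, "This is a test!", definition, console)      # typed at the keyboard
```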
CSP
This system variable is a temporary storage location used by the compiler to store the location of the parameter stack pointer, later testing it to determine if an error has occurred. Typical errors of this type are pairing errors of IF, ELSE, THEN, and so on, which can be tested with the two words below.
!CSP
This word causes the location of the current parameter stack to be stored into the variable CSP mentioned above.
?CSP
This word will generate an error if the current parameter stack location does not match the previously saved value of !CSP, calling the error handler with a compiler message.
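The CSP check amounts to comparing the stack position at the start of a definition with the position at its end. The sketch below models this in Python, not Fig-Forth; the class CompilerStack is hypothetical, stack depth stands in for the stack pointer location, and the string "IF" stands in for whatever pairing value the compiler really leaves on the stack.

```python
# Python model (not Fig-Forth) of !CSP and ?CSP.
class CompilerStack:
    def __init__(self):
        self.stack = []
        self.csp = 0

    def store_csp(self):
        """!CSP : remember the current stack position."""
        self.csp = len(self.stack)

    def check_csp(self):
        """?CSP : True when the stack is back where !CSP left it."""
        return len(self.stack) == self.csp

c = CompilerStack()
c.store_csp()                 # start of a definition
c.stack.append("IF")          # an opening word leaves a pairing item
unfinished = c.check_csp()    # ending the definition here would be an error
c.stack.pop()                 # the matching closing word consumes the item
finished = c.check_csp()      # now the definition may be completed
```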
As was seen in the prior section, Fig-Forth v2 contains word functions for the express purpose of generating system errors, correcting these errors and notifying the user of their occurrence. However, the method used for error correction inside the virtual machine is both unilateral and unforgiving: it empties both functional stacks and returns to the Forth Command Processing Loop. Clearly this operation would interrupt any user process or application being undertaken at the time, such that specialized steps must be performed to prevent this return to the input processor. In addition, because some errors can originate from inside the compiler or the run time engine at the time of user operation, avoiding this return is an essential practice.
There are many words and functions that control the Forth system's error correction process and the settings involved in calling user generated handling code. The programmer should review these words and be fully aware of their operation before attempting to create their own error service routines. Of particular concern is the interception of errors generated by the system compiler and internal runtime engine. These words and their functions are outlined below;
WARNING
This user variable forms the master switch of the Forth error correction routine and the MESSAGE display function, much like the STATE variable controls the compiler operation. While this variable is non-zero and positive all errors generated will be routed directly to the function of QUIT, after making a pass through the error display system mentioned below. If this value is zero the same branch to QUIT takes place, but without notifying the user of a generated error as stated below. Note that for both MESSAGE and ERROR this variable controls the outcome of the notification process, thus for all user defined error interception this value must be set to a negative number.
MESSAGE
This word is called by Forth to announce the presence of any error, and if the variable WARNING is non-zero will display the messages listed below. This function may be called by user defined routines with a single value on the stack, or other routines may be employed for specialized display. You will notice that this function is also responsible for the search order and VLIST specialized printing functions.
Value | Message |
0 | Huh? |
1 | Stack Empty! |
2 | Isn't Unique! |
3 | Stack Out Of Bounds! |
4 | Disk Error! |
5 | Compiler Only! |
6 | Execute Only! |
7 | Check pairs! |
8 | Compile Error! |
9 | Under Fence! |
10 | Not Loading! |
11 | Context Not Current! |
12 | Link Error! |
13 | Shell Error! # |
14 | Jump Out Of Range! |
15 | Press any key... |
16 | Save Error! |
17 | ------------------- |
18 | SEARCH ORDER: |
19 | NEW WORDS: |
20 | Not Terminated! |
ERROR
This word is used to notify the user of all fatal errors, using the same values as mentioned above for MESSAGE. Note that this function is controlled by the WARNING variable above, and will exit through the vectored word of ABORT if the WARNING variable contains a negative value. If the value of WARNING is positive the last parsed word from the active input stream will be echoed to the system console, followed by the message specified on the parameter stack. When WARNING is either zero or positive this function empties the parameter stack after displaying the message notice, then branches to the QUIT process and the command input processor. If the error was generated while accepting input from a file block this function will place the block number and location within the block on the parameter stack for further error processing, and will reset the input device to the system console.
QUIT
This word is the basis of the Forth input processor, accepting input from the current console function and acting upon the directions given. When this word is called as part of an error system it unilaterally halts any input from file sources and forces the compiler into an inactive state.
(ABORT)
This function is that operation normally assigned to the ABORT vector, and forms the catastrophic error handler within Fig-Forth v2. When this word is run both the Return and Parameter stacks are universally emptied, the identity of the compiler and its revision date are printed, and the vocabularies of ROOT and FORTH are placed into the Search Order Buffer. All other vocabulary search entries or values on the stacks are discarded before this function exits via the QUIT word.
?ERROR
This word is the primary gate used by the compiler to detect and display the errors it generates. The second stack item gives the error type and message number to be displayed, and the word branches to ERROR with it if the top stack item is non-zero. If the top stack item is zero, both it and the second item on the stack are discarded.
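The way WARNING, ERROR and ?ERROR combine can be summarised as a small decision function. The model below is Python, not Fig-Forth, and is a simplification: the function route_error is hypothetical, only three of the messages are included, and the echo of the last parsed word and the stack clean-up are left out.

```python
# Python model (not Fig-Forth) of the error routing described above.
MESSAGES = {0: "Huh?", 1: "Stack Empty!", 7: "Check pairs!"}   # excerpt only

def route_error(flag, code, warning):
    """Return (exit taken, message shown) for a given flag, code and WARNING."""
    if flag == 0:
        return ("CONTINUE", None)         # ?ERROR discards both items
    if warning < 0:
        return ("ABORT", None)            # exit through the ABORT vector
    if warning > 0:
        return ("QUIT", MESSAGES[code])   # display the message, then QUIT
    return ("QUIT", None)                 # WARNING of zero: QUIT silently
```

Only the negative case reaches the ABORT vector, which is why a user handler requires WARNING to be set to a negative number.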
?STACK
This word tests the size of the parameter stack versus the current dictionary tail and the upper limit of available memory, throwing an error code of 1 if the stack under-flows or 3 if the stack comes within 384 bytes of the dictionary tail. (192 items.)
?LOADING
This word tests the current input source for a file based device and generates an error code of 10 if it is active.
?PAIRS
This word tests the top two stack items for equality and generates an error code of 7 if they do not match. This word forms the basis of the Forth language syntactical enforcement to couple similar operations together with their opposites or options. (Such as IF THEN, BEGIN UNTIL, etc.)
Other Errors
While the above review of the word operations that comprise the compiler's public links does not cover every potential error that may arise from within the compiler, these links have been provided for the construction of additional compiler directives. Some errors not covered by this list are disk read or write errors, number conversion errors beyond the limits of the current number base, and find errors resulting from improperly constructed symbols or symbols that are not defined. While a user oriented error routine will probably not need to address these functions or the codes in the MESSAGE list above, if the compiler or Forth language is in operation during the application the programmer should allow for and correct these errors as well.
There are four items that must be considered when constructing an error handling service routine for the Fig-Forth environment, the first being the likely source of the error to be corrected. Because most of Forth's error codes are passed into the function of ?ERROR the service routine should expect an integer to be located upon the stack, along with any user parameters present when the error occurred. Such error handling routines should use this value to determine the type of correction required if any is possible, then return to the application at the appropriate location.
The second item that needs to be considered is the state of the Return Stack, because several errors can originate from within the internal engine or the compiler itself. In addition, such as in the case of disk access errors, allowing the calling routine to continue execution may not be possible or advisable, e.g., as in a LOAD operation where the file cannot be opened.
The third item for consideration in error handling is the state of the Parameter Stack, and whether or not the data it contains should be preserved or discarded. All of these actions may or may not be necessary for the corrective action to any range of error codes generated by the system or any user application, using the recommendations suggested below;
SP!
This word empties the parameter stack by copying its value from the S0 system variable, effectively discarding all values placed there by the program. This action should be called in the event that the user application is forced to reset itself, returning to its own command processor after informing the user of the error.
RP!
This word empties the return stack by copying its value from the system variable R0, then pushes two distinct values on the control stack for the purpose of catastrophic correction. These two values are a jump to the internal Forth engine followed by the offset to the system operation of COLD, which will effectively restart Forth and attempt to discard all work thus performed.
S0
This system variable holds the value of the cold start stack location which is transferred to the stack pointer in the event of an error.
R0
This system variable holds the value of the cold start return stack location which is transferred to the return stack pointer in the event of an error.
The last item for consideration is the type of intervention to be taken by the host program, and where it should return when the correction is complete. Such an example of this action can be found in the Midi Maker program, where the error to be intercepted is anticipated to originate in (OPEN).
In this case the host program wants to retain the values saved on the parameter and return stacks, up to the point of where the error occurred. In doing this the program defines a double number to hold the cold start values, then makes a copy of the current locations for the purpose of correcting the error;
0. 2VARIABLE SPRP
: CATCH1 S0 TO SPRP SP@ S0 ! RP@ R0 !
At this point the user service routine is installed, replacing the (ABORT) function normally taken by the error;
' CATCH2 CFA ' ABORT !
Finally, to engage the call, the WARNING variable is set to a negative value;
-1 WARNING ! ;
The following defines the error catching routine, which discards all values and returns control to the word that called the erring process;
: CATCH2 SP! RP! UNCATCH UNLOOP R> 4+ >R
Note that in this case the values being placed into the stack pointers are those saved by CATCH1, rather than those used by COLD. Finally, this routine ends with the error announcement itself;
." File Error!" 0 FILENAME ! ;
The UNCATCH word detaches the error handler and must be called after the suspected word generating the error, and is comprised as follows;
: UNCATCH SPRP TO S0 ' (ABORT) CFA ' ABORT ! 1 WARNING ! ;
This word disconnects the error handler and restores the system values, placing the machine in its original state. The word in which the error is expected should be constructed as follows, engaging the error handler and then discarding it after the suspect word;
: @DIS CATCH1 GET-F1 UNCATCH SCANF .CHS ;
Such error handling is an extreme measure and a very complex one, particularly in the area of CATCH2. The process of UNLOOP R> 4+ >R is designed to "skip over" the UNCATCH and SCANF functions inside the @DIS word, bumping the token instruction pointer by two tokens before the error routine exits. UNCATCH is skipped because it has already been performed in the event of an error, while SCANF is the function that must not be performed if the file failed to open. Your code may not require such drastic measures but it is suggested you step carefully, because an incorrect process will likely crash Forth. (SEE ALSO: The Code Optimizer below.)
A typical example of this process using delayed error messaging is listed below;
0. 2VARIABLE SPRP 0 VARIABLE ERR 0 VARIABLE EXP
: GRAB SP! RP! UNLOOP EXP -> ERR ;
: CATCH EXP ! 0 ERR ! S0 TO SPRP SP@ S0 ! RP@ R0 ! ' GRAB CFA ' ABORT !
-1 WARNING ! ;
: RELEASE SPRP TO S0 ' (ABORT) CFA ' ABORT ! 1 WARNING ! ERR @ ;
: GET-FILE 100 ," FILEDATA.DAT" 1 CATCH HIDDEN (OPEN) FORTH RELEASE
IF ." FILE ERROR!" 0 ELSE 1 THEN FILEOK ! ;
In this case the error originates in the same location as the Midi Maker program, however the user defined error code of 1 has been selected for this function. (Set by the 1 CATCH word in GET-FILE.) When the RELEASE word detaches the handler it returns the error status, thus informing the user routine if the handler exit was taken. Further testing or reporting can be achieved by querying the contents of the ERR output variable.
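The install, trap and release sequence can be followed more easily in a model that leaves out the stack pointer handling. The sketch below is Python, not Fig-Forth; the Machine class, its attribute names and the get_file helper are hypothetical, and the saving of S0 and R0 and the UNLOOP step are deliberately omitted.

```python
# Python model (not Fig-Forth) of the CATCH / GRAB / RELEASE pattern.
class Machine:
    def __init__(self):
        self.warning = 1
        self.abort = self.default_abort    # the ABORT vector
        self.err = 0                       # ERR variable
        self.exp = 0                       # EXP variable
        self.returned_to_quit = False

    def default_abort(self):
        """Stands in for (ABORT), which exits via QUIT."""
        self.returned_to_quit = True

    def grab(self):
        """GRAB : record the expected code as the error that occurred."""
        self.err = self.exp

    def catch(self, code):
        """CATCH : install the handler and arm it with a negative WARNING."""
        self.exp, self.err = code, 0
        self.abort = self.grab
        self.warning = -1

    def release(self):
        """RELEASE : restore the defaults and return the error status."""
        self.abort = self.default_abort
        self.warning = 1
        return self.err

    def error(self):
        """ERROR : negative WARNING exits through the ABORT vector."""
        if self.warning < 0:
            self.abort()
        else:
            self.returned_to_quit = True   # message display, then QUIT

def get_file(machine, open_fails):
    """Mimic GET-FILE: returns the value that would be stored in FILEOK."""
    machine.catch(1)
    if open_fails:
        machine.error()                    # the error raised inside (OPEN)
    return 0 if machine.release() else 1
```

In the model, as in the Forth version, the application never reaches QUIT: the failure is reported only through the value returned by the release step.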
Overlays are user functions that have already been compiled into the user memory, which can be written to disk to facilitate faster loading and greater utility. The format of these files is very straightforward: a single byte to indicate the overlay and two bytes to indicate the target address, followed by 1021 bytes of pre-compiled data. Note that if the current search path contains words or data in the path of the overlay it will be over-written, and for this reason Forth will reject attempts at loading such files if the user space is not in the same condition as when the overlay was created. (But there are ways around this.)
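The three fields of an overlay block add up to 1024 bytes (1 + 2 + 1021), and a block can be taken apart as sketched below. This is Python, not Fig-Forth, and several details are assumptions: the function name split_overlay_block is hypothetical, the byte order of the target address is not given in this chapter (little-endian is used as a default), and the value of the marker byte is unknown, so the 0xAA used in the sample is arbitrary.

```python
# Python model (not Fig-Forth) of the overlay block layout described above.
BLOCK_SIZE = 1024          # 1 marker byte + 2 address bytes + 1021 data bytes

def split_overlay_block(block, byteorder="little"):
    """Return (marker, target_address, data) from one overlay block.

    The byte order of the address is an assumption, not taken from the manual.
    """
    if len(block) != BLOCK_SIZE:
        raise ValueError("an overlay block must be exactly 1024 bytes")
    marker = block[0]
    target = int.from_bytes(block[1:3], byteorder)
    data = block[3:]
    return marker, target, data

# Arbitrary sample block: marker 0xAA, target address 0x4000, empty data.
sample = bytes([0xAA]) + (0x4000).to_bytes(2, "little") + bytes(1021)
marker, target, data = split_overlay_block(sample)
```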
When an overlay is created the Compilation Vocabulary must be the same as the Top Search Buffer Item, and the file required to hold the overlay must already be open. For entire programs comprised of a vocabulary constructed as a part of the root directory, this implies the following process;
ONLY VOCABULARY MYTASK IMMEDIATE
FORTH ALSO MYTASK DEFINITIONS
(other words defining the operations of the program)
ONLY 20 DUMP MYTASK
This process has created and saved the entire program as a function of the ROOT vocabulary such that it may be loaded with the following command;
ONLY FORGET MYTASK
20 LINK
Now the MYTASK vocabulary is back in the user space, once again a part of the ROOT vocabulary. In addition, the GO word can be used to link an overlay into the user space, except that the last word defined in the overlay is automatically executed.
ONLY FORGET MYTASK
20 GO (program starts to run)
When used as a part of a program the function of FORGET must be a deferred operation, to remove any old overlay that may be present in the user space when a new one is needed. This process is very easy to implement, as shown below;
ONLY VOCABULARY MYTASK IMMEDIATE -- create task in root
ALSO FORTH MYTASK DEFINITIONS -- add forth and define task
: SWITCH ( BLK# -- ) ," FORGET OVERLAY" RUN$ -- remove current overlay
LINK LATEST PFA CFA EXECUTE ; -- load next piece & run 1st word
: OVERLAY ; -- where the overlay goes.
Through the power of the special word of RUN$ the actual forget command will not be processed until it is needed, while the null word of OVERLAY is the location to be forgotten. After the string is interpreted the new overlay can be loaded, where the second half of SWITCH tells the machine to execute the first word. Note however that the first word to be executed by the overlay should be an RP!, to empty any calling values that may be remaining from the previous overlay. (Unless of course, the second overlay exits in a fashion that these values will have meaning.)
Using this scheme the overlays themselves are very easy to construct, for all they require is the same header (null or not) and the location of the next part of the program;
: OVERLAY ." PART 4" ; -- an overlay with ID string
....
: GO-NEXT 34 SWITCH ; -- switch to next
....
: ENTRY-POINT RP! .... ; -- where overlay starts
Using the RUN$ word also means the overlay does not need to be started with the first word, but can call upon any word in the overlay and exit back if required;
: SUB99 ," FORGET OVERLAY 55 LOAD GET# FORGET OVERLAY 24 LINK"
RUN$ ;
: OVERLAY ;
which will forget the OVERLAY, load a new routine of which GET# is a part from block 55, run the word, forget the new overlay and bring the old pre-compiled overlay back into memory. Under this condition the RP! would not be required, since the new overlay is expected to return the control stack to its former state. The RUN$ word will attempt to execute any string at the address given it, but the string must be zero terminated for WORD. Note however that the string to be run, particularly when calling FORGET, should be placed under the location of the forget itself, as should the word which contains RUN$.
Forth also allows the operation of a word to be defined at a later time by the use of the vector, as is done for ABORT, KEY, ?TERMINAL, EMIT, CR and others. These words are defined by a simple NOOP call to the compiler code, which may be replaced by a different token later in the text or by program function;
: CT1 NOOP ;
( other words )
: MYWORD ...
... ' MYWORD CFA ' CT1 ! ... ( set the vector )
This also allows self re-entrant code structures to be defined, up to the depth available by the Return Stack. This is often used for mouse call-backs, error functions, and parsing routines.
In addition to the vector, tables of addressing may be constructed for later execution, much like the SWITCH operation mentioned in Chapter 5. In these cases a simple variable array can be constructed as in the following example;
' MYWORD VARIABLE MYLIST ' NEXTWORD , ' THIRDWORD , ...
Because the Tick operation returns the PFA address these command words can be executed using the following;
: RUN-ONE ( N -- ) 2 * MYLIST + @ CFA EXECUTE ;
This example assumes that N will point to the word list, executing MYWORD if zero, NEXTWORD if 1, and so on. EXECUTE will run any Code Field Address word given it, including such definitions as moved to file blocks or screens.
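The table lookup performed by RUN-ONE can be modelled as follows. The sketch is Python, not Fig-Forth; the base address 0x5000 and the three stand-in functions are hypothetical, and 2-byte cells are assumed because of the 2 * in the definition.

```python
# Python model (not Fig-Forth) of the MYLIST table and the RUN-ONE word.
CELL = 2                   # bytes per table entry, from "2 *" in RUN-ONE
MYLIST = 0x5000            # hypothetical address of the table

def myword():
    return "MYWORD"

def nextword():
    return "NEXTWORD"

def thirdword():
    return "THIRDWORD"

# Lay the entries out at consecutive cell addresses, as the commas do.
memory = {MYLIST + CELL * i: word
          for i, word in enumerate([myword, nextword, thirdword])}

def run_one(n):
    """Mimic 2 * MYLIST + @ CFA EXECUTE : fetch entry n and run it."""
    return memory[MYLIST + CELL * n]()
```

Unlike the Forth word, the model fails with a KeyError when N points outside the table; the Forth version would execute whatever address it found there.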