Writing a "Forth-like" in SPARK
This is the second stop on my journey writing a verified Forth. The final goal would be to make a verified Forth (like seedForth) which could be used as a compiler backend as described on the Ada forum.
Previously, I wrote a postfix calculator to learn more about the SPARK subset of Ada, and to see if a small postfix calculator could have its operations verified. I'd never worked with Forth before, so this step was about understanding the language and its implementation issues.
Forth Background
Forth is a neat concatenative programming language, notable for how much power it gets from its minimal syntax. The primary element of the syntax is the word, represented in text as a sequence of non-whitespace characters.
Words operate on a variety of data structures: a stack containing data (parameter stack), a stack containing return addresses (return stack), a text input (terminal input buffer), and a dictionary of words.
Forth programs are lists of words. Words do everything, from comments to arithmetic to starting and terminating definitions of custom words.
Forth's directness promotes its use in bootstrapping systems, and a lot of implementations I came across directly emitted assembly. I wanted to avoid complications dealing with assembly and its implications with the prover, so instead I implemented a Forth-like virtual machine. I'll call it "Forth-like" since I'm not sure what subset of language features minimally constitutes a "Forth," and I'm not looking to raise anyone's ire or bikeshed.
Wanting to run it interactively and from a file, I implemented a tokenizer. Some functions typically implemented in the Forth itself (like the text interpreter) were done in SPARK. This allows some behavior to be verified, like that the tokenizer terminates.
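For example, termination can be shown mechanically with a loop variant. A minimal sketch (hypothetical names, not the actual tokenizer):

```ada
--  The Loop_Variant tells the prover that Position strictly
--  increases on every iteration, so the loop cannot run forever.
procedure Skip_Spaces (Input : String; Position : in out Positive)
  with Pre  => Input'Last < Positive'Last
                 and then Position in Input'Range,
       Post => Position >= Position'Old
is
begin
   while Position <= Input'Last and then Input (Position) = ' ' loop
      pragma Loop_Variant (Increases => Position);
      Position := Position + 1;
   end loop;
end Skip_Spaces;
```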
The first Apollo mission didn't immediately land on the moon. Maybe the next generation of what I write will be a seedForth-like implementation.
My goal was to support defining custom words and have some form of control flow. I didn't get to variables and constants (! or @) or to DOES>. This has been a great experience so far, so I might add those in the future.
Verification with SPARK
This Forth implementation verifies specific behavior with SPARK, continuing the approach begun and described in the postfix calculator project. The VM remains in an "Ok" state provided no error occurs, with these errors detected and caught (a sketch of a matching status type follows the list):
- arithmetic would overflow or underflow
- the data or return stack would overflow or underflow
- a word, name, or instruction allocation would fail
- an invalid word would attempt execution
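The names here are hypothetical, but the shape is an enumeration the contracts can refer to:

```ada
--  Sketch of a status type covering the detected error cases.
type Status_Kind is
  (Ok,                 --  running normally
   Arithmetic_Error,   --  an operation would overflow or underflow
   Stack_Error,        --  data or return stack over/underflow
   Allocation_Error,   --  word, name, or instruction pool is full
   Invalid_Word);      --  an invalid word attempted execution
```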
Operations are proved to various degrees.
Stack operations prove that their results are correct, such as ROT (rotate): 1 2 3 on the stack is proven to result in 2 3 1, unless there aren't enough elements, in which case an error results.
This is what the description of ROT looks like.
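Roughly, as a sketch: Machine, Is_Running, Has_Error, Param_Stack_Count, and Peek (which reads the value N cells below the top) are illustrative names, not the real API:

```ada
procedure Rot (VM : in out Machine)
  with
    Global => null,
    Pre    => Is_Running (VM),
    Contract_Cases =>
      (Param_Stack_Count (VM) < 3 =>
         Has_Error (VM),
       others =>
         Is_Running (VM)
           and Param_Stack_Count (VM) = Param_Stack_Count (VM)'Old
           --  ( 1 2 3 -- 2 3 1 ): the third-from-top moves to the top
           and Peek (VM, 0) = Peek (VM'Old, 2)
           and Peek (VM, 1) = Peek (VM'Old, 0)
           and Peek (VM, 2) = Peek (VM'Old, 1)
           --  ...and everything below the top three is untouched.
           and Param_Stack_Equal_From_Bottom_Until
                 (VM, VM'Old, Param_Stack_Count (VM) - 2));
```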
The Global => null aspect means this procedure does not depend on any global variables.
The precondition (Pre) requires that the virtual machine is running.
Contract_Cases provides different guarantees based on which condition matches when entering the procedure.
If there aren't enough values, then an error is signaled.
Otherwise, the values have been rotated appropriately and nothing else in the VM was changed.
Param_Stack_Equal_From_Bottom_Until is a nifty little function I wrote to help with the stack checks, and it uses Ada's quantified expressions:
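A sketch of it, assuming a Machine record exposing its Param_Stack array and a Stack_Count type:

```ada
--  True when Left and Right hold the same values at every stack
--  position from the bottom up to, but not including, Limit.
function Param_Stack_Equal_From_Bottom_Until
  (Left, Right : Machine;
   Limit       : Stack_Count) return Boolean
is
  (for all I in 1 .. Limit - 1 =>
     Left.Param_Stack (I) = Right.Param_Stack (I));
```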
The prover doesn't understand what subprograms do, just their preconditions and postconditions. Anything not listed in a postcondition, or guaranteed elsewhere by a Type_Invariant, Abstract_State, Static_Predicate, or Dynamic_Predicate, has an unknown state. This opaqueness is crucial to understand: I had multiple occasions where the fix for a failing proof was simply, "I need to add more postconditions."
Other words, like :, have only a few correctness guarantees, partially due to time constraints, which leads to the next point.
"It's Got Electrolytes" and "Verified"
SPARK guarantees some checks unconditionally. Absence of Runtime Errors means no uncaught exceptions get thrown, no out-of-bounds accesses are possible, and no arithmetic overflow or underflow can occur. These are always checked.
Subprograms are proved to produce outputs meeting their postconditions whenever their preconditions are met.
However...
Preconditions and postconditions are only as good as what you feed into them. If you feed in junk, you're going to get junk out, as shown by this funny thing I forgot to replace.
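I won't reproduce the exact snippet, but the effect was the same as leaving a comparison stubbed out to always succeed, something like:

```ada
--  A placeholder that was never replaced: it ignores its arguments,
--  so every check built on top of it passes vacuously.
function Word_Code_Equal (Left, Right : Word_Code) return Boolean
  is (True);
```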
All word codes are equal!
When iterating, I found that putting basic guardrails in first helped stand up functionality. Then, slowly coaxing more and more constraints onto each subprogram kept me from getting too bogged down in proofs, so it felt like I was still making progress. It was painful to set up a bunch of proofs, find out I had structured something incorrectly, and then have to rip it all out.
This also means that adding behavior while building out your program can leave code that would be valid but is more limited in use, because the constraints haven't been put in place yet. For example, when adding new state to the virtual machine, previously proved subprograms won't have verified that the new state is unchanged. This can cause failing proofs when additional preconditions or postconditions are added to code which calls those subprograms.
This ability to prove things incrementally is interesting, since you can mix SPARK and Ada code. You could start a project in Ada and then slowly move stable sections into SPARK. I'd suspect a critical system would be specified more in-depth before implementation, to prevent missed checks like the one above.
To Words
Custom word definitions in Forth are called "colon definitions." Having custom word definitions was critically important to me as an implementer, since it brings this closer to a "true" implementation of a programming language.
My verified postfix calculator hard coded the mapping of tokens to operations, so I needed a way to do this at runtime.
Since I am sans pointers in SPARK, this turned into a pair of arrays: one for word headers and one for word names. "Word headers" describe the name and how a word is executed.
Indexes into the word table doubled as "execution tokens" in this Forth-like.
Using Ada's type system, I created semantically distinct integer types for the indexes into these arrays. Subtypes allow introducing new bounds, which SPARK uses in proofs. If you coerce a value into another type, SPARK requires that it has previously been bounds-checked or proved to be a valid conversion. This ensures indices into the arrays are always valid.
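A sketch of the idea, with made-up names and bounds:

```ada
Max_Words : constant := 1024;

--  A distinct integer type: a Word_Index is not interchangeable with
--  other integer types, so it can't accidentally index another table.
type Word_Index is range 0 .. Max_Words;

--  Zero is reserved to mean "no word"; the subtype's tighter bounds
--  become facts the prover can use wherever a Valid_Word_Index appears.
subtype Valid_Word_Index is Word_Index range 1 .. Word_Index'Last;

--  Indexes into the word table double as execution tokens.
subtype Execution_Token is Valid_Word_Index;
```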
Using a bulk allocation for word names also isn't a big deal since Ada's strings are character arrays (slices) with a length and not null-terminated.
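A sketch of what that pool and a guarded allocation might look like; the names, bounds, and Global contract are hypothetical:

```ada
Name_Pool_Size : constant := 16_384;

--  One big character buffer; each word header stores the bounds of
--  its name's slice within this buffer.
Names      : String (1 .. Name_Pool_Size);
Names_Used : Natural range 0 .. Name_Pool_Size := 0;

--  Allocation is only legal when the caller has already ensured there
--  is room; SPARK proves this precondition at every call site.
procedure Allocate_Name
  (Source : String;
   First  : out Positive;
   Last   : out Positive)
  with
    Global => (In_Out => (Names, Names_Used)),
    Pre    => Source'Length > 0
                and then Source'Length <= Name_Pool_Size - Names_Used;
```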
The Pre checks catch instances where the appropriate checks for allocation have not yet been made. When such a procedure is called from SPARK code, those preconditions are guaranteed to hold.
Executing words
My initial plan for implementing words was to have built-ins be native Ada procedures, while compiled words would run the address interpreter over their stored code.
A big hiccup there is that SPARK doesn't allow side effects when using access to subprograms (function pointers).
This resulted in me splitting word code across three boundaries, instead of the two I was hoping for. I used a discriminated record (a tagged union) for the executable part of each word, with instructions stored collectively in one large block, similar to names.
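A simplified sketch of that record (the real code split word code three ways, but two variants show the shape; every name here is hypothetical):

```ada
type Builtin_Op is (Op_Add, Op_Dup, Op_Swap, Op_Rot);
type Instruction_Index is range 1 .. 4_096;

type Word_Kind is (Builtin, Compiled);

--  A discriminated record: which fields exist depends on the Kind.
type Word_Code (Kind : Word_Kind := Builtin) is record
   case Kind is
      when Builtin =>
         --  Built-ins dispatch through an enumeration and a case
         --  statement, since function pointers with side effects
         --  aren't available in SPARK.
         Op : Builtin_Op;
      when Compiled =>
         --  Compiled words point into the shared instruction block.
         First_Instruction : Instruction_Index;
   end case;
end record;
```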
Executing words looks roughly like this:
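Sketched with the hypothetical types above, where Enter starts (or re-enters) the address interpreter:

```ada
procedure Execute (VM : in out Machine; Token : Execution_Token) is
   Code : constant Word_Code := VM.Words (Token).Code;
begin
   case Code.Kind is
      when Builtin =>
         --  Native built-ins run directly as Ada procedures.
         case Code.Op is
            when Op_Add  => Add (VM);
            when Op_Dup  => Dup (VM);
            when Op_Swap => Swap (VM);
            when Op_Rot  => Rot (VM);
         end case;
      when Compiled =>
         --  Compiled words run through the address interpreter,
         --  starting at their first instruction.
         Enter (VM, Code.First_Instruction);
   end case;
end Execute;
```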
Instructions
Instructions were handled roughly the same way as names: one big bulk storage for instructions, plus a count of how many are in use. The address interpreter gets kicked off, or continues, when executing a user-defined word.
SPARK ensures that no invalid instructions get executed and that the instruction pointer stays in the array of valid instructions.
SPARK allowed proving properties of some words which write into this instruction block.
For instructions, compiling IF is verified to write the current instruction position to the top of the stack. The idea is that IF inserts a conditional jump and then a distance to jump. The distance isn't known yet, so an empty value gets written into the instruction array to be filled in later. The value placed at the top of the stack is the location in the instruction array to fill in.
That location must be a valid instruction address, and THEN must write an appropriate and valid instruction address into it. This only verifies that the jump stays within the array; it doesn't verify that the jump stays within the instructions of the word currently being defined. That could be done by adding more state to track the first instruction, but I ran out of time during the holidays.
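A sketch of how compiling IF and THEN might look; Emit, Push, Pop, Cell, Last_Instruction, Patch_Distance, Signal_Error, and the opcodes are all assumed helpers rather than the real API:

```ada
--  IF compiles to a conditional jump followed by a distance slot.
--  The distance isn't known yet, so a placeholder goes in and the
--  placeholder's location is pushed for THEN to find.
procedure Compile_If (VM : in out Machine) is
begin
   Emit (VM, Jump_If_False);
   Emit (VM, Placeholder);
   Push (VM, Cell (Last_Instruction (VM)));
end Compile_If;

--  THEN pops that location and writes the real distance. SPARK
--  insists the popped value is shown to be a valid instruction
--  address before it may be used as one.
procedure Compile_Then (VM : in out Machine) is
   Patch_At : Cell;
begin
   Pop (VM, Patch_At);
   if Patch_At in Cell (Instruction_Index'First)
               .. Cell (Last_Instruction (VM))
   then
      Patch_Distance (VM, Instruction_Index (Patch_At));
   else
      Signal_Error (VM);
   end if;
end Compile_Then;
```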
Next Steps
The biggest gaps in the implementation are that it does not support variables and constants and that it's missing DOES>. Supporting floating point would also be a good addition.
As-is, the current VM is interesting, but not super useful. It will require another level of restructuring so that, rather than there being just the one VM, different backends could be used. This would allow emitting x86 or ARM assembly, or creating ELF or PE files.
Summary
I wrote a Forth-like virtual machine in Ada/SPARK, with a lot of proved properties. I wasn't entirely sure it was a doable project when I started, but it looks like a third pass would move me much closer to a more generally usable Forth.
Working with the SPARK subset of Ada continues to get easier and faster. When I first started trying to write verified code, it felt like a complete slog, like I was etching every line of code into stone with a chisel. There have been a lot of improvements making it simpler to get started, such as integration with Alire. It also seems easier to get proofs to work; there have definitely been improvements on the SPARK side, though I've also learned better ways of writing SPARK code.
Learning about Forth and writing a Forth-like was a lot of fun, and I'd highly recommend trying out Forth if you've never used it before.