Writing a "Forth-like" in SPARK
This is the second stop on my journey writing a verified Forth. The final goal would be to make a verified Forth (like seedForth) which could be used as a compiler backend as described on the Ada forum.
Previously, I wrote a postfix calculator to learn more about the SPARK subset of Ada, and to see if a small postfix calculator could have its operations verified. I'd never worked with Forth before, so this step was about understanding the language and its implementation issues.
Forth Background
Forth is a neat concatenative programming language, notable for how much power it gets from its minimal syntax. The primary element of the syntax is the word, represented in text as a sequence of non-whitespace characters.
Words operate on a variety of data structures: a stack containing data (parameter stack), a stack containing return addresses (return stack), a text input (terminal input buffer), and a dictionary of words.
Forth programs are lists of words. Words do everything, from comments to arithmetic to starting and terminating definitions of custom words.
Forth's directness promotes its use in bootstrapping systems, and a lot of implementations I came across directly emitted assembly. I wanted to avoid complications dealing with assembly and its implications with the prover, so instead I implemented a Forth-like virtual machine. I'll call it "Forth-like" since I'm not sure what subset of language features minimally constitutes a "Forth," and I'm not looking to raise anyone's ire or bikeshed.
Wanting to run it interactively and from a file, I implemented a tokenizer. Some functions typically implemented in the Forth itself (like the text interpreter) were done in SPARK. This allows some behavior to be verified, like that the tokenizer terminates.
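For example, termination can be shown mechanically with a loop variant. A minimal sketch (hypothetical names, not the actual tokenizer):

```ada
--  The Loop_Variant tells the prover that Position strictly
--  increases on every iteration, so the loop cannot run forever.
procedure Skip_Spaces (Input : String; Position : in out Positive)
  with Pre  => Input'Last < Positive'Last
                 and then Position in Input'Range,
       Post => Position >= Position'Old
is
begin
   while Position <= Input'Last and then Input (Position) = ' ' loop
      pragma Loop_Variant (Increases => Position);
      Position := Position + 1;
   end loop;
end Skip_Spaces;
```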
The first Apollo mission didn't immediately land on the moon. Maybe the next generation of what I write will be a seedForth-like implementation.
My goal was to support defining custom words and have some form of control flow. I didn't get to variables and constants (! or @) or to DOES>. This has been a great experience so far, so I might add those in the future.
Verification with SPARK
This Forth implementation verifies specific behavior with SPARK, continuing the approach begun and described in the postfix calculator project. The VM remains in an "Ok" state provided no error occurs, with these errors detected and caught (a sketch of a matching status type follows the list):
- arithmetic would overflow or underflow
- the data or return stack would overflow or underflow
- a word, name, or instruction allocation would fail
- an invalid word would attempt execution
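The names here are hypothetical, but the shape is an enumeration the contracts can refer to:

```ada
--  Sketch of a status type covering the detected error cases.
type Status_Kind is
  (Ok,                 --  running normally
   Arithmetic_Error,   --  an operation would overflow or underflow
   Stack_Error,        --  data or return stack over/underflow
   Allocation_Error,   --  word, name, or instruction pool is full
   Invalid_Word);      --  an invalid word attempted execution
```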
Operations are proved to various degrees.
Stack operations prove that their results are correct, such as ROT (rotate): 1 2 3 on the stack is proven to result in 2 3 1, unless there aren't enough elements, in which case an error results.
This is what the description of ROT looks like.
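Roughly, as a sketch: Machine, Is_Running, Has_Error, Param_Stack_Count, and Peek (which reads the value N cells below the top) are illustrative names, not the real API:

```ada
procedure Rot (VM : in out Machine)
  with
    Global => null,
    Pre    => Is_Running (VM),
    Contract_Cases =>
      (Param_Stack_Count (VM) < 3 =>
         Has_Error (VM),
       others =>
         Is_Running (VM)
           and Param_Stack_Count (VM) = Param_Stack_Count (VM)'Old
           --  ( 1 2 3 -- 2 3 1 ): the third-from-top moves to the top
           and Peek (VM, 0) = Peek (VM'Old, 2)
           and Peek (VM, 1) = Peek (VM'Old, 0)
           and Peek (VM, 2) = Peek (VM'Old, 1)
           --  ...and everything below the top three is untouched.
           and Param_Stack_Equal_From_Bottom_Until
                 (VM, VM'Old, Param_Stack_Count (VM) - 2));
```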
The Global => null aspect means this procedure does not depend on any global variables.
The precondition (Pre) requires that the virtual machine is running.
Contract_Cases provides different guarantees based on which condition matches when entering the procedure.
If there aren't enough values, then an error is signaled.
Otherwise, the values have been rotated appropriately and nothing else in the VM was changed.
Param_Stack_Equal_From_Bottom_Until is a nifty little function I wrote to help with the stack checks, and it uses Ada's quantified expressions:
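A sketch of it, assuming a Machine record exposing its Param_Stack array and a Stack_Count type:

```ada
--  True when Left and Right hold the same values at every stack
--  position from the bottom up to, but not including, Limit.
function Param_Stack_Equal_From_Bottom_Until
  (Left, Right : Machine;
   Limit       : Stack_Count) return Boolean
is
  (for all I in 1 .. Limit - 1 =>
     Left.Param_Stack (I) = Right.Param_Stack (I));
```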
The prover doesn't understand what subprograms do, just their preconditions and postconditions. Anything not listed in a postcondition, or guaranteed elsewhere by a Type_Invariant, Abstract_State, Static_Predicate, or Dynamic_Predicate, has an unknown state. This opaqueness is crucial to understand: I had multiple occasions where the fix for a failing proof was simply, "I need to add more postconditions."
Other words, like :, have only a few correctness guarantees, partially due to time constraints, which leads to the next point.
"It's Got Electrolytes" and "Verified"
SPARK guarantees some checks unconditionally. Absence of Runtime Errors means no uncaught exceptions get thrown, no out-of-bounds accesses are possible, and no arithmetic overflow or underflow can occur. These are always checked.
Subprograms are proved to produce outputs meeting their postconditions whenever their preconditions are met.
However...
Preconditions and postconditions are only as good as what you feed into them. If you feed in junk, you're going to get junk out, as shown by this funny thing I forgot to replace.
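I won't reproduce the exact snippet, but the effect was the same as leaving a comparison stubbed out to always succeed, something like:

```ada
--  A placeholder that was never replaced: it ignores its arguments,
--  so every check built on top of it passes vacuously.
function Word_Code_Equal (Left, Right : Word_Code) return Boolean
  is (True);
```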
All word codes are equal!
When iterating, I found that putting basic guardrails in first helped stand up functionality. Then, slowly coaxing more and more constraints onto each subprogram kept me from getting too bogged down in proofs, so it felt like I was still making progress. It was painful to set up a bunch of proofs, find out I had structured something incorrectly, and then have to rip it all out.
This also means that adding behavior while building out your program can leave code that would be valid but is more limited in use, because the constraints haven't been put in place yet. For example, when adding new state to the virtual machine, previously proved subprograms won't have verified that the new state is unchanged. This can cause failing proofs when additional preconditions or postconditions are added to code which calls those subprograms.
This ability to prove things incrementally is interesting, since you can mix SPARK and Ada code. You could start a project in Ada and then slowly move stable sections into SPARK. I'd suspect a critical system would be specified more in-depth before implementation, to prevent missed checks like the one above.
To Words
Custom word definitions in Forth are called "colon definitions." Having custom word definitions was critically important to me as an implementer, since it brings this closer to a "true" implementation of a programming language.
My verified postfix calculator hard coded the mapping of tokens to operations, so I needed a way to do this at runtime.
Since I am sans pointers in SPARK, this turned into a pair of arrays: one for word headers and one for word names. "Word headers" describe the name and how a word is executed.
Indexes into the word table doubled as "execution tokens" in this Forth-like.
Using Ada's type system, I created semantically distinct integer types for the indexes into these arrays. Subtypes allow introducing new bounds, which SPARK uses in proofs. If you coerce a value into another type, SPARK requires that it has previously been bounds-checked or proved to be a valid conversion. This ensures indices into the arrays are always valid.
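A sketch of the idea, with made-up names and bounds:

```ada
Max_Words : constant := 1024;

--  A distinct integer type: a Word_Index is not interchangeable with
--  other integer types, so it can't accidentally index another table.
type Word_Index is range 0 .. Max_Words;

--  Zero is reserved to mean "no word"; the subtype's tighter bounds
--  become facts the prover can use wherever a Valid_Word_Index appears.
subtype Valid_Word_Index is Word_Index range 1 .. Word_Index'Last;

--  Indexes into the word table double as execution tokens.
subtype Execution_Token is Valid_Word_Index;
```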
Using a bulk allocation for word names also isn't a big deal since Ada's strings are character arrays (slices) with a length and not null-terminated.
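A sketch of what that pool and a guarded allocation might look like; the names, bounds, and Global contract are hypothetical:

```ada
Name_Pool_Size : constant := 16_384;

--  One big character buffer; each word header stores the bounds of
--  its name's slice within this buffer.
Names      : String (1 .. Name_Pool_Size);
Names_Used : Natural range 0 .. Name_Pool_Size := 0;

--  Allocation is only legal when the caller has already ensured there
--  is room; SPARK proves this precondition at every call site.
procedure Allocate_Name
  (Source : String;
   First  : out Positive;
   Last   : out Positive)
  with
    Global => (In_Out => (Names, Names_Used)),
    Pre    => Source'Length > 0
                and then Source'Length <= Name_Pool_Size - Names_Used;
```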
The Pre checks catch instances where the appropriate checks for allocation have not yet been made. When such a procedure is called from SPARK code, those preconditions are guaranteed to hold.
Executing words
My initial plan for implementing words was to have built-ins be native Ada procedures, while compiled words would run the address interpreter over their stored code.
A big hiccup there is that SPARK doesn't allow side effects when using access to subprograms (function pointers).
This resulted in me splitting word code across three boundaries, instead of the two I was hoping for. I used a discriminated record (a tagged union) for the executable part of each word, with instructions stored collectively in one large block, similar to names.
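A simplified sketch of that record (the real code split word code three ways, but two variants show the shape; every name here is hypothetical):

```ada
type Builtin_Op is (Op_Add, Op_Dup, Op_Swap, Op_Rot);
type Instruction_Index is range 1 .. 4_096;

type Word_Kind is (Builtin, Compiled);

--  A discriminated record: which fields exist depends on the Kind.
type Word_Code (Kind : Word_Kind := Builtin) is record
   case Kind is
      when Builtin =>
         --  Built-ins dispatch through an enumeration and a case
         --  statement, since function pointers with side effects
         --  aren't available in SPARK.
         Op : Builtin_Op;
      when Compiled =>
         --  Compiled words point into the shared instruction block.
         First_Instruction : Instruction_Index;
   end case;
end record;
```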
Executing words looks roughly like this:
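Sketched with the hypothetical types above, where Enter starts (or re-enters) the address interpreter:

```ada
procedure Execute (VM : in out Machine; Token : Execution_Token) is
   Code : constant Word_Code := VM.Words (Token).Code;
begin
   case Code.Kind is
      when Builtin =>
         --  Native built-ins run directly as Ada procedures.
         case Code.Op is
            when Op_Add  => Add (VM);
            when Op_Dup  => Dup (VM);
            when Op_Swap => Swap (VM);
            when Op_Rot  => Rot (VM);
         end case;
      when Compiled =>
         --  Compiled words run through the address interpreter,
         --  starting at their first instruction.
         Enter (VM, Code.First_Instruction);
   end case;
end Execute;
```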
Instructions
Instructions were handled roughly the same way as names: one big bulk storage for instructions, plus a count of how many are in use. The address interpreter gets kicked off, or continues, when executing a user-defined word.
SPARK ensures that no invalid instructions get executed and that the instruction pointer stays in the array of valid instructions.
SPARK allowed proving properties of some words which write into this instruction block.
For instructions, compiling IF is verified to write the current instruction position to the top of the stack. The idea is that IF inserts a conditional jump and then a distance to jump. The distance isn't known yet, so an empty value gets written into the instruction array to be filled in later. The value placed at the top of the stack is the location in the instruction array to fill in.
That location must be a valid instruction address, and THEN must write an appropriate and valid instruction address into it. This only verifies that the jump stays within the array; it doesn't verify that the jump stays within the instructions of the word currently being defined. That could be done by adding more state to track the first instruction, but I ran out of time during the holidays.
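A sketch of how compiling IF and THEN might look; Emit, Push, Pop, Cell, Last_Instruction, Patch_Distance, Signal_Error, and the opcodes are all assumed helpers rather than the real API:

```ada
--  IF compiles to a conditional jump followed by a distance slot.
--  The distance isn't known yet, so a placeholder goes in and the
--  placeholder's location is pushed for THEN to find.
procedure Compile_If (VM : in out Machine) is
begin
   Emit (VM, Jump_If_False);
   Emit (VM, Placeholder);
   Push (VM, Cell (Last_Instruction (VM)));
end Compile_If;

--  THEN pops that location and writes the real distance. SPARK
--  insists the popped value is shown to be a valid instruction
--  address before it may be used as one.
procedure Compile_Then (VM : in out Machine) is
   Patch_At : Cell;
begin
   Pop (VM, Patch_At);
   if Patch_At in Cell (Instruction_Index'First)
               .. Cell (Last_Instruction (VM))
   then
      Patch_Distance (VM, Instruction_Index (Patch_At));
   else
      Signal_Error (VM);
   end if;
end Compile_Then;
```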
Next Steps
The biggest gaps in the implementation are that it does not support variables and constants and that it's missing DOES>. Supporting floating point would also be a good addition.
As-is, the current VM is interesting, but not super useful. It will require another level of restructuring so that, rather than there being just the one VM, different backends could be used. This would allow emitting x86 or ARM assembly, or creating ELF or PE files.
Summary
I wrote a Forth-like virtual machine in Ada/SPARK, with a lot of proved properties. I wasn't entirely sure it was a doable project when I started, but it looks like a third pass would move me much closer to a more generally usable Forth.
Working with the SPARK subset of Ada continues to get easier and faster. When I first started trying to write verified code, it felt like a complete slog, like I was etching every line of code into stone with a chisel. There have been a lot of improvements making it simpler to get started, such as integration with Alire. It also seems easier to get proofs to work; there have definitely been improvements on the SPARK side, though I've also learned better ways of writing SPARK code.
Learning about Forth and writing a Forth-like was a lot of fun, and I'd highly recommend trying out Forth if you've never used it before.