Theory Compiler

(*  Title:       A Shorter Compiler Correctness Proof for Language IMP
    Author:      Pasquale Noce
                 Software Engineer at HID Global, Italy
                 pasquale dot noce dot lavoro at gmail dot com
                 pasquale dot noce at hidglobal dot com
*)

section "Compiler formalization"

theory Compiler
  imports
    "HOL-IMP.BExp"
    "HOL-IMP.Star"
begin


subsection "Introduction"

text ‹
This paper presents a compiler correctness proof for the didactic imperative programming language
IMP, introduced in \<^cite>‹"Nipkow14"›, shorter than the proof described in \<^cite>‹"Nipkow14"› and
included in the Isabelle2021 distribution \<^cite>‹"Klein21"›. Actually, the size of the presented
proof is just two thirds of the book's proof in the number of formal text lines, and as such it
promises to constitute a further enhanced reference for the formal verification of compilers meant
for larger, real-world programming languages.

Given compiler \emph{completeness}, viz. the simulation of source code execution by compiled code,
"in a deterministic language like IMP", compiler correctness "reduces to preserving termination: if
the machine program terminates, so must the source program", even though proving this "is not much
easier" (\<^cite>‹"Nipkow14"›, section 8.4). However, the presented proof does not depend on language
determinism, so that the proposed approach is applicable to non-deterministic languages as well.

As a confirmation, this paper extends IMP with an additional command @{text "c⇩₁ OR c⇩₂"}, standing
for the non-deterministic choice between commands @{text c⇩₁} and @{text c⇩₂}, and proves compiler
\emph{correctness}, viz. the simulation of compiled code execution by source code, for such extended
language. Of course, the aforesaid comparison between proof sizes does not consider the lines in the
proof of lemma @{text ccomp_correct} (which proves compiler correctness for commands) pertaining to
non-deterministic choice, since this command is not included in the original language IMP. Anyway,
non-deterministic choice turns out to extend that proof just by a modest number of lines.

For further information about the formal definitions and proofs contained in this paper, see
Isabelle documentation, particularly \<^cite>‹"Paulson21"›, \<^cite>‹"Nipkow21"›, \<^cite>‹"Krauss21"›,
and \<^cite>‹"Nipkow11"›.
›


subsection "Definitions"

text ‹
Here below are the definitions of IMP commands, extended with non-deterministic choice, as well as
of their big-step semantics.

As in the original theory file \<^cite>‹"Klein21"›, program counter's values are modeled using type
@{typ int} rather than @{typ nat}. As a result, the same declarations and definitions used in
\<^cite>‹"Klein21"› to deal with this modeling choice are adopted here as well.

\null
›

declare [[coercion_enabled]]
declare [[coercion "int :: nat ⇒ int"]]
declare [[syntax_ambiguity_warning = false]]

datatype com =
  SKIP |
  Assign vname aexp  (‹_ ::= _› [1000, 61] 61) |
  Seq com com  (‹_;;/ _› [60, 61] 60) |
  If bexp com com  (‹(IF _/ THEN _/ ELSE _)› [0, 0, 61] 61) |
  Or com com  (‹(_ OR _)› [60, 61] 61) |
  While bexp com  (‹(WHILE _/ DO _)› [0, 61] 61)

inductive big_step :: "com × state ⇒ state ⇒ bool" (infix ‹⇒› 55) where
Skip:  "(SKIP, s) ⇒ s" |
Assign:  "(x ::= a, s) ⇒ s(x := aval a s)" |
Seq:  "⟦(c⇩₁, s⇩₁) ⇒ s⇩₂; (c⇩₂, s⇩₂) ⇒ s⇩₃⟧ ⟹ (c⇩₁;; c⇩₂, s⇩₁) ⇒ s⇩₃" |
IfTrue:  "⟦bval b s; (c⇩₁, s) ⇒ t⟧ ⟹ (IF b THEN c⇩₁ ELSE c⇩₂, s) ⇒ t" |
IfFalse:  "⟦¬ bval b s; (c⇩₂, s) ⇒ t⟧ ⟹ (IF b THEN c⇩₁ ELSE c⇩₂, s) ⇒ t" |
Or1:  "(c⇩₁, s) ⇒ t ⟹ (c⇩₁ OR c⇩₂, s) ⇒ t" |
Or2:  "(c⇩₂, s) ⇒ t ⟹ (c⇩₁ OR c⇩₂, s) ⇒ t" |
WhileFalse:  "¬ bval b s ⟹ (WHILE b DO c, s) ⇒ s" |
WhileTrue:  "⟦bval b s⇩₁; (c, s⇩₁) ⇒ s⇩₂; (WHILE b DO c, s⇩₂) ⇒ s⇩₃⟧ ⟹
  (WHILE b DO c, s⇩₁) ⇒ s⇩₃"

declare big_step.intros [intro]

abbreviation (output)
"isize xs ≡ int (length xs)"

notation isize (‹size›)

primrec (nonexhaustive) inth :: "'a list ⇒ int ⇒ 'a" (infixl ‹!!› 100) where
"(x # xs) !! i = (if i = 0 then x else xs !! (i - 1))"

lemma inth_append [simp]:
 "0 ≤ i ⟹
    (xs @ ys) !! i = (if i < size xs then xs !! i else ys !! (i - size xs))"
by (induction xs arbitrary: i, auto simp: algebra_simps)

text ‹
\null

Next, the instruction set and its semantics are defined. Particularly, to allow for the compilation
of non-deterministic choice commands, the instruction set is extended with an additional instruction
@{text JMPND} performing a non-deterministic jump -- viz. as a result of its execution, the program
counter unconditionally either jumps by the specified offset, or just moves to the next instruction.

As instruction execution can be non-deterministic, an inductively defined predicate @{text iexec},
rather than a simple non-recursive function as the one used in \<^cite>‹"Klein21"›, must be introduced
to define instruction semantics.

\null
›

datatype instr = 
  LOADI int | LOAD vname | ADD | STORE vname |
  JMP int | JMPLESS int | JMPGE int | JMPND int

type_synonym stack = "val list"
type_synonym config = "int × state × stack"

abbreviation "hd2 xs ≡ hd (tl xs)"
abbreviation "tl2 xs ≡ tl (tl xs)"

inductive iexec :: "instr × config ⇒ config ⇒ bool" (infix ‹↦› 55) where
LoadI:  "(LOADI i, pc, s, stk) ↦ (pc + 1, s, i # stk)" |
Load:  "(LOAD x, pc, s, stk) ↦ (pc + 1, s, s x # stk)" |
Add:  "(ADD, pc, s, stk) ↦ (pc + 1, s, (hd2 stk + hd stk) # tl2 stk)" |
Store:  "(STORE x, pc, s, stk) ↦ (pc + 1, s(x := hd stk), tl stk)" |
Jmp:  "(JMP i, pc, s, stk) ↦ (pc + i + 1, s, stk)" |
JmpLessY:  "hd2 stk < hd stk ⟹
  (JMPLESS i, pc, s, stk) ↦ (pc + i + 1, s, tl2 stk)" |
JmpLessN:  "hd stk ≤ hd2 stk ⟹
  (JMPLESS i, pc, s, stk) ↦ (pc + 1, s, tl2 stk)" |
JmpGeY:  "hd stk ≤ hd2 stk ⟹
  (JMPGE i, pc, s, stk) ↦ (pc + i + 1, s, tl2 stk)" |
JmpGeN:  "hd2 stk < hd stk ⟹
  (JMPGE i, pc, s, stk) ↦ (pc + 1, s, tl2 stk)" |
JmpNdY:  "(JMPND i, pc, s, stk) ↦ (pc + i + 1, s, stk)" |
JmpNdN:  "(JMPND i, pc, s, stk) ↦ (pc + 1, s, stk)"

declare iexec.intros [intro]

inductive_cases LoadIE  [elim!]:  "(LOADI i, pc, s, stk) ↦ cf"
inductive_cases LoadE  [elim!]:  "(LOAD x, pc, s, stk) ↦ cf"
inductive_cases AddE  [elim!]:  "(ADD, pc, s, stk) ↦ cf"
inductive_cases StoreE  [elim!]:  "(STORE x, pc, s, stk) ↦ cf"
inductive_cases JmpE  [elim!]:  "(JMP i, pc, s, stk) ↦ cf"
inductive_cases JmpLessE  [elim!]:  "(JMPLESS i, pc, s, stk) ↦ cf"
inductive_cases JmpGeE  [elim!]:  "(JMPGE i, pc, s, stk) ↦ cf"
inductive_cases JmpNdE  [elim!]:  "(JMPND i, pc, s, stk) ↦ cf"

definition exec1 :: "instr list ⇒ config ⇒ config ⇒ bool"
  (‹(_/ ⊢/ _/ →/ _)› 55) where
"P ⊢ cf → cf' ≡ (P !! fst cf, cf) ↦ cf' ∧ 0 ≤ fst cf ∧ fst cf < size P"

abbreviation exec :: "instr list ⇒ config ⇒ config ⇒ bool"
  (‹(_/ ⊢/ _/ →*/ _)› 55) where
"exec P ≡ star (exec1 P)"

text ‹
\null

Next, compilation is formalized for arithmetic and boolean expressions (functions @{text acomp} and
@{text bcomp}), as well as for commands (function @{text ccomp}). Particularly, as opposed to what
happens in \<^cite>‹"Klein21"›, here @{text bcomp} takes a single input, viz. a 3-tuple comprised of
a boolean expression, a flag, and a jump offset. In this way, all three functions accept a single
input, which enables to streamline the compiler correctness proof developed in what follows.

\null
›

primrec acomp :: "aexp ⇒ instr list" where
"acomp (N i) = [LOADI i]" |
"acomp (V x) = [LOAD x]" |
"acomp (Plus a⇩₁ a⇩₂) = acomp a⇩₁ @ acomp a⇩₂ @ [ADD]"

fun bcomp :: "bexp × bool × int ⇒ instr list" where
"bcomp (Bc v, f, i) = (if v = f then [JMP i] else [])" |
"bcomp (Not b, f, i) = bcomp (b, ¬ f, i)" |
"bcomp (And b⇩₁ b⇩₂, f, i) =
  (let cb⇩₂ = bcomp (b⇩₂, f, i);
     cb⇩₁ = bcomp (b⇩₁, False, size cb⇩₂ + (if f then 0 else i))
   in cb⇩₁ @ cb⇩₂)" |
"bcomp (Less a⇩₁ a⇩₂, f, i) =
  acomp a⇩₁ @ acomp a⇩₂ @ (if f then [JMPLESS i] else [JMPGE i])"

primrec ccomp :: "com ⇒ instr list" where
"ccomp SKIP = []" |
"ccomp (x ::= a) = acomp a @ [STORE x]" |
"ccomp (c⇩₁;; c⇩₂) = ccomp c⇩₁ @ ccomp c⇩₂" |
"ccomp (IF b THEN c⇩₁ ELSE c⇩₂) =
  (let cc⇩₁ = ccomp c⇩₁; cc⇩₂ = ccomp c⇩₂; cb = bcomp (b, False, size cc⇩₁ + 1)
   in cb @ cc⇩₁ @ JMP (size cc⇩₂) # cc⇩₂)" |
"ccomp (c⇩₁ OR c⇩₂) =
  (let cc⇩₁ = ccomp c⇩₁; cc⇩₂ = ccomp c⇩₂
   in JMPND (size cc⇩₁ + 1) # cc⇩₁ @ JMP (size cc⇩₂) # cc⇩₂)" |
"ccomp (WHILE b DO c) =
  (let cc = ccomp c; cb = bcomp (b, False, size cc + 1)
   in cb @ cc @ [JMP (- (size cb + size cc + 1))])"

text ‹
\null

Finally, two lemmas are proven automatically (both seem not to be included in the standard library,
though being quite basic) and registered for use by automatic proof tactics. In more detail:

  ▪ The former lemma is an elimination rule similar to @{thm [source] impCE}, with the difference
that it retains the antecedent of the implication in the premise where the consequent is assumed to
hold. This rule permits to have both assumptions @{prop "¬ bval b s"} and @{prop "bval b s"} in the
respective cases resulting from the execution of boolean expression @{text b} in state @{text s}.

  ▪ The latter one is an introduction rule similar to @{thm [source] Suc_lessI}, with the difference
that its second assumption is more convenient for proving statements of the form @{prop "Suc m < n"}
arising from the compiler correctness proof developed in what follows.

\null
›

lemma impCE2 [elim!]:
 "⟦P ⟶ Q; ¬ P ⟹ R; P ⟹ Q ⟹ R⟧ ⟹ R"
by blast

lemma Suc_lessI2 [intro!]:
 "⟦m < n; m ≠ n - 1⟧ ⟹ Suc m < n"
by simp

end