chapter ‹Overview of AutoCorres \label{chap:AutoCorresInfrastructure}›
theory AutoCorresInfrastructure
imports AutoCorres
begin
text ‹
This theory elaborates on some of the (internal) AutoCorres infrastructure and is also supposed to
act as a test bench for this infrastructure, e.g. the simplifier setup.
Following the Isabelle tradition, user-relevant changes to AutoCorres are described in
the file 🗏‹../NEWS›.
›
section ‹Building Blocks›
text ‹'AutoCorres' has the following major building blocks and contributions.
▪ The 'C parser' (by Michael Norrish) takes C input files and parses them into 'SIMPL' (by Norbert Schirmer) programs within Isabelle/HOL:
▪ C-parser: \autoref{chap:strictc}
▪ SIMPL: @{url ‹https://www-wjp.cs.uni-saarland.de/leute/private_homepages/nschirmer/pub/schirmer_phd.pdf›}
▪ The unified memory model (UMM) (by Harvey Tuch) models C-types as HOL-types with conversions between
the typed view and the raw-byte-level view. Moreover it provides a separation logic framework.
▪ UMM: @{url ‹https://trustworthy.systems/publications/papers/Tuch%3Aphd.pdf›}
▪ AutoCorres (by David Greenaway): @{url ‹https://trustworthy.systems/publications/nicta_full_text/8758.pdf›}
›
text‹
Some historical remarks that might give some insight into why things are as they are.
The motivation for AutoCorres was the C verification tasks coming from the seL4 project
@{url ‹https://sel4.systems›}, a verified microkernel written in C.
The project started with high-level HOL specifications, refined them to monadic functions (within
HOL and Haskell) and then refined those to C code. The nondeterministic state monad
goes back to Cock et al. 🌐‹http://www.cse.unsw.edu.au/~kleing/papers/Cock_KS_08.pdf›.
To link the C code with the monadic functions
within Isabelle/HOL, the C parser was written. The C parser produces SIMPL programs, and
SIMPL is equipped with various semantics (big- and small-step) and a Hoare logic framework including
a verification condition generator.
The correspondence between the C functions represented in SIMPL and the monadic functions was
expressed manually within this framework.
AutoCorres was then built after (or in parallel with) this verification effort in order to speed
up the process for future projects.
While the main concepts described in the publications above are still valid, a lot
of things have evolved and been extended. In particular, the implementation details have
changed. In this document we also give some notes on these differences. Also see
🗏‹../NEWS›.
›
section ‹C Parser›
text ‹
Some remarks on the C parser:
▪ Lexical and syntactical analysis is implemented using ML-Lex / ML-Yacc coming from the MLton project
🌐‹http://mlton.org/›. Originally MLton was used to generate the ML files from the grammars.
Meanwhile, this is all integrated into Isabelle/ML.
▪ The C parser first generates an ML data structure representing the program. It then performs some
transformations on that data structure (e.g. storing results of nested function calls in temporary
variables) before translating it to SIMPL, in roughly the following steps:
▸ Generating HOL-types for the C-types.
▪ The C parser performs a whole-program analysis to determine the types used.
▪ It defines the types and performs the UMM proofs, in particular the conversions to and
from raw byte lists and their properties.
▸ Defining the state-space type: global variables, local variables, heap variables. The C parser
performs a whole-program analysis to determine how global variables are used and to decide whether
to model them as part of the globals record or as part of the heap: when the 'address-of' a variable
is never taken, it is stored in the globals record, otherwise it is stored in the heap (see the
example below). Global variables are more abstract and easier to deal with than heap variables.
▸ Defining the SIMPL procedures
▸ Proving 'modifies' specifications for the procedures, i.e. frame conditions on global variables.
›
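text ‹
As an illustration (a hypothetical fragment, not taken from an actual example session): the global
variable ‹counter› below is never used with 'address-of', so the C parser keeps it in the globals
record, whereas the address of ‹shared› is taken, so it is modelled as part of the heap:

  int counter;   /* address never taken: kept in the globals record */
  int shared;    /* address taken below: modelled as part of the heap */

  void tick(void) {
    counter = counter + 1;
  }

  int *shared_addr(void) {
    return &shared;   /* taking the address forces 'shared' into the heap model */
  }
›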
section ‹AutoCorres›
text ‹AutoCorres abstracts the SIMPL program to a monadic HOL function in several phases, producing
correspondence (a.k.a. simulation, a.k.a. refinement) theorems along the way. Note that the translation
(and the correspondence theorems) might only be partial in the following sense: a phase might
introduce additional guards into the code, and simulation only holds for programs that do not fail
in the abstract execution. Note that non-termination is currently also modelled as failure, so
simulation only holds for terminating abstract programs:
▪ Correspondence relation: @{const ac_corres}
Each phase is processed in similar stages:
▪ Raw translation and refinement-proof
▪ Optimisation of the output program, sometimes distinguishing between a more lightweight
"peephole" optimisation and a more heavyweight "flow" optimisation
▪ Definition of a constant for the output of the translation of that phase.
The phases are:
▪ From SIMPL (deep embedding of statements, shallow embedding of expressions) to L1 (shallow
embedding of statements and expressions). The transformation preserves the state-space representation,
and merely focuses on translating SIMPL statements to monadic functions:
▪ Definition of L1: 🗏‹../L1Defs.thy›
▪ Correspondence relation: @{const L1corres}
▪ Peephole optimisation: 🗏‹../L1Peephole.thy›
▪ Flow optimisation: 🗏‹../ExceptionRewrite.thy›
▪ Main ML file: 🗏‹../simpl_conv.ML›
▪ Local variable abstraction, from L1 to L2. In L2, local variables are represented as lambda
abstractions, so the underlying state-space type changes, getting rid of the local-variable record.
As local variables are now bound by lambda abstractions, their names become semantically meaningless;
AutoCorres therefore keeps the original names around as (logically redundant) name annotations within
the L2 constants, e.g. @{term "L2_gets (λ_. x) [c_source_name_hint]"}. Lists of names are
used, as variables might pile up in tuples. These annotations are removed in the final
polishing phase of AutoCorres, which attempts to rename the bound variables accordingly on the fly.
▪ Definition of L2: 🗏‹../L2Defs.thy›
▪ Correspondence relation: @{const L2corres}
▪ Exception rewriting, introducing nested exceptions: 🗏‹../L2ExceptionRewrite.thy›
▪ Peephole optimisation: 🗏‹../L2Peephole.thy›
▪ Main ML files: 🗏‹../local_var_extract.ML›, 🗏‹../l2_opt.ML›
From this point on we stay in the language L2, and the optimisation stages are reused.
▪ In/Out parameter abstraction (IO). Pointer parameters are replaced by value parameters that are
passed in and returned (see the sketch after this overview). This step may eliminate pointers to local variables.
▪ Theory: 🗏‹../In_Out_Parameters.thy›
▪ Correspondence relation: @{const IOcorres}
▪ Documentation: 🗏‹In_Out_Parameters_Ex.thy›
▪ Heap abstraction, referred to as heap lifting (HL).
The monolithic, byte-oriented heap is abstracted to a typed split-heap model. For
that, the new type ‹lifted_globals› is introduced. Note that the separation logic of Tuch also
attempts to give a typed view on top of the byte-level heap. In the case of nested structure types
it is even capable of expressing the various levels of typed-heap views, from bytes to individual structure fields
to (nested) structures. AutoCorres takes a simpler, more pragmatic approach here and only
provides a single split-heap abstraction on the level of complete types. This means there
is a heap for every compound type and for every primitive type, but not for nested
structures (see the sketch after this overview). However, AutoCorres allows
mixing the abstraction layers (split heap vs. byte heap) at the granularity of functions.
So you can model low-level functions like memory allocators on the untyped byte heap and then still
use these functions within other functions in the split-heap abstraction.
Meanwhile, genuine support for a split-heap approach with addressable nested structures has been added. It is
described in 🗏‹open_struct.thy›.
▪ Main theory: 🗏‹../HeapLift.thy›
▪ Correspondence relation: @{const L2Tcorres}
▪ Main ML files: 🗏‹../heap_lift_base.ML›, 🗏‹../heap_lift.ML›
▪ Word abstraction (WA): (un)signed n-bit words are converted to @{typ int} or @{typ nat}. This
conversion only affects local variables (lambda abstractions), so the underlying state-space type
is maintained. When reading from and storing to memory, the values are converted accordingly.
▪ Main theory: 🗏‹../WordAbstract.thy›
▪ Correspondence relation: @{const corresTA}
▪ Main ML file: 🗏‹../word_abstract.ML›
▪ Type strengthening (TS), sometimes also referred to as type lifting or type specialisation:
a best-effort approach to simplify the structure of the monad as far as possible:
▪ Pure functions
▪ Reader monad, for read-only state-dependent functions.
▪ Option (reader) monad, in particular for potentially non-terminating functions (recursion / while).
Non-termination is treated as failure in AutoCorres, and failure (of guards) is
represented by the result @{term None}.
▪ Nondeterministic state monad (exceptional control flow confined within function boundaries).
Failure (of guards) and non-termination are represented by the outcome @{const Failure}.
▪ Nondeterministic state monad with exit (exceptional control flow crossing function boundaries).
New monad types can be registered within Isabelle/ML.
▪ Main theories: 🗏‹../TypeStrengthen.thy›, 🗏‹../Refines_Spec.thy›
▪ Correspondence relation: @{const refines}. Originally this was
based on equality. As we now implement (mutually) recursive functions via the CCPO-based
@{command fixed_point} command instead of introducing an explicit measure parameter, we changed to
simulation: equality is not @{const ccpo.admissible}, whereas simulation is.
▪ Main ML files: 🗏‹../type_strengthen.ML›, 🗏‹../monad_types.ML›, 🗏‹../monad_convert.ML›
▪ Polishing, performed as part of type strengthening. Here the remaining special-purpose and limited L2 definitions are
expanded into 'ordinary' monadic functions. Also, the annotations with the original C variable names are removed
and the bound variables are named accordingly.
›
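text ‹
To illustrate in/out parameter abstraction, consider a hypothetical C function that returns its
result through a pointer parameter:

  void add_to(unsigned *dst, unsigned n) {
    *dst = *dst + n;
  }

With in/out parameter abstraction enabled for ‹dst›, the pointer parameter can be replaced by a
value that is passed in and returned, so a caller that passes the address of a local variable no
longer needs that variable to be addressable. The actual options and the resulting definitions are
described in 🗏‹In_Out_Parameters_Ex.thy›.
›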
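text ‹
To illustrate the granularity of the split-heap abstraction, consider the hypothetical C types

  struct point { unsigned x; unsigned y; };
  struct rect  { struct point lower; struct point upper; };

Roughly speaking, heap lifting provides one heap for ‹struct point›, one for ‹struct rect› and one
for each primitive type, each mapping pointers of that type to values; there is no separate heap
for the nested ‹lower› / ‹upper› fields. The support for addressable nested structures described
in 🗏‹open_struct.thy› refines this picture.
›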
section ‹AutoCorres Flow›
text ‹
The original AutoCorres implementation of David Greenaway was refined in several ways. In particular,
from a high-level perspective, the concepts of incremental build and parallel processing were
unified:
▪ Parallel processing is moved to the outermost invocation of AutoCorres. All tasks
are calculated respecting the dependencies of the call graph and the various AutoCorres phases, cf.
@{ML AutoCorres.parallel_autocorres}.
▪ Every task then issues a separate call to @{ML AutoCorres.do_autocorres}, specifying exactly
the scope (which function clique) and the phase via the options.
▪ The results of an invocation are maintained in generic data, such that they are available to
subsequent dependent calls.
The common parts of each phase are implemented in @{ML_structure AutoCorresUtil}.
The last phase, type strengthening (TS), historically played a special role and still does not
make much use of @{ML_structure AutoCorresUtil}, but conceptually it follows the same idea:
functions are processed in a sequence of cliques, from the bottom of the call graph to the top.
Here a clique is either a single function or a group of mutually recursive functions (a strongly
connected component of the call graph).
After processing a phase, a clique might get split due to dead-code elimination. This
general idea is implemented in @{ML AutoCorresUtil.convert_and_define_cliques}.
›
subsection ‹General Remarks›
text ‹
▪ The main autocorres files, putting everything together:
🗏‹../AutoCorres.thy›, 🗏‹../autocorres.ML›
▪ There are two main approaches in a single phase to come up with an abstract program (output
of the phase) and the correspondence proof to the concrete program (input of the phase):
▪ Implicit synthesis of the abstract program from the concrete program by applying "intro" rules,
where the abstract program is initialised with a schematic variable. Prototypical examples are
the heap abstraction, word abstraction and type strengthening.
▪ First explicitly calculate the abstract program (fragment) within ML from the
concrete program (fragment), and then perform the correspondence proof. A prototypical
example is the local variable abstraction.
▪ The common aspects of the various stages within a single phase are (partially) generalised into
library functions: 🗏‹../autocorres_util.ML›
›
subsection ‹Links to more documentation / examples›
text ‹
▪ Pointers to local variables: 🗏‹pointers_to_locals.thy›
▪ In-Out parameters: 🗏‹In_Out_Parameters_Ex.thy›
▪ Addressable fields and open types: 🗏‹open_struct.thy›
▪ Function pointers: 🗏‹fnptr.thy›
▪ Unions: 🗏‹union_ac.thy›
›
section ‹Overview on the Locales›
text ‹The representation of local variables in SIMPL and L1 was changed from records
to 'statespaces' as implemented by @{command statespace}, and finally to
positional variables as in \<^typ>‹locals› in 🗏‹../c-parser/CLocals.thy›.
This allows for a uniform representation of local variables as a function from
'nat to byte list'. Typing is achieved by ML data organised in locales and bundles,
cf. 🗏‹../c-parser/CTranslationInfrastructure.thy›.
These are also used to do the L1 and L2 correspondence proofs. From
L2 onwards, local variables are represented as lambda bindings.
We describe the various locales (and bundles) later on with our running examples.
To support function pointers, even more locales were introduced. See 🗏‹fnptr.thy›.
›
section ‹Example Program›