Theory In_Out_Parameters_Ex
section "In-Out Parameters, Abstracting Pointers to Values"
theory In_Out_Parameters_Ex imports "AutoCorres"
begin
consts abs_step:: "word32 ⇒ word32 ⇒ bool"
subsection ‹Overview›
text ‹We introduce a new phase in AutoCorres to replace pointer parameters by ∗‹in/out parameters›:
Instead of passing a pointer into a function we pass the initial value of the pointer
(by dereferencing the pointer) into the function and return the value at the end of the function
as additional output. E.g. a function ▩‹void inc(unsigned *n)› becomes ▩‹unsigned inc(unsigned n)›.
The initial motivation for this phase was to get rid of explicit pointers to local variables
and replace them by ordinary values instead.
This conversion has two main aspects:
▪ Function signatures may change: pointer parameters become value parameters and
the function may return the value of the pointer at the end as an additonal
return value (tupled with the ordinary return value). Functions with side effects on the heap
may become pure functions on values by this transformation.
▪ As a result of the first transformation, pointers to local variables may be eliminated
and be replaced by bound variables. This means that \<^const>‹stack_heap_state.with_fresh_stack_ptr›
disappears.
The new phase is called ∗‹IO› and is placed between L2 and HL. This means local variables are
already represented as bound variables in L2 and we still operate on a monolithic heap.
›
subsection ‹Building Blocks›
text ‹
What ingredients do we need for the abstraction relation in the refinement proof?
▪ Heaps: As we eliminate pointers, the original function manipulates the heap more
often then the resulting function. So we need a notion to relate the heap states and keep track
of the relevant pointers. As the stack of local variable pointers is also modelled in the heap
the stack free locations are also of interest. The heaps of the original and the resulting
program may differ in these locations. The heap value of a stack free location should be
irrelevant for the program.
▪ Function signatures: The function signature changes, pointer parameters become ordinary values,
the return value is tupled with output parameters.
In the following we refer to the original state (containing the heap) as \<^term>‹s› and the
resulting / abstract state as \<^term>‹t›.
As I first idea we want to relate states \<^term>‹s› and \<^term>‹t› such that the heaps are the
same except for some pointers we want to eliminate and the stack free locations. It would be natural
to describe this as a relation. However, it turns out that dealing with relations within the
refinement proof are hard to make admissible in order to support recursive functions. Having a
abstraction / lifting function (instead of a relation) from \<^term>‹s› to \<^term>‹t› makes admissibility
straightforward.
As a prerequisite we encourage the reader to consider the documentation about
pointers to local variables: 🗏‹pointers_to_locals.thy›
›
context heap_state
begin
subsubsection ‹\<^const>‹frame››
text ‹We want to express that certain portions of the heap (typing and values) are 'irrelevant',
in particular regarding (free) stack space. The notion of 'irrelevant' is a bit vague, it means
that the behaviour of the resulting program does not depend on those locations and also that it does not
modify those locations. Moreover, we prefer an abstraction function rather than an
relation between states to avoid admissibility issues for refinement.
The central property is \<^term>‹t = frame A t⇩0 s›, for \<^term>‹A› being a set of addresses and
\<^term>‹t›, \<^term>‹t⇩0› and \<^term>‹s› being states. Think of \<^term>‹A› as the set of ∗‹allocated› addresses
containing those pointers we want to ∗‹abstract› aka. eliminate.›
thm frame_def
text ‹
The standard use case we have in mind is @{term "A ∩ stack_free (htd s) = {}"}, hence
@{prop "A - stack_free (htd s) = A"}, but nevertheless the intuition is:
▪ stack free typing from \<^term>‹s› is preserved, the framed sate has at least as many stack free
addresses as the original one. So we can simulate any stack allocation.
▪ heap values for stack free and \<^term>‹A› are taken from reference state \<^term>‹t⇩0›, this
captures that we do not depend on the original values in \<^term>‹s› for those addresses.
▪ typing for allocations in \<^term>‹A› is taken from reference state \<^term>‹t⇩0›, this captures
that we do not depend on the original typing in \<^term>‹s› for those addresses.
By taking the same reference state \<^term>‹t⇩0› to frame two states \<^term>‹s› and \<^term>‹s'›,
we can express that the 'irrelevant' parts of the heap did not change in the respective
framed states \<^term>‹frame A t⇩0 s› and \<^term>‹frame A t⇩0 s'›.
›
subsubsection ‹\<^const>‹rel_alloc›, \<^const>‹rel_stack›, \<^const>‹rel_sum_stack››
text ‹The core invariant between the states that is maintained by the refinement is
\<^term>‹rel_alloc 𝒮 M A t⇩0 s t›:
▪ \<^term>‹𝒮› is a static set of addresses containing the stack. Stack allocation and stack release
are contained within \<^term>‹𝒮›.
▪ \<^term>‹M› is the set of addresses that might be modified by the original program.
▪ \<^term>‹A› is the set of addresses that are eliminated (abstracted) in the resulting program. Another
good intuition about the set \<^term>‹A› is that the original function might read from and depend on
those addresses but the resulting function does not. Moreover, the resulting function does not
change any of heap locations in \<^term>‹A›. The resulting function is agnostic to those locations.
It neither reads nor modifies them.
For any heap valuation of \<^term>‹A› the abstract function simulates the original one.
▪ \<^term>‹t⇩0› is the reference state explained above.
▪ \<^term>‹s› is the state of the original program
▪ \<^term>‹t› is the state of the resulting program.
›
thm rel_alloc_def [of 𝒮 M A t⇩0]
text ‹
The properties of \<^const>‹rel_alloc› are:
▪ \<^term>‹t = frame A t⇩0 s›, the resulting heap in \<^term>‹t› is the same as the original in \<^term>‹s›,
except for the addresses in \<^term>‹A› and the stack free space.
▪ \<^term>‹stack_free (htd s) ⊆ 𝒮›: The stack free space is contained in \<^term>‹𝒮›.
▪ \<^term>‹stack_free (htd s) ∩ A = {}›: We only want to eliminate properly allocated pointers, which
are not contained in the stack free space.
▪ \<^term>‹stack_free (htd s) ∩ M = {}›: We only mention properly allocated pointers to be modified.
Modifications within the stack free space are irrelevant, as they are abstracted by the framing
anyways.
›
text ‹
For an original function \<^term>‹f› and the abstract aka. lifted function \<^term>‹g› the refinement
property has the following shape:
\<^prop>‹rel_alloc 𝒮 M A t⇩0 s t ⟹ refines f g s t (rel_stack 𝒮 M1 A s t⇩0 (rel_xval_stack L R))›
Note that the output relation of \<^const>‹refines› relates both the resulting states and return
values to each other. The return values of the error monad are related by \<^const>‹rel_sum_stack›, taking a
relation \<^term>‹L› for error values (\<^term>‹Inl›) and the \<^term>‹R› for normal termination (\<^term>‹Inr›).
›
thm rel_sum_stack_def
thm rel_stack_def [of 𝒮 M1 A s t⇩0 "rel_sum_stack L R"]
text ‹Consider the output states \<^term>‹s'› and \<^term>‹t'› and the output values \<^term>‹v› and \<^term>‹w›
for the original and the abstracted function. Then we have:
▪ \<^term>‹rel_xval_stack L R (hmem s') v w›: The result values are related, (see below).
▪ \<^term>‹rel_alloc 𝒮 M1 A t⇩0 s' t'›: In particular \<^term>‹t' = frame A t⇩0 s'›. Note that we use
the same \<^term>‹A› and the same reference state \<^term>‹t⇩0› as for the initial states. So this means
that the abstract program \<^term>‹g› does not change the heap values and typing in \<^term>‹A› and does
not depend on the values.
▪ \<^term>‹equal_upto (M1 ∪ stack_free (htd s')) (hmem s') (hmem s)›: The original program only modifies
pointers contained in \<^term>‹M1› or in the stack free portions of the heap
(e.g. by nested function calls).
▪ \<^term>‹equal_upto M1 (htd s') (htd s)›: Changes to the heap typing of the original program
are confined within set \<^term>‹M1›.
▪ \<^term>‹equal_on 𝒮 (htd s') (htd s)›: The stack typing of the original program remains unchanged.
Note that it might change temporarily by nested function calls but the stacking discipline
restores the state.
By construction we have \<^term>‹M1 ⊆ M›.
›
paragraph ‹Stacking pointers: \<^const>‹rel_singleton_stack›, \<^const>‹rel_push››
text ‹
The relation on the output value captures the idea that portions of the heap are stacked / tupled
into result values. On the boundaries of a function the relation on error values \<^term>‹L› is always the
trivial \<^term>‹λ_. (=)›. Errors are terminal as there is no exception handling in C. Within
the boundaries of a function there can be more complex relations reflecting the nesting of
break / continue / return and goto.
For ordinary values the values of pointers are tupled by \<^const>‹rel_singleton_stack›, in case
the original function returned no result (void), or nested by \<^const>‹rel_push› in case the
original function returned some result.
›
thm rel_singleton_stack_def
thm rel_push_def
text ‹
A typical relation is \<^term>‹R = rel_singleton_stack p› for some pointer \<^term>‹p›. This means
that the abstract value can be obtained by looking into the heap at pointer \<^term>‹p›:
\<^term>‹rel_singleton_stack p h () v› means that \<^term>‹v = h_val h p›. The heap of the original program
is propagated down the into the relations.
Similar \<^term>‹rel_push p (λ_. (=)) h x (v, x)› relates the result value \<^term>‹x› to the tuple
\<^term>‹(v, x)› where \<^term>‹v = h_val h p›. Note that \<^term>‹rel_push› can be nested represent multiple
output parameters.
Depending on the role of the pointers there are additional assumptions about the pointers,
in particular things like \<^term>‹ptr_span p ⊆ A› or \<^term>‹ptr_span p ⊆ M› and disjointness properties
like \<^term>‹distinct_sets [ptr_span p, ptr_span q]›.
›
subsubsection ‹\<^const>‹IOcorres››
text ‹
There is an additional predicate \<^const>‹IOcorres› which represents the result of the refinement
proof in the canonical autocorres ideom that is used within the other phases. This form
is what is expected when constructing the final theorem combining all autocorres layers
@{thm ac_corres_chain_sims}. The \<^term>‹refines› version and the \<^term>‹IOcorres› version can
be converted into each other by @{thm ac_corres_chain_sims}.
›
thm IOcorres_def
thm IOcorres_refines_conv
thm ac_corres_chain_sims
end