Theory union_ac
section "Unions"
theory union_ac
imports
"AutoCorres"
begin
subsection ‹Union support in C-Parser and Autocorres›
text ‹
Fully fledged C unions can have a lot of semantic subtleties in particular with respect to
undefined behavior when it comes to different padding in the variants, effective types and accessing
the union via different variants, e.g. writing a pointer through one variant and subsequently
reading or writing the same union via another variant.
See for example 🌐‹https://robbertkrebbers.nl/research/thesis.pdf› in subsections 2.5.5 ff. for
a discussion of some fine points.
Our current solution is very pragmatic by only supporting a subset of unions that does not exhibit
the more "weired" behaviours:
▪ Single variant unions
▪ Multi-variant unions where each variant is packed (no implicit padding) and has the same size.
For these limited unions the semantics is deterministic and basically boils down to casting / coercing different
C-types. Note that every @{class c_type} is equipped with conversions @{const from_bytes} and @{const to_bytes}
which allows us to take a value, convert it to a byte list and then make another value
(potentially with a different type) from that byte list again. So the basic machinery is already there.
The C parser analyses which variants of a union are actually actively used in the code you want to verify.
Only those variants (internally referred to as active variants) are taken into account. If there
is only a single variant we are fine and a single record type is constructed in HOL as the C-type
representing that type. It is the same construction as for a structure type of that variant.
In case there are multiple active variants the C parser checks that all those variants have the
same size and are ∗‹packed› which means that there is no padding. The original C code
might have to be rewritten with explicit padding fields to fulfill this requirement.
We create a record type for each variant of the union, like a 'struct' for each variant. One
variant is considered as the ∗‹canonical› variant. Currently this always is the first variant of
the union. The idea of the canonical variant is that every access or update to a union is always
done via this canonical variant and then 'casted' to the variant you need. This casting is implemented
via @{const ptr_coerce} for pointers and @{const coerce} and @{const coerce_map} for values.
Example:
▩‹
typedef struct {
unsigned fst;
unsigned snd;
} unsigned_pair;
typedef struct {
signed fst;
signed snd;
} signed_pair;
typedef union {
unsigned_pair u;
signed_pair s;
} open_union;
›
The union ‹open_union› has two variants. The first one ‹u› is considered to be the canonical one.
So when you want to update a variant ‹s› in the heap via a pointer to ‹open_union› we first read the
canonical variant ‹u› from the heap then coerce the value to a variant ‹s›, then the update on the
value is performed then the resulting value is coerced back to variant ‹u› and finally written
back to the heap.
A good mental model for this implementation is that accesses to unions are normalised towards the
canonical variants of the root pointers. This also carries over to the split heap model you obtain
when you perform the HL phase of autocorres. There is only one relevant heap for the union, which
is the heap for the canonical variant. This means that you can also define ‹addressable_fields› for
the canonical variant. Currently you cannot define ‹addressable_fields› for the other variants.
›
include_C_file "union.h" for union.c