Theory SCC_Bloemen_Sequential
section ‹Overview›
text ‹
Computing the maximal strongly connected components (SCCs) of a
finite directed graph is a celebrated problem in the
theory of graph algorithms. Although Tarjan's algorithm~\<^cite>‹"tarjan:depth-first"›
is perhaps the best-known solution, there are many others. In his PhD
thesis, Bloemen~\<^cite>‹"bloemen:strong"› presents an algorithm that is itself based
on earlier algorithms by Munro~\<^cite>‹"munro:efficient"› and
Dijkstra~\<^cite>‹"dijkstra:finding"›. Just like these algorithms, Bloemen's
solution is based on enumerating SCCs in a depth-first traversal of the graph.
Gabow's algorithm that has already been formalized in Isabelle~\<^cite>‹"lammich:gabow"›
also falls into this category of solutions.
Nevertheless, Bloemen goes on to present a parallel variant of the algorithm
suitable for execution on multi-core processors, based on clever data structures
that minimize locking.
In the following, we encode the sequential version of the algorithm in the
proof assistant Isabelle/HOL, and prove its correctness. Bloemen's thesis
briefly and informally explains why the algorithm is correct. Our proof expands
on these arguments, making them completely formal. The encoding is based on
a direct representation of the algorithm as a pair of mutually recursive
functions; we are not aiming at extracting executable code.
›
theory SCC_Bloemen_Sequential
imports Main
begin
text ‹
The record below represents the variables of the
algorithm. Most variables correspond to those used in
Bloemen's presentation. Thus, the variable @{text 𝒮}
associates to every node the set of nodes that have
already been determined to be part of the same SCC. A core
invariant of the algorithm will be that this mapping represents
equivalence classes of nodes: for all nodes @{text v} and @{text w},
we maintain the relationship
@{text "v ∈ 𝒮 w ⟷ 𝒮 v = 𝒮 w."}
In an actual implementation of this algorithm, this variable
could conveniently be represented by a union-find structure.
Variable @{text stack} holds the list of roots of these
(not yet maximal) SCCs, in depth-first order,
@{text visited} and @{text explored}
represent the nodes that have already been seen, respectively
that have been completely explored, by the algorithm, and
@{text sccs} is the set of maximal SCCs that the algorithm
has found so far.
Additionally, the record holds some auxiliary variables that
are used in the proof of correctness. In particular,
@{text root} denotes the node on which the algorithm was called,
@{text cstack} represents the call stack of the recursion of
function @{text dfs},
and @{text vsuccs} stores the successors of each node
that have already been visited by the function @{text dfss}
that loops over all successors of a given node.
›