Theory CVM_Abstract_Algorithm
section ‹Abstract Algorithm\label{sec:cvm_abs}›
text ‹This section verifies an abstract version of the CVM algorithm, where the subsampling step
can be an arbitrary randomized algorithm fulfilling an expectation invariant.
The abstract algorithm is presented in Algorithm~\ref{alg:cvm_abs}.
\begin{algorithm}[h!]
\caption{Abstract CVM algorithm.\label{alg:cvm_abs}}
\begin{algorithmic}[1]
\Require Stream elements $a_1,\ldots,a_l$, $0 < \varepsilon$, $0 < \delta < 1$, $\frac{1/2} \leq f < 1$
\Ensure An estimate $R$, s.t., $\prob \left( | R - |A| | > \varepsilon |A| \right) \leq \delta$ where $A := \{a_1,\ldots,a_l\}.$
\State $\chi \gets \{\}, p \gets 1, n \geq \left\lceil \frac{12}{\varepsilon^2} \ln(\frac{3l}{\delta}) \right\rceil$
\For{$i \gets 1$ to $l$}
\State $b \getsr \Ber(p)$ \Comment insert $a_i$ with probability $p$ (and remove it otherwise)
\If{$b$}
\State $\chi \gets \chi \cup \{a_i\}$
\Else
\State $\chi \gets \chi - \{a_i\}$
\EndIf
\If{$|\chi| = n$}
\State $\chi \getsr \mathrm{subsample}(\chi)$ \Comment abstract subsampling step
\State $p \gets p f$
\EndIf
\EndFor
\State \Return $\frac{|\chi|}{p}$ \Comment estimate cardinality of $A$
\end{algorithmic}
\end{algorithm}
For the subsampling step we assume that it fulfills the following inequality:
\begin{equation}
\label{eq:subsample_condition}
\int_{\mathrm{subsample}(\chi)} \left(\prod_{i \in S} g(i \in \omega) \right)\, d \omega \leq
\prod_{i \in S} \left(\int_{Ber(f)} g(\omega)\, d\omega\right)
\end{equation}
for all non-negative functions $g$ and $S \subseteq \chi$,
where $\mathrm{Ber}(p)$ denotes the Bernoulli-distribution.
The original CVM algorithm uses a subsampling step where each element of $\chi$ is retained
independently with probability $f$. It is straightforward to see that this fulfills the above
condition (with equality).
The new CVM algorithm variant proposed in this work uses a subsampling step where a random
$nf$-sized subset of $\chi$ is kept. This also fulfills the above inequality, although this is
harder to prove and will be explained in more detail in Section~\ref{sec:cvm_new}.
In this section, we will verify that the above abstract algorithm indeed fulfills the desired
conditions on its estimate, as well as unbiasedness, i.e., that: $\expect [R] = |A|$.
The part that is not going to be verified in this section, is the fact that the algorithm keeps at
most $n$ elements in the state $\chi$, because it is not unconditionally true, but will be ensured
(by different means) for the concrete instantiations in the following sections.
An informal version of this proof is presented in Appendix~\ref{apx:informal_proof}.
For important lemmas and theorems, we include a reference to the corresponding statement in the
appendix.›
theory CVM_Abstract_Algorithm
imports
"HOL-Decision_Procs.Approximation"
CVM_Preliminary
Finite_Fields.Finite_Fields_More_PMF
Universal_Hash_Families.Universal_Hash_Families_More_Product_PMF
begin
unbundle no_vec_syntax