Aho-Corasick String Matching

Arthur Freitas Ramos 📧, David Barros Hulak and Ruy Jose Guerra Barretto de Queiroz

May 4, 2026

Abstract

This development formalizes the Aho-Corasick multi-pattern string matching algorithm over lists. The verified executable reference automaton uses the canonical Aho-Corasick state: after reading a text prefix, the state is the longest suffix of that prefix that is also a pattern prefix. The development also proves a state-only transition theorem, a fold-based scan invariant, and a recursive failure-link transition refining the canonical transition, together with equivalent state-carrying and failure-link search procedures. The main theorem shows that the search procedure reports exactly the pattern occurrences ending at each text position.

License

BSD License

Note

Codex with Gpt 5.5 on xhigh was used to help with proof engineering. Proof manually checked by the authors.

Topics

Session Aho_Corasick