Two-Pass Greedy Regular Expression Parsing

From thelas.dk

Jump to: navigation, search

Abstract

We present a new algorithm for producing greedy parses for regular expressions in a semi-streaming fashion. The algorithm operates in only 2 passes, executes in time linear in the input and in the regular expression, stores fewer bits in a log than previous algorithms and outputs a compact bit coding of greedy parse trees. The input need not be stored at all; each input symbol can be discarded after being processed in the first pass. Previous scalable regular expression parsing algorithms require more log storage and either 3 passes where the first consists of reversing the input or they do not or are not known to produce a greedy parse.

The empirical evaluation indicates that our C-based prototype is surprisingly competitive with tools with restricted functionally such as grep.

To be presented at CIAA 2013.

Personal tools