regex - Why do most languages implement wildcard regular expressions inefficiently? -
i given link following article regarding implementation of regular expressions in many modern languages.
http://swtch.com/~rsc/regexp/regexp1.html
tl;dnr: regular expressions such (a?)^na^n
fixed $n$ take exponential time matched against, say, a^n
because implemented via backtracking on string when matching ?
section. implementing these nfa keeping state lists makes more efficient obvious reasons
the details of how each language implements these isn't detailed (and article old), i'm curious: what, if any, drawbacks of using nfa opposed other implementation techniques. thing can come bells , whistles of libraries either a) building nfa features impractical or b) there conflicting performance issue between expression above , other, possibly more common, operation.
while possible construct dfas handle these complex cases (the tcl re engine, written henry spencer, proof example; article linked indicated performance data) it's exceptionally hard.
one key thing though if can detect never need matching group information, can (for many res, without internal backreferences) transform re 1 only uses parentheses grouping allowing more efficient re generated (so (a?){n}a{n}
— i'm using modern conventional syntax — becomes equivalent a{n,2n}
). backreferences break major optimisation; it's not nothing in henry's re code (alluded above) there code comment describing them “feature black lagoon”. 1 of best comments i've ever read in code (with exception of references academic papers describe algorithm encoded).
on other hand, perl/pcre style engines recursive-descent evaluation schemes, can ascribe saner set of semantics mixed greediness res, , many other things besides. (at extreme end of this, recursive patterns — (?r)
et al — impossible automata-theoretic approaches. require stack match, making them formally not regular expressions.)
on practical level, cost of building nfa , dfa compile can quite high. need clever caching make not expensive. , on practical level, pcre , perl implementations have had lot more developer effort applied them.
Comments
Post a Comment