Department of Computer Science
Permanent URI for this community
Browse
Browsing Department of Computer Science by browse.metadata.advisor "Bester, Willem"
Now showing 1 - 1 of 1
Results Per Page
Sort Options
- ItemAn investigation into performance-related issues of regular expression matching(Stellenbosch : Stellenbosch University, 2022-11) Van Litsenborgh, Pieter Steyn; Van der Merwe, Brink; Bester, Willem; Stellenbosch University. Faculty of Science. Dept. of Computer Science.ENGLISH ABSTRACT: Regular expressions (regexes) constitute a concise, powerful, and useful pattern matching language for strings. They are widely supported in programming languages, text processing programs, and (advanced) text editors. In general, regex matching is performed efficiently, but in some cases, a vulnerable regex (and an exploit string) can cause the matching procedure to take exponential time in the length of the input string. As regexes are frequently used in userfacing circumstances, vulnerable regexes open up the underlying application to a potential denial of service vector. Several regex engines choose to implement an algorithm that will always match an input string in linear time, but can only support a subset of modern regex constructs. We show how to implement a construct known as Regular Expressions with Lookahead (REwLA), and investigate various state complexity results when converting REwLA to equivalent deterministic finite automata (DFA). The relationship between REwLA with atomic operations and REwLA without is investigated, and an algorithm for translating a REwLA with atomic operations to a REwLA without is provided. Vulnerable regexes only exist when the regex engine implements a backtracking algorithm. We extend non-deterministic finite a utomata (NFA) and regexes by adding memoization to these formalisms. Furthermore, we generalise the concept of ambiguity in order to be applicable to memoized extensions. These extensions are aimed at improving the matching time of backtracking regex engines.