Regular Expressions

Regular expressions are a common way to describe text strings for searching. Many variants exist; here we describe the variant used by Scanner.

A regular expression is a string in which some characters (metacharacters) have special meanings. The following characters are metacharacters:

* + ? [] ^ .

We can view a regular expression as a compact notation for a set of ordinary strings (strings without special characters). We say that the expression matches all strings in this set. We are content with a few examples:
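
As a rough illustration of the metacharacters listed above, the following Java sketch tests a handful of patterns against ordinary strings. It assumes that Scanner here is java.util.Scanner, whose patterns are those of java.util.regex, and it uses Pattern.matches, which checks whether a whole string belongs to the set described by the expression. The class name RegexExamples, the helper matches and the sample strings are invented for this example.

```java
import java.util.regex.Pattern;

public class RegexExamples {
    // True if the whole string s is one of the strings described by regex.
    static boolean matches(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        // * : zero or more of the preceding character
        System.out.println(matches("ab*", "a"));      // true
        System.out.println(matches("ab*", "abbb"));   // true
        // + : one or more of the preceding character
        System.out.println(matches("ab+", "a"));      // false
        System.out.println(matches("ab+", "abb"));    // true
        // ? : zero or one of the preceding character
        System.out.println(matches("ab?", "ab"));     // true
        System.out.println(matches("ab?", "abb"));    // false
        // [abc] : exactly one of the characters a, b, c
        System.out.println(matches("[abc]x", "bx"));  // true
        // [^abc] : exactly one character other than a, b, c
        System.out.println(matches("[^abc]x", "ax")); // false
        System.out.println(matches("[^abc]x", "dx")); // true
        // . : any single character
        System.out.println(matches("a.c", "a-c"));    // true
        System.out.println(matches("a.c", "ac"));     // false
    }
}
```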

   

The method findInLine in Scanner takes a regular expression as a parameter and looks for a matching substring in the rest of the current line. If it finds one, it returns it, and the Scanner's current position moves to immediately after the matched string. If no matching string is found, it returns null and the current position is unchanged (i.e. the search ahead on the line was unsuccessful, and the remainder of the line is still regarded as unread).
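
The behaviour described above can be tried out with a small sketch. It assumes java.util.Scanner reading from a String. The class name FindInLineDemo, the helper demo and the input text are invented for the example, and the range notation [0-9] (any digit) is standard java.util.regex syntax that is not covered in the text.

```java
import java.util.Scanner;

public class FindInLineDemo {
    // Returns { successful match, failed search, rest of the line }.
    static String[] demo(String input) {
        Scanner in = new Scanner(input);
        // Finds the first run of digits; the position moves to just after it.
        String number = in.findInLine("[0-9]+");
        // No 'x' on this line: the result is null and the position stays put.
        String missing = in.findInLine("x+");
        // The remainder of the line is still unread, so nextLine returns it.
        String rest = in.nextLine();
        return new String[] { number, missing, rest };
    }

    public static void main(String[] args) {
        String[] r = demo("abc 123 def\nsecond line");
        System.out.println(r[0]);             // 123
        System.out.println(r[1]);             // null
        System.out.println("[" + r[2] + "]"); // [ def]
    }
}
```

Note that the failed search for "x+" does not consume " def": the following nextLine call still returns it.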

A regular expression can match several substrings of a line: if the current line is "<tt>pattern</tt>", then the expression "<.*>" matches the tag <tt> as well as the entire line. The rule is that the match starts as early in the line as possible and that * and + take as many characters as they can, so findInLine returns the longer of the two matches, here the entire line.
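
The following sketch, again assuming java.util.Scanner, shows this rule on the line from the example. It also shows one possible way to get only the first tag, using [^>] so that the match cannot run past the first ">". The class name LongestMatchDemo and the helper find are invented for the example.

```java
import java.util.Scanner;

public class LongestMatchDemo {
    // Runs a single findInLine search on a fresh Scanner over the given line.
    static String find(String line, String regex) {
        return new Scanner(line).findInLine(regex);
    }

    public static void main(String[] args) {
        String line = "<tt>pattern</tt>";
        // .* takes as much as it can, so the match runs to the last '>'.
        System.out.println(find(line, "<.*>"));    // <tt>pattern</tt>
        // [^>]* cannot pass a '>', so the match ends at the first tag.
        System.out.println(find(line, "<[^>]*>")); // <tt>
    }
}
```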