Regular Expression Support

Regular expressions are patterns that describe sets of text strings. They let you succinctly express character matching rules that would otherwise require a large number of separate conditions and logical operations.

When you first see a regular expression, it may appear somewhat intimidating and complex. But, in reality, regular expressions can be as simple or involved as you wish and still be effective. Once you understand the meaning of a handful of special regular expression characters (called metacharacters), you'll be able to match filename patterns with ease.

This help file does not provide an in-depth tutorial on writing regular expressions, because many such tutorials are freely available on the Internet. Enter "regular expressions" into your favorite search engine to find a variety of guides and other useful materials.

There are, however, subtle differences between the regular expression syntaxes that various applications support. The tables below provide an overview of the regular expression metacharacters and abbreviations supported by FolderSizes.

Metacharacter        Meaning

.        Matches any single character.

[ ]        Indicates a character class. Matches any character inside the brackets (for example, [abc] matches "a", "b", and "c").

^        If this metacharacter occurs at the start of a character class, it negates the character class. A negated character class matches any character except those inside the brackets (for example, [^abc] matches all characters except "a", "b", and "c"). If ^ is at the beginning of the regular expression, it matches the beginning of the input (for example, ^[abc] will only match input that begins with "a", "b", or "c").

-        In a character class, indicates a range of characters (for example, [0-9] matches any of the digits "0" through "9").

?        Indicates that the preceding expression is optional: it matches once or not at all (for example, [0-9][0-9]? matches "2" and "12").

+        Indicates that the preceding expression matches one or more times (for example, [0-9]+ matches "1", "13", "666", and so on).

*        Indicates that the preceding expression matches zero or more times.

??, +?, *?        Non-greedy versions of ?, +, and *. These match as little as possible, unlike the greedy versions which match as much as possible. Example: given the input "<abc><def>", <.*?> matches "<abc>" while <.*> matches "<abc><def>".

( )        Grouping operator. Example: (\d+,)*\d+ matches a list of numbers separated by commas (such as "1" or "1,23,456").

\        Escape character: interpret the next character literally (for example, [0-9]+ matches one or more digits, but [0-9]\+ matches a digit followed by a plus character). Also used for abbreviations (such as \a for any alphanumeric character; see table below). If \ is followed by a number n, it matches the nth match group (starting from 0). Example: <{.*?}>.*?</\0> matches "<head>Contents</head>".

$        At the end of a regular expression, this character matches the end of the input. Example: [0-9]$ matches a digit at the end of the input.

|        Alternation operator: separates two expressions, exactly one of which matches (for example, T|the matches "The" or "the").

!        Negation operator: the expression following ! does not match the input. Example: a!b matches "a" not followed by "b".
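
As a rough illustration, the sketch below tries several of the examples from the table using Python's re module. Python's engine is not the one FolderSizes uses, so this is only an approximation. The constructs in the first part are written the same way in both. The last two are Python equivalents: the ( ) capturing group with \1 and the (?!...) lookahead are Python syntax, standing in for the { } match group with \0 and the ! operator shown in the table. The helper name full is made up for this example.

```python
import re

# Assumption: Python's `re` engine stands in for the FolderSizes engine.


def full(pattern, text):
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# Character class, range, and optional quantifier: one or two digits.
digit_results = [full(r"[0-9][0-9]?", s) for s in ("2", "12", "123")]

# Greedy versus non-greedy matching on the input from the table.
tags = "<abc><def>"
greedy = re.search(r"<.*>", tags).group(0)
lazy = re.search(r"<.*?>", tags).group(0)

# Grouping: a list of numbers separated by commas.
list_results = [full(r"([0-9]+,)*[0-9]+", s) for s in ("1", "1,23,456", "1,,2")]

# Anchors: ^ at the start of the pattern, $ at the end.
starts_with_abc = re.search(r"^[abc]", "cat.txt") is not None
ends_with_digit = re.search(r"[0-9]$", "report7") is not None

# Python equivalent of <{.*?}>.*?</\0>: groups use ( ) and count from 1.
backref = full(r"<(.*?)>.*?</\1>", "<head>Contents</head>")

# Python equivalent of a!b: a negative lookahead, "a" not followed by "b".
not_followed = [m.start() for m in re.finditer(r"a(?!b)", "ab ac a")]

print(digit_results, greedy, lazy, list_results)
print(starts_with_abc, ends_with_digit, backref, not_followed)
```

When adapting a pattern from another tool, the last two cases are the ones most likely to need rewriting, since match groups and negation are written differently from engine to engine.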

Abbreviations

\a        Any alphanumeric character: ([a-zA-Z0-9])

\b        White space (blank), that is, a space or a tab: ([ \t])

\c        Any alphabetic character: ([a-zA-Z])

\d        Any decimal digit: ([0-9])

\h        Any hexadecimal digit: ([0-9a-fA-F])

\n        Newline: (\r|(\r?\n))

\q        A quoted string: (\"[^\"]*\")|(\'[^\']*\')

\w        A simple word: ([a-zA-Z]+)

\z        An integer: ([0-9]+)
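
Many other engines assign different meanings to some of these abbreviations. In Python's re module, for instance, \b is a word boundary and \w is a single word character. The sketch below is one way to try FolderSizes-style patterns in Python by substituting each abbreviation with the expansion listed above. The names ABBREVIATIONS and expand are hypothetical. The expansions for \n, \q, \w, and \z are wrapped in non-capturing groups (a Python feature) so that a following quantifier applies to the whole expansion. The result only approximates how FolderSizes itself behaves.

```python
import re

# Expansions taken from the abbreviation table above. (?:...) is Python's
# non-capturing group; it keeps a following quantifier attached to the
# whole expansion without adding a numbered match group.
ABBREVIATIONS = {
    "a": r"[a-zA-Z0-9]",
    "b": r"[ \t]",
    "c": r"[a-zA-Z]",
    "d": r"[0-9]",
    "h": r"[0-9a-fA-F]",
    "n": r"(?:\r|\r?\n)",
    "q": r"""(?:"[^"]*"|'[^']*')""",
    "w": r"(?:[a-zA-Z]+)",
    "z": r"(?:[0-9]+)",
}


def expand(pattern):
    """Replace each table abbreviation in pattern with its expansion."""
    def substitute(match):
        # Escapes that are not in the table (such as \. or \+) stay as-is.
        return ABBREVIATIONS.get(match.group(1), match.group(0))
    return re.sub(r"\\(.)", substitute, pattern)


# A hypothetical filename pattern: "backup_", an integer, then ".zip".
backup = expand(r"backup_\z\.zip")
is_backup = re.fullmatch(backup, "backup_2024.zip") is not None
not_backup = re.fullmatch(backup, "backup_.zip") is not None

# \b is a blank in the table but a word boundary in plain Python syntax.
expanded_blank = re.fullmatch(expand(r"a\bb"), "a b") is not None
native_blank = re.fullmatch(r"a\bb", "a b") is not None

print(backup, is_backup, not_backup, expanded_blank, native_blank)
```

This substitution is deliberately simple: it does not handle an abbreviation placed inside a character class, and it leaves match groups and the ! operator untranslated.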