Regular expression
From TheBeard Science Project Wiki
Revision as of 16:46, 29 February 2016 by Beard
[\^$.|?*+() are special characters and need to be escaped with \ to match literally
OR - 'gr[ae]y' will match 'gray' OR 'grey'
OR - 'cat|dog' matches patterns 'cat' or 'dog'
RANGE - [0-9] to match the range. [0-9a-fA-FxX] matches 0-9, a-f, A-F, x, or X.
NEGATE - [^x] matches any single character except x.
ANCHOR - ^ at beginning of string and $ at end of string. ^b only matches the first b in bob.
OPTIONAL - 'colou?r' makes the u optional, thus matching 'colour' or 'color'
REPETITION - [0-9]+ matches one or more digits, [0-9]* matches zero or more digits
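The basic constructs above can be tried out with a short script. This is a minimal sketch using Python's re module, which is an assumption of this example (the page itself targets sed and grep); the sample strings are made up for illustration.

```python
import re

# Character class: 'a' or 'e' in the third position.
grey_words = re.findall(r"gr[ae]y", "gray grey groy")      # ['gray', 'grey']

# Alternation: either whole pattern.
pets = re.findall(r"cat|dog", "cat, dog, bird")            # ['cat', 'dog']

# Range and negation.
hex_chars = re.findall(r"[0-9a-fA-FxX]", "0x1F g")         # ['0', 'x', '1', 'F']
not_x = re.findall(r"[^x]", "xyx")                         # ['y']

# Anchor: ^b can only match at the start, so only the first b in 'bob'.
anchored = [m.start() for m in re.finditer(r"^b", "bob")]  # [0]

# Optional and repetition.
colours = re.findall(r"colou?r", "color colour")           # ['color', 'colour']
digits = re.findall(r"[0-9]+", "a1b22c")                   # ['1', '22']
empty_ok = re.fullmatch(r"[0-9]*", "") is not None         # True: * allows zero digits
```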
\b - word boundary. the position between a word character and a non word character (or the start or end of the string).
\d - matches digit character
\D - matches non digit character
\w - matches word character (alphanumeric or underscore)
\W - matches non word character (anything other than alphanumeric or underscore)
\s - matches whitespace character
\S - matches non whitespace character
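. - dot matches any single character (except a newline by default). 'gr.y' matches 'gray', 'grey', 'gr8y'
* - matches 0 or more of previous character.
? - matches 0 or 1 of previous character.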
+ - matches 1 or more of previous character. 'er+or' will match 'eror' OR 'error' and so on
.* - matches any run of characters (zero or more of any character). used like the shell wildcard *
[abc] - match 1 character which is a, b, or c
[^abc] - match 1 character which is NOT a, b, or c
{} - match specified number of previous character. 'er{1,2}or' matches 'eror' or 'error' only
^ - match following set of characters (word) only if they are the first on a line. ^error
$ - matches if previous characters are last on a line. error$
() - groups things together. example\.(com|org) matches example.com or example.org
(?=x) - "lookahead": succeeds only if x comes next, but does not include x in the match
(?!x) - negated "lookahead"
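The shorthand classes, quantifiers, groups and lookaheads listed above behave as follows in practice. This is a small sketch in Python's re module (an assumption of this example, since the page's own examples use sed and grep -P); the test strings are invented.

```python
import re

text = "error 42, eror 7, errror"

# \d+ picks out runs of digits.
numbers = re.findall(r"\d+", text)                      # ['42', '7']

# + allows any number of r's (at least one); \b keeps whole words only.
any_r = re.findall(r"\ber+or\b", text)                  # ['error', 'eror', 'errror']

# {1,2} allows one or two r's, so 'errror' is rejected.
one_or_two_r = re.findall(r"\ber{1,2}or\b", text)       # ['error', 'eror']

# A group captures the alternative that matched.
tlds = re.findall(r"example\.(com|org)",
                  "example.com example.org example.net")  # ['com', 'org']

# Lookahead: the ':' must follow but is not part of the match.
keys = re.findall(r"\w+(?=:)", "key: value")            # ['key']

# Negated lookahead: a 'q' that is NOT followed by 'u'.
q_positions = [m.start() for m in re.finditer(r"q(?!u)", "qa qu q")]  # [0, 6]
```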
\t - tab
\r - carriage return
\n - line feed
\a - bell
\e - escape
\f - form feed
\v - vertical tab
\xFF - character with hexadecimal code FF
\uFFFF - unicode character with code point FFFF
windows text files use \r\n, unix uses \n
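As an illustration of the escapes and the line-ending note above, the following sketch converts Windows line endings to Unix ones and checks a few escapes. It assumes Python's re module, and the sample strings are made up. Note that Python's re does not accept \e, so that escape is left out here.

```python
import re

windows_text = "first\r\nsecond\r\n"

# Convert Windows line endings (\r\n) to Unix line endings (\n).
unix_text = re.sub(r"\r\n", "\n", windows_text)

# Escapes such as \t, \xFF and \uFFFF are interpreted inside the pattern itself.
tab_fields = re.split(r"\t", "a\tb\tc")                          # ['a', 'b', 'c']
hex_match = re.fullmatch(r"\x41", "A") is not None               # hex 41 is 'A'
unicode_match = re.fullmatch(r"\u00e9", "\u00e9") is not None    # code point 00E9
```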
EXAMPLES (work with sed):
Remove all leading whitespace (spaces and tabs) = s/^[ \t]*//
Remove all trailing whitespace (spaces and tabs) = s/[ \t]*$//
Remove all HTML/XML tags = s/<[^>]\+>//g
Replace & with &amp; = s/&/\&amp;/g
Replace ' with ' = s/\'/'/g
Escape / and & (the characters that are special in a sed replacement) - sed -e 's/[\/&]/\\&/g'
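For readers without sed at hand, some of the substitutions above can be approximated with re.sub. This is a rough Python equivalent, not the sed commands themselves: Python writes + unescaped where GNU sed's basic syntax uses \+, and refers to the whole match as \g<0> where sed uses &. The input strings are invented for the example.

```python
import re

line = " \t <b>Tom</b> and <i>Jerry</i> \t "

no_leading = re.sub(r"^[ \t]*", "", line)          # like s/^[ \t]*//
no_trailing = re.sub(r"[ \t]*$", "", no_leading)   # like s/[ \t]*$//
no_tags = re.sub(r"<[^>]+>", "", no_trailing)      # like s/<[^>]\+>//g

# like s/[\/&]/\\&/g : put a backslash in front of every / and &
escaped = re.sub(r"[/&]", r"\\\g<0>", "a/b&c")

print(no_tags)   # Tom and Jerry
print(escaped)   # a\/b\&c
```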
EXAMPLES (work with grep -P):
Search for strings between tags on single line = <td>(.*?)<\/td>
Phone Number = [0-9 \(\)\.\-]+
[^\x00-\x7F] - matches non-ASCII characters
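The difference the lazy .*? makes in the tag example is easiest to see side by side with the greedy .* form. The sketch below assumes Python's re module, whose syntax for these patterns matches what grep -P accepts; the HTML row and the sample words are invented.

```python
import re

row = "<tr><td>alpha</td><td>beta</td></tr>"

# Lazy: stops at the first closing tag, so each cell is found separately.
lazy_cells = re.findall(r"<td>(.*?)</td>", row)      # ['alpha', 'beta']

# Greedy: runs on to the last closing tag on the line.
greedy_cells = re.findall(r"<td>(.*)</td>", row)     # ['alpha</td><td>beta']

# Characters outside the ASCII range 0x00-0x7F.
non_ascii = re.findall(r"[^\x00-\x7F]", "na\u00efve caf\u00e9")
```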
MORE EXAMPLES:
href=\"(?!/|#) - find HTML links whose href does not start with / or # (relative path names)
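A quick way to see what the negated lookahead in the href pattern accepts is to run it over a few links. This sketch uses Python's re module; the capture group ([^"]*) is an addition for this example so the link target can be printed, and the HTML snippet is invented.

```python
import re

html = (
    '<a href="/about">A</a> '
    '<a href="#top">B</a> '
    '<a href="img/logo.png">C</a> '
    '<a href="http://example.com/">D</a>'
)

# href=" not followed by / or #, then capture everything up to the closing quote.
pattern = re.compile(r'href="(?!/|#)([^"]*)')
targets = pattern.findall(html)

print(targets)   # ['img/logo.png', 'http://example.com/']
```

Links starting with / or # are skipped, but a full URL such as http://example.com/ also gets through, so the result may need a second filter if only relative paths are wanted.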