Sunday, January 15, 2017

Commonly used regular expression symbols

Reference: Web Scraping with Python, by Ryan Mitchell
Symbol: *
Meaning: Matches the preceding character, subexpression, or bracketed character 0 or more times.
Example: a*b*
Example matches: aaaaaaaa, aaabbbbb, bbbbbb

Symbol: +
Meaning: Matches the preceding character, subexpression, or bracketed character 1 or more times.
Example: a+b+
Example matches: aaaaaaaab, aaabbbbb, abbbbbb

Symbol: []
Meaning: Matches any character within the brackets (i.e., “Pick any one of these things”).
Example: [A-Z]*
Example matches: APPLE, CAPITALS, QWERTY

Symbol: ()
Meaning: A grouped subexpression (these are evaluated first, in the “order of operations” of regular expressions).
Example: (a*b)*
Example matches: aaabaab, abaaab, ababaaaaab

Symbol: {m,n}
Meaning: Matches the preceding character, subexpression, or bracketed character between m and n times (inclusive).
Example: a{2,3}b{2,3}
Example matches: aabbb, aaabbb, aabb

Symbol: [^]
Meaning: Matches any single character that is not in the brackets.
Example: [^A-Z]*
Example matches: apple, lowercase, qwerty

Symbol: |
Meaning: Matches any character, string of characters, or subexpression separated by the “|” (note that this is a vertical bar, or “pipe,” not a capital “i”).
Example: b(a|i|e)d
Example matches: bad, bid, bed

Symbol: .
Meaning: Matches any single character (including symbols, numbers, a space, etc.).
Example: b.d
Example matches: bad, bzd, b$d, b d

Symbol: ^
Meaning: Indicates that a character or subexpression occurs at the beginning of a string.
Example: ^a
Example matches: apple, asdf, a

Symbol: \
Meaning: An escape character (this allows you to use “special” characters as their literal meaning).
Example: \. \| \\
Example matches: . | \

Symbol: $
Meaning: Often used at the end of a regular expression, it means “match this up to the end of the string.” Without it, every regular expression has a de facto “.*” at the end of it, accepting strings where only the first part of the string matches. This can be thought of as analogous to the ^ symbol.
Example: [A-Z]*[a-z]*$
Example matches: ABCabc, zzzyx, Bob

Symbol: ?!
Meaning: “Does not contain.” This odd pairing of symbols, immediately preceding a character (or regular expression), indicates that that character should not be found in that specific place in the larger string. This can be tricky to use; after all, the character might be found in a different part of the string. If trying to eliminate a character entirely, use it in conjunction with a ^ and $ at either end.
Example: ^((?![A-Z]).)*$
Example matches: no-caps-here, $ymb0ls a4e f!ne
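
The examples listed above can be checked with Python's built-in re module. The following is only a minimal sketch: the names EXAMPLES, fully_matches, failures and the other variables are made up for illustration, and the extra strings "banana", "ABCabc123" and "Has-Caps" are added here as counterexamples; they are not from the book.

```python
import re

# Patterns and sample strings copied from the examples listed above.
EXAMPLES = [
    (r"a*b*", ["aaaaaaaa", "aaabbbbb", "bbbbbb"]),
    (r"a+b+", ["aaaaaaaab", "aaabbbbb", "abbbbbb"]),
    (r"[A-Z]*", ["APPLE", "CAPITALS", "QWERTY"]),
    (r"(a*b)*", ["aaabaab", "abaaab", "ababaaaaab"]),
    (r"a{2,3}b{2,3}", ["aabbb", "aaabbb", "aabb"]),
    (r"[^A-Z]*", ["apple", "lowercase", "qwerty"]),
    (r"b(a|i|e)d", ["bad", "bid", "bed"]),
    (r"b.d", ["bad", "bzd", "b$d", "b d"]),
    (r"[A-Z]*[a-z]*$", ["ABCabc", "zzzyx", "Bob"]),
    (r"^((?![A-Z]).)*$", ["no-caps-here", "$ymb0ls a4e f!ne"]),
]


def fully_matches(pattern, text):
    """Return True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


# Collect every (pattern, text) pair that does not match; expected to stay empty.
failures = [
    (pattern, text)
    for pattern, texts in EXAMPLES
    for text in texts
    if not fully_matches(pattern, text)
]

# "^a" only requires the string to start with "a" when used with re.match.
starts_with_a = [bool(re.match(r"^a", s)) for s in ("apple", "asdf", "a", "banana")]

# Without "$", re.match accepts a string where only the first part matches.
prefix_only = re.match(r"[A-Z]*[a-z]*", "ABCabc123") is not None
anchored = re.match(r"[A-Z]*[a-z]*$", "ABCabc123") is not None

# The negative lookahead pattern rejects a string containing capital letters.
has_caps_rejected = not fully_matches(r"^((?![A-Z]).)*$", "Has-Caps")

# Escaped metacharacters match themselves literally.
escapes_ok = all(
    fully_matches(p, t) for p, t in [(r"\.", "."), (r"\|", "|"), (r"\\", "\\")]
)
```

re.fullmatch is used for most rows because the example strings are meant to match in their entirety, while re.match is used where the point is that only the beginning of the string has to match.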
