Initial Commit
This commit is contained in:
425
database/perl/lib/pods/perlreref.pod
Normal file
425
database/perl/lib/pods/perlreref.pod
Normal file
@@ -0,0 +1,425 @@
|
||||
=head1 NAME
|
||||
|
||||
perlreref - Perl Regular Expressions Reference
|
||||
|
||||
=head1 DESCRIPTION
|
||||
|
||||
This is a quick reference to Perl's regular expressions.
|
||||
For full information see L<perlre> and L<perlop>, as well
|
||||
as the L</"SEE ALSO"> section in this document.
|
||||
|
||||
=head2 OPERATORS
|
||||
|
||||
C<=~> determines to which variable the regex is applied.
|
||||
In its absence, $_ is used.
|
||||
|
||||
$var =~ /foo/;
|
||||
|
||||
C<!~> determines to which variable the regex is applied,
|
||||
and negates the result of the match; it returns
|
||||
false if the match succeeds, and true if it fails.
|
||||
|
||||
$var !~ /foo/;
|
||||
|
||||
C<m/pattern/msixpogcdualn> searches a string for a pattern match,
|
||||
applying the given options.
|
||||
|
||||
m Multiline mode - ^ and $ match internal lines
|
||||
s match as a Single line - . matches \n
|
||||
i case-Insensitive
|
||||
x eXtended legibility - free whitespace and comments
|
||||
p Preserve a copy of the matched string -
|
||||
${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
|
||||
o compile pattern Once
|
||||
g Global - all occurrences
|
||||
c don't reset pos on failed matches when using /g
|
||||
a restrict \d, \s, \w and [:posix:] to match ASCII only
|
||||
aa (two a's) also /i matches exclude ASCII/non-ASCII
|
||||
l match according to current locale
|
||||
u match according to Unicode rules
|
||||
d match according to native rules unless something indicates
|
||||
Unicode
|
||||
n Non-capture mode. Don't let () fill in $1, $2, etc...
|
||||
|
||||
If 'pattern' is an empty string, the last I<successfully> matched
|
||||
regex is used. Delimiters other than '/' may be used for both this
|
||||
operator and the following ones. The leading C<m> can be omitted
|
||||
if the delimiter is '/'.
|
||||
|
||||
C<qr/pattern/msixpodualn> lets you store a regex in a variable,
|
||||
or pass one around. Modifiers as for C<m//>, and are stored
|
||||
within the regex.
|
||||
|
||||
C<s/pattern/replacement/msixpogcedual> substitutes matches of
|
||||
'pattern' with 'replacement'. Modifiers as for C<m//>,
|
||||
with two additions:
|
||||
|
||||
e Evaluate 'replacement' as an expression
|
||||
r Return substitution and leave the original string untouched.
|
||||
|
||||
'e' may be specified multiple times. 'replacement' is interpreted
|
||||
as a double quoted string unless a single-quote (C<'>) is the delimiter.
|
||||
|
||||
C<m?pattern?> is like C<m/pattern/> but matches only once. No alternate
|
||||
delimiters can be used. Must be reset with reset().
|
||||
|
||||
=head2 SYNTAX
|
||||
|
||||
\ Escapes the character immediately following it
|
||||
. Matches any single character except a newline (unless /s is
|
||||
used)
|
||||
^ Matches at the beginning of the string (or line, if /m is used)
|
||||
$ Matches at the end of the string (or line, if /m is used)
|
||||
* Matches the preceding element 0 or more times
|
||||
+ Matches the preceding element 1 or more times
|
||||
? Matches the preceding element 0 or 1 times
|
||||
{...} Specifies a range of occurrences for the element preceding it
|
||||
[...] Matches any one of the characters contained within the brackets
|
||||
(...) Groups subexpressions for capturing to $1, $2...
|
||||
(?:...) Groups subexpressions without capturing (cluster)
|
||||
| Matches either the subexpression preceding or following it
|
||||
\g1 or \g{1}, \g2 ... Matches the text from the Nth group
|
||||
\1, \2, \3 ... Matches the text from the Nth group
|
||||
\g-1 or \g{-1}, \g-2 ... Matches the text from the Nth previous group
|
||||
\g{name} Named backreference
|
||||
\k<name> Named backreference
|
||||
\k'name' Named backreference
|
||||
(?P=name) Named backreference (python syntax)
|
||||
|
||||
=head2 ESCAPE SEQUENCES
|
||||
|
||||
These work as in normal strings.
|
||||
|
||||
\a Alarm (beep)
|
||||
\e Escape
|
||||
\f Formfeed
|
||||
\n Newline
|
||||
\r Carriage return
|
||||
\t Tab
|
||||
\037 Char whose ordinal is the 3 octal digits, max \777
|
||||
\o{2307} Char whose ordinal is the octal number, unrestricted
|
||||
\x7f Char whose ordinal is the 2 hex digits, max \xFF
|
||||
\x{263a} Char whose ordinal is the hex number, unrestricted
|
||||
\cx Control-x
|
||||
\N{name} A named Unicode character or character sequence
|
||||
\N{U+263D} A Unicode character by hex ordinal
|
||||
|
||||
\l Lowercase next character
|
||||
\u Titlecase next character
|
||||
\L Lowercase until \E
|
||||
\U Uppercase until \E
|
||||
\F Foldcase until \E
|
||||
\Q Disable pattern metacharacters until \E
|
||||
\E End modification
|
||||
|
||||
For Titlecase, see L</Titlecase>.
|
||||
|
||||
This one works differently from normal strings:
|
||||
|
||||
\b An assertion, not backspace, except in a character class
|
||||
|
||||
=head2 CHARACTER CLASSES
|
||||
|
||||
[amy] Match 'a', 'm' or 'y'
|
||||
[f-j] Dash specifies "range"
|
||||
[f-j-] Dash escaped or at start or end means 'dash'
|
||||
[^f-j] Caret indicates "match any character _except_ these"
|
||||
|
||||
The following sequences (except C<\N>) work within or without a character class.
|
||||
The first six are locale aware, all are Unicode aware. See L<perllocale>
|
||||
and L<perlunicode> for details.
|
||||
|
||||
\d A digit
|
||||
\D A nondigit
|
||||
\w A word character
|
||||
\W A non-word character
|
||||
\s A whitespace character
|
||||
\S A non-whitespace character
|
||||
\h A horizontal whitespace
|
||||
\H A non horizontal whitespace
|
||||
\N A non newline (when not followed by '{NAME}';;
|
||||
not valid in a character class; equivalent to [^\n]; it's
|
||||
like '.' without /s modifier)
|
||||
\v A vertical whitespace
|
||||
\V A non vertical whitespace
|
||||
\R A generic newline (?>\v|\x0D\x0A)
|
||||
|
||||
\pP Match P-named (Unicode) property
|
||||
\p{...} Match Unicode property with name longer than 1 character
|
||||
\PP Match non-P
|
||||
\P{...} Match lack of Unicode property with name longer than 1 char
|
||||
\X Match Unicode extended grapheme cluster
|
||||
|
||||
POSIX character classes and their Unicode and Perl equivalents:
|
||||
|
||||
ASCII- Full-
|
||||
POSIX range range backslash
|
||||
[[:...:]] \p{...} \p{...} sequence Description
|
||||
|
||||
-----------------------------------------------------------------------
|
||||
alnum PosixAlnum XPosixAlnum 'alpha' plus 'digit'
|
||||
alpha PosixAlpha XPosixAlpha Alphabetic characters
|
||||
ascii ASCII Any ASCII character
|
||||
blank PosixBlank XPosixBlank \h Horizontal whitespace;
|
||||
full-range also
|
||||
written as
|
||||
\p{HorizSpace} (GNU
|
||||
extension)
|
||||
cntrl PosixCntrl XPosixCntrl Control characters
|
||||
digit PosixDigit XPosixDigit \d Decimal digits
|
||||
graph PosixGraph XPosixGraph 'alnum' plus 'punct'
|
||||
lower PosixLower XPosixLower Lowercase characters
|
||||
print PosixPrint XPosixPrint 'graph' plus 'space',
|
||||
but not any Controls
|
||||
punct PosixPunct XPosixPunct Punctuation and Symbols
|
||||
in ASCII-range; just
|
||||
punct outside it
|
||||
space PosixSpace XPosixSpace \s Whitespace
|
||||
upper PosixUpper XPosixUpper Uppercase characters
|
||||
word PosixWord XPosixWord \w 'alnum' + Unicode marks
|
||||
+ connectors, like
|
||||
'_' (Perl extension)
|
||||
xdigit ASCII_Hex_Digit XPosixDigit Hexadecimal digit,
|
||||
ASCII-range is
|
||||
[0-9A-Fa-f]
|
||||
|
||||
Also, various synonyms like C<\p{Alpha}> for C<\p{XPosixAlpha}>; all listed
|
||||
in L<perluniprops/Properties accessible through \p{} and \P{}>
|
||||
|
||||
Within a character class:
|
||||
|
||||
POSIX traditional Unicode
|
||||
[:digit:] \d \p{Digit}
|
||||
[:^digit:] \D \P{Digit}
|
||||
|
||||
=head2 ANCHORS
|
||||
|
||||
All are zero-width assertions.
|
||||
|
||||
^ Match string start (or line, if /m is used)
|
||||
$ Match string end (or line, if /m is used) or before newline
|
||||
\b{} Match boundary of type specified within the braces
|
||||
\B{} Match wherever \b{} doesn't match
|
||||
\b Match word boundary (between \w and \W)
|
||||
\B Match except at word boundary (between \w and \w or \W and \W)
|
||||
\A Match string start (regardless of /m)
|
||||
\Z Match string end (before optional newline)
|
||||
\z Match absolute string end
|
||||
\G Match where previous m//g left off
|
||||
\K Keep the stuff left of the \K, don't include it in $&
|
||||
|
||||
=head2 QUANTIFIERS
|
||||
|
||||
Quantifiers are greedy by default and match the B<longest> leftmost.
|
||||
|
||||
Maximal Minimal Possessive Allowed range
|
||||
------- ------- ---------- -------------
|
||||
{n,m} {n,m}? {n,m}+ Must occur at least n times
|
||||
but no more than m times
|
||||
{n,} {n,}? {n,}+ Must occur at least n times
|
||||
{n} {n}? {n}+ Must occur exactly n times
|
||||
* *? *+ 0 or more times (same as {0,})
|
||||
+ +? ++ 1 or more times (same as {1,})
|
||||
? ?? ?+ 0 or 1 time (same as {0,1})
|
||||
|
||||
The possessive forms (new in Perl 5.10) prevent backtracking: what gets
|
||||
matched by a pattern with a possessive quantifier will not be backtracked
|
||||
into, even if that causes the whole match to fail.
|
||||
|
||||
There is no quantifier C<{,n}>. That's currently illegal.
|
||||
|
||||
=head2 EXTENDED CONSTRUCTS
|
||||
|
||||
(?#text) A comment
|
||||
(?:...) Groups subexpressions without capturing (cluster)
|
||||
(?pimsx-imsx:...) Enable/disable option (as per m// modifiers)
|
||||
(?=...) Zero-width positive lookahead assertion
|
||||
(*pla:...) Same, starting in 5.32; experimentally in 5.28
|
||||
(*positive_lookahead:...) Same, same versions as *pla
|
||||
(?!...) Zero-width negative lookahead assertion
|
||||
(*nla:...) Same, starting in 5.32; experimentally in 5.28
|
||||
(*negative_lookahead:...) Same, same versions as *nla
|
||||
(?<=...) Zero-width positive lookbehind assertion
|
||||
(*plb:...) Same, starting in 5.32; experimentally in 5.28
|
||||
(*positive_lookbehind:...) Same, same versions as *plb
|
||||
(?<!...) Zero-width negative lookbehind assertion
|
||||
(*nlb:...) Same, starting in 5.32; experimentally in 5.28
|
||||
(*negative_lookbehind:...) Same, same versions as *plb
|
||||
(?>...) Grab what we can, prohibit backtracking
|
||||
(*atomic:...) Same, starting in 5.32; experimentally in 5.28
|
||||
(?|...) Branch reset
|
||||
(?<name>...) Named capture
|
||||
(?'name'...) Named capture
|
||||
(?P<name>...) Named capture (python syntax)
|
||||
(?[...]) Extended bracketed character class
|
||||
(?{ code }) Embedded code, return value becomes $^R
|
||||
(??{ code }) Dynamic regex, return value used as regex
|
||||
(?N) Recurse into subpattern number N
|
||||
(?-N), (?+N) Recurse into Nth previous/next subpattern
|
||||
(?R), (?0) Recurse at the beginning of the whole pattern
|
||||
(?&name) Recurse into a named subpattern
|
||||
(?P>name) Recurse into a named subpattern (python syntax)
|
||||
(?(cond)yes|no)
|
||||
(?(cond)yes) Conditional expression, where "(cond)" can be:
|
||||
(?=pat) lookahead; also (*pla:pat)
|
||||
(*positive_lookahead:pat)
|
||||
(?!pat) negative lookahead; also (*nla:pat)
|
||||
(*negative_lookahead:pat)
|
||||
(?<=pat) lookbehind; also (*plb:pat)
|
||||
(*lookbehind:pat)
|
||||
(?<!pat) negative lookbehind; also (*nlb:pat)
|
||||
(*negative_lookbehind:pat)
|
||||
(N) subpattern N has matched something
|
||||
(<name>) named subpattern has matched something
|
||||
('name') named subpattern has matched something
|
||||
(?{code}) code condition
|
||||
(R) true if recursing
|
||||
(RN) true if recursing into Nth subpattern
|
||||
(R&name) true if recursing into named subpattern
|
||||
(DEFINE) always false, no no-pattern allowed
|
||||
|
||||
=head2 VARIABLES
|
||||
|
||||
$_ Default variable for operators to use
|
||||
|
||||
$` Everything prior to matched string
|
||||
$& Entire matched string
|
||||
$' Everything after to matched string
|
||||
|
||||
${^PREMATCH} Everything prior to matched string
|
||||
${^MATCH} Entire matched string
|
||||
${^POSTMATCH} Everything after to matched string
|
||||
|
||||
Note to those still using Perl 5.18 or earlier:
|
||||
The use of C<$`>, C<$&> or C<$'> will slow down B<all> regex use
|
||||
within your program. Consult L<perlvar> for C<@->
|
||||
to see equivalent expressions that won't cause slow down.
|
||||
See also L<Devel::SawAmpersand>. Starting with Perl 5.10, you
|
||||
can also use the equivalent variables C<${^PREMATCH}>, C<${^MATCH}>
|
||||
and C<${^POSTMATCH}>, but for them to be defined, you have to
|
||||
specify the C</p> (preserve) modifier on your regular expression.
|
||||
In Perl 5.20, the use of C<$`>, C<$&> and C<$'> makes no speed difference.
|
||||
|
||||
$1, $2 ... hold the Xth captured expr
|
||||
$+ Last parenthesized pattern match
|
||||
$^N Holds the most recently closed capture
|
||||
$^R Holds the result of the last (?{...}) expr
|
||||
@- Offsets of starts of groups. $-[0] holds start of whole match
|
||||
@+ Offsets of ends of groups. $+[0] holds end of whole match
|
||||
%+ Named capture groups
|
||||
%- Named capture groups, as array refs
|
||||
|
||||
Captured groups are numbered according to their I<opening> paren.
|
||||
|
||||
=head2 FUNCTIONS
|
||||
|
||||
lc Lowercase a string
|
||||
lcfirst Lowercase first char of a string
|
||||
uc Uppercase a string
|
||||
ucfirst Titlecase first char of a string
|
||||
fc Foldcase a string
|
||||
|
||||
pos Return or set current match position
|
||||
quotemeta Quote metacharacters
|
||||
reset Reset m?pattern? status
|
||||
study Analyze string for optimizing matching
|
||||
|
||||
split Use a regex to split a string into parts
|
||||
|
||||
The first five of these are like the escape sequences C<\L>, C<\l>,
|
||||
C<\U>, C<\u>, and C<\F>. For Titlecase, see L</Titlecase>; For
|
||||
Foldcase, see L</Foldcase>.
|
||||
|
||||
=head2 TERMINOLOGY
|
||||
|
||||
=head3 Titlecase
|
||||
|
||||
Unicode concept which most often is equal to uppercase, but for
|
||||
certain characters like the German "sharp s" there is a difference.
|
||||
|
||||
=head3 Foldcase
|
||||
|
||||
Unicode form that is useful when comparing strings regardless of case,
|
||||
as certain characters have complex one-to-many case mappings. Primarily a
|
||||
variant of lowercase.
|
||||
|
||||
=head1 AUTHOR
|
||||
|
||||
Iain Truskett. Updated by the Perl 5 Porters.
|
||||
|
||||
This document may be distributed under the same terms as Perl itself.
|
||||
|
||||
=head1 SEE ALSO
|
||||
|
||||
=over 4
|
||||
|
||||
=item *
|
||||
|
||||
L<perlretut> for a tutorial on regular expressions.
|
||||
|
||||
=item *
|
||||
|
||||
L<perlrequick> for a rapid tutorial.
|
||||
|
||||
=item *
|
||||
|
||||
L<perlre> for more details.
|
||||
|
||||
=item *
|
||||
|
||||
L<perlvar> for details on the variables.
|
||||
|
||||
=item *
|
||||
|
||||
L<perlop> for details on the operators.
|
||||
|
||||
=item *
|
||||
|
||||
L<perlfunc> for details on the functions.
|
||||
|
||||
=item *
|
||||
|
||||
L<perlfaq6> for FAQs on regular expressions.
|
||||
|
||||
=item *
|
||||
|
||||
L<perlrebackslash> for a reference on backslash sequences.
|
||||
|
||||
=item *
|
||||
|
||||
L<perlrecharclass> for a reference on character classes.
|
||||
|
||||
=item *
|
||||
|
||||
The L<re> module to alter behaviour and aid
|
||||
debugging.
|
||||
|
||||
=item *
|
||||
|
||||
L<perldebug/"Debugging Regular Expressions">
|
||||
|
||||
=item *
|
||||
|
||||
L<perluniintro>, L<perlunicode>, L<charnames> and L<perllocale>
|
||||
for details on regexes and internationalisation.
|
||||
|
||||
=item *
|
||||
|
||||
I<Mastering Regular Expressions> by Jeffrey Friedl
|
||||
(L<http://oreilly.com/catalog/9780596528126/>) for a thorough grounding and
|
||||
reference on the topic.
|
||||
|
||||
=back
|
||||
|
||||
=head1 THANKS
|
||||
|
||||
David P.C. Wollmann,
|
||||
Richard Soderberg,
|
||||
Sean M. Burke,
|
||||
Tom Christiansen,
|
||||
Jim Cromie,
|
||||
and
|
||||
Jeffrey Goff
|
||||
for useful advice.
|
||||
|
||||
=cut
|
||||
Reference in New Issue
Block a user