Initial Commit

2025-12-03 16:38:10 +01:00
parent c5e26bf594
commit b732d8d4b5
17680 changed files with 5977495 additions and 2 deletions
--- a/database/perl/lib/pods/perlreref.pod
+++ b/database/perl/lib/pods/perlreref.pod
@@ -0,0 +1,425 @@
+=head1 NAME
+
+perlreref - Perl Regular Expressions Reference
+
+=head1 DESCRIPTION
+
+This is a quick reference to Perl's regular expressions.
+For full information see L<perlre> and L<perlop>, as well
+as the L</"SEE ALSO"> section in this document.
+
+=head2 OPERATORS
+
+C<=~> determines to which variable the regex is applied.
+In its absence, $_ is used.
+
+    $var =~ /foo/;
+
+C<!~> determines to which variable the regex is applied,
+and negates the result of the match; it returns
+false if the match succeeds, and true if it fails.
+
+    $var !~ /foo/;
+
+C<m/pattern/msixpogcdualn> searches a string for a pattern match,
+applying the given options.
+
+    m  Multiline mode - ^ and $ match internal lines
+    s  match as a Single line - . matches \n
+    i  case-Insensitive
+    x  eXtended legibility - free whitespace and comments
+    p  Preserve a copy of the matched string -
+       ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
+    o  compile pattern Once
+    g  Global - all occurrences
+    c  don't reset pos on failed matches when using /g
+    a  restrict \d, \s, \w and [:posix:] to match ASCII only
+    aa (two a's) also /i matches exclude ASCII/non-ASCII
+    l  match according to current locale
+    u  match according to Unicode rules
+    d  match according to native rules unless something indicates
+       Unicode
+    n  Non-capture mode. Don't let () fill in $1, $2, etc...
+
+If 'pattern' is an empty string, the last I<successfully> matched
+regex is used. Delimiters other than '/' may be used for both this
+operator and the following ones. The leading C<m> can be omitted
+if the delimiter is '/'.
+
+C<qr/pattern/msixpodualn> lets you store a regex in a variable,
+or pass one around. Modifiers as for C<m//>, and are stored
+within the regex.
+
+C<s/pattern/replacement/msixpogcedual> substitutes matches of
+'pattern' with 'replacement'. Modifiers as for C<m//>,
+with two additions:
+
+    e  Evaluate 'replacement' as an expression
+    r  Return substitution and leave the original string untouched.
+
+'e' may be specified multiple times. 'replacement' is interpreted
+as a double quoted string unless a single-quote (C<'>) is the delimiter.
+
+C<m?pattern?> is like C<m/pattern/> but matches only once. No alternate
+delimiters can be used.  Must be reset with reset().
+
+=head2 SYNTAX
+
+ \       Escapes the character immediately following it
+ .       Matches any single character except a newline (unless /s is
+           used)
+ ^       Matches at the beginning of the string (or line, if /m is used)
+ $       Matches at the end of the string (or line, if /m is used)
+ *       Matches the preceding element 0 or more times
+ +       Matches the preceding element 1 or more times
+ ?       Matches the preceding element 0 or 1 times
+ {...}   Specifies a range of occurrences for the element preceding it
+ [...]   Matches any one of the characters contained within the brackets
+ (...)   Groups subexpressions for capturing to $1, $2...
+ (?:...) Groups subexpressions without capturing (cluster)
+ |       Matches either the subexpression preceding or following it
+ \g1 or \g{1}, \g2 ...    Matches the text from the Nth group
+ \1, \2, \3 ...           Matches the text from the Nth group
+ \g-1 or \g{-1}, \g-2 ... Matches the text from the Nth previous group
+ \g{name}     Named backreference
+ \k<name>     Named backreference
+ \k'name'     Named backreference
+ (?P=name)    Named backreference (python syntax)
+
+=head2 ESCAPE SEQUENCES
+
+These work as in normal strings.
+
+   \a       Alarm (beep)
+   \e       Escape
+   \f       Formfeed
+   \n       Newline
+   \r       Carriage return
+   \t       Tab
+   \037     Char whose ordinal is the 3 octal digits, max \777
+   \o{2307} Char whose ordinal is the octal number, unrestricted
+   \x7f     Char whose ordinal is the 2 hex digits, max \xFF
+   \x{263a} Char whose ordinal is the hex number, unrestricted
+   \cx      Control-x
+   \N{name} A named Unicode character or character sequence
+   \N{U+263D} A Unicode character by hex ordinal
+
+   \l  Lowercase next character
+   \u  Titlecase next character
+   \L  Lowercase until \E
+   \U  Uppercase until \E
+   \F  Foldcase until \E
+   \Q  Disable pattern metacharacters until \E
+   \E  End modification
+
+For Titlecase, see L</Titlecase>.
+
+This one works differently from normal strings:
+
+   \b  An assertion, not backspace, except in a character class
+
+=head2 CHARACTER CLASSES
+
+   [amy]    Match 'a', 'm' or 'y'
+   [f-j]    Dash specifies "range"
+   [f-j-]   Dash escaped or at start or end means 'dash'
+   [^f-j]   Caret indicates "match any character _except_ these"
+
+The following sequences (except C<\N>) work within or without a character class.
+The first six are locale aware, all are Unicode aware. See L<perllocale>
+and L<perlunicode> for details.
+
+   \d      A digit
+   \D      A nondigit
+   \w      A word character
+   \W      A non-word character
+   \s      A whitespace character
+   \S      A non-whitespace character
+   \h      A horizontal whitespace
+   \H      A non horizontal whitespace
+   \N      A non newline (when not followed by '{NAME}';;
+           not valid in a character class; equivalent to [^\n]; it's
+           like '.' without /s modifier)
+   \v      A vertical whitespace
+   \V      A non vertical whitespace
+   \R      A generic newline           (?>\v|\x0D\x0A)
+
+   \pP     Match P-named (Unicode) property
+   \p{...} Match Unicode property with name longer than 1 character
+   \PP     Match non-P
+   \P{...} Match lack of Unicode property with name longer than 1 char
+   \X      Match Unicode extended grapheme cluster
+
+POSIX character classes and their Unicode and Perl equivalents:
+
+            ASCII-         Full-
+   POSIX    range          range    backslash
+ [[:...:]]  \p{...}        \p{...}   sequence    Description
+
+ -----------------------------------------------------------------------
+ alnum   PosixAlnum       XPosixAlnum            'alpha' plus 'digit'
+ alpha   PosixAlpha       XPosixAlpha            Alphabetic characters
+ ascii   ASCII                                   Any ASCII character
+ blank   PosixBlank       XPosixBlank   \h       Horizontal whitespace;
+                                                   full-range also
+                                                   written as
+                                                   \p{HorizSpace} (GNU
+                                                   extension)
+ cntrl   PosixCntrl       XPosixCntrl            Control characters
+ digit   PosixDigit       XPosixDigit   \d       Decimal digits
+ graph   PosixGraph       XPosixGraph            'alnum' plus 'punct'
+ lower   PosixLower       XPosixLower            Lowercase characters
+ print   PosixPrint       XPosixPrint            'graph' plus 'space',
+                                                   but not any Controls
+ punct   PosixPunct       XPosixPunct            Punctuation and Symbols
+                                                   in ASCII-range; just
+                                                   punct outside it
+ space   PosixSpace       XPosixSpace   \s       Whitespace
+ upper   PosixUpper       XPosixUpper            Uppercase characters
+ word    PosixWord        XPosixWord    \w       'alnum' + Unicode marks
+                                                    + connectors, like
+                                                    '_' (Perl extension)
+ xdigit  ASCII_Hex_Digit  XPosixDigit            Hexadecimal digit,
+                                                    ASCII-range is
+                                                    [0-9A-Fa-f]
+
+Also, various synonyms like C<\p{Alpha}> for C<\p{XPosixAlpha}>; all listed
+in L<perluniprops/Properties accessible through \p{} and \P{}>
+
+Within a character class:
+
+    POSIX      traditional   Unicode
+  [:digit:]       \d        \p{Digit}
+  [:^digit:]      \D        \P{Digit}
+
+=head2 ANCHORS
+
+All are zero-width assertions.
+
+   ^  Match string start (or line, if /m is used)
+   $  Match string end (or line, if /m is used) or before newline
+   \b{} Match boundary of type specified within the braces
+   \B{} Match wherever \b{} doesn't match
+   \b Match word boundary (between \w and \W)
+   \B Match except at word boundary (between \w and \w or \W and \W)
+   \A Match string start (regardless of /m)
+   \Z Match string end (before optional newline)
+   \z Match absolute string end
+   \G Match where previous m//g left off
+   \K Keep the stuff left of the \K, don't include it in $&
+
+=head2 QUANTIFIERS
+
+Quantifiers are greedy by default and match the B<longest> leftmost.
+
+   Maximal Minimal Possessive Allowed range
+   ------- ------- ---------- -------------
+   {n,m}   {n,m}?  {n,m}+     Must occur at least n times
+                              but no more than m times
+   {n,}    {n,}?   {n,}+      Must occur at least n times
+   {n}     {n}?    {n}+       Must occur exactly n times
+   *       *?      *+         0 or more times (same as {0,})
+   +       +?      ++         1 or more times (same as {1,})
+   ?       ??      ?+         0 or 1 time (same as {0,1})
+
+The possessive forms (new in Perl 5.10) prevent backtracking: what gets
+matched by a pattern with a possessive quantifier will not be backtracked
+into, even if that causes the whole match to fail.
+
+There is no quantifier C<{,n}>. That's currently illegal.
+
+=head2 EXTENDED CONSTRUCTS
+
+   (?#text)          A comment
+   (?:...)           Groups subexpressions without capturing (cluster)
+   (?pimsx-imsx:...) Enable/disable option (as per m// modifiers)
+   (?=...)           Zero-width positive lookahead assertion
+   (*pla:...)        Same, starting in 5.32; experimentally in 5.28
+   (*positive_lookahead:...) Same, same versions as *pla
+   (?!...)           Zero-width negative lookahead assertion
+   (*nla:...)        Same, starting in 5.32; experimentally in 5.28
+   (*negative_lookahead:...) Same, same versions as *nla
+   (?<=...)          Zero-width positive lookbehind assertion
+   (*plb:...)        Same, starting in 5.32; experimentally in 5.28
+   (*positive_lookbehind:...) Same, same versions as *plb
+   (?<!...)          Zero-width negative lookbehind assertion
+   (*nlb:...)        Same, starting in 5.32; experimentally in 5.28
+   (*negative_lookbehind:...) Same, same versions as *plb
+   (?>...)           Grab what we can, prohibit backtracking
+   (*atomic:...)     Same, starting in 5.32; experimentally in 5.28
+   (?|...)           Branch reset
+   (?<name>...)      Named capture
+   (?'name'...)      Named capture
+   (?P<name>...)     Named capture (python syntax)
+   (?[...])          Extended bracketed character class
+   (?{ code })       Embedded code, return value becomes $^R
+   (??{ code })      Dynamic regex, return value used as regex
+   (?N)              Recurse into subpattern number N
+   (?-N), (?+N)      Recurse into Nth previous/next subpattern
+   (?R), (?0)        Recurse at the beginning of the whole pattern
+   (?&name)          Recurse into a named subpattern
+   (?P>name)         Recurse into a named subpattern (python syntax)
+   (?(cond)yes|no)
+   (?(cond)yes)      Conditional expression, where "(cond)" can be:
+                     (?=pat)   lookahead; also (*pla:pat)
+                               (*positive_lookahead:pat)
+                     (?!pat)   negative lookahead; also (*nla:pat)
+                               (*negative_lookahead:pat)
+                     (?<=pat)  lookbehind; also (*plb:pat)
+                               (*lookbehind:pat)
+                     (?<!pat)  negative lookbehind; also (*nlb:pat)
+                               (*negative_lookbehind:pat)
+                     (N)       subpattern N has matched something
+                     (<name>)  named subpattern has matched something
+                     ('name')  named subpattern has matched something
+                     (?{code}) code condition
+                     (R)       true if recursing
+                     (RN)      true if recursing into Nth subpattern
+                     (R&name)  true if recursing into named subpattern
+                     (DEFINE)  always false, no no-pattern allowed
+
+=head2 VARIABLES
+
+   $_    Default variable for operators to use
+
+   $`    Everything prior to matched string
+   $&    Entire matched string
+   $'    Everything after to matched string
+
+   ${^PREMATCH}   Everything prior to matched string
+   ${^MATCH}      Entire matched string
+   ${^POSTMATCH}  Everything after to matched string
+
+Note to those still using Perl 5.18 or earlier:
+The use of C<$`>, C<$&> or C<$'> will slow down B<all> regex use
+within your program. Consult L<perlvar> for C<@->
+to see equivalent expressions that won't cause slow down.
+See also L<Devel::SawAmpersand>. Starting with Perl 5.10, you
+can also use the equivalent variables C<${^PREMATCH}>, C<${^MATCH}>
+and C<${^POSTMATCH}>, but for them to be defined, you have to
+specify the C</p> (preserve) modifier on your regular expression.
+In Perl 5.20, the use of C<$`>, C<$&> and C<$'> makes no speed difference.
+
+   $1, $2 ...  hold the Xth captured expr
+   $+    Last parenthesized pattern match
+   $^N   Holds the most recently closed capture
+   $^R   Holds the result of the last (?{...}) expr
+   @-    Offsets of starts of groups. $-[0] holds start of whole match
+   @+    Offsets of ends of groups. $+[0] holds end of whole match
+   %+    Named capture groups
+   %-    Named capture groups, as array refs
+
+Captured groups are numbered according to their I<opening> paren.
+
+=head2 FUNCTIONS
+
+   lc          Lowercase a string
+   lcfirst     Lowercase first char of a string
+   uc          Uppercase a string
+   ucfirst     Titlecase first char of a string
+   fc          Foldcase a string
+
+   pos         Return or set current match position
+   quotemeta   Quote metacharacters
+   reset       Reset m?pattern? status
+   study       Analyze string for optimizing matching
+
+   split       Use a regex to split a string into parts
+
+The first five of these are like the escape sequences C<\L>, C<\l>,
+C<\U>, C<\u>, and C<\F>.  For Titlecase, see L</Titlecase>; For
+Foldcase, see L</Foldcase>.
+
+=head2 TERMINOLOGY
+
+=head3 Titlecase
+
+Unicode concept which most often is equal to uppercase, but for
+certain characters like the German "sharp s" there is a difference.
+
+=head3 Foldcase
+
+Unicode form that is useful when comparing strings regardless of case,
+as certain characters have complex one-to-many case mappings. Primarily a
+variant of lowercase.
+
+=head1 AUTHOR
+
+Iain Truskett. Updated by the Perl 5 Porters.
+
+This document may be distributed under the same terms as Perl itself.
+
+=head1 SEE ALSO
+
+=over 4
+
+=item *
+
+L<perlretut> for a tutorial on regular expressions.
+
+=item *
+
+L<perlrequick> for a rapid tutorial.
+
+=item *
+
+L<perlre> for more details.
+
+=item *
+
+L<perlvar> for details on the variables.
+
+=item *
+
+L<perlop> for details on the operators.
+
+=item *
+
+L<perlfunc> for details on the functions.
+
+=item *
+
+L<perlfaq6> for FAQs on regular expressions.
+
+=item *
+
+L<perlrebackslash> for a reference on backslash sequences.
+
+=item *
+
+L<perlrecharclass> for a reference on character classes.
+
+=item *
+
+The L<re> module to alter behaviour and aid
+debugging.
+
+=item *
+
+L<perldebug/"Debugging Regular Expressions">
+
+=item *
+
+L<perluniintro>, L<perlunicode>, L<charnames> and L<perllocale>
+for details on regexes and internationalisation.
+
+=item *
+
+I<Mastering Regular Expressions> by Jeffrey Friedl
+(L<http://oreilly.com/catalog/9780596528126/>) for a thorough grounding and
+reference on the topic.
+
+=back
+
+=head1 THANKS
+
+David P.C. Wollmann,
+Richard Soderberg,
+Sean M. Burke,
+Tom Christiansen,
+Jim Cromie,
+and
+Jeffrey Goff
+for useful advice.
+
+=cut