Initial Commit
This commit is contained in:
304
database/perl/vendor/lib/Unicode/GCString.pod
vendored
Normal file
304
database/perl/vendor/lib/Unicode/GCString.pod
vendored
Normal file
@@ -0,0 +1,304 @@
|
||||
=encoding utf-8
|
||||
|
||||
=head1 NAME
|
||||
|
||||
Unicode::GCString - String as Sequence of UAX #29 Grapheme Clusters
|
||||
|
||||
=head1 SYNOPSIS
|
||||
|
||||
use Unicode::GCString;
|
||||
$gcstring = Unicode::GCString->new($string);
|
||||
|
||||
=head1 DESCRIPTION
|
||||
|
||||
Unicode::GCString treats Unicode string as a sequence of
|
||||
I<extended grapheme clusters> defined by Unicode Standard Annex #29 [UAX #29].
|
||||
|
||||
B<Grapheme cluster> is a sequence of Unicode character(s) that consists of one
|
||||
B<grapheme base> and optional B<grapheme extender> and/or
|
||||
B<“prepend” character>. It is close in that people consider as I<character>.
|
||||
|
||||
=head2 Public Interface
|
||||
|
||||
=head3 Constructors
|
||||
|
||||
=over 4
|
||||
|
||||
=item new (STRING, [KEY => VALUE, ...])
|
||||
|
||||
=item new (STRING, [LINEBREAK])
|
||||
|
||||
I<Constructor>.
|
||||
Create new grapheme cluster string (Unicode::GCString object) from
|
||||
Unicode string STRING.
|
||||
|
||||
About optional KEY => VALUE pairs see L<Unicode::LineBreak/Options>.
|
||||
On second form, L<Unicode::LineBreak> object LINEBREAK controls
|
||||
breaking features.
|
||||
|
||||
B<Note>:
|
||||
The first form was introduced by release 2012.10.
|
||||
|
||||
=item copy
|
||||
|
||||
I<Copy constructor>.
|
||||
Create a copy of grapheme cluster string.
|
||||
Next position of new string is set at beginning.
|
||||
|
||||
=back
|
||||
|
||||
=head3 Sizes
|
||||
|
||||
=over 4
|
||||
|
||||
=item chars
|
||||
|
||||
I<Instance method>.
|
||||
Returns number of Unicode characters grapheme cluster string includes,
|
||||
i.e. length as Unicode string.
|
||||
|
||||
=item columns
|
||||
|
||||
I<Instance method>.
|
||||
Returns total number of columns of grapheme clusters
|
||||
defined by built-in character database.
|
||||
For more details see L<Unicode::LineBreak/DESCRIPTION>.
|
||||
|
||||
=item length
|
||||
|
||||
I<Instance method>.
|
||||
Returns number of grapheme clusters contained in grapheme cluster string.
|
||||
|
||||
=back
|
||||
|
||||
=head3 Operations as String
|
||||
|
||||
=over 4
|
||||
|
||||
=item as_string
|
||||
|
||||
=item C<">OBJECTC<">
|
||||
|
||||
I<Instance method>.
|
||||
Convert grapheme cluster string to Unicode string explicitly.
|
||||
|
||||
=item cmp (STRING)
|
||||
|
||||
=item STRING C<cmp> STRING
|
||||
|
||||
I<Instance method>.
|
||||
Compare strings. There are no oddities.
|
||||
One of each STRING may be Unicode string.
|
||||
|
||||
=item concat (STRING)
|
||||
|
||||
=item STRING C<.> STRING
|
||||
|
||||
I<Instance method>.
|
||||
Concatenate STRINGs. One of each STRING may be Unicode string.
|
||||
Note that number of columns (see columns()) or grapheme clusters
|
||||
(see length()) of resulting string is not always equal to sum of both
|
||||
strings.
|
||||
Next position of new string is that set on the left value.
|
||||
|
||||
=item join ([STRING, ...])
|
||||
|
||||
I<Instance method>.
|
||||
Join STRINGs inserting grapheme cluster string.
|
||||
Any of STRINGs may be Unicode string.
|
||||
|
||||
=item substr (OFFSET, [LENGTH, [REPLACEMENT]])
|
||||
|
||||
I<Instance method>.
|
||||
Returns substring of grapheme cluster string.
|
||||
OFFSET and LENGTH are based on grapheme clusters.
|
||||
If REPLACEMENT is specified, substring is replaced by it.
|
||||
REPLACEMENT may be Unicode string.
|
||||
|
||||
Note:
|
||||
This method cannot return the lvalue, unlike built-in substr().
|
||||
|
||||
=back
|
||||
|
||||
=head3 Operations as Sequence of Grapheme Clusters
|
||||
|
||||
=over 4
|
||||
|
||||
=item as_array
|
||||
|
||||
=item C<@{>OBJECTC<}>
|
||||
|
||||
=item as_arrayref
|
||||
|
||||
I<Instance method>.
|
||||
Convert grapheme cluster string to an array of grapheme clusters.
|
||||
|
||||
=item eos
|
||||
|
||||
I<Instance method>.
|
||||
Test if current position is at end of grapheme cluster string.
|
||||
|
||||
=item item ([OFFSET])
|
||||
|
||||
I<Instance method>.
|
||||
Returns OFFSET-th grapheme cluster.
|
||||
If OFFSET was not specified, returns next grapheme cluster.
|
||||
|
||||
=item next
|
||||
|
||||
=item C<E<lt>>OBJECTC<E<gt>>
|
||||
|
||||
I<Instance method>, iterative.
|
||||
Returns next grapheme cluster and increment next position.
|
||||
|
||||
=item pos ([OFFSET])
|
||||
|
||||
I<Instance method>.
|
||||
If optional OFFSET is specified, set next position by it.
|
||||
Returns next position of grapheme cluster string.
|
||||
|
||||
=back
|
||||
|
||||
=begin comment
|
||||
|
||||
=head4 Methods planned to be deprecated
|
||||
|
||||
=over 4
|
||||
|
||||
=item flag ([OFFSET, [VALUE]])
|
||||
|
||||
I<Instance method>.
|
||||
Get or set flag value of OFFEST-th grapheme cluster.
|
||||
If OFFSET was not specified, returns flag value of next grapheme cluster.
|
||||
Flag value is an non-zero integer not greater than 255 and initially is 0.
|
||||
|
||||
Predefined flags are:
|
||||
|
||||
=over 4
|
||||
|
||||
=item Unicode::LineBreak::ALLOW_BEFORE
|
||||
|
||||
Allow line breaking just before this grapheme cluster.
|
||||
|
||||
=item Unicode::LineBreak::PROHIBIT_BEFORE
|
||||
|
||||
Prohibit line breaking just before this grapheme cluster.
|
||||
|
||||
=back
|
||||
|
||||
=item lbclass ([OFFSET])
|
||||
|
||||
I<Instance method>.
|
||||
Returns Line Breaking Class (See L<Unicode::LineBreak>) of the first
|
||||
character of OFFSET-th grapheme cluster.
|
||||
If OFFSET was not specified, returns class of next grapheme cluster.
|
||||
|
||||
B<Note>:
|
||||
Use lbc().
|
||||
|
||||
=item lbclass_ext ([OFFSET])
|
||||
|
||||
I<Instance method>.
|
||||
Returns Line Breaking Class (See L<Unicode::LineBreak>) of the last
|
||||
grapheme extender of OFFSET-th grapheme cluster. If there are no
|
||||
grapheme extenders or its class is CM, value of lbclass() is returned.
|
||||
|
||||
B<Note>:
|
||||
Use lbcext().
|
||||
|
||||
=back
|
||||
|
||||
=end comment
|
||||
|
||||
=head3 Miscelaneous
|
||||
|
||||
=over 4
|
||||
|
||||
=item lbc
|
||||
|
||||
I<Instance method>.
|
||||
Returns Line Breaking Class (See L<Unicode::LineBreak>) of the first
|
||||
character of first grapheme cluster.
|
||||
|
||||
=item lbcext
|
||||
|
||||
I<Instance method>.
|
||||
Returns Line Breaking Class (See L<Unicode::LineBreak>) of the last
|
||||
grapheme extender of last grapheme cluster.
|
||||
If there are no grapheme extenders or its class is CM, value of last
|
||||
grapheme base will be returned.
|
||||
|
||||
=back
|
||||
|
||||
=head1 CAVEATS
|
||||
|
||||
=over 4
|
||||
|
||||
=item *
|
||||
|
||||
The grapheme cluster should not be referred to as "grapheme"
|
||||
even though Larry does.
|
||||
|
||||
=item *
|
||||
|
||||
On Perl around 5.10.1, implicit conversion from Unicode::GCString object to
|
||||
Unicode string sometimes let C<"utf8_mg_pos_cache_update"> cache be confused.
|
||||
|
||||
For example, instead of doing
|
||||
|
||||
$sub = substr($gcstring, $i, $j);
|
||||
|
||||
do
|
||||
|
||||
$sub = substr("$gcstring", $i, $j);
|
||||
|
||||
$sub = substr($gcstring->as_string, $i, $j);
|
||||
|
||||
=item *
|
||||
|
||||
This module implements I<default> algorithm for determining grapheme cluster
|
||||
boundaries. Tailoring mechanism has not been supported yet.
|
||||
|
||||
=back
|
||||
|
||||
=head1 VERSION
|
||||
|
||||
Consult $VERSION variable.
|
||||
|
||||
=head2 Incompatible Changes
|
||||
|
||||
=over 4
|
||||
|
||||
=item Release 2013.10
|
||||
|
||||
=over 4
|
||||
|
||||
=item *
|
||||
|
||||
The new() method can take non-Unicode string argument.
|
||||
In this case it will be decoded by iso-8859-1 (Latin 1) character set.
|
||||
That method of former releases would die with non-Unicode inputs.
|
||||
|
||||
=back
|
||||
|
||||
=back
|
||||
|
||||
=head1 SEE ALSO
|
||||
|
||||
[UAX #29]
|
||||
Mark Davis (ed.) (2009-2013).
|
||||
I<Unicode Standard Annex #29: Unicode Text Segmentation>, Revisions 15-23.
|
||||
L<http://www.unicode.org/reports/tr29/>.
|
||||
|
||||
=head1 AUTHOR
|
||||
|
||||
Hatuka*nezumi - IKEDA Soji <hatuka(at)nezumi.nu>
|
||||
|
||||
=head1 COPYRIGHT
|
||||
|
||||
Copyright (C) 2009-2013 Hatuka*nezumi - IKEDA Soji.
|
||||
|
||||
This program is free software; you can redistribute it and/or modify it
|
||||
under the same terms as Perl itself.
|
||||
|
||||
=cut
|
||||
Reference in New Issue
Block a user