Initial Commit
This commit is contained in:
263
database/perl/lib/encoding/warnings.pm
Normal file
263
database/perl/lib/encoding/warnings.pm
Normal file
@@ -0,0 +1,263 @@
|
||||
package encoding::warnings;
|
||||
$encoding::warnings::VERSION = '0.13';
|
||||
|
||||
use strict;
|
||||
use 5.007;
|
||||
|
||||
=head1 NAME
|
||||
|
||||
encoding::warnings - Warn on implicit encoding conversions
|
||||
|
||||
=head1 VERSION
|
||||
|
||||
This document describes version 0.13 of encoding::warnings, released
|
||||
June 20, 2016.
|
||||
|
||||
=head1 NOTICE
|
||||
|
||||
As of Perl 5.26.0, this module has no effect. The internal Perl feature
|
||||
that was used to implement this module has been removed. In recent years,
|
||||
much work has been done on the Perl core to eliminate discrepancies in the
|
||||
treatment of upgraded versus downgraded strings. In addition, the
|
||||
L<encoding> pragma, which caused many of the problems, is no longer
|
||||
supported. Thus, the warnings this module produced are no longer
|
||||
necessary.
|
||||
|
||||
Hence, if you load this module on Perl 5.26.0, you will get one warning
|
||||
that the module is no longer supported; and the module will do nothing
|
||||
thereafter.
|
||||
|
||||
=head1 SYNOPSIS
|
||||
|
||||
use encoding::warnings; # or 'FATAL' to raise fatal exceptions
|
||||
|
||||
utf8::encode($a = chr(20000)); # a byte-string (raw bytes)
|
||||
$b = chr(20000); # a unicode-string (wide characters)
|
||||
|
||||
# "Bytes implicitly upgraded into wide characters as iso-8859-1"
|
||||
$c = $a . $b;
|
||||
|
||||
=head1 DESCRIPTION
|
||||
|
||||
=head2 Overview of the problem
|
||||
|
||||
By default, there is a fundamental asymmetry in Perl's unicode model:
|
||||
implicit upgrading from byte-strings to unicode-strings assumes that
|
||||
they were encoded in I<ISO 8859-1 (Latin-1)>, but unicode-strings are
|
||||
downgraded with UTF-8 encoding. This happens because the first 256
|
||||
codepoints in Unicode happens to agree with Latin-1.
|
||||
|
||||
However, this silent upgrading can easily cause problems, if you happen
|
||||
to mix unicode strings with non-Latin1 data -- i.e. byte-strings encoded
|
||||
in UTF-8 or other encodings. The error will not manifest until the
|
||||
combined string is written to output, at which time it would be impossible
|
||||
to see where did the silent upgrading occur.
|
||||
|
||||
=head2 Detecting the problem
|
||||
|
||||
This module simplifies the process of diagnosing such problems. Just put
|
||||
this line on top of your main program:
|
||||
|
||||
use encoding::warnings;
|
||||
|
||||
Afterwards, implicit upgrading of high-bit bytes will raise a warning.
|
||||
Ex.: C<Bytes implicitly upgraded into wide characters as iso-8859-1 at
|
||||
- line 7>.
|
||||
|
||||
However, strings composed purely of ASCII code points (C<0x00>..C<0x7F>)
|
||||
will I<not> trigger this warning.
|
||||
|
||||
You can also make the warnings fatal by importing this module as:
|
||||
|
||||
use encoding::warnings 'FATAL';
|
||||
|
||||
=head2 Solving the problem
|
||||
|
||||
Most of the time, this warning occurs when a byte-string is concatenated
|
||||
with a unicode-string. There are a number of ways to solve it:
|
||||
|
||||
=over 4
|
||||
|
||||
=item * Upgrade both sides to unicode-strings
|
||||
|
||||
If your program does not need compatibility for Perl 5.6 and earlier,
|
||||
the recommended approach is to apply appropriate IO disciplines, so all
|
||||
data in your program become unicode-strings. See L<encoding>, L<open> and
|
||||
L<perlfunc/binmode> for how.
|
||||
|
||||
=item * Downgrade both sides to byte-strings
|
||||
|
||||
The other way works too, especially if you are sure that all your data
|
||||
are under the same encoding, or if compatibility with older versions
|
||||
of Perl is desired.
|
||||
|
||||
You may downgrade strings with C<Encode::encode> and C<utf8::encode>.
|
||||
See L<Encode> and L<utf8> for details.
|
||||
|
||||
=item * Specify the encoding for implicit byte-string upgrading
|
||||
|
||||
If you are confident that all byte-strings will be in a specific
|
||||
encoding like UTF-8, I<and> need not support older versions of Perl,
|
||||
use the C<encoding> pragma:
|
||||
|
||||
use encoding 'utf8';
|
||||
|
||||
Similarly, this will silence warnings from this module, and preserve the
|
||||
default behaviour:
|
||||
|
||||
use encoding 'iso-8859-1';
|
||||
|
||||
However, note that C<use encoding> actually had three distinct effects:
|
||||
|
||||
=over 4
|
||||
|
||||
=item * PerlIO layers for B<STDIN> and B<STDOUT>
|
||||
|
||||
This is similar to what L<open> pragma does.
|
||||
|
||||
=item * Literal conversions
|
||||
|
||||
This turns I<all> literal string in your program into unicode-strings
|
||||
(equivalent to a C<use utf8>), by decoding them using the specified
|
||||
encoding.
|
||||
|
||||
=item * Implicit upgrading for byte-strings
|
||||
|
||||
This will silence warnings from this module, as shown above.
|
||||
|
||||
=back
|
||||
|
||||
Because literal conversions also work on empty strings, it may surprise
|
||||
some people:
|
||||
|
||||
use encoding 'big5';
|
||||
|
||||
my $byte_string = pack("C*", 0xA4, 0x40);
|
||||
print length $a; # 2 here.
|
||||
$a .= ""; # concatenating with a unicode string...
|
||||
print length $a; # 1 here!
|
||||
|
||||
In other words, do not C<use encoding> unless you are certain that the
|
||||
program will not deal with any raw, 8-bit binary data at all.
|
||||
|
||||
However, the C<Filter =E<gt> 1> flavor of C<use encoding> will I<not>
|
||||
affect implicit upgrading for byte-strings, and is thus incapable of
|
||||
silencing warnings from this module. See L<encoding> for more details.
|
||||
|
||||
=back
|
||||
|
||||
=head1 CAVEATS
|
||||
|
||||
For Perl 5.9.4 or later, this module's effect is lexical.
|
||||
|
||||
For Perl versions prior to 5.9.4, this module affects the whole script,
|
||||
instead of inside its lexical block.
|
||||
|
||||
=cut
|
||||
|
||||
# Constants.
|
||||
sub ASCII () { 0 }
|
||||
sub LATIN1 () { 1 }
|
||||
sub FATAL () { 2 }
|
||||
|
||||
sub import {
|
||||
if ($] >= 5.025003) {
|
||||
require Carp;
|
||||
Carp::cluck(
|
||||
"encoding::warnings is not supported on Perl 5.26.0 and later"
|
||||
);
|
||||
return;
|
||||
}
|
||||
|
||||
# Install a ${^ENCODING} handler if no other one are already in place.
|
||||
my $class = shift;
|
||||
my $fatal = shift || '';
|
||||
|
||||
local $@;
|
||||
return if ${^ENCODING} and ref(${^ENCODING}) ne $class;
|
||||
return unless eval { require Encode; 1 };
|
||||
|
||||
my $ascii = Encode::find_encoding('us-ascii') or return;
|
||||
my $latin1 = Encode::find_encoding('iso-8859-1') or return;
|
||||
|
||||
# Have to undef explicitly here
|
||||
undef ${^ENCODING};
|
||||
|
||||
# Install a warning handler for decode()
|
||||
my $decoder = bless(
|
||||
[
|
||||
$ascii,
|
||||
$latin1,
|
||||
(($fatal eq 'FATAL') ? 'Carp::croak' : 'Carp::carp'),
|
||||
], $class,
|
||||
);
|
||||
|
||||
no warnings 'deprecated';
|
||||
${^ENCODING} = $decoder;
|
||||
use warnings 'deprecated';
|
||||
$^H{$class} = 1;
|
||||
}
|
||||
|
||||
sub unimport {
|
||||
my $class = shift;
|
||||
$^H{$class} = undef;
|
||||
undef ${^ENCODING};
|
||||
}
|
||||
|
||||
# Don't worry about source code literals.
|
||||
sub cat_decode {
|
||||
my $self = shift;
|
||||
return $self->[LATIN1]->cat_decode(@_);
|
||||
}
|
||||
|
||||
# Warn if the data is not purely US-ASCII.
|
||||
sub decode {
|
||||
my $self = shift;
|
||||
|
||||
DO_WARN: {
|
||||
if ($] >= 5.009004) {
|
||||
my $hints = (caller(0))[10];
|
||||
$hints->{ref($self)} or last DO_WARN;
|
||||
}
|
||||
|
||||
local $@;
|
||||
my $rv = eval { $self->[ASCII]->decode($_[0], Encode::FB_CROAK()) };
|
||||
return $rv unless $@;
|
||||
|
||||
require Carp;
|
||||
no strict 'refs';
|
||||
$self->[FATAL]->(
|
||||
"Bytes implicitly upgraded into wide characters as iso-8859-1"
|
||||
);
|
||||
|
||||
}
|
||||
|
||||
return $self->[LATIN1]->decode(@_);
|
||||
}
|
||||
|
||||
sub name { 'iso-8859-1' }
|
||||
|
||||
1;
|
||||
|
||||
__END__
|
||||
|
||||
=head1 SEE ALSO
|
||||
|
||||
L<perlunicode>, L<perluniintro>
|
||||
|
||||
L<open>, L<utf8>, L<encoding>, L<Encode>
|
||||
|
||||
=head1 AUTHORS
|
||||
|
||||
Audrey Tang
|
||||
|
||||
=head1 COPYRIGHT
|
||||
|
||||
Copyright 2004, 2005, 2006, 2007 by Audrey Tang E<lt>cpan@audreyt.orgE<gt>.
|
||||
|
||||
This program is free software; you can redistribute it and/or modify it
|
||||
under the same terms as Perl itself.
|
||||
|
||||
See L<http://www.perl.com/perl/misc/Artistic.html>
|
||||
|
||||
=cut
|
||||
Reference in New Issue
Block a user