Initial Commit
117
database/perl/vendor/lib/XML/Parser/Encodings/Japanese_Encodings.msg
vendored
Normal file
@@ -0,0 +1,117 @@
Mapping files for Japanese encodings

1998 12/25

Fuji Xerox Information Systems
MURATA Makoto

1. Overview

This version of XML::Parser and XML::Encoding does not come with map files for
the charset "Shift_JIS" and the charset "euc-jp". Unfortunately, each of these
charsets has more than one mapping to Unicode, and none of these mappings is
considered authoritative.

Therefore, we have come to believe that it is dangerous to provide map files
for these charsets. Rather, we introduce several private charsets and map
files for these private charsets. If IANA, the Unicode Consortium, and JIS
eventually reach a consensus, we will be able to provide map files for
"Shift_JIS" and "euc-jp".

2. Different mappings from existing charsets to Unicode

1) Different mappings in JIS X0221 and Unicode

The mapping between JIS X0208:1990 and Unicode 1.1 and the mapping
between JIS X0212:1990 and Unicode 1.1 are published by the Unicode
Consortium. They are available at
ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/JIS/JIS0208.TXT and
ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/JIS/JIS0212.TXT,
respectively. These mapping files carry the following note:

# The kanji mappings are a normative part of ISO/IEC 10646. The
# non-kanji mappings are provisional, pending definition of
# official mappings by Japanese standards bodies.

Unfortunately, the non-kanji mappings in the Japanese standard for
ISO/IEC 10646-1, namely JIS X 0221:1995, differ from the Unicode Consortium
mapping: 0x213D of JIS X 0208 is mapped to U+2014 (em dash) rather than U+2015
(horizontal bar). Furthermore, JIS X 0221 clearly says that its mapping is
informational and non-normative. As a result, some companies (e.g., Microsoft
and Apple) have introduced slightly different mappings. Therefore, neither the
Unicode Consortium mapping nor the JIS X 0221 mapping is considered
authoritative.
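
To make the disagreement concrete, the following Python sketch compares two
one-entry tables for JIS X 0208 code 0x213D, using only the code points quoted
above. The table names and the helpers differing_codes and describe are
hypothetical, chosen for illustration; they are not part of XML::Parser or
XML::Encoding, and the real mapping files cover the whole standard.

```python
import unicodedata

# Non-kanji mapping for JIS X 0208 code 0x213D, as quoted in the text.
# Only this one entry is shown; these dictionaries are illustrative stand-ins.
UNICODE_CONSORTIUM = {0x213D: 0x2015}  # per JIS0208.TXT
JIS_X_0221 = {0x213D: 0x2014}          # per JIS X 0221:1995 (informative)


def differing_codes(a, b):
    """Return the sorted source codes that the two tables map differently."""
    return sorted(code for code in a.keys() & b.keys() if a[code] != b[code])


def describe(table, code):
    """Format the target of one mapping entry as 'U+XXXX NAME'."""
    target = table[code]
    return "U+%04X %s" % (target, unicodedata.name(chr(target)))


if __name__ == "__main__":
    for code in differing_codes(UNICODE_CONSORTIUM, JIS_X_0221):
        print("0x%04X: %s vs. %s" % (
            code,
            describe(UNICODE_CONSORTIUM, code),
            describe(JIS_X_0221, code),
        ))
```

The same comparison could be run over complete tables loaded from the
published mapping files; the dictionaries here hold only the entry the text
discusses.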

2) Shift-JIS

This charset is especially problematic, since its definition has been unclear
since its inception.

The current registration of the charset "Shift_JIS" is as follows:

>Name: Shift_JIS (preferred MIME name)
>MIBenum: 17
>Source: A Microsoft code that extends csHalfWidthKatakana to include
>        kanji by adding a second byte when the value of the first
>        byte is in the ranges 81-9F or E0-EF.
>Alias: MS_Kanji
>Alias: csShiftJIS
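
The only structural rule in this registration is the pair of lead-byte ranges.
As a minimal sketch, assuming nothing beyond those two ranges, a byte string
can be cut into one-byte and two-byte units as follows. The function names are
hypothetical, and the second byte is deliberately not validated because the
registration does not constrain it.

```python
def is_lead_byte(byte):
    """True if the byte starts a two-byte character per the registration."""
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xEF


def split_shift_jis(data):
    """Split a byte string into one-byte and two-byte units.

    Raises ValueError if the input ends in the middle of a two-byte unit.
    """
    units = []
    i = 0
    while i < len(data):
        width = 2 if is_lead_byte(data[i]) else 1
        if i + width > len(data):
            raise ValueError("truncated two-byte character at offset %d" % i)
        units.append(data[i:i + width])
        i += width
    return units


if __name__ == "__main__":
    # 'A', the two-byte unit 0x81 0x5C, and the single byte 0xB1.
    print(split_shift_jis(b"A\x81\x5c\xb1"))
```

Note that the second byte of the unit 0x81 0x5C has the same value as a
single-byte 0x5C, so code that scans a Shift JIS stream byte by byte for 0x5C
can misfire.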

First, this registration does not refer to the mapping "Shift-JIS to Unicode"
published by the Unicode Consortium (available at
ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/JIS/SHIFTJIS.TXT).

Second, "kanji" in this registration can be interpreted in different ways.
Does this "kanji" refer to JIS X0208:1978, JIS X0208:1983, or JIS
X0208:1990 (== JIS X0208:1997)? These three standards are *incompatible* with
each other. Moreover, we can even argue that "kanji" refers to JIS X0212 or
ideographic characters in other countries.

Third, each company has extended Shift JIS. For example, Microsoft introduced
OEM extensions (NEC extensions and IBM extensions).

Fourth, Shift JIS uses JIS X0201, which is almost upward-compatible with
US-ASCII but not quite: 5C and 7E of JIS X 0201 are the yen sign and the
overline, not backslash and tilde. However, many programming languages (e.g.,
Java) ignore this difference and assume that 5C and 7E of Shift JIS are
backslash and tilde.
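
The two readings of 5C and 7E can be sketched in Python as follows. The table
and function names are hypothetical; the only facts used are that JIS X 0201
assigns the yen sign (U+00A5) to 5C and the overline (U+203E) to 7E, where
US-ASCII has backslash and tilde.

```python
# Code points where the JIS X 0201 Roman set departs from US-ASCII.
JIS_X_0201_ROMAN = {
    0x5C: 0x00A5,  # YEN SIGN instead of backslash
    0x7E: 0x203E,  # OVERLINE instead of tilde
}


def decode_roman_byte(byte, ascii_compatible):
    """Decode one byte in the range 0x00-0x7F under either reading.

    ascii_compatible=True follows the "backslash and tilde" reading;
    ascii_compatible=False follows JIS X 0201 strictly.
    """
    if not 0x00 <= byte <= 0x7F:
        raise ValueError("not a 7-bit byte: 0x%02X" % byte)
    if ascii_compatible:
        return chr(byte)
    return chr(JIS_X_0201_ROMAN.get(byte, byte))


if __name__ == "__main__":
    for byte in (0x5C, 0x7E):
        strict = decode_roman_byte(byte, ascii_compatible=False)
        loose = decode_roman_byte(byte, ascii_compatible=True)
        print("0x%02X: U+%04X or U+%04X" % (byte, ord(strict), ord(loose)))
```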


3. Proposed charsets and mappings

As a tentative solution, we introduce two private charsets for EUC-JP and four
private charsets for Shift JIS.

1) EUC-JP

We have two charsets, namely "x-euc-jp-unicode" and "x-euc-jp-jisx0221". They
differ in only one code point. The mapping for the former is based
on the Unicode Consortium mapping, while the latter is based on the JIS X0221
mapping.
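
The text quotes JIS X 0208 codes rather than the bytes that appear in an
EUC-JP stream. As a sketch, and assuming the one differing code point is the
0x213D discussed in section 2 (the text does not say so explicitly), the
standard conversion from a JIS X 0208 code to EUC-JP bytes looks like this in
Python. The function name is hypothetical.

```python
def jis_x_0208_to_euc_jp(code):
    """Encode a two-byte JIS X 0208 code (e.g. 0x213D) as EUC-JP bytes."""
    high, low = code >> 8, code & 0xFF
    if not (0x21 <= high <= 0x7E and 0x21 <= low <= 0x7E):
        raise ValueError("not a JIS X 0208 code: 0x%04X" % code)
    # EUC-JP sets the top bit of each byte of a JIS X 0208 code.
    return bytes([high | 0x80, low | 0x80])


if __name__ == "__main__":
    print(jis_x_0208_to_euc_jp(0x213D).hex())
```

Under that assumption, the two EUC-JP charsets would disagree only on how the
byte pair 0xA1 0xBD is decoded.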

2) Shift JIS

We have four charsets, namely x-sjis-unicode, x-sjis-jisx0221,
x-sjis-jdk117, and x-sjis-cp932.

The mapping for the charset x-sjis-unicode is the one published by the Unicode
Consortium. The mapping for x-sjis-jisx0221 is almost equivalent to
x-sjis-unicode, but 0x213D of JIS X 0208 is mapped to U+2014 (em dash) rather
than U+2015. The charset x-sjis-jdk117 is again almost equivalent to
x-sjis-unicode, but 0x5C and 0x7E of JIS X0201 are mapped to backslash and
tilde.
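
Codes such as 0x213D are JIS X 0208 codes, not the bytes found in a Shift JIS
stream. The following Python sketch applies the usual conversion arithmetic
from a JIS X 0208 code to its Shift JIS byte pair; the function name is
hypothetical and no vendor extension area is handled.

```python
def jis_x_0208_to_shift_jis(code):
    """Encode a two-byte JIS X 0208 code (e.g. 0x213D) as Shift JIS bytes."""
    high, low = code >> 8, code & 0xFF
    if not (0x21 <= high <= 0x7E and 0x21 <= low <= 0x7E):
        raise ValueError("not a JIS X 0208 code: 0x%04X" % code)
    # Two consecutive JIS rows share one lead byte; the result falls in
    # 0x81-0x9F for rows up to 0x5E and in 0xE0-0xEF for the rest.
    lead = ((high + 1) >> 1) + (0x70 if high < 0x5F else 0xB0)
    if high % 2:
        # Odd row: trail bytes 0x40-0x7E and 0x80-0x9E (0x7F is skipped).
        trail = low + (0x1F if low < 0x60 else 0x20)
    else:
        # Even row: trail bytes 0x9F-0xFC.
        trail = low + 0x7E
    return bytes([lead, trail])


if __name__ == "__main__":
    print(jis_x_0208_to_shift_jis(0x213D).hex())
```

For 0x213D this gives the byte pair 0x81 0x5C, which is the pair whose Unicode
value x-sjis-unicode and x-sjis-jisx0221 disagree on.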

The charset x-sjis-cp932 is used by Microsoft Windows, and its mapping is
published by the Unicode Consortium (available at
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.txt). The
coded character set for this charset includes the NEC extensions and
IBM extensions. 0x5C and 0x7E of JIS X0201 are mapped to backslash and tilde;
0x213D is mapped to U+2015; and 0x2140, 0x2141, 0x2142, and 0x215E of JIS X
0208 are mapped to compatibility characters.
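
The differences among the four Shift JIS charsets can be summarized as a small
lookup table. This Python sketch encodes only the properties stated in this
section; the property names and the candidates helper are hypothetical, and
marking the three non-cp932 charsets as lacking vendor extensions is an
assumption, since the text only says that x-sjis-cp932 includes them. The
compatibility-character mappings of x-sjis-cp932 are left out.

```python
# Distinguishing properties of the four private charsets.
#   ascii_5c_7e:       0x5C and 0x7E decode to backslash and tilde
#   jis_213d:          Unicode target of JIS X 0208 code 0x213D
#   vendor_extensions: NEC and IBM extensions are included
#                      (False is an assumption for the non-cp932 charsets)
VARIANTS = {
    "x-sjis-unicode": {
        "ascii_5c_7e": False, "jis_213d": 0x2015, "vendor_extensions": False,
    },
    "x-sjis-jisx0221": {
        "ascii_5c_7e": False, "jis_213d": 0x2014, "vendor_extensions": False,
    },
    "x-sjis-jdk117": {
        "ascii_5c_7e": True, "jis_213d": 0x2015, "vendor_extensions": False,
    },
    "x-sjis-cp932": {
        "ascii_5c_7e": True, "jis_213d": 0x2015, "vendor_extensions": True,
    },
}


def candidates(**wanted):
    """Return the sorted charset names whose properties match all of wanted."""
    return sorted(
        name
        for name, props in VARIANTS.items()
        if all(props[key] == value for key, value in wanted.items())
    )


if __name__ == "__main__":
    print(candidates(ascii_5c_7e=True))
    print(candidates(jis_213d=0x2014))
```

A table like this only narrows the choice; which charset a given document
actually uses still has to be known from its origin.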

Makoto

Fuji Xerox Information Systems

Tel: +81-44-812-7230 Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp
51
database/perl/vendor/lib/XML/Parser/Encodings/README
vendored
Normal file
@@ -0,0 +1,51 @@
This directory contains binary encoding maps for some selected encodings.
If they are placed in a directory listed in @XML::Parser::Expat::Encoding_Path,
then they are automatically loaded by the XML::Parser::Expat::load_encoding
function as needed. Otherwise you may load what you need directly by
explicitly calling this function.
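
The kind of search-path lookup described here can be sketched in Python as
follows. This is an illustration only, not the XML::Parser::Expat
implementation: the function name, the lowercasing of the encoding name, and
the ".enc" suffix handling are assumptions made for the example.

```python
import os


def find_encoding_map(name, search_path):
    """Return the first '<name>.enc' file found in search_path, else None.

    The encoding name is lowercased and given an '.enc' suffix if it lacks
    one (both are assumptions of this sketch).
    """
    filename = name.lower()
    if not filename.endswith(".enc"):
        filename += ".enc"
    for directory in search_path:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # The directory name below is a placeholder for wherever the maps live.
    print(find_encoding_map("x-sjis-cp932", ["Encodings"]))
```

Directories earlier in the list win, so a locally supplied map can shadow one
that ships with the distribution.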

These maps were generated by compile_encoding, a Perl script that comes with
the XML::Encoding module, from the XML-formatted encoding maps that are
distributed with that module. These XML encoding maps were generated in turn
by a different script, domap, from mapping information contained on the
Unicode version 2.0 CD-ROM. This CD-ROM comes with the Unicode Standard
reference manual and can be ordered from the Unicode Consortium at
http://www.unicode.org. The identical information is available on the
internet at ftp://ftp.unicode.org/Public/MAPPINGS.

See the encoding.h header in the Expat sub-directory for a description of
the structure of these files.

Clark Cooper
December 12, 1998

================================================================

Contributed maps

This distribution contains four contributed encodings from MURATA Makoto
<murata@apsdc.ksp.fujixerox.co.jp> that are variations on the encoding
commonly called Shift_JIS:

    x-sjis-cp932.enc
    x-sjis-jdk117.enc
    x-sjis-jisx0221.enc
    x-sjis-unicode.enc (This is the same encoding as the shift_jis.enc that
                        was distributed with this module in version 2.17)

Please read his message (Japanese_Encodings.msg) about why these are here
and why I've removed the shift_jis.enc encoding.

We also have two contributed encodings that are variations of the EUC-JP
encoding from Yoshida Masato <yoshidam@inse.co.jp>:

    x-euc-jp-jisx0221.enc
    x-euc-jp-unicode.enc

The comments that MURATA Makoto made in his message apply to these
encodings too.

KangChan Lee <dolphin@comeng.chungnam.ac.kr> supplied the euc-kr encoding.

Clark Cooper
December 26, 1998
BIN
database/perl/vendor/lib/XML/Parser/Encodings/big5.enc
vendored
Normal file
Binary file not shown.
BIN
database/perl/vendor/lib/XML/Parser/Encodings/euc-kr.enc
vendored
Normal file
Binary file not shown.
BIN
database/perl/vendor/lib/XML/Parser/Encodings/ibm866.enc
vendored
Normal file
Binary file not shown.
BIN
database/perl/vendor/lib/XML/Parser/Encodings/iso-8859-15.enc
vendored
Normal file
Binary file not shown.
BIN
database/perl/vendor/lib/XML/Parser/Encodings/iso-8859-2.enc
vendored
Normal file
Binary file not shown.
BIN
database/perl/vendor/lib/XML/Parser/Encodings/iso-8859-3.enc
vendored
Normal file
Binary file not shown.
BIN
database/perl/vendor/lib/XML/Parser/Encodings/iso-8859-4.enc
vendored
Normal file
Binary file not shown.
BIN
database/perl/vendor/lib/XML/Parser/Encodings/iso-8859-5.enc
vendored
Normal file
Binary file not shown.
BIN
database/perl/vendor/lib/XML/Parser/Encodings/iso-8859-7.enc
vendored
Normal file
Binary file not shown.
BIN
database/perl/vendor/lib/XML/Parser/Encodings/iso-8859-8.enc
vendored
Normal file
Binary file not shown.
BIN
database/perl/vendor/lib/XML/Parser/Encodings/iso-8859-9.enc
vendored
Normal file
Binary file not shown.
BIN
database/perl/vendor/lib/XML/Parser/Encodings/koi8-r.enc
vendored
Normal file
Binary file not shown.
BIN
database/perl/vendor/lib/XML/Parser/Encodings/windows-1250.enc
vendored
Normal file
Binary file not shown.
BIN
database/perl/vendor/lib/XML/Parser/Encodings/windows-1251.enc
vendored
Normal file
Binary file not shown.
BIN
database/perl/vendor/lib/XML/Parser/Encodings/windows-1252.enc
vendored
Normal file
Binary file not shown.
BIN
database/perl/vendor/lib/XML/Parser/Encodings/windows-1255.enc
vendored
Normal file
Binary file not shown.
BIN
database/perl/vendor/lib/XML/Parser/Encodings/x-euc-jp-jisx0221.enc
vendored
Normal file
Binary file not shown.
BIN
database/perl/vendor/lib/XML/Parser/Encodings/x-euc-jp-unicode.enc
vendored
Normal file
Binary file not shown.
BIN
database/perl/vendor/lib/XML/Parser/Encodings/x-sjis-cp932.enc
vendored
Normal file
Binary file not shown.
BIN
database/perl/vendor/lib/XML/Parser/Encodings/x-sjis-jdk117.enc
vendored
Normal file
Binary file not shown.
BIN
database/perl/vendor/lib/XML/Parser/Encodings/x-sjis-jisx0221.enc
vendored
Normal file
Binary file not shown.
BIN
database/perl/vendor/lib/XML/Parser/Encodings/x-sjis-unicode.enc
vendored
Normal file
Binary file not shown.