1058 lines
34 KiB
Plaintext
1058 lines
34 KiB
Plaintext
=head1 NAME
|
|
|
|
perliol - C API for Perl's implementation of IO in Layers.
|
|
|
|
=head1 SYNOPSIS
|
|
|
|
/* Defining a layer ... */
|
|
#include <perliol.h>
|
|
|
|
=head1 DESCRIPTION
|
|
|
|
This document describes the behavior and implementation of the PerlIO
|
|
abstraction described in L<perlapio> when C<USE_PERLIO> is defined.
|
|
|
|
=head2 History and Background
|
|
|
|
The PerlIO abstraction was introduced in perl5.003_02 but languished as
|
|
just an abstraction until perl5.7.0. However during that time a number
|
|
of perl extensions switched to using it, so the API is mostly fixed to
|
|
maintain (source) compatibility.
|
|
|
|
The aim of the implementation is to provide the PerlIO API in a flexible
|
|
and platform neutral manner. It is also a trial of an "Object Oriented
|
|
C, with vtables" approach which may be applied to Raku.
|
|
|
|
=head2 Basic Structure
|
|
|
|
PerlIO is a stack of layers.
|
|
|
|
The low levels of the stack work with the low-level operating system
|
|
calls (file descriptors in C) getting bytes in and out, the higher
|
|
layers of the stack buffer, filter, and otherwise manipulate the I/O,
|
|
and return characters (or bytes) to Perl. Terms I<above> and I<below>
|
|
are used to refer to the relative positioning of the stack layers.
|
|
|
|
A layer contains a "vtable", the table of I/O operations (at C level
|
|
a table of function pointers), and status flags. The functions in the
|
|
vtable implement operations like "open", "read", and "write".
|
|
|
|
When I/O, for example "read", is requested, the request goes from Perl
|
|
first down the stack using "read" functions of each layer, then at the
|
|
bottom the input is requested from the operating system services, then
|
|
the result is returned up the stack, finally being interpreted as Perl
|
|
data.
|
|
|
|
The requests do not necessarily go always all the way down to the
|
|
operating system: that's where PerlIO buffering comes into play.
|
|
|
|
When you do an open() and specify extra PerlIO layers to be deployed,
|
|
the layers you specify are "pushed" on top of the already existing
|
|
default stack. One way to see it is that "operating system is
|
|
on the left" and "Perl is on the right".
|
|
|
|
What exact layers are in this default stack depends on a lot of
|
|
things: your operating system, Perl version, Perl compile time
|
|
configuration, and Perl runtime configuration. See L<PerlIO>,
|
|
L<perlrun/PERLIO>, and L<open> for more information.
|
|
|
|
binmode() operates similarly to open(): by default the specified
|
|
layers are pushed on top of the existing stack.
|
|
|
|
However, note that even as the specified layers are "pushed on top"
|
|
for open() and binmode(), this doesn't mean that the effects are
|
|
limited to the "top": PerlIO layers can be very 'active' and inspect
|
|
and affect layers also deeper in the stack. As an example there
|
|
is a layer called "raw" which repeatedly "pops" layers until
|
|
it reaches the first layer that has declared itself capable of
|
|
handling binary data. The "pushed" layers are processed in left-to-right
|
|
order.
|
|
|
|
sysopen() operates (unsurprisingly) at a lower level in the stack than
|
|
open(). For example in Unix or Unix-like systems sysopen() operates
|
|
directly at the level of file descriptors: in the terms of PerlIO
|
|
layers, it uses only the "unix" layer, which is a rather thin wrapper
|
|
on top of the Unix file descriptors.
|
|
|
|
=head2 Layers vs Disciplines
|
|
|
|
Initial discussion of the ability to modify IO streams behaviour used
|
|
the term "discipline" for the entities which were added. This came (I
|
|
believe) from the use of the term in "sfio", which in turn borrowed it
|
|
from "line disciplines" on Unix terminals. However, this document (and
|
|
the C code) uses the term "layer".
|
|
|
|
This is, I hope, a natural term given the implementation, and should
|
|
avoid connotations that are inherent in earlier uses of "discipline"
|
|
for things which are rather different.
|
|
|
|
=head2 Data Structures
|
|
|
|
The basic data structure is a PerlIOl:
|
|
|
|
typedef struct _PerlIO PerlIOl;
|
|
typedef struct _PerlIO_funcs PerlIO_funcs;
|
|
typedef PerlIOl *PerlIO;
|
|
|
|
struct _PerlIO
|
|
{
|
|
PerlIOl * next; /* Lower layer */
|
|
PerlIO_funcs * tab; /* Functions for this layer */
|
|
U32 flags; /* Various flags for state */
|
|
};
|
|
|
|
A C<PerlIOl *> is a pointer to the struct, and the I<application>
|
|
level C<PerlIO *> is a pointer to a C<PerlIOl *> - i.e. a pointer
|
|
to a pointer to the struct. This allows the application level C<PerlIO *>
|
|
to remain constant while the actual C<PerlIOl *> underneath
|
|
changes. (Compare perl's C<SV *> which remains constant while its
|
|
C<sv_any> field changes as the scalar's type changes.) An IO stream is
|
|
then in general represented as a pointer to this linked-list of
|
|
"layers".
|
|
|
|
It should be noted that because of the double indirection in a C<PerlIO *>,
|
|
a C<< &(perlio->next) >> "is" a C<PerlIO *>, and so to some degree
|
|
at least one layer can use the "standard" API on the next layer down.
|
|
|
|
A "layer" is composed of two parts:
|
|
|
|
=over 4
|
|
|
|
=item 1.
|
|
|
|
The functions and attributes of the "layer class".
|
|
|
|
=item 2.
|
|
|
|
The per-instance data for a particular handle.
|
|
|
|
=back
|
|
|
|
=head2 Functions and Attributes
|
|
|
|
The functions and attributes are accessed via the "tab" (for table)
|
|
member of C<PerlIOl>. The functions (methods of the layer "class") are
|
|
fixed, and are defined by the C<PerlIO_funcs> type. They are broadly the
|
|
same as the public C<PerlIO_xxxxx> functions:
|
|
|
|
struct _PerlIO_funcs
|
|
{
|
|
Size_t fsize;
|
|
char * name;
|
|
Size_t size;
|
|
IV kind;
|
|
IV (*Pushed)(pTHX_ PerlIO *f,
|
|
const char *mode,
|
|
SV *arg,
|
|
PerlIO_funcs *tab);
|
|
IV (*Popped)(pTHX_ PerlIO *f);
|
|
PerlIO * (*Open)(pTHX_ PerlIO_funcs *tab,
|
|
PerlIO_list_t *layers, IV n,
|
|
const char *mode,
|
|
int fd, int imode, int perm,
|
|
PerlIO *old,
|
|
int narg, SV **args);
|
|
IV (*Binmode)(pTHX_ PerlIO *f);
|
|
SV * (*Getarg)(pTHX_ PerlIO *f, CLONE_PARAMS *param, int flags)
|
|
IV (*Fileno)(pTHX_ PerlIO *f);
|
|
PerlIO * (*Dup)(pTHX_ PerlIO *f,
|
|
PerlIO *o,
|
|
CLONE_PARAMS *param,
|
|
int flags)
|
|
/* Unix-like functions - cf sfio line disciplines */
|
|
SSize_t (*Read)(pTHX_ PerlIO *f, void *vbuf, Size_t count);
|
|
SSize_t (*Unread)(pTHX_ PerlIO *f, const void *vbuf, Size_t count);
|
|
SSize_t (*Write)(pTHX_ PerlIO *f, const void *vbuf, Size_t count);
|
|
IV (*Seek)(pTHX_ PerlIO *f, Off_t offset, int whence);
|
|
Off_t (*Tell)(pTHX_ PerlIO *f);
|
|
IV (*Close)(pTHX_ PerlIO *f);
|
|
/* Stdio-like buffered IO functions */
|
|
IV (*Flush)(pTHX_ PerlIO *f);
|
|
IV (*Fill)(pTHX_ PerlIO *f);
|
|
IV (*Eof)(pTHX_ PerlIO *f);
|
|
IV (*Error)(pTHX_ PerlIO *f);
|
|
void (*Clearerr)(pTHX_ PerlIO *f);
|
|
void (*Setlinebuf)(pTHX_ PerlIO *f);
|
|
/* Perl's snooping functions */
|
|
STDCHAR * (*Get_base)(pTHX_ PerlIO *f);
|
|
Size_t (*Get_bufsiz)(pTHX_ PerlIO *f);
|
|
STDCHAR * (*Get_ptr)(pTHX_ PerlIO *f);
|
|
SSize_t (*Get_cnt)(pTHX_ PerlIO *f);
|
|
void (*Set_ptrcnt)(pTHX_ PerlIO *f,STDCHAR *ptr,SSize_t cnt);
|
|
};
|
|
|
|
The first few members of the struct give a function table size for
|
|
compatibility check "name" for the layer, the size to C<malloc> for the per-instance data,
|
|
and some flags which are attributes of the class as whole (such as whether it is a buffering
|
|
layer), then follow the functions which fall into four basic groups:
|
|
|
|
=over 4
|
|
|
|
=item 1.
|
|
|
|
Opening and setup functions
|
|
|
|
=item 2.
|
|
|
|
Basic IO operations
|
|
|
|
=item 3.
|
|
|
|
Stdio class buffering options.
|
|
|
|
=item 4.
|
|
|
|
Functions to support Perl's traditional "fast" access to the buffer.
|
|
|
|
=back
|
|
|
|
A layer does not have to implement all the functions, but the whole
|
|
table has to be present. Unimplemented slots can be NULL (which will
|
|
result in an error when called) or can be filled in with stubs to
|
|
"inherit" behaviour from a "base class". This "inheritance" is fixed
|
|
for all instances of the layer, but as the layer chooses which stubs
|
|
to populate the table, limited "multiple inheritance" is possible.
|
|
|
|
=head2 Per-instance Data
|
|
|
|
The per-instance data are held in memory beyond the basic PerlIOl
|
|
struct, by making a PerlIOl the first member of the layer's struct
|
|
thus:
|
|
|
|
typedef struct
|
|
{
|
|
struct _PerlIO base; /* Base "class" info */
|
|
STDCHAR * buf; /* Start of buffer */
|
|
STDCHAR * end; /* End of valid part of buffer */
|
|
STDCHAR * ptr; /* Current position in buffer */
|
|
Off_t posn; /* Offset of buf into the file */
|
|
Size_t bufsiz; /* Real size of buffer */
|
|
IV oneword; /* Emergency buffer */
|
|
} PerlIOBuf;
|
|
|
|
In this way (as for perl's scalars) a pointer to a PerlIOBuf can be
|
|
treated as a pointer to a PerlIOl.
|
|
|
|
=head2 Layers in action.
|
|
|
|
table perlio unix
|
|
| |
|
|
+-----------+ +----------+ +--------+
|
|
PerlIO ->| |--->| next |--->| NULL |
|
|
+-----------+ +----------+ +--------+
|
|
| | | buffer | | fd |
|
|
+-----------+ | | +--------+
|
|
| | +----------+
|
|
|
|
|
|
The above attempts to show how the layer scheme works in a simple case.
|
|
The application's C<PerlIO *> points to an entry in the table(s)
|
|
representing open (allocated) handles. For example the first three slots
|
|
in the table correspond to C<stdin>,C<stdout> and C<stderr>. The table
|
|
in turn points to the current "top" layer for the handle - in this case
|
|
an instance of the generic buffering layer "perlio". That layer in turn
|
|
points to the next layer down - in this case the low-level "unix" layer.
|
|
|
|
The above is roughly equivalent to a "stdio" buffered stream, but with
|
|
much more flexibility:
|
|
|
|
=over 4
|
|
|
|
=item *
|
|
|
|
If Unix level C<read>/C<write>/C<lseek> is not appropriate for (say)
|
|
sockets then the "unix" layer can be replaced (at open time or even
|
|
dynamically) with a "socket" layer.
|
|
|
|
=item *
|
|
|
|
Different handles can have different buffering schemes. The "top"
|
|
layer could be the "mmap" layer if reading disk files was quicker
|
|
using C<mmap> than C<read>. An "unbuffered" stream can be implemented
|
|
simply by not having a buffer layer.
|
|
|
|
=item *
|
|
|
|
Extra layers can be inserted to process the data as it flows through.
|
|
This was the driving need for including the scheme in perl 5.7.0+ - we
|
|
needed a mechanism to allow data to be translated between perl's
|
|
internal encoding (conceptually at least Unicode as UTF-8), and the
|
|
"native" format used by the system. This is provided by the
|
|
":encoding(xxxx)" layer which typically sits above the buffering layer.
|
|
|
|
=item *
|
|
|
|
A layer can be added that does "\n" to CRLF translation. This layer
|
|
can be used on any platform, not just those that normally do such
|
|
things.
|
|
|
|
=back
|
|
|
|
=head2 Per-instance flag bits
|
|
|
|
The generic flag bits are a hybrid of C<O_XXXXX> style flags deduced
|
|
from the mode string passed to C<PerlIO_open()>, and state bits for
|
|
typical buffer layers.
|
|
|
|
=over 4
|
|
|
|
=item PERLIO_F_EOF
|
|
|
|
End of file.
|
|
|
|
=item PERLIO_F_CANWRITE
|
|
|
|
Writes are permitted, i.e. opened as "w" or "r+" or "a", etc.
|
|
|
|
=item PERLIO_F_CANREAD
|
|
|
|
Reads are permitted i.e. opened "r" or "w+" (or even "a+" - ick).
|
|
|
|
=item PERLIO_F_ERROR
|
|
|
|
An error has occurred (for C<PerlIO_error()>).
|
|
|
|
=item PERLIO_F_TRUNCATE
|
|
|
|
Truncate file suggested by open mode.
|
|
|
|
=item PERLIO_F_APPEND
|
|
|
|
All writes should be appends.
|
|
|
|
=item PERLIO_F_CRLF
|
|
|
|
Layer is performing Win32-like "\n" mapped to CR,LF for output and CR,LF
|
|
mapped to "\n" for input. Normally the provided "crlf" layer is the only
|
|
layer that need bother about this. C<PerlIO_binmode()> will mess with this
|
|
flag rather than add/remove layers if the C<PERLIO_K_CANCRLF> bit is set
|
|
for the layers class.
|
|
|
|
=item PERLIO_F_UTF8
|
|
|
|
Data written to this layer should be UTF-8 encoded; data provided
|
|
by this layer should be considered UTF-8 encoded. Can be set on any layer
|
|
by ":utf8" dummy layer. Also set on ":encoding" layer.
|
|
|
|
=item PERLIO_F_UNBUF
|
|
|
|
Layer is unbuffered - i.e. write to next layer down should occur for
|
|
each write to this layer.
|
|
|
|
=item PERLIO_F_WRBUF
|
|
|
|
The buffer for this layer currently holds data written to it but not sent
|
|
to next layer.
|
|
|
|
=item PERLIO_F_RDBUF
|
|
|
|
The buffer for this layer currently holds unconsumed data read from
|
|
layer below.
|
|
|
|
=item PERLIO_F_LINEBUF
|
|
|
|
Layer is line buffered. Write data should be passed to next layer down
|
|
whenever a "\n" is seen. Any data beyond the "\n" should then be
|
|
processed.
|
|
|
|
=item PERLIO_F_TEMP
|
|
|
|
File has been C<unlink()>ed, or should be deleted on C<close()>.
|
|
|
|
=item PERLIO_F_OPEN
|
|
|
|
Handle is open.
|
|
|
|
=item PERLIO_F_FASTGETS
|
|
|
|
This instance of this layer supports the "fast C<gets>" interface.
|
|
Normally set based on C<PERLIO_K_FASTGETS> for the class and by the
|
|
existence of the function(s) in the table. However a class that
|
|
normally provides that interface may need to avoid it on a
|
|
particular instance. The "pending" layer needs to do this when
|
|
it is pushed above a layer which does not support the interface.
|
|
(Perl's C<sv_gets()> does not expect the streams fast C<gets> behaviour
|
|
to change during one "get".)
|
|
|
|
=back
|
|
|
|
=head2 Methods in Detail
|
|
|
|
=over 4
|
|
|
|
=item fsize
|
|
|
|
Size_t fsize;
|
|
|
|
Size of the function table. This is compared against the value PerlIO
|
|
code "knows" as a compatibility check. Future versions I<may> be able
|
|
to tolerate layers compiled against an old version of the headers.
|
|
|
|
=item name
|
|
|
|
char * name;
|
|
|
|
The name of the layer whose open() method Perl should invoke on
|
|
open(). For example if the layer is called APR, you will call:
|
|
|
|
open $fh, ">:APR", ...
|
|
|
|
and Perl knows that it has to invoke the PerlIOAPR_open() method
|
|
implemented by the APR layer.
|
|
|
|
=item size
|
|
|
|
Size_t size;
|
|
|
|
The size of the per-instance data structure, e.g.:
|
|
|
|
sizeof(PerlIOAPR)
|
|
|
|
If this field is zero then C<PerlIO_pushed> does not malloc anything
|
|
and assumes layer's Pushed function will do any required layer stack
|
|
manipulation - used to avoid malloc/free overhead for dummy layers.
|
|
If the field is non-zero it must be at least the size of C<PerlIOl>,
|
|
C<PerlIO_pushed> will allocate memory for the layer's data structures
|
|
and link new layer onto the stream's stack. (If the layer's Pushed
|
|
method returns an error indication the layer is popped again.)
|
|
|
|
=item kind
|
|
|
|
IV kind;
|
|
|
|
=over 4
|
|
|
|
=item * PERLIO_K_BUFFERED
|
|
|
|
The layer is buffered.
|
|
|
|
=item * PERLIO_K_RAW
|
|
|
|
The layer is acceptable to have in a binmode(FH) stack - i.e. it does not
|
|
(or will configure itself not to) transform bytes passing through it.
|
|
|
|
=item * PERLIO_K_CANCRLF
|
|
|
|
Layer can translate between "\n" and CRLF line ends.
|
|
|
|
=item * PERLIO_K_FASTGETS
|
|
|
|
Layer allows buffer snooping.
|
|
|
|
=item * PERLIO_K_MULTIARG
|
|
|
|
Used when the layer's open() accepts more arguments than usual. The
|
|
extra arguments should come not before the C<MODE> argument. When this
|
|
flag is used it's up to the layer to validate the args.
|
|
|
|
=back
|
|
|
|
=item Pushed
|
|
|
|
IV (*Pushed)(pTHX_ PerlIO *f,const char *mode, SV *arg);
|
|
|
|
The only absolutely mandatory method. Called when the layer is pushed
|
|
onto the stack. The C<mode> argument may be NULL if this occurs
|
|
post-open. The C<arg> will be non-C<NULL> if an argument string was
|
|
passed. In most cases this should call C<PerlIOBase_pushed()> to
|
|
convert C<mode> into the appropriate C<PERLIO_F_XXXXX> flags in
|
|
addition to any actions the layer itself takes. If a layer is not
|
|
expecting an argument it need neither save the one passed to it, nor
|
|
provide C<Getarg()> (it could perhaps C<Perl_warn> that the argument
|
|
was un-expected).
|
|
|
|
Returns 0 on success. On failure returns -1 and should set errno.
|
|
|
|
=item Popped
|
|
|
|
IV (*Popped)(pTHX_ PerlIO *f);
|
|
|
|
Called when the layer is popped from the stack. A layer will normally
|
|
be popped after C<Close()> is called. But a layer can be popped
|
|
without being closed if the program is dynamically managing layers on
|
|
the stream. In such cases C<Popped()> should free any resources
|
|
(buffers, translation tables, ...) not held directly in the layer's
|
|
struct. It should also C<Unread()> any unconsumed data that has been
|
|
read and buffered from the layer below back to that layer, so that it
|
|
can be re-provided to what ever is now above.
|
|
|
|
Returns 0 on success and failure. If C<Popped()> returns I<true> then
|
|
I<perlio.c> assumes that either the layer has popped itself, or the
|
|
layer is super special and needs to be retained for other reasons.
|
|
In most cases it should return I<false>.
|
|
|
|
=item Open
|
|
|
|
PerlIO * (*Open)(...);
|
|
|
|
The C<Open()> method has lots of arguments because it combines the
|
|
functions of perl's C<open>, C<PerlIO_open>, perl's C<sysopen>,
|
|
C<PerlIO_fdopen> and C<PerlIO_reopen>. The full prototype is as
|
|
follows:
|
|
|
|
PerlIO * (*Open)(pTHX_ PerlIO_funcs *tab,
|
|
PerlIO_list_t *layers, IV n,
|
|
const char *mode,
|
|
int fd, int imode, int perm,
|
|
PerlIO *old,
|
|
int narg, SV **args);
|
|
|
|
Open should (perhaps indirectly) call C<PerlIO_allocate()> to allocate
|
|
a slot in the table and associate it with the layers information for
|
|
the opened file, by calling C<PerlIO_push>. The I<layers> is an
|
|
array of all the layers destined for the C<PerlIO *>, and any
|
|
arguments passed to them, I<n> is the index into that array of the
|
|
layer being called. The macro C<PerlIOArg> will return a (possibly
|
|
C<NULL>) SV * for the argument passed to the layer.
|
|
|
|
Where a layer opens or takes ownership of a file descriptor, that layer is
|
|
responsible for getting the file descriptor's close-on-exec flag into the
|
|
correct state. The flag should be clear for a file descriptor numbered
|
|
less than or equal to C<PL_maxsysfd>, and set for any file descriptor
|
|
numbered higher. For thread safety, when a layer opens a new file
|
|
descriptor it should if possible open it with the close-on-exec flag
|
|
initially set.
|
|
|
|
The I<mode> string is an "C<fopen()>-like" string which would match
|
|
the regular expression C</^[I#]?[rwa]\+?[bt]?$/>.
|
|
|
|
The C<'I'> prefix is used during creation of C<stdin>..C<stderr> via
|
|
special C<PerlIO_fdopen> calls; the C<'#'> prefix means that this is
|
|
C<sysopen> and that I<imode> and I<perm> should be passed to
|
|
C<PerlLIO_open3>; C<'r'> means B<r>ead, C<'w'> means B<w>rite and
|
|
C<'a'> means B<a>ppend. The C<'+'> suffix means that both reading and
|
|
writing/appending are permitted. The C<'b'> suffix means file should
|
|
be binary, and C<'t'> means it is text. (Almost all layers should do
|
|
the IO in binary mode, and ignore the b/t bits. The C<:crlf> layer
|
|
should be pushed to handle the distinction.)
|
|
|
|
If I<old> is not C<NULL> then this is a C<PerlIO_reopen>. Perl itself
|
|
does not use this (yet?) and semantics are a little vague.
|
|
|
|
If I<fd> not negative then it is the numeric file descriptor I<fd>,
|
|
which will be open in a manner compatible with the supplied mode
|
|
string, the call is thus equivalent to C<PerlIO_fdopen>. In this case
|
|
I<nargs> will be zero.
|
|
The file descriptor may have the close-on-exec flag either set or clear;
|
|
it is the responsibility of the layer that takes ownership of it to get
|
|
the flag into the correct state.
|
|
|
|
If I<nargs> is greater than zero then it gives the number of arguments
|
|
passed to C<open>, otherwise it will be 1 if for example
|
|
C<PerlIO_open> was called. In simple cases SvPV_nolen(*args) is the
|
|
pathname to open.
|
|
|
|
If a layer provides C<Open()> it should normally call the C<Open()>
|
|
method of next layer down (if any) and then push itself on top if that
|
|
succeeds. C<PerlIOBase_open> is provided to do exactly that, so in
|
|
most cases you don't have to write your own C<Open()> method. If this
|
|
method is not defined, other layers may have difficulty pushing
|
|
themselves on top of it during open.
|
|
|
|
If C<PerlIO_push> was performed and open has failed, it must
|
|
C<PerlIO_pop> itself, since if it's not, the layer won't be removed
|
|
and may cause bad problems.
|
|
|
|
Returns C<NULL> on failure.
|
|
|
|
=item Binmode
|
|
|
|
IV (*Binmode)(pTHX_ PerlIO *f);
|
|
|
|
Optional. Used when C<:raw> layer is pushed (explicitly or as a result
|
|
of binmode(FH)). If not present layer will be popped. If present
|
|
should configure layer as binary (or pop itself) and return 0.
|
|
If it returns -1 for error C<binmode> will fail with layer
|
|
still on the stack.
|
|
|
|
=item Getarg
|
|
|
|
SV * (*Getarg)(pTHX_ PerlIO *f,
|
|
CLONE_PARAMS *param, int flags);
|
|
|
|
Optional. If present should return an SV * representing the string
|
|
argument passed to the layer when it was
|
|
pushed. e.g. ":encoding(ascii)" would return an SvPV with value
|
|
"ascii". (I<param> and I<flags> arguments can be ignored in most
|
|
cases)
|
|
|
|
C<Dup> uses C<Getarg> to retrieve the argument originally passed to
|
|
C<Pushed>, so you must implement this function if your layer has an
|
|
extra argument to C<Pushed> and will ever be C<Dup>ed.
|
|
|
|
=item Fileno
|
|
|
|
IV (*Fileno)(pTHX_ PerlIO *f);
|
|
|
|
Returns the Unix/Posix numeric file descriptor for the handle. Normally
|
|
C<PerlIOBase_fileno()> (which just asks next layer down) will suffice
|
|
for this.
|
|
|
|
Returns -1 on error, which is considered to include the case where the
|
|
layer cannot provide such a file descriptor.
|
|
|
|
=item Dup
|
|
|
|
PerlIO * (*Dup)(pTHX_ PerlIO *f, PerlIO *o,
|
|
CLONE_PARAMS *param, int flags);
|
|
|
|
XXX: Needs more docs.
|
|
|
|
Used as part of the "clone" process when a thread is spawned (in which
|
|
case param will be non-NULL) and when a stream is being duplicated via
|
|
'&' in the C<open>.
|
|
|
|
Similar to C<Open>, returns PerlIO* on success, C<NULL> on failure.
|
|
|
|
=item Read
|
|
|
|
SSize_t (*Read)(pTHX_ PerlIO *f, void *vbuf, Size_t count);
|
|
|
|
Basic read operation.
|
|
|
|
Typically will call C<Fill> and manipulate pointers (possibly via the
|
|
API). C<PerlIOBuf_read()> may be suitable for derived classes which
|
|
provide "fast gets" methods.
|
|
|
|
Returns actual bytes read, or -1 on an error.
|
|
|
|
=item Unread
|
|
|
|
SSize_t (*Unread)(pTHX_ PerlIO *f,
|
|
const void *vbuf, Size_t count);
|
|
|
|
A superset of stdio's C<ungetc()>. Should arrange for future reads to
|
|
see the bytes in C<vbuf>. If there is no obviously better implementation
|
|
then C<PerlIOBase_unread()> provides the function by pushing a "fake"
|
|
"pending" layer above the calling layer.
|
|
|
|
Returns the number of unread chars.
|
|
|
|
=item Write
|
|
|
|
SSize_t (*Write)(PerlIO *f, const void *vbuf, Size_t count);
|
|
|
|
Basic write operation.
|
|
|
|
Returns bytes written or -1 on an error.
|
|
|
|
=item Seek
|
|
|
|
IV (*Seek)(pTHX_ PerlIO *f, Off_t offset, int whence);
|
|
|
|
Position the file pointer. Should normally call its own C<Flush>
|
|
method and then the C<Seek> method of next layer down.
|
|
|
|
Returns 0 on success, -1 on failure.
|
|
|
|
=item Tell
|
|
|
|
Off_t (*Tell)(pTHX_ PerlIO *f);
|
|
|
|
Return the file pointer. May be based on layers cached concept of
|
|
position to avoid overhead.
|
|
|
|
Returns -1 on failure to get the file pointer.
|
|
|
|
=item Close
|
|
|
|
IV (*Close)(pTHX_ PerlIO *f);
|
|
|
|
Close the stream. Should normally call C<PerlIOBase_close()> to flush
|
|
itself and close layers below, and then deallocate any data structures
|
|
(buffers, translation tables, ...) not held directly in the data
|
|
structure.
|
|
|
|
Returns 0 on success, -1 on failure.
|
|
|
|
=item Flush
|
|
|
|
IV (*Flush)(pTHX_ PerlIO *f);
|
|
|
|
Should make stream's state consistent with layers below. That is, any
|
|
buffered write data should be written, and file position of lower layers
|
|
adjusted for data read from below but not actually consumed.
|
|
(Should perhaps C<Unread()> such data to the lower layer.)
|
|
|
|
Returns 0 on success, -1 on failure.
|
|
|
|
=item Fill
|
|
|
|
IV (*Fill)(pTHX_ PerlIO *f);
|
|
|
|
The buffer for this layer should be filled (for read) from layer
|
|
below. When you "subclass" PerlIOBuf layer, you want to use its
|
|
I<_read> method and to supply your own fill method, which fills the
|
|
PerlIOBuf's buffer.
|
|
|
|
Returns 0 on success, -1 on failure.
|
|
|
|
=item Eof
|
|
|
|
IV (*Eof)(pTHX_ PerlIO *f);
|
|
|
|
Return end-of-file indicator. C<PerlIOBase_eof()> is normally sufficient.
|
|
|
|
Returns 0 on end-of-file, 1 if not end-of-file, -1 on error.
|
|
|
|
=item Error
|
|
|
|
IV (*Error)(pTHX_ PerlIO *f);
|
|
|
|
Return error indicator. C<PerlIOBase_error()> is normally sufficient.
|
|
|
|
Returns 1 if there is an error (usually when C<PERLIO_F_ERROR> is set),
|
|
0 otherwise.
|
|
|
|
=item Clearerr
|
|
|
|
void (*Clearerr)(pTHX_ PerlIO *f);
|
|
|
|
Clear end-of-file and error indicators. Should call C<PerlIOBase_clearerr()>
|
|
to set the C<PERLIO_F_XXXXX> flags, which may suffice.
|
|
|
|
=item Setlinebuf
|
|
|
|
void (*Setlinebuf)(pTHX_ PerlIO *f);
|
|
|
|
Mark the stream as line buffered. C<PerlIOBase_setlinebuf()> sets the
|
|
PERLIO_F_LINEBUF flag and is normally sufficient.
|
|
|
|
=item Get_base
|
|
|
|
STDCHAR * (*Get_base)(pTHX_ PerlIO *f);
|
|
|
|
Allocate (if not already done so) the read buffer for this layer and
|
|
return pointer to it. Return NULL on failure.
|
|
|
|
=item Get_bufsiz
|
|
|
|
Size_t (*Get_bufsiz)(pTHX_ PerlIO *f);
|
|
|
|
Return the number of bytes that last C<Fill()> put in the buffer.
|
|
|
|
=item Get_ptr
|
|
|
|
STDCHAR * (*Get_ptr)(pTHX_ PerlIO *f);
|
|
|
|
Return the current read pointer relative to this layer's buffer.
|
|
|
|
=item Get_cnt
|
|
|
|
SSize_t (*Get_cnt)(pTHX_ PerlIO *f);
|
|
|
|
Return the number of bytes left to be read in the current buffer.
|
|
|
|
=item Set_ptrcnt
|
|
|
|
void (*Set_ptrcnt)(pTHX_ PerlIO *f,
|
|
STDCHAR *ptr, SSize_t cnt);
|
|
|
|
Adjust the read pointer and count of bytes to match C<ptr> and/or C<cnt>.
|
|
The application (or layer above) must ensure they are consistent.
|
|
(Checking is allowed by the paranoid.)
|
|
|
|
=back
|
|
|
|
=head2 Utilities
|
|
|
|
To ask for the next layer down use PerlIONext(PerlIO *f).
|
|
|
|
To check that a PerlIO* is valid use PerlIOValid(PerlIO *f). (All
|
|
this does is really just to check that the pointer is non-NULL and
|
|
that the pointer behind that is non-NULL.)
|
|
|
|
PerlIOBase(PerlIO *f) returns the "Base" pointer, or in other words,
|
|
the C<PerlIOl*> pointer.
|
|
|
|
PerlIOSelf(PerlIO* f, type) return the PerlIOBase cast to a type.
|
|
|
|
Perl_PerlIO_or_Base(PerlIO* f, callback, base, failure, args) either
|
|
calls the I<callback> from the functions of the layer I<f> (just by
|
|
the name of the IO function, like "Read") with the I<args>, or if
|
|
there is no such callback, calls the I<base> version of the callback
|
|
with the same args, or if the f is invalid, set errno to EBADF and
|
|
return I<failure>.
|
|
|
|
Perl_PerlIO_or_fail(PerlIO* f, callback, failure, args) either calls
|
|
the I<callback> of the functions of the layer I<f> with the I<args>,
|
|
or if there is no such callback, set errno to EINVAL. Or if the f is
|
|
invalid, set errno to EBADF and return I<failure>.
|
|
|
|
Perl_PerlIO_or_Base_void(PerlIO* f, callback, base, args) either calls
|
|
the I<callback> of the functions of the layer I<f> with the I<args>,
|
|
or if there is no such callback, calls the I<base> version of the
|
|
callback with the same args, or if the f is invalid, set errno to
|
|
EBADF.
|
|
|
|
Perl_PerlIO_or_fail_void(PerlIO* f, callback, args) either calls the
|
|
I<callback> of the functions of the layer I<f> with the I<args>, or if
|
|
there is no such callback, set errno to EINVAL. Or if the f is
|
|
invalid, set errno to EBADF.
|
|
|
|
=head2 Implementing PerlIO Layers
|
|
|
|
If you find the implementation document unclear or not sufficient,
|
|
look at the existing PerlIO layer implementations, which include:
|
|
|
|
=over
|
|
|
|
=item * C implementations
|
|
|
|
The F<perlio.c> and F<perliol.h> in the Perl core implement the
|
|
"unix", "perlio", "stdio", "crlf", "utf8", "byte", "raw", "pending"
|
|
layers, and also the "mmap" and "win32" layers if applicable.
|
|
(The "win32" is currently unfinished and unused, to see what is used
|
|
instead in Win32, see L<PerlIO/"Querying the layers of filehandles"> .)
|
|
|
|
PerlIO::encoding, PerlIO::scalar, PerlIO::via in the Perl core.
|
|
|
|
PerlIO::gzip and APR::PerlIO (mod_perl 2.0) on CPAN.
|
|
|
|
=item * Perl implementations
|
|
|
|
PerlIO::via::QuotedPrint in the Perl core and PerlIO::via::* on CPAN.
|
|
|
|
=back
|
|
|
|
If you are creating a PerlIO layer, you may want to be lazy, in other
|
|
words, implement only the methods that interest you. The other methods
|
|
you can either replace with the "blank" methods
|
|
|
|
PerlIOBase_noop_ok
|
|
PerlIOBase_noop_fail
|
|
|
|
(which do nothing, and return zero and -1, respectively) or for
|
|
certain methods you may assume a default behaviour by using a NULL
|
|
method. The Open method looks for help in the 'parent' layer.
|
|
The following table summarizes the behaviour:
|
|
|
|
method behaviour with NULL
|
|
|
|
Clearerr PerlIOBase_clearerr
|
|
Close PerlIOBase_close
|
|
Dup PerlIOBase_dup
|
|
Eof PerlIOBase_eof
|
|
Error PerlIOBase_error
|
|
Fileno PerlIOBase_fileno
|
|
Fill FAILURE
|
|
Flush SUCCESS
|
|
Getarg SUCCESS
|
|
Get_base FAILURE
|
|
Get_bufsiz FAILURE
|
|
Get_cnt FAILURE
|
|
Get_ptr FAILURE
|
|
Open INHERITED
|
|
Popped SUCCESS
|
|
Pushed SUCCESS
|
|
Read PerlIOBase_read
|
|
Seek FAILURE
|
|
Set_cnt FAILURE
|
|
Set_ptrcnt FAILURE
|
|
Setlinebuf PerlIOBase_setlinebuf
|
|
Tell FAILURE
|
|
Unread PerlIOBase_unread
|
|
Write FAILURE
|
|
|
|
FAILURE Set errno (to EINVAL in Unixish, to LIB$_INVARG in VMS)
|
|
and return -1 (for numeric return values) or NULL (for
|
|
pointers)
|
|
INHERITED Inherited from the layer below
|
|
SUCCESS Return 0 (for numeric return values) or a pointer
|
|
|
|
=head2 Core Layers
|
|
|
|
The file C<perlio.c> provides the following layers:
|
|
|
|
=over 4
|
|
|
|
=item "unix"
|
|
|
|
A basic non-buffered layer which calls Unix/POSIX C<read()>, C<write()>,
|
|
C<lseek()>, C<close()>. No buffering. Even on platforms that distinguish
|
|
between O_TEXT and O_BINARY this layer is always O_BINARY.
|
|
|
|
=item "perlio"
|
|
|
|
A very complete generic buffering layer which provides the whole of
|
|
PerlIO API. It is also intended to be used as a "base class" for other
|
|
layers. (For example its C<Read()> method is implemented in terms of
|
|
the C<Get_cnt()>/C<Get_ptr()>/C<Set_ptrcnt()> methods).
|
|
|
|
"perlio" over "unix" provides a complete replacement for stdio as seen
|
|
via PerlIO API. This is the default for USE_PERLIO when system's stdio
|
|
does not permit perl's "fast gets" access, and which do not
|
|
distinguish between C<O_TEXT> and C<O_BINARY>.
|
|
|
|
=item "stdio"
|
|
|
|
A layer which provides the PerlIO API via the layer scheme, but
|
|
implements it by calling system's stdio. This is (currently) the default
|
|
if system's stdio provides sufficient access to allow perl's "fast gets"
|
|
access and which do not distinguish between C<O_TEXT> and C<O_BINARY>.
|
|
|
|
=item "crlf"
|
|
|
|
A layer derived using "perlio" as a base class. It provides Win32-like
|
|
"\n" to CR,LF translation. Can either be applied above "perlio" or serve
|
|
as the buffer layer itself. "crlf" over "unix" is the default if system
|
|
distinguishes between C<O_TEXT> and C<O_BINARY> opens. (At some point
|
|
"unix" will be replaced by a "native" Win32 IO layer on that platform,
|
|
as Win32's read/write layer has various drawbacks.) The "crlf" layer is
|
|
a reasonable model for a layer which transforms data in some way.
|
|
|
|
=item "mmap"
|
|
|
|
If Configure detects C<mmap()> functions this layer is provided (with
|
|
"perlio" as a "base") which does "read" operations by mmap()ing the
|
|
file. Performance improvement is marginal on modern systems, so it is
|
|
mainly there as a proof of concept. It is likely to be unbundled from
|
|
the core at some point. The "mmap" layer is a reasonable model for a
|
|
minimalist "derived" layer.
|
|
|
|
=item "pending"
|
|
|
|
An "internal" derivative of "perlio" which can be used to provide
|
|
Unread() function for layers which have no buffer or cannot be
|
|
bothered. (Basically this layer's C<Fill()> pops itself off the stack
|
|
and so resumes reading from layer below.)
|
|
|
|
=item "raw"
|
|
|
|
A dummy layer which never exists on the layer stack. Instead when
|
|
"pushed" it actually pops the stack removing itself, it then calls
|
|
Binmode function table entry on all the layers in the stack - normally
|
|
this (via PerlIOBase_binmode) removes any layers which do not have
|
|
C<PERLIO_K_RAW> bit set. Layers can modify that behaviour by defining
|
|
their own Binmode entry.
|
|
|
|
=item "utf8"
|
|
|
|
Another dummy layer. When pushed it pops itself and sets the
|
|
C<PERLIO_F_UTF8> flag on the layer which was (and now is once more)
|
|
the top of the stack.
|
|
|
|
=back
|
|
|
|
In addition F<perlio.c> also provides a number of C<PerlIOBase_xxxx()>
|
|
functions which are intended to be used in the table slots of classes
|
|
which do not need to do anything special for a particular method.
|
|
|
|
=head2 Extension Layers
|
|
|
|
Layers can be made available by extension modules. When an unknown layer
|
|
is encountered the PerlIO code will perform the equivalent of :
|
|
|
|
use PerlIO 'layer';
|
|
|
|
Where I<layer> is the unknown layer. F<PerlIO.pm> will then attempt to:
|
|
|
|
require PerlIO::layer;
|
|
|
|
If after that process the layer is still not defined then the C<open>
|
|
will fail.
|
|
|
|
The following extension layers are bundled with perl:
|
|
|
|
=over 4
|
|
|
|
=item ":encoding"
|
|
|
|
use Encoding;
|
|
|
|
makes this layer available, although F<PerlIO.pm> "knows" where to
|
|
find it. It is an example of a layer which takes an argument as it is
|
|
called thus:
|
|
|
|
open( $fh, "<:encoding(iso-8859-7)", $pathname );
|
|
|
|
=item ":scalar"
|
|
|
|
Provides support for reading data from and writing data to a scalar.
|
|
|
|
open( $fh, "+<:scalar", \$scalar );
|
|
|
|
When a handle is so opened, then reads get bytes from the string value
|
|
of I<$scalar>, and writes change the value. In both cases the position
|
|
in I<$scalar> starts as zero but can be altered via C<seek>, and
|
|
determined via C<tell>.
|
|
|
|
Please note that this layer is implied when calling open() thus:
|
|
|
|
open( $fh, "+<", \$scalar );
|
|
|
|
=item ":via"
|
|
|
|
Provided to allow layers to be implemented as Perl code. For instance:
|
|
|
|
use PerlIO::via::StripHTML;
|
|
open( my $fh, "<:via(StripHTML)", "index.html" );
|
|
|
|
See L<PerlIO::via> for details.
|
|
|
|
=back
|
|
|
|
=head1 TODO
|
|
|
|
Things that need to be done to improve this document.
|
|
|
|
=over
|
|
|
|
=item *
|
|
|
|
Explain how to make a valid fh without going through open()(i.e. apply
|
|
a layer). For example if the file is not opened through perl, but we
|
|
want to get back a fh, like it was opened by Perl.
|
|
|
|
How PerlIO_apply_layera fits in, where its docs, was it made public?
|
|
|
|
Currently the example could be something like this:
|
|
|
|
PerlIO *foo_to_PerlIO(pTHX_ char *mode, ...)
|
|
{
|
|
char *mode; /* "w", "r", etc */
|
|
const char *layers = ":APR"; /* the layer name */
|
|
PerlIO *f = PerlIO_allocate(aTHX);
|
|
if (!f) {
|
|
return NULL;
|
|
}
|
|
|
|
PerlIO_apply_layers(aTHX_ f, mode, layers);
|
|
|
|
if (f) {
|
|
PerlIOAPR *st = PerlIOSelf(f, PerlIOAPR);
|
|
/* fill in the st struct, as in _open() */
|
|
st->file = file;
|
|
PerlIOBase(f)->flags |= PERLIO_F_OPEN;
|
|
|
|
return f;
|
|
}
|
|
return NULL;
|
|
}
|
|
|
|
=item *
|
|
|
|
fix/add the documentation in places marked as XXX.
|
|
|
|
=item *
|
|
|
|
The handling of errors by the layer is not specified. e.g. when $!
|
|
should be set explicitly, when the error handling should be just
|
|
delegated to the top layer.
|
|
|
|
Probably give some hints on using SETERRNO() or pointers to where they
|
|
can be found.
|
|
|
|
=item *
|
|
|
|
I think it would help to give some concrete examples to make it easier
|
|
to understand the API. Of course I agree that the API has to be
|
|
concise, but since there is no second document that is more of a
|
|
guide, I think that it'd make it easier to start with the doc which is
|
|
an API, but has examples in it in places where things are unclear, to
|
|
a person who is not a PerlIO guru (yet).
|
|
|
|
=back
|
|
|
|
=cut
|