[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
There are no specific programs or servers associated with the I/O subsystem, since it is used to interact with almost all servers in the GNU Hurd. It provides facilities for reading and writing I/O channels, which are the underlying implementation of file and socket descriptors in the GNU C library.
4.1 Iohelp Library | I/O authentication and lock management. | |
4.2 Pager Library | Implementing multithreaded external pagers. | |
4.3 I/O Interface | RPC-based input/output channels. |
4.1 Iohelp Library
The <hurd/iohelp.h> file declares several functions which are useful for low-level I/O implementations. Most Hurd servers do not call these functions directly, but they are used by several of the Hurd filesystem and networking helper libraries. libiohelp requires libthreads.
4.1.1 I/O Users | User authentication management. | |
4.1.2 Conch Management | Deprecated shared I/O implementation. |
4.1.1 I/O Users
Most I/O servers need to implement some kind of user authentication checking. In order to facilitate that process, libiohelp has some functions which encapsulate a set of idvecs (FIXME: xref to C library) in a single struct iouser.
Create a new iouser for the specified uids and gids.
Return a copy of iouser.
Release a reference to iouser.
I/O reauthentication is a rather complex protocol involving the auth server as a trusted third party (see section Auth Protocol). In order to reduce the risk of flawed implementations, I/O reauthentication is encapsulated in the iohelp_reauth function:
Conduct a reauthentication transaction, and return a new iouser. authserver is the I/O server's auth port. The rendezvous port provided by the user is rend_port.
If the transaction cannot be completed, return zero, unless permit_failure is nonzero. If permit_failure is nonzero, then should the transaction fail, return an iouser that has no ids. The new port to be sent to the user is newright.
4.1.2 Conch Management
The conch is at the heart of the shared memory I/O system. Several Hurd libraries implement shared I/O, and so libiohelp contains functions to facilitate conch management.
Everything about shared I/O is undocumented because it is not needed for adequate performance, and the RPC interface is simpler (see section I/O Interface). It is not useful for new libraries or servers to implement shared I/O.
4.2 Pager Library
The external pager (XP) microkernel interface allows applications to provide the backing store for a memory object, by converting hardware page faults into RPC requests. External pagers are required for memory-mapped I/O (see section Mapped Data) and stored filesystems (see section Stored Filesystems).
The external pager interface is quite complex, so the Hurd pager library contains functions which aid in creating multithreaded external pagers. libpager is declared in <hurd/pager.h>, and requires only the threads and ports libraries.
4.2.1 Pager Management | High-level interface to external pagers. | |
4.2.2 Pager Callbacks | Functions that the user must define. |
4.2.1 Pager Management
The pager library defines the struct pager data type in order to represent a multithreaded pager. The general procedure for creating a pager is to define the functions listed in Pager Callbacks, allocate a libports bucket for the ports which will access the pager, and create at least one new struct pager with pager_create.
Create a new pager. The pager will have a port created for it (using libports, in bucket) and will be immediately ready to receive requests. u_pager will be provided to later calls to pager_find_address. The pager will have one user reference created. may_cache and copy_strategy are the original values of those attributes as for memory_object_ready. Users may create references to pagers by use of the relevant ports library functions. On errors, return null and set errno.
Once you are ready to turn over control to the pager library, you should call ports_manage_port_operations_multithread on the bucket, using pager_demuxer as the ports demuxer. This will handle all external pager RPCs, invoking your pager callbacks when necessary.
Demultiplex incoming libports messages on pager ports.
The following functions are the body of the pager library, and provide a clean interface to pager functionality:
Write data from pager pager to its backing store. Wait for all the writes to complete if and only if wait is set.
pager_sync writes all data; pager_sync_some only writes data starting at start, for len bytes.
Flush data from the kernel for pager pager and force any pending delayed copies. Wait for all pages to be flushed if and only if wait is set.
pager_flush flushes all data; pager_flush_some only flushes data starting at start, for len bytes.
Flush data from the kernel for pager pager and force any pending delayed copies. Wait for all pages to be flushed if and only if wait is set. Have the kernel write back modifications.
pager_return flushes and restores all data; pager_return_some only flushes and restores data starting at start, for len bytes.
Offer a page of data to the kernel. If precious is set, then this page will be paged out at some future point, otherwise it might be dropped by the kernel. If the page is currently in core, the kernel might ignore this call.
Change the attributes of the memory object underlying pager pager. The may_cache and copy_strategy arguments are as for memory_object_change_attributes. Wait for the kernel to report completion if and only if wait is set.
Force termination of a pager. After this returns, no more paging requests on the pager will be honoured, and the pager will be deallocated. The actual deallocation might occur asynchronously if there are currently outstanding paging requests that will complete first.
Return the error code of the last page error for pager p at address addr.
Try to copy *size bytes between the region other points to and the region at offset in the pager indicated by pager and memobj. If prot is VM_PROT_READ, copying is from the pager to other; if prot contains VM_PROT_WRITE, copying is from other into the pager. *size is always filled in with the actual number of bytes successfully copied. Returns an error code if the pager-backed memory faults; if there is no fault, returns zero and *size will be unchanged.
These functions allow you to recover the internal struct pager state, in case the libpager interface doesn't provide an operation you need:
Return the struct user_pager_info associated with a pager.
Return the port (receive right) for requests to the pager. It is absolutely necessary that a new send right be created from this receive right.
4.2.2 Pager Callbacks
Like several other Hurd libraries, libpager depends on you to implement application-specific callback functions. You must define the following functions:
For pager pager, read one page from offset page. Set *buf to be the address of the page, and set *write_lock if the page must be provided read-only. The only permissible error returns are EIO, EDQUOT, and ENOSPC.
For pager pager, synchronously write one page from buf to offset page. In addition, vm_deallocate (or equivalent) buf. The only permissible error returns are EIO, EDQUOT, and ENOSPC.
A page should be made writable.
This function should report in *offset and *size the minimum valid address the pager will accept and the size of the object.
This is called when a pager is being deallocated after all extant send rights have been destroyed.
This will be called when the ports library wants to drop weak references. The pager library creates no weak references itself, so if the user doesn't either, then it is all right for this function to do nothing.
4.3 I/O Interface
The I/O interface facilities are described in <hurd/io.defs>. This section discusses only RPC-based I/O operations.
4.3.1 I/O Object Ports | How ports to I/O objects work. | |
4.3.2 Simple Operations | Read, write, and seek. | |
4.3.3 Open Modes | State bits that affect pieces of operation. | |
4.3.4 Asynchronous I/O | How to be notified when I/O is possible. | |
4.3.5 Information Queries | How to implement io_stat and io_server_version. | |
4.3.6 Mapped Data | Getting memory objects referring to the data of an I/O object. |
4.3.1 I/O Object Ports
The I/O server must associate each I/O port with a particular set of uids and gids, identifying the user who is responsible for operations on the port. Every port to an I/O server should also support either the file protocol (see section File Interface) or the socket protocol (see section Socket Interface); naked I/O ports are not allowed.
In addition, the server associates with each port a default file pointer, a set of open mode bits, a pid (called the "owner"), and some underlying object which can absorb data (for write) or provide data (for read).
The uid and gid sets associated with a port may not be visibly shared with other ports, nor may they ever change. The server must fix the identification of a set of uids and gids with a particular port at the moment of the port's creation. The other characteristics of an I/O port may be shared with other users. The I/O server interface does not generally specify the way in which servers may share these other characteristics (with the exception of the deprecated O_ASYNC interface); however, the file and socket interfaces make further requirements about what sharing is required and what sharing is prohibited.
In general, users get send rights to I/O ports by some mechanism that is external to the I/O protocol. (For example, fileservers give out I/O ports in response to the dir_lookup and fsys_getroot calls. Socket servers give out ports in response to the socket_create and socket_accept calls.) However, the I/O protocol provides methods of obtaining new ports that refer to the same underlying object as another port. In response to all of these calls, all underlying state (including, but not limited to, the default file pointer, open mode bits, and underlying object) must be shared between the old and new ports. In the following descriptions of these calls, the term "identical" means this kind of sharing. All these calls must return send rights to a newly-constructed Mach port.
The io_duplicate call simply returns another port which is identical to an existing port and has the same uid and gid set.
The io_restrict_auth call returns another port, identical to the provided port, but which has a smaller associated uid and gid set. The uid and gid sets of the new port are the intersection of the set on the existing port and the lists of uids and gids provided in the call.
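The intersection rule for io_restrict_auth can be illustrated with a small standalone C sketch. It does not use any Hurd headers: the helper name restrict_ids and the plain unsigned arrays are assumptions made for illustration only, and a real server would more likely operate on idvecs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of any Hurd library.  Store in OUT every
   id of HAVE (the ids on the existing port) that also appears in WANT
   (the ids supplied in the call).  OUT must have room for NHAVE entries.
   Return the number of ids stored.  */
static size_t
restrict_ids (const unsigned *have, size_t nhave,
              const unsigned *want, size_t nwant,
              unsigned *out)
{
  size_t n = 0;

  assert (out != NULL || nhave == 0);
  for (size_t i = 0; i < nhave; i++)
    for (size_t j = 0; j < nwant; j++)
      if (have[i] == want[j])
        {
          out[n++] = have[i];
          break;              /* store each entry of HAVE at most once */
        }
  return n;
}
```

The same routine would be applied once to the uid set and once to the gid set. Because the result is drawn only from the ids already on the port, the new port can never gain an id the old one lacked.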
Users use the io_reauthenticate call when they wish to have an entirely new set of uids or gids associated with a port. In response to the io_reauthenticate call, the server must create a new port, and then make the call auth_server_authenticate to the auth server. The rendezvous port for the auth_server_authenticate call is the I/O port on which the io_reauthenticate call was made. The server provides the rend_int parameter to the auth server as a copy of the corresponding parameter in the io_reauthenticate call. The I/O server also gives the auth server a new port; this must be a newly created port identical to the old port. The auth server will return the set of uids and gids associated with the user, and guarantees that the new port will go directly to the user that possessed the associated authentication port. The server then associates the new port with those IDs.
4.3.2 Simple Operations
Users write to I/O ports by calling the io_write RPC. They specify an offset parameter; if the object supports writing at arbitrary offsets, the server should honour this parameter. If -1 is passed as the offset, then the server should use the default file pointer. The server should return the amount of data which was successfully written. If the operation was interrupted after some but not all of the data was written, then it is considered to have succeeded and the server should return the amount written. If the port is not an I/O port at all, the server should reply with the error EOPNOTSUPP. If the port is an I/O port, but does not happen to support writing, then the correct error is EBADF.
Users read from I/O ports by calling the io_read RPC. They specify the amount of data they wish to read, and the offset. The offset has the same meaning as for io_write above. The server should return the data that was read. If the call is interrupted after some data has been read (and the operation is not idempotent) then the server should return the amount read, even if it was less than the amount requested.
The server should return as much data as possible, but never more than requested by the user. If there is no data, but there might be later, the call should block until data becomes available. The server indicates end-of-file by returning zero bytes. If the call is interrupted after some data has been read, but the call is idempotent, then the server may return EINTR rather than actually filling the buffer (taking care that any modifications of the default file pointer have been reversed). Preferably, however, servers should return data.
There are two categories of objects: seekable and non-seekable. Seekable objects must accept arbitrary offset parameters in the io_read and io_write calls, and must implement the io_seek call. Non-seekable objects must ignore the offset parameters to io_read and io_write, and should return ESPIPE to the io_seek call.
On seekable objects, io_seek changes the default file pointer for reads and writes. (See the section `File Positioning' in The GNU C Library Reference Manual for the interpretation of the whence and offset arguments.) It returns the new offset as modified by io_seek.
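The offset conventions for io_read, io_write, and io_seek on a seekable object can be sketched with an in-memory model. This is only an illustration under stated assumptions, not server code: the toy_ names, the fixed capacity, and the use of long for offsets are all hypothetical, and the sketch assumes that a call with an explicit offset leaves the default file pointer untouched.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>    /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <string.h>

#define TOY_CAPACITY 64

/* Hypothetical in-memory stand-in for a seekable I/O object.  */
struct toy_object
{
  char data[TOY_CAPACITY];
  size_t size;          /* current file size */
  long file_pointer;    /* default file pointer */
};

/* An offset of -1 selects the default file pointer.  */
static long
toy_effective_offset (const struct toy_object *obj, long offset)
{
  return offset == -1 ? obj->file_pointer : offset;
}

/* Model of io_write: return the number of bytes actually written.  */
static size_t
toy_write (struct toy_object *obj, const char *buf, size_t len, long offset)
{
  long pos = toy_effective_offset (obj, offset);
  assert (pos >= 0 && pos <= TOY_CAPACITY);
  size_t room = TOY_CAPACITY - (size_t) pos;
  if (len > room)
    len = room;                     /* a partial write still succeeds */
  memcpy (obj->data + pos, buf, len);
  if ((size_t) pos + len > obj->size)
    obj->size = (size_t) pos + len;
  if (offset == -1)                 /* assumption: explicit offsets leave
                                       the default file pointer alone */
    obj->file_pointer = pos + (long) len;
  return len;
}

/* Model of io_read: return the number of bytes read; a zero return for
   a nonzero request means end-of-file.  */
static size_t
toy_read (struct toy_object *obj, char *buf, size_t amount, long offset)
{
  long pos = toy_effective_offset (obj, offset);
  assert (pos >= 0);
  size_t avail = (size_t) pos < obj->size ? obj->size - (size_t) pos : 0;
  if (amount > avail)
    amount = avail;                 /* never more than was requested */
  if (amount > 0)
    memcpy (buf, obj->data + pos, amount);
  if (offset == -1)
    obj->file_pointer = pos + (long) amount;
  return amount;
}

/* Model of io_seek: store the new default file pointer in *NEWP and
   return zero, or return an error code.  */
static int
toy_seek (struct toy_object *obj, long offset, int whence, long *newp)
{
  long base;
  switch (whence)
    {
    case SEEK_SET: base = 0; break;
    case SEEK_CUR: base = obj->file_pointer; break;
    case SEEK_END: base = (long) obj->size; break;
    default: return EINVAL;
    }
  if (base + offset < 0 || base + offset > TOY_CAPACITY)
    return EINVAL;                  /* toy limit, not a protocol rule */
  obj->file_pointer = base + offset;
  *newp = obj->file_pointer;
  return 0;
}
```

A non-seekable object would differ in two ways: it would ignore the offset argument entirely, and its seek routine would return ESPIPE.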
The io_readable interface returns the amount of data which can be immediately read. For the special technical meaning of "immediately", see Asynchronous I/O.
4.3.3 Open Modes
The server associates each port with a set of bits that affect its operation. The io_set_all_openmodes call modifies these bits and the io_get_openmodes call returns them. In addition, the io_set_some_openmodes and io_clear_some_openmodes calls do an atomic read/modify/write of the openmodes.
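The difference between replacing all open mode bits and atomically setting or clearing some of them can be sketched as follows. The toy_ names are hypothetical, and C11 atomics are used here only as a compact way to show a single read/modify/write step; a real server may just as well protect the bits with a lock.

```c
#include <fcntl.h>       /* O_APPEND, O_NONBLOCK */
#include <stdatomic.h>

/* Hypothetical per-object open mode state.  */
struct toy_openmodes
{
  atomic_int bits;
};

/* Model of io_set_all_openmodes: replace every bit.  */
static void
toy_set_all (struct toy_openmodes *m, int bits)
{
  atomic_store (&m->bits, bits);
}

/* Model of io_get_openmodes.  */
static int
toy_get (struct toy_openmodes *m)
{
  return atomic_load (&m->bits);
}

/* Model of io_set_some_openmodes: one atomic read/modify/write.  */
static void
toy_set_some (struct toy_openmodes *m, int bits)
{
  atomic_fetch_or (&m->bits, bits);
}

/* Model of io_clear_some_openmodes: one atomic read/modify/write.  */
static void
toy_clear_some (struct toy_openmodes *m, int bits)
{
  atomic_fetch_and (&m->bits, ~bits);
}
```

A client that wants to turn on O_NONBLOCK without disturbing O_APPEND would use the set-some operation; fetching the bits, modifying them, and storing them back with the set-all operation could lose a concurrent change made by another user of the same port.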
The O_APPEND bit, when set, changes the behaviour of io_write when it uses the default file pointer on seekable objects. When io_write is done on a port with the O_APPEND bit set, it must set the file pointer to the current file size before doing the write (which would then increment the file pointer as usual). The current file size is the smallest offset which returns end-of-file when provided to io_read. The server must atomically bind this update to the actual data write with respect to other users of io_read, io_write, and io_seek.
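The effect of O_APPEND on a write at the default file pointer can be sketched with a single-threaded model. The append_ names and the fixed capacity are hypothetical; the sketch takes no lock, whereas a real server would have to make the pointer update and the data write one atomic step.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define APPEND_CAPACITY 64

/* Hypothetical in-memory object; FILE_POINTER never exceeds
   APPEND_CAPACITY.  */
struct append_object
{
  char data[APPEND_CAPACITY];
  size_t size;           /* current file size */
  size_t file_pointer;   /* default file pointer */
  bool append;           /* models the O_APPEND bit */
};

/* Model of io_write at the default file pointer.  Return the number of
   bytes written.  */
static size_t
append_write (struct append_object *obj, const char *buf, size_t len)
{
  if (obj->append)
    obj->file_pointer = obj->size;   /* move to the current file size first */

  size_t room = APPEND_CAPACITY - obj->file_pointer;
  if (len > room)
    len = room;
  memcpy (obj->data + obj->file_pointer, buf, len);

  obj->file_pointer += len;          /* increment the pointer as usual */
  if (obj->file_pointer > obj->size)
    obj->size = obj->file_pointer;
  return len;
}
```

With the append flag set, the position of the file pointer before the call is irrelevant: the data always lands at the end of the object.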
The O_FSYNC bit, when set, guarantees that io_write will not return until data is fully written to the underlying medium.
The O_NONBLOCK bit, when set, prevents read and write from blocking. They should copy such data as is immediately available. If no data is immediately available they should return EWOULDBLOCK.
The definition of "immediately" is more or less server-dependent. Some servers, notably stored filesystem servers (see section Stored Filesystems), regard all data as immediately available. The one criterion is that something which must happen immediately may not wait for any user-synchronizable event.
The O_ASYNC bit is deprecated; its use is documented in the following section. This bit must be shared between all users of the same underlying object.
4.3.4 Asynchronous I/O
Users may wish to be notified when I/O can be done without blocking; they use the io_async call to indicate this to the server. In the io_async call the user provides a port on which the server should send sig_post messages as I/O becomes possible. The server must return a port which will be the reference port in the sig_post messages. Each io_async call should generate a new reference port. (FIXME: xref the C library manual for information on how to send sig_post messages.)
The server then sends one SIGIO signal to each registered async user every time I/O becomes possible. I/O is possible if at least one byte can be read or written immediately. The definition of "immediately" must be the same as for the implementation of the O_NONBLOCK flag (see section Open Modes). In addition, every time a user calls io_read or io_write on a non-seekable object, or at the default file pointer on a seekable object, another signal should be sent to each user if I/O is still possible.
Some objects may also define "urgent" conditions. Such servers should send the SIGURG signal to each registered async user any time an urgent condition appears. After any RPC that has the possibility of clearing the urgent condition, the server should again send the signal to all registered users if the urgent condition is still present.
A more fine-grained mechanism for doing async I/O is the io_select call. The user specifies the kind of access desired, and a send-once right. If I/O of the kind the user desires is immediately possible, then the server should return so indicating, and destroy the send-once right. If I/O is not immediately possible, the server should save the send-once right, and send a select_done message as soon as I/O becomes immediately possible. Again, the definition of "immediately" must be the same for io_select, io_async, and O_NONBLOCK (see section Open Modes).
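The two outcomes of io_select described above (an immediate answer, or a saved right that is used exactly once later) can be sketched without any Mach machinery. Everything in this model is hypothetical: a callback stands in for the send-once right, the toy_ names are invented, and only read-type access and a single pending request are modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a send-once right: a callback that the
   server may invoke at most once.  */
typedef void (*toy_reply_fn) (void *cookie);

struct toy_channel
{
  bool readable;            /* can a read proceed immediately?  */
  bool pending;             /* is a reply right currently saved?  */
  toy_reply_fn reply;
  void *cookie;
};

/* Model of a read-type io_select request.  Return true if I/O is
   immediately possible, in which case the reply right is discarded
   unused; otherwise save it and return false.  */
static bool
toy_select (struct toy_channel *ch, toy_reply_fn reply, void *cookie)
{
  if (ch->readable)
    return true;
  ch->reply = reply;
  ch->cookie = cookie;
  ch->pending = true;
  return false;
}

/* Called by the server when data arrives.  Plays the role of sending
   the select_done message on a saved right.  */
static void
toy_data_arrived (struct toy_channel *ch)
{
  ch->readable = true;
  if (ch->pending)
    {
      ch->pending = false;   /* the right is consumed by its single use */
      ch->reply (ch->cookie);
    }
}

/* Example reply routine: count how many notifications were delivered.  */
static void
toy_count_reply (void *cookie)
{
  ++*(int *) cookie;
}
```

The pending flag is cleared before the reply is delivered so that a later arrival of data cannot notify the same requester twice.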
For compatibility with 4.2 and 4.3 BSD, the I/O interface provides a deprecated feature (known as icky async I/O). The calls io_mod_owner and io_get_owner set and retrieve the "owner" of the object, which is either a pid or a pgrp (if the value is negative). This implies that only one process at a time can do icky I/O on a given object. Whenever the I/O server is sending sig_post messages to all the io_async users, if the O_ASYNC bit is set, the server should also send a signal to the owning pid/pgrp. The ID port for this call should be different from all the io_async ID ports given to users. Users may find out what ID port the server uses for this by calling io_get_icky_async_id.
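The owner encoding can be unpacked with a few lines of C. The struct and function names below are hypothetical, and the sketch assumes the usual BSD convention that a negative value names the process group whose ID is its absolute value; the text above states only that a negative value denotes a pgrp.

```c
#include <stdbool.h>
#include <sys/types.h>   /* pid_t */

/* Hypothetical decoded form of an "owner" value.  */
struct toy_owner
{
  bool is_pgrp;   /* true if the owner is a process group */
  pid_t id;       /* the pid or pgrp, always non-negative */
};

/* Split an owner value into its kind and its ID.  Assumption: a negative
   value encodes the pgrp equal to its absolute value.  */
static struct toy_owner
toy_decode_owner (pid_t owner)
{
  struct toy_owner o;
  o.is_pgrp = owner < 0;
  o.id = o.is_pgrp ? -owner : owner;
  return o;
}
```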
4.3.5 Information Queries
Users may call io_stat to find out information about the I/O object. Most of the fields of a struct stat are meaningful only for files. All objects, however, must support the fields st_fstype, st_fsid, st_ino, st_atime, st_atime_usec, st_mtime, st_mtime_usec, st_ctime, st_ctime_usec, and st_blksize.
st_fstype, st_fsid, and st_ino must be unique for the underlying object across the entire system.
st_atime and st_atime_usec hold the seconds and microseconds, respectively, of the system clock at the last time the object was read with io_read.
st_mtime and st_mtime_usec hold the seconds and microseconds, respectively, of the system clock at the last time the object was written with io_write.
Other appropriate operations may update the atime and the mtime as well; both the file and socket interfaces specify such operations.
st_ctime and st_ctime_usec hold the seconds and microseconds, respectively, of the system clock at the last time permanent meta-data associated with the object was changed. The exact operations which cause such an update are server-dependent, but must include the creation of the object.
The server is permitted to delay the actual update of these times until stat is called; before the server stores the times on permanent media (if it ever does so) it should update them if necessary.
st_blksize gives the optimal I/O size in bytes for io_read and io_write; users should endeavor to read and write amounts which are multiples of the optimal size, and to use offsets which are multiples of the optimal size.
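A client that wants to follow the st_blksize advice needs to widen a request to block boundaries. The arithmetic can be sketched as below; the helper names are hypothetical, blksize is assumed to be nonzero, and overflow of very large offsets is not handled.

```c
#include <stddef.h>

/* Round N down to a multiple of BLKSIZE.  BLKSIZE must be nonzero.  */
static size_t
toy_align_down (size_t n, size_t blksize)
{
  return n - n % blksize;
}

/* Round N up to a multiple of BLKSIZE.  BLKSIZE must be nonzero.  */
static size_t
toy_align_up (size_t n, size_t blksize)
{
  return toy_align_down (n + blksize - 1, blksize);
}

struct toy_range
{
  size_t offset;
  size_t length;
};

/* Widen the byte range [OFFSET, OFFSET + LENGTH) so that both its start
   and its length are multiples of BLKSIZE.  */
static struct toy_range
toy_align_request (size_t offset, size_t length, size_t blksize)
{
  struct toy_range r;
  r.offset = toy_align_down (offset, blksize);
  r.length = toy_align_up (offset + length, blksize) - r.offset;
  return r;
}
```

For example, with a block size of 4096, a request for 100 bytes at offset 5000 widens to one whole block starting at offset 4096.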
In addition, objects which are seekable should set st_size to the current file size as in the description of the O_APPEND flag (see section Open Modes).
The st_uid and st_gid fields are unrelated to the "owner" as described above for icky async I/O.
Users may find out the version of the server they are talking to by calling io_server_version; this should return strings and integers describing the version number of the server, as well as its name.
4.3.6 Mapped Data
Servers may optionally implement the io_map call. The ports returned by io_map must implement the external pager kernel interface (see section Pager Library) and be suitable as arguments to vm_map.
Seekable objects must allow access from zero up to (but not including) the current file size as described for O_APPEND (see section Open Modes). Whether they provide access beyond such a point is server-dependent; in addition, the meaning of accessing a non-seekable object is server-dependent.
This document was generated by Thomas Schwinge on November, 8 2007 using texi2html 1.76.