Stored filesystems allow users to save and load persistent data from any random-access storage media, such as hard disks, floppy diskettes, and CD-ROMs. Stored filesystems are also required for bootstrapping standalone workstations.
8.1 Repairing Filesystems: Recovering from minor filesystem crashes.
8.2 Linux Extended 2 FS: The popular Linux filesystem format.
8.3 BSD Unix FS: The BSD Unix 4.x Fast File System.
8.4 ISO-9660 CD-ROM FS: Standard CD-ROM format.
8.5 Diskfs Library: Implementing new filesystem servers.
FIXME: finish
FIXME: finish
FIXME: finish
FIXME: finish
The diskfs library is declared in <hurd/diskfs.h>, and does a lot of the work of implementing stored filesystems. libdiskfs requires the threads, ports, iohelp, fshelp, and store libraries. You should understand all these libraries before you attempt to use diskfs, and you should also be familiar with the pager library (see section Pager Library).
For historical reasons, the library for implementing stored filesystems is called libdiskfs instead of libstorefs. Keep in mind, however, that diskfs is useful for filesystems which are implemented on any block-addressed storage device, since it uses the store library to do I/O.
Note that stored filesystems can be tricky to implement, since the diskfs callback interfaces are not trivial. It really is best if you examine the source code of a similar existing filesystem server, and follow its example rather than trying to write your own from scratch.
8.5.1 Diskfs Startup: Initializing stored filesystems.
8.5.2 Diskfs Arguments: Parsing command-line arguments.
8.5.3 Diskfs Globals: Global behaviour modification.
8.5.4 Diskfs Node Management: Allocation, reference counting, I/O, caching, and other disk node routines.
8.5.5 Diskfs Callbacks: Mandatory user-defined diskfs functions.
8.5.6 Diskfs Options: Optional user-defined diskfs functions.
8.5.7 Diskfs Internals: Reimplementing small pieces of diskfs.
This subsection gives an outline of the general steps involved in implementing a filesystem server, to help refresh your memory and to offer explanations rather than to serve as a tutorial.
The first thing a filesystem server should do is parse its command-line arguments (see section Diskfs Arguments). Then, the standard output and error streams should be redirected to the console, so that error messages are not lost if this is the bootstrap filesystem:
Redirect error messages to the console, so that they can be seen by users.
The following is a list of the relevant functions which would be called during the rest of the server initialization. Again, you should refer to the implementation of an already-working filesystem if you have any questions about how these functions should be used:
Call this function after arguments have been parsed to initialize the library. You must call this before calling any other diskfs functions, and after parsing diskfs options.
Call this after all format-specific initialization is done (except for setting diskfs_root_node); at this point the pagers should be ready to go.
Call this once the filesystem is fully initialized, to advertise the new filesystem control port to our parent filesystem. If bootstrap is set, diskfs will call fsys_startup on that port as appropriate and return the realnode from that call; otherwise we call diskfs_start_bootstrap and return MACH_PORT_NULL.
flags specifies how to open realnode (from the O_* set).
You should not need to call the following function directly, since diskfs_startup_diskfs will do it for you, when appropriate:
Start the Hurd bootstrap sequence as if we were the bootstrap filesystem (that is, diskfs_boot_flags is nonzero). All filesystem initialization must be complete before you call this function.
The following functions implement standard diskfs command-line and runtime argument parsing, using argp (see section `Argp' in The GNU C Library Reference Manual):
Parse and execute the runtime options specified by argz and argz_len. EINVAL is returned if some option is unrecognized. The default definition of this routine will parse them using diskfs_runtime_argp.
Append to the malloced string *argz of length *argz_len a NUL-separated list of the arguments to this translator. The default definition of this routine simply calls diskfs_append_std_options.
Appends NUL-separated options describing the standard diskfs option state to argz and increments argz_len appropriately. Note that unlike diskfs_get_options, argz and argz_len must already have sane values.
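The NUL-separated vector that these routines extend can be modelled in a few lines of plain C. The sketch below is only an illustration of the data layout: append_option and count_options are hypothetical helpers, not libdiskfs functions, and the option strings used with them are examples. It shows why *argz and *argz_len must already describe a valid (possibly empty) vector before anything is appended.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libdiskfs: append one option string,
   including its terminating NUL, to the malloced vector *argz.
   *argz and *argz_len must already be sane (NULL and 0 for an empty
   vector).  */
static int
append_option (char **argz, size_t *argz_len, const char *opt)
{
  size_t opt_len = strlen (opt) + 1;    /* keep the NUL separator */
  char *grown = realloc (*argz, *argz_len + opt_len);

  if (grown == NULL)
    return ENOMEM;                      /* *argz is left untouched */

  memcpy (grown + *argz_len, opt, opt_len);
  *argz = grown;
  *argz_len += opt_len;
  return 0;
}

/* Count the NUL-terminated entries in an argz vector.  */
static size_t
count_options (const char *argz, size_t argz_len)
{
  size_t count = 0;

  for (size_t i = 0; i < argz_len; i++)
    if (argz[i] == '\0')
      count++;
  return count;
}
```

Starting from an empty vector (a null pointer and a length of zero), each call adds the length of the string plus one byte for its separator, so the second entry begins immediately after the first entry's NUL.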
If this is defined or set to an argp structure, it will be used by the default diskfs_set_options to handle runtime option parsing. The default definition is initialized to a pointer to diskfs_std_runtime_argp.
An argp for the standard diskfs runtime options. The default definition of diskfs_runtime_argp points to this, although the user can redefine that to chain this onto his own argp.
An argp structure for the standard diskfs command line arguments. The user may call argp_parse on this to parse the command line, chain it onto the end of his own argp structure, or ignore it completely.
An argp structure for the standard diskfs command line arguments plus a store specification. The address of a location in which to return the resulting struct store_parsed structure should be passed as the input argument to argp_parse; FIXME xref the declaration for STORE_ARGP.
The following functions and variables control the overall behaviour of the library. Your callback functions may need to refer to these, but you should not need to modify or redefine them.
These are the respective send rights to the default pager, execserver control port, execserver itself, and authserver.
The io_identity identity port for the filesystem.
The command line with which diskfs was started, set by the default argument parser. If you don't use the default parser, set this yourself. This is only used for bootstrap filesystems, where it is given to the procserver.
When this is a bootstrap filesystem, the command line options passed from the kernel. If not a bootstrap filesystem, it is zero, so it can be used to distinguish between the two cases.
Hold this lock while doing filesystem-level operations. Innocuous users can just hold a reader lock, but operations that might corrupt other threads should hold a writer lock.
The current system time, as used by the diskfs routines. This is converted into a struct timeval by the maptime_read C library function (FIXME xref).
True if and only if we should do every operation synchronously. It is the format-specific code's responsibility to keep allocation information permanently in sync if this is set; the rest will be done by format-independent code.
Establish a thread to sync the filesystem every interval seconds, or never, if interval is zero. If an error occurs creating the thread, it is returned, otherwise zero. Subsequent calls will create a new thread and (eventually) get rid of the old one; the old thread won't do any more syncs, regardless.
Pager reference count lock.
Set to zero if the filesystem is currently writable.
Change an active filesystem between read-only and writable modes, setting the global variable diskfs_readonly to reflect the current mode. If an error is returned, nothing will have changed. diskfs_fsys_lock should be held while calling this routine.
Check if the filesystem is readonly before an operation that writes it. Return nonzero if readonly, otherwise zero.
Reread all in-core data structures from disk. This function can only be successful if diskfs_readonly is true. diskfs_fsys_lock should be held while calling this routine.
Shut down the filesystem; flags are as for fsys_shutdown.
Every file or directory is a diskfs node. The following functions help your diskfs callbacks manage nodes and their references:
Node np now has no more references; clean all state. The diskfs_node_refcnt_lock must be held, and will be released upon return. np must be locked.
Set disk fields from np->dn_stat; update ctime, atime, and mtime if necessary. If wait is true, then return only after the physical media has been completely updated.
Add a hard reference to node np. If there were no hard references previously, then the node cannot be locked (because you must hold a hard reference to hold the lock).
Unlock node np and release a hard reference; if this is the last hard reference and there are no links to the file then request light references to be dropped.
Release a hard reference on np. If np is locked by anyone, then this cannot be the last hard reference (because you must hold a hard reference in order to hold the lock). If this is the last hard reference and there are no links, then request light references to be dropped.
Add a light reference to a node.
Unlock node np and release a light reference.
Release a light reference on np. If np is locked by anyone, then this cannot be the last reference (because you must hold a hard reference in order to hold the lock).
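The interplay between hard and light references described above can be sketched with a small standalone model. Everything here is hypothetical: struct model_node and the model_* functions are illustrative stand-ins, locking is omitted entirely, and the real library's bookkeeping may differ in detail. The model only encodes the rules stated in the text: losing the last hard reference on an unlinked file requests that light references be dropped, and losing the last reference of any kind cleans the node.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the reference-related state of a node.  */
struct model_node
{
  int hard_refs;
  int light_refs;
  int link_count;        /* number of links to the file */
  bool drop_requested;   /* light references were asked to go away */
  bool cleaned;          /* no references remain; state was cleaned */
};

static void
model_nref (struct model_node *np)
{
  np->hard_refs++;
}

static void
model_nref_light (struct model_node *np)
{
  np->light_refs++;
}

/* Release a hard reference.  */
static void
model_nrele (struct model_node *np)
{
  assert (np->hard_refs > 0);
  np->hard_refs--;

  if (np->hard_refs == 0 && np->light_refs == 0)
    np->cleaned = true;
  else if (np->hard_refs == 0 && np->link_count == 0)
    /* Last hard reference and no links: ask holders of light
       references to let go.  */
    np->drop_requested = true;
}

/* Release a light reference.  */
static void
model_nrele_light (struct model_node *np)
{
  assert (np->light_refs > 0);
  np->light_refs--;

  if (np->hard_refs == 0 && np->light_refs == 0)
    np->cleaned = true;
}
```

In this model a node with links remaining is left alone when its last hard reference goes away while light references exist; only an unlinked node triggers the drop request.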
This is called by other filesystem routines to read or write files, and extends them automatically, if necessary. np is the node to be read or written, and must be locked. data will be written or filled. off identifies where in the file the I/O is to take place (negative values are not allowed). amt is the size of data and tells how much to copy. dir is zero for reading or nonzero for writing. cred is the user doing the access (only used to validate attempted file extension). For reads, *amtread is filled with the amount actually read.
Send notifications to users who have requested them for directory dp with dir_notice_changes. The type of modification and affected name are type and name respectively. This should be called by diskfs_direnter, diskfs_dirremove, diskfs_dirrewrite, and anything else that changes the directory, after the change is fully completed.
Create a new node structure with ds as its physical disknode. The new node will have one hard reference and no light references.
These next node manipulation functions are not generally useful, but may come in handy if you need to redefine any diskfs functions.
Create a new node. Give it mode: if mode includes IFDIR, also initialize `.' and `..' in the new directory. Return the node in npp. cred identifies the user responsible for the call. If name is nonzero, then link the new node into dir with name name; ds is the result of a prior diskfs_lookup for creation (and dir has been held locked since). dir must always be provided as at least a hint for disk allocation strategies.
If disk is not readonly and the noatime option is not enabled, set np->dn_set_atime.
If np->dn_set_ctime is set, then modify np->dn_stat.st_ctime appropriately; do the analogous operations for atime and mtime as well.
Scan the cache looking for name inside dir. If we don't know any entries at all, then return zero. If the entry is confirmed to not exist, then return -1. Otherwise, return np for the entry, with a newly-allocated reference.
Return the node corresponding to cache_id in *npp.
Node np has just been found in dir with name. If np is null, that means that this name has been confirmed as absent in the directory.
Purge all references in the cache to np as a node inside directory dp.
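The three possible results of a cache lookup (unknown, confirmed absent, or a node) are easy to confuse, so here is a minimal standalone model of them. All names are hypothetical, the cache is a tiny fixed-size table, and reference counting is left out; it is a sketch of the calling convention described above, not of the library's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS 8

struct model_node { int id; };

/* Sentinel for "confirmed absent", mirroring the -1 result described
   in the text.  */
#define CONFIRMED_ABSENT ((struct model_node *) -1)

struct cache_entry
{
  int in_use;
  const struct model_node *dir;
  const char *name;
  struct model_node *np;    /* NULL records a confirmed absence */
};

static struct cache_entry cache[CACHE_SLOTS];

/* Record that NAME in DIR refers to NP, or is absent if NP is NULL.  */
static void
model_enter (const struct model_node *dir, const char *name,
             struct model_node *np)
{
  for (int i = 0; i < CACHE_SLOTS; i++)
    if (!cache[i].in_use)
      {
        cache[i] = (struct cache_entry) { 1, dir, name, np };
        return;
      }
  /* Table full: forget the entry; a cache may always know less.  */
}

/* Return NULL if nothing is known, CONFIRMED_ABSENT if the name is
   known not to exist, or the cached node otherwise.  */
static struct model_node *
model_lookup (const struct model_node *dir, const char *name)
{
  for (int i = 0; i < CACHE_SLOTS; i++)
    if (cache[i].in_use && cache[i].dir == dir
        && strcmp (cache[i].name, name) == 0)
      return cache[i].np ? cache[i].np : CONFIRMED_ABSENT;
  return NULL;
}

/* Forget every entry that maps a name in DP to NP.  */
static void
model_purge (const struct model_node *dp, const struct model_node *np)
{
  for (int i = 0; i < CACHE_SLOTS; i++)
    if (cache[i].in_use && cache[i].dir == dp && cache[i].np == np)
      cache[i].in_use = 0;
}
```

A caller has to test for the sentinel before dereferencing the result; a zero result only means that the directory itself must be consulted.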
Like several other Hurd libraries, libdiskfs depends on you to implement application-specific callback functions. You must define the following functions and variables, but you should also look at Diskfs Options, as there are several defaults which should be modified to provide good filesystem support:
You must define this type, which will hold information between a call to diskfs_lookup and a call to one of diskfs_direnter, diskfs_dirremove, or diskfs_dirrewrite. It must contain enough information so that those calls work as described below.
This must be the size in bytes of a struct dirstat.
This is the maximum number of links to any one file, which must be a positive integer. The implementation of dir_rename does not know how to succeed if only one link is allowed; on such formats you need to reimplement dir_rename yourself.
This variable is a positive integer which is the maximum number of symbolic links which can be traversed within a single call to dir_lookup. If this is exceeded, dir_lookup will return ELOOP.
Set this to be the node of the root of the filesystem.
Set this to the name of the filesystem server.
Set this to be the server version string.
This should be a string that somehow identifies the particular disk this filesystem is interpreting. It is generally only used to print messages or to distinguish instances of the same filesystem type from one another. If this filesystem accesses no external media, then define this to be zero.
Set *statfsbuf with appropriate values to reflect the current state of the filesystem.
You should not define diskfs_lookup, because it is simply a wrapper for diskfs_lookup_hard, and is already defined in libdiskfs.
Lookup in directory dp (which is locked) the name name. type will either be LOOKUP, CREATE, RENAME, or REMOVE. cred identifies the user making the call.
If the name is found, return zero, and (if np is nonzero) set *np to point to the node for it, which should be locked.
If the name is not found, return ENOENT, and (if np is nonzero) set *np to zero. If np is zero, then the node found must not be locked, not even transitorily. Lookups for REMOVE and RENAME (which must often check permissions on the node being found) will always set np.
If ds is nonzero then the behaviour varies depending on the requested lookup type:
LOOKUP: Set *ds to be ignored by diskfs_drop_dirstat.
CREATE: On success, set *ds to be ignored by diskfs_drop_dirstat. On failure, set *ds for a future call to diskfs_direnter.
RENAME: On success, set *ds for a future call to diskfs_dirrewrite. On failure, set *ds for a future call to diskfs_direnter.
REMOVE: On success, set *ds for a future call to diskfs_dirremove. On failure, set *ds to be ignored by diskfs_drop_dirstat.
The caller of this function guarantees that if ds is nonzero, then either the appropriate call listed above or diskfs_drop_dirstat will be called with ds before the directory dp is unlocked, and guarantees that no lookup calls will be made on this directory between this lookup and the use (or destruction) of *ds.
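The way the lookup type and the lookup result together decide what *ds must be prepared for can be written down as a small decision function. This is a standalone sketch with hypothetical enum and function names; it encodes only the cases listed above and does not touch any real dirstat state.

```c
#include <assert.h>

/* Hypothetical mirror of the four lookup types.  */
enum model_lookup_type
{
  MODEL_LOOKUP,
  MODEL_CREATE,
  MODEL_RENAME,
  MODEL_REMOVE
};

/* Which directory-modifying call *ds must be prepared for.
   FOLLOWUP_NONE means *ds only has to be safe to hand to
   diskfs_drop_dirstat.  */
enum model_followup
{
  FOLLOWUP_NONE,
  FOLLOWUP_DIRENTER,
  FOLLOWUP_DIRREWRITE,
  FOLLOWUP_DIRREMOVE
};

/* FOUND is nonzero if the lookup succeeded.  */
static enum model_followup
model_followup_for (enum model_lookup_type type, int found)
{
  switch (type)
    {
    case MODEL_CREATE:
      return found ? FOLLOWUP_NONE : FOLLOWUP_DIRENTER;
    case MODEL_RENAME:
      return found ? FOLLOWUP_DIRREWRITE : FOLLOWUP_DIRENTER;
    case MODEL_REMOVE:
      return found ? FOLLOWUP_DIRREMOVE : FOLLOWUP_NONE;
    case MODEL_LOOKUP:
    default:
      return FOLLOWUP_NONE;
    }
}
```

Note that a failed CREATE and a failed RENAME lead to the same follow-up, which matches the later statement that diskfs_direnter is only called after an unsuccessful lookup of one of those two types.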
If you use the library's versions of diskfs_rename_dir, diskfs_clear_directory, and diskfs_init_dir, then lookups for `..' might have the flag SPEC_DOTDOT ORed in. This has a special meaning depending on the requested lookup type:
LOOKUP: dp should be unlocked and its reference dropped before returning.
CREATE: Ignore this case, because SPEC_DOTDOT is guaranteed not to be given.
RENAME, REMOVE: In both of these cases, the node being found (*np) is already held locked, so don't lock it or add a reference to it.
Return ENOENT if name isn't in the directory. Return EAGAIN if name refers to the `..' of this filesystem's root. Return EIO if appropriate.
You should not define diskfs_direnter, because it is simply a wrapper for diskfs_direnter_hard, and is already defined in libdiskfs.
Add np to directory dp under the name name. This will only be called after an unsuccessful call to diskfs_lookup of type CREATE or RENAME; dp has been locked continuously since that call and ds is as that call set it, np is locked. cred identifies the user responsible for the call (to be used only to validate directory growth).
You should not define diskfs_dirrewrite, because it is simply a wrapper for diskfs_dirrewrite_hard, and is already defined in libdiskfs.
This will only be called after a successful call to diskfs_lookup of type RENAME; this call should change the name found in directory dp to point to node np instead of its previous referent. dp has been locked continuously since the call to diskfs_lookup and ds is as that call set it; np is locked.
diskfs_dirrewrite has some additional specifications: name is the name within dp which used to correspond to the previous referent, oldnp; it is this reference which is being rewritten. diskfs_dirrewrite also calls diskfs_notice_dirchange if dp->dirmod_reqs is nonzero.
You should not define diskfs_dirremove, because it is simply a wrapper for diskfs_dirremove_hard, and is already defined in libdiskfs.
This will only be called after a successful call to diskfs_lookup of type REMOVE; this call should remove the name found from the directory dp. dp has been locked continuously since the call to diskfs_lookup and ds is as that call set it.
diskfs_dirremove has some additional specifications: this routine should call diskfs_notice_dirchange if dp->dirmod_reqs is nonzero. The entry being removed has name name and refers to np.
ds has been set by a previous call to diskfs_lookup on directory dp; this function is guaranteed to be called if diskfs_direnter, diskfs_dirrewrite, and diskfs_dirremove have not been called, and should free any state retained by a struct dirstat. dp has been locked continuously since the call to diskfs_lookup.
Initialize ds such that diskfs_drop_dirstat will ignore it.
Return n directory entries starting at entry from locked directory node dp. Fill *data with the entries; it currently points to *datacnt bytes. If it isn't big enough, vm_allocate into *data. Set *datacnt to the total size used. Fill amt with the number of entries copied. Regardless, never copy more than bufsiz bytes. If bufsiz is zero, then there is no limit on *datacnt; if n is -1, then there is no limit on amt.
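The two independent limits (n on the number of entries, bufsiz on the number of bytes) can be illustrated with a standalone planning function. This is a sketch under simplifying assumptions: model_plan_directs is a hypothetical name, each entry's encoded size is supplied in an array, and no memory is actually allocated or filled.

```c
#include <assert.h>
#include <stddef.h>

/* Decide how many of the AVAILABLE entries, whose encoded sizes are
   given in ENTRY_SIZES, would be copied.  N limits the entry count
   unless it is -1; BUFSIZ limits the byte count unless it is zero.
   The number of bytes used is stored in *DATACNT and the number of
   entries copied is returned.  */
static int
model_plan_directs (const size_t *entry_sizes, int available,
                    int n, size_t bufsiz, size_t *datacnt)
{
  int amt = 0;
  size_t used = 0;

  while (amt < available && (n == -1 || amt < n))
    {
      size_t next = entry_sizes[amt];

      if (bufsiz != 0 && used + next > bufsiz)
        break;                  /* never exceed bufsiz bytes */
      used += next;
      amt++;
    }

  *datacnt = used;
  return amt;
}
```

With entry sizes of 16, 24, 16, and 32 bytes, a limit of two entries and a limit of 50 bytes both stop after the second entry, but for different reasons: the first runs out of entries to count, the second cannot fit the third entry.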
Return nonzero if locked directory dp is empty. If the user has not redefined diskfs_clear_directory and diskfs_init_dir, then `empty' means `only possesses entries labelled `.' and `..''. cred identifies the user making the call; if this user cannot search the directory, then this routine should fail.
For locked node np (for which diskfs_node_translated is true) look up the name of its translator. Store the name into newly malloced storage and set *namelen to the total length.
For locked node np, set the name of the translating program to be name, which is namelen bytes long. cred identifies the user responsible for the call.
Truncate locked node np to be size bytes long. If np is already less than or equal to size bytes long, do nothing. If this is a symlink (and diskfs_shortcut_symlink is set) then this should clear the symlink, even if diskfs_create_symlink_hook stores the link target elsewhere.
Grow the disk allocated to locked node np to be at least size bytes, and set np->allocsize to the actual allocated size. If the allocated size is already size bytes, do nothing. cred identifies the user responsible for the call.
This function must reread all data specific to node from disk, without writing anything. It is always called with diskfs_readonly set to true.
This function must invalidate all cached global state, and reread it as necessary from disk, without writing anything. It is always called with diskfs_readonly set to true. diskfs_node_reload is subsequently called on all active nodes, so this call doesn't need to reread any node-specific data.
For each active node np, call fun. The node is to be locked around the call to fun. If fun returns nonzero for any node, then stop immediately, and return that value.
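The iteration contract (lock around each call, stop at the first nonzero result) can be shown with a standalone model. The names below are hypothetical, the active nodes are simply an array, and the lock is a flag; how a real filesystem server finds its active nodes is format-specific and not shown.

```c
#include <assert.h>
#include <stddef.h>

struct model_node
{
  int id;
  int locked;
};

/* Call FUN on each of the COUNT nodes in order, holding the node's
   (modelled) lock around the call.  Stop at the first nonzero result
   and return it; return zero if every call returned zero.  */
static int
model_node_iterate (struct model_node *nodes, size_t count,
                    int (*fun) (struct model_node *np))
{
  for (size_t i = 0; i < count; i++)
    {
      int err;

      nodes[i].locked = 1;
      err = fun (&nodes[i]);
      nodes[i].locked = 0;

      if (err)
        return err;
    }
  return 0;
}

/* Example callback: counts visits and fails on the node with id 2.  */
static int visited;

static int
stop_at_two (struct model_node *np)
{
  assert (np->locked);
  visited++;
  return np->id == 2 ? 5 : 0;
}
```

Because iteration stops immediately, a callback that fails on the second of three nodes is never invoked on the third.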
Allocate a new node to be of mode mode in locked directory dp, but don't actually set the mode or modify the directory, since that will be done by the caller. The user responsible for the request can be identified with cred. Set *np to be the newly allocated node.
Free node np; the on-disk copy has already been synchronized with diskfs_node_update (where np->dn_stat.st_mode was zero). np's mode used to be mode.
Locked node np has some light references but has just lost its last hard reference.
Locked node np has just acquired a hard reference where it had none previously. Therefore, it is okay again to have light references without real users.
Node np has some light references, but has just lost its last hard reference. Take steps so that if any light references can be freed, they are. Both diskfs_node_refcnt_lock and np are locked. This function will be called after diskfs_lost_hardrefs.
Node np has no more references; free local state, including *np if it shouldn't be retained. diskfs_node_refcnt_lock is held.
Write any non-paged metadata from format-specific buffers to disk, asynchronously unless wait is nonzero. If clean is nonzero, then after this is written the filesystem will be absolutely clean, and it must be possible for the non-paged metadata to indicate that fact.
Write the information in np->dn_stat and any associated format-specific information to the disk. If wait is true, then return only after the physical media has been completely updated.
Write the contents and all associated metadata of file np to disk. Generally, this will involve calling diskfs_node_update for much of the metadata. If wait is true, then return only after the physical media has been completely updated.
Return a memory object port (send right) for the file contents of np. prot is the maximum allowable access. On errors, return MACH_PORT_NULL and set errno.
Return a struct pager * that refers to the pager returned by diskfs_get_filemap for locked node np, suitable for use as an argument to pager_memcpy.
Return the bitwise OR of the maximum prot parameter (the second argument to diskfs_get_filemap) for all active user pagers.
Return nonzero if there are pager ports exported that might be in use by users. Further pager creation should be blocked before this function returns zero.
Sync all the pagers and write any data belonging on disk except for the hypermetadata. If wait is true, then return only after the physical media has been completely updated.
Shut down all pagers. This is irreversible, and is done when the filesystem is exiting.
The functions and variables described in this subsection already have default definitions in libdiskfs, so you are not forced to define them; rather, they may be redefined on a case-by-case basis.
You should set the values of any option variables as soon as your program starts (before you make any calls to diskfs, such as argument parsing).
You should set this variable to nonzero if the filesystem media can never be made writable.
Set this to be any additional version specification that should be printed for --version.
This should be nonzero if and only if the filesystem format supports shortcutting symbolic link translation. The library guarantees that users will not be able to read or write the contents of the node directly, and the library will only do so if the symlink hook functions (diskfs_create_symlink_hook and diskfs_read_symlink_hook) return EINVAL or are not defined. The library knows that the dn_stat.st_size field is the length of the symlink, even if the hook functions are used.
These variables should be nonzero if and only if the filesystem format supports shortcutting character device node, block device node, FIFO, or Unix-domain socket translation, respectively.
diskfs_set_sync_interval is called with this value when the first diskfs thread is started up (in diskfs_spawn_first_thread). This variable has a default value of 30, which causes disk buffers to be flushed at least every 30 seconds.
Return zero if the node np can be changed as requested. That is, if np's mode can be changed to mode, owner to uid, group to gid, author to author, flags to flags, or raw device number to rdev, respectively. Otherwise, return an error code.
It must always be possible to clear the mode or the flags; diskfs will not ask for permission before doing so.
This is called when the disk has been changed from read-only to read-write mode or vice-versa. readonly is the new state (which is also reflected in diskfs_readonly). This function is also called during initial startup if the filesystem is to be writable.
If this function pointer is nonzero (and diskfs_shortcut_symlink is set) it is called to set a symlink. If it returns EINVAL or isn't set, then the normal method (writing the contents into the file data) is used. If it returns any other error, it is returned to the user.
If this function pointer is nonzero (and diskfs_shortcut_symlink is set) it is called to read the contents of a symlink. If it returns EINVAL or isn't set, then the normal method (reading from the file data) is used. If it returns any other error, it is returned to the user.
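The fallback rule shared by both hooks (EINVAL means "not handled here, use the normal method"; any other result is final) can be modelled in standalone C. The sketch assumes that symlink shortcutting is enabled, and every name in it is hypothetical: the hook type, the example hooks, and the stand-in for reading the link target from file data are all illustrative rather than the library's own signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical hook type: fill TARGET (SIZE bytes) with the link
   target and return zero, or return an error code.  */
typedef int (*model_read_hook) (char *target, size_t size);

/* Stand-in for the normal method of reading from the file data.  */
static int
model_read_from_file_data (char *target, size_t size)
{
  snprintf (target, size, "%s", "from-file-data");
  return 0;
}

/* Read a symlink, consulting HOOK first if one is installed.  */
static int
model_read_symlink (model_read_hook hook, char *target, size_t size)
{
  if (hook != NULL)
    {
      int err = hook (target, size);

      if (err != EINVAL)
        return err;     /* success, or an error passed to the user */
    }
  /* No hook, or the hook declined with EINVAL.  */
  return model_read_from_file_data (target, size);
}

/* Example hooks.  */
static int
hook_inline (char *target, size_t size)
{
  snprintf (target, size, "%s", "from-hook");
  return 0;
}

static int
hook_declines (char *target, size_t size)
{
  (void) target;
  (void) size;
  return EINVAL;
}

static int
hook_fails (char *target, size_t size)
{
  (void) target;
  (void) size;
  return EIO;
}
```

A format that stores short link targets inline and long ones in file data could use this pattern by having its hook return EINVAL for the long case, leaving that case to the normal method.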
Rename directory node fnp (whose parent is fdp, and which has name fromname in that directory) to have name toname inside directory tdp. None of these nodes are locked, and none should be locked upon return. This routine is serialized, so it doesn't have to be reentrant. Directories will never be renamed except by this routine. fromcred is the user responsible for fdp and fnp. tocred is the user responsible for tdp. This routine assumes the usual convention where `.' and `..' are represented by ordinary links; if that is not true for your format, you have to redefine this function.
Clear the `.' and `..' entries from directory dp. Its parent is pdp, and the user responsible for this is identified by cred. Both directories must be locked. This routine assumes the usual convention where `.' and `..' are represented by ordinary links; if that is not true for your format, you have to redefine this function.
Locked node dp is a new directory; add whatever links are necessary to give it structure; its parent is the (locked) node pdp. This routine may not call diskfs_lookup on pdp. The new directory must be clear within the meaning of diskfs_dirempty. This routine assumes the usual convention where `.' and `..' are represented by ordinary links; if that is not true for your format, you have to redefine this function. cred identifies the user making the call.
The library also exports the following functions, but they are not generally useful unless you are redefining other functions the library provides.
Create and return a protid for an existing peropen po in cred, referring to user user. The node po->np must be locked.
Build and return in cred a protid which has no user identification, for peropen po. The node po->np must be locked.
Finish building protid cred started with diskfs_start_protid; the user to install is user.
Called when a protid cred has no more references. Because references to protids are maintained by the port management library, this is installed in the clean routines list. The ports library will free the structure.
Create and return a new peropen structure on node np with open flags flags. The initial values for the root_parent, shadow_root, and shadow_root_parent fields are copied from context if it is nonzero, otherwise each of these values is set to zero.
Decrement the reference count on po.
This function is called by S_fsys_startup for execserver bootstrap. The execserver is able to function without a real node, hence this fraud. Arguments are as for fsys_startup in <hurd/fsys.defs>.
Demultiplex incoming libports messages on diskfs ports.
The diskfs library also provides functions to demultiplex the fs, io, fsys, interrupt, and notify interfaces. All the server routines have the prefix diskfs_S_. For those routines, `in' arguments of type file_t or io_t appear as struct protid * to the stub.
This document was generated by Thomas Schwinge on November, 8 2007 using texi2html 1.76.