We offer a wide range of possible projects to choose from. If you have an idea not listed here, we'd love to hear about it!
In either case, we encourage you to contact us (on IRC and/or our developer mailing lists), so we can discuss your idea, or help you pick a suitable task -- we will gladly explain the tasks in more detail, if the descriptions are not clear enough.
In fact, we suggest you discuss your choice with us even if you have no trouble finding a task that suits you: as explained in the introduction section of the student application form, we ask all students to get into regular communication with us for the application to be considered complete. Talking about your project choice is a good start.
(We strongly suggest that you take a look at the student application form right now -- the sooner you know what we expect, the better you can cater to it.)
Many of the project descriptions suggest some "exercise". The reason is that for the application to be complete, we require you to make a change to the Hurd code, and send us the resulting patch. (This is also explained in the student application form.) If possible, the change should make some improvement to the code you will be working on during the summer, or to some related code.
The "exercise" bit in the project description is trying to give you some ideas of what kind of change this could be. In most cases it is quite obvious, though: Try to find something to improve in the relevant code, by looking at known issues in the Savannah bug tracker; by running the code and testing stuff; and by looking through the code. If you don't find anything, try with some related code -- if your task involves translator programming, make some improvement to an existing translator; if it involves glibc hacking, make an improvement to glibc; if it involves driver hacking, make an improvement to the driver framework; and so on... Makes sense, doesn't it?
Sometimes it's hard to come up with a useful improvement to the code in question, that isn't too complicated for the purposes of the application. In this case, we need to find a good alternative. You could for example make an improvement to some Hurd code that is not directly related to your project: this way you won't get familiar with working on the code you will actually need for the task, but at least you can show that you are able to work with the Hurd code in general.
Another possible alternative would be making some change to the code in question, that isn't really a useful improvement, while still making sense in some way -- this could suffice to prove that you are able to work with the code.
Don't despair if you can't come up with anything suitable by yourself. Contact us, and we will think of something together.
In either case, we strongly suggest that you talk to us about the change you want to make up front, to be sure that it is something that will get our approval -- especially if the idea is not directly taken from the project description.
Also, don't let this whole patch stuff discourage you from applying! As explained in the student application form, it's not a problem if you do not yet have all the necessary knowledge to do this alone -- we don't expect that. After all, the purpose of GSoC is to introduce you to free software development. We only want to see that you are able to obtain the necessary knowledge before the end of the application process, with our help -- contact us, and we will assist you as well as we can.
Here is a list of project ideas, followed by all project ideas inlined.
Virtualization Using Hurd Mechanisms
Hurdish Package Manager for the GNU System, GNU Guix
Bindings to Other Programming Languages
Allow Using unionfs Early at Boot
Use Internet Protocol Translators (ftpfs etc.) as Backends for Other Programs
Fixing Programs Using PATH_MAX et al Unconditionally
Stub Implementations of Hardware Specific Libraries
Improving Perl or Python Support
Fix Compatibility Problems Exposed by Testsuites, Implement Missing Interfaces in glibc for GNU Hurd
Additional ideas have been posted in id:"[email protected]", but have not yet been integrated here and elsewhere. Keywords: bootstrap-vz, buildbot, ceph, clang, cloud, continuous integration, debian, eudyptula challenge, gcc front end, gdb, grub, guile, learning system, llvm, lttng, rump kernels, samba, sbcl, smbfs, steel bank common lisp, subhurd, systemtap, teaching system, tracing, virtio, x.org, xen, xorg.
Like any other ideas you might have, these are equally suitable as projects.
All project ideas inlined:
The main idea behind the Hurd design is to allow users to replace almost any system functionality (extensible system). Any user can easily create a subenvironment using some custom servers instead of the default system servers. This can be seen as an advanced lightweight virtualization mechanism, which allows implementing all kinds of standard and nonstandard virtualization scenarios.
However, though the basic mechanisms are there, currently it's not easy to make use of these possibilities, because we lack tools to automatically launch the desired constellations.
The goal is to create a set of powerful tools for managing at least one desirable virtualization scenario. One possible starting point could be the subhurd/neighborhurd mechanism, which allows running a second almost totally independent instance of the Hurd in parallel to the main one.
While subhurds allow creating a complete second system instance, with its own set of Hurd servers and UNIX daemons and all, there are also situations where it is desirable to have a smaller subenvironment, living within the main system and using most of its facilities -- similar to a chroot environment. A simple way to create such a subenvironment with a single command would be very helpful.
It might be possible to implement (perhaps as a prototype) a wrapper using existing tools (chroot and unionfs); or it might require more specific tools, like some kind of unionfs-like filesystem proxy that mirrors other parts of the filesystem, but allows overriding individual locations, in conjunction with either chroot or some similar mechanism to create a subenvironment with a different root filesystem.
It's also desirable to have a mechanism allowing a user to set up such a custom environment in a way that it will automatically get launched on login -- practically allowing the user to run a customized operating system in their own account.
Yet another interesting scenario would be a subenvironment -- using some kind of special filesystem proxy again -- in which the user serves as root, being able to create local sub-users and/or sub-groups.
This would allow the user to run "dangerous" applications (web browser, chat client etc.) in a confined fashion, allowing them access to only a subset of the user's files and other resources. (This could be done either using a lot of groups for individual resources, and lots of users for individual applications; adding a user to a group would give the corresponding application access to the corresponding resource -- an advanced ACL mechanism. Or leave out the groups, assigning the resources to users instead, and use the Hurd's ability for a process to have multiple user IDs, to equip individual applications with sets of user IDs giving them access to the necessary resources -- basically a capability mechanism.)
The student will have to pick (at least) one of the described scenarios -- or come up with some other one in a similar spirit -- and implement all the tools (scripts, translators) necessary to make it available to users in an easy-to-use fashion. While the Hurd by default already offers the necessary mechanisms for that, these are not perfect and could be further refined for even better virtualization capabilities. Should need or desire for specific improvements in that regard come up in the course of this project, implementing these improvements can be considered part of the task.
Completing this project will require gaining a very good understanding of the Hurd architecture and spirit. Previous experience with other virtualization solutions would be very helpful.
Possible mentors: Justus Winter (teythoon)
See also: https://fosdem.org/2017/schedule/event/microkernel_virtualization_on_hurd/
Exercise: Currently, when issuing 'reboot' in Subhurds, 'boot' exits. Make it reboot the Subhurd instead.
As the Hurd attempts to be (almost) fully UNIX-compatible, it also implements a chroot() system call. However, the current implementation is not really good, as it allows easily escaping the chroot, for example by use of passive translators.
Many solutions have been suggested for this problem -- ranging from a simple workaround changing the behavior of passive translators in a chroot; changing the context in which passive translators are executed; changing the interpretation of filenames in a chroot; to reworking the whole passive translator mechanism. Some involve a completely different approach to chroot implementation, using a proxy instead of a special system call in the filesystem servers.
See http://tri-ceps.blogspot.com/2007/07/theory-of-filesystem-relativity.html for some suggestions, as well as the followup discussions on http://lists.gnu.org/archive/html/gnu-system-discuss/2007-09/msg00118.html and http://lists.gnu.org/archive/html/bug-hurd/2008-03/msg00089.html.
The task is to pick and implement one approach for fixing chroot.
This task is pretty heavy: it requires a very good understanding of file name lookup and the translator mechanism, as well as of security concerns in general -- the student must prove that they really understand the security implications of the UNIX namespace approach, and how these are affected by the introduction of new mechanisms such as translators. More important than the actual code is the documentation of what was done: the student must be able to defend the choice of a certain approach, and explain why they believe this approach is really secure.
Possible mentors: Justus Winter (teythoon)
Exercise: It's hard to come up with a relevant exercise, as there are so many possible solutions... Probably best to make an improvement to one of the existing translators -- if possible, something touching name resolution or such, e.g. implementing file_reparent() in a translator that doesn't support it yet.
2016-02-14, Justus Winter
I have factored out the proxying-bits from fakeroot so that it can be shared. The most simple chrooting translator is the identity translator, which proxies RPCs without really modifying them. Combining the identity translator with settrans --chroot gives us chroot(8). With a little more work, I believe that can be used to implement chroot(2). Whether or not that is secure remains to be seen, maybe that is even an ill-conceived goal.
Most GNU/Linux systems use pretty sophisticated package managers, to ease the management of installed software. These keep track of all installed files, and various kinds of other necessary information, in special databases. On package installation, deinstallation, and upgrade, scripts are used that make all kinds of modifications to other parts of the system, making sure the packages get properly integrated.
This approach creates various problems. For one, all management has to be done with the distribution package management tools, or otherwise they would lose track of the system state. This is reinforced by the fact that the state information is stored in special databases that only the special package management tools can work with.
Also, as changes to various parts of the system are made on certain events (installation/deinstallation/update), managing the various possible state transitions becomes very complex and bug-prone.
For the official (Hurd-based) GNU system, a different approach is intended: making use of Hurd translators -- more specifically their ability to present existing data in a different form -- the whole system state will be created on the fly, directly from the information provided by the individual packages. The visible system state is always a reflection of the sum of packages installed at a certain moment; it doesn't matter how this state came about. There are no global databases of any kind. (Some things might require caching for better performance, but this must happen transparently.)
The core of this approach is formed by stowfs. GNU Guix, GNU's package manager, installs each package in its own directory. Each user has a profile, which is the union of some of these packages. On GNU/Linux, this union is implemented as a symlink tree; on GNU/Hurd, stowfs would offer a more elegant solution. Stowfs creates a traditional Unix directory structure from all the files in the individual package directories. This handles the lowest level of package management.
The goal of this task is to exploit Hurd features in GNU Guix.
See also: Porting Guix to GNU/Hurd.
Possible mentors: Justus Winter (teythoon), Ludovic Courtès
Exercise: Make some improvement to any of the existing Hurd translators. Especially those in hurdextras are often quite rudimentary, and it shouldn't be hard to find something to improve.
The Hurd presently uses hardware drivers implemented in the microkernel, GNU Mach. These drivers are old Linux drivers (mostly from 2.0.x) accessed through a glue code layer. This is not an ideal solution, but works quite OK, except that the drivers are extremely old by now. Thus we need a new framework, so we can use drivers from current Linux versions instead, or perhaps from one of the free BSD variants.
This is GNU Savannah task #5488. See also: user-space device drivers; device drivers and I/O systems.
The most promising approach for getting newer drivers seems to be the Rump Kernel: it already does the hard work of providing an environment where the foreign drivers can run, and offers the additional benefit of being externally maintained. Rump also offers the necessary facilities for running all drivers in separate userspace processes, which is more desirable than drivers running in the microkernel.
Robert Millan worked on a port of the Rump kernel, which allowed running a sound driver in userland. This work now needs to be extended.
Zheng Da has already done considerable work on a similar approach, using DDE. The basic framework for using DDE in the Hurd is present, and network card drivers are already working very well. However, this work isn't fully integrated in the Hurd yet. The additional kernel interfaces that were created for this are still prototypes, and will need to be reworked. This environment can be reused and polished for Rump.
Other types of drivers are missing so far. Support for IDE drivers has been partially implemented, but isn't fully working yet. To fully replace the old in-kernel drivers, further infrastructure will be necessary to make userspace disk drivers usable for the root filesystem.
The goal of this task is to fix at least one of the mentioned major shortcomings: rework the kernel interfaces; polish the rumpkernel changes; componentize the rumpkernel elements for sound; or implement support for some other subsystem.
This is a doable, but pretty involved project. Previous experience with driver programming probably is a must. To be able to work on the framework, the student will also have to get a good understanding of certain aspects of Hurd, such as memory management for example.
Possible mentors: Justus Winter (teythoon), Samuel Thibault (youpi)
Exercise: Install the current rumpkernel library (librump0) and the corresponding mplayer, and get it to run.
The main idea of the Hurd design is giving users the ability to easily modify/extend the system's functionality (extensible system). This is done by creating filesystem translators and other kinds of Hurd servers.
However, in practice this is not as easy as it should be, because creating translators and other servers is quite involved -- the interfaces for doing that are not exactly simple, and available only for C programs. Being able to easily create simple translators in RAD languages is highly desirable, to really be able to reap the advantages of the Hurd architecture.
Originally Lisp was meant to be the second system language besides C in the GNU system; but that doesn't mean we are bound to Lisp. Bindings for any popular high-level language that helps to quickly create simple programs are highly welcome.
Several approaches are possible when creating such bindings. One way is simply to provide wrappers to all the available C libraries (libtrivfs, libnetfs etc.). While this is easy (it requires relatively little consideration), it may not be the optimal solution. It is preferable to hook in at a lower level, thus being able to create interfaces that are specially adapted to make good use of the features available in the respective language.
These more specialized bindings could hook in at some of the lower level library interfaces (libports, glibc, etc.); use the MIG-provided RPC stubs directly; or even create native stubs directly from the interface definitions. The lisp bindings created by Flavio Cruz as his 2008 GSoC project mostly use the latter approach, and can serve as a good example. In his 2011 GSoC project, Jérémie Koenig designed and began implementing an object-oriented interface; see his Java status page for details.
The task is to create easy to use Hurd bindings for a language of the student's choice, and some example servers to prove that it works well in practice. This project will require gaining a very good understanding of the various Hurd interfaces. Skills in designing nice programming interfaces are a must.
Anatoly A. Kazantsev started working on Python bindings last year -- if Python is your language of choice, you probably should take his work and complete it.
There was also some previous work on Perl bindings, which might serve as a reference if you want to work on Perl.
Possible mentors: Anatoly A. Kazantsev (anatoly) for Python
Discussion
Java
IRC, freenode, #hurd, 2013-12-19
<antrik_> teythoon_: I don't think wrapping libtrivfs etc. for guile
bindings is really desirable... for the lisp bindings, we agreed that
it's better to hook in at a lower level, and build more lispish
abstractions
<antrik> trivfs is a C framework; it probably doesn't map very well to
other languages -- especially non-imperative ones...
<antrik> (it is arguable whether trivfs is really a good abstraction even
for C... but that's another discussion :-) )
<antrik> ArneBab: same for Python bindings. when I suggested ignoring
libtrivfs etc., working around the pthread problem was just a side effect
-- the real goal has always been nicer abstraction
<anatoly> antrik: agree with you
<anatoly> antrik: about nicer abstractions
<teythoon_> antrik: I agree too, but wrapping libtrivfs is much easier
<teythoon_> otherwise, one needs to reimplement lots of stuff to get some
basic functionality
<teythoon_> like a mig that emits your language
<braunr> i agree with antrik too
<braunr> yes, the best would be mig handling multiple languages
<antrik> teythoon_: not exactly. for dynamic languages, code generation is
silly. just handle the marshalling on the fly. that's what the Lisp
bindings are doing (AFAIK)
<teythoon> antrik: ok, but you'd still need to parse the rpc definitions,
no?
<antrik> teythoon: yeah, you still need to parse the .defs -- unless we add
reflection to RPC interfaces...
<antrik> err, I mean introspection
The Hurd presently uses a TCP/IP stack based on code from an old Linux version. This works, but lacks some rather important features (like PPP/PPPoE), and the design is not hurdish at all.
A true hurdish network stack will use a set of translator processes, each implementing a different protocol layer. This way not only does the implementation get more modular, but the network stack can also be used far more flexibly. Rather than just having the standard socket interface, plus some lower-level hooks for special needs, there are explicit (perhaps filesystem-based) interfaces at all the individual levels; special applications can just directly access the desired layer. All kinds of packet filtering, routing, tunneling etc. can be easily achieved by stacking components in the desired constellation.
Implementing a complete modular network stack is not feasible as a GSoC project, though. Instead, the task is to take some existing user space TCP/IP implementation, and make it run as a single Hurd server for now, so it can be used in place of the existing pfinet. The idea is to split it up into individual layers later. The initial implementation, and the choice of a TCP/IP stack, should be done with this in mind -- it needs to be modular enough to make such a split later on feasible.
This is GNU Savannah task #5469.
Possible mentors: youpi
Exercise: You could try making some improvement to the existing pfinet implementation; or you could work towards running some existing userspace TCP/IP stack on Hurd. (As a normal program for now, not a proper Hurd server yet.)
The Hurd has both NFS server and client implementations, which work, but not very well: File locking doesn't work properly (at least in conjunction with a GNU/Linux server), and performance is extremely poor. Part of the problem could be due to the fact that only NFSv2 is supported so far.
(Note though that locking on the Hurd is problematic in general, not only in conjunction with NFS -- see the file locking task.)
This project encompasses implementing NFSv3 support, fixing bugs and performance problems -- the goal is to have good NFS support. The work done in a previous unfinished GSoC project can serve as a starting point.
Both client and server parts need work, though the client is probably much more important for now, and shall be the major focus of this project.
Some discussion of NFS improvements has been done for a former GSoC application -- it might give you some pointers. But don't take any of the statements made there for granted -- check the facts yourself!
A bigger subtask is the libnetfs: io map issue.
This task, GNU Savannah task #5497, has no special prerequisites besides general programming skills, and an interest in file systems and network protocols.
Possible mentors: ?
Exercise: Look into one of the existing issues in the NFS code. It's quite possible that you will not be able to fix any of the visible problems before the end of the application process; but you might discover something else you could improve in the code while working on it.
If you can't find anything suitable, talk to us about possible other exercise tasks.
The most obvious reason for the Hurd feeling slow compared to mainstream systems like GNU/Linux is low I/O system performance, in particular very slow hard disk access.
The reason for this slowness is lack and/or bad implementation of common optimization techniques, like scheduling reads and writes to minimize head movement; effective block caching; effective reads/writes to partial blocks; reading/writing multiple blocks at once; and read-ahead. The ext2 filesystem server might also need some optimizations at a higher logical level.
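Of the techniques listed, read-ahead is perhaps the easiest to illustrate. The following is only a minimal sketch of one common heuristic, and is not taken from the Hurd sources: the struct, the function name and the `RA_MAX_WINDOW` bound are all hypothetical. It tracks whether reads are sequential and, if so, doubles the number of blocks worth prefetching.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-file read-ahead state; not taken from the Hurd sources. */
struct readahead {
    unsigned long next_block;  /* block we expect next if access is sequential */
    unsigned int window;       /* number of blocks worth prefetching */
};

#define RA_MAX_WINDOW 32u      /* assumed upper bound, chosen for illustration */

/* Record a read of `count` blocks starting at `block`, and return how many
   further blocks could usefully be prefetched.  A read that continues where
   the previous one ended grows the window (1, 2, 4, ... up to RA_MAX_WINDOW);
   any other read is treated as random access and resets the window to 0.  */
unsigned int readahead_update(struct readahead *ra,
                              unsigned long block, unsigned int count)
{
    assert(ra != NULL);

    if (block == ra->next_block) {
        if (ra->window == 0)
            ra->window = 1;
        else if (ra->window < RA_MAX_WINDOW / 2)
            ra->window *= 2;
        else
            ra->window = RA_MAX_WINDOW;
    } else {
        ra->window = 0;
    }

    ra->next_block = block + count;
    return ra->window;
}
```

A caller would keep one `struct readahead` per open file, initialized to all zeroes, and ask the lower layer for the returned number of extra blocks after each read. Where such state would live in the filesystem, pager and driver layers is exactly the kind of question the project has to answer.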
The goal of this project is to analyze the current situation, and implement/fix various optimizations, to achieve significantly better disk performance. It requires understanding the data flow through the various layers involved in disk access on the Hurd (filesystem, pager, driver), and general experience with optimizing complex systems. That said, the killer feature we are definitely missing is read-ahead, and even a very simple implementation would bring a very big speedup.
Here are some real testcases:
running the Git testsuite, which is mostly I/O bound;
using TopGit on a non-toy repository.
Possible mentors: Samuel Thibault (youpi)
Exercise: Look through all the code involved in disk I/O, and try something easy to improve. It's quite likely though that you will find nothing obvious -- in this case, please contact us about a different exercise task.
Although there are some attempts to move to a more modern microkernel altogether, the current Hurd implementation is based on GNU Mach, which is only a slightly modified variant of the original CMU Mach.
Unfortunately, Mach was created about two decades ago, and is in turn based on even older BSD code. Parts of the BSD kernel -- file systems, UNIX mechanisms like processes and signals, etc. -- were ripped out (to be implemented in userspace servers instead); while other mechanisms were added to allow implementing stuff in user space. (Pager interface, IPC, etc.)
Also, Mach being a research project, many things were tried, adding lots of optional features not really needed.
The result of all this is that the current code base is in pretty bad shape. It's rather hard to make modifications -- to make better use of modern hardware for example, or even to fix bugs. The goal of this project is to improve the situation.
There are various things you can do here: Fixing compiler warnings; removing dead or unneeded code paths; restructuring code for readability and maintainability etc. -- a glance at the source code should quickly give you some ideas.
This task requires good knowledge of C, and experience with working on a large existing code base. Previous kernel hacking experience is an advantage, but not really necessary.
Possible mentors: Samuel Thibault (youpi)
Exercise: You should have no trouble finding something to improve when looking at the gnumach code, or even just at compiler warnings; for instance, "implicit declaration of function" and "format ‘%lu’ expects argument of type..." are easy to start with.
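To give an idea of what such warning fixes typically look like, here is a small sketch. The function is hypothetical and not taken from gnumach; it only shows the two kinds of fix in isolation, namely including the header that declares a function, and making a printf-style argument match its conversion.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>   /* declares strlen(); without it the compiler reports
                         "implicit declaration of function 'strlen'" */

/* Hypothetical function for illustration; not taken from gnumach.
   Writes the length of `name` as decimal text into `buf` and returns
   the number of characters that snprintf() reports.  */
int print_name_length(char *buf, size_t bufsz, const char *name)
{
    assert(buf != NULL && name != NULL);

    size_t len = strlen(name);

    /* Passing `len` directly to "%lu" triggers
       "format '%lu' expects argument of type 'long unsigned int'"
       on targets where size_t is a different type.  An explicit cast
       makes the argument match the conversion.  */
    return snprintf(buf, bufsz, "%lu", (unsigned long) len);
}
```

Whether a cast or a different conversion specifier is the better fix depends on what the surrounding code and its printf implementation support, so it is worth checking with the maintainers before sending a patch.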
Hurd translators allow presenting underlying data in a different format. This is a very powerful ability: it allows using standard tools on all kinds of data, and combining existing components in new ways, once you have the necessary translators.
A typical example for such a translator would be xmlfs: a translator that presents the contents of an underlying XML file in the form of a directory tree, so it can be studied and edited with standard filesystem tools, or using a graphical file manager, or to easily extract data from an XML file in a script etc.
The exported directory tree should represent the DOM structure of the document, or implement XPath/XQuery, or both, or some combination thereof (perhaps XPath/XQuery could be implemented as a second translator working on top of the DOM one) -- whatever works well, while sticking to XML standards as much as possible.
Ideally, the translation should be reversible, so that another, complementary translator applied on the expanded directory tree would yield the original XML file again; and also the other way round, applying the complementary translator on top of some directory tree and xmlfs on top of that would yield the original directory again. However, with the different semantics of directory trees and XML files, it might not be possible to create such a universal mapping. Thus it is a desirable goal, but not a strict requirement.
The goal of this project is to create a fully usable XML translator that allows both reading and writing any XML file. Implementing the complementary translator also would be nice if time permits, but is not a mandatory part of the task.
The existing partial (read-only) xmlfs implementation can serve as a starting point.
This task requires pretty good designing skills. Very good knowledge of XML is also necessary. Learning translator programming will obviously be necessary to complete the task.
Possible mentors: Olaf Buddenhagen (antrik)
Exercise: Make some improvement to the existing xmlfs, or some other existing Hurd translator. (Especially those in hurdextras are often quite rudimentary -- it shouldn't be hard to find something to improve...)
In UNIX systems, traditionally most software is installed in a common directory hierarchy, where files from various packages live beside each other, grouped by function: user-invokable executables in /bin, system-wide configuration files in /etc, architecture specific static files in /lib, variable data in /var, and so on. To allow clean installation, deinstallation, and upgrade of software packages, GNU/Linux distributions usually come with a package manager, which keeps track of all files upon installation/removal in some kind of central database.
An alternative approach is the one implemented by GNU Stow and GNU Guix: each package is actually installed in a private directory tree. The actual standard directory structure is then created by collecting the individual files from all the packages, and presenting them in the common /bin, /lib, etc. locations.
While the normal Stow or Guix package (for traditional UNIX systems) uses symlinks to the actual files, updated on installation/deinstallation events, the Hurd translator mechanism allows a much more elegant solution: stowfs (which is actually a special mode of unionfs) creates virtual directories on the fly, composed of all the files from the individual package directories.
The problem with this approach is that unionfs presently can be launched only once the system is booted up, meaning the virtual directories are not available at boot time. But the boot process itself already needs access to files from various packages. So to make this design actually usable, it is necessary to come up with a way to launch unionfs very early at boot time, along with the root filesystem.
Completing this task will require gaining a very good understanding of the Hurd boot process and other parts of the design. It requires some design skills also to come up with a working mechanism.
Possible mentors: Carl Fredrik Hammar (cfhammar)
For historical reasons, UNIX filesystems have a real (hard) .. link from each directory pointing to its parent. However, this is problematic, because the meaning of "parent" really depends on context. If you have a symlink for example, you can reach a certain node in the filesystem by a different path. If you go to .. from there, UNIX will traditionally take you to the hard-coded parent node -- but this is usually not what you want. Usually you want to go back to the logical parent from which you came. That is called "lexical" resolution.
Some applications already use lexical resolution internally for that reason. It is generally agreed that many problems could be avoided if the standard filesystem lookup calls used lexical resolution as well. The compatibility problems probably would be negligible.
The goal of this project is to modify the filename lookup mechanism in the Hurd to use lexical resolution, and to check that the system is still fully functional afterwards. This task requires understanding the filename resolution mechanism.
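To make the idea of lexical resolution concrete, the sketch below resolves "." and ".." purely on the path string, without consulting the filesystem. It is only an illustration of the concept: `lexical_normalize` is a hypothetical helper, not an existing Hurd or glibc interface, and the real project would change the lookup mechanism itself rather than preprocess strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the Hurd or glibc: resolve "." and ".."
   purely lexically in an absolute path, without consulting the filesystem.
   Writes the result to `out` (capacity `outsz`) and returns 0 on success,
   or -1 if the input is not absolute or the result does not fit.  */
int lexical_normalize(const char *path, char *out, size_t outsz)
{
    size_t len = 0;

    assert(path != NULL && out != NULL);
    if (path[0] != '/' || outsz < 2)
        return -1;

    while (*path != '\0') {
        /* Skip any run of slashes separating components. */
        while (*path == '/')
            path++;

        /* Measure the next component; zero means we hit the end. */
        size_t n = strcspn(path, "/");
        if (n == 0)
            break;

        if (n == 1 && path[0] == '.') {
            /* "." names the current directory: nothing to do. */
        } else if (n == 2 && path[0] == '.' && path[1] == '.') {
            /* "..": drop the last component emitted so far, if any.
               At the root there is nothing to drop.  */
            while (len > 0 && out[len - 1] != '/')
                len--;
            if (len > 0)
                len--;          /* drop the slash in front of it too */
        } else {
            /* Ordinary component: append "/" followed by the name. */
            if (len + 1 + n + 1 > outsz)
                return -1;
            out[len++] = '/';
            memcpy(out + len, path, n);
            len += n;
        }
        path += n;
    }

    if (len == 0)
        out[len++] = '/';       /* everything cancelled out: the root */
    out[len] = '\0';
    return 0;
}
```

With this helper, `/home/user/link/../file` becomes `/home/user/file` regardless of where the symlink `link` points, which is the "logical parent" behaviour described above. Traditional resolution would instead follow `link` first and take the parent of its target.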
See also GNU Savannah bug #17133.
Possible mentors: Carl Fredrik Hammar (cfhammar)
Exercise: This project requires changes to the name lookup mechanism in the Hurd-related glibc parts, as well as the Hurd servers. Thus, the exercise task should involve hacking glibc or Hurd servers, or even both. Fixing the bug in the client-side nfs translator (/hurd/nfs) that makes "rmdir foo/" fail while "rmdir foo" works, seems a good candidate.
The Hurd design facilitates splitting up large applications into independent, generic components, which can be easily combined in different contexts, by moving common functionality into separate Hurd servers (translators), accessible through filesystem interfaces and/or specialized RPC interfaces.
Download protocols like FTP, HTTP, BitTorrent etc. are very good candidates for this kind of modularization: a program could simply use the download functionality by accessing FTP, HTTP etc. translators.
There is already an ftpfs translator in the Hurd tree, as well as an httpfs on hurdextras; however, these are only suitable for very simple use cases: they just provide the actual file contents downloaded from the URL, but no additional status information that is necessary for interactive use. (Progress indication, error codes, HTTP redirects etc.)
A new interface providing all this additional information (either as an extension to the existing translators, or as distinct translators) is required to make such translators usable as backends for programs like apt-get for example.
The goal of this project is to design a suitable interface, implement it for at least one download protocol, and adapt apt-get (or some other program) to use this as a backend.
This task requires some design skills and some knowledge of internet protocols, to create a suitable interface. Translator programming knowledge will have to be obtained while implementing it.
It is not an easy task, but it shouldn't pose any really hard problems either.
Possible mentors: Olaf Buddenhagen (antrik)
Exercise: Make some improvement to one of the existing download translators -- httpfs in particular is known to be buggy.
POSIX describes some constants (or rather macros) like PATH_MAX/MAXPATHLEN and similar, which may be defined by the system to indicate certain limits. Many people overlook the "may" though: systems should only define them if they actually have such fixed limits (see limits.h). The Hurd, following the GNU Coding Standards, tries to avoid this kind of arbitrary limit, and consequently doesn't define the macros.
Many programs however just assume their presence, and use them unconditionally. This is simply sloppy coding: not only does it violate POSIX and fail on systems not defining the macros, but in fact most common use cases of these macros are simply wrong! (See http://insanecoding.blogspot.com/2007/11/pathmax-simply-isnt.html for some hints as to why this is so.)
There are a few hundred packages in Debian GNU/Hurd failing to build because of this -- simply grep for the offending macros in the list_of_build_failures.
Fixing these issues usually boils down to replacing char foo[PATH_MAX] by char *foo, and using dynamic memory allocation, e.g. a loop that tries geometrically growing sizes. Sometimes this is tricky, but more often not very hard. Sometimes it is even trivial because the GNU system has proper replacements. See the corresponding section of the porting guidelines page for more details. With a bit of practice, it should be easily possible to fix several programs per day.
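A loop trying geometrically growing sizes could look roughly like the following sketch, which fetches the current working directory without relying on PATH_MAX. The helper name getcwd_dynamic and the starting size of 128 bytes are arbitrary choices made for this illustration.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: return the current working directory in a
   malloc()ed buffer, without relying on PATH_MAX.  The caller must
   free() the result.  Returns NULL with errno set on failure.  */
char *
getcwd_dynamic (void)
{
  size_t size = 128;            /* Arbitrary starting size.  */
  char *buf = NULL;

  for (;;)
    {
      char *newbuf = realloc (buf, size);
      if (newbuf == NULL)
        {
          free (buf);
          errno = ENOMEM;
          return NULL;
        }
      buf = newbuf;

      if (getcwd (buf, size) != NULL)
        return buf;             /* The name fit into the buffer.  */

      if (errno != ERANGE)
        {
          /* A real error, not merely a buffer that is too small.  */
          int saved = errno;
          free (buf);
          errno = saved;
          return NULL;
        }

      if (size > (size_t) -1 / 2)
        {
          free (buf);
          errno = ENOMEM;       /* Doubling would overflow.  */
          return NULL;
        }
      size *= 2;                /* Grow geometrically and retry.  */
    }
}
```

In this particular case the GNU system already offers a replacement: glibc's getcwd (NULL, 0) allocates a buffer of the required size by itself, so the loop is only needed in code that must also work on systems lacking that extension.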
The goal of this project is to fix the PATH_MAX and related problems in a significant number of packages, and make the fixes ready for inclusion in Debian and (where possible) upstream. No Hurd-specific knowledge is needed, nor any other special knowledge aside from general C programming skills.
Possible mentors: Samuel Thibault (youpi)
Exercise: Fix the PATH_MAX issues in some Debian package.
Many programs use special libraries to access certain hardware devices, like libusb, libbluetooth, libraw1394, libiw-dev (though there already is a wireless-tools-gnumach package), etc.
The Hurd presently doesn't support these devices. Nevertheless, all of these programs (kdebase, for instance) could still be built -- and most of them would indeed be useful -- without actual support for these hardware devices. However, as the libraries are presently not available for Hurd, the programs can't be easily built in Debian GNU/Hurd due to missing dependencies.
This could be avoided by providing dummy libraries, which the programs could link against, but which wouldn't actually do any hardware access: instead, they would simply return appropriate error codes, reporting that no devices were found.
There are two possible approaches for providing such stub libraries: Either implement replacement libraries providing the same API as the real ones; or implement dummy backends for the Hurd in the proper libraries. Which approach to prefer probably depends on the structure of the various libraries.
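As a rough sketch of the first approach, a replacement library might look like this. All names here (dummydev_init and friends) are hypothetical and don't correspond to any of the libraries mentioned above; a real stub would have to mirror the declarations of the library it replaces. The choice of ENODEV as the error code is likewise just an assumption.

```c
#include <errno.h>
#include <stddef.h>

/* All names below are hypothetical; a real stub would provide exactly
   the functions declared by the library it stands in for.  */

struct dummydev_device;         /* Opaque handle, never instantiated.  */

/* Initialisation always succeeds: there is simply nothing to set up.  */
int
dummydev_init (void)
{
  return 0;
}

/* Enumeration succeeds, but reports that zero devices were found.  */
int
dummydev_count_devices (void)
{
  return 0;
}

/* Opening a device always fails, as no device can exist.  */
struct dummydev_device *
dummydev_open (int index)
{
  (void) index;
  errno = ENODEV;
  return NULL;
}
```

A program linked against such a stub takes the same code path it would take on a machine without the hardware in question.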
The goal of this project is to create working dummy libraries/backends for the mentioned devices, and get them into Debian GNU/Hurd. It shouldn't require any special previous knowledge, though some experience with build systems would be helpful. Finishing this task will probably require learning a bit about the hardware devices in question, and about Debian packaging.
Possible mentors: Samuel Thibault (youpi)
Exercise: Get one of the libraries to compile on Debian GNU/Hurd. It doesn't need to report reasonable error codes yet -- just make it build at all for now.
The Hurd presently supports CD-ROMs, but not audio extraction ("grabbing"). As a result, cdparanoia (and other extraction libraries/utilities) are not available; and many other packages depending on these can't be built in Debian GNU/Hurd either.
Adding support for audio extraction shouldn't be too hard. It requires implementing a number of additional ioctl()s, generating the appropriate ATAPI commands.
The goal of this task is a fully working cdparanoia in Debian GNU/Hurd. It will require digging a bit into Hurd internals and ATAPI commands, but should be quite doable without any previous knowledge about either.
Possible mentors: Samuel Thibault (youpi)
Exercise: Look at the implementation of the existing ioctl()s, and try to find something that could be easily added/improved. If you don't see anything obvious, talk to us about a different exercise task.
Perl and Python are available on the Hurd, but there are still test suite failures. These could be caused by problems in the system-specific implementation bits of Perl/Python, and/or shortcomings in the actual system functionality which Perl/Python depends on.
The student applying for this project can pick either Perl or Python, whichever they are more comfortable with. (Perl is higher priority though; and there are more failures too.)
The goal then is to fix all of the problems with the chosen language if possible, or at least some of them. Some issues might require digging quite deep into Hurd internals, while others are probably easy to fix.
Note that while some Perl/Python knowledge is probably necessary to understand what the test suite failures are about, the actual work necessary to fix these issues is mostly C programming -- in the implementation of Perl/Python and/or the Hurd.
Possible mentors: Samuel Thibault (youpi)
Exercise: Take a stab at one of the testsuite failures, and write a minimal testcase exposing the underlying problem. Actually fixing it would be a bonus of course -- but as it's hard to predict which issues will be easy and which will be tricky, we will already be satisfied if the student makes a good effort. (We hope to see some discussion of the problems in this case though.)
Fix Compatibility Problems Exposed by Testsuites
A number of software packages come with extensive testsuites. Some notable ones are glibc, gnulib, Perl, Python, GNU Coreutils, and glib. While these testsuites were written mostly to track regressions in the respective packages, some of the tests fail on the Hurd in general.
There is also the Open POSIX Testsuite which is more of a whole system interface testing suite.
Then, there is the File System Exerciser which we can use to test our file system servers for conformity.
While in some cases these might point to wrong usage of system interfaces, most of the time such failures are actually caused by shortcomings in Hurd's implementation of these interfaces. These shortcomings are often not obvious in normal use, but can be directly or indirectly responsible for all kinds of failures. The testsuites help in isolating such problems, so they can be tracked down and fixed.
This task thus consists in running some of the mentioned testsuites (and/or any other ones that come to mind), and looking into the causes of failures. The goal is to analyze all failures in one or more of the listed testsuites, to find out what shortcomings in the Hurd implementation cause them (if any), and to fix at least some of these shortcomings.
Note that this task somewhat overlaps with the Perl/Python task. Here the focus however is not on improving the support for any particular program, but on fixing general problems in the Hurd.
A complementary task is adding a proper unit testing framework to the GNU Hurd's code base, and related packages.
Implement Missing Interfaces in glibc for GNU Hurd
A related project is to implement missing interfaces for GNU Hurd (glibc wiki), primarily in glibc.
In glibc's Linux kernel port, most simple POSIX interfaces are in fact just forwarded to (implemented by) Linux kernel system calls. In contrast, in the GNU Hurd port, the POSIX (and other) interfaces are actually implemented in glibc on top of the Hurd RPC protocols. A few examples: getuid, open, rmdir, setresuid, socketpair.
When new interfaces are added to glibc (new editions of POSIX and similar standards, support for new editions of C/C++ standards, new GNU-specific extensions), ENOSYS stubs are generally added, which are then used as long as there is no real implementation. Often these real implementations are only done for the Linux kernel port, but not for GNU Hurd. (This is because most of the contributors are primarily interested in using glibc on Linux-based systems.) Also, there is quite a backlog of missing implementations for GNU Hurd.
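Whether a given interface is only an ENOSYS stub can be checked with a very small test program. The sketch below probes socketpair, purely because it is one of the examples named above (it does have a real implementation on the Hurd); for actual work you would substitute one of the missing interfaces. The function name probe_socketpair is made up for this example.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical probe: return 1 if socketpair() works, 0 if it is only
   an ENOSYS stub, and -1 if it fails for some other reason.  */
int
probe_socketpair (void)
{
  int fds[2];

  if (socketpair (AF_UNIX, SOCK_STREAM, 0, fds) == 0)
    {
      close (fds[0]);
      close (fds[1]);
      return 1;                 /* A real implementation exists.  */
    }

  if (errno == ENOSYS)
    return 0;                   /* Only the stub is present.  */

  perror ("socketpair");        /* Implemented, but failing otherwise.  */
  return -1;
}
```

Distinguishing ENOSYS from other errors matters here: a stub needs a new implementation, while any other failure points to a bug in an existing one.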
In coordination with the GNU Hurd developers, you'd work on implementing such missing interfaces.
These are very flexible tasks: while less experienced students should be able to tackle at least a few of the easier problems, other issues will be challenging even for experienced hackers. No specific previous knowledge is required; only fairly decent C programming skills. While tracking down the various issues, the student will be digging into the inner workings of the Hurd, and thus gradually gaining the knowledge required for Hurd development in general.
Possible mentors: Samuel Thibault (youpi)
Exercise: Take a stab at one of the testsuite failures, or missing implementation, and write a minimal testcase exposing the underlying problem. Actually fixing it would be a bonus of course -- but as it's hard to predict which issues will be easy and which will be tricky, we will already be satisfied if the student makes a good effort. (We hope to see some discussion of the problems in this case though.)
Hurd development would benefit greatly from automated tests. Unit tests should be added for all the major components (Mach; Hurd servers and libraries). Also, functional tests can be performed on various levels: Mach; individual servers; and interaction of several servers.
(The highest level would actually be testing libc functionality, which in turn uses the various Hurd mechanisms. glibc already comes with a test suite -- though it might be desirable to add some extra tests for exercising specific aspects of the Hurd...)
Our page on automated testing collects some relevant material.
The goal of this task is to provide testing frameworks that allow automatically running tests as part of the Hurd and Mach build processes. The student will have to create the necessary infrastructure, and a couple of sample tests employing it. Ideally, all the aspects mentioned above should be covered. At least some have to be ready for use and upstream merging before the end of the summer.
(As a bonus, in addition to these explicit tests, it would be helpful to integrate some methods for testing locking validity, performing static code analysis etc.)
This task probably requires some previous experience with unit testing of C programs, as well as dealing with complex build systems. No in-depth knowledge about any specific parts of the Hurd should be necessary, but some general understanding of the Hurd architecture will have to be acquired to complete this project. This makes it a very good project to get started on Hurd development.
Possible mentors: ?
Exercise: Create a program performing some simple test(s) on the Hurd or Mach code. It doesn't need to be integrated in the build process yet -- a standalone program with its own Makefile is fine for now.
libcap is a library providing the API to access POSIX capabilities. These allow giving various kinds of specific privileges to individual users, without giving them full root permissions.
Although the Hurd design should facilitate implementing such features in a quite natural fashion, there is no support for POSIX capabilities yet. As a consequence, libcap is not available on the Hurd, and thus various packages using it cannot be easily built in Debian GNU/Hurd.
The first goal of this project is implementing a dummy libcap, which doesn't actually do anything useful yet, but returns appropriate status messages, so programs using the library can be built and run on Debian GNU/Hurd.
Having this, actual support for at least some of the capabilities should be implemented, as time permits. This will require some digging into Hurd internals.
Some knowledge of POSIX capabilities will need to be obtained, and for the latter part also some knowledge about the Hurd architecture. This project is probably doable without previous experience with either, though.
David Hedberg applied for this project in 2010, and though he didn't go through with it, he fleshed out many details.
Possible mentors: Samuel Thibault (youpi)
Exercise: Make libcap compile on Debian GNU/Hurd. It doesn't need to actually do anything yet -- just make it build at all for now.
Valgrind is an extremely useful debugging tool for memory errors. (And some other kinds of hard-to-find errors too.) Aside from being useful for program development in general, a Hurd port will help finding out why certain programs segfault on the Hurd, although they work on Linux. Even more importantly, it will help finding bugs in the Hurd servers themselves.
To keep track of memory use, Valgrind however needs to know how each system call affects the validity of memory regions. This knowledge is highly kernel-specific, and thus Valgrind needs to be explicitly ported for every system.
Such a port involves two major steps: making Valgrind understand how kernel traps work in general on the system in question; and how all the individual kernel calls affect memory. The latter step is where most of the work is, as the behaviour of each single system call needs to be described.
Compared to Linux,
Mach (the microkernel used by the Hurd) has very few kernel traps.
Almost all system calls are implemented as RPCs instead --
either handled by Mach itself, or by the various Hurd servers.
All RPCs use a pair of mach_msg() invocations:
one to send a request message, and one to receive a reply.
However, while all RPCs use the same mach_msg() trap,
the actual effect of the call varies greatly depending on which RPC is invoked --
similar to the ioctl() call on Linux.
Each request thus must be handled individually.
Unlike ioctl(), the RPC invocations have explicit type information for the parameters though,
which can be retrieved from the message header.
By analyzing the parameters of the RPC reply message,
Valgrind can know exactly which memory regions are affected by that call,
even without specific knowledge of the RPC in question.
Thus implementing a general parser for the reply messages
will already give Valgrind a fairly good approximation of memory validity --
without having to specify the exact semantics of each RPC by hand.
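The idea of such a generic parser can be sketched with a deliberately simplified message model. Note that struct item_desc below is a made-up stand-in: real Mach messages use their own descriptor types and alignment rules, which this sketch does not attempt to reproduce. The callback type mark_fn likewise just stands in for whatever Valgrind primitive marks a memory region as defined.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical descriptor preceding every data item in the
   message body.  This is NOT the real Mach layout.  */
struct item_desc
{
  uint32_t size_bits;           /* Size of one element, in bits.  */
  uint32_t number;              /* Number of elements.  */
  uint32_t is_inline;           /* Non-zero: data follows directly.  */
};

/* Stand-in for the operation marking a region as valid.  */
typedef void (*mark_fn) (const void *addr, size_t len, void *cookie);

/* Trivial callback that merely counts what it is told about.  */
struct region_log
{
  size_t count;                 /* Number of regions seen.  */
  size_t total;                 /* Sum of their lengths in bytes.  */
};

void
log_region (const void *addr, size_t len, void *cookie)
{
  struct region_log *log = cookie;
  (void) addr;
  log->count++;
  log->total += len;
}

/* Walk a reply body of BODY_LEN bytes, and report every memory region
   the reply made valid.  Returns 0 on success, -1 if malformed.  */
int
walk_reply (const unsigned char *body, size_t body_len,
            mark_fn mark, void *cookie)
{
  size_t off = 0;

  while (off < body_len)
    {
      struct item_desc desc;

      if (body_len - off < sizeof desc)
        return -1;              /* Truncated descriptor.  */
      memcpy (&desc, body + off, sizeof desc);
      off += sizeof desc;

      uint64_t bits = (uint64_t) desc.size_bits * desc.number;
      uint64_t bytes = (bits + 7) / 8;

      if (desc.is_inline)
        {
          /* Inline data sits directly in the message buffer.  */
          if (bytes > body_len - off)
            return -1;
          mark (body + off, (size_t) bytes, cookie);
          off += (size_t) bytes;
        }
      else
        {
          /* Out-of-line data: the message only carries a pointer.  */
          const void *ptr;

          if (body_len - off < sizeof ptr)
            return -1;
          memcpy (&ptr, body + off, sizeof ptr);
          mark (ptr, (size_t) bytes, cookie);
          off += sizeof ptr;
        }
    }
  return 0;
}
```

The walker never needs to know which RPC produced the reply: the descriptors alone tell it where the data is and how large it is. What it cannot know is whether a buffer is only partially filled.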
While this should make Valgrind quite usable on the Hurd already, it's not perfect: some RPCs might return a buffer that is only partially filled with valid data; or some reply parameters might be optional, and only contain valid data under certain conditions. Such specific semantics can't be deduced from the message headers alone. Thus for a complete port, it will still be necessary to go through the list of all known RPCs, and implement special handling in Valgrind for those RPCs that need it. Reading the source code of the rpctrace tool would probably be useful to understand how the RPC message can be parsed.
The goal of this task is at minimum to make Valgrind grok Mach traps, and to implement the generic RPC handler. Ideally, specific handling for RPCs needing it should also be implemented.
Completing this project will require digging into Valgrind's handling of system calls (in C), and into Hurd RPCs. It is not an easy task, but a fairly predictable one -- there shouldn't be any unexpected difficulties, and no major design work is necessary. It doesn't require any specific previous knowledge: only good programming skills in general. On the other hand, the student will obtain a good understanding of Hurd RPCs while working on this task, and thus perfect qualifications for Hurd development in general.
Possible mentors: Samuel Thibault (youpi)
Exercise: As a starter, students can try to teach Valgrind a couple of Linux ioctls, as this will make them learn how to use the read/write primitives of Valgrind.
Nowadays the most often encountered cause of Hurd crashes seems to be lockups in the ext2fs server. One of these could be traced recently, and turned out to be a lock inside libdiskfs that was taken and not released in some cases. There is reason to believe that there are more faulty paths causing these lockups.
The task is to systematically check the libdiskfs code for this kind of locking issue. To achieve this, some kind of test harness has to be implemented: for example, instrumenting the code to check locking correctness constantly at runtime, or implementing a unit testing framework that explicitly checks locking in various code paths. (The latter could serve as a template for implementing unit tests in other parts of the Hurd codebase...)
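One conceivable form of runtime instrumentation is to route all lock operations through wrappers that keep a per-thread count of held locks, so that a test can verify the count is back to its original value after an operation returns. The sketch below is not libdiskfs code: checked_lock, checked_unlock, locks_held and buggy_operation are hypothetical names, and buggy_operation merely imitates the kind of faulty path described above.

```c
#include <pthread.h>

/* Number of locks currently held by the calling thread.  Hypothetical
   bookkeeping; __thread is the GCC thread-local storage extension.  */
static __thread int held_locks;

/* Wrapper around pthread_mutex_lock() that records the acquisition.  */
int
checked_lock (pthread_mutex_t *m)
{
  int err = pthread_mutex_lock (m);
  if (err == 0)
    held_locks++;
  return err;
}

/* Wrapper around pthread_mutex_unlock() that records the release.  */
int
checked_unlock (pthread_mutex_t *m)
{
  int err = pthread_mutex_unlock (m);
  if (err == 0)
    held_locks--;
  return err;
}

/* Number of locks the calling thread holds right now.  */
int
locks_held (void)
{
  return held_locks;
}

pthread_mutex_t node_lock = PTHREAD_MUTEX_INITIALIZER;

/* Imitation of a faulty code path: the early return forgets to
   release the lock.  */
int
buggy_operation (int fail_early)
{
  checked_lock (&node_lock);
  if (fail_early)
    return -1;                  /* Bug: node_lock is still held.  */
  checked_unlock (&node_lock);
  return 0;
}
```

After buggy_operation (1) returns, locks_held () reports 1 instead of 0, which is exactly the condition a test harness would flag. A plain counter can't tell which lock leaked; recording the lock address and call site would be a natural next step.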
(A systematic code review would probably suffice to find the existing locking issues; but it wouldn't document the work in terms of actual code produced, and thus it's not suitable for a GSoC project...)
This task requires experience with debugging locking issues in multithreaded applications.
Tools have been written for automated code analysis; these can help to locate and fix such errors.
Possible mentors: Samuel Thibault (youpi)
Exercise: If you could actually track down and fix one of the existing locking errors before the end of the application process, that would be excellent. This might be rather tough though, so probably you need to talk to us about an alternative exercise task...