GSoC 2011 final report (Java on Hurd)
This is my final report on my work on Java for Hurd as a Google Summer of Code student for the GNU project. The work is ongoing; for recent status updates, see my java page.
Global signal dispositions and SA_SIGINFO
Signal delivery was implemented in Hurd before POSIX threads were defined. As a consequence, the current semantics differ from those prescribed by POSIX, on which libgcj relies.
On the Hurd, each thread has its own signal dispositions and process-wide signals are always received by the main thread. In contrast, POSIX specifies signal dispositions to be global to the process (although there is still a per-thread blocking mask), and a global signal can be delivered to any thread which does not block it.
To further complicate the matter, the Hurd currently has two options for threads: the cthread library, still used by most of the Hurd code, and libpthread which was introduced later for compatibility purposes. To avoid breaking existing code, cthread programs should continue to run with the historical Hurd signal semantics whereas pthread programs are expected to rely on the POSIX behavior.
To address this, the patch series I wrote allows selecting a per-thread
behavior: by default, newly created threads provide historical
semantics, but they can be marked by libpthread as global signal
receivers using the new function _hurd_sigstate_set_global_rcv().
In addition, I refactored some of the signal code to improve
readability, and fixed a couple of bugs I came across in the process.
Another improvement which was required by OpenJDK was the implementation
of the SA_SIGINFO flag for signal handlers. My signal patch series
provides the basic infrastructure. However it is not yet complete, as
some of the information provided by siginfo_t structures is not
available to glibc. Making this information available would require a
change in the msg_sig_post() RPC.
Related Debian changes
In Debian GNU/Hurd, libpthread is provided by the hurd package. Hurd also
uses extern inline functions from glibc which are affected by the new
symbols. This means that newer Hurd packages which take advantage of
glibc's support for global signal dispositions cannot run on older C
libraries, and some thought had to be given to the way we could ensure
smooth upgrades.
An early attempt at using weak symbols proved to be impractical. As a consequence I modified the eglibc source package to enable dpkg-gensymbols on hurd-i386. This means that packages which are built against a newer libc and make use of the new symbols will automatically get an appropriately versioned dependency on libc0.3.
Status as of 2012-01-28
The patch series has not yet been merged upstream. However, it is now being used for the Debian package of glibc.
$ORIGIN substitution in RPATH
Another feature used by OpenJDK which was not implemented in Hurd is the
substitution of the special string $ORIGIN within the ELF RPATH header.
RPATH is a per-executable library search path, within which $ORIGIN
should be substituted by the directory part of the binary's canonical
file name.
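The substitution itself is plain string manipulation once the file name is known. The sketch below is not the glibc implementation; expand_origin() is a name invented for this illustration, it assumes the executable's file name passed in is already canonical, and it only handles the bare $ORIGIN spelling.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'ed copy of rpath_entry in which
   every "$ORIGIN" is replaced by the directory part of exe_path.
   Returns NULL if memory allocation fails. */
char *expand_origin(const char *rpath_entry, const char *exe_path)
{
    static const char token[] = "$ORIGIN";
    const size_t token_len = sizeof token - 1;
    const char *slash = strrchr(exe_path, '/');
    const char *dir = exe_path;
    const char *r, *hit;
    size_t dir_len, count = 0, out_len;
    char *out, *w;

    /* Directory part: everything before the last '/'. */
    if (slash == NULL) {
        dir = ".";
        dir_len = 1;
    } else if (slash == exe_path) {
        dir = "/";
        dir_len = 1;
    } else {
        dir_len = (size_t)(slash - exe_path);
    }

    /* Count the occurrences to size the result exactly. */
    for (r = rpath_entry; (hit = strstr(r, token)) != NULL; r = hit + token_len)
        count++;

    out_len = strlen(rpath_entry) - count * token_len + count * dir_len;
    out = malloc(out_len + 1);
    if (out == NULL)
        return NULL;

    /* Copy the entry, swapping each token for the directory. */
    w = out;
    r = rpath_entry;
    while ((hit = strstr(r, token)) != NULL) {
        memcpy(w, r, (size_t)(hit - r));
        w += hit - r;
        memcpy(w, dir, dir_len);
        w += dir_len;
        r = hit + token_len;
    }
    strcpy(w, r);
    return out;
}
```

For example, expanding the entry $ORIGIN/../lib for an executable at /usr/lib/jvm/bin/java yields /usr/lib/jvm/bin/../lib. The caller is responsible for freeing the returned string.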
Currently, a newly executed program has no way of figuring out which
binary it was created from. Actually, even the _hurd_exec() function,
which is used in glibc to implement the exec*() family, is never
passed the file name of the executable, but only a port to it.
Likewise, the file_exec(), exec_exec() and exec_startup_get_info()
RPCs do not provide a path to transmit the file name from the shell to
the file system server, to the exec server, to the executed program.
Last year, Emilio Pozuelo Monfort submitted a patch series which fixes
this problem, up to the exec server. The series' original purpose was to
replace the guesswork done by exec when running shell scripts. It
provides new versions of file_exec() and exec_exec() which allow for
passing the file name. I extended Emilio's patches to add the missing
link, namely a new exec_startup_get_info_2() RPC. New code in glibc
takes advantage of it to retrieve the file name and use it in a
Hurd-specific dl-origin.c to allow for RPATH $ORIGIN substitution.
Status as of 2012-01-28
The (hurd and glibc) patch series for $ORIGIN are mostly complete.
However, there is still an issue related to the canonicalization of the
executable's file name. Doing it in the dynamic linker (where $ORIGIN
is expanded) is complicated due to the limited set of functions
available there (realpath() is not among them). Unfortunately,
canonicalizing in _hurd_exec_file_name() is not an option either,
because many shell scripts use argv[0] to alter their behavior, but
argv[0] is replaced by the shell with the file name it's passed.
Another issue is that the patches use a fixed-length string buffer to transmit the file name through RPC.
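To illustrate why a fixed-length buffer is a limitation, here is a small sketch. It is not taken from the patches: FILE_NAME_MAX, its value and pack_file_name() are all hypothetical. Any file name that does not fit in the buffer cannot be transmitted in full, so the only safe options are to fail or to truncate, and a truncated name would lead to a wrong $ORIGIN.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical buffer size, chosen only for this illustration. */
#define FILE_NAME_MAX 1024

/* Hypothetical helper: copies file_name into a fixed-length buffer.
   Returns 0 on success, or -1 with errno set to ENAMETOOLONG if the
   name (including its terminating NUL) does not fit. */
int pack_file_name(char buf[FILE_NAME_MAX], const char *file_name)
{
    size_t len = strlen(file_name);

    if (len >= FILE_NAME_MAX) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memcpy(buf, file_name, len + 1);
    return 0;
}
```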
OpenJDK 7
With the groundwork above being taken care of, I was able to build OpenJDK 7 on Hurd, although heavy portability patching was also necessary. A similar effort for Debian GNU/kFreeBSD was undertaken around the same time by Damien Raude-Morvan, so I intend to submit a more general set of "non-Linux" patches.
Due to the lack of a libpthread_db library on the Hurd, I was only
able to build a Zero (interpreter only) virtual machine so far. However,
it should be possible to disable the debugging agent and build Hotspot.
Status as of 2012-01-28
I have put together generic nonlinux-*.diff patches for the openjdk7
Debian package. However, I have not yet tested them on Linux and kFreeBSD.
Java bindings
Besides improving Java support on Hurd, my original proposal also included the creation of Java bindings for the Hurd interfaces. My progress on this front has not been as fast as I would have liked. However I have started some of the work required to provide safe Java bindings for Mach system calls.
See https://github.com/jeremie-koenig/hurd-java.