Improve Java on Hurd

Description

The project consists in improving Java support on Hurd. This includes porting OpenJDK, creating low-level Java bindings for Mach and Hurd, as well as creating Java libraries to help with translator development.

I started working on this as a participant in Google Summer of Code 2011, for the GNU project. See my original proposal and final report.

Current status

OpenJDK 7 kindof works, but there are still imperfections and some integration work remains.

This page is somewhat out-of-date. At the moment, the GSoC [report] is more accurate.

Apt repository

Modified Debian packages are available in this repository:

deb http://jk.fr.eu.org/debian experimental/
deb-src http://jk.fr.eu.org/debian experimental/

Glibc signal code improvements

2011-06-29: Patches were submitted to libc-alpha which implement global signal dispositions and SA_SIGINFO. My latest code is available on github, and modified Debian packages are available in my apt repository.

2011-07-20: The patches were reviewed by Samuel Thibault. Samuel pointed out a couple of issues and I beleive I have addressed all of them (fixes posted). I'm in the process of publishing updated libc and hurd packages; provided those work as expected, the next step would be to get these changes into Debian.

One question is how the new symbols introduced by my patches should be handled. Weak symbols turned out to be impractical, so I'm currently considering using a Debian-specific symbol version in the interim period (GLIBC_2.13_DEBIAN_8 so far). The ultimate symbol version to be used will depend on the time at which the patches get integrated upstream (most likely GLIBC_2.15), at which point we will alias the interim version to the new one in debian packages.

I have modified libc0.3 to include a deb-symbols(5) file (alternatively see http://wiki.debian.org/Projects/ImprovedDpkgShlibdeps) so that we get an accurate libc dependency in hurd and other packages when the symbols in question are pulled in.

libthreads (cthreads library) will not be changed. There's no reason why its behavior should change, whereas for libpthread it's needed for conformance. Patches posted on 2011-05-25, but there's a more recent one in the modified hurd package (adds _hurd_sigstate_delete and removes the weak symbols).

IRC, freenode, #hurd, 2011-07-27:

< jkoenig> the glibc patches are pending review and inclusion in Debian (I
  think youpi wants to check my latest additions before we go ahead with
  that)
< jkoenig> when it's in Debian and the sky does not fall, I intend to
  resubmit a full series to libc-alpha for inclusion upstream.

IRC, freenode, #hurd, 2011-08-24:

< youpi> jkoenig: I'll probably commit your siginfo/globalsig patches soon
< youpi> I'm building the ant package atm, seems to proceed great
< jkoenig> youpi, great!

Another issue which came up with OpenJDK is the expansion by the dynamic linker of $ORIGIN in the RPATH header, see below.

Plans

The patches are pending review and inclusion upstream. As soon as we reach an agreement wrt. the new interfaces (in particular wrt. the value of SA_SIGINFO), the patches will be applied to the Debian libc packages for broader testing.

Open Items
  • Test patches: in progress, jkoenig, Svante. More volunteers welcome, of course.

    There's an issue with gdb, namely signals lose their "untracedness" when they go through the global sigstate's pending mask, so gdb spins intercepting a signal and trying to deliver it. Patch.

  • If jkoenig thinks it's mature enough: should ask Samuel to test these patches on the buildds.

    There's a risk that a dependency on my patched libc might be pulled in while building packages (in particular hurd) --jkoenig 2011-06-22

    • Waiting on ABI finalization ([!] Roland).

      • Which numeric values to use for SA_SIGINFO (and SA_NOCLDWAIT)?

        Staying in sync with BSD seems the most logical approach, so I have defined it to 0x40. --jkoenig 2011-06-29

  • Get patches reviewed (Roland?), and integrated into official sources: [!] tschwinge.

    samuelthibault reviewed the patches and pointed out a couple of issues which I'm currently working on:

    • Slight behaviour change with respect to forgetting blocked ignored signals. POSIX is flexible in this regard but I guess we could retain them instead of the current behaviour.
    • Sigstate accessors could be made extern inline functions. I suggest we postpone this.
    • Incorrect changes for msg_{get,set}_init_int(INIT_SIGMASK)
    • Some comments which can be improved.

    Once these are fixed we can probably test the patches in Debian.

    --?jk 2011-07-06

  • Documentations bits (from here, the initial proposal, and elsewhere) should probably be moved either into the appropriate glibc or Hurd documentation files/reference manuals, or to signal.

  • SA_SIGINFO patch is based on Samuel's earlier work. Thus, have him review the new patch?

  • SA_SIGINFO patch has a few TODOs w.r.t. protocol changes for missing information, and for FPU state. Providing even incomplete information is an improvement on the current status. The question is, whether applications rely on this information in any hard way if SA_SIGINFO is available?

    • We could possibly rename certain fields in struct siginfo, say si_pid_not_implemented, to ensure compilation failures for programs which use them. Or perhaps a linker warning is possible.

      IRC, freenode, #hurd, 2011-08-20:

      < youpi> jkoenig: I was considering renaming the fields of siginfo
      < youpi> to catch applications which need those which we haven't
        yet
      < jkoenig> youpi, makes sense AFAICT
      < youpi> one issue we'll get is some application which previously
        built without SA_SIGINFO, and will now want some information
        we're not yet able to provide
      < youpi> but at least we'll know
      < jkoenig> youpi, yes it would still be better than having them
        crash at runtime because of it
      

      IRC, freenode, #hurd, 2011-08-21:

      < youpi> jkoenig: actually we need the fields for waitid
      
    • The FPU state is not included in the ucontext_t passed to the signal handler. On the other hand, ucontext_t is actually being somewhat deprecated: the functions to restore it are no longer in POSIX. thread get state should return this information, in case we decide to fill the gap, and there might be existing glibc wrappers, too.

  • Perhaps have a look at SA_NOCLDWAIT.

Port OpenJDK

As suggested by tschwinge, I have targeted OpenJDK 7 at first. I don't expect it will be too hard to backport my patches to OpenJDK 6. I have succeeded in building a working JIT-less ("zero") version, although the dynamic linker issue must be worked around. Porting Hotspot (the original just-in-time compiler of OpenJDK) should not be too hard. If that fails we can fall back on Shark (a portable alternative JIT which uses LLVM).

Complexity of porting HotSpot: probably low. The complex things should be arch- rather than OS-specific. Not many Linux-specific interfaces used. Garbage collection/memory management, etc. and/or most of other Linux-specific interfaces are already dealt with for the zero build.

The dynamic linker issue is as follows. An executable-specific search path can be provided in the ELF RPATH header. RPATH directories can include the special string $ORIGIN, which is to be expanded to the directory the executable was loaded from. OpenJDK's java command uses this feature to locate the right libjli.so at runtime. However, on Hurd this information is not available to the dynamic linker and as a consequence RPATH components which include $ORIGIN are silently discarded.

This can be worked around by defining the LD_ORIGIN_PATH environment variable. (which have I used to build and test OpenJDK so far.)

IRC, freenode, #hurd, 2011-07-27:

< jkoenig> if you have the latest hurd/libc in my repository, you should be
  able to run /usr/lib/jvm/java-7-openjdk/bin/java without defining
  LD_ORIGIN_PATH manually
< braunr> java: error while loading shared libraries: libjli.so: cannot
  open shared object file: No such file or directory
< jkoenig> braunr, this one is expected, it's the symlink problem.
< braunr> oh ok
< jkoenig> (ie. thus far, if java is accessed as /usr/bin/java, the ld
  origin ends up as /usr/bin)

< jkoenig> *sigh*... it seems I'm going to have to reimplement realpath()
  in elf/dl-origin.c.
< braunr> why ?
< jkoenig> using it from there results in duplicate symbols when linking
  elf/librtld.map.o
< braunr> from where ?
< braunr> dl-origin ?
< jkoenig> apparently this part of the code uses a different allocator
  (elf/dl-minimal.c)
< braunr> oh
< braunr> depndency issues ?
< braunr> or bootstrapping ones ?
< jkoenig> http://paste.debian.net/124310/
< jkoenig> dl-origin is what provides the $ORIGIN value for RPATH (now
  sysdeps/mach/hurd/dl-origin.c, in our case)
< braunr> but what's the problem ?
< braunr> what prevents you from using the existing implementation ?
< jkoenig> you mean copy-and-paste the code ? Well I'll end up doing that I
  guess... not that it feels right.
< braunr> not really
< braunr> link against what provides it
< braunr> i'm really not familiar with glibc :/
< jkoenig> also I'd like to understand what's happening precisely before I
  resort to such blasphemy :-)
< braunr> :)
< jkoenig> maybe I could make {file,exec,_hurd}_exec_file_name()
  canonicalize it instead.
< jkoenig> for some reason it does not feel right, though.
< braunr> why ?
< jkoenig> I'm not sure, loss of information maybe?
< jkoenig> (that I ran /usr/bin/java as opposed to /usr/lib/jvm/...)
< braunr> i guess you should explain the issue more clearly, i feel like
  there is something i'm really missing :/
< braunr> but it can wait
< jkoenig> that ld.so actually needs the canonical file name to substitute
  $ORIGIN is its own problem, not that of exec or _hurd_exec_file_name..
< jkoenig> Ok, so.. Initially the shell (indirectly) runs
  _hurd_exec_file_name(..., "/usr/bin/java", ...), which then calls
  file_exec_file_name() on the file in question, passing it its own
  filename
< jkoenig> which is transmitted to exec_exec_file_name()
< jkoenig> (until now it's all pochu's patch)
< jkoenig> which then makes it available to the newly created process
  through exec_startup_get_info_2() (my own addition)
< braunr> oh
< braunr> wasn't it available before oO ?
< jkoenig> no, exec only has access to a port to the executable file.
< braunr> how was argv[0] handled then ?
< jkoenig> argv[0] is handled like any other argument
< braunr> ok, so the file path is duplicated ?
< jkoenig> the shell (or whomever calls _hurd_exec) provide whatever they
  want.
< braunr> ok
< jkoenig> well argv[0] is not necessarily the file path (at least not the
  full path)
< braunr> right
< jkoenig> so exec() does some guesswork with $PATH but obviously that's
  limited.
< braunr> so what you changed is that get_info_2 now receives a canonical
  path ?
< jkoenig> right
< jkoenig> (or whatever was specified to _hurd_exec_file_name(), for this
  reason and others we shouldn't use it for setuid programs.)
< jkoenig> well, not a canonical path. A path. (hence the problem)
< braunr> ok
< jkoenig> now both the filesystem and exec might run under another root so
  they're not an option for canonicalization
< jkoenig> _hurd_exec_file_name (in libc) might be a better spot.
< braunr> resolution from the client, yes

IRC, freenode, #hurd, 2011-08-03:

< jkoenig> so my RPATH patches are polished and built, and I'll post them
  soon, is the good news

IRC, freenode, #hurd, 2011-08-17:

< jkoenig> also fixed a fakeroot-induced deadlock in my dl-origin patches
  (namely, under fakeroot, realpath() uses a socket (through stat), so we
  need to use it when _hurd_dtable_lock is not held)
< jkoenig> also I'll post my dl-origin patches shortly
< youpi> dl-origin is about the environment variable that java needs,
  right?
< jkoenig> about the environment variable it shouldn't need, yes :-)
< youpi> ah :)
< youpi> but ok, I vaguely remember what that refers to
< jkoenig> $LD_ORIGIN_PATH is used as an override (much like
  LD_LIBRARY_PATH), but ideally ld.so uses whatever directory the loaded
  binary is from.
< youpi> ok
< jkoenig> (as a substitution for $ORIGIN in RPATH)

Plans

I intend to fix the RPATH issue by building on pochu's file_exec_file_name() patches.

I have succeeded in building a Hotspot-enabled libjvm.so, although the current toolchain issues (ELFOSABI GNU; 2011-07-03: fix committed in binutils) have so far prevented me from testing it.

It turns out the build fails later on in hotspot/agent because Hurd lack a libthread_db.so. Also, a Shark version builds, but the result does not work so far.

In other news, Damien Raude-Morvan is working on a kFreeBSD version, so I intend to merge my current patches with his.

--jkoenig 2011-06-29

IRC, freenode, #hurd, 2011-08-03:

< jkoenig> and I'm battleing to update my OpenJDK patches to b147, and
  merge the with the kFreeBSD ones.
< braunr> b147 ?
< jkoenig> but that thing is seriously huge and touches about everything,
  so it's taking more time than I'd have hoped
< jkoenig> braunr, the latest release of IcedTea / OpenJDK 7 and the
  current Debian version (in experimental of course)
< braunr> ok
< jkoenig> I'm trying to make this clean so that hopefully we can get them
  integrated at some level of upstream (probably IcedTea, at least at
  first)

IRC, freenode, #hurd, 2011-08-10:

< jkoenig> well actually I've finished merging my patches with the freebsd
  ones, and updating them to the new openjdk-7,
< jkoenig> but now a new version of both is out :-P
Upstream Submission

On 2011-07-15, gnu_andrew talked to us in the #hurd channel (freenode IRC), who is a maintainer of IcedTea. He's supportive of the porting approach, and is willing to review and integrate small patches for individual issues (rather than some huge patchset). Send patches to [email protected].

Open Items
  • [!] tschwinge to have a look at pochu's file_exec_file_name() patches, whether it's generally the right idea.

    • Assuming it is, continue with getting $ORIGIN working.
  • libthread_db.so issue. Likely, the Serviceability Agent is used by jdb and the like only, so for now the goal should be to lose some functionality by removing/avoiding this dependency.

  • java-access-bridge (not critical; JVM appears to work without)

  • IRC, freenode, #hurd, 2011-07-27:

    < jkoenig> there's a bug with java.nio when running javadoc, you might
      run into it.
    
  • SCM CREDENTIALS

    IRC, freenode, #hurd, 2011-08-03:

    < jkoenig> wrt. peer credentials, openjdk also uses file modes for
      security, and my guess is that it's sufficient, at least on Hurd, so
      I've reduced my priority for this at least in the meantime
    
  • They seem to have a rather heavy-weight process for such projects: confer http://mail.openjdk.java.net/pipermail/announce/2011-January/000092.html, for example. Do we need this, too?

    Probably not. My current approach (and Damien's wrt. the kFreeBSD patches) is to add preprocessor directives in the Linux code to make it more portable. --jkoenig 2011-06-29

  • Eclipse

    OK for testing -- but I'd very much hope that it just works as soon as we provide the required Java platform. But it may perhaps have some Linux-specifics (needlessly?) in its basement. Is it available for Debian GNU/kFreeBSD already?

Java bindings for Mach

The code is at http://github.com/jeremie-koenig/hurd-java.

tschwinge's notes for building with...

  • GCJ installed (due to the current Debian multilib confusion):

    $ tmp1=/usr/lib/gcc/i486-gnu/4.6 tmp2=/usr/lib/i386-gnu/gcc/i486-gnu/4.6 LIBRARY_PATH=$tmp2 COMPILER_PATH=$tmp1:$tmp2 C_INCLUDE_PATH=$tmp1/include make
    
  • OpenJDK installed (to have it find the shared library, and the jni.h header file):

    $ jdk=/usr/lib/jvm/java-7-openjdk LD_LIBRARY_PATH=$jdk/jre/lib/i386/jli C_INCLUDE_PATH=$jdk/include make
    

Doxygen-generated documentation is available at http://jk.fr.eu.org/hurd-java/doc/html/; or run make doc yourself.

IRC, freenode, #hurd, 2011-07-27:

< jkoenig> I need to be able to read/write individual data items from
  messages, in order to implement deallocation correctly, so I'm working on
  that when I'm waiting for things to build, but it's not my primary focus
  right now.

IRC, freenode, #hurd, 2011-08-17:

< jkoenig> so, weekly status report: I have made some progress on the java
  bindings, I hope to have a safe version mach_msg soon, after which I can
  begin experimenting with mig.

Plans

(just started.)

Open Items
  • RPC stub generator

  • mach_msg

    • Seems like the right approach to tschwinge, but he hasn't digested all the pecularities yet. Will definitely need more time.

Postponed

Might get back to these as time/interest permits.

GCJ

  • Java in GCC (that is, GCJ) has been removed in GCC 7.

Joe-E.