Java for Hurd (and vice versa)

Contact information:

Introductions

I am a first year M.Sc. student in Computer Science at University of Strasbourg (France). My interests include capability-based security, programming languages and formal methods (in particular, object-capability languages and proof-carrying code).

Proposal summary

This project would consist in improving Java support on Hurd. The first part would consist in fixing bugs and porting Java-related packages. The second part would consist in creating low-level Java bindings for the Hurd interfaces, as well as libraries to make translator development easier.

Previous involvement

I started contributing to Hurd last summer, during which I participated to Google Summer of Code as a student for the Debian project. I worked on porting Debian-Installer to Hurd. This project was mostly a success, although we still have to use a special mirror for installation with a few modified packages and tweaked priorities to work around some uninstallable packages with Priority: standard.

Shortly afterwards, I rewrote the procfs translator to fix some issues with memory leaks, make it more reliable, and improve compatibility with Linux-based tools such as procps or htop.

Although I have not had as much time as I would have liked to dedicate to the Hurd since that time, I have continued to maintain the mirror in question, and I have started to work on implementing POSIX threads signal semantics in glibc.

Project-related skills and interests

I have used Java mostly for university assignments. This includes non-trivial projects using threads and distributed programming frameworks such as Java RMI or CORBA. I have also used it to experiment with Google App Engine (web applications) and Google Web Toolkit (a compiler from Java to Javascript which helps with AJAX code), and I have some limited experience with JNI (the Java Native Interface, to link Java with C code).

My knowledge of the Hurd and Debian GNU/Hurd is reasonable, as the Debian-Installer and procfs projects gave me the opportunity to fiddle with many parts of the system.

Initially, I started working on this project because I wanted to use Joe-E (a subset of Java) to investigate the potential applications of object-capability languages in a Hurd context. I also believe that improving Java support on Hurd would be an important milestone.

Organisational matters

I am subscribed to [email protected] and I do have a permanent internet connexion.

I would be able to attend the regular IRC meetings, and otherwise communicate with my mentor through any means they would prefer (though I expect email and IRC would be the most practical). Since I'm already familiar with the Hurd, I don't expect I would require too much time from them.

My exams end on May 20 so I would be able to start coding right at the beginning of the GSoC period. Next year's term would probably begin around September 15, so that would not be an issue either. I expect I would work around 40 hours per week, and my waking hours would be flexible.

I don't have any other plans for the summer and would not make any if my project were to be accepted.

Full disclosure: I also submitted a proposal to the Jikes RVM project (which is a research-oriented Java Virtual Machine, itself written in Java) for implementing a new garbage collector into the MMTk subsystem.

Improve Java support

Justification

Java is a popular language and platform used by many desktop and web applications (mostly on the server side). As a consequence, competitive Java support is important for any general-purpose operating system. Better Java support would also be a prerequisite for the second part of my proposal.

Current situation

Java is currently supported on Hurd with the GNU Java suite:

  • GCJ, the GNU Compiler for Java, is part of GCC and can compile Java source code to Java bytecode, and both source code and bytecode to native code;
  • libgcj is the implementation of the Java runtime which GCJ uses. It is based on GNU Classpath. It includes a bytecode interpreter which enables Java applications compiled to native code to dynamically load and execute Java bytecode from class files.
  • The gij command is a wrapper around the above-mentioned virtual machine functionality of libgcj and can be used as a replacement for the java command.

However, GCJ does not work flawlessly on Hurd.r For instance, some parts of libgcj relies on the POSIX threads signal semantics, which are not yet implemented. In particular, this makes ant hang waiting for child processes, which makes some packages fail to build on Hurd (“ant” is the “make” of the Java world).

Tasks

  • Finish implementing POSIX thread semantics in glibc (high priority). According to POSIX, signal dispositions should be global to a process, while signal blocking masks should be thread-specific. Signals sent to the process as a whole are to be delivered to any thread which does not block them. By contrast, Hurd has per-thread signal dispositions and signals sent to a process are delivered to the main thread only. I have been working on refactoring the glibc signal code and implementing the POSIX semantics as a per-thread option. However, due to lack of time I have not yet been able to test and debug my code properly. Finishing this work would be my first task.
  • Fix further problems with GCJ on Hurd (high priority). While I’m not aware of any other problems with GCJ at the moment, I suspect some might turn up as I progress with the other tasks. Fixing these problems would also be a high-priority task.
  • Port OpenJDK 6 (medium priority). While GCJ is fine, it is not yet 100% complete. It is also slower than OpenJDK on architectures where a just-in-time compiler is available. Porting OpenJDK would therefore improve Java support on Hurd in scope and quality. Besides, it would also be a good way to test GCJ, which is used for bootstrapping by the Debian OpenJDK packages. Also note that OpenJDK 6 is now the default Java Runtime Environment on all released Linux-based Debian architectures; bringing Hurd in line with this would probably be a good thing.
  • Port Eclipse and other Java applications (low priority). Eclipse is a popular, state-of-the-art IDE and tool suite used for Java and other languages. It is a dependency of the Joe-E verifier (see part 3 of this proposal). Porting Eclipse would be a good opportunity to test GCJ and OpenJDK.

Deliverables

  • The glibc pthreads patch and any other fixes on the Hurd side would be submitted upstream
  • Patches against Debian source packages required to make them build on Hurd would be submitted to the Debian bug tracking system.

Create Java bindings for the Hurd interfaces

Justification

Java is used for many applications and often taught to introduce object-oriented programming. The fact that Java is a garbage-collected language makes it easier to use, especially for the less experienced programmers. Besides, its object-oriented nature is a natural fit for the capability-based design of Hurd. The JVM is also used as a target for many other languages, all of which would benefit from the access provided by these bindings.

Advantages over other garbage-collected, object-oriented languages include performance, type safety and the possibility to compile a Java translator to native code and link it statically using GCJ, should anyone want to use a translator written in Java for booting. Note that Java is being used in this manner for embedded development. Since GCJ can take bytecode as its input, this expect this possibility would apply to any JVM-based language.

Java bindings would lower the bar for newcomers to begin experimenting with what makes Hurd unique without being faced right away with the complexity of low-level systems programming.

Tasks summary

  • Implement Java bindings for Mach
  • Implement a libports-like library for Java
  • Modify MIG to output Java code
  • Implement libfoofs-like Java libraries

Design principles

The principles I would use to guide the design of these Java bindings would be the following ones:

  • The system should be hooked into at a low level, to ensure that Java is a "first class citizen" as far as the access to the Hurd's interfaces is concerned.
  • At the same time, the memory safety of Java should be maintained and extended to Mach primitives such as port names and out-of-line memory regions.
  • Higher-level interfaces should be provided as well in order to make translator development as easy as possible.
  • A minimum amount of JNI code (ie. C code) should be used. Most of the system should be built using Java itself on top of a few low-level primitives.
  • Hurd objects would map to Java objects.
  • Using the same interfaces, objects corresponding to local ports would be accessed directly, and remote objects would be accessed over IPC.

One approach used previously to interface programming languages with the Hurd has been to create bindings for helper libraries such as libtrivfs. Instead, for Java I would like to take a lower-level approach by providing access to Mach primitives and extending MIG to generate Java code from the interface description files.

This approach would be initially more involved, and would introduces several issues related to overcoming the "impedance mismatch" between Java and Mach. However, once an initial implementation is done it would be easier to maintain in the long run and we would be able to provide Java bindings for a large percentage of the Hurd’s interfaces.

Bindings for Mach system calls

In this low-level approach, my intention is to enable Java code to use Mach system calls (in particular, mach_msg) more or less directly. This would ensure full access to the system from Java code, but it raises a number of issues:

  • the Java code must be able to manipulate Mach-level entities, such as port rights or page-aligned buffers mapped outside of the garbage-collected heap (for out-of-line transfers);
  • putting together IPC messages requires control of the low-level representation of data.

In order to address these concerns, classes would be encapsulating these low-level entities so that they can be referenced through normal, safe objects from standard Java code. Bindings for Mach system calls can then be provided in terms of these classes. Their implementation would use C code through the Java Native Interface (JNI).

More specifically, this functionality would be provided by the org.gnu.mach package, which would contain at least the following classes:

  • MachPort would encapsulate a mach_port_t. (Some of) its constructors would act as an interface for the mach_port_allocate() system call. MachPort objects would also be instantiated from other parts of the JNI C code to represent port rights received through IPC. The deallocate() method would call mach_port_deallocate() and replace the encapsulated port name with MACH_PORT_DEAD. We would recommend that users call it when a port is no longer used, but the finalizer would also deallocate the port when the MachPort object is garbage collected.
  • Buffer would represent a page-aligned buffer allocated outside of the Java heap, to be transferred (or having been received) as out-of-line memory. The JNI code would would provide methods to read and write data at an arbitrary offset (but within bounds) and would use vm_allocate() and vm_deallocate() in the same spirit as for MachPort objects.
  • Message would allow Java code to put together Mach messages. The constructor would allocate a byte[] member array of a given size. Additional methods would be provided to fill in or query the information in the message header and additional data items, including MachPort and Buffer objects which would be translated to the corresponding port names and out-of-line pointers. A global map from port names to the corresponding MachPort object would probably be needed to ensure that there is a one-to-one correspondence.
  • Syscall would provide static JNI methods for performing system calls not covered by the above classes, such as mach_msg() or mach_thread_self(). These methods would accept or return MachPort, Buffer and Message objects when appropriate. The associated C code would access the contents of such objects directly in order to perform the required unsafe operations, such as constructing MachPort and Buffer objects directly from port names and C pointers.

Note that careful consideration should be given to the interfaces of these classes to avoid “safety leaks” which would compromise the safety guarantees provided by Java. Potential problematic scenarios include the following examples:

  • It must not be possible to write an integer at some position in a Message object, and to read it back as a MachPort or Buffer object, since this would allow unsafe access to arbitrary memory addresses and mach port names.
  • Providing the mach_task_self() system call would also provide access to arbitrary addresses and ports by using the vm_* family of RPC operations with the returned MachPort object. This means that the relevant task operations should be provided by the Syscall class instead.

Finally, access should be provided to the initial ports and file descriptors in _hurd_ports and provided by the getdport() function, for instance through static methods such as getCRDir(), getCWDir(), getProc(), ... in a dedicated class such as org.gnu.hurd.InitPorts.

A realistic example of code based on such interfaces would be:

import org.gnu.mach.MsgType;
import org.gnu.mach.MachPort;
import org.gnu.mach.Buffer;
import org.gnu.mach.Message;
import org.gnu.mach.Syscall;
import org.gnu.hurd.InitPorts;

public class Hello
{   
   public static main(String argv[])
       /* Parent class for all Mach-related exceptions */
       throws org.gnu.mach.MachException
   {   
       /* Allocate a reply port */
       MachPort reply = new MachPort();

       /* Allocate an out-of-line buffer */
       Buffer data = new Buffer(MsgType.CHAR, 13);
       data.writeString(0, "Hello, World!");

       /* Craft an io_write message */
       Message msg = new Message(1024);
       msg.setRemotePort(InitPorts.getdport(1));
       msg.setLocalPort(reply, Message.Type.MAKE_SEND_ONCE);
       msg.setId(21000);
       msg.addBuffer(data);

       /* Make the call, MACH_MSG_SEND | MACH_MSG_RECEIVE */
       Syscall.machMsg(msg, true, true, reply);

       /* Extract the returned value */
       msg.assertId(21100);
       int retCode = msg.readInt(0);
       int amount = msg.readInt(1);
   }
}

Should this paradigm prove insufficient, more ideas could be borrowed from the org.vmmagic package used by Jikes RVM, a research Java virtual machine itself written in Java.

Generating Java stubs with MIG

Once the basic machinery is in place to interface with Mach, Java programs have more or less equal access to the system functionality without resorting to more JNI code. However, as illustrated above, this access is far from convenient.

As a solution I would modify MIG to add the option to output Java code. MIG would emit a Java interface, a client class able to implement the interface given a Mach port send right, an a server class which would be able to handle incoming messages. The class diagram below, although it is by no means complete or exempt of any problem, illustrates the general idea:

gsoc2011 classes.png

This structure is somewhat reminiscent of Java RMI or similar systems, which aim to provide more or less transparent access to remote objects. The exact way the Java code would be generated still needs to be determined, but basically:

  • An interface, corresponding to the header files generated by MIG, would enumerate the operations listed in a given .defs files. Method names would be transformed to adhere to Java conventions (for instance, some_random_identifier would become someRandomIdentifier).
  • A user class, corresponding to the *User.c files, would implement this interface by doing RPC over a given MachPort object.
  • A server class, corresponding to *Server.c, would be able to handle incoming messages using a user-provided implementation of the interface. (Possibly, a skeleton class providing methods which would raise NotImplementedExceptions would be provided as well. Users would derive from this class and override the relevant methods. This would allow them not to implement some operations, and would avoid pre-existing code from breaking when new operations are introduced.)

In order to help with the implementation of servers, some kind of library would be needed to associate Mach receive rights with server objects and to handle incoming messages on dedicated threads, in the spirit of libports. This would probably require support for port sets at the level of the Mach primitives described in the previous section.

When possible, operations involving the transmission of send rights of some kind would be expressed in terms of the MIG-generated interfaces instead of MachPort objects. Upon reception of a send right, a FooUser object would be created and associated with the corresponding MachPort object. If the received send right corresponds to a local port to which a server object has been associated, this object would be used instead. This way, subsequent operations on the received send right would be handled as direct method calls instead of going through RPC mechanisms.

Some issues will still need to be solved regarding how MIG will convert interface description files to Java interfaces. For instance:

  • .defs files are not explicitly associated with a type. For instance in the example above, MIG would have to somehow infer that io_t corresponds to this in the Io interface.
  • More generally, a correspondence between MIG and Java types would have to be determined. Ideally this would be automated and not hardcoded too much.
  • Initially, reply port parameters would be ignored. However they may be needed for some applications.

So the details would need to be flushed out during the community bonding period and as the implementation progresses. However I’m confident that a satisfactory solution can be designed.

Using these new features, the example above could be rewritten as:

import org.gnu.hurd.InitPorts;
import org.gnu.hurd.Io;
import org.gnu.hurd.IoUser;

class Hello {
   static void main(String argv[]) throws ...
   {
       Io stdout = new IoUser(InitPorts.getdport(1));
       String hello = “Hello, World!\n”;

       int amount = stdout.write(hello.getBytes(), -1);

       /* (A retCode corresponding to an error
           would be signalled as an exception.) */
   }
}

An example of server implementation would be:

import org.gnu.hurd.Io;
import java.util.Arrays;

class HelloIo implements Io {
   final byte[] contents = “Hello, World!\n”.getBytes();

   int write(byte[] data, int offset) {
       return SOME_ERROR_CODE;
   }

   byte[] read(int offset, int amount) {
       return Arrays.copyOfRange(contents, offset,
                                 offset + amount - 1);
   }

   /* ... */
}

A new server object could then be created with new IoServer(new HelloIo()), and associated with some receive right at the level of the ports management library.

Base classes for common types of translators

Once MIG can target Java code, and a libports equivalent is available, creating new translators in Java would be greatly facilitated. However, we would probably want to introduce basic implementations of file system translators in the spirit of libtrivfs or libnetfs. They could take the form of base classes implementing the relevant MIG-generated interfaces which would then be derived by users, or could define a simpler interface which would then be used by adapter classes to implement the required ones.

I would draw inspiration from libtrivfs and libnetfs to design and implement similar solutions for Java.

Deliverables

  • A hurd-java package would contain the Java code developed in the context of this project.
  • The Java code would be documented using javadoc and a tutorial for writing translators would be written as well.
  • Modifications to MIG would be submitted upstream, or a patched MIG package would be made available.

The Java libraries resulting from this work, including any MIG support classes as well as the class files built from the MIG-generated code for the Mach and Hurd interface definition files, would be provided as single hurd-java package for Debian GNU/Hurd. This package would be separate from both Hurd and Mach, so as not to impose unreasonable build dependencies on them.

I expect I would be able to act as its maintainer in the foreseeable future, either as an individual or as a part of the Hurd team. Hopefully, my code would be claimed by the Hurd project as their own, and consequently the modifications to MIG (which would at least conceptually depend on the Mach Java package) could be integrated upstream.

Since by design, the Java code would use only a small number of stable interfaces, it would not be subject to excessive amounts of bitrot. Consequently, maintenance would primarily consist in fixing bugs as they are reported, and adding new features as they are requested. A large number of such requests would mean the package is useful, so I expect that the overall amount of work would be correlated with the willingness of more people to help with maintenance should I become overwhelmed or get hit by a bus.

Timeline

The dates listed are deadlines for the associated tasks.

  • Community bonding period. Discuss, refine and complete the design of the Java bindings (in particular the MIG and "libports" parts)
  • May 23. Coding starts.
  • May 30. Finish implementing pthread signal semantics.
  • June 5. Port OpenJDK
  • June 12. Fix the remaining problems with GCJ and/or OpenJDK, possibly port Eclipse or other big Java packages.
  • June 19. Create the bindings for Mach.
  • June 26. Work on some kind of basic Java libports to handle receive rights.
  • July 3. Test, write some documentation and examples.
  • July 17 (two weeks). Add the Java target to MIG.
  • July 24. Test, write some documentation and examples.
  • August 7 (two weeks). Implement a modular libfoofs to help with translator development. Try to write a basic but non-trivial translator to evaluate the performance and ease of use of the result, rectify any rough edges this would uncover.
  • August 22. (last two weeks) Polish the code and packaging, finish writing the documentation.

Conclusion

This project is arguably ambitious. However, I have been thinking about it for some time now and I'm confident I would be able to accomplish most of it.

In the event multiple language bindings projects would be accepted, some work could probably be done in common. In particular, ArneBab seems to favor a low-level approach for his Python bindings as I do for Java, and I would be happy to discuss API design and coordinate MIG changes with him. I would also have an extra month after the end of the GSoC period before I go back to school, which I would be able to use to finish the project if there is some remaining work. (Last year's rewrite of procfs was done during this period.)

As for the project's benefits, I believe that good support for Java is a must-have for the Hurd. Java bindings would also further the Hurd's agenda of user freedom by extending this freedom to more people: I expect the set of developers who would be able to write Java code against a well-written libfoofs is much larger than those who master the intricacies of low-level systems C programming. From a more strategic point of view, this would also help recruit new contributors by providing an easier path to learning the inner workings of the Hurd.

Further developments which would build on the results of this project include my planned experiment with Joe-E (which I would possibly take on as a university project next year). Another possibility would be to reimplement some parts of the Java standard library directly in terms of the Hurd interfaces instead of using the POSIX ones through glibc. This would possibly improve the performance of some Java applications (though probably not by much), and would otherwise be a good project for someone trying to get acquainted with Hurd.

Overall, I believe this project would be fun, interesting and useful. I hope that you will share this sentiment and give me the opportunity to spend another summer working on Hurd.