Wednesday, July 16, 2014

How to install OpenMPI-Java

I have been trying out message passing frameworks for Java that can be used in HPC clusters. In this post, I provide installation instructions to quickly set up and try out Open MPI Java in a Linux environment.

Prerequisites:

  • Build essentials
  • gcc

Installation Steps:

  1. Create the directory in which you want to install Open MPI

          $mkdir -p /home/charith/software/openmpi-build

  2. Download the Open MPI source archive (openmpi-1.8.1.tar.gz in this example)

  3. Extract the downloaded gzipped file and change into the extracted directory

          $tar -xvvzf openmpi-1.8.1.tar.gz
          $cd openmpi-1.8.1

  4. Configure the build environment with Java enabled, using the following command

          $./configure --enable-mpi-java --with-jdk-bindir="path to java bin directory" --with-jdk-headers="path to the java directory that contains jni.h" --prefix="path to installation directory"

     Example:

          $./configure --enable-mpi-java --with-jdk-bindir=/home/charith/software/jdk1.6.0_31/bin --with-jdk-headers=/home/charith/software/jdk1.6.0_31/include --prefix=/home/charith/software/openmpi-build

  5. Compile and install Open MPI

          $make all install

Now you are done with the installation. You should find mpi.jar, which contains the compile-time dependencies for MPI Java programs, in the openmpi-build/lib directory.

Compiling and Running an Open MPI Java Program

You should be able to find some example MPI Java programs in the extracted openmpi-1.8.1/examples directory. Hello.java is one such example.

To compile the program

$javac -cp "path to mpi.jar" Hello.java

To run the program, use the mpirun command. (Do not forget to add the openmpi-build/bin directory to your PATH.)

$mpirun -np 5 java Hello

Friday, May 30, 2014

GoFFish : A Sub-Graph Centric Framework For Large Scale Graph Processing


It's been a long time since my last blog post. I have wanted to write this post for some time, but never got a chance to complete it. Last year I was part of a team that worked on building a platform for large scale distributed graph processing: GoFFish. In this post, I give a short overview of GoFFish and its programming model.

After the Google MapReduce paper and the introduction of Hadoop, there was a search for simple programming models and analytics tools for big data processing. Graph structured data makes up a significant portion of the large scale data we see today. I would say any big data problem worth looking at is graph related. :)

The MapReduce model only works well for data with minimal interdependencies, whereas graph structured data sits at the opposite end of the spectrum. Google's Pregel paper introduced a new, simple programming model for graph processing that addresses this shortcoming. It is generally known as the vertex centric programming model, in which the programmer is forced to think like a vertex in the graph. The program logic is written as if it runs inside a single vertex, and it is executed in parallel at every vertex.


The vertex centric programming model is based on message passing. Each vertex can send messages to its neighbors or to any vertex whose id it knows, and each vertex receives the messages addressed to its vertex id. The model is an extension of the Bulk Synchronous Parallel (BSP) programming model. BSP can be thought of as an abstract computer with multiple processors that work in parallel. A program executes in iterations. In each iteration, each processor will:

  • Receive messages sent to it in the previous iteration
  • Process the messages and execute the user logic
  • Send messages to other processors if needed
  • Wait for all other processors to finish the iteration

In the BSP model these iterations are called supersteps. The vertex centric programming model is very simple, but because the work done at each vertex is so small, its communication to computation ratio is high. This makes it a poor fit for some classes of algorithms, such as betweenness centrality.
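
The superstep loop described above can be sketched in plain Java. This is a minimal single-JVM simulation, not Pregel's or GoFFish's API: the class VertexCentricBsp, the method propagateMax, and the mailbox lists are hypothetical names chosen for illustration. Each vertex holds an integer, and the program spreads the largest value to every vertex using messages only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical single-JVM simulation of vertex centric BSP (illustration only).
class VertexCentricBsp {

    // Creates one empty mailbox per vertex.
    static List<List<Integer>> emptyMailboxes(int n) {
        List<List<Integer>> boxes = new ArrayList<List<Integer>>();
        for (int i = 0; i < n; i++) {
            boxes.add(new ArrayList<Integer>());
        }
        return boxes;
    }

    // neighbors[v] lists the vertices adjacent to v; values[v] is updated in place.
    // Returns the number of supersteps executed, including the final quiet one.
    static int propagateMax(int[][] neighbors, int[] values) {
        int n = values.length;
        List<List<Integer>> inbox = emptyMailboxes(n);
        int superstep = 0;
        boolean messagesInFlight = true;
        while (messagesInFlight) {
            List<List<Integer>> outbox = emptyMailboxes(n);
            messagesInFlight = false;
            // A real system runs this loop body in parallel, once per vertex.
            for (int v = 0; v < n; v++) {
                // Receive the messages sent to v in the previous superstep
                // and run the user logic: keep the largest value seen.
                boolean changed = (superstep == 0);
                for (int message : inbox.get(v)) {
                    if (message > values[v]) {
                        values[v] = message;
                        changed = true;
                    }
                }
                // Send messages to neighbors only if there is something new.
                if (changed) {
                    for (int u : neighbors[v]) {
                        outbox.get(u).add(values[v]);
                        messagesInFlight = true;
                    }
                }
            }
            // Barrier: messages become visible only in the next superstep.
            inbox = outbox;
            superstep++;
        }
        return superstep;
    }

    public static void main(String[] args) {
        int[][] path = { {1}, {0, 2}, {1, 3}, {2} }; // path graph 0-1-2-3
        int[] values = { 3, 6, 2, 1 };
        int supersteps = propagateMax(path, values);
        System.out.println(Arrays.toString(values) + " after " + supersteps + " supersteps");
    }
}
```

Note how little each vertex does per superstep: one comparison per incoming message, followed by one message per neighbor. That is the high communication to computation ratio in miniature.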

In GoFFish we introduced a subgraph centric programming model in place of the vertex centric model, and demonstrated that it can give a large performance improvement over the vertex centric model for some classes of graph algorithms.
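
One way to picture the difference is the following plain Java sketch. It rests on assumptions that are not spelled out in this post: a subgraph is taken to be a connected group of vertices held together in memory, it updates all of its vertices locally within a superstep, and it exchanges messages only over edges that cross into another subgraph. The class SubgraphCentricSketch and the method propagateMax are hypothetical names; this is not GoFFish's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical single-JVM sketch of a subgraph centric superstep loop.
// Assumes every subgraph is connected, so a local pass can share one value.
class SubgraphCentricSketch {

    // Creates one empty mailbox per subgraph.
    static List<List<Integer>> emptyMailboxes(int n) {
        List<List<Integer>> boxes = new ArrayList<List<Integer>>();
        for (int i = 0; i < n; i++) {
            boxes.add(new ArrayList<Integer>());
        }
        return boxes;
    }

    // neighbors[v] lists the vertices adjacent to v; members[s] lists the
    // vertices of subgraph s; values[v] is updated in place.
    // Returns the number of supersteps executed, including the final quiet one.
    static int propagateMax(int[][] neighbors, int[][] members, int[] values) {
        int[] subgraphOf = new int[values.length];
        for (int s = 0; s < members.length; s++) {
            for (int v : members[s]) {
                subgraphOf[v] = s;
            }
        }
        List<List<Integer>> inbox = emptyMailboxes(members.length);
        int superstep = 0;
        boolean messagesInFlight = true;
        while (messagesInFlight) {
            List<List<Integer>> outbox = emptyMailboxes(members.length);
            messagesInFlight = false;
            // A real system would run this loop body in parallel, once per subgraph.
            for (int s = 0; s < members.length; s++) {
                // Local computation over the whole subgraph, no messages needed.
                int localMax = Integer.MIN_VALUE;
                for (int v : members[s]) {
                    localMax = Math.max(localMax, values[v]);
                }
                int best = localMax;
                for (int message : inbox.get(s)) {
                    best = Math.max(best, message);
                }
                boolean changed = (superstep == 0) || best > localMax;
                for (int v : members[s]) {
                    values[v] = best;
                }
                // Messages travel only over edges that leave the subgraph.
                if (changed) {
                    for (int v : members[s]) {
                        for (int u : neighbors[v]) {
                            if (subgraphOf[u] != s) {
                                outbox.get(subgraphOf[u]).add(best);
                                messagesInFlight = true;
                            }
                        }
                    }
                }
            }
            // Barrier: messages become visible only in the next superstep.
            inbox = outbox;
            superstep++;
        }
        return superstep;
    }

    public static void main(String[] args) {
        int[][] path = { {1}, {0, 2}, {1, 3}, {2} }; // path graph 0-1-2-3
        int[][] members = { {0, 1}, {2, 3} };        // two subgraphs
        int[] values = { 3, 6, 2, 1 };
        int supersteps = propagateMax(path, members, values);
        System.out.println(Arrays.toString(values) + " after " + supersteps + " supersteps");
    }
}
```

On the path 0-1-2-3 split into the subgraphs {0, 1} and {2, 3}, only the edge between vertices 1 and 2 ever carries a message. Everything else is resolved inside a subgraph, whereas a vertex centric run would have to move a value one hop per superstep.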

I gave a talk on this work at the Southern California Fall workshop 2013 at UCLA, and the following are the slides.




You can find a more detailed set of slides on subgraph centric single source shortest path below.




This work was accepted at Euro-Par 2014, and a more detailed technical report can be downloaded from here.


GoFFish is a free and open source project and can be downloaded from here.