Performance of Java matrix math libraries

Question

We are computing something whose runtime is bound by matrix operations    Some details below if interested   This experience prompted the following question   Do folk have experience with the performance of Java libraries for matrix math  e g   multiply  inverse  etc     For example    JAMA COLT Apache commons math   I searched and found nothing     Details of our speed comparison   We are using Intel FORTRAN  ifort  IFORT  10 1 20070913    We have reimplemented it in Java  1 6  using Apache commons math 1 2 matrix ops  and it agrees to all of its digits of accuracy    We have reasons for wanting it in Java     Java doubles  Fortran real 8    Fortran  6 minutes  Java 33 minutes  same machine   jvisualm profiling shows much time spent in RealMatrixImpl  getEntry isValidCoordinate   which appear to be gone in unreleased Apache commons math 2 0  but 2 0 is no faster    Fortran is using Atlas BLAS routines  dpotrf  etc     Obviously this could depend on our code in each language  but we believe most of the time is in equivalent matrix operations   In several other computations that do not involve libraries  Java has not been much slower  and sometimes much faster

User · Answer

I m the main author of jblas and wanted to point out that I ve released Version 1 0 in late December 2009  I worked a lot on the packaging  meaning that you can now just download a  fat jar  with ATLAS and JNI libraries for Windows  Linux  Mac OS X  32 and 64 bit  except for Windows   This way you will get the native performance just by adding the jar file to your classpath  Check it out at http   jblas org

User · Answer

Linalg code that relies heavily on Pentiums and later processors  vector computing capabilities  starting with the MMX extensions  like LAPACK and now Atlas BLAS  is not  fantastically optimized   but simply industry-standard  To replicate that perfomance in Java you are going to need native libraries  I have had the same performance problem as you describe  mainly  to be able to compute Choleski decompositions  and have found nothing really efficient  Jama is pure Java  since it is supposed to be just a template and reference kit for implementers to follow    which never happened  You know Apache math commons    As for COLT  I have still to test it but it seems to rely heavily on Ninja improvements  most of which were reached by building an ad-hoc Java compiler  so I doubt it s going to help  At that point  I think we  just  need a collective effort to build a native Jama implementation

User · Answer

For 3d graphics applications the lwjgl util vector implementation out-performed above mentioned jblas by a factor of about 3   I have done 1 million matrix multiplications of a vec4 with a 4x4 matrix   lwjgl finished in about 18ms  jblas required about 60ms    I assume  that the JNI approach is not very suitable for fast successive application of relatively small multiplications  Since the translation mapping may take more time than the actual execution of the multiplication

User · Answer

We have used COLT for some pretty large serious financial calculations and have been very happy with it  In our heavily profiled code we have almost never had to replace a COLT implementation with one of our own   In their own testing  obviously not independent  I think they claim within a factor of 2 of the Intel hand-optimised assembler routines  The trick to using it well is making sure that you understand their design philosophy  and avoid extraneous object allocation

User · Answer

There s also UJMP

User · Answer

I m the author of Java Matrix Benchmark  JMatBench  and I ll give my thoughts on this discussion   There are significant difference between Java libraries and while there is no clear winner across the whole range of operations  there are a few clear leaders as can be seen in the latest performance results  October 2013    If you are working with  large  matrices and can use native libraries  then the clear winner  about 3 5x faster  is MTJ with system optimised netlib  If you need a pure Java solution then MTJ  OjAlgo  EJML and Parallel Colt are good choices   For small matrices EJML is the clear winner   The libraries I did not mention showed significant performance issues or were missing key features

User · Answer

You should add Apache Mahout to your shopping list

User · Answer

Have you taken a look at the Intel Math Kernel Library  It claims to outperform even ATLAS  MKL can be used in Java through JNI wrappers

User · Answer

There are many different freely available java linear algebra libraries  http   www ujmp org java-matrix benchmark  Unfortunately that benchmark only gives you info about matrix multiplication  with transposing the test does not allow the different libraries to exploit their respective design features    What you should look at is how these linear algebra libraries perform when asked to compute various matrix decompositions  http   ojalgo org matrix compare html

User · Answer

Matrix Tookits Java  MTJ  was already mentioned before  but perhaps it s worth mentioning again for anyone else stumbling onto this thread  For those interested  it seems like there s also talk about having MTJ replace the linalg library in the apache commons math 2 0  though I m not sure how that s progressing lately

User · Answer

I just compared Apache Commons Math with jlapack   Test  singular value decomposition of a random 1024x1024 matrix   Machine  Intel R  Core TM 2 Duo CPU     E6750    2 66GHz  linux x64  Octave code  A rand 1024   tic  U S V  svd A  toc   results                                execution time --------------------------------------------------------- Octave                                 36 34 sec  JDK 1 7u2 64bit     jlapack dgesvd                     37 78 sec     apache commons math SVD            42 24 sec   JDK 1 6u30 64bit     jlapack dgesvd                     48 68 sec     apache commons math SVD            50 59 sec  Native routines Lapack  invoked from C                 37 64 sec Intel MKL                               6 89 sec      My conclusion is that jlapack called from JDK 1 7 is very close to the native binary performance of lapack  I used the lapack binary library coming with linux distro and invoked the dgesvd routine to get the U S and VT matrices as well  All tests were done using double precision on exactly the same matrix each run  except Octave    Disclaimer - I m not an expert in linear algebra  not affiliated to any of the libraries above and this is not a rigorous benchmark  It s a  home-made  test  as I was interested comparing the performance increase of JDK 1 7 to 1 6 as well as commons math SVD to jlapack

User · Answer

You may want to check out the jblas project  It s a relatively new Java library that uses BLAS  LAPACK and ATLAS for high-performance matrix operations    The developer has posted some benchmarks in which jblas comes off favourably against MTJ and Colt

User · Answer

Jeigen https   github com hughperkins jeigen    wraps Eigen C   library http   eigen tuxfamily org   which is one of the fastest free C   libraries available relatively terse syntax  eg  mmul    sub  handles both dense and sparse matrices   A quick test  by multiplying two dense matrices  ie   import static jeigen MatrixUtil     int K   100  int N   100000  DenseMatrix A   rand N  K   DenseMatrix B   rand K  N   Timer timer   new Timer    DenseMatrix C   B mmul A   timer printTimeCheckMilliseconds      Results   Jama  4090 ms Jblas  1594 ms Ojalgo  2381 ms  using two threads  Jeigen  2514 ms    Compared to jama  everything is faster  -P Compared to jblas  Jeigen is not quite as fast  but it handles sparse matrices  Compared to ojalgo  Jeigen takes about the same amount of elapsed time  but only using one core  so Jeigen uses half the total cpu   Jeigen has a terser syntax  ie  mmul  versus  multiplyRight

User · Answer

Building on Varkhan s post that Pentium-specific native code would do better    jBLAS  An alpha-stage project with JNI wrappers for Atlas  http   www jblas org    Author s blog post  http   mikiobraun blogspot com 2008 10 matrices-jni-directbuffers-and-number html   MTJ  Another such project  http   code google com p matrix-toolkits-java

User · Answer

There s a benchmark of various matrix packages available in java up on http   code google com p java-matrix-benchmark  for a few different hardware configurations  But it s no substitute for doing your own benchmark   Performance is going to vary with the type of hardware you ve got  cpu  cores  memory  L1-3 cache  bus speed   the size of the matrices and the algorithms you intend to use   Different libraries have different takes on concurrency for different algorithms  so there s no single answer   You may also find that the overhead of translating to the form expected by a native library negates the performance advantage for your use case  some of the java libraries have more flexible options regarding matrix storage  which can be used for further performance optimizations    Generally though  JAMA  Jampack and COLT are getting old  and do not represent the state of the current performance available in Java for linear algebra   More modern libraries make more effective use of multiple cores and cpu caches   JAMA was a reference implementation  and pretty much implements textbook algorithms with little regard to performance   COLT and IBM Ninja were the first java libraries to show that performance was possible in java  even if they lagged 50  behind native libraries

User · Answer

I m the author of la4j  Linear Algebra for Java  library and here is my point  I ve been working on la4j for 3 years  the latest release is 0 4 0  01 Jun 2013   and only now I can start doing performace analysis and optimizations since I ve just covered the minimal required functional  So  la4j isn t as fast as I wanted but I m spending loads of my time to change it   I m currently in the middle of porting new version of la4j to JMatBench platform  I hope new version will show better performance then previous one since there are several improvement I made in la4j such as much faster internal matrix format  unsafe accessors and fast blocking algorithm for matrix multiplications

User · Answer

Just to add my 2 cents  I ve compared some of these libraries  I attempted to matrix multiply a 3000 by 3000 matrix of doubles with itself  The results are as follows   Using multithreaded ATLAS with C C    Octave  Python and R  the time taken was around 4 seconds   Using Jama with Java  the time taken was 50 seconds   Using Colt and Parallel Colt with Java  the time taken was 150 seconds   Using JBLAS with Java  the time taken was again around 4 seconds as JBLAS uses multithreaded ATLAS   So for me it was clear that the Java libraries didn t perform too well  However if someone has to code in Java  then the best option is JBLAS  Jama  Colt and Parallel Colt are not fast

User · Answer

I can t really comment on specific libraries  but in principle there s little reason for such operations to be slower in Java  Hotspot generally does the kinds of things you d expect a compiler to do  it compiles basic math operations on Java variables to corresponding machine instructions  it uses SSE instructions  but only one per operation   accesses to elements of an array are compiled to use  raw  MOV instructions as you d expect  it makes decisions on how to allocate variables to registers when it can  it re-orders instructions to take advantage of processor architecture    A possible exception is that as I mentioned  Hotspot will only perform one operation per SSE instruction  in principle you could have a fantastically optimised matrix library that performed multiple operations per instruction  although I don t know if  say  your particular FORTRAN library does so or if such a library even exists  If it does  there s currently no way for Java  or at least  Hotspot  to compete with that  though you could of course write your own native library with those optimisations to call from Java    So what does all this mean  Well    in principle  it is worth hunting around for a better-performing library  though unfortunately I can t recomend one if performance is really critical to you  I would consider just coding your own matrix operations  because you may then be able perform certain optimisations that a library generally can t  or that a particular library your using doesn t  if you have a multiprocessor machine  find out if the library is actually multithreaded    A hindrance to matrix operations is often data locality issues that arise when you need to traverse both row by row and column by column  e g  in matrix multiplication  since you have to store the data in an order that optimises one or the other  But if you hand-write the code  you can sometimes combine operations to optimise data locality  e g  if you re multiplying a matrix by its transformation  you can turn a column traversal into a row traversal if you write a dedicated function instead of combining two library functions   As usual in life  a library will give you non-optimal performance in exchange for faster development  you need to decide just how important performance is to you

User · Answer

I have found that if you are creating a lot of high dimensional Matrices  you can make Jama about 20  faster if you change it to use a single dimensional array instead of a two dimensional array   This is because Java doesn t support multi-dimensional arrays as efficiently  ie  it creates an array of arrays   Colt does this already  but I have found it is more complicated and more powerful than Jama which may explain why simple functions are slower with Colt   The answer really depends on that you are doing   Jama doesn t support a fraction of the things Colt can do which make make more of a difference

[java] Performance of Java matrix math libraries?

Examples related to java

Examples related to math

Examples related to matrix

Examples related to performance