Disclaimer
The instructions given in this article are what I used to get HPL compiled and running, and they may very well work for you. However, please note that you use these instructions at your very own risk: this website, sgowtham.com, is not responsible for any damage caused to your property, intellectual or otherwise.
What is HPL?
Citing from the website:
HPL is a software package that solves a (random) dense linear system in double precision (64 bits) arithmetic on distributed-memory computers. It can thus be regarded as a portable as well as freely available implementation of the High Performance Computing Linpack Benchmark.
HPL solves a dense linear system of order N, Ax = b, by first computing the LU factorization with row partial pivoting of the N-by-(N+1) augmented matrix [A b]; the solution x is then obtained by solving the upper-triangular system Ux = y. The matrix is divided into NB x NB blocks that are cyclically dealt onto a P x Q grid of processes in a block-cyclic layout.
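Schematically, with the right-hand side carried along through the factorization (a sketch in standard notation, consistent with the HPL documentation):

\[
A x = b, \qquad
[A \; b] \longrightarrow [\,[L,U] \; y\,], \qquad
U x = y
\]

Here L is never applied to b in a separate step: eliminating it happens as the factorization of the augmented matrix proceeds, leaving only the upper-triangular system Ux = y to solve for x.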
Why HPL?
Citing Loreto et al. of MIT, a (super) computer must solve a (random) dense system of linear equations, as prescribed by the Linpack Benchmark, to be ranked in the Top 500 List. Linpack provides a performance measure in GFlops (billions of floating-point operations per second) that is used for ranking. Although there are different implementations of this Linpack Benchmark, HPL (High Performance Linpack) is probably the most popular one. It’s included, by default, in the NPACI Rocks cluster software.
Why HPL on Single Processor Machines?
Our research group has a Beowulf Linux cluster in addition to several single-processor Linux workstations. As these single-processor workstations (which are about 5+ years old) will soon be replaced with state-of-the-art machines, I thought of keeping a record of how good the old machines are, how much better the new ones will be, and, more importantly, how these old/new machines fare when compared with our computing powerhouse. Details regarding one such machine, selected randomly for this task, are summarized below:
HPL Pre-Requisite #0: A Message Passing Interface
The Message Passing Interface (or MPI, as it is better known to mankind) standard defines a software library used to turn serial applications into parallel ones that can run on distributed-memory systems. Typically these systems are clusters of servers or networks of workstations. The standard was created by the MPI Forum in 1994 and is now the de facto standard for parallel programming. Given this definition, it is quite natural to ask: why would anyone need MPI to run a program on a single-processor machine? Well, running a serial program can be thought of as running the parallel version of the same program with one processor, which makes it easier to generalize the idea, if need be, to M processors.
Not surprisingly, there are several different implementations of MPI, but owing to familiarity with its compilation and usage, I decided to stick with MPICH1, a freely available, portable implementation from Argonne National Laboratory. The following steps detail the compilation of MPICH1 against the GNU compilers, which come installed by default on most Linux distributions.
- Download the tar ball from the MPICH1 Download page and save it on the Desktop as $HOME/Desktop/mpich.tar.gz
- Execute the following commands:
cd $HOME
mkdir -p programs/mpich-1.2.7p1
cd /tmp/
tar -zxvpf $HOME/Desktop/mpich.tar.gz
cd mpich-1.2.7p1
export CC=gcc
export CXX=g++
export FC=g77
export F77=g77
export F90=g77
export RSHCOMMAND=ssh
- Run the configure command with the following options:
./configure --prefix=$HOME/programs/mpich-1.2.7p1 --enable-f77 --enable-f90modules
- Run the make and make install commands:
make
make install
- Add the following lines to $HOME/.bashrc:
# MPICH 1.2.7p1 Settings
export MPI="${HOME}/programs/mpich-1.2.7p1"
export PATH="${MPI}/bin:${PATH}"
export LD_LIBRARY_PATH="${MPI}/lib:${LD_LIBRARY_PATH}"
- Source the file:
. $HOME/.bashrc
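To make sure the MPICH1 installation works before moving on, one can compile and run a trivial MPI program with a single process. This is just my sanity check, not part of the official instructions; hello.c and its contents below are an illustration:

# Quick sanity check: compile and run a minimal MPI program with one process
cat > /tmp/hello.c << 'EOF'
#include <stdio.h>
#include <mpi.h>

/* Minimal MPI program: report this process's rank and the world size. */
int main(int argc, char *argv[])
{
  int rank, size;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  printf("Hello from process %d of %d\n", rank, size);
  MPI_Finalize();
  return 0;
}
EOF
mpicc /tmp/hello.c -o /tmp/hello
mpirun -np 1 /tmp/hello

If everything is in order, this prints "Hello from process 0 of 1", which also demonstrates the earlier point: a serial run is just a parallel run with one processor.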
HPL Pre-Requisite #1: Basic Linear Algebra Subprograms
Not surprisingly again, there are several different packages that provide these Basic Linear Algebra Subprograms (BLAS), but owing to the simplicity of compilation, I use ATLAS (Automatically Tuned Linear Algebra Software). The following steps detail the compilation of ATLAS against the GNU compilers, which come installed by default on most Linux distributions.
- Download the tar ball from the ATLAS Download page and save it on the Desktop as $HOME/Desktop/atlas3.6.0.tar.gz
- Execute the following commands:
cd $HOME/programs
tar -zxvpf $HOME/Desktop/atlas3.6.0.tar.gz
mv atlas3.6.0 atlas-3.6.0
cd atlas-3.6.0
make
Follow the on-screen instructions; I just selected the default answers/options for all questions.
- Run the make install command:
make install arch=Linux_P4SSE2
You will need to check the value of arch, though; it must match the architecture name ATLAS chose during its configuration step.
- Add the following lines to $HOME/.bashrc:
# ATLAS 3.6.0 Settings
export ATLAS="${HOME}/programs/atlas-3.6.0"
export LD_LIBRARY_PATH="${ATLAS}/lib/Linux_P4SSE2:${LD_LIBRARY_PATH}"
- Source the file:
. $HOME/.bashrc
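A quick way to confirm that ATLAS produced the libraries HPL will later link against is to list them. The Linux_P4SSE2 directory name below is what ATLAS chose on my machine; substitute whatever architecture name it picked for yours:

# These static libraries should exist after 'make install'
ls -l ${ATLAS}/lib/Linux_P4SSE2/libatlas.a \
      ${ATLAS}/lib/Linux_P4SSE2/libcblas.a \
      ${ATLAS}/lib/Linux_P4SSE2/libf77blas.a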
HPL Compilation
Assuming that the pre-requisites are successfully compiled and installed, compiling HPL itself is quite easy. The following steps detail the compilation of HPL using MPICH1 and ATLAS.
- Download the tar ball from the HPL Download page and save it on the Desktop as $HOME/Desktop/hpl.tgz
- Execute the following commands:
cd $HOME/programs
tar -zxvpf $HOME/Desktop/hpl.tgz
cd hpl
- If MPICH1 and ATLAS were compiled as described before, you may place this file as $HOME/programs/hpl/Make.Linux_P4SSE2
- Read through this file and edit it if necessary; the variables that usually need attention are sketched after this list.
- Clean any previous build:
make clean arch=Linux_P4SSE2
- Build HPL:
make arch=Linux_P4SSE2
- Add the following lines to $HOME/.bashrc:
# HPL Settings
export HPL="${HOME}/programs/hpl"
export PATH="${HPL}/bin/Linux_P4SSE2:${PATH}"
export LD_LIBRARY_PATH="${HPL}/lib/Linux_P4SSE2:${LD_LIBRARY_PATH}"
- Source the file:
. $HOME/.bashrc
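For reference, these are the settings in Make.Linux_P4SSE2 that must agree with the MPICH1 and ATLAS installations described above. This is a sketch of the relevant variables only, not the complete file, and the exact values depend on your paths and the architecture name ATLAS chose:

# Excerpt from Make.Linux_P4SSE2 (paths assume the layout used in this article)
ARCH     = Linux_P4SSE2
TOPdir   = $(HOME)/programs/hpl
MPdir    = $(HOME)/programs/mpich-1.2.7p1
MPinc    = -I$(MPdir)/include
MPlib    = $(MPdir)/lib/libmpich.a
LAdir    = $(HOME)/programs/atlas-3.6.0/lib/Linux_P4SSE2
LAinc    =
LAlib    = $(LAdir)/libcblas.a $(LAdir)/libatlas.a
HPL_OPTS = -DHPL_CALL_CBLAS
CC       = $(MPdir)/bin/mpicc
LINKER   = $(MPdir)/bin/mpif77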
Preparing The Input – HPL.dat
A very detailed description of each input parameter is given here, and the copy of HPL.dat I used to run the program may be found here. N is the problem size (an N x N system of linear equations) and is limited by the available memory. NB defines the block size; P and Q denote the dimensions of the P x Q process grid; and so on. An abridged example follows.
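To give an idea of the format, here is an abridged HPL.dat for a single-processor run. The values of N and NB shown are placeholders for illustration (the values I actually used are in the file linked above), and the remaining algorithmic parameters are omitted here:

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
5000         Ns
1            # of NBs
128          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
1            Qs
16.0         threshold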
Running HPL
cd $HOME
mkdir -p test_runs/HPL
cd test_runs/HPL
cp $HOME/Desktop/HPL.dat .
mpirun -np 1 $HOME/programs/hpl/bin/Linux_P4SSE2/xhpl 2>&1 | tee HPL.out
It took me several attempts to figure out a combination of N and NB that would run successfully; a rule of thumb for picking N is sketched below.
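The HPL FAQ suggests sizing N so that the N x N matrix of 8-byte doubles fills roughly 80% of the machine's total memory; going much beyond that causes swapping and kills performance. A one-liner to estimate this (my own sketch, using /proc/meminfo, which reports MemTotal in kB):

# Estimate the largest sensible problem size: N ~ sqrt(0.80 * mem_bytes / 8)
awk '/^MemTotal/ { printf "Suggested N: about %d\n", sqrt(0.80 * $2 * 1024 / 8) }' /proc/meminfo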
HPL Output
After solving the N x N system of linear equations, HPL checks the solution by computing three residuals, all of which must fall below a threshold for the results to be considered valid. From the output file, I find that the calculation took 326.94 seconds and achieved a performance of 1.121 GFlops. As can be seen from the output, all the residual tests passed.
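For the record, the scaled residuals have the following form, where ε is the relative machine precision (reconstructed from the HPL documentation; older versions of HPL, like the one used here, report three such checks):

\[
\frac{\lVert Ax-b \rVert_\infty}{\varepsilon \, \lVert A \rVert_1 \, N}, \qquad
\frac{\lVert Ax-b \rVert_\infty}{\varepsilon \, \lVert A \rVert_1 \, \lVert x \rVert_1}, \qquad
\frac{\lVert Ax-b \rVert_\infty}{\varepsilon \, \lVert A \rVert_\infty \, \lVert x \rVert_\infty \, N}
\]

The reported GFlops figure uses the standard LU operation count, \( (\tfrac{2}{3}N^3 + \tfrac{3}{2}N^2) / (t \times 10^9) \) for a solve taking t seconds.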
The complete output file may be downloaded from here.
Request
The test runs I did are neither complete nor exhaustive. If you have better ideas, tips, tricks, approaches and/or understanding of HPL (or associated programs), please post them as comments using the form below. I would greatly appreciate them, as would many other readers.
Just wanted to mention that this helped me a lot in getting HPL working on my Lemote Fulong this past week. It took a bit of wrangling with configurations to get things happy, but once I got things working I was able to get some interesting numbers out of HPL.
Hey, I don’t know if you still read this. I am just teaching myself how to build a Beowulf cluster, blah blah.
One thing I want to add about compiling HPL is that the order in which you link the libraries matters.
If you don't link them in the right order (which is -lf77blas -lcblas -latlas for me),
you'll end up getting errors; at least I did.
Philipp.
@Philipp:
Thanks for the information about the ordering of libraries during the compilation process :)