Disclaimer
The instructions given in this article are what I used to get HPL compiled and running, and they may very well work for you. However, please note that you use these instructions at your very own risk: this website, sgowtham.com, is not responsible for any damage caused to your property, intellectual or otherwise.
What is HPL?
Citing from the website:
HPL is a software package that solves a (random) dense linear system in double precision (64 bits) arithmetic on distributed-memory computers. It can thus be regarded as a portable as well as freely available implementation of the High Performance Computing Linpack Benchmark.
HPL solves a dense linear system of order N, Ax = b, by first computing the LU factorization with row partial pivoting of the N-by-(N+1) augmented matrix [A b]; the solution x is then obtained by solving the upper-triangular system Ux = y. The matrix is divided into NB x NB blocks that are cyclically dealt onto a P x Q grid of processes in a block-cyclic layout.
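Schematically, with the right-hand side carried along through the factorization (a sketch in standard notation, consistent with the HPL documentation):

\[
A x = b, \qquad
[A \; b] \longrightarrow [\,[L,U] \; y\,], \qquad
U x = y
\]

Here L is never applied to b in a separate step: eliminating it happens as the factorization of the augmented matrix proceeds, leaving only the upper-triangular system Ux = y to solve for x.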
Why HPL?
Citing Loreto et al. of MIT, a (super) computer must solve a (random) dense system of linear equations, as prescribed by the Linpack Benchmark, to be ranked in the Top 500 List. Linpack provides a performance measure in GFlops (billions of floating-point operations per second) that is used for ranking. Although there are different implementations of this Linpack Benchmark, HPL (High Performance Linpack) is probably the most popular one. It’s included, by default, in the NPACI Rocks cluster software.
Why HPL on Single Processor Machines?
Our research group has a Beowulf Linux cluster in addition to several single-processor Linux workstations. As these single-processor workstations (which are about 5+ years old) will soon be replaced with state-of-the-art machines, I thought of keeping a record of how good the old machines are, how much better the new ones will be, and, more importantly, how these old/new machines fare when compared with our computing powerhouse. Details regarding one such machine, selected randomly for this task, are summarized below:
HPL Pre-Requisite #0: A Message Passing Interface
The Message Passing Interface (or MPI, as it is better known to mankind) standard defines a software library used to turn serial applications into parallel ones that can run on distributed-memory systems. Typically these systems are clusters of servers or networks of workstations. The standard was created by the MPI Forum in 1994 and is now the de facto standard for parallel programming. Given this definition, it is quite natural to ask: why would anyone need MPI to run a program on a single-processor machine? Well, running a serial program can be thought of as running the parallel version of the same program with one processor, which makes it easier to generalize the idea, if need be, to M processors.
Not surprisingly, there are several different implementations of MPI, but owing to familiarity with its compilation and usage, I decided to stick with MPICH1, a freely available, portable implementation from Argonne National Laboratory. The following steps detail the compilation of MPICH1 against the GNU compilers, which come installed by default on most Linux distributions.
- Download the tar ball from the MPICH1 Download page and save it on the Desktop as $HOME/Desktop/mpich.tar.gz
- Execute the following commands:
cd $HOME
mkdir -p programs/mpich-1.2.7p1
cd /tmp/
tar -zxvpf $HOME/Desktop/mpich.tar.gz
cd mpich-1.2.7p1
export CC=gcc
export CXX=g++
export FC=g77
export F77=g77
export F90=g77
export RSHCOMMAND=ssh
- Run the configure command with the following options:
./configure --prefix=$HOME/programs/mpich-1.2.7p1 --enable-f77 --enable-f90modules
- Run the make and make install commands:
make
make install
- Add the following lines to $HOME/.bashrc:
# MPICH 1.2.7p1 Settings
export MPI="${HOME}/programs/mpich-1.2.7p1"
export PATH="${MPI}/bin:${PATH}"
export LD_LIBRARY_PATH="${MPI}/lib:${LD_LIBRARY_PATH}"
- Source the file:
. $HOME/.bashrc
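To make sure the MPICH1 installation works before moving on, one can compile and run a trivial MPI program with a single process. This is just my sanity check, not part of the official instructions; hello.c and its contents below are an illustration:

# Quick sanity check: compile and run a minimal MPI program with one process
cat > /tmp/hello.c << 'EOF'
#include <stdio.h>
#include <mpi.h>

/* Minimal MPI program: report this process's rank and the world size. */
int main(int argc, char *argv[])
{
  int rank, size;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  printf("Hello from process %d of %d\n", rank, size);
  MPI_Finalize();
  return 0;
}
EOF
mpicc /tmp/hello.c -o /tmp/hello
mpirun -np 1 /tmp/hello

If everything is in order, this prints "Hello from process 0 of 1", which also demonstrates the earlier point: a serial run is just a parallel run with one processor.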
HPL Pre-Requisite #1: Basic Linear Algebra Subprograms
Not surprisingly again, there are several different packages that provide these Basic Linear Algebra Subprograms (BLAS), but owing to the simplicity of compilation, I use ATLAS (Automatically Tuned Linear Algebra Software). The following steps detail the compilation of ATLAS against the GNU compilers, which come installed by default on most Linux distributions.
- Download the tar ball from the ATLAS Download page and save it on the Desktop as $HOME/Desktop/atlas3.6.0.tar.gz
- Execute the following commands:
cd $HOME/programs
tar -zxvpf $HOME/Desktop/atlas3.6.0.tar.gz
mv atlas3.6.0 atlas-3.6.0
cd atlas-3.6.0
make
Follow the on-screen instructions; I just selected the default answers/options for all questions.
- Run the make install command:
make install arch=Linux_P4SSE2
You will need to check the value of arch, though; it must match the architecture name ATLAS chose during its configuration step.
- Add the following lines to $HOME/.bashrc:
# ATLAS 3.6.0 Settings
export ATLAS="${HOME}/programs/atlas-3.6.0"
export LD_LIBRARY_PATH="${ATLAS}/lib/Linux_P4SSE2:${LD_LIBRARY_PATH}"
- Source the file:
. $HOME/.bashrc
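A quick way to confirm that ATLAS produced the libraries HPL will later link against is to list them. The Linux_P4SSE2 directory name below is what ATLAS chose on my machine; substitute whatever architecture name it picked for yours:

# These static libraries should exist after 'make install'
ls -l ${ATLAS}/lib/Linux_P4SSE2/libatlas.a \
      ${ATLAS}/lib/Linux_P4SSE2/libcblas.a \
      ${ATLAS}/lib/Linux_P4SSE2/libf77blas.a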
HPL Compilation
Assuming that the pre-requisites are successfully compiled and installed, compiling HPL itself is quite easy. The following steps detail the compilation of HPL using MPICH1 and ATLAS.
- Download the tar ball from the HPL Download page and save it on the Desktop as $HOME/Desktop/hpl.tgz
- Execute the following commands:
cd $HOME/programs
tar -zxvpf $HOME/Desktop/hpl.tgz
cd hpl
- If MPICH1 and ATLAS were compiled as described before, you may place this file as $HOME/programs/hpl/Make.Linux_P4SSE2
- Read through this file and edit it if necessary; the variables that usually need attention are sketched after this list.
- Clean any previous build:
make clean arch=Linux_P4SSE2
- Build HPL:
make arch=Linux_P4SSE2
- Add the following lines to $HOME/.bashrc:
# HPL Settings
export HPL="${HOME}/programs/hpl"
export PATH="${HPL}/bin/Linux_P4SSE2:${PATH}"
export LD_LIBRARY_PATH="${HPL}/lib/Linux_P4SSE2:${LD_LIBRARY_PATH}"
- Source the file:
. $HOME/.bashrc
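For reference, these are the settings in Make.Linux_P4SSE2 that must agree with the MPICH1 and ATLAS installations described above. This is a sketch of the relevant variables only, not the complete file, and the exact values depend on your paths and the architecture name ATLAS chose:

# Excerpt from Make.Linux_P4SSE2 (paths assume the layout used in this article)
ARCH     = Linux_P4SSE2
TOPdir   = $(HOME)/programs/hpl
MPdir    = $(HOME)/programs/mpich-1.2.7p1
MPinc    = -I$(MPdir)/include
MPlib    = $(MPdir)/lib/libmpich.a
LAdir    = $(HOME)/programs/atlas-3.6.0/lib/Linux_P4SSE2
LAinc    =
LAlib    = $(LAdir)/libcblas.a $(LAdir)/libatlas.a
HPL_OPTS = -DHPL_CALL_CBLAS
CC       = $(MPdir)/bin/mpicc
LINKER   = $(MPdir)/bin/mpif77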
Preparing The Input – HPL.dat
A very detailed description of each input parameter is given here, and the copy of HPL.dat I used to run the program may be found here. N is the problem size (an N x N system of linear equations) and is limited by the available memory. NB defines the block size; P and Q denote the dimensions of the P x Q process grid; and so on. An abridged example follows.
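To give an idea of the format, here is an abridged HPL.dat for a single-processor run. The values of N and NB shown are placeholders for illustration (the values I actually used are in the file linked above), and the remaining algorithmic parameters are omitted here:

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
5000         Ns
1            # of NBs
128          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
1            Qs
16.0         threshold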
Running HPL
cd $HOME
mkdir -p test_runs/HPL
cd test_runs/HPL
cp $HOME/Desktop/HPL.dat .
mpirun -np 1 $HOME/programs/hpl/bin/Linux_P4SSE2/xhpl 2>&1 | tee HPL.out
It took me several attempts to figure out a combination of N and NB that would run successfully; a rule of thumb for picking N is sketched below.
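The HPL FAQ suggests sizing N so that the N x N matrix of 8-byte doubles fills roughly 80% of the machine's total memory; going much beyond that causes swapping and kills performance. A one-liner to estimate this (my own sketch, using /proc/meminfo, which reports MemTotal in kB):

# Estimate the largest sensible problem size: N ~ sqrt(0.80 * mem_bytes / 8)
awk '/^MemTotal/ { printf "Suggested N: about %d\n", sqrt(0.80 * $2 * 1024 / 8) }' /proc/meminfo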
HPL Output
After solving the N x N system of linear equations, HPL checks the solution by computing three residuals, all of which must fall below a threshold for the results to be considered valid. From the output file, I find that the calculation took 326.94 seconds and achieved a performance of 1.121 GFlops. As can be seen from the output, all the residual tests passed.
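For the record, the scaled residuals have the following form, where ε is the relative machine precision (reconstructed from the HPL documentation; older versions of HPL, like the one used here, report three such checks):

\[
\frac{\lVert Ax-b \rVert_\infty}{\varepsilon \, \lVert A \rVert_1 \, N}, \qquad
\frac{\lVert Ax-b \rVert_\infty}{\varepsilon \, \lVert A \rVert_1 \, \lVert x \rVert_1}, \qquad
\frac{\lVert Ax-b \rVert_\infty}{\varepsilon \, \lVert A \rVert_\infty \, \lVert x \rVert_\infty \, N}
\]

The reported GFlops figure uses the standard LU operation count, \( (\tfrac{2}{3}N^3 + \tfrac{3}{2}N^2) / (t \times 10^9) \) for a solve taking t seconds.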
The complete output file may be downloaded from here.
Request
The test runs I did are neither complete nor exhaustive. If you have better ideas, tips, tricks, approaches and/or understanding of HPL (or associated programs), please post them as comments using the form below. I would greatly appreciate them, as would many other readers.
Just wanted to mention that this helped me a lot in getting HPL working on my Lemote Fulong this past week. It took a bit of wrangling with configurations to get things happy, but once I got things working I was able to get some interesting numbers out of HPL.
Hey, I don’t know if you still read this. I am just teaching myself how to build a Beowulf cluster, blah blah.
One thing I want to add about compiling HPL is that the order in which you link the libraries matters.
If you don't link them in the right order (which is -lf77blas -lcblas -latlas for me),
you'll end up getting errors; at least I did.
Philipp.
@Philipp:
Thanks for the information about the ordering of libraries during the compilation process :)