HPL Benchmark For Single Processor Machines


The instructions given below are what I used to get HPL compiled and running, and they may very well work for you. However, please note that you use these instructions at your very own risk; this website, sgowtham.com, is not responsible for any/all damage caused to your property, intellectual or otherwise.

What is HPL?

Quoting from the HPL website:

HPL is a software package that solves a (random) dense linear system in double precision (64 bits) arithmetic on distributed-memory computers. It can thus be regarded as a portable as well as freely available implementation of the High Performance Computing Linpack Benchmark.

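HPL solves a system Ax = b of order N by first computing the LU factorization, with row partial pivoting, of the N x (N+1) coefficient matrix [A, b]. Because the lower triangular factor L is applied to b as the factorization progresses, producing a vector y, the solution x is then obtained by solving the upper triangular system Ux = y. The matrix is divided into NB x NB blocks that are dealt cyclically onto a P x Q process grid (a two-dimensional block-cyclic layout).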
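In matrix notation, writing Π for the row permutation produced by partial pivoting (a symbol chosen here so it does not clash with the grid dimension P), the steps described above amount to the first line below. The second line is the standard two-dimensional block-cyclic rule for which process owns entry a_ij; it assumes 0-based indices and a grid whose first block sits on process (0, 0), and "owner" is only an illustrative name. With P = Q = 1, as on a single-processor machine, both mod terms are 0 and every block lands on the one process.

```latex
\begin{aligned}
&\Pi A = L U, \qquad L y = \Pi b, \qquad U x = y,\\
&\operatorname{owner}(a_{ij}) = \Bigl(\bigl\lfloor i/\mathrm{NB} \bigr\rfloor \bmod P,\ \bigl\lfloor j/\mathrm{NB} \bigr\rfloor \bmod Q\Bigr).
\end{aligned}
```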

Why HPL?

According to Loreto et al. of MIT, a (super)computer must solve a (random) dense system of linear equations, as prescribed by the Linpack Benchmark, to be ranked in the TOP500 list. Linpack provides a performance measure in GFlops (billions of floating point operations per second) that is used for the ranking. Although there are different implementations of the Linpack Benchmark, HPL (High Performance Linpack) is probably the most popular one. It is included by default in the NPACI Rocks cluster software.

Why HPL on Single Processor Machines?

Besides a Beowulf Linux cluster, our research group has several single-processor Linux workstations. Because these workstations (which are more than five years old) will soon be replaced with state-of-the-art machines, I wanted to keep a record of how good the old machines are, how much better the new ones will be and, more importantly, how both compare with our computing powerhouse. Details of one such machine, selected at random for this task, are summarized below:
System Details

HPL Pre-Requisite #0: A Message Passing Interface

The Message Passing Interface (or MPI, as it is better known) standard defines a software library used to turn serial applications into parallel ones that can run on distributed-memory systems, typically clusters of servers or networks of workstations. The standard was created by the MPI Forum in 1994 and is now the de facto standard for parallel programming. Given this definition, it is natural to ask why anyone would need MPI to run a program on a single-processor machine. Running a serial program can be thought of as running the parallel version of the same program on one processor, which makes it easy to generalize the idea to M processors if need be.

Not surprisingly, there are several implementations of MPI, but because I am familiar with compiling and using it, I decided to stick with MPICH1, a freely available, portable implementation from Argonne National Laboratory. The following steps detail the compilation of MPICH1 with the GNU compilers, which come installed by default on most Linux distributions.

  1. Download the tarball from the MPICH1 download page and save it on the Desktop as $HOME/Desktop/mpich.tar.gz
  2. cd $HOME
    mkdir -p programs/mpich-1.2.7p1
    cd /tmp/
    tar -zxvpf $HOME/Desktop/mpich.tar.gz
    cd mpich-1.2.7p1
    export CC=gcc
    export CXX=g++
    export FC=g77
    export F77=g77
    export F90=g77
    export RSHCOMMAND=ssh
  3. Run the configure command with the following options:
    ./configure --prefix=$HOME/programs/mpich-1.2.7p1 --enable-f77 --enable-f90modules
  4. Run the make and make install commands:
    make
    make install
  5. Add the following lines to $HOME/.bashrc:
    # MPICH 1.2.7p1 Settings
    export MPI="${HOME}/programs/mpich-1.2.7p1"
    export PATH="${MPI}/bin:${PATH}"
    export LD_LIBRARY_PATH="${MPI}/lib:${LD_LIBRARY_PATH}"
  6. Source the file:
    . $HOME/.bashrc
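Before moving on, it may be worth confirming that the MPICH1 wrappers are the ones your shell now finds. The bash function below is a small sketch for that; the name require_tools is made up for this example, and the tool names in the usage comment are the usual MPICH1 wrapper names, so adjust them if your build differs.

```bash
# Hypothetical helper: report whether each named command is on PATH.
# Returns 0 when every command is found, 1 otherwise.
require_tools() {
    local tool status=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage (after sourcing $HOME/.bashrc); each path should point into
# $HOME/programs/mpich-1.2.7p1/bin:
#   require_tools mpicc mpif77 mpirun
```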

HPL Pre-Requisite #1: Basic Linear Algebra Subprograms

Not surprisingly again, several packages provide these Basic Linear Algebra Subprograms (BLAS), but owing to its simplicity of compilation, I use ATLAS (Automatically Tuned Linear Algebra Software). The following steps detail the compilation of ATLAS with the GNU compilers, which come installed by default on most Linux distributions.

  1. Download the tarball from the ATLAS download page and save it on the Desktop as $HOME/Desktop/atlas3.6.0.tar.gz
  2. Execute the following commands
    cd $HOME/programs
    tar -zxvpf $HOME/Desktop/atlas3.6.0.tar.gz
    mv atlas3.6.0 atlas-3.6.0
    cd atlas-3.6.0

    and follow the on-screen instructions. I just selected the default answers/options for all questions.

  3. Run the make install command.
    make install arch=Linux_P4SSE2

    You will need to check the arch value, though: use the architecture name ATLAS picked for your machine during configuration (Linux_P4SSE2 here).

  4. Add the following lines to $HOME/.bashrc:
    # ATLAS 3.6.0 Settings
    export ATLAS="${HOME}/programs/atlas-3.6.0"
    export LD_LIBRARY_PATH="${ATLAS}/lib/Linux_P4SSE2:${LD_LIBRARY_PATH}"
  5. Source the file:
    . $HOME/.bashrc
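A similar sanity check for ATLAS: the sketch below, with a made-up function name check_libs, tests whether the expected static libraries exist in the per-architecture lib directory. The library names are the ones an ATLAS 3.6 build typically produces, but treat the list as an assumption and compare it with what your build actually left there.

```bash
# Hypothetical helper: check that each named library file exists in a directory.
# Usage: check_libs DIR LIB...   Returns 0 when all are present, 1 otherwise.
check_libs() {
    local dir=$1 lib status=0
    shift
    for lib in "$@"; do
        if [ -f "$dir/$lib" ]; then
            echo "ok:      $dir/$lib"
        else
            echo "missing: $dir/$lib" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage (library names assumed from a typical ATLAS 3.6 build):
#   check_libs "$ATLAS/lib/Linux_P4SSE2" libatlas.a libcblas.a libf77blas.a
#   echo "$LD_LIBRARY_PATH"   # should list both the MPICH and ATLAS lib directories
```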

HPL Compilation

Assuming the prerequisites were compiled and installed successfully, compiling HPL itself is quite easy. The following steps detail the compilation of HPL against MPICH1 and ATLAS.

  1. Download the tarball from the HPL download page and save it on the Desktop as $HOME/Desktop/hpl.tgz
  2. cd $HOME/programs
    tar -zxvpf $HOME/Desktop/hpl.tgz
    cd hpl
  3. If MPICH1 and ATLAS were compiled as described above, you may save this file as $HOME/programs/hpl/Make.Linux_P4SSE2
  4. Read through this file and edit it if necessary.
  5. make clean arch=Linux_P4SSE2
  6. make arch=Linux_P4SSE2
  7. Add the following lines to $HOME/.bashrc:
    # HPL Settings
    export HPL="${HOME}/programs/hpl"
    export PATH="${HPL}/bin/Linux_P4SSE2:${PATH}"
    export LD_LIBRARY_PATH="${HPL}/lib/Linux_P4SSE2:${LD_LIBRARY_PATH}"
  8. Source the file:
    . $HOME/.bashrc
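If you would rather start from one of the template Make files in HPL's setup directory, the variables that usually need changing are ARCH, TOPdir, MPdir, LAdir and LAlib. The bash sketch below, with a hypothetical helper set_make_var, rewrites one such assignment in place. It assumes GNU sed (for -i), a template named setup/Make.Linux_PII_CBLAS, and the install paths used in this post; check all three against your system. The LAlib value lists libcblas.a before libatlas.a because, with static libraries, the linker resolves symbols left to right and libcblas.a depends on libatlas.a, which is the link-order issue a commenter raises below.

```bash
# Hypothetical helper: replace the "VAR = ..." line in an HPL Make.<arch> file.
# Usage: set_make_var FILE VAR VALUE   (VALUE must not contain '|', '&' or '\')
set_make_var() {
    local file=$1 var=$2 value=$3
    # Refuse to edit if VAR is not assigned at the start of some line.
    grep -q "^${var}[[:space:]]*=" "$file" || { echo "no $var in $file" >&2; return 1; }
    # '|' is the sed delimiter because values usually contain slashes (GNU sed -i).
    sed -i "s|^${var}[[:space:]]*=.*|${var} = ${value}|" "$file"
}

# Usage (single quotes keep $(HOME) and $(LAdir) literal for make):
#   cd $HOME/programs/hpl
#   cp setup/Make.Linux_PII_CBLAS Make.Linux_P4SSE2
#   set_make_var Make.Linux_P4SSE2 ARCH   Linux_P4SSE2
#   set_make_var Make.Linux_P4SSE2 TOPdir '$(HOME)/programs/hpl'
#   set_make_var Make.Linux_P4SSE2 MPdir  '$(HOME)/programs/mpich-1.2.7p1'
#   set_make_var Make.Linux_P4SSE2 LAdir  '$(HOME)/programs/atlas-3.6.0/lib/Linux_P4SSE2'
#   set_make_var Make.Linux_P4SSE2 LAlib  '$(LAdir)/libcblas.a $(LAdir)/libatlas.a'
```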

Preparing The Input – HPL.dat

A very detailed description of each input parameter is given here, and the copy of HPL.dat I used to run the program may be found here. N is the problem size (an N x N system of linear equations) and is limited by the available memory. NB is the block size, and P and Q define the P x Q process grid, which is 1 x 1 for a single processor. The remaining parameters are covered in the detailed description linked above.
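Since the N x N matrix of doubles alone needs 8 x N x N bytes, a first guess for N can be computed from the installed memory. The sketch below uses a hypothetical helper hpl_max_n and the common rule of thumb of filling about 80% of memory (an assumption, not something HPL prescribes), then rounds down to a multiple of NB so the matrix splits into whole blocks.

```bash
# Hypothetical helper: rough upper bound for HPL's N.
# Arguments: memory in kB, NB, optional memory fraction (default 0.8, a rule of thumb).
hpl_max_n() {
    local mem_kb=$1 nb=$2 frac=${3:-0.8}
    awk -v kb="$mem_kb" -v nb="$nb" -v f="$frac" 'BEGIN {
        n = int(sqrt(kb * 1024 * f / 8))   # largest N whose 8*N*N-byte matrix fits the budget
        print int(n / nb) * nb             # round down to a multiple of NB
    }'
}

# Usage on Linux, reading total memory from /proc/meminfo (NB=64 is only an example):
#   hpl_max_n "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" 64
```

Expect to step N down from this value if the machine starts swapping, since the operating system and other processes need memory too.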

Running HPL

cd $HOME
mkdir -p test_runs/HPL
cd test_runs/HPL
cp $HOME/Desktop/HPL.dat .
mpirun -np 1 $HOME/programs/hpl/bin/Linux_P4SSE2/xhpl 2>&1 | tee HPL.out

It took me several attempts to find a combination of N and NB that would run successfully.
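When trying several combinations, HPL writes one result line per test, starting with a T/V code such as WR00L2L2. The sketch below, a hypothetical hpl_results helper, pulls those lines out of the tee'd HPL.out; it assumes the column order T/V, N, NB, P, Q, Time, Gflops shown in HPL's result header, so check it against the header line in your own output.

```bash
# Hypothetical helper: summarize every result line in an HPL output file.
# Assumes result lines start with "W", then "R" or "C", then a digit (e.g. WR00L2L2).
hpl_results() {
    awk '/^W[RC][0-9]/ {
        print "N=" $2, "NB=" $3, "P=" $4, "Q=" $5, "time=" $6, "gflops=" $7
    }' "$1"
}

# Usage, sorted by Gflops (best last):
#   hpl_results HPL.out | sort -t= -k7 -g
```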

HPL Output

After solving the N x N system of linear equations, HPL checks the solution by computing three residuals, all of which must be below a threshold for the results to be considered valid. From the output file, the calculation took 326.94 seconds and achieved 1.121 GFlops. As can be seen from the output, all three residual tests passed.
[Figure: HPL Output]

Complete output file may be downloaded from here.
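The Gflops figure can be recomputed from N and the wall time. To my understanding, HPL's test driver counts 2/3 N^3 + 3/2 N^2 floating point operations for a solve of order N, and the sketch below (a hypothetical helper hpl_gflops) simply divides that count by the time. N is not stated in this post; as a purely hypothetical example, N = 8192 with the reported 326.94 seconds works out to about 1.121 Gflops.

```bash
# Hypothetical helper: Gflops from problem size N and wall time in seconds,
# using the operation count 2/3*N^3 + 3/2*N^2.
hpl_gflops() {
    awk -v n="$1" -v t="$2" 'BEGIN {
        ops = (2.0 / 3.0) * n^3 + 1.5 * n^2
        printf "%.3f\n", ops / t / 1e9
    }'
}

# Usage:
#   hpl_gflops 8192 326.94   # N=8192 is hypothetical; prints 1.121
```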


The test runs I did are neither complete nor exhaustive. If you have better ideas, tips, tricks, approaches and/or understanding of HPL (or associated programs), please post them as comments using the form below. I would greatly appreciate them, as would many other readers.

3 Replies to “HPL Benchmark For Single Processor Machines”

  1. Just wanted to mention that this helped me a lot in getting HPL working on my Lemote Fulong this past week. It took a bit of wrangling with configurations to get things happy, but once I got things working I was able to get some interesting numbers out of HPL.

  2. Hey, I don’t know if you still read this. I am just teaching myself on how to build a beowulf cluster blah blah.
    One thing I want to add to compiling HPL is that it is important in which order you link the libraries in.
    If you don’t link them in the right order (which is -lf77blas.a -lcblas.a -latlas.a for me),
    you’ll end up getting errors – at least I did.
