Rocks 5.4.2 – HPL 2.0 benchmark with GCC 4.1.2

Disclaimer

The instructions/steps given below worked for me (and Michigan Technological University) running Rocks 5.4.2 (with CentOS 5.5) – as has been common practice for several years now, a full version of the operating system was installed. These instructions may very well work for you (or your institution) on Rocks-like or other Linux clusters. Please note that if you decide to use these instructions on your machine, you are doing so entirely at your own discretion, and that neither this site, sgowtham.com, nor its author (or Michigan Technological University) is responsible for any damage – intellectual or otherwise.

A Bit About HPL

LINPACK is a software library for performing numerical linear algebra on computers. It was written in FORTRAN by Jack Dongarra, Jim Bunch, Cleve Moler and Gilbert Stewart. LINPACK makes use of the BLAS libraries for performing basic vector and matrix operations. It has largely been superseded by LAPACK, which runs more efficiently on modern architectures.

The LINPACK benchmarks are a measure of a system’s floating point computing power. Introduced by Jack Dongarra, they measure how fast a computer solves a dense N × N system of linear equations, Ax = b. The solution is obtained by Gaussian elimination with partial pivoting, which requires approximately

\frac{2}{3}\:N^3 \:+\: 2\:N^2 \:+\: \mathcal{O}\left(N\right)

floating point operations. The result is often expressed in billions of floating point operations per second (GFLOPS). HPL, a portable implementation of the High Performance LINPACK benchmark, is used as the performance measure for ranking supercomputers in the TOP500 list.

Pre-requisite #1: MPI

Following Rocks recommendations, this and other pre-requisites will be installed under /share/apps/; software installed by me on clusters at Michigan Tech uses the following template for its folder structure:

/share/apps/
--> Software/Software_Version/
--> Compiler/Compiler_Version


A Rocks 5.4.2 installation ships with an instance (of a few flavors) of MPI, but I prefer to compile MPICH2 using GCC 4.1.2. At the time of writing this post, the latest stable version of MPICH2 is 1.4.1p1, and it may be downloaded from here. Following the folder structure/template mentioned above, it will be installed under

/share/apps/
--> mpich2/1.4.1p1/
--> gcc/4.1.2


To avoid confusion and/or missed steps leading to undesired results, the steps associated with installing MPICH2 have been put in the following script:

#! /bin/bash
#
# install_mpich2.sh
# BASH script to install MPICH2 (compiled against GCC 4.1.2) on a
# Rocks 5.4.2 cluster's front end
# Must be root (or at least have sudo privilege) to run this script
 
# Begin root-check IF
if [ $UID != 0 ]
then
  clear
  echo
  echo "  You must be logged in as root!"
  echo "  Exiting..."
  echo
  exit
else
  # Set necessary variables
  export CC="gcc"
  export CXX="g++"
  export FC="gfortran"
  export F77="gfortran"
  export MPICH2_VERSION="1.4.1p1"
  export GCC_VERSION="4.1.2"
  export MPICH2_INSTALL="/share/apps/mpich2/${MPICH2_VERSION}/gcc/${GCC_VERSION}"
  export ANL="http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs"
 
  echo
  echo "  Step #0: Download MPICH2 to /share/apps/tmp"
  cd /share/apps/tmp/
  wget ${ANL}/${MPICH2_VERSION}/mpich2-${MPICH2_VERSION}.tar.gz
 
  # Begin mpich2-${MPICH2_VERSION}.tar.gz check IF
  if [ -e "mpich2-${MPICH2_VERSION}.tar.gz" ]
  then
    echo
    echo "  Step #1: configure, make clean, make and make install"
 
    tar -zxpf mpich2-${MPICH2_VERSION}.tar.gz
    cd mpich2-${MPICH2_VERSION}/
    ./configure --prefix=${MPICH2_INSTALL}
    make clean
    make
    make install
 
    echo
    echo "  Step #2: Update $HOME/.bashrc"
    # Append MPICH2 environment settings to ${HOME}/.bashrc (the exact
    # lines below are a best guess; adjust to taste)
    cat << EOF >> ${HOME}/.bashrc

# MPICH2 ${MPICH2_VERSION} compiled with GCC ${GCC_VERSION}
export PATH="${MPICH2_INSTALL}/bin:\${PATH}"
export LD_LIBRARY_PATH="${MPICH2_INSTALL}/lib:\${LD_LIBRARY_PATH}"
EOF
  else
    echo
    echo "  mpich2-${MPICH2_VERSION}.tar.gz was not found in /share/apps/tmp/"
    echo
  fi
  # End mpich2-${MPICH2_VERSION}.tar.gz check IF
fi
# End root-check IF


A good test of a successful installation is that 'which mpicc', 'which mpif77', etc. return the respective commands located under ${MPICH2_INSTALL}/bin.
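For instance, after opening a new shell (or sourcing $HOME/.bashrc), a quick check along the following lines (a minimal sketch, assuming the installation path used in the script above) should point at the new installation:

#! /bin/bash
#
# check_mpich2.sh
# Quick sanity check of the MPICH2 installation; the path below is the
# one used in the installation script above

export MPICH2_INSTALL="/share/apps/mpich2/1.4.1p1/gcc/4.1.2"
export PATH="${MPICH2_INSTALL}/bin:${PATH}"

# Each of these should resolve to ${MPICH2_INSTALL}/bin/
for CMD in mpicc mpicxx mpif77 mpif90 mpiexec
do
  echo -n "${CMD}: "
  which ${CMD}
done

# Version and configure information recorded by MPICH2 at build time
mpich2version | head -n 3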

Pre-requisite #2: GotoBLAS2

As in the case of pre-requisite #1, this one will be installed under

/share/apps/
--> gotoblas2/1.13/
--> gcc/4.1.2


The following script, used for the installation, assumes that one has downloaded GotoBLAS2 1.13 from here and placed it in
/share/apps/tmp/

#! /bin/bash
#
# install_gotoblas2.sh
# BASH script to install Goto BLAS2 (compiled against GCC 4.1.2) on a
# Rocks 5.4.2 cluster's front end
# Must be root (or at least have sudo privilege) to run this script
 
# Begin root-check IF
if [ $UID != 0 ]
then
  clear
  echo
  echo "  You must be logged in as root!"
  echo "  Exiting..."
  echo
  exit
else
  # Set necessary variables
  export CC="gcc"
  export CXX="g++"
  export FC="gfortran"
  export F77="gfortran"
  export GOTOBLAS2_VERSION="1.13"
  export GCC_VERSION="4.1.2"
  export GOTOBLAS2_INSTALL="/share/apps/gotoblas2/${GOTOBLAS2_VERSION}/gcc/${GCC_VERSION}"
 
  # Begin GotoBLAS2-${GOTOBLAS2_VERSION}.tar.gz check IF
  if [ -e "/share/apps/tmp/GotoBLAS2-${GOTOBLAS2_VERSION}.tar.gz" ]
  then
 
    mkdir -p /share/apps/gotoblas2/${GOTOBLAS2_VERSION}/gcc/
    cd /share/apps/gotoblas2/${GOTOBLAS2_VERSION}/gcc/
    tar -zxvf /share/apps/tmp/GotoBLAS2-${GOTOBLAS2_VERSION}.tar.gz
    mv GotoBLAS2 ${GCC_VERSION}
 
    echo
    echo "  Step #0: make clean and make"
    cd ${GOTOBLAS2_INSTALL}
    make clean
    make BINARY=64
 
    echo
    echo "  Step #1: Update $HOME/.bashrc"
    # Append GotoBLAS2 environment settings to ${HOME}/.bashrc (the exact
    # lines below are a best guess; adjust to taste)
    cat << EOF >> ${HOME}/.bashrc

# GotoBLAS2 ${GOTOBLAS2_VERSION} compiled with GCC ${GCC_VERSION}
export LD_LIBRARY_PATH="${GOTOBLAS2_INSTALL}:\${LD_LIBRARY_PATH}"
EOF
  else
    echo
    echo "  GotoBLAS2-${GOTOBLAS2_VERSION}.tar.gz was not found in /share/apps/tmp/"
    echo
  fi
  # End GotoBLAS2-${GOTOBLAS2_VERSION}.tar.gz check IF
fi
# End root-check IF
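A quick way to confirm that the GotoBLAS2 build succeeded (a minimal sketch, assuming the installation path used above) is to check that the static and shared libraries were produced:

#! /bin/bash
#
# check_gotoblas2.sh
# Quick sanity check of the GotoBLAS2 build; the path below is the one
# used in the installation script above

GOTOBLAS2_INSTALL="/share/apps/gotoblas2/1.13/gcc/4.1.2"

# A successful build leaves libgoto2.a and libgoto2.so (symlinks to the
# versioned, target-specific library) at the top of the build directory
ls -l ${GOTOBLAS2_INSTALL}/libgoto2.a ${GOTOBLAS2_INSTALL}/libgoto2.so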

HPL Installation/Compilation

With MPICH2 and GotoBLAS2 in place, HPL 2.0 will be installed under

/share/apps/
--> hpl/2.0/
--> mpich2/1.4.1p1/
--> gcc/4.1.2


The following script, used for the installation, assumes that one has downloaded HPL 2.0 from here and placed it, along with the necessary Make.MPICH2141p1_GCC412.HPL (listed below), in /share/apps/tmp/

#
# Makefile (Make.MPICH2141p1_GCC412) used to compile HPL 
# on HPC clusters running NPACI Rocks (5.4.2) with CentOS (5.5)
# at Michigan Technological University.
# 
# Disclaimer:
# Please note that you are using these instructions 
# at your very own risk and that Michigan Technological
# University is not responsible for any/all damage caused to 
# your property, intellectual or otherwise.
# 
# For additional help and/or comments, questions, suggestions,
# please contact 
#
# Gowtham
# Information Technology Services
# Michigan Technological University
# g@mtu.edu
#
#
# High Performance Computing Linpack Benchmark (HPL)                
# HPL - 2.0 - September 10, 2008                          
# Antoine P. Petitet                                                
# University of Tennessee, Knoxville                                
# Innovative Computing Laboratory                                 
# (C) Copyright 2000-2008 All Rights Reserved                       
#                                                                       
 
#
# Shell Details
SHELL        = /bin/sh
CD           = cd
CP           = cp
LN_S         = ln -s
MKDIR        = mkdir
RM           = /bin/rm -f
TOUCH        = touch
 
#
# Platform Identifier
ARCH         = MPICH2141p1_GCC412
 
#
# HPL Directory Structure / HPL library
TOPdir       = /share/apps/hpl/2.0/mpich2/1.4.1p1/gcc/4.1.2
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
HPLlib       = $(LIBdir)/libhpl.a 
 
#
# Message Passing library (MPI)
MPdir        = /share/apps/mpich2/1.4.1p1/gcc/4.1.2
MPinc        = -I$(MPdir)/include
MPlib        = $(MPdir)/lib/libmpich.a
 
#
# Linear Algebra library (BLAS)
LAinc        =
LAlib        = /share/apps/gotoblas2/1.13/gcc/4.1.2/libgoto2.a
 
#
# F77 / C Interface
# You can skip this section if and only if you are not planning to use
# a BLAS library featuring a Fortran 77 interface. Otherwise, it is
# necessary to fill out the F2CDEFS variable with the appropriate
# options. **One and only one** option should be chosen in **each** of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_              : all lower case and a suffixed underscore (Suns,
#                       Intel, ...) [default]
# -DNoChange          : all lower case (IBM RS6000)
# -DUpCase            : all upper case (Cray)
# -DAdd__             : the FORTRAN compiler in use is f2c
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int   : Fortran 77 INTEGER is a C int [default]
# -DF77_INTEGER=long  : Fortran 77 INTEGER is a C long
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle    : The string address is passed at the string location
#                       on the stack, and the string length is then
#                       passed as an F77_INTEGER after all explicit
#                       stack arguments [default]
# -DStringStructPtr   : The address of a structure is passed by a
#                       Fortran 77 string, and the structure is of the
#                       form: struct {char *cp; F77_INTEGER len;}
# -DStringStructVal   : A structure is passed by value for each Fortran
#                       77 string, and the structure is of the form:
#                       struct {char *cp; F77_INTEGER len;}
# -DStringCrayStyle   : Special option for Cray machines, which uses
#                       Cray fcd (fortran character descriptor) for
#                       interoperation
F2CDEFS      =  -DAdd_
 
#
# HPL Includes / Libraries / Specifics
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)
 
#
# HPL Compile Time Options
# -DHPL_COPY_L           force the copy of the panel L before bcast
# -DHPL_CALL_CBLAS       call the cblas interface
# -DHPL_CALL_VSIPL       call the vsip library
# -DHPL_DETAILED_TIMING  enable detailed timers
#
# By default HPL will:
#    *) not copy L before broadcast
#    *) call the BLAS Fortran 77 interface
#    *) not display detailed timing information
HPL_OPTS     = -DHPL_COPY_L -DHPL_CALL_CBLAS -DHPL_DETAILED_TIMING 
 
#
# HPL Definitions
HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
 
#
# Compilers / linkers - Optimization Flags
CC           = mpicc
CCNOOPT      = $(HPL_DEFS)
CCFLAGS      = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall
 
#
# On some platforms, it is necessary to use the Fortran linker 
# to find the Fortran internals used in the BLAS library
LINKER       = mpif77
LINKFLAGS    = $(CCFLAGS)
ARCHIVER     = ar
ARFLAGS      = r
RANLIB       = echo

With the makefile in place, the installation itself is handled by the following script:

#! /bin/bash
#
# install_hpl.sh
# BASH script to install HPL 2.0 (compiled against MPICH2 1.4.1p1) on a
# Rocks 5.4.2 cluster's front end
# Must be root (or at least have sudo privilege) to run this script
 
# Begin root-check IF
if [ $UID != 0 ]
then
  clear
  echo
  echo "  You must be logged in as root!"
  echo "  Exiting..."
  echo
  exit
else
  # Set necessary variables
  export HPL_VERSION="2.0"
  export MPICH2_VERSION="1.4.1p1"
  export GCC_VERSION="4.1.2"
  export HPL_INSTALL="/share/apps/hpl/${HPL_VERSION}/mpich2/${MPICH2_VERSION}/gcc/${GCC_VERSION}"
  export MAKEFILE_ARCH="MPICH2141p1_GCC412"
 
  # Begin hpl-${HPL_VERSION}.tar.gz check IF
  if [ -e "/share/apps/tmp/hpl-${HPL_VERSION}.tar.gz" ]
  then
 
    mkdir -p /share/apps/hpl/${HPL_VERSION}/mpich2/${MPICH2_VERSION}/gcc/
    cd /share/apps/hpl/${HPL_VERSION}/mpich2/${MPICH2_VERSION}/gcc/
    tar -zxvf /share/apps/tmp/hpl-${HPL_VERSION}.tar.gz
    mv hpl-${HPL_VERSION} ${GCC_VERSION}
 
    echo
    echo "  Step #0: copy the makefile, make clean and make"
    cd ${HPL_INSTALL}
    cp /share/apps/tmp/Make.${MAKEFILE_ARCH}.HPL ./Make.${MAKEFILE_ARCH}
 
    if [ -d "${HPL_INSTALL}/bin/${MAKEFILE_ARCH}" ]
    then
      make clean arch=${MAKEFILE_ARCH}
    fi
 
    if [ "$(ls -A ${HPL_INSTALL}/bin)"];
    then
      make clean_arch_all arch=${MAKEFILE_ARCH} 
    fi
 
    make arch=${MAKEFILE_ARCH}
 
    echo
    echo "  Step #1: Update $HOME/.bashrc"
    # Append HPL environment settings to ${HOME}/.bashrc (the exact
    # lines below are a best guess; adjust to taste)
    cat << EOF >> ${HOME}/.bashrc

# HPL ${HPL_VERSION} compiled with MPICH2 ${MPICH2_VERSION} and GCC ${GCC_VERSION}
export PATH="${HPL_INSTALL}/bin/${MAKEFILE_ARCH}:\${PATH}"
EOF
  else
    echo
    echo "  hpl-${HPL_VERSION}.tar.gz was not found in /share/apps/tmp/"
    echo
  fi
  # End hpl-${HPL_VERSION}.tar.gz check IF
fi
# End root-check IF
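If the build went through cleanly, the xhpl binary and a stock HPL.dat should have been produced under bin/MPICH2141p1_GCC412; a quick check along the following lines (paths assumed from the script above) confirms it:

#! /bin/bash
#
# check_hpl.sh
# Quick sanity check of the HPL build; paths are those used in the
# installation script above

HPL_INSTALL="/share/apps/hpl/2.0/mpich2/1.4.1p1/gcc/4.1.2"
MAKEFILE_ARCH="MPICH2141p1_GCC412"

# A successful 'make arch=...' leaves xhpl and a stock HPL.dat here
ls -l ${HPL_INSTALL}/bin/${MAKEFILE_ARCH}/xhpl ${HPL_INSTALL}/bin/${MAKEFILE_ARCH}/HPL.dat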

Running HPL Benchmark

The amount of memory used by HPL is essentially the size of the coefficient matrix, A. Following the standard binary definition, 1 GB is 1024 * 1024 * 1024 bytes (MEM_BYTES). Most scientific/engineering computations use double precision numbers, with each such double precision number taking 8 bytes of memory. Thus, 1 GB can accommodate 134,217,728 double precision elements:

DP_ELEMENTS = MEM_BYTES/8

Theoretically, sqrt(DP_ELEMENTS) is the maximum possible value of N. However, the operating system needs some memory for its own operations. As such, the HPL benchmark is usually performed for the following values of N – with m representing the fraction of the maximum problem size (so the memory actually used is a fraction m² of the total), ranging from 0.50 to 0.80 in steps of 0.10 – while making sure that swapping does not occur (which would severely reduce performance); one way to keep an eye on swapping is sketched right after the formula.

N \:=\: m\:\sqrt{\mathrm{TOTAL\_DP\_ELEMENTS}} \hspace{0.50in} m: 0.50\:(0.10)\:0.80
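As a quick way of keeping an eye on swapping while xhpl runs, something along the following lines can be used; this is a minimal sketch, and the Rocks-style node names (compute-0-0 through compute-0-15) are assumptions:

#! /bin/bash
#
# check_swap.sh
# Minimal sketch: report swap usage on each compute node during an HPL run;
# the node names are assumptions and need to match the cluster at hand

for NODE in compute-0-{0..15}
do
  echo -n "${NODE}: "
  # The 'Swap:' line of 'free -m' reports total/used/free swap in MB;
  # 'used' should stay at (or very near) zero throughout the run
  ssh ${NODE} "free -m | grep -i swap"
done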


HPL uses the block size (NB) for the data distribution as well as for the computational granularity. From a data distribution perspective, the smaller the NB, the better the load balance. From a computational perspective, too small a value of NB may limit performance by a large factor, since almost no data re-use will occur in the highest level of the memory hierarchy and the number of messages will also increase. In my case, this benchmark was performed for NB values of 128, 256 and 512 (see the sample HPL.dat sketched below).
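N, NB and the process grid (P x Q) all go into HPL.dat, which sits next to xhpl in bin/MPICH2141p1_GCC412. The snippet below is a sketch of one such input file, written as a heredoc in keeping with the scripts above; the Ns (rounded down to multiples of 128), the 12 x 16 process grid and the remaining knobs are assumptions based on the 16-node example worked out further down, not values from the original runs:

#! /bin/bash
#
# make_hpl_dat.sh
# Sketch of an HPL.dat for a 192-core run (P x Q = 12 x 16); the Ns, NBs
# and process grid are assumptions and need tuning for the cluster at hand

cat << 'EOF' > HPL.dat
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
4            # of problems sizes (N)
113408 136192 158848 181504  Ns
3            # of NBs
128 256 512  NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
12           Ps
16           Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
EOF

Once a tuned HPL.dat sits next to xhpl, the run itself is typically launched from that directory with MPICH2's Hydra launcher, e.g. mpiexec -f machinefile -n 192 ./xhpl, where machinefile lists the compute nodes and 192 equals P x Q.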

The results so obtained are compared with the theoretical peak value, \mathrm{GFLOPS}_\mathrm{Theory}, computed as follows:

\mathrm{GFLOPS}_\mathrm{Theory} \:=\: \mbox{\# of Nodes} \:\times\: \mbox{\# of Sockets/Node} \:\times\: \mbox{\# of Cores/Socket} \:\times\: \mbox{CPU Frequency (Cycles/second)} \:\times\: \mbox{\# of Floating Point Operations/Cycle}


For example, for a cluster with 16 identical recent Intel-architecture compute nodes, each with two hex-core processors (two sockets, six cores per socket) running at 3.00 GHz, \mathrm{GFLOPS}_\mathrm{Theory} will be

\mbox{16 (\# of Nodes)} \:\times\: \mbox{2 (\# of Sockets/Node)} \:\times\: \mbox{6 (\# of Cores/Socket)} \:\times\: \mbox{3 GHz (CPU Frequency)} \:\times\: \mbox{4 (\# of Floating Point Operations/Cycle)} \:=\: 2304 \mbox{ GFLOPS} \:\approx\: 2.3 \mbox{ TFLOPS}
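The same arithmetic can be checked in the shell; the counts below are those of the hypothetical cluster above:

#! /bin/bash
#
# Theoretical peak for the hypothetical 16-node cluster described above
NODES=16 ; SOCKETS=2 ; CORES=6 ; GHZ=3 ; FLOP_PER_CYCLE=4

# 16 x 2 x 6 x 3 x 4 = 2304
echo "GFLOPS_Theory = $(( NODES * SOCKETS * CORES * GHZ * FLOP_PER_CYCLE )) GFLOPS"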


If each of these nodes had 24 GB RAM, then

MEM_BYTES = 1024 * 1024 * 1024 * 24 * 16 = 412316860416

and

DP_ELEMENTS = 412316860416 / 8 = 51539607552

As such, N values will be

N \:=\: m\:\times\: \sqrt{51539607552} \:\approx\: m\:\times\: 227020 \hspace{0.50in} m: 0.50\:(0.10)\:0.80
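The whole chain of arithmetic can be scripted; the sketch below is a hypothetical helper, assuming the 16-node, 24 GB/node example above, and prints candidate N values rounded down to a multiple of NB = 128 (these are the Ns used in the sample HPL.dat sketched earlier):

#! /bin/bash
#
# hpl_n_values.sh
# Hypothetical helper: compute candidate N values for HPL from the total
# cluster memory; node count and memory per node follow the example above

NODES=16
GB_PER_NODE=24
NB=128   # round N down to a multiple of the block size

MEM_BYTES=$(( 1024 * 1024 * 1024 * GB_PER_NODE * NODES ))
DP_ELEMENTS=$(( MEM_BYTES / 8 ))

echo "MEM_BYTES   = ${MEM_BYTES}"
echo "DP_ELEMENTS = ${DP_ELEMENTS}"

for M in 0.50 0.60 0.70 0.80
do
  # N = m * sqrt(DP_ELEMENTS), rounded down to a multiple of NB
  N=$(echo "${M} ${DP_ELEMENTS} ${NB}" | awk '{ n = $1 * sqrt($2); print int(n / $3) * $3 }')
  echo "m = ${M}  -->  N = ${N}"
done

In practice, the largest N that still avoids swapping generally yields the best measured GFLOPS.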

What if the cluster has heterogeneous compute nodes?

Computing \mathrm{GFLOPS}_\mathrm{Theory} isn’t easy in this case; it becomes even more difficult when the compute nodes belong to different generations, as one has to account for the aging factor. In such cases, it has been the practice at Michigan Tech to split the cluster into different queues – one for each generation/type of compute node – and run the HPL benchmark on each queue separately.


Thanks be to

Rocks mailing list and its participants.

4 Replies to “Rocks 5.4.2 – HPL 2.0 benchmark with GCC 4.1.2”

  1. I have many problems installing LINPACK on a Rocks cluster... can you please give me a proper set of Linux commands to install MPICH, BLAS and then HPL, along with detailed instructions for the changes in the Make.linux file? I get two errors with the command:

    make arch = linux

    thanks in advance ...
    waiting for reply

    1. No effect.

      This is the error:

      make[2]: Nothing to be done for `all'.
      make[2]: Leaving directory `/root/HPL/testing/ptimer/LinuxKVM'
      ( cd testing/ptest/LinuxKVM; make )
      make[2]: Entering directory `/root/HPL/testing/ptest/LinuxKVM'
      make[2]: *** No rule to make target `/root/HPL/lib/LinuxKVM/libhpl.a', needed by `dexe.grd'. Stop.
      make[2]: Leaving directory `/root/HPL/testing/ptest/LinuxKVM'
      make[1]: *** [build_tst] Error 2
      make[1]: Leaving directory `/root/HPL'
      make: *** [build] Error 2
