Rocks 5.4.2 – HPCC 1.4.1 benchmark with GCC 4.1.2

Disclaimer

The instructions/steps given below worked for me (and Michigan Technological University) running Rocks 5.4.2 (with CentOS 5.5); as has been common practice for several years now, a full version of the operating system was installed. These instructions may very well work for you (or your institution) on Rocks-like or other Linux clusters. Please note that if you decide to use these instructions on your machine, you are doing so entirely at your own discretion, and that neither this site, sgowtham.com, nor its author (or Michigan Technological University) is responsible for any damage, intellectual or otherwise.

A Bit About HPCC

Citing the HPCC website:

The HPCC benchmark suite has been released by the DARPA High Productivity Computing Systems (HPCS) program to help define the performance boundaries of future Petascale computing systems. HPCC is a suite of tests that examine the performance of HPC architectures using kernels with memory access patterns more challenging than those of the HPL benchmark used in the Top500. This suite is designed to augment the Top500 list, providing benchmarks that bound the performance of many real applications as a function of memory access characteristics (e.g., spatial and temporal locality), and providing a framework for including additional tests. HPCC consists of seven tests that attempt to span the space of high and low spatial and temporal locality.

By design, the HPCC tests are scalable, with the size of the data sets being a function of the largest HPL matrix tested on the system. Since the HPCC kernels consist of simple mathematical operations, the suite provides a unique opportunity to look at language and parallel programming issues. To characterize the architecture of the system, the following three scenarios are considered:

  1. Local – only a single processor is performing the computations
  2. Embarrassingly Parallel – each processor in the entire system is performing the computations but without explicit communication with each other
  3. Global – all processors in the system are performing the computations with explicit communication with each other

The seven tests that make up the HPCC benchmark are as follows (a sketch for pulling each test's headline result out of the output file appears after this list):

  1. HPL – the Linpack TPP benchmark which measures the floating point rate of execution for solving a linear system of equations
  2. DGEMM – measures the floating point rate of execution of double precision real matrix-matrix multiplication
  3. STREAM – a simple synthetic benchmark program that measures sustainable memory bandwidth (in GB/s) and the corresponding computation rate for a simple vector kernel
  4. PTRANS (parallel matrix transpose) – exercises the communications where pairs of processors communicate with each other simultaneously. It is a useful test of the total communications capacity of the network
  5. RandomAccess – measures the rate of integer random updates of memory (GUPS)
  6. FFT – measures the floating point rate of execution of double precision complex one-dimensional Discrete Fourier Transform (DFT)
  7. Communication bandwidth and latency – a set of tests to measure latency and bandwidth of a number of simultaneous communication patterns; based on beff (effective bandwidth benchmark)
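Each of these tests reports a headline number in the summary section of hpccoutf.txt, the results file HPCC writes once a run completes. The sketch below is an illustration only; it assumes the standard HPCC 1.x summary field names, in which metrics prefixed with Single, Star and MPI correspond to the Local, Embarrassingly Parallel and Global scenarios, respectively.

#!/bin/bash
# Pull one representative metric per test out of HPCC's results file,
# hpccoutf.txt (field names assume the standard HPCC 1.x summary section)
grep -E -e "^(HPL_Tflops|StarDGEMM_Gflops|StarSTREAM_Triad|PTRANS_GBs)=" \
        -e "^(MPIRandomAccess_GUPs|MPIFFT_Gflops)=" \
        -e "^(RandomlyOrderedRingBandwidth_GBytes|RandomlyOrderedRingLatency_usec)=" \
        hpccoutf.txt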

Pre-requisites: MPI & Goto BLAS

Please refer to the corresponding sections in HPL 2.0 Benchmark With GCC 4.1.2 On Rocks 5.4.2 to learn more about the required pre-requisites as well as instructions for installing/compiling them. Following Rocks recommendations, pre-requisites are installed under /share/apps/; software I install on clusters at Michigan Tech follows the folder-structure template below (a quick sanity check for these pre-requisites appears after the template):

/share/apps/
--> Software/Software_Version/
--> Compiler/Compiler_Version
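
Before building HPCC, it is worth confirming that both pre-requisites are where the makefile below expects them. The following is a minimal sanity check, assuming the MPICH2 1.4.1p1 and GotoBLAS2 1.13 paths used throughout this article:

#!/bin/bash
# Quick sanity check for the pre-requisites (paths assume the layout above)
MPICH2_HOME="/share/apps/mpich2/1.4.1p1/gcc/4.1.2"
GOTOBLAS2_LIB="/share/apps/gotoblas2/1.13/gcc/4.1.2/libgoto2.a"

[ -x "${MPICH2_HOME}/bin/mpicc" ]  && echo "  Found mpicc"     || echo "  Missing mpicc"
[ -x "${MPICH2_HOME}/bin/mpif77" ] && echo "  Found mpif77"    || echo "  Missing mpif77"
[ -f "${GOTOBLAS2_LIB}" ]          && echo "  Found GotoBLAS2" || echo "  Missing GotoBLAS2"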

HPCC Installation/Compilation

With MPICH2 and Goto BLAS2 in place, HPCC 1.4.1 will be installed under

/share/apps/
--> hpcc/1.4.1/
--> mpich2/1.4.1p1/
--> gcc/4.1.2


The following script, used for installation, assumes that one has downloaded HPCC 1.4.1 from here as well as the necessary Make.MPICH2141p1_GCC412.HPCC (listed below; it is nearly identical to the one used when compiling HPL 2.0 – the only difference is the definition of the TOPdir variable) and placed both in /share/apps/tmp/

#
# Makefile (Make.MPICH2141p1_GCC412) used to compile HPCC 
# on HPC clusters running NPACI Rocks (5.4.2) with CentOS (5.5)
# at Michigan Technological University.
# 
# Disclaimer:
# Please note that you are using these instructions 
# at your very own risk and that Michigan Technological
# University is not responsible for any/all damage caused to 
# your property, intellectual or otherwise.
# 
# For additional help and/or comments, questions, suggestions,
# please contact 
#
# Gowtham
# Information Technology Services
# Michigan Technological University
# g@mtu.edu
#
#
# High Performance Computing Linpack Benchmark (HPL)                
# HPL - 2.0 - September 10, 2008                          
# Antoine P. Petitet                                                
# University of Tennessee, Knoxville                                
# Innovative Computing Laboratory                                 
# (C) Copyright 2000-2008 All Rights Reserved                       
#  
 
#
# Shell Details
SHELL        = /bin/sh
CD           = cd
CP           = cp
LN_S         = ln -s
MKDIR        = mkdir
RM           = /bin/rm -f
TOUCH        = touch
 
#
# Platform Identifier
ARCH         = MPICH2141p1_GCC412
 
#
# HPL Directory Structure / HPL library
TOPdir       = ../../..
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
HPLlib       = $(LIBdir)/libhpl.a 
 
#
# Message Passing library (MPI)
MPdir        = /share/apps/mpich2/1.4.1p1/gcc/4.1.2
MPinc        = -I$(MPdir)/include
MPlib        = $(MPdir)/lib/libmpich.a
 
#
# Linear Algebra library (BLAS)
LAlib        = /share/apps/gotoblas2/1.13/gcc/4.1.2/libgoto2.a
 
#
# F77 / C Interface
# You can skip this section if and only if you are not planning to use
# a BLAS library featuring a Fortran 77 interface. Otherwise, it is
# necessary to fill out the F2CDEFS variable with the appropriate
# options. **One and only one** option should be chosen in **each** of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_              : all lower case and a suffixed underscore (Suns,
#                       Intel, ...) [default]
# -DNoChange          : all lower case (IBM RS6000)
# -DUpCase            : all upper case (Cray)
# -DAdd__             : the FORTRAN compiler in use is f2c
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int   : Fortran 77 INTEGER is a C int [default]
# -DF77_INTEGER=long  : Fortran 77 INTEGER is a C long
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle    : The string address is passed at the string location
#                       on the stack, and the string length is then
#                       passed as an F77_INTEGER after all explicit
#                       stack arguments [default]
# -DStringStructPtr   : The address of a structure is passed by a
#                       Fortran 77 string, and the structure is of the
#                       form: struct {char *cp; F77_INTEGER len;}
# -DStringStructVal   : A structure is passed by value for each Fortran
#                       77 string, and the structure is of the form:
#                       struct {char *cp; F77_INTEGER len;}
# -DStringCrayStyle   : Special option for Cray machines, which uses
#                       Cray fcd (fortran character descriptor) for
#                       interoperation
F2CDEFS      =  -DAdd_
 
#
# HPL Includes / Libraries / Specifics
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)
 
#
# HPL Compile Time Options
# -DHPL_COPY_L           force the copy of the panel L before bcast
# -DHPL_CALL_CBLAS       call the cblas interface
# -DHPL_CALL_VSIPL       call the vsip library
# -DHPL_DETAILED_TIMING  enable detailed timers
#
# By default HPL will:
#    *) not copy L before broadcast
#    *) call the BLAS Fortran 77 interface
#    *) not display detailed timing information
HPL_OPTS     = -DHPL_COPY_L -DHPL_CALL_CBLAS -DHPL_DETAILED_TIMING 
 
#
# HPL Definitions
HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
 
#
# Compilers / linkers - Optimization Flags
CC           = mpicc
CCNOOPT      = $(HPL_DEFS)
CCFLAGS      = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall
 
#
# On some platforms, it is necessary to use the Fortran linker 
# to find the Fortran internals used in the BLAS library
LINKER       = mpif77
LINKFLAGS    = $(CCFLAGS)
ARCHIVER     = ar
ARFLAGS      = r
RANLIB       = echo
#

The installation script, install_hpcc.sh, is listed below:

#!/bin/bash
#
# install_hpcc.sh
# BASH script to install HPCC 1.4.1 (compiled against MPICH2 1.4.1p1) on a
# Rocks 5.4.2 cluster's front end
# Must be root (or at least have sudo privilege) to run this script
 
# Begin root-check IF
if [ $UID != 0 ];
then
  clear
  echo
  echo "  You must be logged in as root!"
  echo "  Exiting..."
  echo
  exit
else
  # Set necessary variables
  export HPCC_VERSION="1.4.1"
  export MPICH2_VERSION="1.4.1p1"
  export GCC_VERSION="4.1.2"
  export HPCC_INSTALL="/share/apps/hpcc/${HPCC_VERSION}/mpich2/${MPICH2_VERSION}/gcc/${GCC_VERSION}"
  export MAKEFILE_ARCH="MPICH2141p1_GCC412"
 
  # Begin HPCC-${HPCC_VERSION}.tar.gz check IF
  if [ -e "/share/apps/tmp/hpcc-${HPCC_VERSION}.tar.gz" ]
  then
 
    mkdir -p /share/apps/hpcc/${HPCC_VERSION}/mpich2/${MPICH2_VERSION}/gcc/
    cd /share/apps/hpcc/${HPCC_VERSION}/mpich2/${MPICH2_VERSION}/gcc/
    tar -zxvf /share/apps/tmp/hpcc-${HPCC_VERSION}.tar.gz
    mv hpcc-${HPCC_VERSION} ${GCC_VERSION}
 
    echo
    echo "  Step #0: copy the makefile, make clean and make"
    cd ${HPCC_INSTALL}
    cp /share/apps/tmp/Make.${MAKEFILE_ARCH}.HPCC ./hpl/Make.${MAKEFILE_ARCH}
 
    make clean arch=${MAKEFILE_ARCH}
    make arch=${MAKEFILE_ARCH}
 
    echo
    echo "  Step #1: Update $HOME/.bashrc"
    cat <
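
With hpcc-1.4.1.tar.gz and Make.MPICH2141p1_GCC412.HPCC in /share/apps/tmp/, the script can be run on the front end as root; a typical invocation (the log file name is merely illustrative) might look like:

# Run on the Rocks front end, as root
cd /share/apps/tmp
bash install_hpcc.sh 2>&1 | tee install_hpcc_1.4.1.log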

Running HPCC Benchmark

Please refer to the corresponding sections in HPL 2.0 Benchmark With GCC 4.1.2 On Rocks 5.4.2 to learn how to get a fix on N, NB, etc., given the hardware configuration of the compute nodes. Running the HPCC benchmark is very similar to running the HPL 2.0 benchmark; a minimal run sketch follows.
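
The compiled binary, hpcc, is created in the top-level HPCC directory (${HPCC_INSTALL} above). It reads its parameters from hpccinf.txt – essentially HPL.dat with a few additional lines for PTRANS – and writes results to hpccoutf.txt in the working directory. The sketch below is an illustration only: the working directory, machine file and process count are assumptions that depend on your cluster.

#!/bin/bash
# Illustrative HPCC run using MPICH2 1.4.1p1 (Hydra process manager);
# adjust the machine file, process count and working directory to taste
HPCC_INSTALL="/share/apps/hpcc/1.4.1/mpich2/1.4.1p1/gcc/4.1.2"

mkdir -p ${HOME}/hpcc_run && cd ${HOME}/hpcc_run

# Start from the sample input shipped with HPCC, then edit N, NB, P and Q
# exactly as one would for HPL 2.0
cp ${HPCC_INSTALL}/_hpccinf.txt ./hpccinf.txt

# 16 MPI processes across the nodes listed (one per line) in 'machines';
# results appear in ./hpccoutf.txt
mpiexec -f machines -n 16 ${HPCC_INSTALL}/hpcc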
