Much of the weekend was spent in some relaxation, shoveling snow (I have come to realize that this is very good exercise – it can make one sweat even when it’s +10F outside!) and trying to debug the errors that come up when running the parallel version of VASP 4.6.28. However, the same errors persisted and I didn’t get very far with it.
I got in touch with a few of Intel’s authorized resellers (one in Bloomington, MN – who transferred the call to someone else in Boston, MA) to see if I could get a copy of version 8.x of the FORTRAN and C compilers. Though the latter person promised to get back to me with some favorable information soon, I also got directly in touch with an Intel Support Technician – hoping that he/she wouldn’t be from a call center in some other country. Fortunately, it was somebody in California; unfortunately, he redirected me back to their resellers. The last call, made pretty much out of desperation, got me what I needed: I just had to register the non-commercially downloaded products for Premier Support and I would be entitled to previous versions too. If only Intel had explicitly mentioned what Premier Support actually is, my worries would have ended a long time ago. By the way, if you are now wondering what the error message was (running on 2 processors), here it is:
    running on 2 nodes
    distr: one band on 1 nodes, 2 groups
    vasp.4.6.28 25Jul05 complex
    POSCAR found : 4 types and 18 ions
    LDA part: xc-table for Ceperly-Alder, standard interpolation
    found WAVECAR, reading the header
    POSCAR, INCAR and KPOINTS ok, starting setup
    WARNING: wrap around errors must be expected
    FFT: planning ... 1
    reading WAVECAR
    the WAVECAR file was read sucessfully
    LAPACK: Routine ZPOTRF failed! 8
    LAPACK: Routine ZPOTRF failed! 8
Even after I managed to get versions 8.x and 7.x of the Intel compilers, the situation only got worse, as the error message remained the same. At this point, I must thank VASP Tech Support and Andri Arnaldsson (from University of Washington) for their help – they were pretty quick in their responses and sent their copies of Makefiles along with several tips and tricks. Changing compiler versions, using a previous version of MPICH (1.2.7p1 to be precise), repeating the compilation many times with different BLAS and LAPACK libraries: nothing helped.
Taking a break for an hour and watching an episode of South Park seems to have helped. After changing the keywords in my Google search and reading a discussion forum a lot more carefully, I found that adding three lines at the end of the VASP Makefile (what this does is reduce the level of optimization for mpi.F) made the error vanish, and the calculations started running smoothly.
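The three lines themselves are not reproduced in this post. As a rough sketch only, a per-file override of the following shape is one common way to lower the optimization level for a single source file in a VASP 4.6-style Makefile. It assumes that the Makefile already defines `$(CPP)`, `$(FC)`, `$(FFLAGS)`, `$(INCS)` and `$(SUFFIX)`, and that the generic rule passes the optimization level through a separate variable that is simply replaced by `-O0` here; the exact lines from the forum may well differ.

```makefile
# Hypothetical override, appended at the end of the VASP Makefile.
# Assumption: $(CPP), $(FC), $(FFLAGS), $(INCS) and $(SUFFIX) are defined
# earlier in the same Makefile, as in the stock VASP 4.6 makefiles.
# An explicit rule for mpi.o takes precedence over the generic suffix rule,
# so only mpi.F is compiled without optimization (-O0).
mpi.o : mpi.F
	$(CPP)
	$(FC) $(FFLAGS) -O0 $(INCS) -c $*$(SUFFIX)
```

Because the rule names `mpi.o` explicitly, every other source file still goes through the generic rule with the usual optimization flags; only `mpi.o` needs to be deleted and rebuilt after adding it.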
I repeated the same calculation using 2, 4, 6 and 8 processors and noticed a slightly strange behavior – when the number of processors was 2, 4 or 8 (that is, 2^N), the energy optimization followed exactly the same route as in the serial calculation, but when the number of processors was 2*N (N=3 in this case, so 6), the energy optimization route was different, though the final result was still exactly the same. Though I have to do more trials (say 3, 5, 7, 9, 10 processors) to completely convince myself, it appears to me that using 2^N processors does the trick. Like Dave Kraus mentioned once before – knowing what trick works is certainly important, but knowing why that trick works is even more important.
It took 300+ compilation attempts, spanning six (yes, SIX) months of day-in, day-out, 14+ hour days, to get one software suite (or rather, the Makefile with the necessary flags and libraries for it) compiled and tested successfully. The timing of this successful attempt can’t be just a coincidence: I sure do believe in Santa Claus and Christmas miracles, and am forever grateful for my advisor’s endless patience throughout these six months.