Disclaimer
The instructions, steps and programs given below worked for me (and Michigan Technological University) on site-licensed Red Hat Enterprise Linux 6.2 with NVIDIA CUDA SDK 4.1.28, NVIDIA GPU Driver v290.10 and two NVIDIA GeForce GTX 570 cards. As has been common practice for several years now, a full version of the operating system was installed and all necessary patches and upgrades were applied. These instructions may very well work for you (or your institution) on Red Hat-like or other Linux distributions. Please note that if you decide to use these instructions on your machine, you do so entirely at your own discretion, and that neither this site, sgowtham.com, nor its author (or Michigan Technological University) is responsible for any damage, intellectual or otherwise.
The Program
/* hello_world_cuda.cu

   A CUDA C PROGRAM TO PRINT 'HELLO, WORLD!' TO THE SCREEN

   TESTED SUCCESSFULLY WITH CUDA SDK 4.1.28 AND NVIDIA GPU DRIVER
   VERSION 290.10 RUNNING ON NVIDIA GeForce GTX 570

   COMPILATION:

   #1: NON-MAKEFILE APPROACH
       nvcc -g hello_world_cuda.cu -o hello_world_cuda.x

   #2: MAKEFILE APPROACH (USE THE ASSOCIATED Makefile)
       make CPROGRAM=hello_world_cuda

   EXECUTION:
       ./hello_world_cuda.x

   PORTIONS OF THE COMMENTS ARE ADOPTED FROM NVIDIA CUDA C PROGRAMMING
   GUIDE VERSION 4.0 (5/6/2011)

   FIRST WRITTEN: GOWTHAM; Mon, 13 Feb 2012 14:06:30 -0500
   LAST MODIFIED: GOWTHAM; Mon, 13 Feb 2012 14:15:30 -0500
*/

/* STANDARD HEADERS AND DEFINITIONS
   REFERENCE: http://en.wikipedia.org/wiki/C_standard_library
*/
#include <stdio.h>   /* Core input/output operations */
#include <stdlib.h>  /* Conversions, random numbers, memory allocation, etc. */
#include <math.h>    /* Common mathematical functions */
#include <time.h>    /* Converting between various date/time formats */
#include <cuda.h>    /* CUDA related stuff */

/* KERNEL DEFINITION

   CUDA C EXTENDS C BY ALLOWING THE PROGRAMMER TO DEFINE C FUNCTIONS,
   CALLED KERNELS, THAT, WHEN CALLED, ARE EXECUTED N TIMES IN PARALLEL
   BY N DIFFERENT CUDA THREADS, AS OPPOSED TO ONLY ONCE LIKE REGULAR
   C FUNCTIONS

   A KERNEL IS DEFINED USING THE __global__ DECLARATION SPECIFIER.
   THE NUMBER OF CUDA THREADS THAT EXECUTE THAT KERNEL FOR A GIVEN
   KERNEL CALL IS SPECIFIED USING <<< >>> (EXECUTION CONFIGURATION)
   SYNTAX. EXECUTION CONFIGURATION DEFINES THE DIMENSION OF THE GRIDS
   AND BLOCKS THAT WILL BE USED TO EXECUTE THE FUNCTION ON THE DEVICE
   AS WELL AS THE ASSOCIATED STREAM

   EACH THREAD THAT EXECUTES THE KERNEL IS GIVEN A UNIQUE 'THREAD ID'
   THAT IS ACCESSIBLE WITHIN THE KERNEL THROUGH THE BUILT-IN threadIdx
   VARIABLE

   A FUNCTION DECLARED AS

     __global__ void Function(float* parameter);

   MUST BE CALLED AS FOLLOWS:

     Function<<< Dg, Db, Ns, S >>>(parameter);

   WHERE

   -- Dg : OF TYPE dim3, IT SPECIFIES THE DIMENSION AND SIZE OF THE
           GRID SUCH THAT Dg.x * Dg.y * Dg.z EQUALS THE NUMBER OF
           BLOCKS BEING LAUNCHED

   -- Db : OF TYPE dim3, IT SPECIFIES THE DIMENSION AND SIZE OF EACH
           BLOCK SUCH THAT Db.x * Db.y * Db.z EQUALS THE NUMBER OF
           THREADS PER BLOCK

   -- Ns : OF TYPE size_t, IT SPECIFIES THE NUMBER OF BYTES IN SHARED
           MEMORY THAT IS DYNAMICALLY ALLOCATED PER BLOCK FOR THIS CALL
           IN ADDITION TO THE STATICALLY ALLOCATED MEMORY. THIS
           DYNAMICALLY ALLOCATED MEMORY IS USED BY ANY OF THE VARIABLES
           DECLARED AS AN EXTERNAL ARRAY. NOTE THAT THIS IS AN OPTIONAL
           ARGUMENT THAT DEFAULTS TO 0

   -- S  : OF TYPE cudaStream_t, IT SPECIFIES THE ASSOCIATED STREAM.
           THIS TOO IS AN OPTIONAL ARGUMENT THAT DEFAULTS TO 0
*/
__global__ void kernel(void) {
}

/* MAIN PROGRAM BEGINS */
int main(void) {

  /* Dg = 1; Db = 1; Ns = 0; S = 0 */
  kernel<<<1,1>>>();

  /* PRINT 'HELLO, WORLD!' TO THE SCREEN */
  printf("\n Hello, World!\n\n");

  /* INDICATE THE TERMINATION OF THE PROGRAM */
  return 0;
}
/* MAIN PROGRAM ENDS */
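The kernel in the program above is empty and is launched with a single block of a single thread, so the threadIdx variable described in its comments never comes into play. The following is a minimal sketch, not part of the original program, of how a non-trivial execution configuration might be used. The file name thread_index_demo.cu, the kernel record_thread_id and the helper launch_and_collect are hypothetical names chosen for illustration, and the sketch assumes a CUDA-capable device is present.

```cuda
/* thread_index_demo.cu (hypothetical file name, for illustration only) */
#include <stdio.h>
#include <cuda_runtime.h>

/* EACH THREAD STORES ITS OWN threadIdx.x AT ITS GLOBAL POSITION.
   THE GLOBAL POSITION COMBINES THE BLOCK INDEX AND THE THREAD INDEX */
__global__ void record_thread_id(int *out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    out[i] = threadIdx.x;
  }
}

/* HYPOTHETICAL HELPER: LAUNCH blocks x threads THREADS AND COPY THE
   RESULT INTO host_out (WHICH MUST HOLD blocks * threads INTEGERS).
   RETURNS 0 ON SUCCESS AND 1 ON ANY CUDA ERROR */
int launch_and_collect(int blocks, int threads, int *host_out) {
  int n = blocks * threads;
  size_t bytes = n * sizeof(int);
  int *dev_out = NULL;
  cudaError_t err;

  err = cudaMalloc((void **)&dev_out, bytes);
  if (err == cudaSuccess) {
    /* Dg = blocks; Db = threads; Ns = 0; S = 0 */
    record_thread_id<<<blocks, threads>>>(dev_out, n);
    err = cudaGetLastError();
  }
  if (err == cudaSuccess) {
    /* cudaMemcpy WAITS FOR THE KERNEL TO FINISH BEFORE COPYING */
    err = cudaMemcpy(host_out, dev_out, bytes, cudaMemcpyDeviceToHost);
  }
  cudaFree(dev_out);

  if (err != cudaSuccess) {
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    return 1;
  }
  return 0;
}

int main(void) {
  int host_out[8];
  int i;

  /* 2 BLOCKS OF 4 THREADS EACH */
  if (launch_and_collect(2, 4, host_out) != 0) {
    return 1;
  }
  for (i = 0; i < 8; i++) {
    printf(" global index %d : threadIdx.x = %d\n", i, host_out[i]);
  }
  return 0;
}
```

With two blocks of four threads, the thread IDs repeat from one block to the next, which shows that threadIdx is unique only within a block and that blockIdx and blockDim are needed to form a grid-wide index.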
The Makefile
# Simple version of the Makefile used to systematically compile
# one of many CUDA C programs, taking into account respective dependencies
#
# First written: Gowtham; Mon, 13 Feb 2012 14:32:36 -0500
# Last modified: Gowtham; Mon, 13 Feb 2012 15:00:42 -0500
#

# Necessary variables
CC        = nvcc
CFLAGS    = -g -c
OFLAGS    = -O3
MYPROGRAM = $(CPROGRAM)

# If CPROGRAM is not defined (and hence MYPROGRAM is empty),
# display help message
ifndef CPROGRAM
help:
endif

# Default target
all: $(MYPROGRAM).x

# Print help message
help:
	@echo
	@echo " To compile, choose one of the following:"
	@echo
	@echo " make CPROGRAM=hello_world_cuda"
	@echo
	@echo
	@echo " To clean, choose one of the following:"
	@echo
	@echo " make clean-all-programs"
	@echo " make clean CPROGRAM=hello_world_cuda"
	@echo

# $(MYPROGRAM).x (depends on $(MYPROGRAM).o)
$(MYPROGRAM).x: $(MYPROGRAM).o
	$(CC) $(MYPROGRAM).o -o $(MYPROGRAM).x

# $(MYPROGRAM).o (depends on $(MYPROGRAM).cu)
$(MYPROGRAM).o: $(MYPROGRAM).cu
	$(CC) $(CFLAGS) $(OFLAGS) $(MYPROGRAM).cu

# Remove the appropriate object file and executable
clean:
	@echo
	@echo "Deleting $(MYPROGRAM).o and $(MYPROGRAM).x"
	rm -f $(MYPROGRAM).o $(MYPROGRAM).x
	@echo

# Remove all object files and executables
clean-all-programs:
	@echo
	@echo "Deleting *.o and *.x files"
	rm -f *.o *.x
	@echo
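Since the Makefile is meant to compile one of many CUDA C programs, here is a sketch of a second program that could be built the same way. It exercises the two optional execution configuration arguments, Ns and S, that the hello world program leaves at their defaults. The file name shared_memory_demo.cu, the kernel reverse_in_block and the helper reverse_on_device are hypothetical names introduced for illustration. Assuming the file is saved under that name, it should build with make CPROGRAM=shared_memory_demo.

```cuda
/* shared_memory_demo.cu (hypothetical file name, for illustration only) */
#include <stdio.h>
#include <cuda_runtime.h>

/* REVERSE THE ELEMENTS OF data WITHIN ONE BLOCK USING SHARED MEMORY.
   scratch IS AN EXTERNAL ARRAY: ITS SIZE IS NOT KNOWN AT COMPILE TIME
   AND IS SET BY THE Ns ARGUMENT OF THE EXECUTION CONFIGURATION */
__global__ void reverse_in_block(int *data) {
  extern __shared__ int scratch[];
  int t = threadIdx.x;
  int n = blockDim.x;

  scratch[t] = data[t];
  __syncthreads();   /* WAIT UNTIL EVERY THREAD HAS WRITTEN ITS ELEMENT */
  data[t] = scratch[n - 1 - t];
}

/* HYPOTHETICAL HELPER: REVERSE host_data (n INTEGERS) IN PLACE ON THE
   DEVICE. n MUST NOT EXCEED THE MAXIMUM NUMBER OF THREADS PER BLOCK.
   RETURNS 0 ON SUCCESS AND 1 ON ANY CUDA ERROR */
int reverse_on_device(int *host_data, int n) {
  size_t bytes = n * sizeof(int);
  int *dev_data = NULL;
  cudaError_t err;

  err = cudaMalloc((void **)&dev_data, bytes);
  if (err == cudaSuccess) {
    err = cudaMemcpy(dev_data, host_data, bytes, cudaMemcpyHostToDevice);
  }
  if (err == cudaSuccess) {
    /* Dg = 1; Db = n; Ns = bytes; S = 0 (DEFAULT STREAM) */
    reverse_in_block<<<1, n, bytes, 0>>>(dev_data);
    err = cudaGetLastError();
  }
  if (err == cudaSuccess) {
    err = cudaMemcpy(host_data, dev_data, bytes, cudaMemcpyDeviceToHost);
  }
  cudaFree(dev_data);   /* FREEING A NULL POINTER IS A NO-OP */

  if (err != cudaSuccess) {
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    return 1;
  }
  return 0;
}

int main(void) {
  int data[8] = {0, 1, 2, 3, 4, 5, 6, 7};
  int i;

  if (reverse_on_device(data, 8) != 0) {
    return 1;
  }
  for (i = 0; i < 8; i++) {
    printf(" %d", data[i]);
  }
  printf("\n");
  return 0;
}
```

Passing the byte count as Ns keeps the kernel independent of the block size. A statically sized __shared__ array would work too, but it would fix the size at compile time.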