CUDA/C – Hello, World!

Disclaimer

The instructions, steps, and programs given below worked for me (and Michigan Technological University) running site-licensed Red Hat Enterprise Linux 6.2 with NVIDIA CUDA SDK 4.1.28, NVIDIA GPU Driver v290.10, and two NVIDIA GeForce GTX 570 cards. As has been common practice for several years now, a full version of the operating system was installed and all necessary patches and upgrades were applied. These instructions may very well work for you (or your institution) on Red Hat-like or other Linux distributions. Please note that if you decide to use these instructions on your machine, you do so entirely at your own discretion, and neither this site, sgowtham.com, nor its author (or Michigan Technological University) is responsible for any damage, intellectual or otherwise.

The Program

/* hello_world_cuda.cu
   A CUDA C PROGRAM TO PRINT 'HELLO, WORLD!' TO THE SCREEN
 
   TESTED SUCCESSFULLY WITH CUDA SDK 4.1.28 AND NVIDIA GPU DRIVER 
   VERSION 290.10 RUNNING ON NVIDIA GeForce GTX 570
 
   COMPILATION:
   #1: NON-MAKEFILE APPROACH
       nvcc -g hello_world_cuda.cu -o hello_world_cuda.x
 
   #2. MAKEFILE APPROACH (USE THE ASSOCIATED Makefile)
       make CPROGRAM=hello_world_cuda
 
   EXECUTION:
   ./hello_world_cuda.x
 
   PORTIONS OF THE COMMENTS ARE ADAPTED FROM
 
     NVIDIA CUDA C
     PROGRAMMING GUIDE
     VERSION 4.0 (5/6/2011)
 
   FIRST WRITTEN: GOWTHAM; Mon, 13 Feb 2012 14:06:30 -0500
   LAST MODIFIED: GOWTHAM; Mon, 13 Feb 2012 14:15:30 -0500
*/
 
/* STANDARD HEADERS AND DEFINITIONS 
   REFERENCE: http://en.wikipedia.org/wiki/C_standard_library
*/
#include <stdio.h>  /* Core input/output operations                         */
#include <stdlib.h> /* Conversions, random numbers, memory allocation, etc. */
#include <math.h>   /* Common mathematical functions                        */
#include <time.h>   /* Converting between various date/time formats         */
#include <cuda.h>   /* CUDA related stuff                                   */
 
 
/* KERNEL DEFINITION
   CUDA C EXTENDS C BY ALLOWING THE PROGRAMMER TO DEFINE C FUNCTIONS,
   CALLED KERNELS, THAT, WHEN CALLED, ARE EXECUTED N TIMES IN PARALLEL
   BY N DIFFERENT CUDA THREADS, AS OPPOSED TO ONLY ONCE LIKE REGULAR
   C FUNCTIONS
 
   A KERNEL IS DEFINED USING THE __global__ DECLARATION SPECIFIER.
   THE NUMBER OF CUDA THREADS THAT EXECUTE THAT KERNEL FOR A GIVEN
   KERNEL CALL IS SPECIFIED USING <<< >>> (EXECUTION CONFIGURATION)
   SYNTAX. EXECUTION CONFIGURATION DEFINES THE DIMENSION OF THE
   GRIDS AND BLOCKS THAT WILL BE USED TO EXECUTE THE FUNCTION ON THE 
   DEVICE AS WELL AS THE ASSOCIATED STREAM
 
   EACH THREAD THAT EXECUTES THE KERNEL IS GIVEN A 'THREAD ID', UNIQUE
   WITHIN ITS BLOCK, THAT IS ACCESSIBLE WITHIN THE KERNEL THROUGH THE
   BUILT-IN threadIdx VARIABLE
 
   A FUNCTION DECLARED AS
 
     __global__ void Function(float* parameter);
 
   MUST BE CALLED AS FOLLOWS (Ns AND S ARE OPTIONAL):
 
     Function<<< Dg, Db, Ns, S >>>(parameter);
 
   WHERE
 
   -- Dg : OF TYPE dim3, IT SPECIFIES THE DIMENSION AND SIZE OF THE GRID
           SUCH THAT Dg.x * Dg.y * Dg.z EQUALS THE NUMBER OF BLOCKS BEING
           LAUNCHED
 
   -- Db : OF TYPE dim3, IT SPECIFIES THE DIMENSION AND SIZE OF EACH BLOCK
           SUCH THAT Db.x * Db.y * Db.z EQUALS THE NUMBER OF THREADS PER
           BLOCK
 
   -- Ns : OF TYPE size_t, IT SPECIFIES THE NUMBER OF BYTES IN SHARED MEMORY
           THAT IS DYNAMICALLY ALLOCATED PER BLOCK FOR THIS CALL IN ADDITION
           TO THE STATICALLY ALLOCATED MEMORY. THIS DYNAMICALLY ALLOCATED
           MEMORY IS USED BY ANY OF THE VARIABLES DECLARED AS AN EXTERNAL
           ARRAY. NOTE THAT THIS IS AN OPTIONAL ARGUMENT THAT DEFAULTS TO 0
 
   -- S  : OF TYPE cudaStream_t, IT SPECIFIES THE ASSOCIATED STREAM. THIS
           TOO IS AN OPTIONAL ARGUMENT THAT DEFAULTS TO 0
*/
__global__ void kernel(void) {
}
 
/* MAIN PROGRAM BEGINS */
int main(void) {
 
  /* Dg = 1; Db = 1; Ns = 0; S = 0 */
  kernel<<<1,1>>>();
 
  /* PRINT 'HELLO, WORLD!' TO THE SCREEN FROM THE HOST (CPU) */
  printf("\n  Hello, World!\n\n");
 
  /* INDICATE THE TERMINATION OF THE PROGRAM */
  return 0;
}
/* MAIN PROGRAM ENDS */
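
The kernel in the program above is empty and is launched with a single thread, so the execution configuration described in its comments never does visible work. As a rough sketch of how all four launch parameters (Dg, Db, Ns, S) and the built-in threadIdx, blockIdx, and blockDim variables might fit together, the following stand-alone program reverses the global thread indices inside each block using dynamically allocated shared memory. The names reverse_ids and run_reverse_ids are hypothetical and not part of the original post, and the sketch assumes a CUDA 4.0 or newer runtime, where cudaDeviceSynchronize and cudaStreamSynchronize are available.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* Hypothetical kernel (not part of the original program): every block
   reverses the global indices of its own threads by staging them in
   dynamically allocated shared memory. */
__global__ void reverse_ids(int *out) {
  extern __shared__ int scratch[];   /* size is set by Ns at launch */
  int gid = blockIdx.x * blockDim.x + threadIdx.x;

  scratch[threadIdx.x] = gid;
  __syncthreads();                   /* wait until the whole block has written */
  out[gid] = scratch[blockDim.x - 1 - threadIdx.x];
}

/* Launch reverse_ids on blocks x threads threads and copy the result
   into host[]. Returns 0 on success, 1 on any CUDA error. */
int run_reverse_ids(int blocks, int threads, int *host) {
  int n = blocks * threads;
  int *dev = NULL;
  cudaStream_t S;
  cudaError_t err;

  if (cudaMalloc((void **)&dev, n * sizeof(int)) != cudaSuccess) return 1;
  if (cudaStreamCreate(&S) != cudaSuccess) { cudaFree(dev); return 1; }

  dim3 Dg(blocks);                    /* grid: number of blocks         */
  dim3 Db(threads);                   /* block: threads per block       */
  size_t Ns = threads * sizeof(int);  /* dynamic shared memory per block */

  reverse_ids<<<Dg, Db, Ns, S>>>(dev);

  /* Launches are asynchronous: check the launch, then wait on the stream */
  err = cudaGetLastError();
  if (err == cudaSuccess) err = cudaStreamSynchronize(S);
  if (err == cudaSuccess)
    err = cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
  if (err != cudaSuccess)
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));

  cudaStreamDestroy(S);
  cudaFree(dev);
  return err == cudaSuccess ? 0 : 1;
}

int main(void) {
  int host[8];
  int i;

  if (run_reverse_ids(2, 4, host) != 0) return 1;
  for (i = 0; i < 8; i++) printf("%d ", host[i]);
  printf("\n");   /* prints: 3 2 1 0 7 6 5 4 */
  return 0;
}
```

With 2 blocks of 4 threads, block 0 holds global indices 0 to 3 and block 1 holds 4 to 7, so each group of four comes back reversed. Saved under a file name of your choosing, the sketch should compile the same way as the original program, for example with nvcc -g followed by the source file and -o with the desired executable name.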

The Makefile

# Simple version of the Makefile used to systematically compile
# one of many CUDA C programs, taking into account respective dependencies
# 
# First written: Gowtham; Mon, 13 Feb 2012 14:32:36 -0500
# Last modified: Gowtham; Mon, 13 Feb 2012 15:00:42 -0500
# 
 
# Necessary variables
CC        = nvcc
CFLAGS    = -g -c
OFLAGS    = -O3
MYPROGRAM = $(CPROGRAM)
 
# If CPROGRAM is not defined (and hence MYPROGRAM is empty), 
# display help message
ifndef CPROGRAM
help:
endif
 
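# Targets that do not correspond to files
.PHONY: all help clean clean-all-programs

# Default target (when CPROGRAM is defined)
all: $(MYPROGRAM).x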
 
# Print help message
help:
	@echo
	@echo "  To compile, choose one of the following:"
	@echo
	@echo "    make CPROGRAM=hello_world_cuda"
	@echo
	@echo
	@echo "  To clean, choose one of the following:"
	@echo
	@echo "    make clean-all-programs"
	@echo "    make clean CPROGRAM=hello_world_cuda"
	@echo
 
 
# $(MYPROGRAM).x (depends on $(MYPROGRAM).o)
$(MYPROGRAM).x: $(MYPROGRAM).o
	$(CC) $(MYPROGRAM).o -o $(MYPROGRAM).x
 
 
# $(MYPROGRAM).o (depends on $(MYPROGRAM).cu)
$(MYPROGRAM).o: $(MYPROGRAM).cu
	$(CC) $(CFLAGS) $(OFLAGS) $(MYPROGRAM).cu
 
 
# Remove the appropriate object file and executable
clean:
	@echo
	@echo "Deleting $(MYPROGRAM).o and $(MYPROGRAM).x"
	rm -f $(MYPROGRAM).o $(MYPROGRAM).x
	@echo
 
# Remove all object files and executables
clean-all-programs:
	@echo
	@echo "Deleting *.o and *.x files"
	rm -f *.o *.x
	@echo
