Parallel Applications 2

Public Resources

 

No login is required.
If provided, carefully read the additional texts; they may contain helpful information about the origins of the files, licence agreements, etc.

Internal Resources

 

Login is required to view all content and to download files in this section.
Do not enter your LDAP credentials. A common user name and password were set for all students at the beginning of the semester.

Lesson 1

Prerequisites

  • download the CUDA 9.1 project template with all additional libraries for further usage -> DOWNLOAD
  • knowledge of C++

 

Topics and Tasks

  • try to compile the project template
  • explore the project structure

Lesson 2

Prerequisites

  • download the project template with all additional libraries for further usage -> DOWNLOAD
  • knowledge of C++

 

Topics and Tasks

  • Allocate the HOST memory that will represent two M-dimensional vectors (A, B) and fill them with some values.
  • Allocate the DEVICE memory to be able to copy data from HOST.
  • Allocate the DEVICE memory to store an output M-dimensional vector C.
  • Create a kernel that sums scalar values such that C[i] = A[i] + B[i].
  • Allocate the HOST memory that will represent N M-dimensional vectors (A_0, ..., A_n-1, B_0, ..., B_n-1) and fill them with some values.
  • Allocate the DEVICE memory to be able to copy data from HOST.
  • Allocate the DEVICE memory to store the output M-dimensional vectors C_0, ..., C_n-1.
  • Create a kernel that sums all vector pairs such that C_0[i] = A_0[i] + B_0[i], ..., C_n-1[i] = A_n-1[i] + B_n-1[i].
  • THINK ABOUT THE VARIANTS OF YOUR SOLUTION AND CONSIDER THE PROS AND CONS (a sketch of the single-pair variant follows this list).
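
A minimal sketch of the single-pair variant (the first four items), assuming M = 1M elements; the kernel name addVectors and the host/device variable names are illustrative, not part of the project template. For the N-pair variant you can, for example, pack all pairs into one contiguous buffer of N*M elements and reuse the same kernel in a single launch, or launch the kernel once per pair; the trade-offs between such variants are what the last item asks about.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Illustrative kernel: one thread computes one element of C.
    __global__ void addVectors(const float *a, const float *b, float *c, int m)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < m)
            c[i] = a[i] + b[i];
    }

    int main()
    {
        const int M = 1 << 20;                      // vector dimension (assumed)
        size_t bytes = M * sizeof(float);

        // HOST memory for A, B and the result C
        float *hA = (float *)malloc(bytes);
        float *hB = (float *)malloc(bytes);
        float *hC = (float *)malloc(bytes);
        for (int i = 0; i < M; i++) { hA[i] = (float)i; hB[i] = 2.0f * i; }

        // DEVICE memory and HOST -> DEVICE copies
        float *dA, *dB, *dC;
        cudaMalloc(&dA, bytes);
        cudaMalloc(&dB, bytes);
        cudaMalloc(&dC, bytes);
        cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

        dim3 block(256);
        dim3 grid((M + block.x - 1) / block.x);
        addVectors<<<grid, block>>>(dA, dB, dC, M);

        cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
        printf("C[1] = %f\n", hC[1]);               // expect 3.0

        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        free(hA); free(hB); free(hC);
        return 0;
    }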

Lesson 3

Prerequisites

  • download the project template with all additional libraries for further usage -> DOWNLOAD
  • CUDA - memory allocation, page-locked memory

 

Topics and Tasks

  • Create a column matrix m[mRows, mCols] containing the numbers 0 1 2 3 ...
  • The data should be well aligned in the page-locked memory.
  • The matrix should be filled in a CUDA kernel.
  • You must use pitched CUDA memory with appropriate alignment. Moreover, you must use a 2D grid of 2D blocks of size 8x8.
  • Increment the values of the matrix.
  • Finally, copy the matrix to HOST using the cudaMemcpy2D function (see the sketch after this list).
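
A possible sketch under assumed matrix dimensions; it fills the values row by row, so adapt the indexing if the template expects column-wise numbering. cudaMallocHost provides the page-locked HOST buffer, cudaMallocPitch chooses an aligned pitch on the DEVICE, and the copy back uses cudaMemcpy2D.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Fill the matrix with consecutive numbers; the pitch is in bytes, hence the char* cast.
    __global__ void fillMatrix(float *data, size_t pitch, int mRows, int mCols)
    {
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        if (row < mRows && col < mCols) {
            float *rowPtr = (float *)((char *)data + row * pitch);
            rowPtr[col] = (float)(row * mCols + col);
        }
    }

    // Increment every element of the pitched matrix.
    __global__ void incrementMatrix(float *data, size_t pitch, int mRows, int mCols)
    {
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        if (row < mRows && col < mCols) {
            float *rowPtr = (float *)((char *)data + row * pitch);
            rowPtr[col] += 1.0f;
        }
    }

    int main()
    {
        const int mRows = 64, mCols = 48;           // assumed sizes

        // Page-locked HOST buffer
        float *hM;
        cudaMallocHost(&hM, mRows * mCols * sizeof(float));

        // Pitched DEVICE buffer; the pitch is chosen by the driver for good alignment
        float *dM;
        size_t pitch;
        cudaMallocPitch(&dM, &pitch, mCols * sizeof(float), mRows);

        // 2D grid of 8x8 blocks, as required by the task
        dim3 block(8, 8);
        dim3 grid((mCols + block.x - 1) / block.x, (mRows + block.y - 1) / block.y);
        fillMatrix<<<grid, block>>>(dM, pitch, mRows, mCols);
        incrementMatrix<<<grid, block>>>(dM, pitch, mRows, mCols);

        // Copy the pitched matrix back into the densely packed HOST buffer
        cudaMemcpy2D(hM, mCols * sizeof(float), dM, pitch,
                     mCols * sizeof(float), mRows, cudaMemcpyDeviceToHost);

        printf("m[1][0] = %f\n", hM[1 * mCols + 0]);  // expect mCols + 1 after the increment

        cudaFree(dM);
        cudaFreeHost(hM);
        return 0;
    }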

Lesson 4

Prerequisites

  • download the project template with all additional libraries for further usage -> DOWNLOAD
  • download the runner template -> DOWNLOAD
  • CUDA - shared memory

 

Topics and Tasks

  • Let's have a simple particle system representing a set of positions of N rain drops in 3D space, where N >= 1M.
  • Create a suitable data representation of the mentioned set of rain drops.
  • Let's have a field of 256 wind power plants that give 256 movement vectors. The movement vectors change the positions of all rain drops each second.
  • Create a kernel that simulates the falling of the rain drops (a sketch follows this list).
  • Just for the sake of simplicity, suppose that a single kernel call simulates one second in the simulated world.
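
The assignment does not fix the data layout, so this sketch assumes an array of float3 positions and that one simulated second simply accumulates all 256 movement vectors into every drop; the names simulateStep, sWinds, etc. are placeholders. The shared-memory point is that each block loads the 256 wind vectors once instead of every thread reading them from global memory.

    #include <cuda_runtime.h>

    #define N_DROPS (1 << 20)     // at least 1M rain drops (assumed value)
    #define N_WINDS 256           // 256 wind power plants

    __global__ void simulateStep(float3 *drops, const float3 *winds, int nDrops)
    {
        // Cooperative load of all 256 movement vectors into shared memory
        __shared__ float3 sWinds[N_WINDS];
        for (int w = threadIdx.x; w < N_WINDS; w += blockDim.x)
            sWinds[w] = winds[w];
        __syncthreads();

        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= nDrops) return;

        // One kernel call == one simulated second: apply every movement vector once
        float3 p = drops[i];
        for (int w = 0; w < N_WINDS; w++) {
            p.x += sWinds[w].x;
            p.y += sWinds[w].y;
            p.z += sWinds[w].z;
        }
        drops[i] = p;
    }

    int main()
    {
        float3 *dDrops, *dWinds;
        cudaMalloc(&dDrops, N_DROPS * sizeof(float3));
        cudaMalloc(&dWinds, N_WINDS * sizeof(float3));
        // ... fill dDrops and dWinds with data copied from the HOST here ...

        dim3 block(256);
        dim3 grid((N_DROPS + block.x - 1) / block.x);
        for (int second = 0; second < 10; second++)   // simulate 10 seconds
            simulateStep<<<grid, block>>>(dDrops, dWinds, N_DROPS);
        cudaDeviceSynchronize();

        cudaFree(dDrops);
        cudaFree(dWinds);
        return 0;
    }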

Lesson 5

Prerequisites

  • download the project template with all additional libraries for further usage -> DOWNLOAD
  • CUDA - constant memory

 

Topics and Tasks

  • Try to write a simple piece of code that will allocate and set a scalar value in the GPU constant memory.
  • Copy the data back to HOST and check the value.
  • Do the same with a custom structure and then with an array (see the sketch after this list).
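
A minimal sketch; the symbol names (cScalar, cParams, cArray) and the Params structure are made up for the example. Constant memory is declared statically with __constant__ and written/read from the HOST with cudaMemcpyToSymbol / cudaMemcpyFromSymbol.

    #include <cstdio>
    #include <cuda_runtime.h>

    struct Params { float gravity; int steps; };  // hypothetical structure

    __constant__ float  cScalar;
    __constant__ Params cParams;
    __constant__ int    cArray[16];

    int main()
    {
        // Set a scalar value in constant memory
        float scalar = 3.14f;
        cudaMemcpyToSymbol(cScalar, &scalar, sizeof(float));

        // Copy it back to the HOST and check the value
        float check = 0.0f;
        cudaMemcpyFromSymbol(&check, cScalar, sizeof(float));
        printf("scalar: %f\n", check);

        // The same with a custom structure ...
        Params p = { 9.81f, 100 };
        cudaMemcpyToSymbol(cParams, &p, sizeof(Params));
        Params pCheck;
        cudaMemcpyFromSymbol(&pCheck, cParams, sizeof(Params));
        printf("struct: %f %d\n", pCheck.gravity, pCheck.steps);

        // ... and with an array
        int arr[16];
        for (int i = 0; i < 16; i++) arr[i] = i;
        cudaMemcpyToSymbol(cArray, arr, sizeof(arr));
        int arrCheck[16];
        cudaMemcpyFromSymbol(arrCheck, cArray, sizeof(arrCheck));
        printf("array[15]: %d\n", arrCheck[15]);

        return 0;
    }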

Lesson 6

Prerequisites

  • download the project template with all additional libraries for further usage -> DOWNLOAD
  • runner 6 - DOWNLOAD
  • CUDA - texture memory

 

Topics and Tasks

  • Try to finish the given application. To do that, you have to implement all subtasks marked by TODO in the code.

Lesson 7

Prerequisites

  • download the project template with all additional libraries for further usage -> DOWNLOAD
  • runner 7 - DOWNLOAD
  • CUDA - texture memory

 

Topics and Tasks

  • Try to finish the given application. To do that, you have to implement all subtasks marked by TODO in the code.

Lesson 8

Prerequisites

 

Topics and Tasks

  • Try to finish the given application. To do that, you have to implement all subtasks marked by TODO in the code.

Lesson 9

Prerequisites

  • CUDA - Atomics

 

Topics and Tasks

  • Create an array of 2^26 elements (big enough to see the performance of your code).
  • Fill that array with some numbers.
  • Find the maximum number using the appropriate atomic function.
  • Think about possible optimizations (see the sketch after this list).
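
A possible sketch with two variants: a naive kernel where every thread calls atomicMax on the single global result, and an optimized one that first reduces within the block in shared memory, so only one atomic per block reaches global memory. The array contents and the block size are assumptions.

    #include <cstdio>
    #include <cstdlib>
    #include <climits>
    #include <cuda_runtime.h>

    #define N (1 << 26)

    // Naive variant: every thread issues one atomicMax on the global result.
    __global__ void maxAtomicNaive(const int *data, int *result, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            atomicMax(result, data[i]);
    }

    // Optimized variant: block-level reduction in shared memory first,
    // then a single atomicMax per block on the global result.
    __global__ void maxAtomicShared(const int *data, int *result, int n)
    {
        __shared__ int sMax[256];
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        sMax[threadIdx.x] = (i < n) ? data[i] : INT_MIN;
        __syncthreads();

        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (threadIdx.x < s)
                sMax[threadIdx.x] = max(sMax[threadIdx.x], sMax[threadIdx.x + s]);
            __syncthreads();
        }
        if (threadIdx.x == 0)
            atomicMax(result, sMax[0]);
    }

    int main()
    {
        int *hData = (int *)malloc(N * sizeof(int));
        for (int i = 0; i < N; i++) hData[i] = rand();

        int *dData, *dResult;
        cudaMalloc(&dData, N * sizeof(int));
        cudaMalloc(&dResult, sizeof(int));
        cudaMemcpy(dData, hData, N * sizeof(int), cudaMemcpyHostToDevice);

        int init = INT_MIN;
        cudaMemcpy(dResult, &init, sizeof(int), cudaMemcpyHostToDevice);

        dim3 block(256);
        dim3 grid((N + block.x - 1) / block.x);
        maxAtomicShared<<<grid, block>>>(dData, dResult, N);

        int hResult;
        cudaMemcpy(&hResult, dResult, sizeof(int), cudaMemcpyDeviceToHost);
        printf("max = %d\n", hResult);

        cudaFree(dData); cudaFree(dResult);
        free(hData);
        return 0;
    }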

Lesson 10

Prerequisites

  • download the project template with all additional libraries for further usage -> DOWNLOAD
  • CUDA - Streams
  • Source code - DOWNLOAD

 

Topics and Tasks

  • Try to finish the given application. To do that, you have to implement all subtasks in the code. There are two vectors A and B (dim ~= 1M) that will be duplicated N times in a loop. A simple kernel computes the vector sum A+B=C. Everything will be done in streams with respect to the following tasks.
  • TASK 1: simple stream
  • TASK 2: two streams - depth-first approach
  • TASK 3: two streams - breadth-first approach (a sketch of this variant follows this list)
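
A sketch of the breadth-first variant (TASK 3) under assumed sizes; the real buffers, kernel, and loop count come from the provided source code. Page-locked HOST memory is needed for cudaMemcpyAsync to overlap with kernels, and each stream gets its own DEVICE buffers. The depth-first variant (TASK 2) would instead issue the whole copy-kernel-copy chain per iteration, alternating between the two streams.

    #include <cstdio>
    #include <cuda_runtime.h>

    #define DIM     (1 << 20)   // ~1M elements (assumed)
    #define N_ITERS 8           // how many times the work is duplicated (assumed)

    __global__ void addKernel(const float *a, const float *b, float *c, int dim)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < dim) c[i] = a[i] + b[i];
    }

    int main()
    {
        size_t bytes = DIM * sizeof(float);

        // Page-locked HOST memory so the async copies can overlap with kernels
        float *hA, *hB, *hC;
        cudaMallocHost(&hA, bytes);
        cudaMallocHost(&hB, bytes);
        cudaMallocHost(&hC, bytes);
        for (int i = 0; i < DIM; i++) { hA[i] = 1.0f; hB[i] = 2.0f; }

        // One set of DEVICE buffers per stream so the iterations do not collide
        cudaStream_t s[2];
        float *dA[2], *dB[2], *dC[2];
        for (int j = 0; j < 2; j++) {
            cudaStreamCreate(&s[j]);
            cudaMalloc(&dA[j], bytes);
            cudaMalloc(&dB[j], bytes);
            cudaMalloc(&dC[j], bytes);
        }

        dim3 block(256);
        dim3 grid((DIM + block.x - 1) / block.x);

        // Breadth first: issue each stage for both streams before the next stage
        for (int i = 0; i < N_ITERS; i += 2) {
            for (int j = 0; j < 2; j++) {
                cudaMemcpyAsync(dA[j], hA, bytes, cudaMemcpyHostToDevice, s[j]);
                cudaMemcpyAsync(dB[j], hB, bytes, cudaMemcpyHostToDevice, s[j]);
            }
            for (int j = 0; j < 2; j++)
                addKernel<<<grid, block, 0, s[j]>>>(dA[j], dB[j], dC[j], DIM);
            for (int j = 0; j < 2; j++)
                // in the real template each iteration would target its own output chunk
                cudaMemcpyAsync(hC, dC[j], bytes, cudaMemcpyDeviceToHost, s[j]);
        }
        cudaDeviceSynchronize();
        printf("C[0] = %f\n", hC[0]);   // expect 3.0

        for (int j = 0; j < 2; j++) {
            cudaStreamDestroy(s[j]);
            cudaFree(dA[j]); cudaFree(dB[j]); cudaFree(dC[j]);
        }
        cudaFreeHost(hA); cudaFreeHost(hB); cudaFreeHost(hC);
        return 0;
    }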

Lesson 11

Prerequisites

 

Topics and Tasks

  • Try to finish the given application. To do that, you have to implement all subtasks in the code. You must create a distance matrix that will contain the distances between vectors in 3D.
  • The distances can be computed in several different ways. Try to use BLAS3 functions, which means that you must formulate the problem in terms of matrix operations (see the sketch after this list).
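
One way to bring BLAS3 into play, based on the identity D[i][j]^2 = ||x_i||^2 + ||x_j||^2 - 2*x_i.x_j: a single SGEMM produces the Gram matrix G = X*X^T with all pairwise dot products, and a small kernel turns it into distances. This is a sketch with assumed sizes and names; it needs linking against cuBLAS (-lcublas).

    #include <cstdio>
    #include <cuda_runtime.h>
    #include <cublas_v2.h>

    // Turn the Gram matrix G = X*X^T into Euclidean distances:
    // D[i][j] = sqrt(G[i][i] + G[j][j] - 2*G[i][j]). G is symmetric, so the
    // row- vs column-major question does not matter here.
    __global__ void gramToDistance(const float *g, float *d, int n)
    {
        int j = blockIdx.x * blockDim.x + threadIdx.x;
        int i = blockIdx.y * blockDim.y + threadIdx.y;
        if (i < n && j < n) {
            float v = g[i * n + i] + g[j * n + j] - 2.0f * g[i * n + j];
            d[i * n + j] = sqrtf(fmaxf(v, 0.0f));  // clamp tiny negative rounding errors
        }
    }

    int main()
    {
        const int n = 1024;                         // number of 3D vectors (assumed)

        // X stored row-major as n x 3 == column-major 3 x n, which suits cuBLAS below
        float *hX = (float *)malloc(n * 3 * sizeof(float));
        for (int i = 0; i < n * 3; i++) hX[i] = (float)(i % 7);

        float *dX, *dG, *dD;
        cudaMalloc(&dX, n * 3 * sizeof(float));
        cudaMalloc(&dG, n * n * sizeof(float));
        cudaMalloc(&dD, n * n * sizeof(float));
        cudaMemcpy(dX, hX, n * 3 * sizeof(float), cudaMemcpyHostToDevice);

        // BLAS3 step: one SGEMM computes all pairwise dot products G = X * X^T
        cublasHandle_t handle;
        cublasCreate(&handle);
        float alpha = 1.0f, beta = 0.0f;
        cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N, n, n, 3,
                    &alpha, dX, 3, dX, 3, &beta, dG, n);

        dim3 block(16, 16);
        dim3 grid((n + 15) / 16, (n + 15) / 16);
        gramToDistance<<<grid, block>>>(dG, dD, n);
        cudaDeviceSynchronize();

        cublasDestroy(handle);
        cudaFree(dX); cudaFree(dG); cudaFree(dD);
        free(hX);
        return 0;
    }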