# Parallel Applications 2

## Public Resources

Where additional texts are provided, read them carefully. They may contain helpful information on the origins of the files, licence agreements, etc.

## Internal Resources

Do not enter your LDAP credentials. A common user name and password were set for all students at the beginning of the semester.

### Lesson 1

Prerequisites

• knowledge of C++

• try to compile the project template
• explore the project structure

### Lesson 2

Prerequisites

• knowledge of C++

• Allocate the HOST memory that will represent two M-dimensional vectors (A, B) and fill them with some values.
• Allocate the DEVICE memory to be able to copy data from HOST.
• Allocate the DEVICE memory to store an output M-dimensional vector C.
• Create a kernel that sums the vectors element-wise such that C[i] = A[i] + B[i].
• Allocate the HOST memory that will represent two sets of N M-dimensional vectors (A_0, ..., A_N-1, B_0, ..., B_N-1) and fill them with some values.
• Allocate the DEVICE memory to be able to copy data from HOST.
• Allocate the DEVICE memory to store output M-dimensional vectors C_0, ..., C_N-1.
• Create a kernel that sums all vector pairs such that C_0[i] = A_0[i] + B_0[i], ..., C_N-1[i] = A_N-1[i] + B_N-1[i].
• THINK ABOUT THE VARIANTS OF YOUR SOLUTION, CONSIDER THE PROS AND CONS.
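
One possible variant of the tasks above can be sketched as follows. It is only a minimal sketch under stated assumptions, not the reference solution: the N vectors of each set are assumed to be stored back to back in one flat buffer, so the pairwise sums reduce to a single element-wise addition over N*M values. The names `addVectors` and `CHECK` and the sizes M = 1024, N = 16 are illustrative choices.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,          \
                    cudaGetErrorString(err_));                          \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

// Assumption: vector k occupies elements [k*M, (k+1)*M) of each buffer,
// so C_k[i] = A_k[i] + B_k[i] becomes c[j] = a[j] + b[j] for j < N*M.
__global__ void addVectors(const float* a, const float* b, float* c, size_t count)
{
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t j = (size_t)blockIdx.x * blockDim.x + threadIdx.x; j < count; j += stride)
        c[j] = a[j] + b[j];
}

int main()
{
    const size_t M = 1024;            // vector dimension (illustrative)
    const size_t N = 16;              // number of vector pairs (illustrative)
    const size_t count = N * M;
    const size_t bytes = count * sizeof(float);

    // HOST memory for the inputs and the result.
    std::vector<float> hA(count), hB(count), hC(count);
    for (size_t j = 0; j < count; ++j) {
        hA[j] = (float)j;
        hB[j] = 2.0f * (float)j;
    }

    // DEVICE memory for the inputs and the output.
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    CHECK(cudaMalloc(&dA, bytes));
    CHECK(cudaMalloc(&dB, bytes));
    CHECK(cudaMalloc(&dC, bytes));
    CHECK(cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice));

    const int threads = 256;
    const int blocks = (int)((count + threads - 1) / threads);
    addVectors<<<blocks, threads>>>(dA, dB, dC, count);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost));

    // Compare against the same sum computed on the host.
    size_t mismatches = 0;
    for (size_t j = 0; j < count; ++j)
        if (hC[j] != hA[j] + hB[j]) ++mismatches;
    printf("mismatches: %zu\n", mismatches);

    CHECK(cudaFree(dA));
    CHECK(cudaFree(dB));
    CHECK(cudaFree(dC));
    return mismatches == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

The flat layout needs only one launch and keeps neighbouring threads on neighbouring addresses. Another variant would be a 2D grid with one grid row per vector pair, which keeps the indices k and i explicit at the cost of slightly more index arithmetic.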

### Lesson 3

Prerequisites

• CUDA - memory allocation, page-locked memory

• Create a column matrix m[mRows,mCols] containing the numbers 0, 1, 2, 3, ...
• The data should be well aligned in the page-locked memory.
• The matrix should be filled in a CUDA kernel.
• You must use pitched CUDA memory with appropriate alignment. Moreover, you must use a 2D grid of 2D blocks of size 8x8.
• Increment the values of the matrix.
• Finally, copy the matrix to HOST using the cudaMemcpy2D function.
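
The steps above could look roughly like the sketch below. It rests on several assumptions that are not part of the assignment: the matrix is small (5x10), elements are numbered row by row (change the single index expression in `fillMatrix` for a column-wise numbering), and the kernel names `fillMatrix` and `incrementMatrix` and the `CHECK` macro are made up for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,          \
                    cudaGetErrorString(err_));                          \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

// Address of element (row, col) in pitched memory; pitch is given in bytes.
__device__ int* elementAt(int* m, size_t pitch, int row, int col)
{
    return (int*)((char*)m + (size_t)row * pitch) + col;
}

__global__ void fillMatrix(int* m, size_t pitch, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols)
        *elementAt(m, pitch, row, col) = row * cols + col;  // assumed row-wise numbering
}

__global__ void incrementMatrix(int* m, size_t pitch, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols)
        *elementAt(m, pitch, row, col) += 1;
}

int main()
{
    const int mRows = 5, mCols = 10;                  // illustrative sizes
    const size_t rowBytes = mCols * sizeof(int);

    // Pitched DEVICE memory: the runtime pads each row to an aligned pitch.
    int* dM = nullptr;
    size_t pitch = 0;
    CHECK(cudaMallocPitch((void**)&dM, &pitch, rowBytes, mRows));

    // Page-locked HOST memory, stored densely (host pitch == rowBytes).
    int* hM = nullptr;
    CHECK(cudaMallocHost((void**)&hM, rowBytes * mRows));

    // 2D grid of 2D blocks of size 8x8, rounded up to cover the matrix.
    dim3 block(8, 8);
    dim3 grid((mCols + block.x - 1) / block.x, (mRows + block.y - 1) / block.y);
    fillMatrix<<<grid, block>>>(dM, pitch, mRows, mCols);
    CHECK(cudaGetLastError());
    incrementMatrix<<<grid, block>>>(dM, pitch, mRows, mCols);
    CHECK(cudaGetLastError());

    // cudaMemcpy2D(dst, dstPitch, src, srcPitch, widthInBytes, height, kind)
    CHECK(cudaMemcpy2D(hM, rowBytes, dM, pitch, rowBytes, mRows,
                       cudaMemcpyDeviceToHost));

    for (int r = 0; r < mRows; ++r) {
        for (int c = 0; c < mCols; ++c)
            printf("%3d ", hM[r * mCols + c]);
        printf("\n");
    }

    CHECK(cudaFreeHost(hM));
    CHECK(cudaFree(dM));
    return EXIT_SUCCESS;
}
```

Note that the pitch returned by `cudaMallocPitch` is measured in bytes, which is why the row offset is computed on a `char*` before indexing the column.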

Help for students

### Lesson 4

Prerequisites

• CUDA - shared memory

• Let's have a simple particle system representing a set of positions of N rain drops in 3D space, where N>=1M.
• Create a suitable data representation of the mentioned set of rain drops.
• Let's have a field of 256 wind power plants that give 256 movement vectors. The movement vectors change the positions of all rain drops every second.
• Create a kernel that simulates the falling of rain drops.
• For the sake of simplicity, suppose that a single kernel call simulates one second in the simulated world.
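
A minimal sketch of one way to approach the tasks above follows. The assignment does not say which movement vector acts on which drop, so the rule used here (drop i is moved by vector i mod 256) is purely a placeholder assumption, and a distance-based rule would replace that one line. Storing positions as `float4` (with the w component unused) is likewise just one possible data representation, and `simulateSecond`, `kPlants` and `CHECK` are illustrative names.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,          \
                    cudaGetErrorString(err_));                          \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

constexpr int kPlants = 256;   // number of wind power plants

// One call advances every drop by one simulated second.
__global__ void simulateSecond(float4* drops, const float4* wind, size_t n)
{
    // Stage the 256 movement vectors in shared memory once per block.
    __shared__ float4 sWind[kPlants];
    for (int k = threadIdx.x; k < kPlants; k += blockDim.x)
        sWind[k] = wind[k];
    __syncthreads();

    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        float4 p = drops[i];
        float4 w = sWind[i % kPlants];   // ASSUMPTION: placeholder selection rule
        p.x += w.x;
        p.y += w.y;
        p.z += w.z;
        drops[i] = p;
    }
}

int main()
{
    const size_t n = 1 << 20;   // about 1M drops

    // All drops start at height 1000; vector k blows k units along x and 1 unit down.
    std::vector<float4> hDrops(n, make_float4(0.0f, 0.0f, 1000.0f, 0.0f));
    std::vector<float4> hWind(kPlants);
    for (int k = 0; k < kPlants; ++k)
        hWind[k] = make_float4((float)k, 0.0f, -1.0f, 0.0f);

    float4 *dDrops = nullptr, *dWind = nullptr;
    CHECK(cudaMalloc(&dDrops, n * sizeof(float4)));
    CHECK(cudaMalloc(&dWind, kPlants * sizeof(float4)));
    CHECK(cudaMemcpy(dDrops, hDrops.data(), n * sizeof(float4), cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(dWind, hWind.data(), kPlants * sizeof(float4), cudaMemcpyHostToDevice));

    simulateSecond<<<512, 256>>>(dDrops, dWind, n);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(hDrops.data(), dDrops, n * sizeof(float4), cudaMemcpyDeviceToHost));

    // Check every drop against the same rule evaluated on the host.
    size_t wrong = 0;
    for (size_t i = 0; i < n; ++i)
        if (hDrops[i].x != (float)(i % kPlants) || hDrops[i].z != 999.0f) ++wrong;
    printf("drops with unexpected position: %zu\n", wrong);

    CHECK(cudaFree(dDrops));
    CHECK(cudaFree(dWind));
    return wrong == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Shared memory pays off here when the selection rule is data-dependent, because every block then reads the 256 vectors from global memory only once instead of once per drop.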

Help for students

### Lesson 5

This lesson focuses on discussion of the students' projects. In the remaining time, the following tasks should be solved.

Prerequisites

• CUDA - constant memory

• Try to write simple code that allocates and sets a scalar value in GPU constant memory.
• Copy the data back to HOST and check the value.
• Do the same with a custom structure and then with an array.
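
The three steps above can be sketched as follows, assuming statically declared `__constant__` symbols that are written with `cudaMemcpyToSymbol` and read back with `cudaMemcpyFromSymbol`. The `Params` structure, the symbol names and the `CHECK` macro are hypothetical and chosen only for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,          \
                    cudaGetErrorString(err_));                          \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

// Hypothetical custom structure used only for this example.
struct Params {
    int   count;
    float scale;
};

constexpr int kArrayLen = 8;

// Constant memory is declared statically at file scope; there is no cudaMalloc for it.
__constant__ int    cScalar;
__constant__ Params cParams;
__constant__ float  cArray[kArrayLen];

int main()
{
    bool ok = true;

    // 1) Scalar: copy to the symbol, copy back, compare.
    int hScalar = 42, hScalarBack = 0;
    CHECK(cudaMemcpyToSymbol(cScalar, &hScalar, sizeof(hScalar)));
    CHECK(cudaMemcpyFromSymbol(&hScalarBack, cScalar, sizeof(hScalarBack)));
    ok = ok && (hScalarBack == hScalar);

    // 2) Custom structure.
    Params hParams = {7, 0.5f};
    Params hParamsBack = {0, 0.0f};
    CHECK(cudaMemcpyToSymbol(cParams, &hParams, sizeof(hParams)));
    CHECK(cudaMemcpyFromSymbol(&hParamsBack, cParams, sizeof(hParamsBack)));
    ok = ok && (hParamsBack.count == hParams.count)
            && (hParamsBack.scale == hParams.scale);

    // 3) Array.
    float hArray[kArrayLen], hArrayBack[kArrayLen] = {};
    for (int i = 0; i < kArrayLen; ++i)
        hArray[i] = 1.5f * (float)i;
    CHECK(cudaMemcpyToSymbol(cArray, hArray, sizeof(hArray)));
    CHECK(cudaMemcpyFromSymbol(hArrayBack, cArray, sizeof(hArrayBack)));
    for (int i = 0; i < kArrayLen; ++i)
        ok = ok && (hArrayBack[i] == hArray[i]);

    printf("constant memory round trip: %s\n", ok ? "OK" : "MISMATCH");
    return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Kernels in the same translation unit can read `cScalar`, `cParams` and `cArray` directly by name. The sketch leaves that part out because the tasks only ask for the round trip.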

### Lesson 6

Prerequisites

• CUDA - texture memory

• Try to finish a given application. To do that, you have to implement all subtasks marked by TODO in the code.

Help for students

### Lesson 7

Prerequisites

• CUDA - texture memory

• Try to finish a given application. To do that, you have to implement all subtasks marked by TODO in the code.

### Lesson 8

Prerequisites

• Try to finish a given application. To do that, you have to implement all subtasks marked by TODO in the code.

### Lesson 9

Prerequisites

• CUDA - Atomics

• Create an array of 2^28 elements (big enough to see the performance of your code).
• Fill that array with some numbers.
• Find the maximum number using the appropriate atomic function.
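
One way to approach these tasks is sketched below, using `atomicMax` on `int` values. Several details are assumptions made for the example: the array is filled on the device with an arbitrary pattern below 1,000,000, a known maximum (1234567) is planted so the result can be checked, and each block first combines its threads' results in shared memory so that only one global atomic is issued per block. The kernel names and the `CHECK` macro are illustrative. The array needs 1 GiB of device memory.

```cuda
#include <climits>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,          \
                    cudaGetErrorString(err_));                          \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

// Fill with an arbitrary pattern; every value is below 1,000,000.
__global__ void fillArray(int* data, size_t n)
{
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] = (int)((i * 2654435761ull) % 1000000ull);
}

__global__ void findMax(const int* data, size_t n, int* result)
{
    __shared__ int blockMax;
    if (threadIdx.x == 0) blockMax = INT_MIN;
    __syncthreads();

    // Each thread scans its share of the array without any atomics.
    int localMax = INT_MIN;
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        localMax = max(localMax, data[i]);

    atomicMax(&blockMax, localMax);                    // shared-memory atomic per thread
    __syncthreads();
    if (threadIdx.x == 0) atomicMax(result, blockMax); // one global atomic per block
}

int main()
{
    const size_t n = (size_t)1 << 28;
    const int planted = 1234567;        // known maximum, larger than any filled value

    int *dData = nullptr, *dResult = nullptr;
    CHECK(cudaMalloc(&dData, n * sizeof(int)));
    CHECK(cudaMalloc(&dResult, sizeof(int)));

    fillArray<<<1024, 256>>>(dData, n);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(dData + n / 3, &planted, sizeof(int), cudaMemcpyHostToDevice));

    int hResult = INT_MIN;
    CHECK(cudaMemcpy(dResult, &hResult, sizeof(int), cudaMemcpyHostToDevice));

    // Time the reduction kernel with CUDA events.
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    CHECK(cudaEventRecord(start));
    findMax<<<1024, 256>>>(dData, n, dResult);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));

    CHECK(cudaMemcpy(&hResult, dResult, sizeof(int), cudaMemcpyDeviceToHost));
    printf("max = %d (planted %d), kernel time %.3f ms\n", hResult, planted, ms);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(dData));
    CHECK(cudaFree(dResult));
    return hResult == planted ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

For comparison, a naive variant that calls `atomicMax(result, data[i])` for every element is also correct, and timing it against the version above makes the cost of contended global atomics visible.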

### Lesson 10

Prerequisites

• CUDA - Streams

• Try to finish a given application. To do that, you have to implement all subtasks in the code. There are two vectors A and B (dim ~= 1M) that will be duplicated N times in a loop. A simple kernel computes the vector sum C = A + B. Everything will be done in streams according to the following tasks.
• TASK 2: two streams - depth first approach