Hello, here I explain how I ran my code so that it can be easily replicated. All of my
work was done in Google Colab. For sequential.c, my Colab notebook has 3 separate code
cells: the first holds sequential.c (the attached file), the second compiles it, and the
third runs it for testing. sequential.c is compiled using:
!gcc sequential.c -o sequential -lm
The command to run sequential.c is:
!./sequential 32 32
The 2 inputs are just the width and height.
I run parallel.cu in a separate Colab notebook with another 3 code cells: the first holds
parallel.cu (the other attached file), the second compiles it, and the third runs it for
testing. parallel.cu is compiled using:
!nvcc -arch=compute_75 -code=sm_75 parallel.cu -o parallel
The command to run parallel.cu is:
!./parallel 32 32 1528 8 8 0 0
The 7 inputs are the width, height, iterations (taken from sequential.c's output), block
dimension for the x-axis, block dimension for the y-axis, grid dimension for the x-axis,
and grid dimension for the y-axis. Passing 0 for the grid dimensions uses the default,
which I recommend for ease of use.
I run parallel.cu in a separate notebook so that I do not need to constantly switch
runtime types between CPU and T4 GPU. Under "Change runtime type", the hardware
accelerator should be CPU for sequential.c and T4 GPU for parallel.cu. The numbers in the
run commands can of course be changed. I also checked that the temperatures came out the
same from both programs.
Here are some runtime comparisons for various classroom sizes. Overall, parallel.cu runs
much faster; the only exception is very small arrays, where sequential.c is faster (see
the 16x16 case).
Size      Average runtime for sequential.c   Average runtime for parallel.cu
16x16     ~1.5 ms                            ~2.7 ms
32x32     ~30 ms                             ~17 ms
64x64     ~1000 ms                           ~175 ms
32x64     ~200 ms                            ~60 ms
32x128    ~530 ms                            ~95 ms