Previous | Next --- Slide 36 of 40
Back to Lecture Thumbnails
AnnabelleBlue

Step 3 is finding the start and end addresses of each cell segment. This step is highly parallelizable because no two threads will attempt to modify the same piece of memory.

Consider index i. The only write to cell_starts[i] happens when result[i] is the first element of that cell block. Since result is sorted, this will occur exactly once. The only write to cell_ends[i] happens when you start a new cell block. Once again, since result is sorted, this will happen exactly once. Therefore, no two threads touch the same index in the result array.

In this particular example, only index=2 will modify cell_starts[6] and only index=5 will modify cell_ends[6].

After Step 3 has run, you can use cell_starts and cell_ends to index into the array of particles which I assume has been sorted with the result array.

kayvonf

@AnnabelleBlue -- that was an excellent explanation of the algorithm and its invariants. Very well done. (We prettied up the code references in monospace using Markdown to make it easier to read.)