You can actually use segmented scan to solve the histogram problem from quiz 2.

Set all histogram values to 0.

Map the values to their bins and sort that array, call it A.

Map the values of A to 1 if they are the start of a new run of a bin and 0 otherwise. You can check this by comparing element at i to the element at i-1 (or if i==0). Call this array flag.

Tabulate an array of length A of all 1's, call it data.

Run segmented_scan_exclusive on data using flag, getting array count. It's also fine to reuse data.

In parallel, look at all elements in count. Say we're at index i. If count[i] is the end of a run, then we know that the bin the run belongs to is A[i] and that the count is count[i]+1, hence simply do histogram[A[i]] = count[i]+1. We can check if we're at the end of a run by checking if count[i+1] == 0 (or i==N-1).

Since the question gave us N processors, this should make good use of them since the steps that don't involve a library call are all O(1) span.

jrgallag

Just for clarification, exclusive scan includes the "base case", but not the final sum of the sequence. On the other hand, an inclusive scan would not include the base case, but would include the final sum. For example:

segmented_scan_exclusive(+, A) = [[0, 1], [0], [0, 1, 3, 6]
segmented_scan_inclusive(+, A) = [[1, 3], [6], [1, 3, 6, 10]]

This was probably covered in an earlier lecture, but it confused me for a while, so I figured I might as well leave a note!

You can actually use segmented scan to solve the histogram problem from quiz 2.

`A`

.`A`

to 1 if they are the start of a new run of a bin and 0 otherwise. You can check this by comparing element at`i`

to the element at`i-1`

(or if`i==0`

). Call this array`flag`

.`A`

of all 1's, call it`data`

.`data`

using`flag`

, getting array`count`

. It's also fine to reuse`data`

.`count`

. Say we're at index`i`

. If`count[i]`

is the end of a run, then we know that the bin the run belongs to is`A[i]`

and that the count is`count[i]+1`

, hence simply do`histogram[A[i]] = count[i]+1`

. We can check if we're at the end of a run by checking if`count[i+1] == 0`

(or`i==N-1`

).Since the question gave us N processors, this should make good use of them since the steps that don't involve a library call are all O(1) span.

Just for clarification, exclusive scan includes the "base case", but not the final sum of the sequence. On the other hand, an inclusive scan would not include the base case, but would include the final sum. For example:

This was probably covered in an earlier lecture, but it confused me for a while, so I figured I might as well leave a note!