To fully utilize these resources, do we need to create 16 threads, each operating on a different part of the data, and have each thread run the vector program from the previous slide?


@BigFish Yes. Each of the 16 threads can run on its own core, and each thread can use that core's 8-wide vector unit.
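
To make that concrete, here is a rough sketch of the two levels of parallelism combined. It is not the code from the slide: the kernel (`square_chunk`), the wrapper (`parallel_square`), and the constants `kThreads` and `kLanes` are made-up names for illustration, and the inner 8-element loop is only a hint that a compiler may turn into a single 8-wide vector instruction. A real vector program would use intrinsics or a language like ISPC to guarantee that.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed machine shape from this discussion: 16 cores, 8-wide vector units.
constexpr std::size_t kThreads = 16;
constexpr std::size_t kLanes = 8;

// Hypothetical kernel: out[i] = in[i] * in[i] over one thread's slice [begin, end).
void square_chunk(const float* in, float* out, std::size_t begin, std::size_t end) {
    std::size_t i = begin;
    // Main loop: 8 independent elements per step. A compiler may map the
    // inner loop onto one 8-wide vector instruction (not guaranteed).
    for (; i + kLanes <= end; i += kLanes) {
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            out[i + lane] = in[i + lane] * in[i + lane];
        }
    }
    // Scalar tail for elements that do not fill a full group of 8.
    for (; i < end; ++i) {
        out[i] = in[i] * in[i];
    }
}

// Splits the input into 16 contiguous slices, one per thread.
void parallel_square(const std::vector<float>& in, std::vector<float>& out) {
    const std::size_t n = in.size();
    out.resize(n);
    const std::size_t chunk = (n + kThreads - 1) / kThreads;  // ceiling division

    std::vector<std::thread> workers;
    workers.reserve(kThreads);
    for (std::size_t t = 0; t < kThreads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back(square_chunk, in.data(), out.data(), begin, end);
    }
    for (auto& w : workers) {
        w.join();
    }
}
```

With both levels in use, up to 16 × 8 = 128 elements can be processed at once. Note that creating 16 threads does not by itself pin one thread to each core; that placement is up to the OS scheduler.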