Previous | Next --- Slide 27 of 81
Back to Lecture Thumbnails
pht

By using vector ops and vector registers, many fragments can be processed through shader functions. In modern GPUs, 16 to 64 fragments share an instruction stream.

To achieve high-throughput execution, use many "slimmed down" cores and run them in parallel, pack cores full of ALUs, and avoid latency stalls by interleaving execution between fragment groups. These concepts are similar to when we discussed threads.

For more information: refer to https://www.cs.cmu.edu/afs/cs/academic/class/15462-f11/www/lec_slides/lec19.pdf.