Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2016

Slide 39 of 51

haibinl

Just want to confirm: is the GPU supposed to have 4 L1 caches instead of 2? I ask because we're processing 4 instruction streams simultaneously.

illuminated

On each clock, up to four runnable warps are selected from the execution contexts on the core. The GPU/CUDA lecture also notes that instruction-level parallelism is available within a warp, so a selected warp may issue more than one independent instruction per clock.
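To make the "up to four runnable warps per clock" idea concrete, here is a toy Python model of the selection step. It is only a sketch under stated assumptions, not how the hardware actually picks warps: the `Warp` class, the `schedule` function, the rotating-priority policy, the fixed stall `latency`, and the warp counts are all hypothetical. It also ignores ILP (each selected warp issues just one instruction) and memory. The point it illustrates is that the four streams come from one core's pool of execution contexts, and having more contexts than selectors keeps the selectors busy while some warps are stalled.

```python
from dataclasses import dataclass


@dataclass
class Warp:
    wid: int
    remaining: int        # instructions left in this toy warp's program
    ready_cycle: int = 0  # earliest clock at which the warp can issue again


def schedule(warps, num_selectors=4, latency=2, max_cycles=1000):
    """Toy model (assumption): on each clock, issue one instruction from each
    of up to num_selectors runnable warps. Returns the issued warp ids per clock."""
    trace = []
    start = 0  # rotating priority pointer so low-numbered warps don't always win
    n = len(warps)
    for clock in range(max_cycles):
        if all(w.remaining == 0 for w in warps):
            break
        issued = []
        last = None
        for k in range(n):
            idx = (start + k) % n
            w = warps[idx]
            # A warp is runnable if it has work left and is not stalled.
            if w.remaining > 0 and w.ready_cycle <= clock:
                issued.append(w.wid)
                w.remaining -= 1
                w.ready_cycle = clock + latency  # stalled until its result is ready
                last = idx
                if len(issued) == num_selectors:
                    break
        if last is not None:
            start = (last + 1) % n  # next clock, begin searching after the last issuer
        trace.append(issued)
    return trace


if __name__ == "__main__":
    eight = [Warp(i, remaining=2) for i in range(8)]
    print(schedule(eight))  # [[0, 1, 2, 3], [4, 5, 6, 7], [0, 1, 2, 3], [4, 5, 6, 7]]

    four = [Warp(i, remaining=2) for i in range(4)]
    print(schedule(four))   # [[0, 1, 2, 3], [], [0, 1, 2, 3]]
```

With 8 contexts and a 2-clock stall, every clock issues four instructions; with only 4 contexts, the selectors sit idle on clock 1 because every warp is waiting. That is the latency-hiding argument for keeping many warp contexts per core.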