Slide 39 of 51
haibinl

Just want to confirm: is the GPU supposed to have 4 L1 caches instead of 2, since we're processing 4 instruction streams simultaneously?

illuminated

On each clock, up to four runnable warps are selected from the on-core execution contexts to run. The GPU/CUDA lecture also notes that instruction-level parallelism is available.
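
To make the selection step concrete, here is a toy sketch in Python of picking up to four runnable warps per clock. It is only an illustration, not how the hardware is actually implemented: the `Warp` class, the `stalled_until` field, and the lowest-id-first policy are all assumptions made for the example, and it does not model instruction-level parallelism within a warp.

```python
from dataclasses import dataclass

# From the comment above: at most four warps are selected per clock.
MAX_WARPS_PER_CLOCK = 4


@dataclass
class Warp:
    warp_id: int
    # Hypothetical field: first clock on which this warp is runnable again
    # (e.g. after a pretend memory stall).
    stalled_until: int = 0

    def runnable(self, clock: int) -> bool:
        return clock >= self.stalled_until


def select_warps(warps, clock, limit=MAX_WARPS_PER_CLOCK):
    """Return up to `limit` runnable warps, lowest warp id first.

    The ordering is an assumed simplification; a real scheduler may use
    a different policy.
    """
    ready = [w for w in warps if w.runnable(clock)]
    return ready[:limit]


# Eight execution contexts; warps 1 and 2 are pretend-stalled until clock 3.
warps = [Warp(i) for i in range(8)]
warps[1].stalled_until = 3
warps[2].stalled_until = 3

for clock in range(4):
    chosen = select_warps(warps, clock)
    print(clock, [w.warp_id for w in chosen])
```

The point of the sketch is that the number of warps selected per clock is a separate quantity from the number of execution contexts held on the core: there can be many more contexts than issue slots, and stalled warps are simply skipped.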