Currently reviewing for the midterm: there are 2 sets of 16 ALUs, and each ALU effectively performs 2 ALU operations per clock cycle (the whole warps thing), so the total number of observed SIMD operations per processing unit per cycle is 2 × 16 × 2 = 64.
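
For anyone else checking the arithmetic, here is a quick sketch of the count. The constant and function names are made up for illustration, and the figures are just the ones quoted in this comment, not values read from any hardware spec.

```python
# Figures taken from the comment above; all names here are hypothetical.
ALU_GROUPS = 2              # sets of ALUs per processing unit
ALUS_PER_GROUP = 16         # ALUs in each set
OPS_PER_ALU_PER_CYCLE = 2   # ALU operations each ALU completes per clock cycle


def simd_ops_per_cycle(groups: int, alus_per_group: int, ops_per_alu: int) -> int:
    """Return the observed SIMD operations per processing unit per clock cycle."""
    return groups * alus_per_group * ops_per_alu


if __name__ == "__main__":
    total = simd_ops_per_cycle(ALU_GROUPS, ALUS_PER_GROUP, OPS_PER_ALU_PER_CYCLE)
    print(total)  # 2 * 16 * 2 = 64
```

Changing any one factor scales the total linearly, e.g. a single op per ALU per cycle would give 32 instead of 64.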