Slide 20 of 69
Kapteyn

Kayvon mentioned the answer to this question in class but I didn't quite understand his explanation of it:

1) How does a compiler compile a program if the program asks to spawn more gangs than the width of the vector?

And a second question I have:

2) How does a compiler compile a program if the program asks to spawn more threads than there are cores in the CPU?

TA-lixf

@Kapteyn Those are two related questions, but I'm not sure how the ISPC compiler is implemented.

For the second one, I think the compiler will happily generate code that spawns that many threads, and the OS will schedule them accordingly. That means some of them simply won't run for a while, until a core frees up.
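To make the oversubscription point concrete, here is a minimal C++ sketch (not ISPC, and not taken from the lecture) that asks for more threads than the machine reports cores. The helper names `run_oversubscribed` and `oversubscribed_count`, and the factor of 4, are assumptions chosen for illustration. Every thread still finishes; the OS just time-slices them onto whatever cores are available.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: spawn `num_threads` threads, each bumping a shared
// counter once, and return the counter after every thread has been joined.
std::size_t run_oversubscribed(std::size_t num_threads) {
    std::atomic<std::size_t> completed{0};
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (std::size_t i = 0; i < num_threads; ++i) {
        // The thread is created even if no core is free right now;
        // the OS decides when it actually gets to run.
        workers.emplace_back([&completed] { completed.fetch_add(1); });
    }
    for (auto& t : workers) {
        t.join();  // wait until the OS has scheduled and finished each one
    }
    // Sanity check: no thread was dropped, some were only delayed.
    assert(completed.load() == num_threads);
    return completed.load();
}

// Hypothetical helper: ask for 4x as many threads as reported cores
// (the factor 4 is an arbitrary choice for this sketch).
std::size_t oversubscribed_count() {
    unsigned cores = std::thread::hardware_concurrency();  // may be 0 if unknown
    if (cores == 0) {
        cores = 1;
    }
    return 4 * static_cast<std::size_t>(cores);
}
```

Calling `run_oversubscribed(oversubscribed_count())` should return the same number that was requested, which is the behavior described above: more threads than cores is legal, it just means some threads wait for a core.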

marjorie

I think he said in class that if, for example, you tried to spawn 16 gangs and your vector size was 8, the compiler would be smart enough to break up execution into two gangs of 8.

kayvonf

@marjorie: Not 2 gangs of 8. Gangs are an abstraction (they are how to think about the program's structure); we are now talking about implementation. To run with a gang of 16 instances, you can use the ISPC compiler flag --target=avx2-i32x16. In this situation, each operation performed by the gang is implemented as two 8-wide AVX2 vector instructions. The gang size remains 16, as that is what the programmer specified.

Kapteyn

So if you were running your program on a single-core CPU with 8 ALUs, would compiling the program with the flag for vector width 16 result in the same assembly code as compiling it with the flag for vector width 8?

kayvonf

No, it would not. Every place that required one 8-wide instruction in the gang-size-8 case now requires two 8-wide instructions to carry out that operation for all 16 instances in the gang. You'd see assembly with a bunch of repeated instructions. Try it with ISPC and inspect the resulting assembly.
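As a rough illustration of "two 8-wide instructions per gang operation", here is a portable C++ sketch. It is not ISPC output and uses no real AVX2 intrinsics. `Vec8`, `Gang16`, `add8`, `gang_add`, `iota_gang`, and `self_check` are all hypothetical names, and `add8` merely stands in for a single 8-wide vector add.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kVectorWidth = 8;  // lanes per hardware vector (assumed)
constexpr std::size_t kGangSize = 16;    // program instances in the gang
static_assert(kGangSize == 2 * kVectorWidth, "sketch assumes gang = 2 vectors");

// Stand-in for one 8-wide vector register.
using Vec8 = std::array<float, kVectorWidth>;

// One gang-wide value: 16 instances stored as two 8-wide halves.
struct Gang16 {
    Vec8 lo;  // program instances 0..7
    Vec8 hi;  // program instances 8..15
};

// Stand-in for a single 8-wide vector add instruction.
Vec8 add8(const Vec8& a, const Vec8& b) {
    Vec8 r{};
    for (std::size_t i = 0; i < kVectorWidth; ++i) {
        r[i] = a[i] + b[i];
    }
    return r;
}

// One logical gang operation becomes two 8-wide operations.
// The gang size is still 16; only the implementation is split.
Gang16 gang_add(const Gang16& a, const Gang16& b) {
    return Gang16{add8(a.lo, b.lo), add8(a.hi, b.hi)};
}

// Hypothetical helper: instance i holds the value base + i.
Gang16 iota_gang(float base) {
    Gang16 g{};
    for (std::size_t i = 0; i < kVectorWidth; ++i) {
        g.lo[i] = base + static_cast<float>(i);
        g.hi[i] = base + static_cast<float>(i + kVectorWidth);
    }
    return g;
}

// Usage example: every one of the 16 instances sees the same result
// it would get from a plain scalar add.
void self_check() {
    const Gang16 c = gang_add(iota_gang(0.0f), iota_gang(100.0f));
    for (std::size_t i = 0; i < kVectorWidth; ++i) {
        assert(c.lo[i] == 100.0f + 2.0f * static_cast<float>(i));
        assert(c.hi[i] == 100.0f + 2.0f * static_cast<float>(i + kVectorWidth));
    }
}
```

With a gang size of 8, `gang_add` would be a single `add8` call. With 16 it is two, which matches the repeated instructions you would expect to see in the assembly.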

BigFish

I was wondering: if the gang size is 16 while only 8-wide instructions are supported, will the performance of the program suffer a lot compared to simply using a gang size of 8?