This seems like a reasonable idea, but it also seems that taking advantage of SIMD in this way is almost not worth it... unless the packet size is many times larger than the processor size. Sure, in this example we save 1 SIMD instruction to work on 7 of 8 instead of 7 of 16, but I imagine if there were say, 2 of 16 rays, there would still be a lot of wasted resources. At least having a larger packet size would allow for the cases where, say 4 of 64 can be compressed to 4 of 8, or something like that.
This seems like a reasonable idea, but it also seems that taking advantage of SIMD in this way is almost not worth it... unless the packet size is many times larger than the processor size. Sure, in this example we save 1 SIMD instruction to work on 7 of 8 instead of 7 of 16, but I imagine if there were say, 2 of 16 rays, there would still be a lot of wasted resources. At least having a larger packet size would allow for the cases where, say 4 of 64 can be compressed to 4 of 8, or something like that.
This comment was marked helpful 0 times.