cloudhary

If I remember correctly from my signals class, you can convolve the filters (each being the impulse response of some operation) with one another to get a single "combined" filter. This might reduce the amount of computation that needs to be done. Has anyone explored such a technique in the past, or does anyone have insights into how well this would work?

acortes

I imagine that is a very common strategy for reducing computation. Assuming you know which filters you want to apply before running the program, you could convolve them together (equivalently, multiply their matrix representations) ahead of time to get a single combined filter.
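A minimal sketch of that idea in NumPy/SciPy, using two placeholder 3x3 kernels (a box blur and a Sobel kernel, chosen purely for illustration): because convolution is associative, applying the kernels one after the other gives the same result as a single pass with their convolution.

```python
import numpy as np
from scipy.signal import convolve2d

# Two placeholder 3x3 kernels, standing in for arbitrary linear filters.
A = np.full((3, 3), 1.0 / 9.0)                                    # box blur
B = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)   # Sobel x

Im = np.random.rand(64, 64)   # toy image

# Applying A, then B ...
sequential = convolve2d(convolve2d(Im, A, mode='full'), B, mode='full')

# ... matches one pass with the pre-combined kernel conv(A, B),
# since full linear convolution is associative.
combined = convolve2d(A, B, mode='full')          # 5x5 combined kernel
one_pass = convolve2d(Im, combined, mode='full')

assert np.allclose(sequential, one_pass)
```

Note that combining two 3x3 kernels yields a 5x5 kernel, so the one-pass version does 25 multiply-adds per pixel instead of 9 + 9 = 18; the saving comes from touching the image once instead of twice, which is exactly the memory-versus-compute tradeoff discussed below.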

Split_Personality_Computer

If you think about it, though, filtering is a memory-bound operation, not a compute-bound one. So even if filter X = A*C and filter Y = B*C and you could compute I2 = Im*C once, this may not save as much time as you think, because you still have to compute A*I2 and B*I2, and those passes will probably end up taking just as long as X*Im and Y*Im.
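A back-of-envelope count for that scenario, assuming (purely for illustration) that A, B, and C are 3x3 kernels, which makes X = A*C and Y = B*C 5x5:

```python
# All kernel sizes here are illustrative assumptions.
k_small, k_big = 3, 5

# Direct route: X*Im and Y*Im.
direct_macs   = 2 * k_big**2      # 50 multiply-adds per pixel
direct_passes = 2                 # the image is traversed twice

# Factored route: I2 = Im*C, then A*I2 and B*I2.
factored_macs   = 3 * k_small**2  # 27 multiply-adds per pixel
factored_passes = 3               # three traversals (image once, I2 twice)

print(direct_macs, direct_passes)      # 50, 2
print(factored_macs, factored_passes)  # 27, 3
```

So factoring out the shared filter roughly halves the arithmetic but adds about 50% more memory traffic; if the loop is bandwidth-bound, the factored version may indeed be no faster.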

EggyLv999

Filtering isn't necessarily memory-bound. The filters can get quite large (5x5, 7x7, or even 9x9), so the arithmetic per pixel grows quickly. That said, when implementing convolutional neural networks we want the result of each filter separately, because each filter's output gets fed into the next layer.
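A small sketch of that last point, assuming a hypothetical bank of four 9x9 kernels: a convolutional layer keeps each filter's response as a separate output channel for the next layer, so the per-filter results can't be merged into one combined filter the way the earlier comments suggest.

```python
import numpy as np
from scipy.signal import convolve2d

# Hypothetical filter bank: four random 9x9 kernels standing in for
# learned convolutional weights (the count and size are assumptions).
filters = [np.random.randn(9, 9) for _ in range(4)]
Im = np.random.rand(128, 128)

# Each filter's response is kept as its own channel, shape (H, W, num_filters);
# the next layer consumes all channels, so they are never collapsed into one.
activations = np.stack(
    [convolve2d(Im, f, mode='same') for f in filters], axis=-1
)
print(activations.shape)   # (128, 128, 4)
```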

Split_Personality_Computer

@EggyLv999 Still, though, wouldn't it be memory-bound? Even with 9x9 filters, where you have 81 multiplies and 81 adds per pixel, won't most of your time still be spent fetching values from the image? You can't cache your whole image, so even if the multiplies and adds take ~500 cycles, won't each read from global memory take hundreds of cycles anyway (especially on a GPU)?
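A rough roofline-style estimate for the 9x9 case, using made-up but ballpark GPU numbers for peak throughput and bandwidth (both are assumptions, not any particular device's specs). The key point is that with shared-memory tiling each input pixel is fetched from DRAM roughly once rather than 81 times, and the latency of individual global loads is largely hidden by running many warps, so bandwidth, not latency, is what matters.

```python
# Rough arithmetic-intensity estimate for a 9x9 filter on a GPU.
k = 9
flops_per_pixel = 2 * k * k      # 81 multiplies + 81 adds = 162 flops

# With a tiled kernel that stages each block's pixels (plus halo) in shared
# memory, each input pixel is read from global memory roughly once, so DRAM
# traffic is about one float in and one float out per output pixel.
bytes_per_pixel = 4 + 4

arithmetic_intensity = flops_per_pixel / bytes_per_pixel   # ~20 flops/byte

peak_flops = 10e12    # assumed ~10 TFLOP/s
peak_bw    = 500e9    # assumed ~500 GB/s
machine_balance = peak_flops / peak_bw                     # ~20 flops/byte

print(arithmetic_intensity, machine_balance)
# A well-tiled 9x9 filter sits near the ridge of the roofline, so it is not
# clearly memory-bound; a 3x3 filter (~2 flops/byte on the same assumptions)
# would be firmly bandwidth-bound.
```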