The convolution in Fourier domain can be referenced here.
Interesting to note that the original FFT algorithm (the Cooley-Tukey algorithm) is one of the early examples of a trivially parallelizable divide-and-conquer algorithm. No asymptotic improvements have been made to computing circular convolution since then, but there is a very healthy area of research around decreasing the number of multiplications for certain data sets. For example, Overlap-add is an efficient way of computing convolution when you have a large image and a small filter (which is often the case in deep learning).
The Winograd algorithm is based off the premise that it is possible to uniquely given its remainder with respect to the given moduli. (Assumptions are that moduli Are relatively prime)