memcpy just copies bytes from one place in memory to another. This slide shows that it would save time if the memory could do the copy by itself, without involving the processor.
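To make "involving the processor" concrete, here is a rough sketch of what a software copy looks like. `naive_copy` is a made-up name for illustration, not the real libc `memcpy`, and it is deliberately simplified to one byte at a time.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only (not the libc memcpy).
   Each byte is loaded from memory into a CPU register and then stored
   back to memory, so the processor is busy for the entire transfer. */
void *naive_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++) {
        d[i] = s[i];  /* one load and one store per byte, both issued by the CPU */
    }
    return dst;
}
```

Calling `naive_copy(dst, src, n)` behaves like `memcpy(dst, src, n)` for non-overlapping buffers. Real implementations use wider loads and stores, but the data still travels through the processor, which is the cost the slide is pointing at.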
This follows the general hardware design trend in recent years of adding "accelerator" units for dedicated operations. This is a fairly interesting paper benchmarking zero-copy memory systems.
What I don't understand here is why it took so long to come up with this idea. DMA controllers already existed, and they are based on essentially the same idea (i.e., bypass the processor to perform the data transfer).