The slide title says this is an idea. Does anyone know whether it has been implemented anywhere? Also, would it require adding new instructions to x86 (or your ISA of choice) and changing the C compiler? Can the DIMMs we use now produce this behavior, or would we need to make new ones? I don't think it'd be a huge issue if we had to make new ones, since it doesn't seem like any expensive extra hardware is required.
Someone brought this up in lecture, but just to reiterate their comment and Kayvon's response, this method really only works if you want to copy the contents of an entire row of one DRAM chip to another row of the same DRAM chip.
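To make the "entire row" restriction concrete, here is a rough sketch of the alignment test a copy would have to pass. The function name `is_whole_row_copy` and the row size passed in are my own assumptions, not anything from the slide, and this is only a necessary condition: it does not check that both rows sit in the same chip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true when a copy of len bytes from src to dst
 * covers only whole rows, i.e. both addresses are row-aligned and the length
 * is a nonzero multiple of the row size. row_bytes is an assumed parameter;
 * real DRAM parts have their own row sizes. Same-chip placement is NOT
 * checked here. */
static bool is_whole_row_copy(uint64_t src, uint64_t dst, size_t len,
                              uint64_t row_bytes)
{
    assert(row_bytes > 0);
    return len > 0
        && len % row_bytes == 0
        && src % row_bytes == 0
        && dst % row_bytes == 0;
}
```

With an assumed 8 KB row, a copy from address 0 to address 8192 of length 8192 passes, while the same copy starting at address 64 does not.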
I wonder what proportion of memcpy operations typically performed by users are of this form. I would think memcpy might be used primarily to alter data locality, so maybe not many?
If our address space is byte-interleaved across DRAM chips, then it's less likely that a user's memcpy will be copying an entire row of DRAM.
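As a sketch of what byte interleaving does to the addresses, assuming the simplest possible mapping (consecutive bytes go to consecutive chips, with hypothetical helper names and chip counts of my own choosing):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mapping: byte at address addr lives on chip (addr mod num_chips). */
static unsigned chip_of(uint64_t addr, unsigned num_chips)
{
    assert(num_chips > 0);
    return (unsigned)(addr % num_chips);
}

/* Byte offset of that address within its chip under the same mapping. */
static uint64_t offset_in_chip(uint64_t addr, unsigned num_chips)
{
    assert(num_chips > 0);
    return addr / num_chips;
}

/* Length of the aligned contiguous range needed to fill one whole row
 * in every chip: each chip receives only 1/num_chips of the bytes. */
static uint64_t bytes_for_full_rows(uint64_t chip_row_bytes, unsigned num_chips)
{
    return chip_row_bytes * num_chips;
}
```

Under this assumed mapping with 8 chips and 1 KB per-chip rows, a memcpy would need to cover an aligned 8 KB range before any chip sees a complete row, so small or unaligned copies would not qualify.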