This slide talks about data caches reducing stalls. Modern processors have separate instruction and data caches. Is there a reason why only data caches are mentioned here? Even instruction caches can reduce the number of stalls, right?
@afa4, I think this slide is about prefetching. Only the data cache is mentioned here because we assume the instructions have already been made available in some way (e.g. through instruction prefetching, or because they're already in the instruction cache) and that performance is bottlenecked by data access latency.
The point is that prefetching can reduce these stalls by analyzing memory access patterns and issuing loads before the execution flow actually needs the data.
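To make the "issue loads early" idea concrete, here is a minimal sketch of *software* prefetching, assuming GCC/Clang's `__builtin_prefetch` intrinsic and a hypothetical lookahead distance of 16 elements (hardware prefetchers do something analogous automatically):

```c
#include <stddef.h>

/* Sketch: while summing element i, hint the cache to start fetching
 * element i + 16, so its memory latency overlaps with useful work.
 * The distance 16 is an illustrative guess, not a tuned value. */
long sum_with_prefetch(const long *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16]); /* non-binding hint */
        sum += a[i];
    }
    return sum;
}
```

The prefetch is only a hint: the result is identical with or without it, but the cache misses can be overlapped with computation instead of stalling the pipeline.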
I have a question regarding data cache prefetching. Take an array and a linked list: could we say that, in general, iterating through the elements of an array is faster than iterating through the elements of a linked list?
The simplest prefetching algorithm would be to fetch the data adjacent to the element we are currently accessing. Under that scheme, iterating through an array would be quite fast, as the next elements are very likely to already be in the cache.
For a linked list, though, predicting the next access is harder: the prefetcher would somehow have to know that the next element is pointed to by the current one. The next node is also unlikely to be adjacent in memory, so we might take a cache miss on nearly every element. The allocator would also affect how scattered the nodes are.
EDIT: interesting video of Mr. C++ (Bjarne Stroustrup) discussing the use of linked lists vs. arrays.
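The contrast described above can be sketched in code. This is a hypothetical illustration (names are mine): the array loop walks consecutive addresses with a fixed stride, which a simple next-line or stride prefetcher can predict, while the list loop must load each node's `next` pointer before the following address is even known (pointer chasing):

```c
#include <stddef.h>

struct node {
    long value;
    struct node *next;
};

/* Array traversal: address of a[i] is a + i*sizeof(long),
 * a predictable stride the prefetcher can run ahead of. */
long sum_array(const long *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* List traversal: the next address is only known after the
 * current node's `next` field has been loaded from memory. */
long sum_list(const struct node *head) {
    long sum = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        sum += p->value;
    return sum;
}
```

Both functions do the same arithmetic; the difference is purely in how predictable the sequence of memory addresses is.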
I'm a little confused about how prefetching works from a hardware standpoint. Is there a unit solely responsible for analyzing memory access patterns and prefetching data into the cache, one that operates independently of the processor executing instructions?