Is it possible that hiding stalls with multi-threading increases the number of cache misses? Whenever a context switch happens, the incoming thread may evict the cache lines the previous thread was using and fill them with the memory it needs. More threads means more switches, and more switches means more of these misses.
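To make this concrete, here's a toy simulation (names and parameters are made up for illustration): two threads with disjoint working sets alternate on one shared LRU cache, and we vary how long each thread runs before switching. With long quanta each switch costs roughly one working-set reload; with very short quanta the threads keep evicting each other and almost every access misses.

```python
from collections import OrderedDict

def shared_cache_miss_rate(quantum, total=12000, wset=64, lines=96):
    """Two threads with disjoint working sets of `wset` lines alternate
    every `quantum` accesses on one shared `lines`-entry LRU cache.
    Returns the overall miss rate. Purely illustrative parameters."""
    cache = OrderedDict()
    misses = 0
    for i in range(total):
        thread = (i // quantum) % 2          # which thread is running
        addr = (thread, i % wset)            # cyclic sweep, per-thread addresses
        if addr in cache:
            cache.move_to_end(addr)          # refresh LRU position
        else:
            misses += 1
            cache[addr] = True
            if len(cache) > lines:
                cache.popitem(last=False)    # evict least recently used
    return misses / total

rare_switches = shared_cache_miss_rate(quantum=6000)   # two long quanta
frequent_switches = shared_cache_miss_rate(quantum=64) # constant switching
```

With these numbers, `rare_switches` stays near the compulsory-miss floor (about 1%), while `frequent_switches` thrashes badly, which matches the intuition that the misses come from the switches themselves, not from the total work done.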
@crow True, so if we don't want their cache performance to get worse, we could partition the cache between the two threads. But by giving each thread only part of the cache, we also increase the chance of capacity misses. The same applies to the TLB, which you would definitely have to divide among the threads, since the virtual-to-physical mappings differ when the hardware threads belong to different processes.
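The capacity-miss point can be shown with a tiny LRU simulation (a hedged sketch; the function and parameters are invented for illustration): a working set that fits in the full cache misses only on warmup, but cyclically sweeping a working set that is even slightly larger than a thread's half-share makes every access an LRU capacity miss.

```python
from collections import OrderedDict

def miss_rate(cache_lines, working_set, accesses):
    """Fully associative LRU cache of `cache_lines` entries; the workload
    cyclically sweeps `working_set` distinct lines. Returns the miss rate."""
    cache = OrderedDict()
    misses = 0
    for i in range(accesses):
        addr = i % working_set              # cyclic sweep over the working set
        if addr in cache:
            cache.move_to_end(addr)         # refresh LRU position
        else:
            misses += 1
            cache[addr] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)   # evict least recently used

    return misses / accesses

# Working set of 6 lines, swept 10 times (60 accesses):
full = miss_rate(cache_lines=8, working_set=6, accesses=60)  # fits: 6 warmup misses -> 0.1
half = miss_rate(cache_lines=4, working_set=6, accesses=60)  # half-share: thrashes -> 1.0
```

The second case is the classic LRU pathology: by the time the sweep wraps around, the line it needs next is exactly the one that was just evicted, so partitioning turned a 10% miss rate into 100%.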
Something I was really unsure about during Assignment 1 never ended up being relevant, but I've kept thinking about it: why is there no memory latency when the processor has to fetch the next instruction? The program code sits in memory, right?
I imagine it is just because prefetching of program code is very effective: instructions are mostly executed sequentially, so the next cache line can be fetched ahead of time. But code isn't always traversed linearly, with branches and calls and things like that. Does anyone know much about how that works in the systems we are working with?
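A simple next-line prefetcher already explains most of it. Here's a toy model (hypothetical function and traces, just to illustrate the idea): every instruction fetch also prefetches the next sequential cache line, so straight-line code stalls only once at the start, while a taken branch to a far-away target pays one extra stall when it lands on a line the prefetcher never saw coming.

```python
def fetch_stalls(pc_trace, line_size=4):
    """Toy model: a fetch stalls only when its i-cache line is absent;
    after every fetch, the next sequential line is prefetched.
    (No capacity limit; `line_size` instructions per cache line.)"""
    cached = set()                   # i-cache lines currently present
    stalls = 0
    for pc in pc_trace:
        line = pc // line_size
        if line not in cached:
            stalls += 1              # fetch had to wait on memory
            cached.add(line)
        cached.add(line + 1)         # next-line prefetch fires on every fetch
    return stalls

straight = list(range(64))           # pure fall-through code: 1 stall total
# same code, but with a taken branch from pc 16 to pc 48 and back to pc 17
jumpy = list(range(17)) + list(range(48, 56)) + list(range(17, 40))
```

In this sketch `straight` stalls once (the very first line) and `jumpy` stalls twice: the branch target at pc 48 is the only fetch the sequential prefetcher couldn't cover, and the return to pc 17 is free because those lines are still cached. Real front-ends add branch prediction precisely so the fetch unit can run ahead down the predicted path instead of only the fall-through one.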