arcticx

emm... Where are the review slides?

tdecker

Yeah, I also could not find the review slides, but I have a question about scheduling on the same core vs. different cores. If I have a program that forks a new thread but uses shared memory, would it be better to schedule that new thread on the same core or a different core, assuming that no other threads were running? I understand that the answer will be "it depends." I am mostly wondering how the speed of memory access compares with the cost of being held back by only having one execution unit.

karima

@tdecker Let's be careful about our terminology :)

We don't fork threads. We spawn a thread and fork a process. Forked child processes by definition do not share memory spaces with their parent processes. Pthreads, on the other hand, share a memory space with their parent thread. For this reason, pthreads are more lightweight than processes; they take fewer resources to create.
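Here's a minimal sketch of that distinction, assuming Linux/POSIX (the variable names are just for illustration; compile with -pthread):

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared = 0;

static void *set_from_thread(void *arg) {
    (void)arg;
    shared = 42;          // same address space as the spawning thread
    return NULL;
}

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {       // forked child: gets a *copy* of the address space
        shared = 7;       // modifies the child's private copy only
        exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("after fork:    shared = %d\n", shared);  // prints 0

    pthread_t t;
    pthread_create(&t, NULL, set_from_thread, NULL);
    pthread_join(t, NULL);
    printf("after pthread: shared = %d\n", shared);  // prints 42
    return 0;
}
```

The child's write never reaches the parent because fork gave it a copy of the address space, while the pthread's write is visible because both threads share one.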

To answer your question: yes, the short answer is that it depends, both on the nature of your program and on the architecture you're running it on. For example, if both of your threads are reading from and writing to memory locations with high spatial locality, then having them share a core can save you time lost to cache misses: one thread incurs the cache miss, but in doing so it warms up the cache for the other thread, which can then read the already-loaded data without missing.

But it is not uncommon for cores to share an L3 cache. So even if each of your threads is on a separate core, on such an architecture you still benefit from spatial locality between the two threads' memory access patterns despite them running on separate cores.

Furthermore, the cost of an L2 cache miss is typically between 10 and 20 CPU cycles, so if the time each thread spends computing per cache miss is significantly larger than the cost of an L2 miss, then on that architecture it would still be better to run the threads on separate cores: the extra misses cost less than what you gain from having two sets of execution units.
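If you want to get a feel for the tradeoff on a particular machine, one option is to pin the threads yourself and time both placements. A rough sketch, assuming Linux (pthread_setaffinity_np is a non-portable GNU extension) and using an arbitrary toy workload; compile with -O2 -pthread:

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)               // 128 MB of doubles, shared by both threads

static double data[N];
static double partial[2];         // each thread writes its result once at the end

static void *sum_half(void *arg) {
    long id = (long)arg;          // 0 or 1: each thread sums one half
    double s = 0.0;
    for (long i = id * (N / 2); i < (id + 1) * (N / 2); i++)
        s += data[i];
    partial[id] = s;
    return NULL;
}

// Restrict a thread to a single core (rough: the thread may run briefly
// elsewhere before the affinity call takes effect).
static void pin(pthread_t t, int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(t, sizeof(set), &set);
}

int main(int argc, char **argv) {
    int c0 = argc > 1 ? atoi(argv[1]) : 0;   // core ids passed on the command line
    int c1 = argc > 2 ? atoi(argv[2]) : 0;

    for (long i = 0; i < N; i++)
        data[i] = 1.0;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    pthread_t a, b;
    pthread_create(&a, NULL, sum_half, (void *)0L);
    pthread_create(&b, NULL, sum_half, (void *)1L);
    pin(a, c0);
    pin(b, c1);
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ms = (t1.tv_sec - t0.tv_sec) * 1e3
              + (t1.tv_nsec - t0.tv_nsec) / 1e6;
    printf("sum = %.0f, time = %.2f ms\n", partial[0] + partial[1], ms);
    return 0;
}
```

Running `./a.out 0 0` versus `./a.out 0 1` compares same-core and separate-core placement. One caveat: whether core ids 0 and 1 map to distinct physical cores or to SMT siblings of the same core depends on your machine's topology, so check that before interpreting the numbers.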

So in short, you need to analyze the nature of your program and the nature of your architecture.

acfeng

@arcticx The review slides are at the end of the previous lecture (Lecture 2). I think the review slides are useful, as we ended up spending over 30 minutes in class on them.