I think it was touched upon in lecture, but I might have missed it - I understand why GPUs want to have a heavy focus on multi-threading, but why is it that CPUs focus more on memory (caching and prefetching)? Does it have to do with the size needed on chip to support more cores? The type of processing done on CPUs vs GPUs? I feel like if given the opportunity, a heavier focus on multi-threading would be just as beneficial on CPUs.
@jkorn Applications that run on CPUs usually favor low latency over high throughput, so CPU designers focus on reducing latency. Larger caches and prefetching cut down on cache misses. Running fewer threads also helps keep latency down, since the core context switches less often. GPUs, on the other hand, care mainly about throughput, so they need very high memory bandwidth, which is usually the bottleneck for throughput.
Since our main program thread will be running on the CPU, don't we also have to worry about getting data into and then back out of the GPU memory in the first place?
The CPU is designed to be general purpose. It strikes a balance between programs that may be parallel (hence multiple execution units and cores), programs that are sequential (hence only a moderate number of execution units and cores), and programs that use many different kinds of data (hence the larger cache).
The GPU is designed to run a small number of operations on a relatively small amount of data many times, resulting in the large number of cores and the smaller cache. There must be many instances of the same instruction, or many independent instructions, for the GPU to be fully utilized.
Up to this point, I never really had a clear idea of what a processor really is. Now, correct me if I'm wrong, but it seems that processors come in two flavors: CPU and GPU. There is no fundamental difference in what they do (as in, one can do anything the other can do, with varying efficiency). However, they differ in what they are optimized to do. I am getting some sense of those differences from the previous answers, but can someone give me some concrete examples of tasks/programs that CPUs would be good at but GPUs would be bad at, and examples of tasks/programs that GPUs would be good at but CPUs would be bad at, and why?
CPU: Print out one million file names in order
This will probably be better on a CPU because of the larger cache size; subsequent prints will run faster because more unique data can be served from the cache. The names must be printed out sequentially, so it is impossible to run many tasks at once. Perhaps some of the cores could be used in parallel to do something to each character, but since these are file names, that would only expose parallelism on the order of 10 or so.
On the other hand, this task would incur many cache misses on a GPU with its smaller cache. Even with a smaller number of files, where the GPU would not necessarily run slower, it would still be a huge waste of the GPU's parallel capabilities (inefficient).
GPU: Perform one hundred processing operations on every pixel in an image
Understandably, a graphics-related task runs best on a GPU. In this example, only one image needs to be loaded from memory, so the small cache size is not an issue. Say the GPU had enough cores to process every pixel at once, something on the order of 1000. Then the whole job would take only one hundred steps.
When performing the same task, a CPU would need to loop through the entire image. Even if it could utilize ~10 cores at a time, it would still need to loop through the image one chunk at a time, and repeat this one hundred times. This would take much longer.
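To make the step counts above concrete, here is a rough back-of-the-envelope model in Python. It assumes an idealized machine where each core performs one operation on one pixel per step, and it ignores memory traffic, SIMD, and clock speed differences. The function name `steps_needed` and the 1000-pixel image are illustrative assumptions, not measurements.

```python
import math

def steps_needed(num_pixels, ops_per_pixel, num_cores):
    """Idealized step count: each core does one operation on one pixel per step."""
    # Number of chunks the image must be split into so each chunk fits on the cores.
    chunks = math.ceil(num_pixels / num_cores)
    # Every chunk needs all of the operations applied to it.
    return chunks * ops_per_pixel

# Hypothetical numbers matching the example: 1000 pixels, 100 operations each.
gpu_steps = steps_needed(1000, 100, 1000)  # one core per pixel
cpu_steps = steps_needed(1000, 100, 10)    # roughly 10 cores
print(gpu_steps, cpu_steps)
```

Under these assumptions the 10-core machine needs 100 times as many steps. A real CPU core is typically much faster per thread and has its own SIMD units, so the actual gap would be narrower than this toy model suggests.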
@xka. Saying that processors come in two flavors is too strong a statement in my opinion.
Instead, I would say that there are a number of key principles used to design modern throughput processors (the key ideas in this lecture!), and that different designs choose to embody those principles in different ways. Commodity CPUs and GPUs are two points near the ends of the design space. A CPU in your laptop or desktop features a small amount of multi-core parallelism, multi-threading, and modest SIMD processing, but retains a lot of traditional processor features that ensure reasonably good single-threaded performance. GPUs, on the other hand, aggressively maximize throughput for data-parallel workloads, and thus choose wider SIMD and a heavy amount of multi-threading at the cost of very poor single-threaded performance. There are other design points as well. For example, Intel's Xeon Phi line of processors sits in between the traditional CPU and GPU extremes. It features 72 cores, 4 threads per core, 16-wide SIMD, and only modest single-threaded performance.
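As a rough illustration of how those numbers combine, the sketch below multiplies out the Xeon Phi figures quoted above. The `DesignPoint` class and its field names are hypothetical, made up for this sketch; only the 72 / 4 / 16 figures come from the post, and the lane count assumes one SIMD unit per core.

```python
from dataclasses import dataclass

@dataclass
class DesignPoint:
    """Hypothetical summary of a processor's parallel resources."""
    cores: int
    threads_per_core: int
    simd_width: int

    def hardware_threads(self):
        # Total thread contexts the chip can keep resident at once.
        return self.cores * self.threads_per_core

    def simd_lanes(self):
        # Lanes that can execute together, assuming one SIMD unit per core.
        return self.cores * self.simd_width

xeon_phi = DesignPoint(cores=72, threads_per_core=4, simd_width=16)
print(xeon_phi.hardware_threads(), xeon_phi.simd_lanes())
```

The point is that a program needs both enough threads to fill the thread contexts and enough data parallelism to fill the SIMD lanes before a design like this is fully utilized.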