hxin

During the good old times, from the 1980s to the early 2000s, people cared less about parallelizing their code than they do now. The hardware vendors that put lots of effort into multi-core machines pretty much all lost in the market. The whole industry was driven by Moore's law: the clock frequency and the number of transistors doubled roughly every 18 months, which translated into doubled single-thread performance. The thinking was... why even bother to write parallel programs if the serial versions get faster with future CPUs?

After 2000, people realized that so many transistors were packed into such a small, fixed area that the heat sink could no longer dissipate the heat. The net result was high power consumption and high heat density. Did you know that the Pentium 4 was originally designed to operate at 5 GHz? We only ever saw Pentium 4 processors top out (with normal air cooling) at around 3 GHz, because the engineers at Intel later figured out the chip would simply melt at such a high frequency. However, the design was optimized for 5 GHz, and running at lower frequencies could not fully unleash its potential. In terms of performance, the Pentium 4 series was beaten by AMD's same-generation chips: the Athlon 64 (a 64-bit machine) and the Athlon 64 X2 (two Athlon cores). The Athlon 64 X2 was the landmark that signaled we were entering the multicore era: with single-thread performance no longer increasing (the clock frequency stuck around 3 GHz), the only way to make a computer run faster was to put more cores in it (since the number of transistors still doubled every 18 months). As a result, programmers are now incentivized to learn how to write parallel programs... that's why we are all here sitting in Kayvon's class.

Enjoy the rest of the class. Peace out.

Xiao

Transistor count wasn't the main problem; in fact, transistor count is still following Moore's law. The real problem is the clock speed driving those transistors, as hxin mentioned. Energy is consumed every time a transistor switches from on to off and vice versa, so a higher clock speed means more heat to dissipate. Here you can find a more thorough explanation of the relationship between transistor count and clock speed.
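
(A standard first-order model, not stated on the slide but common in textbooks, makes this concrete: dynamic power is roughly $ P_{dynamic} \approx \alpha \times C \times V^{2} \times f $, where $C$ is the switched capacitance, $V$ the supply voltage, $f$ the clock frequency, and $\alpha$ the fraction of transistors switching each cycle. Raising the clock frequency raises the heat that must be dissipated in direct proportion.)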

tnebel

Could someone give a high-level description of what a transistor does?

GG

Transistors are the basic building blocks of everything in a CPU: the cache, the ALU, the branch predictor, etc. Typically, as it says in lecture 1, slide 12, more transistors means a larger cache, smarter control, and faster computation. And the switching speed of a transistor is related to the clock rate, as @Xiao says.

martin

Transistors are the basic units that represent either 1 or 0 (high or low voltage). Intel now uses tri-gate (3D) transistors in Ivy Bridge CPUs (http://en.wikipedia.org/wiki/Trigate_transistors#Tri-gate_transistors). They need less power to sustain a state and operate more stably. Advances in transistor design have contributed significantly to CPU performance: transistors are not only made smaller, they are also designed for lower power consumption (the same performance with less current flowing through them). In addition, as power awareness has become a growing concern, Intel's next-generation CPUs (Haswell and Skylake) put a lot of effort into heat dissipation.

hxin

As people have pointed out, the reason is that frequency stopped scaling. Although Moore's law is strictly about the number of transistors per unit area, in the past smaller transistors typically also enabled higher frequency. As a result, Intel's David House extended Moore's law, predicting that chip performance would double every 18 months (a combination of the effect of more transistors and of those transistors being faster).

Now, the "more transistors" part keeps going, but the "faster transistors" part has halted. As Xiao and I pointed out above, the cause is high energy consumption. The question then is: throughout the years from the 1980s to the early 2000s, why did we not run into the energy wall? Why could we keep increasing frequency while maintaining roughly constant power through all those years, but can no longer do so now?

To understand this problem, let's first look at the transistor itself. A transistor is a microscopic electrical switch. Internally, it behaves like a capacitor, a well of charge: the transistor is an open circuit when the well is empty and a closed circuit when the well is full. To switch the transistor on and off, we need electrical energy to fill and drain the charge in the transistor's capacitor. This gives the relationship: bigger transistor, bigger capacitor, more energy per switch; smaller transistor, smaller capacitor, less energy per switch.
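
(In this simple capacitor model, and this is the usual textbook estimate rather than anything from the slide, each fill-and-drain cycle of a gate with capacitance $C$ at supply voltage $V$ costs on the order of $ E_{switch} \approx C \times V^{2} $, so the switching energy tracks the capacitance, which in turn tracks the transistor's physical size.)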

When we shrink a transistor, we shrink it in three dimensions: scaling the transistor in every dimension means a shorter length, a narrower width, and a lower height. Although it is not exactly the case, you can envision the capacitance of the transistor, and hence the energy required to operate it, as scaling cubically with the length in each dimension. If the scaling factor in each dimension is $S$, then the new per-operation energy is $ E_{new} = S^{3} \times E_{old} $.

However, since transistors are laid out on a two-dimensional plane, the number of transistors that fit in a given area grows quadratically as the dimensions shrink: $ N_{new} = 1/S^{2} \times N_{old} $.

Now let the processor's frequency be $ f $, which is the rate at which we fill and drain the transistors, and suppose the previous generation of technology manufactures chips with $ N_{old} $ transistors, each consuming $ E_{old} $ per operation. The power consumption of the whole chip is then $ P_{old} = f_{old} \times N_{old} \times E_{old} $.

With a scaling factor $S$, the new chip's power consumption is: $ P_{new} = f_{new} \times N_{new} \times E_{new} = f_{new} \times 1/S^{2} \times N_{old} \times S^{3} \times E_{old} = (f_{new}/f_{old}) \times S \times P_{old} $.

From the equation above, we see that we can keep the power roughly the same while increasing the frequency by a factor of $ 1/S $ (remember $S$ is smaller than 1). This is why, in the past, we could increase both the frequency of the chip and the number of transistors on it.
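
A quick numeric sanity check of this argument (my own sketch, not from the slide), plugging in an illustrative scaling factor of S = 0.7:

```python
# Sanity check of the classic (Dennard-style) scaling argument above.
# All quantities are normalized; only the ratios matter.

S = 0.7  # linear scaling factor per process generation (S < 1)

# Previous generation (normalized).
f_old, N_old, E_old = 1.0, 1.0, 1.0
P_old = f_old * N_old * E_old

# Classic scaling:
#   energy per switch shrinks as S^3,
#   transistor count grows as 1/S^2,
#   and we spend the savings on a 1/S frequency boost.
f_new = f_old / S
N_new = N_old / S**2
E_new = E_old * S**3

P_new = f_new * N_new * E_new
print(P_new / P_old)  # ~1.0: power stays flat even though frequency rose by 1/S
```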

However, this is no longer the case. Once transistors were scaled down below roughly 1 micrometer, the operating energy of a single transistor stopped scaling as much as it used to. In other words, now $ E_{new} = S^{2} \times E_{old} $ instead of $ E_{new} = S^{3} \times E_{old} $. There are two causes: first, the transistor is so small that the capacitor inside it is only a few layers of molecules thick, and its capacitance is now determined by the physical properties of the $SiO_{2}$ rather than by the geometry of the capacitor; second, because the capacitor is so thin, it is no longer a good insulator. The thin capacitor leaks a lot of charge, so the chip has to constantly refill the transistor, increasing the static energy consumption.
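
Repeating the power calculation above with this weaker $S^{2}$ energy scaling (my own arithmetic, following the same steps) shows why frequency had to stall: $ P_{new} = f_{new} \times 1/S^{2} \times N_{old} \times S^{2} \times E_{old} = (f_{new}/f_{old}) \times P_{old} $, so keeping the power budget constant now forces $ f_{new} = f_{old} $. The extra transistors still arrive each generation, but the frequency increase that used to come with them does not.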

This imperfect scaling of a single transistor's energy consumption is the cause of the power wall, which shows up as the end of frequency scaling.