Web Server Elasticity
By raggi, arihant, and cdafc
Due on 2013-04-27 23:59:00


Note: Since it would be difficult to cover the scope of the entire lecture clearly in an article of reasonable length, we focus on one specific topic from it: elasticity. After a brief overview of the lecture, we give a clear and detailed treatment of that topic.

Overview

Lecture 14 introduces the basics of scaling a website, which means keeping the behavior of a web server the same as load (broadly, the number of user requests) increases. Cluster-based solutions are being widely adopted for implementing parallel, flexible, scalable, low-cost, and high-performance web server platforms. The goal of a good implementation is for the web server to deliver superior throughput and latency; that is, to respond to all requests as quickly as possible (minimizing response time) while maintaining reasonable cost (minimizing the number of servers).

However, a problem that arises when developing web server platforms is the burstiness of web traffic: the ratio of the peak data rate to the average data rate is quite high. A typical web traffic source on the Internet does not generate data at a constant rate; instead, activities and events generate short bursts of traffic intermixed with relatively long periods of silence.

[Figure 1: Amazon.com page views in 2011 and 2012]

Figure 1 above shows a good example of bursty site load: Amazon.com page views in the years 2011 and 2012. The pattern clearly suggests that Amazon gears up for bursts of purchases during the holiday shopping seasons and runs at average-case load for the rest of the year.

xs33

Since holiday shopping seasons occur every year, a site such as Amazon can make a fairly accurate estimate of the number of web servers that need to be available during those times, and can adjust its capacity accordingly based on past observations.

Relatedly, one of the main challenges in implementing these web server platforms is correctly dimensioning the cluster size so as to satisfy both variable and peak demand. In other (rather simpler) words: how many web servers does one need in order to achieve good performance at all times? The question is hard to answer in advance, because provisioning for the common average-case load results in poor quality of service (or outright failures) during peak usage, while provisioning for the peak results in many idle servers most of the time. Luckily, the answer to this problem lies in utilizing elasticity.
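Before turning to elasticity, a small back-of-the-envelope sketch in Python makes the dilemma concrete. All of the numbers below are illustrative assumptions, not figures from the lecture.

    import math

    avg_rate = 2000     # average requests per second (assumed)
    peak_rate = 20000   # peak requests per second: a 10x burst (assumed)
    per_server = 500    # requests/second one web server can sustain (assumed)

    servers_for_avg = math.ceil(avg_rate / per_server)    # 4 servers
    servers_for_peak = math.ceil(peak_rate / per_server)  # 40 servers

    # Provision for the average: 4 servers are overloaded 10x during the peak.
    # Provision for the peak: most of the 40 servers sit idle most of the time.
    idle_fraction = 1 - avg_rate / (servers_for_peak * per_server)
    print("%d extra servers, %.0f%% of capacity idle at average load"
          % (servers_for_peak - servers_for_avg, idle_fraction * 100))

Under these assumptions, a peak-provisioned cluster wastes 90% of its capacity on an average day, while an average-provisioned one collapses under the burst.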

Elasticity

The main idea of elasticity is to scale out on demand by automatically adding or shutting down web servers based on measured load. The web server platform can thus adapt elastically to variations in the request stream load: a good implementation uses more machines during times of high load (to minimize response time) and fewer machines during times of low load (to minimize cost).
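A minimal sketch of such a scaling decision, written as a simple threshold policy; the thresholds and bounds below are illustrative assumptions, not values from the lecture.

    def desired_servers(current, utilization, high=0.8, low=0.3,
                        min_servers=2, max_servers=50):
        # utilization: measured fraction of pool capacity in use, 0.0-1.0
        if utilization > high:
            return min(current * 2, max_servers)   # high load: add machines
        if utilization < low:
            return max(current // 2, min_servers)  # low load: shed machines
        return current                             # within band: do nothing

Doubling and halving (rather than stepping by one) lets the pool track bursty load quickly; a monitor would call this periodically with a fresh utilization measurement.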

In this context, virtualization is being adopted by many organizations, such as Amazon and Google, not only to provide service elasticity for customers, but also to consolidate server workloads and improve server utilization rates. A virtualized web server platform can be dynamically adapted to client demand by deploying new virtual nodes when demand increases, and by powering off and consolidating virtual nodes during periods of low demand. Organizations that frequently face the over-provisioning problem, and can afford such solutions, own many machines and rent out their spare capacity (virtually) to others in need of compute. An example is Amazon's Elastic Compute Cloud (EC2), which is publicly available for rent today.
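With EC2, the scaling step amounts to API calls that request and release machines. A hedged sketch using the boto library, a Python interface to EC2; the region, AMI id, and instance type are placeholders, and credentials are assumed to be already configured.

    import boto.ec2

    conn = boto.ec2.connect_to_region("us-east-1")

    # Scale out: launch one more web server from a prebuilt machine image.
    reservation = conn.run_instances("ami-xxxxxxxx",
                                     min_count=1, max_count=1,
                                     instance_type="m1.small")
    new_server = reservation.instances[0]

    # Scale in: terminate the instance once the monitor decides it is surplus.
    conn.terminate_instances(instance_ids=[new_server.id])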

Putting everything together, resources from an organization's own infrastructure can be complemented with those of a cloud provider (a technique known as cloud bursting), so that peak demand can be satisfied by deploying cluster nodes in the external cloud on an on-demand basis. This framework enables elasticity: the number of web servers in use, such as Amazon EC2 instances, increases seamlessly during demand spikes to maintain performance, and decreases automatically during lulls in demand to minimize costs.

As a concrete example, consider a running web server as shown in Figure 2 below. This configuration represents a standard website with a typical parallel set-up: a load balancer distributing incoming requests across a pool of web server instances.

[Figure 2: A web server in a standard parallel configuration]

Now, consider an event that causes a surge of activity on the website, promptly adding many new requests to the queue. At that moment, the site performance monitor detects the high load and, as a result, instantiates new web server instances before informing the load balancer about the presence of the new servers. See Figure 3 below for an illustration of this step.

[Figure 3: The performance monitor detects high load and adds web server instances]

After the peak load is over, the opposite happens, as illustrated in Figure 4 below: the site performance monitor detects the low load, kills the extra server instances to save operating cost, and finally informs the load balancer about the loss of those web servers. This entire process balances out the response rate and consequently enables the website to scale up and down fairly quickly; a sketch of the full monitor loop follows the figure.

[Figure 4: The performance monitor detects low load and removes the surplus instances]
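Taken together, the site performance monitor is essentially a control loop around the two transitions above. A minimal sketch, assuming hypothetical pool, load_balancer, and probe objects with the methods shown; none of these names come from the lecture.

    import time

    def monitor_loop(pool, load_balancer, probe, interval=60):
        # probe() returns the pool's current utilization in [0.0, 1.0]
        while True:
            utilization = probe()
            if utilization > 0.8:                        # Figure 3: scale out
                server = pool.start_server()             # instantiate first...
                load_balancer.add_backend(server)        # ...then tell the LB
            elif utilization < 0.3 and pool.size() > 2:  # Figure 4: scale in
                server = pool.pick_idle_server()
                pool.kill_server(server)                 # kill the surplus server...
                load_balancer.remove_backend(server)     # ...then tell the LB
            time.sleep(interval)                         # re-check periodically

One design note: the kill-then-inform ordering mirrors the steps as described above, but a production deployment would usually remove a server from the load balancer and let it drain first, so that in-flight requests are not dropped.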

Questions

  1. What would be the best web server configuration for a site with unpredictable load, such as Twitter?
  2. How do you determine how many web servers (worker nodes) should be used?
    chaominy

    Elasticity is determined according to the real-time workload; i.e., define some metric to serve as the scaling criterion.

  3. Why can the web servers be safely fired up or killed off?
  4. How do you determine when to start/kill a new web server?
    xs33

    One way to determine when to kill a web server is to use a criterion based on how long the server has been idle. When to start a new web server depends on the number of pending requests and the number of workers currently in use; a small sketch of this policy follows below.
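A minimal sketch of the policy xs33 describes; the helper method and the 300-second timeout are hypothetical choices, not from the lecture.

    def should_kill(server, idle_timeout=300):
        # Kill a worker once it has been idle longer than the timeout.
        return server.seconds_idle() > idle_timeout

    def should_start(queued_requests, busy_workers, total_workers):
        # Start a new worker when every existing one is busy and
        # requests are still waiting in the queue.
        return busy_workers == total_workers and queued_requests > 0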