When an elastic web server experiences a burst of requests, it can scale up the number of available servers so that each request can be serviced and the request queue does not pile up. I think Kayvon mentioned that in order to scale up and down, the servers need to be stateless. On the next slide, after the burst, we see that the elastic web server can scale back down to a number of servers that can sustain the average request load.
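A minimal sketch of the kind of scaling decision described above, driven by the length of the pending-request queue. All names and thresholds here (QUEUE_HIGH, MIN_SERVERS, etc.) are invented for illustration, not taken from any real autoscaler:

```python
# Hypothetical elastic-scaling policy: grow when the request queue piles up,
# shrink back toward a size that sustains the average load. Thresholds are
# made-up illustration values.
QUEUE_HIGH = 100   # scale up when pending requests exceed this
QUEUE_LOW = 10     # scale down when the queue has drained below this
MIN_SERVERS = 2
MAX_SERVERS = 16

def desired_servers(current: int, queue_len: int) -> int:
    """Return the new server count for one scaling decision."""
    if queue_len > QUEUE_HIGH and current < MAX_SERVERS:
        return current + 1   # burst: add a server
    if queue_len < QUEUE_LOW and current > MIN_SERVERS:
        return current - 1   # load subsided: remove a server
    return current           # steady state: current count sustains the load

print(desired_servers(4, 150))  # burst -> 5
print(desired_servers(4, 5))    # quiet -> 3
print(desired_servers(4, 50))   # steady -> 4
```

Because each decision only looks at the queue length and the current count, the policy itself carries no per-request state, which is consistent with the point that the servers being scaled must be stateless.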
Like how Apache has a preset number of worker threads in its worker pool (which may be idle but on standby in case a heavy load needs to be processed), there are a number of web servers here that were on standby and have been activated after a high workload was detected.
It is better to have a set of web servers initialized but in standby/sleep mode while the workload is low, and activate them during times of high workload. Otherwise, if the system is suddenly confronted with a high workload, the latency of initializing these web server instances at runtime would be very high.
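A toy sketch of why the warm standby pool helps: if full server startup is expensive, doing it ahead of time makes activation nearly free. The WebServer class and the 0.1 s startup cost are invented stand-ins, not real numbers:

```python
# Illustration of warm-standby vs. cold-start latency. The startup delay is a
# made-up stand-in for real initialization work (loading config, warming caches).
import time

class WebServer:
    def __init__(self):
        time.sleep(0.1)      # expensive one-time startup (~100 ms here)
        self.active = False

# Pre-initialize a standby pool while the workload is low.
standby = [WebServer() for _ in range(4)]

def activate_one(pool):
    """Activating a pre-warmed server is just a flag flip, not a full startup."""
    server = pool.pop()
    server.active = True
    return server

start = time.perf_counter()
s = activate_one(standby)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"activation took ~{elapsed_ms:.3f} ms")  # orders of magnitude below the 100 ms cold start
```

The trade-off is that the standby instances consume some resources (memory, at minimum) even when idle, which is exactly the cost being paid to avoid cold-start latency during a burst.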
@kipper I think it is more accurate to use "scale out" here. "Scale up" means using larger resources, while "scale out" means provisioning additional resources.
@xiaoguaz, oh okay, thanks for the clarification! I wasn't aware of the distinction.
And these extra web servers are spun up from hardware resources we already have, since the hardware was provisioned to accommodate peak load?