When deciding how much capacity to build, an architect should first look at what each capacity costs and then look for the largest marginal decrease in data traffic per unit of added cost. For example, if we were at the edge of the first steep cliff and could spend just a little more to increase capacity slightly, we would get a big decrease in data traffic, so that extra cost would be well spent. However, in the middle section of the graph, where data traffic falls off only gradually, the architect needs to consider whether the added cost is really worth a slightly lower amount of data traffic.
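To make the marginal-return idea concrete, here is a rough Python sketch. The design points (capacity, cost, data traffic) are invented for illustration and are not read off the graph from lecture; `marginal_returns` is a hypothetical helper name. It just computes the traffic saved per unit of extra cost for each step up in capacity.

```python
# Hypothetical design points: (capacity_kb, cost, data_traffic).
# All numbers are made up for illustration, not taken from the lecture graph.
points = [
    (16, 1.0, 100.0),
    (32, 2.0, 95.0),
    (64, 4.0, 40.0),   # just past a steep "cliff"
    (128, 8.0, 36.0),  # flatter middle section
    (256, 16.0, 33.0),
]


def marginal_returns(points):
    """Return (from_kb, to_kb, traffic saved per unit of extra cost) per step."""
    results = []
    for (cap_a, cost_a, traf_a), (cap_b, cost_b, traf_b) in zip(points, points[1:]):
        saved = traf_a - traf_b   # decrease in data traffic
        extra = cost_b - cost_a   # increase in cost
        results.append((cap_a, cap_b, saved / extra))
    return results


returns = marginal_returns(points)
for cap_a, cap_b, ratio in returns:
    print(f"{cap_a} KB -> {cap_b} KB: {ratio:.3f} traffic units saved per cost unit")

# The step with the best return is the one that crosses the cliff.
best = max(returns, key=lambda r: r[2])
```

With these made-up numbers, the step that crosses the cliff pays off far more than any step in the flat region, which is the trade-off described above.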
What @stl said leads to the conclusion that it's a good idea to implement a cache large enough to hold data up to the second cliff in the diagram, so that the marginal return on the additional cache storage is still reasonable.
I'm curious as to how chip architects determine the appropriate sizes for L1, L2, and L3 given that we can't just have massive sizes for each cache. It seems like a simple optimization problem, but I'd like to know what techniques are actually used in industry.