How do the database partitions dynamically adjust to different kinds of data characteristics?
dmerigou
One way to adjust the partitioning dynamically is consistent hashing: data is assigned to storage nodes based on its hash, so adding or removing a storage node does not require the system to reassign all the data, only a small fraction of it (roughly the keys owned by the affected node).
The partitioning here is called horizontal partitioning, since all the data related to one user is stored on the same node. Vertical partitioning instead stores the name on one node, the address on a second node, and so on.
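The consistent-hashing idea above can be sketched as a hash ring. The node names, key format, and vnode count below are invented for illustration; virtual nodes are a common refinement to smooth the distribution:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hashing sketch: nodes and keys are hashed onto a
    ring; each key belongs to the first node clockwise from its hash."""

    def __init__(self, nodes=(), vnodes=100):
        # vnodes: virtual points per physical node, to even out the split
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for (h, n) in self._ring if n != node]

    def get_node(self, key):
        # First ring point at or after the key's hash, wrapping around.
        idx = bisect.bisect(self._ring, (self._hash(key), "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.remove_node("node-b")
after = {k: ring.get_node(k) for k in keys}
# Only the keys that previously lived on node-b are reassigned;
# every other key stays on its original node.
moved = [k for k in keys if before[k] != after[k]]
```

This is exactly the "reasonable fraction" property: with a plain `hash(key) % num_nodes` scheme, removing a node would instead reshuffle almost every key.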
pk267
We can partition the data alphabetically, geographically, or by hashing. The goal is to distribute the data uniformly. Hashing, though it divides the data uniformly, gives poor locality*, so geographical distribution is often preferred over hashing-based distribution.
*Users in a given geographical region tend to access similar data, for example YouTube recommendations, images, web results, etc.
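The locality trade-off can be seen with a toy comparison of range (alphabetical) versus hash partitioning; the node count and key names here are made up for the example:

```python
import hashlib

NODES = 4

def hash_partition(key):
    # Uniform spread, but nearby keys land on unrelated nodes.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NODES

def range_partition(key):
    # Split the alphabet into NODES roughly equal buckets.
    return min((ord(key[0].lower()) - ord("a")) * NODES // 26, NODES - 1)

neighbours = ["alice", "alan", "albert"]
# Range partitioning keeps lexicographically close keys together
# (all three land on node 0 here)...
print({k: range_partition(k) for k in neighbours})
# ...while hashing typically scatters them across the cluster.
print({k: hash_partition(k) for k in neighbours})
```

The same contrast applies geographically: a region-based split keeps each user near their data, at the cost of uneven load if one region grows much faster than the others.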
zale
Wikimedia uses a combination of this architecture and the client/master system: Wikipedias are mapped to databases depending on their size (this is apparently fixed and doesn't change dynamically), and each such database has a master and several slaves.
This probably works well because the relative size of Wikipedias doesn't change quickly (hence no need for dynamic mapping), and reads are far more frequent than writes.
vadtani
Summary for scaling out a database:
1. replication
2. specialization (e.g. dedicated storage for user photos)
3. sharding (partitioning the database)
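The three steps above can be sketched as a toy request router; the shard layout, node names, and the photo cluster are invented for illustration:

```python
import hashlib
import random

# sharding: the user space is split across two shards (horizontal partitioning)
SHARDS = {
    0: {"master": "db0-master", "replicas": ["db0-r1", "db0-r2"]},
    1: {"master": "db1-master", "replicas": ["db1-r1", "db1-r2"]},
}
# specialization: photos live on a dedicated cluster, not in the main DB
PHOTO_STORE = "photo-cluster"

def shard_for(user_id):
    # All of one user's rows hash to exactly one shard.
    return int(hashlib.md5(str(user_id).encode()).hexdigest(), 16) % len(SHARDS)

def route(user_id, op, kind="row"):
    if kind == "photo":
        return PHOTO_STORE
    shard = SHARDS[shard_for(user_id)]
    if op == "write":
        return shard["master"]  # writes always go to the master
    # replication: reads are spread across the replicas
    return random.choice(shard["replicas"])
```

This also mirrors the Wikimedia setup described above: a static mapping to shards, each shard being a master with several read replicas.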