Slide 31 of 59
jmnash

Which one is more commonly used: replicating or partitioning? I don't remember if this was mentioned in lecture. It seems like there are advantages and disadvantages to both, a sort of trade-off.

On the one hand, replication handles contention better: if one copy of the database is busy, a worker can try a different copy instead of waiting. With partitioning, if several workers contend for profiles in the same alphabetical range, they have to wait. However, on average all parts of the alphabet will probably be accessed about equally, or you can choose the letter ranges so that the load works out that way.

Writes are also very different. Keeping writes consistent takes a lot more effort with replication, since you write to only one database and then eventually update the others. With partitioning, however, a write to one database occupies that partition until the write completes. It's not immediately clear to me which one would be faster.

benchoi

Whether replicating or partitioning is better depends on the use case. If the application gets a huge number of read requests but not many write requests (e.g. Google search), then replicating is the better way to go - tons of people may be searching for the same keyword on Google due to recent trends, so partitioning would leave one database server very busy relative to the rest.

On the other hand, if writes are made very frequently (suppose a central database is used to implement Facebook chat), the effort required to keep every copy in sync would be huge and largely wasted. In such a situation, it would make sense to partition (e.g. every chat conversation could have a unique hash that specifies which database server it is stored on).
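
A minimal sketch of the hash-based routing idea, assuming a stable hash of the conversation ID taken modulo the number of servers. The function name `shard_for`, the server names, and the ID format are all hypothetical, made up for illustration; this is not how any real chat system is known to work.

```python
import hashlib


def shard_for(conversation_id: str, num_shards: int) -> int:
    """Map a conversation ID to a shard index using a stable hash."""
    # sha256 is used instead of Python's built-in hash() because hash()
    # of a str is randomized per process, so it would not route the same
    # conversation to the same server across restarts.
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


servers = ["db0", "db1", "db2", "db3"]  # hypothetical server names
index = shard_for("chat:alice:bob", len(servers))
print("conversation is stored on", servers[index])
```

Every read and write for a given conversation lands on exactly one server, so no cross-server synchronization is needed for it. One caveat of plain modulo routing is that changing the number of servers remaps most keys.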

aew

It seems that, in general, replication is best when we have mostly read requests and partitioning is best when we have many write requests. However, if reads are unevenly distributed across partitions, contention for the more popular partitions may lead to higher request latency. I would think that writes to a partitioned database would be much more efficient than writes to a replicated one, because with replication every write has to be propagated to all the other copies (the master and every slave database). Is this correct, and are there any situations where this would not be true?
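
A toy model of the write cost being described, assuming fully synchronous replication (every copy applies every write) and hash partitioning. The classes `ReplicatedStore` and `PartitionedStore` are hypothetical and only count how many per-server write operations happen; they ignore network latency, batching, and asynchronous propagation.

```python
import zlib


class ReplicatedStore:
    """Every server holds a full copy, so each write is applied n times."""

    def __init__(self, n: int):
        self.copies = [dict() for _ in range(n)]
        self.applied = [0] * n  # writes applied per server

    def write(self, key: str, value: str) -> None:
        for i, copy in enumerate(self.copies):
            copy[key] = value
            self.applied[i] += 1


class PartitionedStore:
    """Each key lives on exactly one server, so each write is applied once."""

    def __init__(self, n: int):
        self.parts = [dict() for _ in range(n)]
        self.applied = [0] * n

    def write(self, key: str, value: str) -> None:
        i = zlib.crc32(key.encode("utf-8")) % len(self.parts)
        self.parts[i][key] = value
        self.applied[i] += 1


replicated = ReplicatedStore(4)
partitioned = PartitionedStore(4)
for k in range(100):
    replicated.write(f"user{k}", "profile")
    partitioned.write(f"user{k}", "profile")

print(sum(replicated.applied))   # 400: 100 writes x 4 copies
print(sum(partitioned.applied))  # 100: one server per write
```

In this model the partitioned store does a quarter of the total write work with four servers, but nothing stops all of those writes from landing on the same partition if the keys are skewed, which is the uneven-distribution case mentioned above.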

lixf

I think you are right. Each is suited to a different read/write pattern. However, partitioning has one big advantage: space. You don't need to copy everything if you are partitioning, so the same set of servers can store a lot more data.

mchoquet

My impression was that replication was mainly used to ensure that data on the databases isn't lost. I imagine that in a real system, data is partitioned out of necessity, then the partitions are replicated to ensure that every piece of data is always accessible.
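
A rough sketch of that combination, assuming each partition is a small group of replicas and a read can be served by any replica that is still up. The `Cluster` class and its `down` set are hypothetical, and writes here are applied to every replica synchronously, which is a simplification.

```python
import zlib


class Cluster:
    """Data is partitioned by key; each partition is replicated."""

    def __init__(self, num_partitions: int, replicas_per_partition: int):
        self.groups = [
            [dict() for _ in range(replicas_per_partition)]
            for _ in range(num_partitions)
        ]
        self.down = set()  # (partition, replica) pairs marked as failed

    def _partition(self, key: str) -> int:
        return zlib.crc32(key.encode("utf-8")) % len(self.groups)

    def write(self, key: str, value: str) -> None:
        # Only the replicas of one partition are touched, not the whole cluster.
        for replica in self.groups[self._partition(key)]:
            replica[key] = value

    def read(self, key: str) -> str:
        p = self._partition(key)
        for r, replica in enumerate(self.groups[p]):
            if (p, r) not in self.down:
                return replica[key]
        raise RuntimeError("all replicas of this partition are down")


cluster = Cluster(num_partitions=3, replicas_per_partition=2)
cluster.write("alice", "profile data")
# Simulate losing the first replica of alice's partition.
cluster.down.add((cluster._partition("alice"), 0))
print(cluster.read("alice"))  # still served by the second replica
```

Each write costs only as many copies as there are replicas per partition, and the data stays readable as long as at least one replica in the group survives.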