Slide 48 of 59
wcrichto

Although memcached is tried and true, it's worth noting that the up-and-coming Redis has a larger feature set and seems to do everything memcached does, but better.

uhkiv

It was mentioned in lecture that going to disk is worse than going over the network and fetching the data from an in-memory cache. I found that a little counterintuitive, since the network is unreliable and also slow (maybe even lower bandwidth?). Does anyone have stats comparing the latency and bandwidth of the two approaches?

kayvonf

Someone can check my numbers, but in general it's much faster to get data stored in another machine's memory than it would be to read it off local disk. The real advantage is latency: disk latency is typically measured on the order of a few milliseconds (average seek time), while network latency in a datacenter with quality switches can be measured in a few microseconds.

In terms of throughput, 10 Gbit Ethernet is common these days (with 40 and 100 Gbit available and quickly becoming more common). In contrast, 1 Gbit/sec per disk head might be possible in an absolute best-case scenario for large contiguous file I/O, but in practice applications will realize much less. Striping data across multiple disks can improve throughput by adding more heads.
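The throughput gap can be sanity-checked with back-of-envelope arithmetic. This sketch just plugs in the rough figures from the comment above (10 Gbit link, ~1 Gbit/s per disk head, microsecond-scale switch latency, millisecond-scale seeks); none of these are measured benchmarks:

```python
# Back-of-envelope: time to move 1 MB over a datacenter link vs. off one disk.
# All figures are the rough numbers from the discussion above, not benchmarks.

MB = 1_000_000  # bytes

link_bytes_per_s = 10e9 / 8   # 10 Gbit Ethernet
disk_bytes_per_s = 1e9 / 8    # ~1 Gbit/s best-case sequential disk read

net_transfer_s = MB / link_bytes_per_s    # ~0.8 ms on the wire
disk_transfer_s = MB / disk_bytes_per_s   # ~8 ms, before any seek

# Add latency: assume ~5 us for a datacenter round trip, ~5 ms average seek.
net_total_ms = net_transfer_s * 1e3 + 0.005
disk_total_ms = disk_transfer_s * 1e3 + 5.0

print(f"network: {net_total_ms:.2f} ms, local disk: {disk_total_ms:.2f} ms")
```

Even charging the network path its full wire time, remote memory comes out more than an order of magnitude ahead.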

uhkiv

http://brenocon.com/dean_perf.html

Here are the "Numbers Everyone Should Know" from Jeff Dean. The relevant ones:

- Round trip within same datacenter: 500,000 ns (0.5 ms)
- Disk seek: 10,000,000 ns (10 ms)
- Read 1 MB sequentially from network: 10,000,000 ns (10 ms)
- Read 1 MB sequentially from disk: 30,000,000 ns (30 ms)

so it does seem much cheaper to take the network route!
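Summing Dean's numbers for the two paths makes the comparison concrete (one round trip plus a network read, versus one seek plus a disk read):

```python
# Total cost of fetching 1 MB from a remote machine's memory vs. from
# local disk, using the "Numbers Everyone Should Know" figures above.

rtt_ms = 0.5             # round trip within same datacenter
net_read_1mb_ms = 10.0   # read 1 MB sequentially from network
seek_ms = 10.0           # disk seek
disk_read_1mb_ms = 30.0  # read 1 MB sequentially from disk

remote_memory_ms = rtt_ms + net_read_1mb_ms   # 10.5 ms
local_disk_ms = seek_ms + disk_read_1mb_ms    # 40.0 ms

print(f"remote memory: {remote_memory_ms} ms, local disk: {local_disk_ms} ms")
```

Roughly a 4x win for the network route, before counting multiple seeks for fragmented files.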

Q_Q

Why don't people run memcached on the web server itself? That would remove the network latency, and if the web server can be fitted with enough memory, it seems like that would work equally well.

kayvonf

It's common to use the term "server" to mean an independent software component of the web site. For example, the web server (e.g., Apache, nginx) is a system that handles HTTP requests and provides responses, the database server (e.g., MySQL, MongoDB) handles database queries, and memcached or Redis is an in-memory key-value store that handles get/put requests. All of these "servers" may be running on the same machine, but for modularity purposes they are distinct systems.
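The get/put interface mentioned above is typically used in a "cache-aside" pattern: try the cache first, and fall back to the database on a miss. A minimal sketch, where a plain dict stands in for a memcached client (a real client such as pymemcache exposes the same get/set shape) and `fetch_from_database` is a hypothetical stand-in for a real query:

```python
cache = {}  # stand-in for a memcached client

def fetch_from_database(key):
    # Hypothetical slow path; imagine a SQL query here.
    return f"value-for-{key}"

def get(key):
    value = cache.get(key)
    if value is None:           # cache miss: go to the database...
        value = fetch_from_database(key)
        cache[key] = value      # ...and populate the cache for next time
    return value

print(get("user:42"))  # miss: hits the "database" and fills the cache
print(get("user:42"))  # hit: served from memory
```

Whether `cache` lives on the same box or across the network only changes where the dict lookup happens; the pattern is identical.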

Moreover, they might be running on different virtual machines, which ultimately end up on the same physical box.

black

Facebook uses memcached to support billions of requests per second. They also came up with a mechanism called "leases" to fix memcached data consistency problems, since they physically place data in several data centers around the world.
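Very roughly, a lease works like this: on a miss the cache hands the client a token, a set is accepted only if that token is still valid, and a delete invalidates any outstanding tokens, so a slow client can't write back a stale value after the item was invalidated. This is only a sketch of the idea (class and method names are my own, not Facebook's API):

```python
import itertools

class LeasedCache:
    """Toy cache illustrating the lease idea: misses grant a token,
    and only the holder of a still-valid token may write the value."""

    def __init__(self):
        self.data = {}
        self.leases = {}                  # key -> currently valid lease token
        self._tokens = itertools.count(1)

    def get(self, key):
        if key in self.data:
            return self.data[key], None   # hit: no lease needed
        token = next(self._tokens)        # miss: grant a lease
        self.leases[key] = token
        return None, token

    def set(self, key, value, token):
        if self.leases.get(key) != token:
            return False                  # stale lease: reject the write
        del self.leases[key]
        self.data[key] = value
        return True

    def delete(self, key):
        self.data.pop(key, None)
        self.leases.pop(key, None)        # invalidate outstanding leases
```

For example, if a delete (say, because the database row changed) races with a client that missed earlier, the client's set is rejected rather than repopulating the cache with stale data.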

retterermoore

Wait, but if you get the data from another machine's memory, don't you have to wait for both network latency and disk latency, since that machine has to get the requested data from its own memory?

nrchu

It's in memory (RAM), not in disk of the other machine.