Sunday, August 01, 2004

Memcached

Linux Journal has an interesting article on Memcached, a distributed in-memory caching system. The article details how it is used to drive sites like LiveJournal and Slashdot, and it is full of other useful performance-tuning advice. For example:
In the end, though, it's all a series of trade-offs. Because processors keep getting faster, I find it preferable to burn CPU cycles rather than wait for disks. Modern disks keep growing larger and cheaper, but they aren't getting much faster. Considering how slow and crash-prone they are, I try to avoid disks as much as possible. LiveJournal's Web nodes are all diskless, netbooting off a common yet redundant NFS root image. Not only is this cheaper, but it requires significantly less maintenance.
Memcached is really just a distributed hash table, with the hashing two levels deep: the first hash is done on the client to figure out which server to go to, and the second is done on the server itself. It's all straightforward, but in any type of cache, the devil is in the details.
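The first-level, client-side hash can be sketched roughly like this (a toy illustration, not the actual client library code; the server list and the choice of CRC32 as the hash are assumptions for the example):

```python
import zlib

# Hypothetical list of memcached servers in the farm.
servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]

def pick_server(key: str) -> str:
    """First-level hash: the client maps a key to one server.

    The chosen server then does the second-level hashing internally,
    in its own in-memory hash table.
    """
    index = zlib.crc32(key.encode("utf-8")) % len(servers)
    return servers[index]
```

Because the mapping is deterministic, every client that shares the same server list sends a given key to the same server, so no inter-server communication is needed.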
Each Memcached instance is totally independent and does not communicate with the others. By default, each instance drops its least recently used items to make room for new ones. The server exposes statistics you can use to find query, hit, and miss rates for your entire Memcached farm. If a server fails, the clients can be configured to route around the dead machine or machines and use the remaining active servers. This behavior is optional, because the application must be prepared to deal with receiving possibly stale information from a flapping node. When it is off, requests for keys on a dead server simply result in a cache miss, which the application handles as usual. With a sufficiently large Memcached farm on enough unique hosts, a dead machine shouldn't have much impact on global hit rates.
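The per-instance LRU eviction behaves roughly like the toy model below. (This is a sketch only: it evicts by item count, whereas a real Memcached instance evicts based on memory limits within its slab allocator.)

```python
from collections import OrderedDict

class LRUCache:
    """Toy model of a single instance's least-recently-used eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None  # cache miss: the application falls back to the database
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item
```

A `get` on a missing key returns a miss rather than an error, which is exactly how clients treat keys that hashed to a dead server when failover routing is disabled.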
