January 10, 2008

Sharding with Cookie-Based Sessions

I used Rails 2.0 as part of my recent startup venture (actually, I started with 1.2.3 but migrated two days after the new release).  The cookie-based session storage was my primary motivation for adopting the latest release before it had been hardened by wide deployment.  Cookie-based session storage is sexy.  It moves session storage to the user's browser and relies on an HMAC to detect tampering.  In other words, the scalability bottlenecks imposed by PStore and ActiveRecord sessions go away.

While the decision to make cookie-based session storage the default session storage engine in Rails 2.0 still seems to be a bit controversial, the benefits, assuming security is maintained, are incredible.  In my venture, I used sessions in a way that many developers would find dangerous.  My venture was a dating website targeted to large metropolitan areas, each with its own sub-domain.  The website emphasized finding people to go on a date with in real time.  Offline users were not browsable.  Users were grouped by metro shard: New York City residents would go to the nyc sub-domain, Philadelphia residents would go to the philly sub-domain, etc.  All of the user's profile information and messages were stored on the server pointed to by the primary domain, but once logged in and ready to interact, the user would be dispatched to the appropriate shard.

Here's where things get…unique. Instead of doing MySQL replication to each shard, each shard had its own, unshared database.  When a user decided to interact in a specific metro area, the user clicked a form, disguised as a hyperlink, which submitted their encrypted profile to the appropriate shard.

For example, if the user wanted to find a date in Uptown Manhattan, they would click the Uptown Manhattan link, which did a POST to the Uptown Manhattan homepage on the nyc shard, silently sending the encrypted profile along.  The nyc shard then checked for tampering, decrypted the profile, and stored it in its local, memory-only database.  If the user had been idle for too long, the profile was marked as stale, and a cron job would sweep it away shortly thereafter.
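
The hand-off described above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the code the site ran: the helper names (`seal_profile`, `open_profile`, `secure_compare`), the two shared keys, and the choice of AES-256-CBC with HMAC-SHA256 are all hypothetical. The main server encrypts and signs the profile, and the shard verifies the signature before it decrypts anything.

```ruby
require "openssl"
require "json"

# Hypothetical secrets. This sketch assumes the main server and every shard
# are configured with the same two keys out of band.
ENC_KEY = OpenSSL::Digest::SHA256.digest("example encryption secret")
MAC_KEY = OpenSSL::Digest::SHA256.digest("example signing secret")

# Compare two strings without stopping at the first differing byte.
def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize
  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# Main server: encrypt the profile, hex-encode IV + ciphertext, append an HMAC.
def seal_profile(profile)
  cipher = OpenSSL::Cipher.new("aes-256-cbc")
  cipher.encrypt
  cipher.key = ENC_KEY
  iv = cipher.random_iv
  ciphertext = cipher.update(JSON.generate(profile)) + cipher.final
  payload = (iv + ciphertext).unpack1("H*")
  mac = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), MAC_KEY, payload)
  "#{payload}--#{mac}"
end

# Shard: check for tampering first, then decrypt. Returns nil on any mismatch.
def open_profile(sealed)
  payload, mac = sealed.to_s.split("--", 2)
  return nil if payload.nil? || mac.nil?
  expected = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), MAC_KEY, payload)
  return nil unless secure_compare(expected, mac)

  raw = [payload].pack("H*")
  decipher = OpenSSL::Cipher.new("aes-256-cbc")
  decipher.decrypt
  decipher.key = ENC_KEY
  decipher.iv = raw[0, 16]          # AES block size: the first 16 bytes are the IV
  JSON.parse(decipher.update(raw[16..-1]) + decipher.final)
end
```

In this sketch the sealed string would travel as a hidden field in the form the user submits, and the shard would store the result of `open_profile` in its local database only when it is not `nil`.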

This technique is only worthwhile if there is something that can be sharded, but when there is, the benefits are clear.  The main server handled all permanent changes to the user's data (including message sending), and this was done via POSTs and redirects.  The main server and each shard were completely decoupled.  They did not, at any point, communicate with each other.  Obviously, this makes for a very scalable website.

I should note that by using this technique I was committing the dual sin of premature optimization and abuse of something cool for the sake of messing around.  That makes me dumb.  My point is, if you have a mature project that is running into scalability constraints but has a possibility for sharding, this technique could be valuable (or at least cool).

Comments

Interesting architecture. Do people ever span locations? And being too simple is a sin too. You are allowed to use your brain :-)

Do people ever span locations? Sort of. The site did not require that a user self-assign a shard. Meaning, when the user wanted to find a date in NYC, they clicked NYC. It was not part of their profile. Therefore, a user could conceptually visit any region they wanted (almost like having a Tryst dating franchise in any city). Moreover, because the central server was the only one that could update the user's persistent state, the spam controls were still on the central server.

The setup had a few subtle points that I didn't mention in the article, such as an expiration field in the session that was only updated by the central server. This was done with a periodical Ajax call to the central server (every 30 minutes). If the user's profile contained spam, the central server would opt not to update the expiration date which was protected by HMAC. Therefore, the profile would soon be swept away by the cron job.
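
A rough Ruby sketch of that expiration check, under assumptions of my own: the token layout (`user_id:expires_at--mac`), the helper names `issue_expiry` and `fresh?`, the shared `EXPIRY_KEY`, and the 30-minute lifetime are hypothetical and only illustrate the idea. The central server signs a new expiry time on each refresh, and a shard treats the profile as stale once the signed time has passed or the signature does not match.

```ruby
require "openssl"

# Hypothetical secret, assumed to be known to the central server and the shards.
EXPIRY_KEY = "example expiry secret"

def expiry_mac(data)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), EXPIRY_KEY, data)
end

# Central server: called on each periodic refresh, unless the profile is
# flagged as spam. The ttl default is an assumed value for this sketch.
def issue_expiry(user_id, now = Time.now, ttl = 30 * 60)
  data = "#{user_id}:#{now.to_i + ttl}"
  "#{data}--#{expiry_mac(data)}"
end

# Shard: true only if the signature is valid and the expiry is in the future.
def fresh?(token, now = Time.now)
  data, mac = token.to_s.split("--", 2)
  return false if data.nil? || mac.nil?
  expected = expiry_mac(data)
  return false unless expected.bytesize == mac.bytesize

  # Accumulate differences so the comparison does not exit early.
  diff = 0
  expected.bytes.zip(mac.bytes) { |x, y| diff |= x ^ y }
  return false unless diff.zero?

  data.split(":", 2).last.to_i > now.to_i
end
```

Because the shard cannot produce a valid signature for a later time on the user's behalf, and the user does not hold the key, a profile that the central server stops refreshing simply ages out.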

Using hypermedia to carry state across shards and avoid server-side data replication? Quite an interesting technique!
