January 10, 2008

Sharding with Cookie-Based Sessions

I used Rails 2.0 for my recent startup venture (actually, I started with 1.2.3 but migrated two days after the new release). Cookie-based session storage was my primary motivation for adopting the latest release before it had been hardened by wide deployment. Cookie-based session storage is sexy. It moves session storage to the user's browser and relies on an HMAC to detect tampering. In other words, the PStore and ActiveRecord session stores no longer impose a scalability bottleneck.
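To make the HMAC idea concrete, here is a minimal sketch of a signed session cookie. It is not Rails's actual CookieStore code: the `sign_session` and `verify_session` names, the JSON serialization, the SHA-256 digest, and the `SECRET` value are all assumptions chosen for illustration.

```ruby
require "openssl"
require "json"

# Hypothetical secret; a real application would load this from configuration.
SECRET = "replace-with-a-long-random-secret"

# Compare two strings without stopping at the first differing byte.
def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize
  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

def hmac_for(payload, secret)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), secret, payload)
end

# Serialize the session, Base64-encode it, and append an HMAC of the payload.
def sign_session(data, secret = SECRET)
  payload = [JSON.generate(data)].pack("m0")
  "#{payload}--#{hmac_for(payload, secret)}"
end

# Return the session hash, or nil if the cookie is malformed or was altered.
def verify_session(cookie, secret = SECRET)
  payload, digest = cookie.to_s.split("--", 2)
  return nil if payload.nil? || digest.nil?
  return nil unless secure_compare(hmac_for(payload, secret), digest)
  JSON.parse(payload.unpack1("m0"))
end
```

Note that a cookie signed this way is readable by the user; the HMAC only guarantees that it has not been changed.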

While the decision to make cookie-based session storage the default session storage engine in Rails 2.0 still seems to be a bit controversial, the benefits, assuming security is maintained, are incredible. In my venture, I used sessions in a way that many developers would find dangerous. My venture was a dating website targeted at large metropolitan areas, each with its own sub-domain. The website emphasized finding people to go on a date with in real time. Offline users were not browsable. Users were grouped by metro shard: New York City residents would go to the nyc sub-domain, Philadelphia residents would go to the philly sub-domain, etc. All of the user's profile information and messages were stored on the server pointed to by the primary domain, but once logged in and ready to interact, the user would be dispatched to the appropriate shard.
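The metro-to-shard dispatch amounts to a lookup from metro area to sub-domain. A rough sketch, assuming a hypothetical `SHARDS` table and a placeholder `example.com` domain (the real domain is not given here):

```ruby
# Hypothetical metro-to-shard table; only "nyc" and "philly" are named above.
SHARDS = {
  "New York City" => "nyc",
  "Philadelphia"  => "philly"
}.freeze

# Placeholder standing in for the site's primary domain.
PRIMARY_DOMAIN = "example.com"

# Build the URL of the shard that serves the given metro area.
def shard_url(metro, path = "/")
  subdomain = SHARDS.fetch(metro) { raise ArgumentError, "no shard for #{metro}" }
  "http://#{subdomain}.#{PRIMARY_DOMAIN}#{path}"
end
```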

Here's where things get…unique. Instead of doing MySQL replication to each shard, each shard had its own, unshared database.  When a user decided to interact in a specific metro area, the user clicked a form, disguised as a hyperlink, which submitted their encrypted profile to the appropriate shard.
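A handoff like this needs a payload that the browser can carry but cannot read or modify. The sketch below is one way to do that, not the code I actually ran: it assumes AES-256-GCM (which encrypts and detects tampering in one step), a 32-byte key known to both the main server and the shard, and hypothetical `seal_profile` and `open_profile` helpers.

```ruby
require "openssl"
require "json"

IV_BYTES  = 12 # default GCM nonce length
TAG_BYTES = 16 # default GCM authentication tag length

# Main server side: encrypt the profile into a form-safe token.
def seal_profile(profile, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv
  cipher.auth_data = "" # must be set for GCM even when unused
  ciphertext = cipher.update(JSON.generate(profile)) + cipher.final
  [iv + cipher.auth_tag + ciphertext].pack("m0")
end

# Shard side: return the profile, or nil if the token was altered or malformed.
def open_profile(token, key)
  raw = token.unpack1("m0")
  return nil if raw.bytesize <= IV_BYTES + TAG_BYTES
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.decrypt
  cipher.key = key
  cipher.iv = raw.byteslice(0, IV_BYTES)
  cipher.auth_tag = raw.byteslice(IV_BYTES, TAG_BYTES)
  cipher.auth_data = ""
  body = raw.byteslice(IV_BYTES + TAG_BYTES, raw.bytesize)
  JSON.parse(cipher.update(body) + cipher.final)
rescue OpenSSL::Cipher::CipherError, ArgumentError
  nil
end
```

The token would travel as a hidden field in the form that is dressed up as a hyperlink.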

For example, if the user wanted to find a date in Uptown Manhattan, they would click the Uptown Manhattan link, which did a POST to the Uptown Manhattan homepage on the nyc shard, silently sending the encrypted profile along. The nyc shard then checked for tampering, decrypted the profile, and stored it in its local, memory-only database. If the user had been idle for too long, the profile was marked as stale, and a cron job would sweep it away shortly thereafter.
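The stale-profile sweep can be sketched with a plain Ruby Hash standing in for the shard's memory-only database. The `ProfileStore` class, its method names, and the idle threshold are assumptions for illustration; here staleness is derived from a last-seen timestamp rather than from a separate flag.

```ruby
# In-memory profile store with an idle timeout (illustrative stand-in only).
class ProfileStore
  Entry = Struct.new(:profile, :last_seen)

  def initialize(max_idle_seconds)
    @max_idle = max_idle_seconds
    @entries = {}
  end

  # Store a freshly decrypted profile.
  def put(user_id, profile, now = Time.now)
    @entries[user_id] = Entry.new(profile, now)
  end

  # Record activity so the profile is not treated as stale.
  def touch(user_id, now = Time.now)
    entry = @entries[user_id]
    entry.last_seen = now if entry
  end

  # Stale profiles are hidden even before the sweep removes them.
  def fetch(user_id, now = Time.now)
    entry = @entries[user_id]
    return nil if entry.nil? || stale?(entry, now)
    entry.profile
  end

  # What the cron job would call; returns how many profiles were removed.
  def sweep(now = Time.now)
    before = @entries.size
    @entries.delete_if { |_id, entry| stale?(entry, now) }
    before - @entries.size
  end

  def size
    @entries.size
  end

  private

  def stale?(entry, now)
    now - entry.last_seen > @max_idle
  end
end
```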

This technique is only worthwhile if there is something that can be sharded, but when there is, the benefits are clear. The main server handled all permanent changes to the user's data (including message sending), but this was done via POSTs and redirects. The main server and each shard were completely decoupled. They did not, at any point, communicate with each other. Because nothing is shared between them, this makes for a very scalable website.

I should note that by using this technique I was committing the dual sin of premature optimization and abuse of something cool for the sake of messing around.  That makes me dumb.  My point is, if you have a mature project that is running into scalability constraints but has a possibility for sharding, this technique could be valuable (or at least cool).