Something's Gotta Give

Saw this clarification on TC regarding Twitter:
the issue certainly isn’t with creating new messages, the issue is when you check your own messages For eg, if I am following 200 people, the query needs to check the new messages from all those people and sort them into date order. That query would be really intense
- from Twitter At Scale: Will It Work? via sharedcopy.com

I’d heard that earlier on the Gillmor Gang: Blaine clarified that static resources (e.g. profile images) are no issue at all, and that the bigger issue is with each person’s “home page”, where multiple streams are joined and ordered chronologically. When one of the connected people updates, all related cached “home pages” become obsolete - hence they are regenerated painfully often. (so don’t refresh your own page so much!)

I’m not a scaling expert. Nor do I know anything about Twitter’s real problems. So treat this as noise maybe.

Anyways, ignoring other things unknown and purely focusing on this particular aspect… I thought it’d be interesting to consider what we might gain, if we’re just willing to give up certain things?

So.. assuming static resources ain’t a problem, let’s try to have more of those! A single person’s stream (without friends) looks cacheable - like a blog that generates HTML instead of serving it dynamically. A single person’s social graph (who they follow) looks cacheable as well. If assumptions are wrong here, then exit(1); So a person’s homepage is his own static stream, with static external references to the static streams of the people he follows. Like an HTML page referencing a bunch of CSS. No server side join or sort. Instead, the browser, with the help of some static javascript, pulls in those resources, does the merging and sorting (and paging) and spits out a pretty display similar to the current one. i.e. every user’s browser does its own merging and sorting, you know… like, help out will ya?
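To make the browser-side part concrete, here’s a rough sketch of just the merge / sort / page step. It assumes each person’s static stream has already been pulled in as an array of messages; the `Message` shape and the names `mergeStreams` and `pageOf` are made up for illustration and have nothing to do with Twitter’s actual data or API.

```typescript
// Hypothetical shape of one entry in a person's static stream file.
interface Message {
  author: string;
  text: string;
  postedAt: number; // Unix timestamp in milliseconds (an assumption)
}

// Merge several per-person streams into one timeline, newest first.
// This is the join + sort the server would otherwise do per home page.
function mergeStreams(streams: Message[][]): Message[] {
  const merged: Message[] = [];
  for (const stream of streams) {
    merged.push(...stream);
  }
  merged.sort((a, b) => b.postedAt - a.postedAt);
  return merged;
}

// Return one page of the merged timeline (pages are numbered from 0).
function pageOf(timeline: Message[], pageNumber: number, pageSize: number): Message[] {
  const start = pageNumber * pageSize;
  return timeline.slice(start, start + pageSize);
}

// Usage: my own stream plus the streams of two people I follow.
const myStream: Message[] = [{ author: "me", text: "hello", postedAt: 1000 }];
const aliceStream: Message[] = [
  { author: "alice", text: "lunch", postedAt: 3000 },
  { author: "alice", text: "morning", postedAt: 500 },
];
const bobStream: Message[] = [{ author: "bob", text: "shipping", postedAt: 2000 }];

const homePage = mergeStreams([myStream, aliceStream, bobStream]);
console.log(pageOf(homePage, 0, 2).map((m) => `${m.author}: ${m.text}`));
```

The sort is plain O(n log n) over everything fetched, which is nothing for a few hundred people’s recent messages; the real cost on the client is the number of pulls, not the sorting.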

Would that be better? And what did we give up? The ability to properly browse your own Twitter homepage in a browser without JS. And clients using the API would need a few more pulls, and would have to do the merging / sorting themselves too.

Can we live with that?

Update: Slim pointed to an article in the comments, and in “Part III” of that article, Eran had talked about the same alternative:
if you only ask this once, the server side batch solution will be faster. But when the client starts asking this once a minute or more repeatedly, resources go to waste and scalability is more expensive.
Once the server is only serving data with limited scope, usually providing the messages of a single person with the optional perspective of the reader (to enforce access control), scaling becomes a much easier task as data can be segmented easier.
- from Hueniverse: Scaling a Microblogging Service - Part III via sharedcopy.com

Update: Then again, you have Scoble, who follows 30k other people, making it impractical for the client to fetch those streams independently… I guess most systems would decide to place limits and “punt” the issue, e.g. you can have at most 5000 friends. And unlimited following is something Twitter is unwilling to give up.