Wednesday, August 01, 2007
A crowd’s job of work
This Whimsley post is pretty interesting.
Online DVD rental outfit Netflix caused a real buzz last October when it announced the competition. If anyone can come up with a recommender system for predicting customer DVD preferences that beats its own algorithm (Cinematch) by a certain amount, Netflix will hand over $1 million. The prize got a lot of attention because it exemplifies the idea of crowdsourcing. Not only does Netflix rely on crowdsourcing of DVD ratings (user ratings of DVD titles) but the competition itself is an attempt to use crowdsourcing to develop the algorithms to make the most of those ratings. Instead of doing the work itself, or hiring specialists, Netflix lets anyone enter their competition and pays the winner. The competition is still in progress: Netflix says it will run until at least 2011. So now the initial buzz has died down, what can we learn from the Netflix Prize?
It seems as though academic critics ought to have something to say about this - yes, even if they hate the term ‘crowdsourcing’. (Franco Moretti’s book maybe could have been Maps, Graphs, Trees, and Crowds.)
Also, this is awesome:
Customer 2270619 has rated 1975 titles. 1931 were given a 5, 31 were given a 4, 10 given a 3, 2 given a 2 (Grumpy Old Men and Sex In Chains) and a single title was given a 1. That title? Gandhi, which has an average rating of over 4 and which less than 2% of those who watch it give a 1.
It’s a curious feature of the contest that the challenge doesn’t involve figuring out how to throw out, or massage, obviously weird cases but, instead, to predict what weird people will say as well as what possibly sane people will say. It’s like that old Far Side cartoon with the lab coats looking through the glass: “Yes, of course they’re idiots, but what KIND of idiot?”
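For anyone who wants to check the arithmetic on customer 2270619, here is a back-of-the-envelope sketch in Python. The counts are the ones quoted above; the "always guess 5" predictor and the choice of root-mean-square error as the yardstick are my own illustration (RMSE is, as I understand it, how the contest scores entries), not anything from Tom Slee's post.

```python
# Rating counts for customer 2270619, as quoted above (stars -> number of titles).
counts = {5: 1931, 4: 31, 3: 10, 2: 2, 1: 1}

total = sum(counts.values())
mean = sum(stars * n for stars, n in counts.items()) / total
share_of_fives = counts[5] / total

# Error of the laziest possible predictor for this customer: always guess 5.
# (Illustrative assumption: root-mean-square error as the measure.)
squared_error = sum(n * (stars - 5) ** 2 for stars, n in counts.items())
rmse_always_5 = (squared_error / total) ** 0.5

print(f"titles rated: {total}")
print(f"mean rating: {mean:.3f}")
print(f"share of 5s: {share_of_fives:.1%}")
print(f"RMSE of always guessing 5: {rmse_always_5:.3f}")
```

The counts do add up to 1975, and a predictor that never thinks at all does very well on this person. The one interesting question is Gandhi, which is exactly the kind of thing no lazy rule will catch.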
Another thing that’s strange - the author of the post, Tom Slee, notes as much - is that it seems rather obvious you could get significant improvement by working to improve the other end of the system: namely, the point where people click for the 1-5 stars. Build an even slightly more fine-grained system and you would surely see a hell of a lot better than 10% improvement in the final predictions (he says boldly). Nor is this something extra that you could always go and do later, after you’ve got your spiffy new algorithms. Because, as it stands, mathematicians are pulling their hair out, trying to wangle heuristic routes to answers to questions that could simply be asked. They are trying to clean one end of the pipe from the other end, with tools that can barely reach, if at all. Example (which Tom Slee himself mentions): accounts with multiple renters will give you bad data in the form of false positive linkages (he likes war movies; she likes “Sex and the City”; little Timmy only watches “Little Einsteins”). So ask how many users are on the account; then, when a rating is submitted, prompt to specify which user it is for.
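To make the multi-renter point concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the profile names, the genres, the star values, and the `mean_by` helper are all hypothetical, and nothing here reflects how Netflix actually stores or processes ratings. It only compares what one blended account looks like with what the same ratings look like once each is tagged with a user.

```python
from collections import defaultdict

# Entirely made-up ratings for one hypothetical account: (profile, genre, stars).
ratings = [
    ("him", "war", 5), ("him", "war", 4), ("him", "romcom", 1),
    ("her", "romcom", 5), ("her", "romcom", 4), ("her", "war", 2),
    ("timmy", "kids", 5), ("timmy", "kids", 5),
]

def mean_by(key_fn, rows):
    """Average the stars in rows, grouped by whatever key_fn returns."""
    buckets = defaultdict(list)
    for profile, genre, stars in rows:
        buckets[key_fn(profile, genre)].append(stars)
    return {key: sum(values) / len(values) for key, values in buckets.items()}

# One account, tastes blended together: the ratings without a user tag.
pooled = mean_by(lambda profile, genre: genre, ratings)

# The same ratings, each tagged with the user who gave it.
per_profile = mean_by(lambda profile, genre: (profile, genre), ratings)

print(pooled)
print(per_profile)
```

Pooled, the account looks lukewarm about both war movies and romantic comedies. Split by user, it turns out to contain two people with strong and opposite opinions, which is the information the algorithm people are currently trying to reconstruct by inference.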
On a more elevated note, you could ask people to register whether they are wearing their Anton Ego critic hat, as it were, or just their Popcorn Id hat. Most people sort of get the difference. Find some intuitive way to register it - hell, you could do it with iconic ‘reflective critic’ vs. ‘regular guy yukking it up on the couch’ icons. Let people rate under one or the other heading, or both. It seems as though you could, by offering small incentives, get people to make the few extra clicks that would yield some rather interesting data, even if the questions are still pretty crude.
It’s not that I think this project is vital to the strength of the republic of letters. But when I see someone slap down a cool million for an answer to what is basically a literary critical question - and when the question is put so crudely, even as the answer promises to be so marvelously sophisticated - I can’t help thinking an opportunity is being lost. The company could do better, gathering data that might fuel any number of quite fascinating, Moretti-style studies. And it would actually make good business sense for them to do so (if this present exercise makes sense).
Oh, wise crowd of Valve readers: what slightly less painfully simple rating question(s) could Netflix ask customers, such that genuinely interesting (and predictive) data would potentially result?
I’m not a Netflix user. Sadly, that doesn’t work in Singapore.





