After following all the examples in the chapter (spending most of my time in the MovieLens data), I’ve moved the movie data to a MySQL database and calculated similarity scores between arbitrary users and the whole set. I wish MySQL had better tools…
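For reference, here is a rough Python sketch of what scoring one user against the whole set can look like. It assumes a Pearson correlation as the similarity measure, which is only one option and not necessarily the one I ran in the database. The ratings dictionary is made-up toy data, not MovieLens, and the function names `pearson` and `scores_against_all` are just for illustration.

```python
from math import sqrt

# Hypothetical toy ratings in the shape {user: {movie: rating}}.
# These values are invented for illustration; they are not MovieLens data.
ratings = {
    "alice": {"Toy Story": 5.0, "Heat": 3.0, "Fargo": 4.0},
    "bob": {"Toy Story": 4.0, "Heat": 2.0, "Fargo": 3.0},
    "carol": {"Toy Story": 1.0, "Heat": 5.0, "Fargo": 2.0},
}


def pearson(prefs, a, b):
    """Pearson correlation between users a and b over the movies both rated."""
    shared = [movie for movie in prefs[a] if movie in prefs[b]]
    n = len(shared)
    if n == 0:
        return 0.0  # nothing in common, so no evidence either way

    xs = [prefs[a][movie] for movie in shared]
    ys = [prefs[b][movie] for movie in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    denom = sqrt(var_x * var_y)
    # A user who gave every shared movie the same rating has zero variance.
    return cov / denom if denom else 0.0


def scores_against_all(prefs, user):
    """Score one user against every other user, most similar first."""
    scores = [
        (pearson(prefs, user, other), other)
        for other in prefs
        if other != user
    ]
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    for score, other in scores_against_all(ratings, "alice"):
        print(f"{other}: {score:.3f}")
```

On the toy data, bob comes out as a perfect match for alice (his ratings are hers shifted down by one), while carol comes out strongly negative.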
Before moving on, I’ll play with the data and see what I find most interesting. I’ve thought about rating the movies myself so I can use my own ratings as the baseline for comparison… It would surely give me a sense of how ‘good’ these recommendations are.
Darn, the SQL backup of MovieLens (the 100k-review set; they also have one with a million reviews!) is 3.1 MB.
Lastly, I am getting stumped by some very simple examples in the book. It seems to me the numbers used in some formulas are incorrect or come from a slightly different dataset (different ratings). The math is sound, as are the results. I’m gonna chalk it up to a data error in the book. I’ll follow up on this and see whether I missed a step somewhere.