PCI – Chapter Two – Update!

After following all the examples in the chapter (spending most of my time in the MovieLens data), I’ve moved the movie data to a MySQL database and calculated similarity scores between arbitrary users and the whole set. I wish MySQL had better tools…
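For anyone curious what "similarity scores between one user and the whole set" can look like, here is a minimal sketch. It assumes the ratings have already been pulled out of the database into a nested dict of the form `{user: {movie: rating}}`, and it uses a Pearson correlation over the movies two users have both rated. The function names `sim_pearson` and `rank_users` are my own for illustration, and this is not necessarily the exact code I ran.

```python
from math import sqrt


def sim_pearson(prefs, a, b):
    """Pearson correlation between users a and b over movies both have rated.

    prefs is assumed to be a nested dict: {user: {movie: rating}}.
    Returns 0.0 when there is no overlap or when either user's shared
    ratings have no variance.
    """
    shared = [m for m in prefs[a] if m in prefs[b]]
    n = len(shared)
    if n == 0:
        return 0.0

    xs = [prefs[a][m] for m in shared]
    ys = [prefs[b][m] for m in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Covariance and variances, all left unnormalised (the 1/n cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    den = sqrt(var_x * var_y)
    if den == 0:
        return 0.0
    return cov / den


def rank_users(prefs, user):
    """Score every other user against `user`, most similar first."""
    scores = [
        (sim_pearson(prefs, user, other), other)
        for other in prefs
        if other != user
    ]
    scores.sort(reverse=True)
    return scores
```

Calling `rank_users(prefs, some_user)` gives a list of `(score, other_user)` pairs sorted from most to least similar. Computing this in Python rather than in SQL is a convenience choice; the same sums could be expressed as aggregate queries.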

Before moving on, I’ll play with the data and see what I find most interesting. I’ve thought about rating the movies myself so I can use myself as the baseline for comparison… It would surely give me a sense of how ‘good’ these recommendations are.

Darn, the SQL backup of MovieLens (100k reviews; they also have a set with one million reviews!) is 3.1 MB.

Lastly, I am getting stumped by some very simple examples in the book. It seems to me the numbers used in some formulas are incorrect or come from a slightly different dataset (different ratings). The math is sound, as are the results. I’m gonna chalk it up to errors in the book’s data. I’ll follow up on this and see whether I missed a step or something.

