I decided to rate the movies myself in order to get a feel for how good the recommendations turn out to be. I will later retrieve these with and without movies I’ve seen in the past, to examine how close ‘my predicted score’ is to my gut feeling.
After adding my ratings for movies from the set (about 1/10th of the items table)…
Here follows the query for retrieving the top 3 critics who most resemble my movie tastes (I am userId 944). This is in line with the top of page 15 in the book.
SELECT b.userId,
       (sum(a.rating*b.rating) - (sum(a.rating)*sum(b.rating)/count(*)))
       / sqrt((sum(pow(a.rating,2)) - (pow(sum(a.rating),2)/count(*)))
            * (sum(pow(b.rating,2)) - (pow(sum(b.rating),2)/count(*)))) AS pearson
FROM data a INNER JOIN data b ON a.itemId = b.itemId AND a.userId <> b.userId
WHERE a.userId = 944
GROUP BY b.userId
HAVING count(a.rating) > 90
ORDER BY pearson DESC
LIMIT 3
Note: In the query above, I’ve restricted the resultset to critics with whom I have at least 90 movies in common. I feel too few movies in common lead to vague recommendations (i.e. Pearson correlation coefficients very close to 1). Ninety sounds like an unusually high threshold, and I am sure it would lend itself very well to some sort of ‘slider’ in a GUI for adjustment.
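To sanity-check the SQL, the same sum-based form of the Pearson coefficient can be sketched in a few lines of Python. This is just an illustration, not part of the actual pipeline; the two rating dicts below are made-up sample data keyed by itemId, not real MovieLens rows.

import math

def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items rated by both users,
    using the same running-sums form as the SQL query above."""
    common = set(ratings_a) & set(ratings_b)
    n = len(common)
    if n == 0:
        return 0.0
    sum_a  = sum(ratings_a[i] for i in common)
    sum_b  = sum(ratings_b[i] for i in common)
    sum_a2 = sum(ratings_a[i] ** 2 for i in common)
    sum_b2 = sum(ratings_b[i] ** 2 for i in common)
    sum_ab = sum(ratings_a[i] * ratings_b[i] for i in common)
    num = sum_ab - (sum_a * sum_b / n)
    den = math.sqrt((sum_a2 - sum_a ** 2 / n) * (sum_b2 - sum_b ** 2 / n))
    return num / den if den else 0.0

# Hypothetical ratings, keyed by itemId:
me     = {1: 5, 2: 3, 3: 4, 4: 4}
critic = {1: 4, 2: 2, 3: 5, 4: 4}
print(pearson(me, critic))

A user compared against himself scores exactly 1, which is a quick way to confirm the formula is wired up correctly.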
These results are going to be the basis for wrapping up chapter two of the book. Results will shortly follow, together with my take on them…
Pearson            User
----------------   ----
0.95346258924559   519
0.91855865354369   317
0.90453403373329   808