PCI – Chapter Two – Ranking the critics.

I decided to rate the movies myself in order to get a feel for how good the recommendations turn out to be. I will later retrieve these with and without movies I’ve seen in the past to examine how close does ‘my predicted score’ is to my gut feeling.

After adding my ratings to the movies from set (about 1/10th of items table)…

Here follows the query for retrieving top 3 critics who most resemble my movie tastes (I am userId 944). This is inline with the top of page 15 in the book.

SELECT (sum(a.rating*b.rating)-(sum(a.rating)*sum(b.rating)/count(*))) /(sqrt((sum(pow(a.rating,2))-(pow(sum(a.rating),2)/count(*)))*
(sum(pow(b.rating,2))-(pow(sum(b.rating),2)/count(*))))) ‘Pearson’,
b.UserId ‘User’
FROM data a INNER JOIN data b ON a.itemid = b.itemid AND a.userId <> b.userId
INNER JOIN data c ON b.userId = c.userId
WHERE a.userId = 944
GROUP BY b.userid
HAVING count(a.rating) > 90
ORDER BY ‘pearson’ DESC

Note: In the query above, I’ve restricted resultset to critics with whom I have at least 90 movies in common. I feel too little movies in common lead to vague recommendations. (i.e – Pearson Correlation Factors very close to one) This sounds like an unusually high number and I am sure would lend itself very well to some sort of ‘slider’ in a GUI for adjustment.

These results are going to be the basis for wrapping up chapter two of the book. Results will shortly follow together with my take on these…

 Pearson           User
 ----------------  -------
 0.95346258924559  519
 0.91855865354369  317
 0.90453403373329  808

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s