PCI – Chapter Two – Pearson Score, MovieLens

Ha, the following query (mySQL) gives the Pearson Score for one critic (userId = 6) against all others in set:

SELECT count(*),
sum(a.rating) ‘sum1’,
sum(pow(a.rating,2)) ‘sum1Sq’,
sum(b.rating) ‘sum2’,
sum(pow(b.rating,2)) ‘sum2Sq’,
sum(a.rating*b.rating) ‘pSum’,
sum(a.rating*b.rating)-(sum(a.rating)*sum(b.rating)/count(*)) ‘num’,
sqrt((sum(pow(a.rating,2))-(pow(sum(a.rating),2)/count(*)))*
(sum(pow(b.rating,2))-(pow(sum(b.rating),2)/count(*)))) ‘den’,
(sum(a.rating*b.rating)-(sum(a.rating)*sum(b.rating)/count(*))) / (sqrt((sum(pow(a.rating,2))-(pow(sum(a.rating),2)/count(*)))*
(sum(pow(b.rating,2))-(pow(sum(b.rating),2)/count(*))))) ‘pearson’
FROM data a INNER JOIN data b ON a.itemid = b.itemid AND a.userId <> b.userId
WHERE a.userId = 6
GROUP BY b.userid

Please excuse my lack of blogging skills… I’ll make pretty as soon as I am able.

Best to index appropriately as work that follows may give pc-fan a workout.

And for a list of critics userId 6 should be paying attention to:

SELECT count(*) ‘movies in common’,
sum(a.rating) ‘sum1’,
sum(pow(a.rating,2)) ‘sum1Sq’,
sum(b.rating) ‘sum2’,
sum(pow(b.rating,2)) ‘sum2Sq’,
sum(a.rating*b.rating) ‘pSum’,
sum(a.rating*b.rating)-(sum(a.rating)*sum(b.rating)/count(*)) ‘num’,
sqrt((sum(pow(a.rating,2))-(pow(sum(a.rating),2)/count(*)))*
(sum(pow(b.rating,2))-(pow(sum(b.rating),2)/count(*)))) ‘den’,
b.UserId ‘User’,
(sum(a.rating*b.rating)-(sum(a.rating)*sum(b.rating)/count(*))) / (sqrt((sum(pow(a.rating,2))-(pow(sum(a.rating),2)/count(*)))*
(sum(pow(b.rating,2))-(pow(sum(b.rating),2)/count(*))))) ‘pearson’
FROM data a INNER JOIN data b ON a.itemid = b.itemid AND a.userId <> b.userId
WHERE a.userId = 6
GROUP BY b.userid
ORDER BY 10 DESC

Interesting Note – These critic recommendations are of limited values if critic in question has but a few movies in common with us (userId 6). I bet companies that use these methods for recommending other users, etc. have a sweet spot. i.e. – Too little items in common would skew score one way and too many the other.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s