Ha, the following query (mySQL) gives the Pearson Score for one critic (userId = 6) against all others in set:

SELECT count(*),

sum(a.rating) ‘sum1’,

sum(pow(a.rating,2)) ‘sum1Sq’,

sum(b.rating) ‘sum2’,

sum(pow(b.rating,2)) ‘sum2Sq’,

sum(a.rating*b.rating) ‘pSum’,

sum(a.rating*b.rating)-(sum(a.rating)*sum(b.rating)/count(*)) ‘num’,

sqrt((sum(pow(a.rating,2))-(pow(sum(a.rating),2)/count(*)))*

sum(a.rating) ‘sum1’,

sum(pow(a.rating,2)) ‘sum1Sq’,

sum(b.rating) ‘sum2’,

sum(pow(b.rating,2)) ‘sum2Sq’,

sum(a.rating*b.rating) ‘pSum’,

sum(a.rating*b.rating)-(sum(a.rating)*sum(b.rating)/count(*)) ‘num’,

sqrt((sum(pow(a.rating,2))-(pow(sum(a.rating),2)/count(*)))*

(sum(pow(b.rating,2))-(pow(sum(b.rating),2)/count(*)))) ‘den’,

(sum(a.rating*b.rating)-(sum(a.rating)*sum(b.rating)/count(*))) / (sqrt((sum(pow(a.rating,2))-(pow(sum(a.rating),2)/count(*)))*

(sum(a.rating*b.rating)-(sum(a.rating)*sum(b.rating)/count(*))) / (sqrt((sum(pow(a.rating,2))-(pow(sum(a.rating),2)/count(*)))*

(sum(pow(b.rating,2))-(pow(sum(b.rating),2)/count(*))))) ‘pearson’

FROM data a INNER JOIN data b ON a.itemid = b.itemid AND a.userId <> b.userId

WHERE a.userId = 6

GROUP BY b.userid

FROM data a INNER JOIN data b ON a.itemid = b.itemid AND a.userId <> b.userId

WHERE a.userId = 6

GROUP BY b.userid

Please excuse my lack of blogging skills… I’ll make pretty as soon as I am able.

Best to index appropriately as work that follows may give pc-fan a workout.

And for a list of critics userId 6 should be paying attention to:

SELECT count(*) ‘movies in common’,

sum(a.rating) ‘sum1’,

sum(pow(a.rating,2)) ‘sum1Sq’,

sum(b.rating) ‘sum2’,

sum(pow(b.rating,2)) ‘sum2Sq’,

sum(a.rating*b.rating) ‘pSum’,

sum(a.rating*b.rating)-(sum(a.rating)*sum(b.rating)/count(*)) ‘num’,

sqrt((sum(pow(a.rating,2))-(pow(sum(a.rating),2)/count(*)))*

sum(a.rating) ‘sum1’,

sum(pow(a.rating,2)) ‘sum1Sq’,

sum(b.rating) ‘sum2’,

sum(pow(b.rating,2)) ‘sum2Sq’,

sum(a.rating*b.rating) ‘pSum’,

sum(a.rating*b.rating)-(sum(a.rating)*sum(b.rating)/count(*)) ‘num’,

sqrt((sum(pow(a.rating,2))-(pow(sum(a.rating),2)/count(*)))*

(sum(pow(b.rating,2))-(pow(sum(b.rating),2)/count(*)))) ‘den’,

b.UserId ‘User’,

(sum(a.rating*b.rating)-(sum(a.rating)*sum(b.rating)/count(*))) / (sqrt((sum(pow(a.rating,2))-(pow(sum(a.rating),2)/count(*)))*

b.UserId ‘User’,

(sum(a.rating*b.rating)-(sum(a.rating)*sum(b.rating)/count(*))) / (sqrt((sum(pow(a.rating,2))-(pow(sum(a.rating),2)/count(*)))*

(sum(pow(b.rating,2))-(pow(sum(b.rating),2)/count(*))))) ‘pearson’

FROM data a INNER JOIN data b ON a.itemid = b.itemid AND a.userId <> b.userId

WHERE a.userId = 6

GROUP BY b.userid

ORDER BY 10 DESC

FROM data a INNER JOIN data b ON a.itemid = b.itemid AND a.userId <> b.userId

WHERE a.userId = 6

GROUP BY b.userid

ORDER BY 10 DESC

Interesting Note – These critic recommendations are of limited values if critic in question has but a few movies in common with us (userId 6). I bet companies that use these methods for recommending other users, etc. have a sweet spot. i.e. – Too little items in common would skew score one way and too many the other.

Advertisements