The following query creates an ‘in-between’ table which saves the similarity scores between myself (user id 944) and all the critics.
SELECT b.UserId ‘UserId’,
(sum(a.rating*b.rating)-(sum(a.rating)*sum(b.rating)/count(*))) / (sqrt((sum(pow(a.rating,2))-(pow(sum(a.rating),2)/count(*)))*(sum(pow(b.rating,2))-(pow(sum(b.rating),2)/count(*))))) ‘PearsonSim’
FROM data a INNER JOIN data b ON a.itemid = b.itemid AND a.userId <> b.userId
INNER JOIN data c ON b.userId = c.userId
WHERE a.userId = 944
GROUP BY b.userid
ORDER BY ‘PearsonSim’ DESC
I’ve choose not to exclude any data as to preserve scores for future use. Much more data can be generated and saved to aide future calculations but this is all needed to wrap up exercise set at hand. I’ve used Pearson instead of Euclidean because I found it easier to work with and book outlines benefits when dealing with certain types of dataset.