PCI – Chapter Two – Recommending Movies

The query that follows makes use of the previously saver data to generate movie recommendations for me. This time, I’ve excluded movies that had less than 15 ratings. Movies with a small number of reviews in our dataset do not allow us to infer much new information and i did not find the results insightful.

SELECT sum(b.rating*a.PearsonSim)/sum(a.PearsonSim) ‘Suggested Rating’,
k.rating ‘my rating’ , c.Title
FROM data b INNER JOIN PearsonSim944 a ON b.userId = a.UserId
INNER JOIN item c ON b.itemId = c.movieId
LEFT JOIN data k ON c.movieId = k.itemId and k.userId = 944
GROUP BY b.itemID
HAVING count(b.ItemId) > 15
ORDER BY 1 DESC
LIMIT 20

As you can see, I decided not to exclude movies I’ve seen on purpose on this to correlate my ratings to the predicted rating generated from these recommendation.

And here follows a table of the top movies I should put on my to-watch list 🙂

 Suggested Rating     my rating     Title
 -------------------  ------------  ------------------------------------------------------
 4.7323498717054      (null)        Welcome To Sarajevo (1997)
 4.6998348050636      5             Godfather, The (1972)
 4.6711593732692      5             Boogie Nights (1997)
 4.6088872136439      5             Schindler's List (1993)
 4.6088513709185      4             Star Wars (1977)
 4.6035380915543      5             Raiders of the Lost Ark (1981)
 4.5669663298883      3             Wallace & Gromit: The Best of Aardman Animation (1996)
 4.5381657408068      (null)        Wrong Trousers, The (1993)
 4.5149011723813      4             Shawshank Redemption, The (1994)
 4.4973936132774      5             Empire Strikes Back, The (1980)
 4.4941478597302      (null)        Casablanca (1942)
 4.4654235371938      (null)        When We Were Kings (1996)
 4.4652896069685      5             Godfather: Part II, The (1974)
 4.4557483936264      (null)        Lawrence of Arabia (1962)
 4.4432307373246      (null)        Secrets & Lies (1996)
 4.4371513818567      (null)        Usual Suspects, The (1995)
 4.4330745351495      (null)        Close Shave, A (1995)
 4.4309795230429      (null)        One Flew Over the Cuckoo's Nest (1975)
 4.4176616177839      (null)        Laura (1944)
 4.4137219730932      (null)        Paths of Glory (1957)

Not too bad eh? I am happy with these results and have learned how to generate lots of great information from nothing more than a list of items reviewed (item, reviewer, rating). The possibilities how this can be of use in systems is endless and but the tip of the iceberg of what is to come in book.

NOTE: on copy-pasting query into analyzer… WordPress seems to replace my single quotes with sql incompatible single quotes… revise accordingly.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s