Movielens Dataset – One Million

After some thought, I’ve decided to switch data sources from the ten million movie rating set to the one million movie rating set. As seen below, this dataset simply has more interesting data, which will provide us with more dimensions to explore.

The exact same data scrubbing applies (the same SQL as well) that I did on the other data set a few posts ago. All the secondary supporting data already generated (time and date dimensions) will fit just as well.

Unfortunately, we lose the ability to dig into the description tags applied to movies by movie reviewers. On the bright side, we have information on gender, age range, location (zip code) and occupation. Clearly, the losses are outweighed by all the information gained, which will make for a much stronger dataset to learn from.
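
To give a feel for those extra user attributes, here is a minimal sketch in Python of reading one user record. It assumes the `UserID::Gender::Age::Occupation::Zip-code` layout and the age-range codes described in the dataset's README; the `User` type, the `parse_user` helper and the sample line are only illustrative, not part of the scrubbing SQL from the earlier posts.

```python
from typing import NamedTuple

# Age codes as described in the one million dataset README (assumed here).
AGE_RANGES = {
    1: "Under 18",
    18: "18-24",
    25: "25-34",
    35: "35-44",
    45: "45-49",
    50: "50-55",
    56: "56+",
}


class User(NamedTuple):
    """Hypothetical container for one user record."""
    user_id: int
    gender: str
    age_range: str
    occupation_code: int  # numeric code; the README maps it to an occupation name
    zip_code: str         # kept as text so leading zeros survive


def parse_user(line: str) -> User:
    """Split one '::'-delimited user line into typed fields."""
    user_id, gender, age, occupation, zip_code = line.strip().split("::")
    return User(int(user_id), gender, AGE_RANGES[int(age)], int(occupation), zip_code)


# Illustrative sample line in the assumed format.
print(parse_user("1::F::1::10::48067"))
```

Keeping the zip code as text and the occupation as a raw code leaves both ready to become dimension keys later on.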

As previously mentioned, all the side work done for the ten million rating set will be reused here, including the time and date dimensions, so none of the effort spent creating them is lost.
