The 10 million ratings set from MovieLens allows us to create two fact tables (linked?!): one for ratings and another for tags.
Worth noting that the userIds in these two schemas (one from ratings.dat and the other from tags.dat) do match across sets, i.e., userId 1234 in the tags dataset is user 1234 (if existing) in the ratings dataset. So we could link these but, for now, it's simpler not to.
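A quick way to sanity-check that overlap is to collect the userIds from each file and intersect them. The sketch below assumes the `::`-separated layout of ratings.dat (UserID::MovieID::Rating::Timestamp) and tags.dat (UserID::MovieID::Tag::Timestamp); the sample rows and the helper name `parse_user_ids` are made up for illustration, and in practice you would pass in the lines of the real files instead.

```python
def parse_user_ids(lines):
    """Collect the userId (first '::'-separated field) from each non-empty line."""
    return {int(line.split("::")[0]) for line in lines if line.strip()}


# Made-up sample rows in the ratings.dat layout: UserID::MovieID::Rating::Timestamp
sample_ratings = [
    "1234::122::5::838985046",
    "1234::185::4.5::838983525",
    "77::231::3::838983392",
]

# Made-up sample rows in the tags.dat layout: UserID::MovieID::Tag::Timestamp
sample_tags = [
    "1234::4973::excellent!::1215184630",
    "901::1747::politics::1188263867",
]

rating_users = parse_user_ids(sample_ratings)
tag_users = parse_user_ids(sample_tags)

# Users present in both sets could be linked across the two fact tables
shared_users = rating_users & tag_users
# Users who tagged but never rated would have no match on the ratings side
tag_only_users = tag_users - rating_users

print("shared:", sorted(shared_users))
print("tags only:", sorted(tag_only_users))
```

If the tags-only group turns out to be non-empty on the real data, that is one more reason to keep the two fact tables separate for now.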
Information provided in the 10 million ratings set allows us to create similar star schemas as follows:
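As a rough illustration only, a ratings star schema of this kind might be sketched in SQLite as below. The table and column names (`fact_rating`, `dim_user`, `dim_movie`, `rated_at`) and the sample rows are hypothetical and not the actual proposed design; a tags schema would look much the same with the tag text in place of the rating.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one fact table surrounded by two dimensions
conn.executescript("""
    CREATE TABLE dim_user  (user_id  INTEGER PRIMARY KEY);
    CREATE TABLE dim_movie (movie_id INTEGER PRIMARY KEY, title TEXT, genres TEXT);
    CREATE TABLE fact_rating (
        user_id  INTEGER NOT NULL REFERENCES dim_user(user_id),
        movie_id INTEGER NOT NULL REFERENCES dim_movie(movie_id),
        rating   REAL    NOT NULL,
        rated_at INTEGER NOT NULL  -- Unix timestamp, as in ratings.dat
    );
""")

# Made-up rows, just enough to run one query through the schema
conn.executemany("INSERT INTO dim_user VALUES (?)", [(1234,), (77,)])
conn.execute(
    "INSERT INTO dim_movie VALUES (?, ?, ?)",
    (122, "Example Movie (1992)", "Comedy|Romance"),
)
conn.executemany(
    "INSERT INTO fact_rating VALUES (?, ?, ?, ?)",
    [(1234, 122, 5.0, 838985046), (77, 122, 4.0, 838983392)],
)

# Typical star-schema query: aggregate the fact, describe it via a dimension
avg_by_title = conn.execute("""
    SELECT m.title, AVG(f.rating)
    FROM fact_rating AS f
    JOIN dim_movie AS m ON m.movie_id = f.movie_id
    GROUP BY m.title
""").fetchall()

print(avg_by_title)
```

With so little user information available, the user dimension here is just the id, which is the oddity discussed further down.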
I’ll be running these proposed schemas by my peers at work and will post back any insights I am sure to be missing. A lot of the design here is based on my perception of what matters, and it's worth seeking advice before I trail off too much.
It is odd that we have no more information about the users in our set. The smaller 1 million set does provide more interesting user information, but no tags from said users. Perhaps it is worth considering both datasets at the same time and treating them as two different sets of study.