Movielens – Completing fact tables

With the previous schema changes in place, it is now a matter of running queries in order to complete our fact_rating and fact_tag tables.  Both these tables are missing timeId and dateId.

For the table fact_rating, the following potentially long time queries need be executed. For timeId:

UPDATE fact_rating SET timeId = (
SELECT a.timeId FROM dim_time a
WHERE a.military = hour(from_unixtime(fact_rating.timestamp))
AND a.minute = minute(from_unixtime(fact_rating.timestamp))
)

For dateId:

UPDATE fact_rating SET dateId = (
SELECT a.dateId FROM dim_date a
WHERE a.year = year(from_unixtime(fact_rating.timestamp))
AND a.month = month(from_unixtime(fact_rating.timestamp))
AND a.day = day(from_unixtime(fact_rating.timestamp))
)

Similarly, for the table fact_tag, we run the following for timeId:

UPDATE fact_tag SET timeId = (
SELECT a.timeId FROM dim_time a
WHERE a.military = hour(from_unixtime(fact_tag.timestamp))
AND a.minute = minute(from_unixtime(fact_tag.timestamp))
)

For dateId, again, the snippet is similar to the fact_rating one above:

UPDATE fact_tag SET dateId = (
SELECT a.dateId FROM dim_date a
WHERE a.year = year(from_unixtime(fact_tag.timestamp))
AND a.month = month(from_unixtime(fact_tag.timestamp))
AND a.day = day(from_unixtime(fact_tag.timestamp))
)

It is worth adding a where clause to each of these and exclude items with id being updated equals null. (I.E. – for dateId,  append ‘WHERE dateId is null’ at the end of update query.  Some of these may (just may) take days so this is a no brainer although I didn’t do in all cases.

If anyone is thinking about how long these take, I recall the timeId population for table fact_rating took my Core 2 Duo 2.26, 4GB of ram laptop 18 hours and change.  Its an ‘execute and go to bed’ operation for sure for me.  Looking forward to having a completed star-schema to play with!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s