Picking this topic up from the last post, I focused on enriching the data released. This will allow further exploration of this data.
Lets use our previous schema as our starting point. The previous post produced a good starting point for the task at hand. The records from the previous post were stored in a table as shown in Figure 1.
Figure 1 – License plate table readings.
In the previous post, we replaced the actual car license plate (TAG) with a key because of privacy concerns. This time, we are going to do the same operation for the location information in the table. Reviewing our model from the last post, our database looks like Figure 2.
Figure 2 – Database Model with Tags, our first dimension.
The general idea is to fetch all the tags that show up in the readings table and insert them, no duplicates, into their own table. To this table (the tag dimension table) we then add a key. Lastly, we take that key and replace all occurrences of said tag in the readings table with its key instead.
Similarly, lets extract a location dimension from the readings data.
We see that the records provided have location information in the form of latitude and longitude. For each of these values, we have 6 decimal places. The more decimals, the more accurate the location provided is. Here is a very nice overview on latitude and longitude applicable to our task.
From said post, we borrow the information in table 1.
Table 1 – Coordinate precision by decimal places and scale.
As we can see, 6 decimal places provides a tag reading with about 4 inches of precision. This is overkill for this post. Lets assume that a car usually parks in the vicinity of the driver’s destination. With 4 decimal places, we have about 33 feet of accuracy. Lets assume the width of a house or small office to be about 50 feet so 33 feet is still a bit too much. Using 3 decimal places would give us a location accuracy of 328 feet. This sounds more sensible to place a car on a map. Keep in mind also that the more accuracy we preserve, the more taxing any calculations are and, in our case, they would not necessarily provide a better output.
Even thou we will use 3 decimal places, we do not want not loose the resolution provided. Because of this, we are going to keep the original values in the main table and use 3 decimal places in our dimension table.
Location Dimension Creation
- Create a new table, name it Dim_Location with the following initial columns: ID, Latitude, Longitude.
- Populate this new table with every unique latitude/longitude combination from the tag readings table.
- To the tag readings table, add a new column for location key.
- Populate this new column from the location dimension’s ID that corresponds with said coordinates.
Figure 3 – Newly created location dimension
Just a few more steps. Having latitude and longitude in our location dimension allows us to grab a physical address as well. There are many ways, some free, to do this. I used the excellent Texas A&M Geo Services. They have both, free and paid tiers that would fit many enthusiast’s needs. It takes a while to go thru almost 3 million records on their free tier but, after a few days, we end up with the following populated schema.
Figure 4 – Completed location dimension.
Our schema now provides us with a lot more information. We can see that Oakland pretty much blankets the city scanning for expired tags, etc. These records spans four years and cover pretty much all of Oakland.
Image 1 – Tag readings in Oakland
Breaking down the readings by year, we can see reading activity increase as time goes by. Maybe it’s a matter of time before a city can scan all the streets in a day.
Image 2 – Tag readings per year
Lets now pick a day, the most active day to be exact, to see what type of reach a city has. At 21,828 readings, September 9, 2012 comes in at the busiest license plate scanning date we have. That is a lot of readings in one day.
Image 3 – Tags read on September 9, 2012. Color darkens as as time advances.
Lastly, this is a 5 minute increment animation for this date. It seems that the Oakland Police Department has at least one dedicated patrol car scanning plates on the city. Most times, you can see two or three covering Oakland.
Image 4 – 24 hour license plate time span in 5 minute increments.
Imagine this happening in hundreds of police departments across the US and you have a legit business case in catching cars with expired cars. Follow Uber by making these scanners cheap and put one in every car and there would be truly nowhere to hide.