Motivation often comes when several things line up. I started running three years ago when my doctors (first and second) said I was fat, zombies were all the rage, and Zombies, Run!, the phone app, had just come out. I’ve since made running a part of my life, and my life is much better for it, but I digress.
Sometimes the end results are less significant. Regardless, Amazon recently discounted the Amazon Echo to $40, I had a week of forced vacation at work, and I had to stay home while some tile work was done. On a very sour side note, a lot of notable people have been passing away. This past week, George Michael, Carrie Fisher (and Carrie’s mom, Debbie Reynolds) and Vera Rubin left us. After playing with Alexa for a few days, I wondered: how difficult would it be to have Alexa tell me who just died?
I placed in the top 65% on the private leaderboard, which provides the official score for this contest. Of 1,969 teams, my team (just me) ranked 1,261st, with an RMSLE of 0.56330.
On the public leaderboard, however, I placed in the top 18%, with an RMSLE of 0.45970.
This second score is much better because I could submit my model for scoring up to three times a day and refine it accordingly. At one point I was ranked in the top 11%, but this was short-lived, lasting no more than a week or so before some statistics giants woke up and ate my lunch.
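RMSLE, the metric behind both scores, is the square root of the mean squared difference between log(1 + prediction) and log(1 + actual), so it penalizes relative rather than absolute misses. A minimal Python sketch, assuming two equal-length lists of non-negative numbers (the function name is my own):

```python
import math


def rmsle(predicted, actual):
    """Root mean squared logarithmic error between two equal-length sequences.

    Assumes every value is non-negative, since log1p(x) is undefined for x <= -1.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    # log1p(x) == log(1 + x), which keeps zero-demand rows well defined.
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2 for p, a in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))
```

Because of the logarithm, predicting 3 units when 2 were needed costs far more than predicting 103 when 102 were needed.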
Having defined the problem in the previous post, I’ve decided to attempt a first prediction. Per the submission requirements, this means using the complete dataset to produce a CSV file containing the id of each ‘delivery’ and its predicted adjusted demand.
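A minimal sketch of producing that file with Python's csv module. The header names id and Demanda_uni_equil are an assumption based on the competition's data (check them against the sample submission before uploading), and both helper names are hypothetical.

```python
import csv

# Assumed submission header: delivery id plus predicted adjusted demand.
SUBMISSION_HEADER = ["id", "Demanda_uni_equil"]


def submission_rows(ids, predictions):
    """Build the header plus one row per delivery id with its predicted demand."""
    if len(ids) != len(predictions):
        raise ValueError("every id needs exactly one prediction")
    rows = [SUBMISSION_HEADER]
    for delivery_id, demand in zip(ids, predictions):
        # Demand is a unit count: round to an integer and clip negatives to zero.
        rows.append([delivery_id, max(0, round(demand))])
    return rows


def write_submission(path, ids, predictions):
    """Write the submission rows to a CSV file at `path`."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(submission_rows(ids, predictions))
```

For example, write_submission("submission.csv", [0, 1, 2], [3.2, -1.0, 7.8]) writes demands of 3, 0 and 8. Keeping row building separate from file writing makes it easy to check a first prediction before spending one of the daily submissions.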
With the database from the last post in mind, we can now go over the information provided for this contest. Most interesting to me is the distribution of inventory delivered versus inventory returned.
Above, we can see the number of units sold each week. The green portion of the bar indicates the number of units consumed and the red portion indicates the number of units returned (unsold) from the previous week.
Here we can see the monetary value of units sold per week, together with the monetary value of the unsold units returned from the previous week.
Let's prepare the data that gets us here.
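A minimal sketch of that preparation using only Python's standard library. It assumes train.csv uses the competition's Spanish column names (Semana for the week, Venta_uni_hoy and Dev_uni_proxima for units sold and returned, Venta_hoy and Dev_proxima for the peso amounts); the function names are hypothetical. The file is streamed row by row because the training set is large.

```python
import csv
from collections import defaultdict

# Assumed train.csv column names.
WEEK = "Semana"
UNIT_SOLD, UNIT_RETURNED = "Venta_uni_hoy", "Dev_uni_proxima"
PESO_SOLD, PESO_RETURNED = "Venta_hoy", "Dev_proxima"


def weekly_totals(rows):
    """Sum units and pesos sold/returned per week from an iterable of dict rows."""
    totals = defaultdict(lambda: {"units_sold": 0, "units_returned": 0,
                                  "pesos_sold": 0.0, "pesos_returned": 0.0})
    for row in rows:
        bucket = totals[int(row[WEEK])]
        bucket["units_sold"] += int(row[UNIT_SOLD])
        bucket["units_returned"] += int(row[UNIT_RETURNED])
        bucket["pesos_sold"] += float(row[PESO_SOLD])
        bucket["pesos_returned"] += float(row[PESO_RETURNED])
    # Return the weeks in ascending order, ready for plotting.
    return dict(sorted(totals.items()))


def weekly_totals_from_csv(path):
    """Stream the CSV so the full training set never has to sit in memory."""
    with open(path, newline="") as handle:
        return weekly_totals(csv.DictReader(handle))
```

Under one reading of the charts, the green portion of each unit bar is units_sold minus units_returned and the red portion is units_returned; the peso chart uses pesos_sold and pesos_returned the same way.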
For complete information on this competition, please go to Maximize sales and minimize returns of bakery goods. In a nutshell, Grupo Bimbo, makers of cookies from our childhood, presents a data-heavy optimization problem: delivering the right amount of inventory to meet, but not overestimate, demand.
My interest in this competition comes from a random email from Kaggle and a fondness for the cookies common in the lunchboxes of our youth. Zero Kaggle experience, and just as little with the problem at hand, make for an interesting problem to look at.
Picking up from the last post, I focused on enriching the released data, which will allow further exploration.
Let's use the schema from the previous post as our starting point. Its records were stored in a table as shown in Figure 1.
Figure 1 – License plate table readings.
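What “enriching” involves is open-ended; one plausible, hypothetical example is deriving the hour, the weekday and a coarse location cell from each reading's timestamp and coordinates, so readings can later be grouped by time and place. The timestamp format below is an assumption, not the format the source files are known to use, and the function name is my own.

```python
from datetime import datetime

# Hypothetical timestamp layout; adjust to match the actual CSV files.
ASSUMED_TIMESTAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def enrich_reading(timestamp, latitude, longitude, fmt=ASSUMED_TIMESTAMP_FORMAT):
    """Return derived columns for one license plate reading."""
    read_at = datetime.strptime(timestamp, fmt)
    return {
        "read_at": read_at.isoformat(),  # normalized ISO 8601 timestamp
        "hour": read_at.hour,            # 0-23
        "weekday": read_at.weekday(),    # 0 = Monday ... 6 = Sunday
        # Three decimal places is a grid of roughly 100 m, so nearby
        # readings fall into the same cell.
        "grid_cell": (round(latitude, 3), round(longitude, 3)),
    }
```

Using weekday() rather than a day name keeps the result independent of the machine's locale.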
Browsing Hacker News, I recently found out that the City of Oakland had released almost 3 million records of license plate reader data. The conversation there is way better than any blurb I could come up with. Still, this is a neat opportunity to mine this data as an academic exercise.
The source hosts a list of CSV files with various bits of information. Common to all files, and of critical importance, are the date and time of each tag reading and its latitude and longitude. Supplemental information, such as the site of the reading and its source, is often given as well. Most worrisome is that the data has not been cleansed: it includes the actual license tag for each reading instead of some ID. That is the first thing to address before the data can be re-shared and used here.
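One way to swap the tag for an ID, sketched in Python, is keyed hashing (HMAC-SHA256): the same plate always maps to the same opaque ID, so readings of one vehicle can still be linked, but the ID cannot be reversed without the secret key. This is a suggested technique, not something the source provides, and the function names and the plate_column parameter are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_key():
    """Generate a random 32-byte secret key; never publish it alongside the data."""
    return secrets.token_bytes(32)


def pseudonymize_plate(plate, key):
    """Replace a license tag with a stable, keyed ID (HMAC-SHA256)."""
    # Normalize so "7abc 123" and "7ABC123" map to the same ID.
    normalized = "".join(plate.split()).upper()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    # Keep 16 hex characters (64 bits): short, with collisions very unlikely
    # at a few million readings.
    return digest[:16]


def anonymize_rows(rows, plate_column, key):
    """Yield copies of dict rows with the plate column replaced by its ID."""
    for row in rows:
        cleaned = dict(row)
        cleaned[plate_column] = pseudonymize_plate(cleaned[plate_column], key)
        yield cleaned
```

A plain, unkeyed hash would be a poor substitute: the space of possible plates is small enough to hash every candidate and reverse the mapping, which the secret key prevents.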