Kaggle – Bimbo Group Wrap-up

Report

I finished in the top 65% of the private leaderboard, which counts as the official score for this contest. Out of 1,969 teams, my team (just me) ranked 1,261st, with an RMSLE of 0.56330 as the measure of accuracy.

On the public leaderboard, however, I placed in the top 18%, with an RMSLE of 0.45970.

This second score is much better because I could submit my model for scoring up to three times a day and refine it accordingly. The public leaderboard is computed on only part of the test data, though, so those gains did not fully carry over to the private score. At one point I was ranked in the top 11%, but that was short-lived: within a week or so, some statistics giants woke up and ate my lunch.
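Both leaderboards score submissions with RMSLE (root mean squared logarithmic error). Here is a minimal Python sketch of the metric, assuming predictions and actual demand are non-negative unit counts; the function name `rmsle` is my own and not part of any Kaggle tooling.

```python
import math


def rmsle(predicted, actual):
    """Root mean squared logarithmic error between two equal-length sequences.

    RMSLE = sqrt(mean((ln(p + 1) - ln(a + 1)) ** 2)).
    Assumes every value is >= 0; math.log1p(x) computes ln(1 + x), so a zero
    prediction or zero demand never hits log(0).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("need at least one value")
    squared = [(math.log1p(p) - math.log1p(a)) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Perfect predictions score 0; lower is better.
print(rmsle([3, 0, 10], [3, 0, 10]))  # 0.0
```

Because the error is measured on a log scale, it is relative rather than absolute: missing by 10 units on a 1,000-unit order costs far less than missing by 10 on a 5-unit order.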

Kaggle – Grupo Bimbo Preparing The Data

With the database from the last post in mind, we can now go over the information provided for this contest. Most interesting to me is the distribution of inventory delivered versus inventory returned.

[Figure: units per week, split into units consumed (green) and units returned (red)]

Above, we can see the number of units sold each week. The green portion of each bar indicates the number of units consumed, and the red portion indicates the number of units returned (unsold) from the previous week.
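Weekly totals like these come from grouping every row by week. The Python sketch below is an illustration rather than the exact preparation behind the chart. It assumes the column names from the competition's `train.csv` (`Semana` for the week, `Venta_uni_hoy` for units delivered, `Dev_uni_proxima` for units returned), and it treats the consumed portion as delivered minus returned, which is also an assumption. `weekly_units` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict


def weekly_units(rows):
    """Total units delivered, returned and consumed for each week.

    `rows` is any iterable of dicts, e.g. a csv.DictReader over train.csv.
    Column names are assumed to match the competition files.
    Returns {week: (delivered, returned, consumed)} ordered by week.
    """
    delivered = defaultdict(int)
    returned = defaultdict(int)
    for row in rows:
        week = int(row["Semana"])
        delivered[week] += int(row["Venta_uni_hoy"])
        returned[week] += int(row["Dev_uni_proxima"])
    return {
        week: (delivered[week], returned[week], delivered[week] - returned[week])
        for week in sorted(delivered)
    }


# Three made-up rows, only to show the shape of the output.
sample = io.StringIO("Semana,Venta_uni_hoy,Dev_uni_proxima\n3,10,2\n3,5,0\n4,7,1\n")
print(weekly_units(csv.DictReader(sample)))  # {3: (15, 2, 13), 4: (7, 1, 6)}
```

For the full file, pass `csv.DictReader(open("train.csv", newline=""))` instead of the sample. Rows are streamed one at a time and only per-week totals are kept, so memory stays flat even though the training file has tens of millions of rows.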

[Figure: monetary value of units sold and units returned per week]

Here we can see the monetary value of units sold per week, together with the monetary value of the unsold units returned from the previous week.

Let's prepare the data that gets us here.
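As one possible way to prepare the monetary series, here is a similar Python sketch; it is not necessarily how the charts above were produced. It assumes the competition's column names (`Venta_hoy` for pesos sold, `Dev_proxima` for pesos returned), and `weekly_pesos` is a hypothetical helper name. It uses `Decimal` so that summing millions of currency values does not accumulate floating-point rounding error.

```python
from collections import defaultdict
from decimal import Decimal


def weekly_pesos(rows):
    """Total pesos sold and pesos returned for each week.

    `rows` is an iterable of dicts with the assumed train.csv column names.
    Returns {week: (sold, returned)} ordered by week, as Decimal values.
    """
    sold = defaultdict(Decimal)  # Decimal() starts each week at Decimal("0")
    returned = defaultdict(Decimal)
    for row in rows:
        week = int(row["Semana"])
        # Parse the text directly: Decimal("25.14") is exact, float("25.14") is not.
        sold[week] += Decimal(row["Venta_hoy"])
        returned[week] += Decimal(row["Dev_proxima"])
    return {week: (sold[week], returned[week]) for week in sorted(sold)}


# Two made-up rows, only to show the shape of the output.
rows = [
    {"Semana": "3", "Venta_hoy": "25.14", "Dev_proxima": "0.00"},
    {"Semana": "3", "Venta_hoy": "10.10", "Dev_proxima": "2.02"},
]
print(weekly_pesos(rows))  # {3: (Decimal('35.24'), Decimal('2.02'))}
```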

Kaggle – Grupo Bimbo Inventory Demand

For complete information on this competition, please go to Maximize sales and minimize returns of bakery goods. In a nutshell, Grupo Bimbo, makers of cookies from our childhood, presents an optimization problem with a lot of data, in the hope of delivering the right amount of inventory to meet demand without overestimating it.

My interest in this competition comes from a random email from Kaggle and a fondness for the cookies common in the lunchboxes of our youth. Zero Kaggle experience, and just as little with this kind of problem, make it an interesting one to look at.

Oakland License Plate Readings Database

Picking up this topic from the last post, I focused on enriching the released data to allow further exploration.

Let's use the schema from the previous post as our starting point; it is a good foundation for the task at hand. Its records were stored in a table, as shown in Figure 1.

Figure 1 – License plate readings table.
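To make the starting point concrete, here is a rough sketch of a readings table using Python's built-in `sqlite3`. It is not the schema in Figure 1: the table and column names (`plate_readings`, `read_at`, `latitude`, `longitude`, `site`, `source`, `plate_id`) are hypothetical, chosen from the fields the City of Oakland files are described as containing, with an anonymized `plate_id` in place of the raw tag.

```python
import sqlite3

# Hypothetical layout: one row per plate reading (not the table in Figure 1).
SCHEMA = """
CREATE TABLE IF NOT EXISTS plate_readings (
    reading_id INTEGER PRIMARY KEY,
    read_at    TEXT    NOT NULL,   -- ISO 8601 date and time of the reading
    latitude   REAL    NOT NULL,
    longitude  REAL    NOT NULL,
    site       TEXT,               -- site of the reading, when given
    source     TEXT,               -- source of the reading, when given
    plate_id   INTEGER NOT NULL    -- anonymized stand-in for the license tag
);
CREATE INDEX IF NOT EXISTS idx_readings_plate ON plate_readings (plate_id, read_at);
"""


def open_db(path=":memory:"):
    """Create the readings table if needed and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = open_db()
# Two made-up readings for one anonymized plate.
conn.executemany(
    "INSERT INTO plate_readings (read_at, latitude, longitude, site, source, plate_id)"
    " VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2014-01-01T08:15:00", 37.80, -122.27, None, None, 1),
        ("2014-01-01T17:40:00", 37.81, -122.26, None, None, 1),
    ],
)
# For example, how often each plate was seen.
print(conn.execute(
    "SELECT plate_id, COUNT(*) FROM plate_readings GROUP BY plate_id"
).fetchall())  # [(1, 2)]
```

The composite index on `(plate_id, read_at)` is a design guess that suits per-vehicle, time-ordered lookups; a different enrichment might call for indexing by location instead.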

City Of Oakland Plate Reader Data

Browsing Hacker News, I recently found out that the City of Oakland had released almost 3 million records of license plate reader data. The conversation there is way better than any blurb I could come up with. Still, this is a neat opportunity to mine the data as an academic exercise.

The source hosts a list of CSV files with various bits of information. Common to all files, and of critical importance, are the date and time of each tag reading and its latitude and longitude. Supplemental information, such as the site of the reading and its source, is often given as well. Most worrisome is that the data has not been cleansed: it includes the actual license tag for each reading instead of some ID. Replacing the tags would be the first thing to do before the data can be re-shared and used here.
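Swapping each tag for an opaque ID could look like the Python sketch below. The class `PlateAnonymizer` and the column name `plate` are hypothetical, not the actual headers of Oakland's CSV files. Each distinct tag gets a sequential number, and the mapping stays consistent across files as long as the same instance is reused.

```python
import csv
import io
from itertools import count


class PlateAnonymizer:
    """Hypothetical helper: replace license tags with sequential integer IDs.

    The same tag always maps to the same ID for the lifetime of the object,
    so repeat sightings of one vehicle stay linkable without exposing the tag.
    The tag-to-ID mapping lives only in memory and is never written out.
    """

    def __init__(self):
        self._ids = {}
        self._next_id = count(1)  # IDs start at 1

    def id_for(self, plate):
        # Assumption: tags differing only in case or padding are the same plate.
        key = plate.strip().upper()
        if key not in self._ids:
            self._ids[key] = next(self._next_id)
        return self._ids[key]

    def anonymize(self, reader, writer, plate_column="plate"):
        """Copy rows from a csv.DictReader to a csv.DictWriter, swapping the tag column."""
        for row in reader:
            row[plate_column] = self.id_for(row[plate_column])
            writer.writerow(row)


# Made-up rows; the real files' headers may differ.
src = io.StringIO(
    "plate,read_at,latitude,longitude\n"
    "ABC123,2014-01-01 08:15,37.80,-122.27\n"
    "XYZ789,2014-01-01 09:00,37.79,-122.25\n"
    "abc123,2014-01-02 17:40,37.81,-122.26\n"
)
dst = io.StringIO()
reader = csv.DictReader(src)
writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()
PlateAnonymizer().anonymize(reader, writer)
print(dst.getvalue())  # same rows, with the plate column now 1, 2, 1
```

A sequential mapping is used here rather than simply hashing each tag, because the set of possible plates is small enough to enumerate, which would let anyone reverse an unsalted hash.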

Windows Live Writer Has Been Open Sourced!

This is good news. Windows Live Writer, my blog editor of choice, is now open source! Rechristened Open Live Writer, it has moved briskly to the .NET Foundation as version 0.5. I want to imagine this is to prevent its death in a closet at Microsoft, but you can read all the details in the release instead.

It has been on life support for about 3 years, and it's still one of the very few must-install Windows programs for me. Highlights include the best offline post writing available and seamless integration with WordPress. It will be refreshing to see Open Live Writer developed further after so long. Neat.