Kaggle – Grupo Bimbo Preparing The Data

With the database from the last post in mind, we can now go over the information provided for this contest.  Most interesting to me is the distribution of inventory delivered versus inventory returned.

image

Above, we can see the number of units sold each week.  The green portion of the bar indicates the number of units consumed and the red portion indicates the number of units returned (unsold) from the previous week.

image

Here we can see the monetary amount for units sold per week, together with the monetary amount not sold, based on the units returned from the previous week.
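A minimal sketch of how weekly totals like the ones charted above could be computed. It assumes the column names I recall from the competition's training file (Semana for the week, Venta_uni_hoy and Venta_hoy for units and amount sold, Dev_uni_proxima and Dev_proxima for units and amount returned). The sample rows are made up, and the totals are grouped by the week recorded on each row without shifting returns to another week. This is not necessarily how the database from the last post does it.

```python
import csv
import io
from collections import defaultdict


def weekly_totals(rows):
    """Sum units and amounts sold and returned for each week."""
    totals = defaultdict(lambda: {
        "units_sold": 0,
        "units_returned": 0,
        "amount_sold": 0.0,
        "amount_returned": 0.0,
    })
    for row in rows:
        week_totals = totals[int(row["Semana"])]
        week_totals["units_sold"] += int(row["Venta_uni_hoy"])
        week_totals["units_returned"] += int(row["Dev_uni_proxima"])
        week_totals["amount_sold"] += float(row["Venta_hoy"])
        week_totals["amount_returned"] += float(row["Dev_proxima"])
    return dict(totals)


# Made-up sample rows, only to show the expected shape of the input.
sample = io.StringIO(
    "Semana,Venta_uni_hoy,Venta_hoy,Dev_uni_proxima,Dev_proxima\n"
    "3,10,85.0,1,8.5\n"
    "3,5,42.5,0,0.0\n"
    "4,7,59.5,2,17.0\n"
)

totals = weekly_totals(csv.DictReader(sample))
for week in sorted(totals):
    print(week, totals[week])
```

Each week's entry holds the two numbers needed for one bar in the unit chart and one bar in the monetary chart.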

Let's prepare the data that gets us here.


Kaggle – Grupo Bimbo Inventory Demand

bimbovainilla

For complete information on this competition, please go to Maximize sales and minimize returns of bakery goods.  In a nutshell, Grupo Bimbo, makers of cookies from our childhood, presents an optimization problem with a lot of data in the hopes of delivering the right amount of inventory to meet, but not overestimate, demand.

My interest in this competition comes from a random email from Kaggle and a fondness for the cookies common in the lunchboxes of our youth.  Having zero Kaggle experience, and just as little experience with the problem at hand, makes this an interesting problem to look at.


Oakland License Plate Readings Database

Picking this topic up from the last post, I focused on enriching the data released.  This will allow further exploration of this data.

Let's use our previous schema as the starting point for the task at hand.  The records from the previous post were stored in a table as shown in Figure 1.

image1

Figure 1 – License plate table readings.


City Of Oakland Plate Reader Data

Browsing Hacker News, I recently found out about the City of Oakland releasing almost 3 million records of license plate reader data.  The conversation there is way better than any blurb I could come up with.  However, this is a neat opportunity to mine this data as an academic exercise.

At the source, they are hosting a list of CSV files with various bits of information.  Common to all files, and of critical importance, are the date and time of the tag reading and the latitude and longitude of each reading.  Supplemental information, such as the site of the reading and its source, is often given as well.  Most worrisome is the fact that the data has not been cleansed and includes the actual license tag for each reading instead of some ID.  This would be the first thing to go after for the data to be re-shared and used here.
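As a rough illustration of replacing the actual tag with some ID, here is a minimal sketch that maps each plate to an opaque identifier using a keyed hash. The field names (plate, timestamp, lat, lon) and the sample readings are hypothetical and not taken from the released files, and this is one possible approach rather than the one used in these posts.

```python
import hashlib
import hmac
import secrets


def pseudonymize(plate, key):
    """Map a plate to a stable, opaque ID using HMAC-SHA256."""
    normalized = plate.strip().upper().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()[:16]


# Secret key: without it the IDs cannot be recomputed from known plates.
key = secrets.token_bytes(32)

# Made-up readings with hypothetical field names.
readings = [
    {"plate": "1ABC234", "timestamp": "2014-01-01 08:00:00", "lat": 37.80, "lon": -122.27},
    {"plate": "1abc234 ", "timestamp": "2014-01-02 09:30:00", "lat": 37.81, "lon": -122.26},
    {"plate": "7XYZ890", "timestamp": "2014-01-02 10:15:00", "lat": 37.79, "lon": -122.25},
]

# Replace the plate with its ID and keep every other field as-is.
cleansed = [
    {"plate_id": pseudonymize(r["plate"], key),
     **{k: v for k, v in r.items() if k != "plate"}}
    for r in readings
]

for row in cleansed:
    print(row)
```

A keyed hash is used instead of a plain hash because the space of possible plates is small enough to enumerate, so an unkeyed hash could be reversed by hashing every candidate plate. The same plate still maps to the same ID, which keeps the readings linkable for analysis.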

Windows Live Writer Has Been Open Sourced!

This is good news.  Windows Live Writer, my blog editor of choice, is now open source!  Rechristened Open Live Writer, it has moved briskly to the .NET Foundation as a 0.5 version… I want to imagine this is to prevent its death in a closet at Microsoft, but you can read all the details at the release instead.

It has been on life support for about 3 years and it's still one of the very few Windows must-installs for me.  Some highlights include the best offline post writing available and seamless integration with WordPress.  It will be refreshing to see Open Live Writer be further developed after so long.  Neat.

Texas A&M GeoServices Partner Program

In search of public (and fast and low cost) geocoding services, I’ve run into Texas A&M GeoServices.

I have only tested their reverse geocoding service, and it was all three of the above.  It took no more than a cup of coffee to provide addresses given latitude and longitude, and the information added looks very promising.

They even have a partner program that could minimize the expense of using the service.  Neat!  Expect upcoming posts to attribute all geo data to them like so:

Geo-stuff provided by Texas A&M University GeoServices