Data Mining Orange County (Orlando Area) Crime

Let’s look at recent crime in Orlando and adjacent cities.  What can we find out by exploring Orlando’s crime data?  Is the Central Florida area a relatively safe place?  Can we tell from 90 days’ worth of data?  Are some areas safer than others?  Do we have any false ideas about places in Orlando?

At the very least, let’s attempt to get a better understanding of the crime issue, which affects us everywhere.


Data Science Toolkit on VMWare Error

Tried to run the Data Science Toolkit in VMWare but got an image error?

The OVA image provided seems to have been created in VirtualBox. Attempting to deploy it in VMWare gives the following error:

The OVF package requires unsupported hardware.
Details: Line 33: Unsupported hardware family ‘virtualbox-2.2’.


Using VMWare’s OVFTool, we can convert this image to a ‘VMWare-format’ image as follows,

From the command line,

ovftool.exe -tt=vmx --lax c:\dstk_0.50.ova c:\dstk_0.50.vmx


After a good while, we end up with a VMWare image with a 60 GB disk. That is it… your Data Science Toolkit VM.

Exploring Cohort Analysis – Part One

Simply put, cohort analysis is a technique for analyzing activity over time for groups that share a common characteristic.  Mostly used in sales and marketing, cohort analysis can be applied to tasks such as analyzing customer loyalty, customer acquisition cost and marketing campaign effectiveness, and to explore many other aspects of sales.

Dataset

I am using the Superstore Sales dataset created by Michael Martin, found here or here.  This Excel file contains three sheets, of which only the first one, Orders, will be used in this analysis.

Case

The store providing its sales data runs monthly advertising campaigns and wants to track what impact these campaigns have on the number of orders placed over time.  The store wants to use this information to evaluate its different campaigns and improve its efforts.

Given the superstore sales data and the requirements, let’s present the number of orders placed per customer join date.  Presenting the number of orders per join date will show the effectiveness of the advertising campaigns leading up to that date.
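To make the idea concrete, here is a minimal sketch of counting orders per join cohort in plain Python.  It is only an illustration, not the queries used in this series: the sample rows are made up, the function name is hypothetical, each row is assumed to be one order, and a customer’s join date is assumed to be the date of their earliest order, grouped by month.

```python
from collections import Counter
from datetime import date

# Hypothetical sample rows: (customer_id, order_id, order_date).
# Assumption: one row per order; the real Orders sheet may differ.
orders = [
    ("C1", "O1", date(2012, 1, 5)),
    ("C1", "O2", date(2012, 3, 9)),
    ("C2", "O3", date(2012, 1, 20)),
    ("C3", "O4", date(2012, 2, 2)),
    ("C3", "O5", date(2012, 2, 15)),
    ("C3", "O6", date(2012, 4, 1)),
]


def orders_per_join_month(rows):
    """Count orders grouped by the month of each customer's first order."""
    # Assumption: a customer's join date is the date of their earliest order.
    join_date = {}
    for customer, _order, when in rows:
        if customer not in join_date or when < join_date[customer]:
            join_date[customer] = when

    # Credit every order to the cohort (join year, join month) of its customer.
    counts = Counter()
    for customer, _order, _when in rows:
        joined = join_date[customer]
        counts[(joined.year, joined.month)] += 1
    return dict(sorted(counts.items()))


cohorts = orders_per_join_month(orders)
for (year, month), n in cohorts.items():
    print(f"{year}-{month:02d}: {n} orders")
```

Each order is credited to the month its customer joined, not the month the order was placed, which is what lets a cohort’s total be read against the campaign that ran before that join month.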

Tools

Any database server can be used to follow along.  The code used here can easily be revised to work on any vendor’s product, such as MySQL.  For visualization purposes, Tableau can easily be replaced by LibreOffice or similar.

 


2014 Miracle Miles 15k Race

Another great, and wet, run!  Nonstop rain did not dampen the enthusiasm of the thousands of runners who met this year for the 5k and 15k Miracle Miles event in Orlando.

Highlights for me:

  • First race-in-the-rain!
  • Second race without wearing headphones (in fanny pack as security blanket).  First one was Disney, by choice; more later.
  • Longest race without any walking.
  • (Sub :) ) 9 min/mi.
  • Lots of fun.

I even knew a bunch of runners at the event; neat.  Lastly, I feel happy about my performance.  This was my first 15k, so it is my reference for the distance and the time to beat for the next 15k’s, hopefully, PR!


Building on my previous race analysis for the Oviedo 5K, I debated whether to use Tableau (rocks!) again or try Oracle’s Cloud Analytics.

Disclaimer – Happy Oracle Employee.

Even though it has a 30-day trial, I think free developer accounts would be more suitable to accelerate adoption.  I’ve inquired and await a response, but they know best, I guess.  Oracle does offer such a service for APEX; maybe it’s a matter of time…

Maybe next time; Tableau to the rescue it is.


Analyzing the 12th Annual Greater Oviedo 5K with Tableau

I had been meaning to write about this semi-recent 5K run in beautiful Oviedo at First Baptist Church.  At Raul’s insistence, I decided to check this local favorite out.

Held this past May 24th, 2014, on a hot Saturday morning, the course was Florida-flat (42 ft elevation gain) but had plenty of shade and that small-town feeling Oviedo is known for.

With a few weeks to ‘train’, I set out to ramp up the miles in order to reduce my time.  No medals for me yet, but I enjoyed the race sights, the crowd and the location.  Nothing like an early short run to set one up for a nice weekend.

For fun, I decided to play around with the results in Tableau and try out the OSX version of the software.  Feature-wise, both Windows and OSX versions are the same, with the Windows version being just a bit more stable with big sets of data.


A Savannah Runaround

As far as scenery goes, this is the most distracting run I’ve ever done. I found it impossible to focus on running.  Savannah is a beautiful city.  It is small, yet packed with life and activity from the touristy bits to family life and business.

For a better write-up about Savannah Squares than this post, read this; an eloquent intro to the squares.
