Happy Elasticsearch

So last night I found this on my lab machine…

elasticsearchsad

Silly node ran out of space.  Spinning up an extra node promptly made this a non-issue.

elasticsearchhappy

Crisis averted. 

It is so nice to work with open source tools built to handle failure gracefully.  A few years ago the above scenario would have prompted a weekend-at-the-colo to the dismay of family and my sanity.  These are interesting times!

Monitoring With Collectd And Kibana

Overview

In this post, I will go over using the Collectd input on Logstash for gathering hardware metrics. For Logstash output, I am going to be using Elasticsearch. Logstash will allow us to centralize metrics from multiple computers into Elasticsearch. On top of Elasticsearch, I am going to be using Kibana to display these metrics.

To this Kibana dashboard, we could add additional metrics for the processes taxing the system being monitored. This would effectively show a cause and effect story in one integrated dashboard.

Gist e94ad12dfe84426971bd

Continue reading

Up And Running With Logstash

I want to talk about Logstash, a new-ish tool (to me) for managing computer logs. Logstash can easily collect logs from multiple computers or instances, transfer them to a central computer for aggregation, and can even be used to parse and search these logs for analysis as they are handled. A mouthful indeed. Logstash is open-source software and is written in JRuby, so it runs in the JVM. Running on the JVM has various advantages, such as ease of deployment and the wealth of tuning expertise available.

Continue reading

Using Threshold Analysis To Discover Emerging Crime Trends In Orlando

Do police departments across the US (the world?) have the bandwidth to pore over crime reports in order to spot trends and mitigate crimes using all the available information?  Given the ever-increasing amount of data, now the norm, it will be increasingly difficult to make the best use of this data.  As it applies to crime, being able to effectively utilize this data should improve the quality of life of everyone.

Crime Dashboard

For week 40, commercial burglaries in sector 2 have increased from an expected count of 7.89 to an actual count of 16.  This is roughly twice the expected volume.
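To make the comparison concrete, here is a minimal Python sketch of one way a count like this could be checked against its expected value. The excerpt does not say how the expected count or the alert threshold was computed, so the Poisson model and the `poisson_tail` helper are assumptions for illustration only, not the method behind the dashboard.

```python
import math


def poisson_tail(expected, observed):
    """Return P(X >= observed) for X ~ Poisson(expected).

    Assumption: weekly counts are modeled as Poisson. The original
    analysis may use a different model or threshold.
    """
    below = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below


# Figures quoted in the post: week 40, commercial burglaries, sector 2.
expected, actual = 7.89, 16

ratio = actual / expected
p_value = poisson_tail(expected, actual)

print(f"actual / expected = {ratio:.2f}")
print(f"P(count >= {actual} | expected {expected}) = {p_value:.4f}")
```

The ratio says how large the jump is; the tail probability says how surprising a count this high would be if nothing had changed, under the assumed model.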

Continue reading

Data Mining Orange County (Orlando Area) Crime

Let's look at recent crime in Orlando and adjacent cities.  What can we find out by exploring Orlando’s crime data?  Is the Central Florida area a relatively safe place?  Can we tell from 90 days’ worth of data?  Are some areas safer than others?  Do we have any false ideas about places in Orlando?

At the very least, let's attempt to get a better understanding of the crime issue which affects us everywhere.

Continue reading

Data Science Toolkit on VMWare Error

Tried to run the Data Science Toolkit in VMWare but got an image error?

The OVA image provided seems to have been created in VirtualBox. Attempting to deploy it in VMWare gives the following error:

The OVF package requires unsupported hardware.
Details: Line 33: Unsupported hardware family ‘virtualbox-2.2’.

Using VMWare’s OVFTool, we can convert this image to a ‘VMWare-format’ image as follows,

From the command line,

ovftool.exe -tt=vmx --lax c:\dstk_0.50.ova c:\dstk_0.50.vmx

After a good while, we will end up with a VMWare image with a 60 GB disk. That is it… your Data Science Toolkit VM.
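If you have more than one image to convert, the same command can be scripted. The Python sketch below only wraps the flags used in this post (`-tt=vmx` and `--lax`); the helper names `build_ovftool_command` and `convert` are hypothetical, and it assumes `ovftool.exe` is on the PATH.

```python
import shutil
import subprocess
from pathlib import PureWindowsPath


def build_ovftool_command(source_ova, tool="ovftool.exe"):
    """Build the OVFTool argument list for an OVA-to-VMX conversion.

    The target gets the same name as the source with a .vmx extension,
    mirroring the command shown in the post.
    """
    source = PureWindowsPath(source_ova)
    target = source.with_suffix(".vmx")
    return [tool, "-tt=vmx", "--lax", str(source), str(target)]


def convert(source_ova, tool="ovftool.exe"):
    """Run the conversion. Assumes the tool is installed and on the PATH."""
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} not found on PATH")
    subprocess.run(build_ovftool_command(source_ova, tool), check=True)


command = build_ovftool_command(r"c:\dstk_0.50.ova")
print(" ".join(command))
```

Building the argument list separately from running it makes it easy to print and inspect the command before kicking off a long conversion.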

2014 Miracle Miles 15k Race

Another great, and wet, run!  Nonstop rain did not dampen the enthusiasm of the thousands of runners that met this year for the 5k and 15k Miracle Miles Event in Orlando.

Highlights for me:

  • First race-in-the-rain!
  • Second race without wearing headphones (in fanny pack as security blanket).  First one was Disney, by choice; more later.
  • Longest race without any walking.
  • (Sub 🙂 ) 9 min/mi.
  • Lots of fun.

I even knew a bunch of runners at the event; neat.  Lastly, I feel happy about my performance.  This was my first 15k, so it is my reference for the distance and the time to beat for the next 15k’s and, hopefully, a PR!

Building on my previous race analysis for the Oviedo 5K, I debated whether to use Tableau (rocks!) again or try Oracle’s Cloud Analytics.

Disclaimer – Happy Oracle Employee.

Even though it has a 30 day trial, I think free developer accounts would be more suitable to accelerate adoption.  I’ve inquired and await a response, but they know best I guess.  Oracle does offer such a service for APEX; maybe it’s a matter of time…

Maybe next time; Tableau to the rescue it is.

Continue reading

Analyzing the 12th Annual Greater Oviedo 5K with Tableau

I had been meaning to write about this semi-recent 5K run in beautiful Oviedo at First Baptist Church.  At Raul’s insistence, I decided to check this local favorite out.

Held this past May 24th, 2014 on a hot Saturday morning, the course was Florida-flat (42 ft elevation gain) but had plenty of shade and had that small-town feeling Oviedo is known for.

With a few weeks to ‘train’ I set out to ramp up the miles in order to reduce my time.  No medals for me yet but I enjoyed the race sights, the crowd and the location.  Nothing like an early short run to set one up for a nice weekend.

For fun, I decided to play around with the results in Tableau and try out the OSX version of the software.  Feature-wise, both Windows and OSX versions are the same, with the Windows version being just a bit more stable with big sets of data.

Continue reading