So last night I found this on my lab machine…
Silly node ran out of space. Spinning up an extra node promptly made this a non-issue.
It is so nice to work with open source tools built to handle failure gracefully. A few years ago, the above scenario would have prompted a weekend at the colo, to the dismay of my family and my sanity. These are interesting times!
In this post, I will go over using the collectd input on Logstash for gathering hardware metrics. For Logstash output, I am going to be using Elasticsearch. Logstash will allow us to centralize metrics from multiple computers into Elasticsearch. On top of Elasticsearch, I am going to be using Kibana to display these metrics.
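As a rough sketch of the wiring described above (my actual config may differ; the port and host values here are placeholders), a minimal Logstash pipeline taking collectd metrics and sending them to Elasticsearch could look like this:

```conf
input {
  udp {
    port  => 25826            # collectd's default network port (placeholder)
    codec => collectd { }     # decode collectd's binary wire format
  }
}
output {
  elasticsearch {
    host => "localhost"       # Elasticsearch node that stores the metrics
  }
}
```

Older Logstash releases also shipped a dedicated collectd input plugin; the UDP-input-plus-collectd-codec form above is the other common way to wire it up.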
To this Kibana dashboard, we could add additional metrics for the processes taxing the system being monitored. This would effectively show a cause and effect story in one integrated dashboard.
This is an update to Is Logstash Eating My Logs?.
TL;DR – Changing the Logstash input from syslog to tcp resolved the lost messages. The TCP input type proved 100% reliable.
UPDATE Feb 5, 2015 – Logstash Is Not Eating My Logs
For the past few days, I’ve been testing different scenarios for sending and receiving messages between computers using Logstash. Setting up was straightforward. Using Sysloggen, I’ve been able to push a large number of messages through. Only one concern: I am losing messages.
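For reference, the fix described above amounts to swapping the syslog input for a plain tcp input. A hedged sketch follows; the port number and type tag are placeholders, not taken from my actual config:

```conf
# Before: the built-in syslog input, which parses messages as it receives them
# input { syslog { port => 5514 } }

# After: a plain tcp input; messages arrive as raw lines
input {
  tcp {
    port => 5514          # placeholder port
    type => "syslog"      # tag events so later filters can target them
  }
}
```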
I want to talk about Logstash, a new-ish tool (to me) for managing computer logs. Logstash can easily collect logs from multiple computers or instances, transfer them to a central computer for aggregation, and can even be used to parse and search these logs for analysis as they are handled. A mouthful indeed. Logstash is open-source software and is written in JRuby, so it runs in the JVM. Running on the JVM has various advantages, such as ease of deployment and a wealth of available tuning expertise.
Do police departments across the US (the world?) have the bandwidth to pore over crime reports in order to spot trends and mitigate crime using all the available information? Given the ever-increasing amount of data, now the norm, it will only get harder to make the best use of it. As it applies to crime, being able to effectively utilize this data should improve everyone’s quality of life.
For week 40, commercial burglaries in sector 2 have increased from an expected count of 7.89 to an actual count of 16, about twice the expected volume.
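For context, one simple way such an expected count could be derived is as a historical weekly average. This is purely an illustration under that assumption; the counts below are made up, and the analysis tool’s actual method is not described here:

```python
# Hypothetical illustration: derive a weekly "expected count" as the mean of
# past weekly counts for one crime type in one sector. These numbers are made
# up; the actual method behind the 7.89 figure above is not described here.
weekly_counts = [6, 9, 8, 7, 10, 8, 7, 8]  # hypothetical past weeks, sector 2

expected = sum(weekly_counts) / len(weekly_counts)
print("expected weekly count:", expected)  # expected is 7.875 here
```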
Let’s look at recent crime in Orlando and adjacent cities. What can we find out by exploring Orlando’s crime data? Is the Central Florida area a relatively safe place? Can we tell from 90 days’ worth of data? Are some areas safer than others? Do we have any false ideas about places in Orlando?
At the very least, let’s attempt to get a better understanding of the crime issue, which affects us all.
Tried to run the Data Science Toolkit in VMware but got an image error?
The OVA image provided seems to have been created in VirtualBox. Attempting to deploy it in VMware gives the following error:
The OVF package requires unsupported hardware.
Details: Line 33: Unsupported hardware family ‘virtualbox-2.2’.
Using VMware’s OVF Tool, we can convert this image to a VMware-format image as follows.
From the command line,
ovftool.exe -tt=vmx --lax c:\dstk_0.50.ova c:\dstk_0.50.vmx
After a good while, we will end up with a VMware VM backed by a 60 GB disk. That is it… your Data Science Toolkit VM.
Another great, and wet, run! Nonstop rain did not dampen the enthusiasm of the thousands of runners that met this year for the 5K and 15K Miracle Miles event in Orlando.
Highlights for me:
- First race-in-the-rain!
- Second race without wearing headphones (kept in the fanny pack as a security blanket). The first was Disney, by choice; more later.
- Longest race without any walking.
- (Sub 🙂 ) 9 min/mi.
- Lots of fun.
I even knew a bunch of runners at the event; neat. Lastly, I feel happy about my performance: my first 15K, hence a reference for the distance and a time to beat for the next one. Hopefully a PR!
Building on my previous race analysis for the Oviedo 5K, I debated whether to use Tableau (rocks!) again or try Oracle’s Cloud Analytics.
Disclaimer – Happy Oracle Employee.
Even though it has a 30-day trial, I think free developer accounts would be more suitable to accelerate adoption. I’ve inquired and await a response, but they know best, I guess. Oracle does offer such a service for APEX; maybe it’s a matter of time…
Maybe next time; Tableau to the rescue it is.
I had been meaning to write about this semi-recent 5K run in beautiful Oviedo at First Baptist Church. At Raul’s insistence, I decided to check this local favorite out.
Held this past May 24th, 2014 on a hot Saturday morning, the course was Florida-flat (42 ft of elevation gain) but had plenty of shade and that small-town feeling Oviedo is known for.
With a few weeks to ‘train’ I set out to ramp up the miles in order to reduce my time. No medals for me yet but I enjoyed the race sights, the crowd and the location. Nothing like an early short run to set one up for a nice weekend.
For fun, I decided to play around with the results in Tableau and try out the OSX version of the software. Feature-wise, both Windows and OSX versions are the same, with the Windows version being just a bit more stable with big sets of data.