2015 Gasparilla Half-Marathon

plate

Highlights

  • PR!
  • First time running with a pacer. Way to go Kristen.
  • First time racing with Polar! M400 ftw.
  • Running in the dark for the first few miles. Street lamps had lots of style but gave only a soft glow that didn’t light the road.
  • Half marathon may be the sweet spot. Not too long, not too short.
  • A big portion of the course was out-and-back along Tampa Bay.  This gave us, the slow runners, a clear view of the elites on their way back to the finish line.  It was breathtaking to see the game these runners are playing.  Let’s just say that their stride looks very different than mine.  Wow, I’m looking up!

bib

Lessons

  • Include water stations in training/race strategy. I usually walk the stations and had to play catch-up to the pace group. Every, single, time. Goodness, they can drink and run at the same time.
  • Review the course, at least on a map. Right after mile 12, an awesome person on a crane overhead blasted ‘finish line after bridge’ over a megaphone.  A jolt of excitement goes through my body, which is starting to ache.  I kind of see a bridge in the distance.  Off I go, empty my cup! The finish line was after the SECOND bridge. Thanks dude… I almost did my dance 200 ft from the finish line.
  • Plan breakfast ahead of time. It’s nearly impossible to wing breakfast at 4am.  Ran on an empty stomach and just about fainted an hour after the run.
  • Plan post-race reunion with loved ones ahead of time. Winged this and it caused unnecessary aches (and extra walking).

splits

Lastly

Best race yet.  I would definitely consider revisiting Tampa next year for Gasparilla Half.

Monitoring With Collectd And Kibana

Overview

In this post, I am going to go over using the Collectd input on Logstash for gathering hardware metrics. For Logstash output, I am going to be using Elasticsearch. Logstash will allow us to centralize metrics from multiple computers into Elasticsearch. On top of Elasticsearch, I am going to be using Kibana to display these metrics.

To this Kibana dashboard, we could add additional metrics for the processes taxing the system being monitored. This would effectively show a cause and effect story in one integrated dashboard.

Kibana dashboard
Gist e94ad12dfe84426971bd

Pre-requisites

I am using one VM running Ubuntu 14.04 for this post. You may have to change these steps, as needed, to match your environment. Also, make sure you have an up-to-date Java installation before starting.
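
If you want a quick sanity check on the Java side (just the obvious check, not a requirement for any specific version):

java -version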

First, we have our Linux daemon gathering metrics.

Collectd – Used to collect metrics from our system(s). We could run Collectd on multiple systems to collect each system’s metrics.
Then we have the rest of the tools, as follows.
Logstash – Used to transport and aggregate our metrics from each system into our destination. This destination could be a file, a database or something else. Logstash works on a system of plugins for input, filtering and output. In this case, our input is Collectd and our output is Elasticsearch.
Elasticsearch – Used to store and search our collected metrics. This is our Logstash output.
Kibana – Used to display our metrics stored in Elasticsearch.

The combination of the last three tools is commonly called the ELK stack.

collectd-elk

Installing Everything

Collectd
The easiest way to install Collectd is to use apt-get. Install both collectd and collectd-utils.

sudo apt-get update
sudo apt-get install collectd collectd-utils
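
To make sure the daemon actually installed and started, a quick check like this should do (assuming the stock Ubuntu service name):

sudo service collectd status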

Elk Stack
Before we can install Logstash, we need to add the Logstash apt repo to our system first.

wget -O - http://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb http://packages.elasticsearch.org/logstash/1.4/debian stable main" | sudo tee /etc/apt/sources.list.d/logstash.list

Then similarly to Collectd, install Logstash,

sudo apt-get update
sudo apt-get install logstash

Elasticsearch has no apt repo set up here, so getting and installing it looked like this for me.

wget http://packages.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.1.1.deb
sudo dpkg -i elasticsearch-1.1.1.deb
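
The deb package does not start Elasticsearch for you. Assuming the stock init script that ships with the package, something like this gets it going and keeps it running across reboots:

sudo service elasticsearch start
# optional: start Elasticsearch on boot
sudo update-rc.d elasticsearch defaults 95 10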

Kibana comes with Elasticsearch so there is no installation needed for it.

Configuring Everything

Collectd
For Collectd, we have to create a configuration file (on Ubuntu, this lives at /etc/collectd/collectd.conf).

# For each instance where collectd is running, we define 
# a hostname proper to that instance. When metrics from
# multiple instances are aggregated, hostname will tell 
# us where they came from.
Hostname "ilos-stats"

# Fully qualified domain name lookup, false for our little lab
FQDNLookup false

# Plugins we are going to use, with their configurations
# where needed
LoadPlugin cpu

LoadPlugin df
<Plugin df>
        Device "/dev/sda1"
        MountPoint "/"
        FSType "ext4"
        ReportReserved true
</Plugin>

LoadPlugin interface
<Plugin interface>
        Interface "eth0"
        IgnoreSelected false
</Plugin>

LoadPlugin network
<Plugin network>
        <Server "192.168.1.43" "25826">
        </Server>
</Plugin>

LoadPlugin memory

LoadPlugin syslog
<Plugin syslog>
        LogLevel info
</Plugin>

LoadPlugin swap

<Include "/etc/collectd/collectd.conf.d">
        Filter "*.conf"
</Include>
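
With the configuration saved, restart Collectd so it picks it up. Since we loaded the syslog plugin, any configuration errors should land in the system log (service and log names assume the stock Ubuntu packages):

sudo service collectd restart
# look for complaints from collectd, if any
grep collectd /var/log/syslog | tail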

Each plugin will gather different pieces of information. For an extensive list of plugins and their details, go to the Collectd Plugins page.

The configuration above will offer a solid set of metrics to begin our monitoring task. It will provide details on CPUs, hard drive, network interface, memory and swap space. This may be about 5% of what Collectd can gather!

Logstash
For the purposes of this post, Logstash’s sole task is to pick up metrics from Collectd and deliver them to Elasticsearch. For this, we are going to define one input and one output.

input {
  udp {
    port => 25826         # 25826 matches port specified in collectd.conf
    buffer_size => 1452   # 1452 is the default buffer size for Collectd
    codec => collectd { } # specific Collectd codec to invoke
    type => "collectd"
  }
}
output {
  elasticsearch {
    cluster => "logstash" # this matches our elasticsearch cluster.name
    protocol => "http"
  }
}
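
If you installed Logstash from the apt repo above, the init script should pick up configuration files from /etc/logstash/conf.d, so dropping this file there and restarting is enough. The filename collectd.conf is just my choice; any name ending in .conf works:

sudo cp collectd.conf /etc/logstash/conf.d/collectd.conf
sudo service logstash restart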

Elasticsearch
The only edits we are making to the Elasticsearch configuration (elasticsearch.yml) are the cluster name and the node name.

cluster.name: logstash
node.name: ilos
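
After editing, restart Elasticsearch and make sure the cluster answers with the new name (assuming it is listening on the default port 9200):

sudo service elasticsearch restart
curl -s 'http://localhost:9200/_cluster/health?pretty'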

Kibana
Just like at installation time, there is no configuration necessary for Kibana, yay!

Testing The Setup

After starting all these services (Collectd, Logstash, Elasticsearch and Kibana), let’s validate our setup.
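
Before opening Kibana, a quick way to confirm Collectd events are actually reaching Elasticsearch is to query the logstash-* indices directly (a minimal check against the default port 9200; the type field matches the type we set in the Logstash input):

curl -s 'http://localhost:9200/logstash-*/_search?q=type:collectd&size=1&pretty'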

Now let’s go to the Kibana URL (port 9292) and load up the default Logstash dashboard.

Kibana Logstash

I’ve created a gist of this dashboard to share. If the default dashboard loaded successfully, you can either follow this post to create your own dashboard, or you can just grab the gist from Kibana itself (try Load > Advanced…) and edit at will; enjoy.

If some of the displays do not register any metrics, it is most likely because the attribute names on your system differ from mine. Just edit the queries as needed.

Gist e94ad12dfe84426971bd

Reference Dashboard

To get our feet wet, let’s create the simplest of dashboards. We are going to create one visualization showing the list of events coming from Collectd, which will serve as our reference for creating all the displays we want.

  1. From Kibana’s home, select the last link from the right pane, Blank dashboard.
  2. Select a Time filter from the dropdown so that we bring some events in. Note that we now have a filter under Filtering.
  3. Click the upper right gear and name this dashboard Collectd.
  4. Click the Index tab, select day for timestamping and check Preload fields. Save.
  5. Click Add a row, name this row Events, click Create row and Save.
  6. There is now a column with three buttons on the left edge of the browser; click the green one to Add a panel.
  7. Select Table for the panel type, title it All Events and pick a span of 12. Save.
  8. From the upper right menu, Save your dashboard.

kibana collectd events

This panel shows the events as they come into Elasticsearch from Collectd. Way to go, Logstash! From these events, Kibana extracts the fields we can report on.

This is a good place to start thinking of attributes and metrics. Each of the fields shown is an attribute we can report metrics on. In this post’s case, these fields will vary depending on the plugins defined in our Collectd configuration. You can think of them as objects with attributes, if that helps. Either way, depending on the plugin, we will have a different set of attributes to report on. In turn, if we had a different Logstash input, we would end up with a completely different set of attributes.

Each record includes the following bits of information over time (a sample event is sketched after this list).

  • host – Matches the hostname defined in collectd.conf. Handy attribute for aggregating from multiple event sources.
  • plugin – Matches one of the plugins defined in collectd.conf.
  • plugin_instance – Means of grouping a measurement from multiple instances of a plugin. For example, say we had a plugin of cpu with a type_instance of system; on a dual-cpu machine, we would have plugin_instance 0 and 1.
  • collectd_type – Mostly follows plugin.
  • type_instance – These are the available metrics per plugin.
  • value – This is the actual measurement for said type_instance, for each plugin_instance, for each plugin…
  • type – collectd for this exercise.
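
To make these fields concrete, here is a hypothetical event as it might be stored in Elasticsearch. The values are made up for illustration, and the exact set of fields depends on your Logstash version and Collectd plugins:

{
  "host": "ilos-stats",
  "plugin": "cpu",
  "plugin_instance": "0",
  "collectd_type": "cpu",
  "type_instance": "system",
  "value": 4.2,
  "type": "collectd",
  "@version": "1",
  "@timestamp": "2015-02-22T14:05:12.000Z"
}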

Collectd Dashboard

For each plugin loaded, let’s list its attributes of interest: types, instances and additional attributes. These will be used to write the Kibana queries which we will use later on to filter the Collectd data for each display we create.

cpu

plugin: cpu
type_instance: wait, system, softirq, user, interrupt, steal, idle, nice
plugin_instance: 0, 1, 2, 3

Kibana queries

plugin: "cpu" AND plugin_instance: "0"
plugin: "cpu" AND plugin_instance: "1"
plugin: "cpu" AND plugin_instance: "2"
plugin: "cpu" AND plugin_instance: "3"

Each of these should have an alias, such as: cpu1, cpu2, cpu3, cpu4.

In the Kibana dashboard we just created, go ahead and add each of these as individual queries on top.

Now we can set up a display as simple or as complicated as we like. Let’s try a few.

  1. As before, let’s start by creating a new row. We will dedicate this one to CPUs. (Add row: CPU)
  2. To this row, let’s add a Terms panel. A panel is a single display; each row is divided into up to 12 span units that its panels share. Name this one cpu1.
  3. With 4 cpus and 12 span units per row, I select a span of 3 to fit all cpus on one row.
  4. In Parameters, let’s select terms_stats as the Terms mode.
  5. For Stats type, select max.
  6. For Field, type type_instance.
  7. For Value field, type value.
  8. For Order, select max.
  9. For Style, lets select pie.
  10. For Queries, choose selected and click the query labeled cpu1 that we created earlier.
  11. Save.

Your work should look like this
Kibana cpu

Repeat these steps for each additional Kibana query above. Be sure to save your dashboard after this step. We’ve come far!

You should have something like this to show.
Kibana cpu

df

plugin: df
type_instance: reserved, used, free
plugin_instance: root

Kibana queries

plugin: "df" AND plugin_instance: "root"

Following the same pattern as we used for the cpus, we end up with something that looks like this.

Kibana df

interface

plugin: interface
plugin_instance: eth0
additional attributes: rx, tx

Kibana queries

plugin: "interface" AND plugin_instance: "eth0"

Here we have only one query, just like for the df plugin, but we have 2 distinct attributes to filter for: one for received (rx) and one for transmitted (tx). We just use the same filter twice but specify either rx or tx in the Value field.

The creation of the rx visual, for example, looks something like this.
Kibana rx

Adding another identical panel, but with tx as the Value field, will result in a display combo similar to this.
Kibana interface

memory

plugin: memory
type_instance: free, buffered, cached, used

Let’s try something different for memory. Create the following queries.
Kibana queries

plugin: "memory" AND type_instance: "free"
plugin: "memory" AND type_instance: "buffered"
plugin: "memory" AND type_instance: "cached"
plugin: "memory" AND type_instance: "used"

Alias appropriately and select colors of your choice. This display is a bit different from the rest, if only because we combine all the type_instance values into one histogram. It took me longer to figure this one out.

Use this one as a reference and change as needed.
Kibana memory

You should end up with something like this.
Kibana memory

swap

plugin: swap
type_instance: cached, in, free, out, used

Kibana queries

plugin: "swap" AND type_instance: "free"
plugin: "swap" AND type_instance: "in"
plugin: "swap" AND type_instance: "out"
plugin: "swap" AND type_instance: "cached"
plugin: "swap" AND type_instance: "used"

Similar to our memory display, our swap display ought to look a lot like this.
Kibana swap

That was quite the journey. Hopefully, you’ve ended up with a dashboard similar to the one shown at the beginning. Most likely, it would only take you a few minutes to make it better; go for it.

Using this dashboard as a base, you could aggregate these same metrics from multiple computers into a single dashboard. This beats having to go into each computer in use to try to figure out what is being taxed and what is not. Furthermore, enabling additional Collectd plugins would provide information about the cause of these loads. For example, there are plugins for databases, for monitoring the JVM and many others.
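
As a small, hedged example, the snippet below enables Collectd’s processes plugin to track a couple of processes by name; the process names are placeholders, so swap in whatever is actually taxing your systems. The matching Kibana queries would follow the same plugin/type_instance pattern used above:

# hypothetical addition to collectd.conf: per-process metrics
LoadPlugin processes
<Plugin processes>
        # process names are examples only
        Process "java"
        Process "mysqld"
</Plugin>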

Maybe that will be the purpose of my next post, enjoy.

Up And Running With Logstash

logstash

I want to talk about Logstash, a (to me) new-ish tool for managing computer logs. Logstash can easily collect logs from multiple computers or instances, transfer them to a central computer for aggregation, and can even be used to parse and search these logs for analysis as they are handled. A mouthful indeed. Logstash is open-source software written in JRuby, so it runs in the JVM. Running on the JVM has various advantages, such as ease of deployment and the wealth of tuning expertise available.

Continue reading

Using Threshold Analysis To Discover Emerging Crime Trends In Orlando

Do police departments across the US (the world?) have the bandwidth to pore over crime reports in order to spot trends and mitigate crimes using all the available information?  Given the ever-increasing amount of data, now the norm, it will be increasingly difficult to make the best use of it.  As it applies to crime, being able to effectively use this data should improve everyone’s quality of life.

Crime Dashboard

image

For week 40, commercial burglaries in sector 2 have increased from an expected count of 7.89 to an actual count of 16.  This is almost 2 1/2 times the expected volume.  (Click the image to try the analysis or here for the image)

Continue reading

Data Mining Orange County (Orlando Area) Crime

Let’s look at recent crime in Orlando and adjacent cities.  What can we find out by exploring Orlando’s crime data?  Is the Central Florida area a relatively safe place?  Can we tell from 90 days’ worth of data?  Are some areas safer than others?  Do we have any false ideas about places in Orlando?

At the very least, let’s attempt to get a better understanding of the crime issue, which affects us everywhere.

Continue reading