Simply put, cohort analysis is a technique for analyzing activity over time by a common characteristic. Mostly used in sales and marketing, cohort analysis can be used in tasks such as analyzing customer loyalty, customer cost acquisition, marketing campaign effectiveness and to explore many other aspects of sales.
The store providing their sales data does monthly advertising campaigns and wants to track what impact these advertising campaigns have on the amounts of orders placed over time. They want to use this information to evaluate their different campaigns and improve their efforts.
Given the superstore sales data and the requirements, lets present the number of orders placed per customer join date. Presenting the number of orders per join date will show the effectiveness of advertising campaigns leading to such date.
Any database server can be used to follow along. The code used here can easily be revised to work on any vendor’s product like MySQL, etc. For visualization purposes, Tableau can be easily changed replace by LibreOffice or similar.
From the Orders Sheet, we are only going to use the following columns – Order Id, Order Date and Customer Name. For further analysis, however, lets preserve all the data in our system by saving the contents of the Orders Sheet in a database table.
Lets appropriately call this table superstore_sales. Also, lets execute our analysis for month/year to match the store’s advertising efforts.
1. First, lets fetch the orders, together with the order date and the customer name.
, to_char(a.order_date, ‘YYYY-MM’) order_date
, a.customer_name customer
order by 2,3
2. To this set, lets add the date each customer placed their first order in the store.
3. Interested in the effect of each month’s advertising campaign, lets create a campaign id for each month. Basically, join date becomes campaign id. Notice we dropped the customer name as well. This is our join criteria only at this point.
Here we see, for each order placed, the monthly campaign to attribute the order to. Neat.
4. Now, lets determine the number of months between the join date for each order’s customer first order and the order date for each order.
This is a mouthful; lets explain. It is easier to see it if explained backwards.
For each customer, calculate the number of months between their first order and every other order they have placed.
There, this makes a bit more sense.
Now we have a set of all orders by date by campaign and the number of months since the customer joined store.
5. Next up, we simply count orders by campaign id by number of months since customer joined store.
select count(distinct order_id) order_count, mon_diff, campaign_id from (
This is enough to satisfy our case; not bad. Now we have the set we need to analyze sales by campaign id over time. A short dive into Tableau (try Public) and we come out with an elegant way to view a lot of information at once.
Lets go over the visualization together.
The y axis represents the number of sales (order count), the x axis represents how many months ago (months diff) these sales occurred and the different colors each represent a campaign (campaign id).
Using the first month (ago) as an example, we can tell how many orders can be attributed to each campaign for the life of the data set. Skipping to the last month, 48, we see how many orders were placed for the campaign that occurred back then. Seeing only one color makes sense since that is our first advertising campaign.
Clicking on the visualization above takes you to the interactive version where you can see the actual numbers of orders per campaign, etc. Try it.
That’s it; thanks for reading. This seems like a very effective way to analyze a large amount of sales data for insights into advertising performance.
This post only covers join date as the grouping characteristic; many other cases can be made where cohort analysis can be helpful. Similarly, many other visualizations can provide similar insight.
Maybe next time we can cover some of these; thanks for reading!
Another great, and wet, run! Non stop rain did not stop the enthusiasm for the thousands of runners the met this year for the 5k and 15k Miracle Miles Event in Orlando.
Highlights for me:
- First race-in-the-rain!
- Second race without wearing headphones (in fanny pack as security blanket). First one was Disney, by choice; more later.
- Longest race without any walking.
- (Sub :) ) 9 min/mi.
- Lots of fun.
I even knew a bunch of runners at event; neat. Lastly, I feel happy about performance. First 15k, hence reference for distance, and time-to-beat for next 15k’s , hopeful, PR!
Disclaimer – Happy Oracle Employee.
Even thou it has a 30 day trial, I think free developer accounts will be more suitable to accelerate adoption. I’ve inquired and await response but they know best I guess. Oracle does offer such a service for APEX; maybe its a matter of time…
Maybe next time; Tableau to the rescue it is.
Held this past May 24th, 2014 on a hot Saturday morning, the course was florida-flat (42 ft elevation gain) but had plenty of shade and had that small-town feeling Oviedo is known for.
With a few weeks to ‘train’ I set out to ramp up the miles in order to reduce my time. No medals for me yet but I enjoyed the race sights, the crowd and the location. Nothing like an early short run to set one up for a nice weekend.
For fun, I decided to play around with the results in Tableau and try out the OSX version of the software. Feature-wise, both Windows and OSX versions are the same, with the Windows version being just a bit more stable with big sets of data.
Anyways, back to the race. As mentioned before, this is a local Oviedo race. Most of the runners come from Oviedo itself or some close by neighborhood.
640 runners met for the race. The group was split almost evenly between men and women with 47% and 53% respectively.
The age groups looked a little more interesting. Notice the difference in gender at 40-44. More generally, from 25-44, girls clearly represent this race.
Other fun bits, the younger runner was 6 years old and the oldest 84. There were a bunch of mom/daughter and son/dad teams as well. Good family times!
As for myself, I did ok. I was able to place 7th in my age group and pr with a little bit to go.
As far as other 40 year olds, I tell myself things look mighty impressive.
I am skipping the Garmin data of this post; I just wanted to play a bit with Tableau. Maybe I’ll combine run data from Garmin next time. Garmin run data can be seen here. Tableau workbook can be found here. Race data can be found here.
Oh yeah, make sure to check out the Greater Oviedo 5K; nice all around race. Thanks for reading.
As far as scenery goes, this is the most distracting run I’ve ever done. I found it impossible to focus on running. Savannah is a beautiful city. It is small, yet packed with life and activity from the touristy bits to family life and business.
For a better write-up about Savannah Squares than this post, read this; an eloquent intro to the squares.
Because I will have to do this again in no time, here I jot down the quickest way for me to install 12c. This is the usual series of following steps, running into an error, searching for a fix, lather, rinse repeat.
Step One – Install oracle-rdbms-server-12cR1-preinstall.
Step Two – …gets us all the way here – [FATAL] PRVF-0002 : Could not retrieve local nodename
Step Three - … cannot write oradata change to /home/oracle/oradata
Step Four – Swapfile … I used 10000
Step Five – Then I see installer is bent on putting oradata directory in /home. Just revise to suit your install.
This one got me for a loop. It just so happens that both the h2 browser AND the play app should run from the same console. Otherwise, you end up running app and its database as one instance and new instance of ‘a’ database in another instance.
Do as I say and not as I do!
1. From shell, start play app. Do NOT type ‘play run’, just ‘play’.
2. Type ‘h2-browser’ next, The H2 Console should load up in the browser.
3. Now, type ‘run’ (or ‘~run’ for compile on save) to start your app.
Bingo! You should be able to now see your database.