Do police departments across the US (the world?) have the bandwidth to pore over crime reports in order to spot trends and mitigate crimes using all the available information? With ever-increasing amounts of data now the norm, it will be increasingly difficult to make the best use of it. As it applies to crime, being able to effectively utilize this data should improve everyone’s quality of life.
Crime Dashboard
For week 40, commercial burglaries in sector 2 have increased from an expected count of 7.89 to an actual count of 16. This is roughly twice the expected volume.
Dashboard Features
- Drilldowns
- Time Period
- Geographic – Patrol Sector > Zone
- Filtering
- Incident Count
- Patrol Sector
- Crime Type
- A Z-Score value of 1.5 trips the threshold, changing the metric’s color to orange.
Orange County Crimes Database
Reusing the Orange County Crimes Database from this previous post, I thought of different ways a police department could leverage this data to optimize its daily task of making the city safer. Police departments across the nation have full-time analysts on payroll; I checked. I imagine a considerable amount of their time is spent figuring out how to best allocate resources.
Threshold Analysis
Using Threshold Analysis, it is possible to monitor crime trends by type so that police departments can act fast against crime waves. Furthermore, if the Orange County Sheriff’s Office had an API, this could be taken a step further and such reports could be automated! Maybe they do have such systems in place. I wonder what analysis they run internally…
Let’s imagine a police sergeant wants to know, at any time, the expected volume of a particular crime in his area and what the actual volume is instead.
A significant difference between these values, if positive, could signal a crime wave hitting his area. Threshold analysis will not definitively assert this; however, it will allow the sergeant to focus the department’s limited time on those areas that merit attention. Maybe assigning additional resources to such an area will effectively stop the observed behavior.
A negative difference, conversely, could signal the positive impact a recent arrest or campaign has made. This could help validate the effort the department has devoted to a geographic area.
Either way, staying up to date with actual and expected volumes over multiple variables would enable this sergeant to better make use of his resources.
Data – Attributes
From the data available from the other post, let’s list the attributes of interest.
- Geographic Hierarchy – Patrol Sector > Patrol Zone – Each patrol sector, 8 in total, has a point person. Let’s assume this is the person determining ‘where is the crime’. Each of these sectors is divided into multiple zones. I imagine these zones to be like police beats as seen in the movies. A team of officers patrols specific beats, right?
- Crime Type – We have the following property crimes in the set over a period of 90 days.
- Period Hierarchy – Year > Week Of Year – We have 14 weeks (30-43) for the year 2014. Because the set only has partial data for the first week (30) and the last week (43), these will be excluded from analysis.
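To make the period hierarchy concrete, here is a minimal Python sketch of how incident records could be rolled up into weekly counts per sector and crime type, dropping the partial weeks 30 and 43. The records, zone numbers and the ‘Vehicle Burglary’ label are made up for illustration, and I am assuming ISO week numbering, which may be off by a day from how Tableau numbers weeks.

```python
from collections import Counter
from datetime import date

# Hypothetical incident records: (date, patrol sector, patrol zone, crime type).
# These rows are invented for illustration; they are not from the real data set.
incidents = [
    (date(2014, 7, 25), 2, 21, "Commercial Burglary"),   # ISO week 30, partial
    (date(2014, 9, 29), 2, 21, "Commercial Burglary"),   # ISO week 40
    (date(2014, 10, 1), 2, 22, "Commercial Burglary"),   # ISO week 40
    (date(2014, 10, 2), 3, 31, "Vehicle Burglary"),      # ISO week 40
    (date(2014, 10, 21), 2, 21, "Commercial Burglary"),  # ISO week 43, partial
]

# Weeks with only partial data, excluded from the analysis.
PARTIAL_WEEKS = {30, 43}


def weekly_counts(records, excluded_weeks=PARTIAL_WEEKS):
    """Count incidents per (year, week, sector, crime type), skipping partial weeks."""
    counts = Counter()
    for day, sector, _zone, crime_type in records:
        year, week, _weekday = day.isocalendar()  # ISO year, ISO week, ISO weekday
        if week in excluded_weeks:
            continue
        counts[(year, week, sector, crime_type)] += 1
    return counts


counts = weekly_counts(incidents)
for key in sorted(counts):
    print(key, counts[key])
```

Adding the zone to the key would give the Patrol Sector > Zone drilldown instead of the sector-level view.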
Data – Measurements
- Incident Counts – Incident records include the date, Patrol Zone, Patrol Sector and Crime Type.
Data – Metrics
- Sum Of Incidents – This will be our actual incident count as reported.
- Running Average Of Incidents – This will be our expected incident count. This can be obtained by keeping a running average of incident counts over time. Pretty neat eh?
- % Difference – This is simply the difference between the first two metrics defined, actual and expected incident counts, expressed as a percentage of the expected count.
- Incident Standard Deviation – Similarly, the standard deviation of the weekly incident sums.
- Z-Score – This is the number of standard deviations between the actual and expected number of incidents. Basically, this metric helps us quantify whether the actual incident count for a crime type is unusually high. It applies equally to unusually low incident counts.
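The metrics above can be sketched in a few lines of Python. This is not how the Tableau workbook computes them, just one reasonable reading: I am assuming the expected count is the average of the weeks before the current one, that the standard deviation is the sample standard deviation of those same weeks, and that the 1.5 threshold applies in both directions. The nine prior weekly counts are invented, chosen only so that their average matches the 7.89 shown on the dashboard.

```python
from statistics import mean, stdev

Z_THRESHOLD = 1.5  # from the dashboard: a Z-Score of 1.5 trips the threshold


def threshold_metrics(history, actual, z_threshold=Z_THRESHOLD):
    """Compare this week's incident count with the weeks that came before it.

    history: incident counts for the prior weeks (needs at least two values,
             otherwise statistics.stdev raises StatisticsError)
    actual:  incident count for the current week
    """
    expected = mean(history)   # running average = expected count
    spread = stdev(history)    # sample standard deviation of the weekly sums
    difference = actual - expected
    # Guard against dividing by zero when nothing was expected or nothing varied.
    pct_difference = difference / expected * 100 if expected else None
    z_score = difference / spread if spread else 0.0
    return {
        "actual": actual,
        "expected": expected,
        "pct_difference": pct_difference,
        "std_dev": spread,
        "z_score": z_score,
        "flagged": abs(z_score) >= z_threshold,  # unusually high OR unusually low
    }


# Hypothetical commercial burglary counts for sector 2, weeks 31 to 39.
prior_weeks = [7, 9, 8, 6, 10, 7, 8, 9, 7]
metrics = threshold_metrics(prior_weeks, actual=16)

for name, value in metrics.items():
    print(name, value)
```

With these made-up numbers the expected count rounds to 7.89, and an actual count of 16 lands well past the 1.5 threshold, so the metric would turn orange.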
Hey look, a star schema! Actually, if we had a real need we would build one but, since this is just scratching an itch, let’s use Tableau again. This way we get to play with the data significantly faster than if we had to ramp up a traditional BI project. /rant.
Concluding Thoughts
- It would be really simple for a Police Department Analyst to replicate this work in-house for zero money out of pocket, I think. If said analyst could automate the data export, then Tableau could use that massaged and cleaned export as its data source for analysis.
- Even better, daily data scraping (I do not know how much the Sheriff would like that) and some IFTTT magic could deliver daily email reports. Significant data cleansing would have to be automated as well, though.
- Can this report have fewer numbers, perhaps only information about the sectors and zones that tripped a threshold, together with a map showing these crimes?
- Would these reports be worth anyone’s time? These crime spurts would all be noticed by police officers, just maybe not as fast.
- Could we revise the report to show which crimes are ‘contagious’ and which are not?
- Could we follow up a threshold alert with the opportunity to inspect geographically adjacent areas?
- A range for crime counts should be easy to provide without adding much noise to the report.
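On the ‘fewer numbers’ idea from the list above, a rough Python sketch of trimming a report down to only the rows that tripped the threshold could look like this. The rows, the crime type labels and the Z-Score values are all hypothetical, and I am again assuming the 1.5 threshold applies to unusually low counts as well as high ones.

```python
Z_THRESHOLD = 1.5  # same threshold the dashboard uses

# Hypothetical report rows; the Z-Scores here are invented for illustration.
rows = [
    {"sector": 2, "crime_type": "Commercial Burglary", "z_score": 6.4},
    {"sector": 3, "crime_type": "Vehicle Burglary", "z_score": 0.4},
    {"sector": 5, "crime_type": "Residential Burglary", "z_score": -1.8},
]


def tripped_rows(report_rows, z_threshold=Z_THRESHOLD):
    """Keep only rows past the threshold, most unusual first."""
    tripped = [r for r in report_rows if abs(r["z_score"]) >= z_threshold]
    return sorted(tripped, key=lambda r: abs(r["z_score"]), reverse=True)


for row in tripped_rows(rows):
    print(row["sector"], row["crime_type"], row["z_score"])
```

Only sectors 2 and 5 survive the filter here, which is about the amount of information a daily email could reasonably carry.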
At this point I feel like I either stop and never post this, or embark on another post in a crime series. This feels like a brainstorming high note, the best note to wrap the session up on.
Thanks for reading!