2017-18 Data Release

Jane is pleased to announce our latest data release. The University of Wisconsin-Madison (UW-Madison) crime log is freely available on the campus police department’s (UWPD) website, though the site displays only the past 60 days’ worth of data at any given time. Requests for older logs, which are provided as PDFs, are promptly filled by the department. To make this data more accessible to researchers and others interested in using it, we’ve made the crime logs available as .xlsx, .ods, .csv and .html files. The following summarizes how the campus crime log data for September 1, 2017 to August 31, 2018 (available on the resources page) has been sorted and cleaned.

“Daily campus crime log” files:

  • Some words are cut off in the original PDFs and are thus cut off in the new file formats
  • Typos and other inconsistencies have not been corrected
  • Duplicate entries exist, as each PDF represents a 60-day snapshot of the log
  • An additional column, date_extracted, has been added to the new formats; date_extracted denotes the date of the original PDF on which the entry appears

Additionally, our data import via Tabula resulted in a few erroneous entries for Report #. These errors were corrected manually and include:

Report # Appeared As | Time Reported | Edited To
108P0M1273 | 8/30/18 1:57 PM | 1801273
81:8001 A0M53 | 7/18/18 11:35 AM | 1801053
81801054 | 7/18/18 12:02 PM | 1801054
61:85051 P0M50 | 7/18/18 6:56 AM | 1801050
M1801007 | 7/11/18 2:37 PM | 1801007
M1801002 | 7/10/18 4:27 PM | 1801002
1158 0A1M000 | 7/10/18 7:22 AM | 1801000
1158 0A1M000 | 7/10/18 7:22 AM | 1801000
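
We made these corrections by hand, but anyone re-importing the PDFs could apply the same fixes with a lookup table. The following is a minimal sketch in Python, not the process we used; the names REPORT_NUMBER_FIXES and fix_report_number are hypothetical, and the mapping contains only the values listed above.

```python
# Garbled "Report #" values from the Tabula import, mapped to the corrected value.
# The pairs are copied from the corrections listed above; the names are illustrative.
REPORT_NUMBER_FIXES = {
    "108P0M1273": "1801273",
    "81:8001 A0M53": "1801053",
    "81801054": "1801054",
    "61:85051 P0M50": "1801050",
    "M1801007": "1801007",
    "M1801002": "1801002",
    "1158 0A1M000": "1801000",
}


def fix_report_number(value):
    """Return the corrected report number, or the trimmed value if no fix is listed."""
    cleaned = value.strip()
    return REPORT_NUMBER_FIXES.get(cleaned, cleaned)


fixed = fix_report_number("M1801007")
untouched = fix_report_number("1801054")
```

Values that are not in the table pass through unchanged, so the function is safe to run over the whole report_number column.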

Next, we deduped the new formats by event_number, date_reported, date_occurred, report_number, location, offense, and disposition. Deduping resulted in the following:

  • Sept 1, 2017-Aug 31, 2018: decreased from 4377 to 2365 entries

Notes for “Dedupe of daily campus crime logs” files:

  • date_extracted has been excluded from the dedupe process
  • There may be more than one entry for a single incident if disposition changed over time
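
For readers who want to reproduce or adapt the dedupe, the following is a minimal sketch in Python of the rule described above: two rows count as duplicates when every listed column matches, and date_extracted is ignored. It is an illustration rather than the exact tool we used; the function name and the sample rows are hypothetical.

```python
# Columns compared when looking for duplicates.
# date_extracted is deliberately left out, as noted above.
DEDUPE_KEYS = (
    "event_number", "date_reported", "date_occurred",
    "report_number", "location", "offense", "disposition",
)


def dedupe(rows):
    """Keep the first row seen for each distinct combination of DEDUPE_KEYS."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.get(col, "") for col in DEDUPE_KEYS)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample: one incident captured in two PDF snapshots,
# then captured again after its disposition changed.
base = {
    "event_number": "E1", "date_reported": "7/10/18", "date_occurred": "7/9/18",
    "report_number": "1801000", "location": "Example Hall", "offense": "Theft",
}
rows = [
    {**base, "disposition": "Open", "date_extracted": "7/20/18"},
    {**base, "disposition": "Open", "date_extracted": "8/20/18"},
    {**base, "disposition": "Closed", "date_extracted": "8/20/18"},
]
deduped = dedupe(rows)
```

In this sample the second row is dropped because it differs from the first only in date_extracted, while the third row is kept because its disposition changed.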

The last dataset available (as of the date of this blog post) is labeled “Clery data.” This set includes any entry that references sexual assault, domestic or dating violence, or stalking as the offense. The time_occurred field was manually edited (where the entry contained something other than a date/time) so that it could be split into time_occurred 1 (i.e., date) and time_occurred 2 (i.e., time). Other minor typos were also resolved.
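
The split itself can be sketched in a few lines of Python. This assumes that cleaned time_occurred values look like the Time Reported values shown earlier (for example, 7/18/18 11:35 AM), which is an assumption for illustration; split_occurred is a hypothetical helper, not part of the released files.

```python
def split_occurred(value):
    """Split 'M/D/YY H:MM AM' style text into (date, time).

    The time part is an empty string when the value holds only a date.
    """
    date_part, _, time_part = value.strip().partition(" ")
    return date_part, time_part.strip()


# The first element would fill "time_occurred 1", the second "time_occurred 2".
with_time = split_occurred("7/18/18 11:35 AM")
date_only = split_occurred("7/18/18")
```

Splitting on the first space only works once the field contains nothing but a date and an optional time, which is why the manual cleanup had to come first.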
