For the past few weeks, Jane has been working on a data visualization project that takes publicly available UW-Madison campus crime data (collected under the Clery Act) and displays it in a variety of ways. With the help of the UW-Madison Police Department (UWPD) and UW-Madison Libraries, we’ve made good progress and hope to create our first infographic soon.
According to the Clery Center:
“The Clery Act requires colleges and universities that receive federal funding to disseminate a public annual security report (ASR) to employees and students every October 1st. This ASR must include statistics of campus crime for the preceding 3 calendar years, plus details about efforts taken to improve campus safety.
ASRs must also include policy statements regarding (but not limited to) crime reporting, campus facility security and access, law enforcement authority, incidence of alcohol and drug use, and the prevention of/response to sexual assault, domestic or dating violence, and stalking.”
Crime data at UW-Madison is retained in the form of a crime log, which contains the nature, date, time, and location of each crime. The UW-Madison Police Department shares its daily crime log online (covering the past 60 days of data) and publishes its annual security report each October, in accordance with the law. The most recent report, the 2017 Annual Security Report & Annual Fire Safety Report, is available here. Thus far, Jane has requested two years’ worth of crime log data (Sep 1, 2015 – Aug 31, 2017) from UWPD, which provided the requested crime logs as PDFs without delay.
Our first challenge in sorting through the logs was to pull data from the PDFs into a spreadsheet. We used Tabula, a “tool for liberating data tables locked inside PDF files,” to extract the event #, date reported, date occurred, report #, location, offense, and disposition of each entry. While the first year’s worth of data was extracted quite easily, the second year presented a number of problems, such as unexpected blank fields, misaligned columns, and differently formatted PDF files, all of which complicated the process. Once the data was imported into our spreadsheet, we used Excel to delete thousands of duplicate entries; luckily, Excel makes short work of duplicates.
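For readers who would rather script the duplicate removal than do it in Excel, here is a minimal Python sketch of the same idea. It is not the method we used. The column names are our own shorthand for the fields listed above, and the sample rows are made up purely for illustration; a real Tabula export may have different headers.

```python
import csv
import io

# Shorthand names for the fields described above (assumed, not the real headers).
FIELDS = ["event_no", "date_reported", "date_occurred", "report_no",
          "location", "offense", "disposition"]


def drop_duplicates(rows):
    """Keep the first occurrence of each row.

    Rows are compared on all fields after trimming whitespace and
    ignoring case, so trivially different copies count as duplicates.
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple((row.get(field) or "").strip().casefold() for field in FIELDS)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Made-up sample standing in for a CSV exported from Tabula.
SAMPLE_CSV = """event_no,date_reported,date_occurred,report_no,location,offense,disposition
E1,09/01/2015,09/01/2015,R1,Library,Theft,Open
E1,09/01/2015,09/01/2015,R1,Library ,THEFT,Open
E2,09/02/2015,09/01/2015,R2,Union,Stalking,Closed
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
unique_rows = drop_duplicates(rows)
print(len(rows), "rows in,", len(unique_rows), "rows out")
```

Comparing on every field is a deliberately strict choice: two log entries only count as duplicates when the whole record matches, so distinct incidents at the same location are never merged.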
Our next hurdle was to clean and organize the data. With over 3,500 entries remaining, we needed a way to easily correct typos, address inconsistencies, and separate data into multiple columns. OpenRefine proved ideal for our needs. The software also allowed us to parse subsets of data by offense. To date, we have identified 161 incidents of actual or attempted sexual assault, domestic or dating violence, or stalking that were reported on or near campus during the two-year report period. (This number includes all incidents reported to UWPD; however, because the date occurred, location, and disposition data has not yet been reviewed, numbers may not align with those distributed by UWPD via the annual security report.) Jane will be requesting a third year’s worth of data shortly.
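To give a rough sense of what parsing a subset by offense involves, here is a small Python sketch. We did this work in OpenRefine, not in code, so treat this only as an illustration. The keyword list and the sample offense labels are hypothetical and do not reflect the exact categories or wording in the UWPD logs.

```python
import re

# Hypothetical keywords; the real logs may word these offenses differently.
KEYWORDS = ("sexual assault", "domestic", "dating violence", "stalking")


def normalize_offense(text):
    """Collapse repeated whitespace, trim the ends, and uppercase the label."""
    return re.sub(r"\s+", " ", text).strip().upper()


def matches_keywords(offense, keywords=KEYWORDS):
    """Return True if the normalized offense label contains any keyword."""
    normalized = normalize_offense(offense)
    return any(keyword.upper() in normalized for keyword in keywords)


# Made-up labels showing the kind of inconsistencies cleanup has to handle.
sample_offenses = [
    "  Stalking ",
    "THEFT",
    "sexual   assault - 4th degree",
    "Domestic Disorderly Conduct",
]

subset = [offense for offense in sample_offenses if matches_keywords(offense)]
print(len(subset), "of", len(sample_offenses), "sample labels matched")
```

Simple substring matching like this will miss typos and abbreviations, which is one reason a tool with clustering and faceting, such as OpenRefine, suited our data better.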
Our next step is to review this data and explore different ways of visually displaying the data points. While we know that sexual assault is notoriously underreported, it is Jane’s hope that by providing access to data-driven informational resources, such as infographics, we might encourage and support campus efforts to further discussion around sexual assault on college campuses.
Stay tuned for more on our data visualizations.
Many thanks to UW-Madison iSchool Faculty Associates Bronwen Masemann and Dorothea Salo, iSchool Lecturer and Wisconsin Center for Film & Theater Research Head Film Archivist Amy Sloper, and Digital Curation Librarian Cameron Cook for hosting a “Collections Carpentry Workshop” and introducing Jane to Tabula and OpenRefine.