Data Is Plural

... is a weekly newsletter of useful/curious datasets.

2015.11.25 edition

Police complaints in Chicago, refugees in America, comments on Reddit, people on government web pages, and a century of pumpkin pie.

Complaints against Chicago police. The newly launched Citizens Police Data Project has collected more than 56,000 allegations of police misconduct. The data, covering 2002-2008 and 2011-2015, includes demographic information about the complainant and the officer, as well as the type and location of the incident. The raw data is available to download. Related: The City of Chicago’s wide-ranging data portal includes a spreadsheet of every reported crime in the city since 2001; you can explore neighborhood trends via the Chicago Tribune. [h/t Melissa Segura and Abraham Epton]

Refugees in America. The Department of State publishes demographic reports on refugee arrivals since 2002. The data includes country of origin, resettlement city and state, religion, age, gender, and more. Related: At BuzzFeed, I used the data to chart the past decade of refugee arrivals. Also related: The UN’s refugee data portal.

1.7 billion Reddit comments. You can download every comment posted to Reddit since October 2007 … but you’ll need some patience and a terabyte of storage. If you’re more of the instant-gratification, don’t-have-an-external-hard-drive-lying-around type, you might enjoy FiveThirtyEight’s “How The Internet* Talks,” a sort of Google Ngrams for the Reddit data. [h/t Randall Olson and Ritchie King]
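
If you do download the dump, you probably don't want to load it into memory. Here is a rough sketch of an Ngrams-style tally that reads one comment at a time. It assumes things this newsletter doesn't confirm: that each file is bz2-compressed, newline-delimited JSON, and that every comment has a "body" field and a "created_utc" epoch timestamp. The function names and the sample comments are made up for illustration.

```python
import bz2
import json
from collections import Counter
from datetime import datetime, timezone


def monthly_term_share(lines, term):
    """Return {YYYY-MM: share of comments whose text contains `term`}."""
    totals, hits = Counter(), Counter()
    needle = term.lower()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        comment = json.loads(line)
        # Assumed fields: "created_utc" (epoch seconds) and "body" (comment text).
        created = datetime.fromtimestamp(int(comment["created_utc"]), tz=timezone.utc)
        month = created.strftime("%Y-%m")
        totals[month] += 1
        if needle in comment["body"].lower():
            hits[month] += 1
    return {month: hits[month] / totals[month] for month in sorted(totals)}


def stream_bz2(path):
    """Yield text lines from a bz2 file one at a time (hypothetical helper)."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        yield from handle


if __name__ == "__main__":
    # Made-up sample comments standing in for a real dump file.
    sample = [
        '{"created_utc": "1446336000", "body": "Pumpkin pie season!"}',
        '{"created_utc": 1446422400, "body": "No dessert here."}',
        '{"created_utc": 1448928000, "body": "More PIE please"}',
    ]
    print(monthly_term_share(sample, "pie"))
```

For a real file, you would pass stream_bz2(path) in place of the sample list. The match is a plain case-insensitive substring test, so "pie" also counts "spied"; a word-boundary regex would be stricter.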

The most popular government web pages. The U.S. government has one very large Google Analytics account, and has begun sharing traffic data with the public. Not every federal website is accounted for, but more than 4,000 are. Over the past 90 days, they’ve racked up approximately 1.5 billion visits. The most popular page at the time of this writing? Weather.gov. Bonus: How they built it. [h/t Rebecca Williams]

A century of pumpkin pie. In 2011, the New York Public Library launched a crowdsourcing project to transcribe its massive collection of restaurant menus, dating back to the 1850s. So far, volunteers have transcribed more than 1.3 million dishes, their prices, and where on the menu each dish appeared. The library publishes a spreadsheet of all the data, and updates it twice a month. Happy Thanksgiving!