Social Lunch - Kirsten Morehouse, Harvard

Date: Tuesday, November 14, 2023, 12:00pm to 1:15pm

Location: William James Hall, 1st floor lecture hall, Room 105

Kirsten Morehouse, PhD student, Harvard

Topic: Recent discovery of a hidden peril of open science

Description: I recently submitted a paper with Brian Nosek and Benedek Kurdi that may be of broad interest to our social area. It explores an unintended consequence of the open data revolution: re-identification, or the ability to combine demographic information to reveal a person’s identity without direct identifiers (such as email addresses or IP addresses). For example, Sweeney (2000) demonstrated that just three variables from the United States Census (gender, zip code, and date of birth) uniquely characterized 87% of the U.S. population. That is, 87% of the U.S. population had a unique combination of these three variables (e.g., a woman born on 1/18/1996 who lives in 02142). As a consequence, those individuals, including the only woman born on 1/18/1996 who lives in 02142, can be identified using this minimal but publicly available information.
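To make the notion of a "unique combination" concrete, here is a minimal Python sketch that computes the share of records whose combination of variables appears exactly once. It is an illustration only, not the analysis from the manuscript or from Sweeney (2000): the function name `uniqueness_rate`, the field names, and the four toy records are all made up for this example.

```python
from collections import Counter


def uniqueness_rate(records, quasi_identifiers):
    """Return the share of records whose combination of the given
    variables appears exactly once in the dataset."""
    # Build one tuple per record from the chosen variables.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    if not keys:
        return 0.0
    counts = Counter(keys)
    # A record is "unique" if no other record shares its combination.
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys)


# Made-up toy records (hypothetical, for illustration only).
records = [
    {"gender": "F", "zip": "02142", "dob": "1996-01-18"},
    {"gender": "M", "zip": "02142", "dob": "1990-05-02"},
    {"gender": "M", "zip": "02142", "dob": "1990-05-02"},
    {"gender": "F", "zip": "02138", "dob": "1996-01-18"},
]

rate = uniqueness_rate(records, ["gender", "zip", "dob"])
print(rate)  # 0.5: two of the four toy records are unique
```

In this toy data, gender alone identifies no one (`uniqueness_rate(records, ["gender"])` is 0.0), while the three variables together single out half of the records.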

Crucially, this risk of re-identification is especially relevant to psychological science (and social psych, in particular) because (a) datasets often include a host of additional demographic information (e.g., race/ethnicity, level of education), which heightens the risk of re-identification; (b) the demographic information collected by psychological scientists exists in other public datasets, allowing the data to be linked to reveal sensitive information (e.g., health information); and (c) data sharing is becoming the norm across subdisciplines.
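Point (b) can be sketched in a few lines of Python: if a shared dataset and a second public dataset contain the same demographic variables, records that match exactly one person in the public dataset can be linked. This is a simplified illustration under stated assumptions, not the procedure from the manuscript; the function `link_records`, the field names, and all records below are hypothetical.

```python
from collections import defaultdict


def link_records(study_rows, public_rows, keys):
    """Link study rows to public rows that share the same values on
    `keys`, keeping only matches with exactly one public candidate."""
    # Index the public dataset by its combination of demographic values.
    index = defaultdict(list)
    for row in public_rows:
        index[tuple(row[k] for k in keys)].append(row)

    matches = []
    for row in study_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        # Only an unambiguous (single) candidate counts as a link.
        if len(candidates) == 1:
            matches.append({**candidates[0], **row})
    return matches


# Hypothetical "anonymous" study data with a sensitive variable.
study = [
    {"gender": "F", "zip": "02142", "dob": "1996-01-18", "score": 12},
    {"gender": "M", "zip": "02142", "dob": "1990-05-02", "score": 7},
]
# Hypothetical public data that includes names.
public = [
    {"name": "Person A", "gender": "F", "zip": "02142", "dob": "1996-01-18"},
    {"name": "Person B", "gender": "M", "zip": "02142", "dob": "1990-05-02"},
    {"name": "Person C", "gender": "M", "zip": "02142", "dob": "1990-05-02"},
]

linked = link_records(study, public, ["gender", "zip", "dob"])
for row in linked:
    print(row["name"], row["score"])
```

Here only the first study record is linked, because two people in the public data share the second record's combination. Coarsening a variable (for example, sharing birth year instead of full date of birth) is one way to make such combinations less unique.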

In the manuscript, we (a) introduce psychologists to the issue of re-identification risk, and (b) provide a complete pipeline for assessing re-identification risk, identifying appropriate risk mitigation and data sharing strategies, and implementing those strategies.

 
