Armitage Archive

Imagine a data set with columns for customer ID, ZIP code, birth date, and gender. The customer ID is a random number, so on its own it reveals nothing about whom it represents. However, one of the customers lives in a ZIP code with a small population, and it turns out he is the only male in that ZIP code born on that particular date. As a result, the values in the ZIP code, birth date, and gender columns allow that customer to be 're-identified', even though his name appears nowhere in the data set. This is not just a problem for people in small towns. By one widely cited estimate, about 87 per cent of the population of the United States can be uniquely identified through a combination of only their birth date, gender, and five-digit ZIP code.
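
To make the idea concrete, here is a minimal sketch of how one might check a data set for this kind of exposure. It counts how many rows share each combination of ZIP code, birth date, and gender, and flags any row whose combination appears only once. The records and the function name `unique_combinations` are invented for illustration and are not taken from any real data set or library.

```python
from collections import Counter

# Hypothetical toy records: (customer_id, zip_code, birth_date, gender).
# The customer IDs are arbitrary numbers and carry no identifying meaning.
records = [
    (48213, "10001", "1985-03-14", "F"),
    (90377, "10001", "1985-03-14", "F"),
    (15562, "10001", "1990-07-02", "M"),
    (73904, "10001", "1990-07-02", "M"),
    (62091, "59011", "1972-11-30", "M"),  # only person with this combination
]


def unique_combinations(rows):
    """Return rows whose (ZIP, birth date, gender) combination occurs exactly once."""
    # row[1:] drops the customer ID and keeps only the three other columns.
    counts = Counter(row[1:] for row in rows)
    return [row for row in rows if counts[row[1:]] == 1]


at_risk = unique_combinations(records)
for customer_id, zip_code, birth_date, gender in at_risk:
    print(f"Customer {customer_id} is unique on ({zip_code}, {birth_date}, {gender})")
```

In this toy data, the four customers in ZIP code 10001 each share their combination with one other person, while the single customer in ZIP code 59011 stands alone. Anyone who already knows that person's ZIP code, birth date, and gender could pick out their row despite the random customer ID.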