PII, or personally identifiable information, is any piece of data that someone could use to figure out who you are. Some types of PII are obvious, such as your name or Social Security number, but others are more subtle—and some data points only become PII when analyzed in combination with one another.
The United States General Services Administration uses a fairly succinct and easy-to-understand definition of PII:
The term “PII” … refers to information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other personal or identifying information that is linked or linkable to a specific individual. The definition of PII is not anchored to any single category of information or technology. Rather, it requires a case-by-case assessment of the specific risk that an individual can be identified. In performing this assessment, it is important for an agency to recognize that non-PII can become PII whenever additional information is made publicly available—in any medium and from any source—that, when combined with other available information, could be used to identify an individual.
PII is increasingly valuable, and many people are increasingly worried about how their PII is being used, whether as part of legitimate business by the companies that collect it or illicitly by the cybercriminals who seem to have all too easy a time getting hold of it. This has led to a new era of legislation that aims to require that PII be locked down and its use restricted.
But if the law makes companies responsible for protecting personally identifiable information, that raises an important question: what qualifies as PII?
There are a number of pieces of data that are universally considered PII. The most obvious are direct identifiers such as your full name, Social Security number, and other government-issued ID numbers.
But in some ways, trying to nail down every possible specific kind of PII is a process that’s missing the point. More and more cybersecurity experts and regulatory agencies are thinking of PII in terms of what it can do if abused, rather than what it specifically is. We already saw some of that in the GSA definition above: PII is, to be a bit tautological, any information that can be used to identify a person, and sometimes you have to consider that information in a larger context in which other such information is also floating around out there. For instance: is your mother’s maiden name PII? Well, by itself, probably not. But if a hacker has your mother’s maiden name and your email address, and knows what bank you use, that might pose a problem, as that’s a frequent security question used for password resets.
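This combinatorial effect can be made concrete with a short sketch. The records and field names below are entirely made up for illustration: the idea is that ZIP code, birth year, and gender are each harmless alone, but a combination shared by only one record effectively singles that person out.

```python
from collections import Counter

# Hypothetical records: no single field names a person, but together
# the fields can act as a "quasi-identifier."
records = [
    {"zip": "02138", "birth_year": 1985, "gender": "F"},
    {"zip": "02138", "birth_year": 1985, "gender": "F"},
    {"zip": "02138", "birth_year": 1990, "gender": "M"},
    {"zip": "90210", "birth_year": 1985, "gender": "F"},
]

# Count how many records share each combination of quasi-identifiers.
combos = Counter((r["zip"], r["birth_year"], r["gender"]) for r in records)

# A combination held by exactly one record is effectively identifying:
# anyone who knows those three facts about you can find your row.
unique = [combo for combo, n in combos.items() if n == 1]
print(unique)
```

In this toy dataset, two of the three combinations point to exactly one record, which is why regulators ask for the case-by-case risk assessment the GSA definition describes rather than a fixed list of fields.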
The Department of Energy has a definition for what it calls high-risk PII that’s relevant here: “PII, which if lost, compromised, or disclosed without authorization, could result in substantial harm, embarrassment, inconvenience, or unfairness to an individual.” Though this definition may be frustrating to IT pros who are looking for a list of specific kinds of information to protect, it’s probably a good policy to think about PII in these terms to fully protect consumers from harm.
Before we move on, we should say a word about another related acronym you might have heard. PHI stands for protected health information, and it’s a special category of PII protected in the United States by HIPAA and the HITECH Act. Essentially, it’s PII that can also be tied to data about an individual’s health or medical diagnoses. HIPAA Journal has more details, but the important points are that any organization that handles PHI in connection with treating a patient has an obligation to protect it, and health data can be shared and used more widely—for research or epidemiological purposes, for instance—if it’s aggregated and has PII stripped out of it.
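The aggregate-and-strip approach mentioned above can be sketched in a few lines. The patient records and field names here are invented for illustration, not drawn from any real schema: direct identifiers are dropped before the records are rolled up into counts, so the epidemiological signal survives without naming anyone.

```python
from collections import Counter

# Hypothetical patient records; the field names are illustrative only.
patients = [
    {"name": "Ann Smith", "ssn": "123-45-6789", "diagnosis": "flu"},
    {"name": "Bo Jones",  "ssn": "987-65-4321", "diagnosis": "flu"},
    {"name": "Cy Lee",    "ssn": "555-55-5555", "diagnosis": "asthma"},
]

# Fields that directly identify a patient and must not leave the system.
DIRECT_IDENTIFIERS = {"name", "ssn"}

def strip_pii(record):
    """Return a copy of the record with direct identifiers removed."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

# Aggregate the de-identified records: counts per diagnosis can be
# shared for research without exposing any individual.
counts = Counter(strip_pii(p)["diagnosis"] for p in patients)
print(counts)  # Counter({'flu': 2, 'asthma': 1})
```

Note that, as the previous example suggests, stripping direct identifiers is necessary but not always sufficient: the remaining fields can still combine into a quasi-identifier, which is why real de-identification standards go further than this sketch.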
HIPAA was passed in 1996, and was one of the first U.S. laws that had provisions for protecting PII, a move spurred by the sensitive nature of medical information. As the easy transmission (and theft) of data has become more commonplace, however, more laws have arisen in jurisdictions around the world attempting to set limits on PII’s use and impose duties on organizations that collect it.
A constellation of legislation has been passed in various jurisdictions to protect data privacy and PII. These laws are of different levels of strictness, but because data flows across borders and many companies do business in different countries, it’s often the most restrictive laws that end up having the widest effects, as organizations scramble to unify their policies and avoid potential fines.
The United States does not have a single overarching data protection law beyond the provisions of HIPAA and other legislation pertaining to healthcare; that said, those laws apply to any companies that do business with healthcare providers, so their ambit is surprisingly wide. In addition, several states have passed their own legislation to protect PII. The California Consumer Privacy Act, which went into effect in 2020 and was later expanded by the California Privacy Rights Act, is one of the strictest, and has become something of a de facto standard for many U.S. companies due to California’s size and economic clout, especially within the tech industry. Virginia followed suit with its own Consumer Data Protection Act, and many other states are expected to get in on the game. It’s also worth noting that several states have passed so-called safe harbor laws, which limit a company’s financial liability for data breaches so long as they had reasonable security protections in place.