When considering which variables to de-identify, weigh FOR EACH VARIABLE the value of the information for data analysis against the risk it poses to data security. If the data is too sensitive, you may decide that, rather than de-identifying it, you should destroy it or share it only under restricted access.
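One way to record the per-variable decision is as an explicit plan that maps each variable to an action. The following is a minimal Python sketch under assumed conditions: the variable names, the `keep`/`drop` actions, the 10-year `age_band` generalization, and the `apply_plan` helper are all hypothetical illustrations, not part of any standard tool.

```python
from typing import Callable, Dict, Union

Record = Dict[str, object]
# An action is either the string "keep", the string "drop",
# or a function that generalizes the value.
Action = Union[str, Callable[[object], object]]


def age_band(age: object) -> str:
    """Generalize an exact age into a 10-year band, e.g. 37 -> '30-39'."""
    lower = (int(age) // 10) * 10
    return f"{lower}-{lower + 9}"


# Hypothetical plan: one decision per variable, weighing analytic value
# against re-identification risk. Names and choices are illustrative only.
PLAN: Dict[str, Action] = {
    "participant_name": "drop",    # direct identifier, little analytic value
    "age": age_band,               # useful for analysis, so generalized rather than dropped
    "job_title": "drop",           # assumed too specific to keep in this example
    "satisfaction_score": "keep",  # analytic value, assumed low identifying risk
}


def apply_plan(record: Record, plan: Dict[str, Action]) -> Record:
    """Return a copy of `record` with each variable kept, dropped, or generalized."""
    out: Record = {}
    for var, value in record.items():
        action = plan.get(var, "drop")  # variables missing from the plan are dropped by default
        if action == "keep":
            out[var] = value
        elif action == "drop":
            continue
        else:
            out[var] = action(value)
    return out


if __name__ == "__main__":
    raw = {"participant_name": "A. Smith", "age": 37,
           "job_title": "Senior Data Engineer", "satisfaction_score": 4}
    print(apply_plan(raw, PLAN))  # {'age': '30-39', 'satisfaction_score': 4}
```

Dropping unlisted variables by default is a deliberately conservative choice in this sketch: a variable added to the dataset later stays excluded until someone makes an explicit decision about it.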
The risk of re-identification can depend on:
- the sensitivity of the dataset's topic (re-identification is more harmful when the topic is more sensitive)
- the specificity of each identifier (a detailed job title is more searchable than a generic one)
- the overall size (rows/observations) of the dataset (fewer rows make it easier to narrow down whom a record describes)
- the size and composition of the population (when the population contains small groups, it is easier to guess whether a given person participated in the study)
- the combination of information in the dataset (variables that are not identifying on their own can become identifying when combined)
- the recency of the data (newer data reflects people's current circumstances, so it is easier to match to them today)
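The size and combination factors above can be checked mechanically. The following minimal Python sketch counts how many records share each combination of quasi-identifier values (the idea behind k-anonymity) and reports the combinations shared by fewer than `k` records. The `small_groups` helper, the default threshold `k = 5`, and the example records are assumptions for illustration; a suitable threshold depends on the dataset and the population.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def small_groups(records: List[dict], quasi_identifiers: Iterable[str],
                 k: int = 5) -> Dict[Tuple, int]:
    """Return each combination of quasi-identifier values shared by fewer than k records.

    Records in such a group are easier to single out; k is an assumed threshold.
    """
    qi = list(quasi_identifiers)
    counts = Counter(tuple(r[v] for v in qi) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Illustrative records (hypothetical values).
EXAMPLE_RECORDS = [
    {"age_band": "30-39", "job": "Nurse", "region": "North"},
    {"age_band": "30-39", "job": "Nurse", "region": "North"},
    {"age_band": "30-39", "job": "Nurse", "region": "North"},
    {"age_band": "60-69", "job": "Surgeon", "region": "North"},
]

if __name__ == "__main__":
    # Region alone: all four records share "North", so no small groups.
    print(small_groups(EXAMPLE_RECORDS, ["region"], k=3))  # {}
    # Combined: the single 60-69 surgeon is now unique.
    print(small_groups(EXAMPLE_RECORDS, ["age_band", "job", "region"], k=3))
    # {('60-69', 'Surgeon', 'North'): 1}
```

With `k=3`, `region` alone leaves no record exposed, but combining it with age band and job title isolates one person. A dataset with more rows would typically produce larger groups, which is one reason dataset size and population composition also matter.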