Selection Bias

Selection bias occurs when the data chosen for analysis are neither a complete enumeration of all possible data (like a census) nor a random, scientific sample. Selection bias often affects data related to human rights violations, because such data are frequently collected in situations of limited access, danger, high stigma, or risk of punishment. In addition, survivors and victims may mistrust the organization collecting the data, or fear that their information will not be secure against misuse.

Data samples affected by selection bias are referred to as convenience samples. These rely on information “selected by those who provide it or observe it,” rather than on a probability-based selection procedure. For example, testimonies from individuals who choose to tell their stories to NGOs, text messages coming from disaster-stricken areas, and cases captured through crowd-sourcing platforms are all limited to people who elect to come forward with their experiences, or who have access to cellphones and are able to send text messages. In general, convenience samples often favor people who have consistent interaction with online data, access to it, or both. By relying on convenience samples for data-based policy decisions and responses to human rights violations, human rights data practitioners run the risk of systematically excluding people who do not enjoy such access. This is problematic because people with little access to the internet or online data are already likely to have been marginalized in other ways, and may be more vulnerable to human rights abuses.

The lack of a probability-based selection procedure also makes statistical inference an unreliable tool for drawing conclusions from the sample. A random sample (one in which every member of the population has the same chance of being selected) becomes more and more representative of the population as a whole as the number of individuals in the sample grows, which makes statistical analysis a powerful and accurate tool for drawing conclusions. In a convenience sample with selection bias, this logic no longer holds. Because not every member of the population has the same chance of being selected, the sample does not become more representative of the general population as it grows larger. Moreover, it is impossible to quantify how much less likely some members are to be selected than others, so the type and degree of bias remain unknown.
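The contrast between the two kinds of sample can be illustrated with a small simulation. The sketch below is a toy example and is not drawn from any real dataset: the population, the 50% phone-ownership rate, and the 10% and 40% "affected" rates are all invented for illustration. It also assumes the convenience sample reaches only phone owners. In the simulation the selection mechanism is known by construction, which is exactly what is missing in real convenience samples.

```python
import random

def make_population(size, rng):
    """Build a hypothetical population of (has_phone, affected) pairs."""
    population = []
    for _ in range(size):
        has_phone = rng.random() < 0.5  # assumed ownership rate
        # Assumed for illustration: people without phones are affected more often.
        affected = rng.random() < (0.10 if has_phone else 0.40)
        population.append((has_phone, affected))
    return population

def affected_rate(people):
    """Share of people in the list who are marked as affected."""
    return sum(affected for _, affected in people) / len(people)

rng = random.Random(0)  # fixed seed so the run is repeatable
population = make_population(200_000, rng)
true_rate = affected_rate(population)

# Only phone owners can send a report, so only they can enter the convenience sample.
reachable = [person for person in population if person[0]]

results = []
for n in (100, 1_000, 10_000, 50_000):
    random_sample = rng.sample(population, n)      # equal chance for everyone
    convenience_sample = rng.sample(reachable, n)  # phone owners only
    results.append((n, affected_rate(random_sample), affected_rate(convenience_sample)))

print(f"true rate: {true_rate:.3f}")
for n, random_estimate, convenience_estimate in results:
    print(f"n={n:>6}  random={random_estimate:.3f}  convenience={convenience_estimate:.3f}")
```

Under these assumptions, the random-sample estimate should move toward the true rate as n grows, while the convenience-sample estimate should settle near the rate among phone owners and stay well below the true rate no matter how large the sample becomes.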

For more information, see the Human Rights Data Analysis Group’s Core Concepts page, particularly their section on Convenience Samples.