Correcting for participation bias in non-probability samples using multiple reference samples

Reasons for the study

Because researchers can’t include entire populations in their studies, they use “samples”: subsets of the target population. One type of sampling is non-probability sampling, in which researchers recruit the people who are easiest to get information from. For this reason, this type of sampling is also referred to as convenience sampling.

Non-probability or convenience samples are becoming increasingly popular among applied health researchers due to their low costs and high response rates. However, when participants are chosen because of their convenience, the resulting sample can over- or under-represent certain demographic, lifestyle, occupational and health-related characteristics in the target population. This can, in turn, lead to erroneous inferences.

Currently, researchers aim to mitigate this bias by using what is called a “representative reference survey” from the same target population in conjunction with the convenience sample to approximate the unknown participation rates. More than one reference survey is often required to account for all the important variables associated with individuals’ decisions to participate in the convenience sample. Yet existing statistical methods are not designed to accommodate more than one reference survey. This study aims to fill this gap.
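
To make the idea of approximating participation rates concrete, here is a minimal sketch in Python of the single-reference-survey case, assuming one categorical variable (age group) drives participation. It is not the method developed in this study, and the function names, variable names and numbers are all hypothetical. The weighted reference survey estimates how many people in the target population fall in each group; dividing the convenience sample count by that estimate gives an approximate participation rate, and its inverse serves as a pseudo-weight.

```python
from collections import defaultdict


def estimate_participation_rates(convenience_strata, reference_strata, reference_weights):
    """Approximate per-stratum participation rates (hypothetical helper).

    rate(stratum) = convenience sample count / population size estimated
    from the weighted reference survey.
    """
    # Estimated population size of each stratum: sum of reference survey weights.
    population_hat = defaultdict(float)
    for stratum, weight in zip(reference_strata, reference_weights):
        population_hat[stratum] += weight

    # Number of convenience sample participants in each stratum.
    counts = defaultdict(int)
    for stratum in convenience_strata:
        counts[stratum] += 1

    rates = {}
    for stratum, n in counts.items():
        if population_hat[stratum] <= 0:
            raise ValueError(f"stratum {stratum!r} is missing from the reference survey")
        rates[stratum] = n / population_hat[stratum]
    return rates


def pseudo_weighted_mean(outcomes, strata, rates):
    """Mean of the outcomes, weighting each participant by 1 / participation rate."""
    weights = [1.0 / rates[s] for s in strata]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


# Made-up convenience sample: younger people are over-represented.
conv_strata = ["younger"] * 6 + ["older"] * 2
conv_outcomes = [10.0] * 6 + [20.0] * 2

# Made-up reference survey: four respondents, each representing 50 people,
# so the target population is estimated as 100 younger and 100 older people.
ref_strata = ["younger", "younger", "older", "older"]
ref_weights = [50.0, 50.0, 50.0, 50.0]

rates = estimate_participation_rates(conv_strata, ref_strata, ref_weights)
naive_mean = sum(conv_outcomes) / len(conv_outcomes)
adjusted_mean = pseudo_weighted_mean(conv_outcomes, conv_strata, rates)

print("estimated participation rates:", rates)
print("unweighted mean:", naive_mean)
print("pseudo-weighted mean:", adjusted_mean)
```

In this toy example the unweighted mean is 12.5, while the pseudo-weighted mean is about 15, which is the value expected when both age groups are equally common in the target population. The sketch also shows the limitation described above: if participation depended on a second variable that this reference survey did not measure, the pseudo-weights could not correct for it.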

Objectives of the study

  • Expand existing methods to integrate two reference surveys
  • Assess the performance of the new methods in simulated data under scenarios of practical importance
  • Apply the new methods to real-world observational and multi-centre randomized controlled trial (RCT) convenience samples
  • Develop packages in R and SAS (software used for statistical analysis) to help other researchers apply these new methods to their primary data collected through convenience samples

Target audience

Scientists in the work and health research ecosystem and beyond (i.e. any scientist who collects primary data) will benefit from this methodological innovation.

Project status


Research team

  • Victoria Landsman, Institute for Work & Health (PI)
  • Peter Smith, Institute for Work & Health (PI)
  • Nancy Carnide, Institute for Work & Health
  • Ivan Carrillo-Garcia, Statistics Canada
  • Aya Mitani, University of Toronto
  • Lingxiao Wang, U.S. National Cancer Institute
  • Barry Graubard, U.S. National Cancer Institute

Funded by

Canadian Institutes of Health Research (CIHR)