What researchers mean by... bootstrapping

Bootstrapping is a statistical technique for determining how confident we can be in the findings of a study

Pick yourself up by your bootstraps. You’ve heard the saying. Though it’s actually impossible to lift yourself off the ground by pulling up on your boots, the phrase is a metaphor for getting out of a difficult situation by your own efforts.

The statistical term bootstrapping takes its name from this saying. It refers to a technique that offers a seemingly impossible solution to a statistical problem.

When scientists want to know something about a large population (e.g. average height, frequency of symptoms), they cannot measure or ask every individual in the population. Instead, they randomly sample a smaller group of people and use the measurements of this smaller group to estimate an answer to the research question. They also determine how confident they can be that the findings in the sample represent the true value of the statistic (e.g. average, frequency) in the population from which the sample was taken. Often, researchers use established mathematical formulas to determine these confidence levels.
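As one illustration of such a formula: when the statistic is a mean and the sample is reasonably large, a widely used approximation gives a 95 per cent confidence interval as shown here. The symbols are introduced only for this illustration and do not come from the article: x-bar is the sample mean, s is the sample standard deviation and n is the sample size.

```latex
\bar{x} \pm 1.96 \, \frac{s}{\sqrt{n}}
```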

But sometimes the mathematical formulas for determining confidence levels won’t work or don’t exist. This is where bootstrapping comes in. It allows researchers to calculate confidence levels or other measures of accuracy using the sample itself, by resampling over and over again from the original sample.

Let’s take a hypothetical example. Say you want to know how well workers in Ontario are functioning three months after they hurt their back at work (100 pts = full functional abilities, 0 pts = no functional ability). You can’t survey all 7,000 workers who sustained a low-back injury during a given year, so you take a random sample of 400 of these workers. You learn that their average (mean) functional level at three months is 73 pts.

If no formula were available, how confident could you be that 73 pts was the mean functional level at three months among all workers in Ontario who had a low-back injury that year? You could draw many new samples of workers and use all of them to create your confidence interval. However, this would be time-consuming, expensive and, potentially, not even feasible.

So you turn to bootstrapping, where you conduct your resampling within your one real sample. If you were doing bootstrapping manually (and you wouldn’t; bootstrapping is practical only because of the power of computers), you would do something like this, with a nod to Biostatistics for Dummies for providing the outline of this manual process:

1. Write the level of function of each of the 400 workers sampled on a piece of paper and put all 400 in a brown paper bag.

2. Reach in and pull out one of the pieces of paper. Record the level (69 pts) and put the paper back in the bag.

3. Reach in again, pull out a piece of paper, record the level (74 pts) and return the paper to the bag.

4. Repeat this another 398 times until you have recorded 400 levels, each time returning the paper to the bag. This is called sampling with replacement.

5. Based on these 400 values, calculate the mean functional level. Because the paper is returned to the bag each time, some values may be drawn more than once and others not at all. As a result, this new mean will likely differ slightly from the original sample mean of 73 pts.

6. Now, repeat steps two through five 1,000 times, writing down the mean of each new sample of 400 values.

7. Take the 1,000 means you calculated and order them from smallest to largest. Remove the smallest 2.5 per cent and the largest 2.5 per cent (25 means from each end). The smallest and largest remaining numbers (maybe 69.4 and 76.2) are the lower and upper 95 per cent confidence limits around your original sample estimate of 73 pts. Roughly speaking, if the whole study were repeated many times, intervals built this way would be expected to cover the true population mean about 95 times out of 100.
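For readers who want to see what the software is doing, the seven steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular study or statistics package: the function name `bootstrap_mean_ci` and its parameters are made up for this example, and the 400 scores are randomly generated stand-ins, not real data from injured workers.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=1000, confidence=0.95, seed=None):
    """Percentile bootstrap interval for the mean (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(sample)

    # Steps 2 to 6: draw n values with replacement, record the mean, repeat.
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(sample, k=n)  # the paper goes back in the bag
        means.append(statistics.fmean(resample))

    # Step 7: order the means and drop the same number from each end.
    means.sort()
    trim = round(n_resamples * (1 - confidence) / 2)  # 25 of 1,000 at 95%
    kept = means[trim:n_resamples - trim]
    return kept[0], kept[-1]


# Step 1: the "bag" of 400 scores. These are invented values on the
# 0 to 100 scale, generated only so the example can run.
data_rng = random.Random(0)
scores = [min(100.0, max(0.0, data_rng.gauss(73, 15))) for _ in range(400)]

lower, upper = bootstrap_mean_ci(scores, seed=1)
print(f"sample mean: {statistics.fmean(scores):.1f}")
print(f"95% bootstrap interval: {lower:.1f} to {upper:.1f}")
```

Fixing the `seed` makes the result repeatable. With a different seed the limits shift slightly, because each run draws a different set of 1,000 resamples.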

So when researchers say they performed bootstrapping, you know they ran the data from their original sample through a software program that resampled it over and over, as described above. Researchers do this to determine how confident they can be that the findings from their original sample truly reflect what would have been found if they had been able to study all the people in the population.

Source: At Work, Issue 90, Fall 2017: Institute for Work & Health, Toronto