Validity and reliability
Suppose you hear about a new study on depression in workers who have been disabled by a workplace accident. This study shows that depression levels are the same in injured and non-injured workers.*
Perhaps these results surprise you, so you start to take a closer look at the way researchers measured — or found out about — depression levels in the workers. Were their measures reliable? Were they valid?
Reliability and validity are important concepts in research. The everyday use of these terms provides a sense of what they mean (for example, your friends are reliable; your opinion is valid). In research, their use is a little more complex.
This column explains the importance of validity and reliability in survey questionnaires or other measures used in a study. In our next column, we'll talk about validity in another way, concerning overall study findings.
So let's take a closer look at the measure — in this case, a new questionnaire — that the researchers used to ask workers about their symptoms of depression.
Validity refers to whether the researchers actually measured what they wanted to measure — symptoms of depression — and not something else, such as stress or anxiety levels. Reliability means that responses to the questionnaire were consistent.
Did these researchers do everything they could to strengthen the reliability and validity of their questionnaire? Here are some things they should have considered.
Ensuring the validity of measurement
At the outset, the researchers needed to consider the face validity of the questionnaire. Face validity can be described as a sense that the questionnaire looks like it measures what it was intended to measure. Were the questions phrased appropriately? Did the options for responding seem appropriate?
Establishing content validity is also usually one of the first steps in ensuring the validity of a questionnaire or other measure. The researchers could have asked experts in depression to check their questions against the known symptoms of depression. These symptoms include depressed mood, sleeping problems, weight changes and physical pain. To have content validity, the questionnaire should include items covering the known symptoms.
The researchers could have also established criterion validity. How well do the results from their questionnaire compare with other measures of depression? One way to assess this is to give the workers two questionnaires: a “gold standard” questionnaire that's already been validated, and the new one. Then they could compare findings. Another way might be to follow the workers over time to see how the questionnaire results relate to the workers' actual treatment for depression later on.
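One common way to "compare findings" between the new questionnaire and a gold standard is to compute a correlation between the two sets of scores. The sketch below is only an illustration under stated assumptions: the scores are made up, the names new_questionnaire and gold_standard are hypothetical, and the column does not say which statistic these researchers would use. A correlation close to 1 would suggest that the two questionnaires rank workers in much the same way.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around each mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Made-up depression scores for six workers (illustration only, not real data).
new_questionnaire = [12, 18, 7, 22, 15, 9]
gold_standard = [14, 20, 8, 25, 15, 11]

r = pearson_r(new_questionnaire, gold_standard)
print(f"Correlation with the gold standard: {r:.2f}")
```

The helper assumes the scores vary from worker to worker; if every worker had the same score, the denominator would be zero and the correlation would be undefined.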
Unlike physical traits such as weight or blood pressure, depression is not easily seen or measured. A concept like this, which cannot be observed directly, is called a “construct.” The researchers might do some mini-experiments with their questionnaire and other measures to establish construct validity. For instance, if workers were also given a questionnaire on a similar construct, such as psychological distress, the two sets of results should be related. A questionnaire on a contrasting construct, such as happiness, should show the opposite pattern: workers who score high on depression should tend to score low on happiness.
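The kind of mini-experiment described here can be sketched as a check on the direction of two correlations: depression scores should rise with scores on a similar construct and fall as scores on a contrasting construct rise. Everything in the example is an assumption made for illustration: the scores are invented, and the function name construct_check and the three score lists are hypothetical.

```python
from math import sqrt


def correlation(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def construct_check(target, similar, contrasting):
    """Return the correlations of the target measure with a similar
    construct and with a contrasting construct."""
    return {
        "similar": correlation(target, similar),
        "contrasting": correlation(target, contrasting),
    }


# Invented scores for the same six workers (illustration only).
depression = [12, 18, 7, 22, 15, 9]
distress = [30, 41, 22, 47, 33, 27]   # similar construct
happiness = [28, 19, 35, 12, 24, 31]  # contrasting construct

result = construct_check(depression, distress, happiness)
print(f"Depression vs. distress:  {result['similar']:.2f}")
print(f"Depression vs. happiness: {result['contrasting']:.2f}")
```

With these invented numbers, the first correlation is positive and the second is negative, which is the pattern the researchers would hope to see.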
Ensuring the reliability of measurement
Reliability refers to two things. First, reliability means the researchers would get similar results if they repeated their questionnaire soon afterwards with the same workers. The “repeatability” of the questionnaire would be high. This is called test-retest reliability.
The other aspect of reliability concerns the consistency among the questions. Because all the questions relate to depression, you would expect each worker's answers to be fairly consistent from one question to the next. This is called internal consistency.
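Consistency among questions is often summarized with a statistic called Cronbach's alpha, which compares the variation in each question's answers with the variation in workers' total scores. The column does not name this statistic, so treat the sketch below as one possible approach rather than what these researchers did. The responses are made up, and the function name cronbach_alpha is chosen here for illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows, one row per respondent
    and one column per questionnaire item."""
    k = len(responses[0])  # number of items
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    items = list(zip(*responses))  # regroup the answers by item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Made-up answers from five workers to four questions, each scored 0 to 3.
responses = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 2, 3, 3],
    [1, 0, 1, 1],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Values closer to 1 indicate that workers who score high on one question tend to score high on the others as well.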
If our depression researchers were sloppy in ensuring the validity or reliability of their questionnaire, it could have affected their study's overall results. It's important to note that you can never prove reliability or validity conclusively, but results will be more accurate if the measures in a study are as reliable and valid as possible.
* This example is fictional.
For further reading, see: Health measurement scales: a practical guide to their development and use (third edition) by David Streiner and Geoff Norman.
Source: At Work, Issue 50, Fall 2007: Institute for Work & Health, Toronto
