Information bias occurs during data collection and includes:
- Misclassification bias
- Ecological fallacy
- Regression to the mean
- Other biases
I will discuss each of these in turn.
1. Misclassification bias
This occurs when subjects are assigned to the wrong category: exposed/diseased subjects are classified as non-exposed/non-diseased, or vice versa. It can also result from random errors in data entry/capture, missing data, or rounding.
The commonest biases that produce misclassification are:
- Detection bias: Occurs when the outcome is not assessed/measured in the same way between the comparison groups. It is controlled by masking the outcome assessors.
- Observer/interviewer bias: Occurs when the observer/interviewer is aware of the hypothesis/disease status/exposure status (including intervention received), and this knowledge influences how data are collected or recorded.
- Recall bias: Occurs when the presence of disease influences subjects’ perception of its causes, or when knowledge of exposure influences subjects’ answers.
- Reporting bias: Occurs when study participants give the answers they perceive to be of interest, or refuse to divulge or underreport information about embarrassing issues or undesirable behaviours.
Observer/interviewer bias and reporting bias can be minimized through blinding/masking.
The two major types of misclassification bias are:
- Differential misclassification bias: Here, misclassification is different in the comparison groups. The estimate may be biased in either direction: towards or away from the null hypothesis.
Example: The recalled exposure is not the same for cases and controls in a case-control study.
- Non-differential misclassification bias: This occurs when the misclassification is uniform across the comparison groups (Eg: exposure is equally misclassified in diseased and non-diseased). The direction of bias depends on the type of variable: for binary variables (two categories, such as present/absent) the estimate is biased towards the null hypothesis (that there is no difference); for polytomous variables (more than two categories) the estimate may also be biased away from the null hypothesis.
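The effect of the two types of misclassification on a binary exposure can be illustrated with a small calculation. The sketch below is only an illustration: the 2x2 counts, the sensitivity and specificity values, and the helper names (odds_ratio, classify) are all hypothetical assumptions, not data from any study. It shows one scenario in which differential misclassification pushes the odds ratio away from the null; other choices of values could push it towards the null instead.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio of exposure in cases versus controls."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def classify(exposed, unexposed, sensitivity, specificity):
    """Expected counts classified as (exposed, unexposed) under imperfect measurement."""
    seen_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    seen_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return seen_exposed, seen_unexposed


# Hypothetical true counts as (exposed, unexposed); assumed for illustration only.
true_cases = (200, 100)
true_controls = (100, 200)
true_or = odds_ratio(*true_cases, *true_controls)

# Non-differential: the same sensitivity and specificity in cases and controls.
nondiff_or = odds_ratio(
    *classify(*true_cases, sensitivity=0.8, specificity=0.9),
    *classify(*true_controls, sensitivity=0.8, specificity=0.9),
)

# Differential: cases are assumed to recall exposure better than controls.
diff_or = odds_ratio(
    *classify(*true_cases, sensitivity=0.95, specificity=0.9),
    *classify(*true_controls, sensitivity=0.7, specificity=0.9),
)

print(f"True odds ratio:             {true_or:.2f}")
print(f"Non-differential error:      {nondiff_or:.2f}")
print(f"Differential error (recall): {diff_or:.2f}")
```

With these assumed values, the non-differential estimate falls between 1 and the true odds ratio, while the differential estimate exceeds the true odds ratio.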
2. Ecological fallacy
This bias is produced when group-level analyses are used to make inferences at the individual level. Here, exposure and disease are measured at the group/population level. (Eg: Per capita consumption of fat and prevalence of cardiovascular disease in a state/country. The exposure is based on the total consumption of fat by the population (group-level data), not on how much each individual actually consumes (individual-level data).) One may safely compare states/countries and make inferences about overall exposure and disease. However, what is true at the group/population level may not be true at the individual level, so inferences about individuals drawn from group-level data alone may be false (ecological fallacy). Similarly, disease rates at the national/state level may mask local variations, resulting in false inferences.
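A toy dataset can make the gap between group-level and individual-level associations concrete. In the sketch below, the three regions, the (fat intake, risk score) pairs, and the pearson helper are hypothetical and deliberately artificial. They are constructed so that the regional averages rise together while, within every region, individuals with higher intake have lower scores.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical individuals per region as (fat intake, risk score) pairs.
regions = {
    "A": [(1, 5), (2, 4), (3, 3)],
    "B": [(4, 8), (5, 7), (6, 6)],
    "C": [(7, 11), (8, 10), (9, 9)],
}

# Group-level analysis: one (mean intake, mean score) point per region.
mean_intake = [sum(x for x, _ in people) / len(people) for people in regions.values()]
mean_score = [sum(y for _, y in people) / len(people) for people in regions.values()]
group_r = pearson(mean_intake, mean_score)

# Individual-level analysis within each region.
within_r = {
    name: pearson([x for x, _ in people], [y for _, y in people])
    for name, people in regions.items()
}

print(f"Correlation of regional means: {group_r:+.2f}")
for name, r in within_r.items():
    print(f"Correlation within region {name}: {r:+.2f}")
```

In this constructed example the correlation across regional means is strongly positive, yet the correlation among individuals inside each region is strongly negative, so the group-level finding says nothing reliable about any one person.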
3. Regression to the mean
This bias reflects the phenomenon that extreme values of a variable tend to be followed by values closer to the centre of its distribution (mean/median value) on subsequent measurement. (Eg: If the first blood pressure reading obtained is unusually high, subsequent readings will tend to decline toward the mean blood pressure value, even without any intervention.) This may be neutralised by including an appropriate reference group and by basing the classification of study subjects on more than one measurement.
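Regression to the mean can be reproduced with a short simulation. The sketch below assumes, purely for illustration, that each person has a stable true blood pressure and that every reading adds independent random error; the means, standard deviations, threshold, and seed are arbitrary choices. Nobody receives any treatment between the two readings.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Assumed, illustrative parameters (mmHg).
TRUE_MEAN, TRUE_SD, ERROR_SD = 120.0, 10.0, 10.0
THRESHOLD = 140.0  # subjects are selected if their first reading is at least this

people = []
for _ in range(10_000):
    true_bp = rng.gauss(TRUE_MEAN, TRUE_SD)
    first = true_bp + rng.gauss(0.0, ERROR_SD)   # first reading
    second = true_bp + rng.gauss(0.0, ERROR_SD)  # second reading, no intervention
    people.append((first, second))

# Select subjects on the basis of a single high first reading.
selected = [(first, second) for first, second in people if first >= THRESHOLD]
mean_first_selected = mean(first for first, _ in selected)
mean_second_selected = mean(second for _, second in selected)

# For comparison: the whole sample does not shift between readings.
mean_first_all = mean(first for first, _ in people)
mean_second_all = mean(second for _, second in people)

print(f"Selected subjects: {len(selected)}")
print(f"Selected, first reading:  {mean_first_selected:.1f}")
print(f"Selected, second reading: {mean_second_selected:.1f}")
print(f"Everyone, first reading:  {mean_first_all:.1f}")
print(f"Everyone, second reading: {mean_second_all:.1f}")
```

The selected group's second readings average lower than their first readings although nothing was done to them, which is why an untreated reference group is needed before attributing such a fall to an intervention.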
4. Other information biases
- Hawthorne effect: There is an increase in productivity or other outcome under study when participants are aware that they are being observed. (Eg: Physicians take more time to conduct a medical interview when they are being observed than when they are alone).
- Lead time bias: When screening results in early detection of disease, survival measured from the time of diagnosis appears longer in those who are screened than in those who are not, even if the time of death is unchanged, simply because the disease was detected earlier.
- Protopathic bias: This occurs when an exposure is modified (initiated/stopped) in response to a symptom of a hitherto undiagnosed disease (outcome). (Eg: Patients experiencing pain due to early heart failure may take NSAIDs for the pain, producing an apparent association between NSAID use and heart failure in a case-control study.) A related bias (the sick quitter bias) occurs when persons with risky behaviour (such as heavy smoking) quit their habit because of disease. Studies analysing current behaviour as a risk factor will label them as non-exposed, underestimating the true association.
- Temporal ambiguity: This is common in cross-sectional and ecological studies where it cannot be established that exposure precedes effect/outcome.
- Verification bias: This occurs when assessing the validity of a diagnostic test. Here, the reference test (gold standard test) is performed less frequently when the diagnostic test result is negative.
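The consequence of verification bias for a diagnostic test can be shown with simple arithmetic. The sketch below uses hypothetical numbers throughout: a cohort of 1000 patients with 100 diseased, an assumed true sensitivity of 0.8 and specificity of 0.9, and the assumption that every test-positive patient but only 10% of test-negative patients receives the reference test. Accuracy is then computed from verified patients only.

```python
# Hypothetical cohort and true test accuracy (all values assumed for illustration).
diseased, healthy = 100, 900
true_sensitivity, true_specificity = 0.8, 0.9

# Expected test results in the full cohort.
true_pos = diseased * true_sensitivity        # diseased, test positive
false_neg = diseased - true_pos               # diseased, test negative
true_neg = healthy * true_specificity         # healthy, test negative
false_pos = healthy - true_neg                # healthy, test positive

# Assumed verification pattern: all positives, but only 10% of negatives,
# go on to receive the reference (gold standard) test.
verify_positive, verify_negative = 1.0, 0.1
verified_tp = true_pos * verify_positive
verified_fp = false_pos * verify_positive
verified_fn = false_neg * verify_negative
verified_tn = true_neg * verify_negative

# Accuracy estimated from verified patients only.
apparent_sensitivity = verified_tp / (verified_tp + verified_fn)
apparent_specificity = verified_tn / (verified_tn + verified_fp)

print(f"Sensitivity: true {true_sensitivity:.2f}, apparent {apparent_sensitivity:.2f}")
print(f"Specificity: true {true_specificity:.2f}, apparent {apparent_specificity:.2f}")
```

Under these assumptions, restricting the analysis to verified patients overstates sensitivity and understates specificity, because most false negatives and true negatives never reach the reference test.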