Sample Size Calculation: The Essentials (Part 2)

In the previous article, I discussed how to calculate sample size for studies that estimate a mean or a proportion, using either absolute precision or relative precision. In this article, I will discuss sample size calculation for analytical studies (case-control and cohort studies).

Background Information:

Case-Control Studies: These are observational epidemiological studies where the outcome/disease has occurred in some of the study subjects (‘Cases’) before they are recruited into the study. Some people who do not have the outcome/disease of interest are recruited to serve as ‘Controls’. Exposure status is compared between cases (diseased) and controls (non-diseased). The ratio of cases to controls usually ranges from 1:1 to 1:4. Such studies are suitable for investigating rare conditions and take less time to complete than cohort studies. It is possible to investigate multiple risk factors for the disease/outcome of interest.

Cohort studies: These are observational epidemiological studies where none of the study subjects has developed the outcome/disease of interest at the start of the study. Subjects are observed over time and measurements are taken. This may be done by scrutinizing health records (in the case of retrospective cohort studies), or by taking direct measurements (in the case of prospective cohort studies). Such studies are time-consuming and very expensive. Participant attrition/loss to follow-up is a major challenge in such studies. It is possible to investigate multiple outcomes for the exposure(s) under consideration. Disease status is compared between exposed and unexposed study subjects. It is possible to obtain incidence (new cases) in such studies.

Key Messages:

Case-Control Studies:

To calculate sample size for case-control studies, we need the following:

• The value of alpha
• The value of beta
• Proportion of controls with exposure (p0)
• Proportion of cases with exposure (p1)
• Ratio of cases to controls

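The formula for calculating sample size is:

n = [(p0*q0 + p1*q1) * (Z_(alpha/2) + Z_beta)^2] / (p1 - p0)^2

where q0 = 1 - p0, q1 = 1 - p1, and n is the number of subjects in each group when the groups are of equal size.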

Conventionally, alpha (two-sided) = 0.05 (or 5%) and beta = 0.20 (or 20%).
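The conventional values of alpha and beta translate into the standard normal values used later in the worked examples. A minimal sketch of that lookup, assuming Python's standard-library NormalDist class (the variable names are illustrative and not part of the original article):

```python
from statistics import NormalDist

alpha = 0.05  # two-sided significance level
beta = 0.20   # type II error rate, so power = 1 - beta = 0.80

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Two-sided test: alpha is split equally between the two tails.
z_alpha = std_normal.inv_cdf(1 - alpha / 2)
# Power of 80% corresponds to the 80th percentile.
z_beta = std_normal.inv_cdf(1 - beta)

print(round(z_alpha, 2))  # 1.96
print(round(z_beta, 2))   # 0.84

# Squared sum of the rounded values, as used in the hand calculation.
multiplier = (round(z_alpha, 2) + round(z_beta, 2)) ** 2
print(round(multiplier, 2))  # 7.84
```

Rounding to two decimal places mirrors the hand calculation; using the unrounded quantiles changes the multiplier only slightly.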

The simplest situation is when the ratio of cases to controls is 1:1, so we will assume equal size for cases and controls in our example.

Example:

A researcher wants to conduct a case-control study on oral contraceptive use and the risk of thromboembolism in women of reproductive age. Previous studies state that 10% of women without thromboembolism (controls) use oral contraceptives and that around 18% of women with thromboembolism (cases) use oral contraceptives. Alpha is 0.05 and beta is 0.20. Assume equal size for cases and controls (1:1 ratio).

Solution:

From the problem, we have

p0 = proportion of controls who use oral contraceptives = 0.10

p1 = proportion of cases who use oral contraceptives = 0.18

q0 = (1 - p0) = 1 - 0.10 = 0.90

q1 = (1 - p1) = 1 - 0.18 = 0.82

Z_(alpha/2) = 1.96*

*This is the value of the standard normal distribution corresponding to a significance level of alpha [1.96 for a 2-sided test at the 0.05 level].

Z_beta = 0.84^

^This is the value of the standard normal distribution corresponding to the desired level of power (here, 80%).

Substituting the values, we get

n= ([(0.10*0.90) + (0.18*0.82)]* [1.96 + 0.84]^2)/ (0.18-0.10)^2

= (0.2376*7.84)/0.0064

= 291.06

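Thus, we need about 291 cases and 291 controls for the study (292 of each if the result is rounded up, as is common practice so that power does not fall below the target).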
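The substitution above can be checked with a few lines of code. This is a minimal sketch, assuming equal group sizes and the rounded values 1.96 and 0.84; the function name sample_size_per_group is hypothetical and chosen only for illustration:

```python
def sample_size_per_group(p0, p1, z_alpha=1.96, z_beta=0.84):
    """Unrounded per-group sample size for a 1:1 comparison of two proportions."""
    if p0 == p1:
        raise ValueError("p0 and p1 must differ")
    q0 = 1 - p0
    q1 = 1 - p1
    variance_term = p0 * q0 + p1 * q1    # (p0*q0 + p1*q1)
    multiplier = (z_alpha + z_beta) ** 2  # 7.84 for the default values
    return variance_term * multiplier / (p1 - p0) ** 2


# Oral contraceptive example: 10% of controls and 18% of cases exposed.
n_case_control = sample_size_per_group(p0=0.10, p1=0.18)
print(round(n_case_control, 2))  # 291.06
```

The function returns the unrounded value so that the rounding rule stays a separate, explicit decision.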

Cohort studies:

To calculate sample size for cohort studies, we need the following:

• The value of alpha
• The value of beta
• Risk of disease/outcome among the unexposed (p0)
• Risk of disease/outcome among the exposed (p1)
• Ratio of exposed to unexposed

Conventionally, alpha (two-sided) = 0.05 (or 5%) and beta = 0.20 (or 20%).

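The formula to calculate sample size for cohort studies is:

n = [(p0*q0 + p1*q1) * (Z_(alpha/2) + Z_beta)^2] / (p1 - p0)^2

where q0 = 1 - p0, q1 = 1 - p1, and n is the number of subjects in each group when the groups are of equal size.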

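You may have noticed that the formula is identical to that for case-control studies. However, although the notation is identical, the terms p0 and p1 represent different quantities: in a cohort study they are risks of disease among the unexposed and the exposed, whereas in a case-control study they are proportions of controls and cases with the exposure.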

Example:

A researcher wants to investigate the relationship between occupational exposure to loud noise and hearing loss. Previous studies indicate that the risk of developing hearing loss in those who are not occupationally exposed to loud noise is 0.15 (that is, 15% of people without occupational exposure to loud noise develop hearing loss). The risk of developing hearing loss in workers with occupational exposure to loud noise is reported to be 0.25 (25%). Alpha is 0.05 and beta is 0.20. Assume equal size for exposed and unexposed (1:1 ratio).

Solution:

From the problem, we have

p0 = risk of hearing loss among the non-exposed = 0.15

p1 = risk of hearing loss among the exposed = 0.25

q0 = (1 - p0) = 1 - 0.15 = 0.85

q1 = (1 - p1) = 1 - 0.25 = 0.75

Z_(alpha/2) = 1.96*

*This is the value of the standard normal distribution corresponding to a significance level of alpha [1.96 for a 2-sided test at the 0.05 level].

Z_beta = 0.84^

^This is the value of the standard normal distribution corresponding to the desired level of power (here, 80%).

Substituting the values, we get

n= ([(0.15*0.85) + (0.25*0.75)]* [1.96 + 0.84]^2)/ (0.25-0.15)^2

= (0.315*7.84)/0.01

= 246.96

Thus, we need 247 individuals with occupational exposure to loud noise and an equal number of people without occupational exposure to loud noise for the study.
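The cohort calculation can be reproduced in code as well. The sketch below is illustrative only: the function name cohort_sample_size and its round_z option are assumptions, not part of the original article. It derives the normal quantiles from alpha and power, and optionally rounds them to two decimals to match the hand calculation:

```python
from statistics import NormalDist


def cohort_sample_size(p0, p1, alpha=0.05, power=0.80, round_z=True):
    """Unrounded per-group sample size for a 1:1 exposed/unexposed cohort."""
    if p0 == p1:
        raise ValueError("p0 and p1 must differ")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = std_normal.inv_cdf(power)           # power = 1 - beta
    if round_z:
        # Match the hand calculation, which uses 1.96 and 0.84.
        z_alpha, z_beta = round(z_alpha, 2), round(z_beta, 2)
    variance_term = p0 * (1 - p0) + p1 * (1 - p1)
    return variance_term * (z_alpha + z_beta) ** 2 / (p1 - p0) ** 2


# Hearing loss example: risk 0.15 in unexposed, 0.25 in exposed.
n_rounded = cohort_sample_size(p0=0.15, p1=0.25)
n_exact = cohort_sample_size(p0=0.15, p1=0.25, round_z=False)
print(round(n_rounded, 2))  # 246.96
print(n_exact)              # slightly larger, because the quantiles are not rounded
```

With unrounded quantiles the result comes out marginally higher (roughly 247.2), which shows that the two-decimal values used by hand are accurate enough for planning purposes.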

Note: There are other formulae that take additional factors into account, such as the number of observations (how many times measurements will be obtained from the subjects), but they are beyond the scope of this article.

Summary:

To calculate sample size for case-control/ cohort studies, we need the following:

• The value of alpha (conventionally 0.05 (5%))
• The value of beta (conventionally 0.20 (20%))
• p0: Proportion of controls with exposure (Case-Control)/ risk of disease among unexposed (Cohort)
• p1: Proportion of cases with exposure (Case-Control)/ risk of disease among exposed (Cohort)
• Ratio of cases to controls (Case-Control)/ exposed to unexposed (Cohort)

The general formula to calculate sample sizes for analytical studies is:

n = [(p0*q0 + p1*q1) * (Z_(alpha/2) + Z_beta)^2] / (p1 - p0)^2

The value of the second term in the numerator [after (p0q0 + p1q1)], namely (1.96 + 0.84)^2, is 7.84 when alpha = 0.05 (5%) and beta = 0.20 (20%).