Sample Size Calculation: The Essentials (Part 2)

In the previous article, I discussed how to calculate sample size for studies that estimate a mean or a proportion, using either absolute precision or relative precision. In this article, I will discuss sample size calculation for analytical studies (case-control and cohort studies).

Background Information:

Case-Control Studies: These are observational epidemiological studies where the outcome/disease has occurred in some of the study subjects (‘Cases’) before they are recruited into the study. Some people who do not have the outcome/disease of interest are recruited to serve as ‘Controls’. Exposure status is compared between cases (diseased) and controls (non-diseased). The ratio of cases to controls usually ranges from 1:1 to 1:4. Such studies are suitable for investigating rare conditions and take less time to complete than cohort studies. It is possible to investigate multiple risk factors for the disease/outcome of interest.

Cohort studies: These are observational epidemiological studies where none of the study subjects has developed the outcome/disease of interest at the start of the study. Subjects are observed over time and measurements taken. This may be done by scrutinizing health records (in the case of retrospective cohort studies), or taking direct measurements (in the case of prospective cohort studies). Such studies are time-consuming and very expensive. Participant attrition/loss to follow-up is a major challenge in such studies. It is possible to investigate multiple outcomes for the exposure(s) under consideration. Disease status is compared between exposed and un-exposed study subjects. It is possible to obtain incidence (new cases) in such studies.

Key Messages:

Case-Control Studies:

To calculate sample size for case-control studies, we need the following:

  • The value of alpha
  • The value of beta
  • Proportion of controls with exposure (p0)
  • Proportion of cases with exposure (p1)
  • Ratio of cases to controls

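The formula for calculating sample size (per group, for a 1:1 ratio) is:

n = [(p0*q0 + p1*q1) * (Z(1-alpha/2) + Z(1-beta))^2] / (p1 - p0)^2

where q0 = 1 - p0, q1 = 1 - p1, Z(1-alpha/2) is the standard normal value for the chosen significance level, and Z(1-beta) is the standard normal value for the desired power.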

Conventionally, alpha (two-sided) = 0.05 (or 5%) and beta = 0.20 (or 20%).

The simplest situation is when the ratio of cases to controls is 1:1, so we will assume equal size for cases and controls in our example.

Example:

A researcher wants to conduct a case-control study on oral contraceptive use and the risk of thromboembolism in women of reproductive age. Previous studies state that 10% of women without thromboembolism use oral contraceptives, and around 18% of women with thromboembolism use oral contraceptives. Alpha is 0.05 and beta is 0.20. Assume equal size for cases and controls (1:1 ratio).

Solution:

From the problem, we have

p0 = proportion of controls who use oral contraceptives = 0.10

p1 = proportion of cases who use oral contraceptives = 0.18

q0 = (1-p0) = 1- 0.10 = 0.90

q1 = (1-p1) = 1 – 0.18 = 0.82

Z(1-alpha/2) = 1.96*

*This is the value of the standard normal distribution corresponding to a significance level of alpha [1.96 for a 2-sided test at the 0.05 level].

Z(1-beta) = 0.84^

^This is the value of the standard normal distribution corresponding to the desired level of power (here, 80%).

Substituting the values, we get

n= ([(0.10*0.90) + (0.18*0.82)]* [1.96 + 0.84]^2)/ (0.18-0.10)^2

= (0.2376*7.84)/0.0064

= 291.06

Thus, rounding up to the next whole number, we need 292 cases and 292 controls for the study.
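
The arithmetic above can be checked with a few lines of Python. This is a minimal sketch, assuming the 1:1 formula used in this example and the rounded z-values 1.96 and 0.84; the function name sample_size_per_group and its parameter names are illustrative and do not come from any library.

```python
def sample_size_per_group(p0, p1, z_alpha=1.96, z_beta=0.84):
    """Per-group sample size for a 1:1 design.

    p0: proportion of controls with exposure
    p1: proportion of cases with exposure
    z_alpha, z_beta: standard normal values for alpha and power
    (defaults are the rounded values used in the worked example).
    """
    q0 = 1 - p0
    q1 = 1 - p1
    numerator = (p0 * q0 + p1 * q1) * (z_alpha + z_beta) ** 2
    denominator = (p1 - p0) ** 2
    return numerator / denominator


# Oral contraceptive example: p0 = 0.10, p1 = 0.18
n_case_control = sample_size_per_group(p0=0.10, p1=0.18)
print(round(n_case_control, 2))  # approximately 291.06, before rounding to whole subjects
```

The function returns the unrounded value so that the rounding step stays visible and is left to the reader.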

Cohort studies:

To calculate sample size for cohort studies, we need the following:

  • The value of alpha
  • The value of beta
  • Risk of disease/outcome among the unexposed (p0)
  • Risk of disease/outcome among the exposed (p1)
  • Ratio of exposed to unexposed

Conventionally, alpha (two-sided) = 0.05 (or 5%) and beta = 0.20 (or 20%).

The formula to calculate sample size for cohort studies is:

n = [(p0*q0 + p1*q1) * (Z(1-alpha/2) + Z(1-beta))^2] / (p1 - p0)^2

You may have noticed that the formula is identical to that for case-control studies. However, although the notation is identical, the terms p0 and p1 represent different quantities: in a cohort study they are risks of disease among the unexposed and the exposed, not proportions of controls and cases with exposure.

Example:

A researcher wants to investigate the relationship between occupational exposure to loud noise and hearing loss. Previous studies indicate that the risk of developing hearing loss in those who are not occupationally exposed to loud noise is 0.15 (15% of those without occupational exposure to loud noise develop hearing loss). The risk of developing hearing loss in workers with occupational exposure to loud noise is reported to be 0.25 (25%). Alpha is 0.05 and beta is 0.20. Assume equal size for exposed and unexposed (1:1 ratio).

Solution:

From the problem, we have

p0 = risk of hearing loss among the non-exposed = 0.15

p1 = risk of hearing loss among the exposed = 0.25

q0 = (1-p0) = 1- 0.15 = 0.85

q1 = (1-p1) = 1 – 0.25 = 0.75

Z(1-alpha/2) = 1.96*

*This is the value of the standard normal distribution corresponding to a significance level of alpha [1.96 for a 2-sided test at the 0.05 level].

Z(1-beta) = 0.84^

^This is the value of the standard normal distribution corresponding to the desired level of power (here, 80%).

Substituting the values, we get

n= ([(0.15*0.85) + (0.25*0.75)]* [1.96 + 0.84]^2)/ (0.25-0.15)^2

= (0.315*7.84)/0.01

= 246.96

Thus, we need 247 individuals with occupational exposure to loud noise and an equal number of people without occupational exposure to loud noise for the study.
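
Because loss to follow-up is a major challenge in cohort studies, the calculated number is often inflated before recruitment. The sketch below reproduces the hearing-loss calculation and then applies one common adjustment, dividing by (1 - expected loss). The 10% loss figure, the helper name inflate_for_attrition, and the adjustment itself are assumptions added for illustration; they are not part of the worked example above.

```python
import math


def sample_size_per_group(p0, p1, z_alpha=1.96, z_beta=0.84):
    """Per-group sample size for a 1:1 design (rounded z-values by default)."""
    q0 = 1 - p0
    q1 = 1 - p1
    return (p0 * q0 + p1 * q1) * (z_alpha + z_beta) ** 2 / (p1 - p0) ** 2


def inflate_for_attrition(n, expected_loss):
    """Hypothetical helper: number to enrol so that about n subjects remain.

    expected_loss is the anticipated fraction lost to follow-up (0 <= loss < 1).
    """
    if not 0 <= expected_loss < 1:
        raise ValueError("expected_loss must be in [0, 1)")
    return math.ceil(n / (1 - expected_loss))


# Hearing-loss example: p0 = 0.15 (unexposed), p1 = 0.25 (exposed)
n_cohort = sample_size_per_group(p0=0.15, p1=0.25)
print(round(n_cohort, 2))   # approximately 246.96
print(math.ceil(n_cohort))  # 247 per group

# Assumed 10% loss to follow-up (illustrative value only)
n_to_enrol = inflate_for_attrition(n_cohort, expected_loss=0.10)
print(n_to_enrol)           # 275 per group
```

The expected loss should come from earlier studies or pilot data in a similar setting; 10% is used here only to show the mechanics.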

Note: There are other formulae as well, some of which account for the number of observations (how many times measurements will be obtained from each subject), but these are beyond the scope of this article.

Summary:

To calculate sample size for case-control/ cohort studies, we need the following:

  • The value of alpha (conventionally 0.05 (5%))
  • The value of beta (conventionally 0.20 (20%))
  • p0: Proportion of controls with exposure (Case-Control)/ risk of disease among unexposed (Cohort)
  • p1: Proportion of cases with exposure (Case-Control)/ risk of disease among exposed (Cohort)
  • Ratio of cases to controls (Case-Control)/ exposed to unexposed (Cohort)

The general formula to calculate sample sizes for analytical studies is:

n = [(p0*q0 + p1*q1) * (Z(1-alpha/2) + Z(1-beta))^2] / (p1 - p0)^2

The value of the second term in the numerator [after (p0q0 + p1q1)] is 7.84 when alpha = 0.05 (5%) and beta = 0.20 (20%).
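
As a rough check on the 7.84 figure, the two standard normal values can be obtained from Python's standard library instead of a table. This is a sketch only; the helper name z_term is illustrative. With unrounded z-values the term comes out at about 7.85, slightly above 7.84, because 1.96 and 0.84 are themselves rounded.

```python
from statistics import NormalDist


def z_term(alpha=0.05, beta=0.20):
    """Return (z_alpha, z_beta, squared sum) for a two-sided alpha and power 1 - beta."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_beta = standard_normal.inv_cdf(1 - beta)
    return z_alpha, z_beta, (z_alpha + z_beta) ** 2


z_alpha, z_beta, term = z_term()
print(round(z_alpha, 2))  # 1.96
print(round(z_beta, 2))   # 0.84
print(round(term, 2))     # 7.85 with unrounded z-values
print(round((1.96 + 0.84) ** 2, 2))  # 7.84 with the rounded values
```

Changing alpha or beta in the call (for example, beta=0.10 for 90% power) shows how the multiplier, and therefore the sample size, grows.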

Useful Links:

Link to the previous article:

https://communitymedicine4all.com/2021/09/28/sample-size-calculation-the-essentials-part-1/

Links to other articles related to sample size calculation:

https://communitymedicine4all.com/2014/05/11/sample-size-calculation-cross-sectional-studies/

https://communitymedicine4all.com/2018/06/23/how-to-calculate-sample-size-with-epi-info-7/

https://communitymedicine4all.com/2014/12/30/relative-and-absolute-precision-in-sample-size-calculation/

12 thoughts on “Sample Size Calculation: The Essentials (Part 2)”

  1. Asif Khan

    Thanks for sharing such valuable information.
    If we want to compare the level of iron in diseased persons with the level of iron in the normal population, what will be the study type, and what will be the sample size formula?
    Please help

    1. drroopesh Post author

      Dear Iyke,

      There are several formulae for calculating sample size, depending on the study design:
      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3775042/
      I am sharing a link to an article with the names of formula originators that may be useful:
      https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&cad=rja&uact=8&ved=2ahUKEwici7umneT9AhXIT2wGHWE1DikQFnoECAoQAw&url=https%3A%2F%2Fso04.tci-thaijo.org%2Findex.php%2FATI%2Farticle%2Fdownload%2F254253%2F173847%2F938756%23%3A~%3Atext%3DCochran%2520Formula%2520(Cochran%252C%25201977)%26text%3D%252D%2520Reliability%2520level%252095%2525%2520or%2520significance%2Cthe%2520population%2520proportion%2520is%2520unknown.&usg=AOvVaw1oHdKi01k61WKtTeK4buQm

      I am also sharing the link to an article describing the calculation of sample size for case-control studies using EpiCalc:
      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6548115/
      You will find additional sources cited in the above articles.

      I hope this helps.
      Regards,
      Dr. Roopesh

    1. drroopesh Post author

      Dear Hajara,

      Why do you want to increase the sample size of a cohort study? Taking repeated measurements over time is likely to decrease the sample size requirement compared to a cross-sectional study. However, the risk of attrition/loss to follow-up means one has to compensate for it when calculating sample size.

      Regards,
      Dr. Roopesh

  2. emma

    Hello, I am a bit confused about my research design. I plan to prospectively follow up patients who receive a particular treatment for one year, and analyse the baseline demographics of the response and no-response groups to find the risk factors that affect the treatment outcome. Is this a prospective cohort study? If not, what is it? Thanks for your comments!

    1. drroopesh Post author

      Dear Emma,

      Both a cohort study and a Randomized Controlled Trial (RCT) involve follow-up of study subjects. However, a cohort study is observational: the study subjects determine their own exposure (they choose whether or not to do something). An RCT is experimental: the investigator, not the study subjects, determines exposure. Typically drugs, vaccines, treatments, etc. are evaluated through RCTs. Therefore, it is very important to know whether your study is experimental or observational.

      If your study subjects are themselves choosing from one of several treatment options and you are merely observing them over time, yours is a cohort study. However, if an investigator is assigning treatments to them and then observing them over time for occurrence of outcome, it is an experimental study.

      I hope this helps.
      Regards,
      Dr. Roopesh
