Sample Size Calculation: The Essentials (Part 1)

This article is intended to provide basic information on sample size calculation for those undertaking the Basic Course in Biomedical Research (BCBR) offered by the National Institute of Epidemiology (NIE).

Background Information:

Inferential statistics: A branch of statistics where an inference about the population parameter is made on the basis of a sample estimate. Basically, one tries to guess the population value (parameter) based on the value (estimate) obtained from a sample drawn from the same population. Truth be told, we can never be absolutely certain that we have captured the true population value. Therefore, there is always uncertainty about whether the sample estimate is close to the population parameter. Confidence levels are used to indicate how sure we are that the true population value lies within a stated distance (the precision) of the sample estimate.

In order to calculate sample size, one needs to understand (and use) the following terms:

Alpha (Type I error): This is also referred to as the significance level of a test. It is the probability of rejecting the null hypothesis when it is actually true. Usually alpha is kept at 5% or 0.05 in decimal notation.

Illustration: A shepherd boy cries ‘Wolf! Wolf!’ although there is no wolf. Villagers come running to attack the wolf but find no wolf. Here, the villagers committed a Type I error by believing there was a wolf (the null hypothesis would have been: There is no wolf) when in reality there was none.

Confidence level (1 minus alpha): Simply put, this is the complement of alpha and indicates the probability that a sample estimate is within certain specified limits of the true value. This is generally kept at 95% (since alpha is set at 5%).
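This statement can be checked empirically. The sketch below is a hypothetical simulation, not part of the BCBR material: the population mean (50 g), standard deviation (20 g), sample size (62), number of repetitions and the function name `coverage` are all made-up values chosen for illustration. It repeatedly draws samples from a normal population and counts how often the sample mean falls within z standard errors of the true mean. With z = 1.96 (95% confidence level), that share should come out close to 0.95.

```python
import random
import statistics

def coverage(true_mean, sd, n, z=1.96, reps=2000, seed=1):
    """Share of simulated samples whose mean lies within z standard errors of true_mean."""
    rng = random.Random(seed)          # fixed seed, so the run is reproducible
    margin = z * sd / n ** 0.5         # z times the standard error of the mean
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        if abs(statistics.fmean(sample) - true_mean) <= margin:
            hits += 1
    return hits / reps

# Hypothetical population: mean 50 g, SD 20 g; samples of 62 subjects
print(coverage(true_mean=50, sd=20, n=62))           # should be close to 0.95
print(coverage(true_mean=50, sd=20, n=62, z=2.576))  # should be close to 0.99
```

Raising the confidence level (here from 95% to 99%) widens the limits, so a larger share of sample estimates falls within them.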

Beta (Type II error): This is the probability of failing to reject the null hypothesis when it is actually false. Usually beta is kept at 20% or 0.20 in decimal notation.

Illustration: The shepherd boy cries ‘Wolf! Wolf!’ when a wolf attacks the sheep. Villagers do not respond as they believe there is no wolf and he is bluffing. Here, the villagers committed a Type II error by believing there was no wolf (the null hypothesis would have been: There is no wolf) when in reality there was a wolf.

Power (1 minus beta): This is merely the complement of beta and indicates the probability of correctly rejecting the null hypothesis when it is false. Usually this is 80% or 0.80 in decimal notation (since beta is kept at 20%).

Note 1: The values of alpha (5%) and beta (20%) are the conventional maximum acceptable values. One can lower alpha (to 1%, for example) and beta (to 10%, for example), at the cost of a larger sample size, but these should not be raised above 5% and 20% respectively, since doing so lowers the confidence level and power of the study.

Precision: This is a measure of how close a sample estimate is to the true population value. It may be expressed in absolute terms or relative to the sample estimate.

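Standard normal deviate (z): This is also referred to as the z statistic (or z score) and is used to standardize a value from a normal distribution (to create a standard normal distribution). It is given by the formula:

$$z = \frac{x - \mu}{\sigma}$$

where x is the value, μ is the mean and σ (sigma) is the standard deviation of the distribution.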

The conversion of values to a standard normal deviate (z score) allows the application of the standard normal distribution, which is a probability distribution with a mean of 0 and a standard deviation of 1. Here, the distance of each value from the mean is measured in number of standard deviations (sigma in the above equation). Since the probability associated with each z value is known (from the z distribution table), these probabilities can be used in sample size calculation, for instance to find the z value corresponding to a 95% confidence level.
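Instead of reading a printed z table, the same values can be obtained from Python's standard-library `statistics.NormalDist` (Python 3.8 or later). The sketch below uses hypothetical helper names (`z_score`, `z_two_tailed`) and made-up illustrative numbers; it standardizes a value and recovers the familiar critical values for two-tailed confidence levels.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0, sigma=1)  # the standard normal (z) distribution

def z_score(x, mean, sd):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def z_two_tailed(confidence):
    """Critical z for a two-tailed confidence level, leaving alpha/2 in each tail."""
    alpha = 1 - confidence
    return STD_NORMAL.inv_cdf(1 - alpha / 2)

print(z_score(130, mean=100, sd=20))   # 1.5
print(round(z_two_tailed(0.95), 2))    # 1.96
print(round(z_two_tailed(0.99), 2))    # 2.58
# Probability that a value lies within 1.96 SD of the mean
print(round(STD_NORMAL.cdf(1.96) - STD_NORMAL.cdf(-1.96), 3))  # 0.95
```

Lowering alpha from 5% to 1% raises z from 1.96 to about 2.58, which is why a stricter alpha demands a larger sample.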

Key Messages:

In the context of BCBR, the following situations for sample size calculation have been described:

  1. Sample size calculation for estimating Population Mean
  2. Sample size calculation for estimating Proportions
  3. Sample size calculation for Analytical Studies (case-control and cohort)

We will consider each of the above in turn:

Sample size calculation for estimating Population Mean

To calculate sample size for estimating population mean we need the following values:

  1. Confidence level (usually 95%)
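  2. Standard deviation (an estimate of the population standard deviation, σ)
  3. Precision (d): the maximum acceptable difference between the sample estimate and the true value, i.e. half the width of the confidence interval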

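The formula is given as:

$$n = \frac{z^2 \sigma^2}{d^2}$$

where z is the standard normal deviate for the chosen confidence level, σ (sigma) is the estimated population standard deviation and d is the precision.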

When solving, we input the value of z corresponding to a two-tailed 95% confidence level from the z distribution table (1.96), and take the other values from the problem.

Example:

A nutritionist wants to conduct a survey among a population of adolescent girls to determine their average daily protein intake. She wants the estimate to be within 5 grams of the true value in either direction. A confidence level of 95% is decided upon, and the population standard deviation is believed to be 20 grams.

Solution:

From the problem:

z = 1.96,

standard deviation (sigma) = 20, and

precision (d) = 5

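Substituting, n = (1.96^2)*(20^2)/(5^2) = 61.47, which is rounded up to 62 adolescent girls (a sample size must be a whole number, and rounding down would give slightly less than the desired precision).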
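The same calculation can be scripted. Below is a minimal Python sketch of the formula n = z²σ²/d² (the function name `sample_size_mean` is hypothetical), returning both the raw value and the value rounded up to the next whole subject.

```python
import math

def sample_size_mean(sd, d, z=1.96):
    """Return (raw n, rounded-up n) for n = z^2 * sd^2 / d^2.

    sd and d must be in the same units (here, grams of protein per day).
    """
    raw = (z ** 2) * (sd ** 2) / (d ** 2)
    return raw, math.ceil(raw)

raw, n = sample_size_mean(sd=20, d=5)
print(round(raw, 2), n)                    # 61.47 62

# Halving d to 2.5 g roughly quadruples n, because d is squared in the denominator
print(sample_size_mean(sd=20, d=2.5)[1])   # 246
```

Because d enters as a square, tightening the precision is the most "expensive" choice in this formula.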

Sample size calculation for estimating Proportions

Generally, this situation arises in the context of cross-sectional studies. Here, precision can be specified in one of two ways:

  1. using absolute precision
  2. using relative precision

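The formula for calculating sample size is given as:

$$n = \frac{z^2 \, p \, q}{d^2}$$

where z is the standard normal deviate for the chosen confidence level, p is the expected proportion (prevalence), q is the complement of p, and d is the precision.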

To calculate sample size we need the following values:

  1. Confidence level (usually 95%)
  2. Prevalence
  3. Precision (absolute or relative precision)

When solving, we input the value of z corresponding to a two-tailed 95% confidence level from the z distribution table (1.96), and take the other values from the problem.

Proportion/Prevalence (p) may be expressed in percentage or decimal notation. q is the complement of p and must be expressed in the same notation as p. Thus, if p is expressed in percentage, q = (100-p), and if p is expressed in decimals, q = (1-p). When prevalence is unknown, taking p as 0.5 (or 50%) will yield the largest sample size.
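Why p = 0.5 gives the largest sample size: z and d are fixed, so n grows with the product p × q in the numerator, and p × q is largest when p = q = 0.5. A quick check in Python, in decimal notation (the function name `pq` is illustrative):

```python
def pq(p):
    """Product p * q in decimal notation, where q = 1 - p."""
    return p * (1 - p)

candidates = [i / 100 for i in range(1, 100)]   # p = 0.01, 0.02, ..., 0.99
best_p = max(candidates, key=pq)
print(best_p, pq(best_p))   # 0.5 0.25
```

For comparison, with p = 50% and an absolute precision of 4%, the formula gives 1.96² × 50 × 50 / 4² = 600.25, i.e. 601, against about 384 for p = 80%.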

When the problem specifies absolute precision, use the absolute precision value as such in the denominator. When the problem specifies relative precision, first convert it to an absolute value by multiplying p by the relative precision (for example, 20% of p is 0.2 × p) and use the result as d [usually, relative precision is taken up to a maximum of 20% of p].
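The two ways of specifying precision can be handled in one small function. This is a minimal sketch, assuming p is always given in decimal notation (so that Note 2's rule about consistent notation is satisfied automatically); the function name and its `relative` flag are hypothetical.

```python
import math

def sample_size_proportion(p, precision, relative=False, z=1.96):
    """Return (raw n, rounded-up n) for n = z^2 * p * q / d^2, with p in decimals (0 < p < 1).

    If relative is True, precision is a fraction of p (e.g. 0.20 means 20% of p);
    otherwise it is an absolute difference in the same (decimal) notation as p.
    """
    q = 1 - p
    d = precision * p if relative else precision   # convert relative precision to absolute d
    raw = (z ** 2) * p * q / (d ** 2)
    return raw, math.ceil(raw)

print(sample_size_proportion(0.80, 0.04))                 # absolute precision of 4%
print(sample_size_proportion(0.80, 0.20, relative=True))  # relative precision of 20% (d = 0.16)
print(sample_size_proportion(0.80, 0.05, relative=True))  # 5% of p is 0.04, same as absolute 4%
```

The rounded-up sizes are 385, 25 and 385 respectively, matching the hand calculations in the examples that follow (384.16 and 24.01 before rounding).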

Example (Absolute precision):

An investigator wants to estimate the true immunization coverage in a community. From previous studies the immunization coverage is estimated to be 80%. The investigator desires a 95% confidence level and wants the results to be within 4% of the true value (i.e. an absolute precision of 4 percentage points).

Solution:

From the problem:

z = 1.96 (the value of the standard normal distribution corresponding to a significance level of 0.05, i.e. 5%, for a two-sided test)

p = expected proportion in the population = 80% [or 0.80]

q = (100-p) = 20% [or (1-p) = 0.2]

d = absolute precision = 4% [or 0.04]

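Substituting, n = (1.96^2)*(80*20)/(4^2) = 384.16, i.e. about 384; since the number of subjects must be a whole number, this is rounded up to 385.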

Example (Relative precision):

An investigator wants to estimate the true immunization coverage in a community. From previous studies the immunization coverage is estimated to be 80%. The investigator desires a 95% confidence level with a relative precision of 20%.

Solution:

From the problem:

z = 1.96 (the value of the standard normal distribution corresponding to a significance level of 0.05, i.e. 5%, for a two-sided test)

p = expected proportion in the population = 80% [or 0.80]

q = (100-p) = 20% [or (1-p) = 0.2]

d = 20% [or 0.20] of p = 0.2*80 = 16 [or 0.2*0.80 = 0.16]

Substituting, n = (1.96^2)*(80*20)/(16^2) = 24.01, which is rounded up to 25.

To obtain the same sample size as in the previous example (about 384), one must take a relative precision of 5%, since 5% of 80 is 4, the absolute precision used there.

Note 2: Always use the same notation consistently; do not use percentages in the numerator and decimals in the denominator.
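A short Python check of Note 2, using the immunization-coverage example (the variable names are illustrative): either notation used consistently gives the same n, while mixing them inflates the result by a factor of 10,000.

```python
z = 1.96

# Percentage notation throughout: p = 80, q = 20, d = 4
n_percent = z ** 2 * 80 * 20 / 4 ** 2
# Decimal notation throughout: p = 0.80, q = 0.20, d = 0.04
n_decimal = z ** 2 * 0.80 * 0.20 / 0.04 ** 2
# Mixed notation (percentages on top, decimal below): the error Note 2 warns about
n_mixed = z ** 2 * 80 * 20 / 0.04 ** 2

print(round(n_percent, 2), round(n_decimal, 2))  # 384.16 384.16
print(round(n_mixed / n_percent))                 # 10000
```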

Summary:

Sample size calculation for estimating a mean or a proportion requires:

  1. Confidence level (typically 95%, corresponding to a z value of 1.96)
  2. Estimate of population standard deviation or proportion/prevalence
  3. Precision (absolute or relative)

Of these, z is usually 1.96 and other values are provided in the problem.

I will discuss sample size calculation for analytical studies in the next article.

Useful Links:

Link to previous articles on Sample size calculation for Cross-Sectional studies:

https://communitymedicine4all.com/2014/05/11/sample-size-calculation-cross-sectional-studies/

https://communitymedicine4all.com/2018/06/23/how-to-calculate-sample-size-with-epi-info-7/

Link to previous article on Absolute and relative precision:

https://communitymedicine4all.com/2014/12/30/relative-and-absolute-precision-in-sample-size-calculation/
