Sample size calculation: Cross-sectional studies

Let us consider the estimation of sample size for a cross-sectional study.

In order to estimate the required sample size, we need to know the following:

p: The prevalence of the condition/ health state. If the prevalence is 32%, it may be either used as such (32%), or in its decimal form (0.32).

q: i. When p is in percentage terms: (100-p)

    ii. When p is in decimal terms: (1-p)

d (or l): The precision of the estimate. This could either be the relative precision, or the absolute precision. This will be discussed later in this post.

Za [Z alpha]: The value of z from the probability tables. If the estimates are normally distributed, then 95% of them will fall within approximately 2 standard errors of the mean. The exact value of z corresponding to this is 1.96 (from the standard normal variate tables).

The formula for estimating sample size is given as:

N = [(Za)^2 * p * q] / d^2

where the symbol ^ means ‘to the power of’ and * means ‘multiplied by’; that is, “Z-alpha squared into pq; upon d-square”

Substituting the value of Za, we get:

N = [(1.96)^2 * p * q] / d^2

We can round off the value of Za (1.96) to 2, to obtain:

N = [(2)^2 * p * q] / d^2

or, N = 4pq / d^2      that is, “4 pq by d-square”
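A minimal sketch of this formula in Python may help. The function name `sample_size`, its `scale` argument, and the precision of 5 percentage points are assumptions made only for this illustration. It checks that the percentage and decimal forms of ‘p’ lead to the same N, provided ‘d’ is expressed on the same scale as ‘p’.

```python
import math

def sample_size(p, d, z=1.96, scale=100):
    """N = z^2 * p * q / d^2, rounded up to a whole subject.

    p and d must be on the same scale: scale=100 for percentages,
    scale=1 for decimals. q is computed as (scale - p).
    """
    q = scale - p
    return math.ceil(z ** 2 * p * q / d ** 2)

# Prevalence of 32% with an assumed absolute precision of 5 percentage points.
n_percent = sample_size(32, 5)                # percentage form
n_decimal = sample_size(0.32, 0.05, scale=1)  # decimal form

print(n_percent, n_decimal)
```

Both calls should give the same value (335 for these inputs); mixing a percentage ‘p’ with a decimal ‘d’ would not.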

 

Example:

I wish to conduct a cross-sectional study on awareness of Hepatitis B among school children. A literature search reveals that other investigators have reported knowledge to range from 5% to 20% among students of grades 6 through 8. What should the size of my sample be?

 

The formula requires us to input the value of d (precision). If the absolute precision is known, there is no problem. However, often we can only input a relative precision. Where do we get the value of relative precision from?

Typically, relative precision is taken as a proportion of ‘p’. The maximum permissible limit is 20% of ‘p’.

In the above example, if ‘p’ is 20%, then ‘d’ will be (20/100)*20= 0.2*20= 4 {Taking a relative precision of 20%}.

This means that, if the true prevalence is close to 20%, the study will estimate ‘p’ to within 4 percentage points on either side {‘p’ +/- ‘d’: 16% to 24%} with 95% confidence.

That is, by taking a relative precision of 20% of ‘p’, the study will estimate the true awareness level with the intended precision, provided the actual prevalence is close to the assumed value. If the actual prevalence is much lower than assumed, however, the same sample will give a poorer relative precision, and the study will be unable to estimate it accurately.

Therefore, the larger the value of ‘p’ (prevalence), the larger the absolute value of ‘d’, keeping the relative precision fixed (say, at 20% of ‘p’). If the prevalence is 50%, ‘d’ (20% of ‘p’) would then be 0.2*50= 10 (as compared to ‘d’ = 4 when ‘p’ = 20%).

The reverse is also true: the smaller the value of ‘p’, the smaller the value of ‘d’. A smaller ‘d’ implies a larger sample size. Therefore, the choice of ‘p’ is crucial. 
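The link between ‘p’, ‘d’, and N can be sketched in Python. Substituting d = r*p (with r the relative precision as a fraction) into N = 4pq/d^2 gives N = 4q/(r^2 * p), so for a fixed r the sample size grows as ‘p’ shrinks. The function names below are hypothetical, and ‘p’ is assumed to be in percentage terms.

```python
def absolute_precision(p, relative=0.20):
    """Absolute precision 'd' taken as a fraction of the prevalence 'p' (in %)."""
    return relative * p

def sample_size_relative(p, relative=0.20):
    """N = 4pq / d^2 with d = relative * p, simplified to 4q / (relative^2 * p)."""
    q = 100 - p
    return 4 * q / (relative ** 2 * p)

# A smaller 'p' gives a smaller 'd' and therefore a larger N.
for p in (5, 20, 50):
    d = absolute_precision(p)
    n = sample_size_relative(p)
    print(f"p = {p}%  ->  d = {d:g},  N = {n:.0f}")
```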

We can now input the values in the formula to obtain the sample size:

For the calculation we will take ‘d’ as 4. This yields:

N= (4*20*80)/ (4*4)

  = 400; this sample size will enable us to estimate the prevalence to within 4 percentage points if it is close to 20% (that is, 16% to 24%).

If we took ‘p’= 5, then the sample size would be:

N= (4*5*95)/(1*1)      [‘d’= 0.2*5= 1]

  = 1900; this sample size will enable us to estimate the prevalence to within 1 percentage point if it is close to 5% (that is, 4% to 6%).
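The two calculations above can be reproduced with a few lines of Python. This is an illustrative sketch; the function name `sample_size` is an assumption, and the second figure on each line simply repeats the calculation with Za kept at 1.96 instead of rounded off to 2.

```python
import math

def sample_size(p, d, z=2.0):
    """N = z^2 * p * (100 - p) / d^2, with p and d in percent, rounded up."""
    return math.ceil(z ** 2 * p * (100 - p) / d ** 2)

# (p, d) pairs from the worked example: d is 20% of p in both cases.
for p, d in [(20, 4), (5, 1)]:
    n_rounded = sample_size(p, d)        # Za rounded off to 2
    n_exact = sample_size(p, d, z=1.96)  # Za kept at 1.96
    print(f"p = {p}%, d = {d}: N = {n_rounded} (Za = 2), N = {n_exact} (Za = 1.96)")
```

Rounding Za up to 2 gives a slightly larger N (400 versus 385, and 1900 versus 1825), which errs on the side of a bigger sample.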

So should I take ‘p’= 20% or ‘p’=5%?

That depends upon:

1. The location of the original study: if you are planning to conduct the study in an urban area, use the prevalence reported by studies conducted in urban areas, and vice versa.

2. The available resources (time, manpower, money, etc.). Aim for the largest feasible sample size. The size should be adequate to yield 80% power. Do not unnecessarily increase the sample size unless the intention is to obtain greater power. If so, please mention the same in the methodology section.

3. The results of your pilot study. If you have conducted a pilot study, the prevalence obtained from that study should be taken as ‘p’. This will be much more accurate than any other external value.

 

Note 1: If you have multiple objectives, you must calculate the required sample size for each objective, then choose the largest sample size thus obtained. This will ensure adequate power for all objectives, else the study will lack power for one or more objectives. That is, you may not be able to detect a significant result where it actually exists because you failed to include enough subjects to detect it.

Note 2: It is advisable to mention a range rather than a single value for sample size. This is standard practice in the west, but not in India. A range may be obtained by calculating the sample size for different values of ‘p’.
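Notes 1 and 2 can be sketched together in Python. The candidate prevalences below are assumed values spanning the 5% to 20% range from the example, the relative precision is fixed at 20% of ‘p’, and Za is rounded off to 2 as in the formula above; the function name is hypothetical. The same `max` step applies when each value of ‘p’ belongs to a different objective.

```python
import math

def sample_size(p, relative=0.20, z=2.0):
    """N for prevalence p (in %) with d taken as a fraction of p, rounded up."""
    d = relative * p
    n = z ** 2 * p * (100 - p) / d ** 2
    # round() guards against tiny floating-point overshoots before rounding up
    return math.ceil(round(n, 6))

# Assumed candidate prevalences (in %) covering the reported 5%-20% range.
candidate_prevalences = [5, 10, 15, 20]
sizes = {p: sample_size(p) for p in candidate_prevalences}

for p, n in sizes.items():
    print(f"p = {p}%: N = {n}")

# Note 2: report the range; Note 1: plan for the largest value.
print(f"Sample size range: {min(sizes.values())} to {max(sizes.values())}")
required = max(sizes.values())
```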

 

282 thoughts on “Sample size calculation: Cross-sectional studies”

  1. Tonia

    Dear Dr Roopesh, please I am conducting a cross sectional study on assessment of biomedical waste management and disposal practices among selected hospitals in Port Harcourt Nigeria. I am looking for a formula to use in calculating my sample size. Thanks

    1. drroopesh Post author

      Dear Tonia,

      The formula for cross-sectional studies is the same as that mentioned in the article.
      You will have to substitute the values of p, q, and determine the relative precision desired to compute the sample size.

      Regards,
      Dr. Roopesh

      1. Anonymous

        Dear Dr Roopesh ,
        How does one calculate a sample size for an unknown prevalence of a condition, especially if it is a pilot study?

        1. drroopesh Post author

          Dear Anonymous,

          For a pilot study one typically surveys up to 20 or 30 individuals. Their responses are not included in the main study later, though.
          Where the prevalence is unknown, one enquires with local practitioners and the general public to guesstimate the prevalence, in addition to reviewing the literature for clues about the same. One would often be able to obtain a possible range of values (from x% to y%, for instance). Next, one estimates sample sizes for the lower value and the higher value to determine the feasibility of conducting a study with the estimated values (it is desirable to use the higher sample size estimate), and finalizes the sample size.

          I hope this helps.
          Regards,

          Dr. Roopesh

  2. ekikere marcel

    I am doing a cross sectional study checking serum endothelin1 levels in heart failure patients and its correlates , comparing characteristics of patients with elevated levels with those with normal levels of endothelin1, please can i use this formula to calculate sample size?

  3. Tabe Glorias

    Hi Dr, I’m doing a research project titled “The effects of workers’ incentives on employee performance in higher institutions in Buea, Cameroon” and I’m using the cross sectional sampling technique. I’m confused about how to calculate my sample size from a population of 120 people.

    1. drroopesh Post author

      Dear Tabe Glorias,

      What type of study are you planning to conduct (qualitative or quantitative)? A qualitative study may be more appropriate from what you have written. Alternatively, you could simply analyze some metric(s) of employee performance using routinely collected data.

      Do let me know.
      Regards,
      Dr. Roopesh

  4. Hassan

    Dear Dr. Roopesh

    I am planning to conduct a cross sectional study of cardiovascular health behaviors and associated factors among coronary artery disease patients. However, there are no prior studies assessing the prevalence of CAD or these variables in my country, despite a very thorough literature review and enquiries with different health sectors.
    In this case, how can I calculate my sample size? Can I use the prevalence in other countries in the same region?

  5. Dr Ginsau

    Dear Dr Roopesh

    I will be conducting a cross-sectional study involving three hospitals in a metropolis. My total sample size is 160 for the whole metropolis using above formula. Is there a formula I can use to calculate the sample size for each hospital?
    Thank you.

    1. drroopesh Post author

      Dear Dr. Ginsau,

      You could use the formula provided to calculate overall sample size, then use stratified random sampling (with each hospital constituting one stratum) to determine the proportion of the overall sample size that must be obtained from individual hospitals. Within each stratum you will have to apply an appropriate sampling method to obtain the required sub-sample.

      I hope this helps.
      Regards,
      Dr. Roopesh

  6. Marissa

    Hi Dr. Roopesh,

    I am currently conducting a superiority trial evaluating the effect 4 drugs have on improving hemoglobin levels in anemic patients, to determine which one is best. I was wondering how to go about conducting a sample size calculation for this. In particular, what prevalence should I be searching the literature for? Should it be the prevalence of anemia?

    Best,
    Shrey

    1. drroopesh Post author

      Dear Marissa,

      The requirements of a clinical trial with four arms are very different from one with two arms. As the analysis will be complex, it is best to consult a statistician experienced with such designs beforehand. The effect size will be needed to estimate sample size here. Therefore, you should search the literature for trials that will help determine effect size (the magnitude of differences between drugs). You will also have to set the superiority margin a priori.

      I hope this helps.

      Regards,
      Dr. Roopesh

  7. Sophie Carrard

    Dear Dr. Roopesh,

    I am planning to carry a cross sectional survey about use of technologies of a specific population, which parameters should I take? Thanks for your answer and kind regards, Sophie

    1. drroopesh Post author

      Dear Sophie,

      It depends on the research gap, study population, and your research question (which in turn will influence the objectives). If the study population is students, then factors influencing academic performance may be important. The use of technology among elderly for specific needs may require inclusion of different parameters. Technology aided healthcare service delivery may warrant inclusion of other parameters. In essence, the choice of parameters is dictated by their influence on the outcome of interest.

      I hope this helps.
      Regards,
      Dr. Roopesh

  8. Lelisa

    Dear Dr. Roopesh, I’m studying the five-year prevalence and associated factors of HBV among pregnant women in Kelem Walaga Zone (Oromia), Ethiopia. How can I determine my sample size? Which value of p is permissible for this study?

  9. Mak

    Dear Dr. Roopesh,
    I am conducting a cross-sectional study on the prevalence of patient, diagnostic, and treatment delay and associated factors among breast cancer patients. How can I calculate the sample size, and which of the 3 p values should I take to calculate it? As a possible way of data analysis, can it be done with logistic regression?

    1. drroopesh Post author

      Dear Mak,

      Calculate sample size using each of the three prevalence values in turn, then choose the largest sample size as the sample size for the study. This way the study will be adequately powered for each of your three objectives.

      Logistic regression is used when you are dealing with a single categorical outcome variable and want to investigate the influence of one or more independent variables on the outcome variable. If your variables of interest fit this requirement, you can definitely perform logistic regression. Before performing logistic regression, however, it is important to perform univariate (counts and frequencies/ descriptive statistics) and bivariate analyses (t-test, chi-square test, etc.) to understand the data and discover patterns/relationships.

      Hope this helps.
      Regards,
      Dr. Roopesh

    2. Anonymous

      Hi Dr Roopesh,
      I am conducting a study to ascertain which screening method is more acceptable: Breast Self Examination or Clinical Breast Examination. How do I incorporate a 10% difference between the groups in my sample size calculation? Also, do I have to calculate a different sample size for BSE and CBE?

      1. drroopesh Post author

        Dear Anonymous,

        You may want to use the sample size calculator with Epi Info for this- it allows users to input prevalence for both comparison groups.
        Although separate sample size calculation is not required for BSE and CBE (the difference is accounted for in the calculation as mentioned above), you must perform separate sample size calculation for each objective and choose the largest feasible sample size estimate. This will ensure there is adequate power for each objective.

        I hope this helps.
        Regards,

        Dr. Roopesh

  10. Frank Gondwe

    Hi Dr Roopesh,
    I am doing a study to determine the proportion of pregnant mothers who received TTV in a particular location. The national proportion of pregnant mothers who received TTV is 23% and the national prevalence of pregnant women is 12%. Which of the two should I use to determine the sample size, and which formula should I use?

  11. Anonymous

    Hi Dr,
    I want to carry out a validation study of cervical cancer biomarkers in urine samples of patients, with healthy volunteers as controls, in my community and from a teaching hospital. The disease has a prevalence of 13.6 as reported. Please, how do I calculate my sample size? Thank you in advance.

    1. drroopesh Post author

      Dear Anonymous,

      If you are planning to conduct a validation study using a case-control study design, I would recommend using the sample size formula for case-control studies instead of cross-sectional studies. Alternatively, you could use software like G*Power to estimate sample size based on the main statistical analysis you intend to perform. However, you will need to supply the effect size (the [anticipated] magnitude of difference between the two groups).

      Regards,
      Dr. Roopesh

  12. Anonymous

    Hi Dr.Roopesh,

    I am planning to conduct a cross sectional survey study to evaluate women’s response to the dense breast notification in terms of (1) awareness of their breast density, (2) attained knowledge on breast density, (3) cancer worry, and (4) how each of breast density awareness, attained knowledge, and cancer worry would impact women’s intentions to be screened. How do I calculate the sample size needed?
