Sample size calculation: Cross-sectional studies

Let us consider the estimation of sample size for a cross-sectional study.

In order to estimate the required sample size, we need to know the following:

p: The prevalence of the condition/health state. A prevalence of 32% may be entered either as a percentage (32) or in its decimal form (0.32), provided q and d are expressed in the same terms.

q: i. When p is in percentage terms: (100-p)

    ii. When p is in decimal terms: (1-p)

d (or l): The precision of the estimate, that is, the acceptable margin of error on either side of the estimate. This could either be the relative precision or the absolute precision, as discussed later in this post.

Za [Z alpha]: The value of z from the probability tables. If the values are normally distributed, then 95% of the values will fall within 1.96 standard deviations of the mean (often rounded to 2). For a 95% confidence level, the value of z is therefore 1.96 (from the standard normal variate tables).

The formula for estimating sample size is given as:

N = [(Za)^2 * p * q] / d^2

where the symbol ^ means ‘to the power of’ and * means ‘multiplied by’; that is, “Z-alpha squared into pq, upon d-square”.

Substituting the value of Za, we get:

N = [(1.96)^2 * p * q] / d^2

We can round off the value of Za (1.96) to 2, to obtain:

N = [(2)^2 * p * q] / d^2

or, N= 4pq/ d^2      that is, “4 pq by d-square”
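
The formula can also be written as a small Python function. This is only a sketch: the function name `sample_size` and its arguments are illustrative (they do not come from any library), and it assumes that p and d are both given in percentage terms, so that q = 100 - p. The inputs p = 32 and d = 5 are used purely for illustration.

```python
import math


def sample_size(p, d, z=1.96):
    """Sample size for estimating a prevalence: N = z^2 * p * q / d^2.

    Assumption: p and d are both in percentage terms (e.g. p=32, d=5).
    """
    q = 100 - p
    n = (z ** 2) * p * q / (d ** 2)
    # Remove floating-point noise, then round up to a whole subject.
    return math.ceil(round(n, 9))


# With Za rounded to 2, this is the shortcut N = 4pq/d^2.
print(sample_size(p=32, d=5, z=2))
# With the unrounded Za = 1.96, the estimate is slightly smaller.
print(sample_size(p=32, d=5))
```

Rounding Za up to 2 always gives a slightly larger (more conservative) sample size than using 1.96.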

 

Example:

I wish to conduct a cross-sectional study on awareness of Hepatitis B among school children. A literature search reveals that other investigators have reported knowledge to range from 5% to 20% among students of grades 6 through 8. What should the size of my sample be?

 

The formula requires us to input the value of d (precision). If the absolute precision is known, there is no problem. However, often we can only input a relative precision. Where do we get the value of relative precision from?

Typically, relative precision is taken as a proportion of ‘p’. The maximum permissible limit is 20% of ‘p’.

In the above example, if ‘p’ is 20%, then ‘d’ will be (20/100)*20 = 0.2*20 = 4 {taking a relative precision of 20%}.

This means that the estimate is expected to lie within 4 percentage points on either side of the true prevalence {‘d’ is the margin on either side of ‘p’: +/- 4%, i.e. 16% to 24%}, at the 95% confidence level.

That is, by taking a relative precision of 20% of ‘p’, the study will achieve a relative precision of 20% or better if the actual prevalence is 20% or more. If the actual prevalence is less than 20%, however, the same sample will yield a poorer relative precision, and the study will be unable to estimate the prevalence as accurately.

Therefore, keeping the relative precision fixed (say, at 20% of ‘p’), the larger the value of ‘p’ (prevalence), the larger the absolute value of ‘d’. If the prevalence is 50%, ‘d’ (20% of ‘p’) would then be 0.2*50 = 10 (as compared to ‘d’ = 4 when ‘p’ = 20%).

The reverse is also true: the smaller the value of ‘p’, the smaller the value of ‘d’. A smaller ‘d’ implies a larger sample size. Therefore, the choice of ‘p’ is crucial. 
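
The link between ‘p’, ‘d’ and the sample size can be sketched in a few lines of Python. The helper names below are illustrative, and the sketch assumes ‘p’ in percentage terms, a relative precision of 20% and Za rounded to 2.

```python
import math


def absolute_precision(p, relative_pct=20):
    """Convert a relative precision (as % of p) into d, in percentage points."""
    return relative_pct * p / 100


def sample_size(p, d, z=2):
    """N = z^2 * p * q / d^2, with p and d in percentage terms."""
    n = (z ** 2) * p * (100 - p) / (d ** 2)
    return math.ceil(round(n, 9))


# A smaller p gives a smaller d, and therefore a larger N.
for p in (50, 20, 5):
    d = absolute_precision(p)
    print(f"p = {p}%  d = {d:g}  N = {sample_size(p, d)}")
```

For p = 50, 20 and 5 this gives d = 10, 4 and 1, and the required sample size rises from 100 to 400 to 1900.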

We can now input the values in the formula to obtain the sample size:

For the calculation we will take ‘d’ as 4. This yields:

N= (4*20*80)/ (4*4)

  = 400. This sample size will enable us to estimate the prevalence to within +/- 4 percentage points (16-24%) if the true prevalence is about 20%.

If we took ‘p’= 5, then the sample size would be:

N= (4*5*95)/(1*1)                                           [‘d’= 0.2*5= 1]

  = 1900. This sample size will enable us to estimate the prevalence to within +/- 1 percentage point (4-6%) if the true prevalence is about 5%.

So should I take ‘p’= 20% or ‘p’=5%?

That depends upon:

1. The location of the original study: if you are planning to conduct the study in an urban area, use the prevalence reported by studies conducted in urban areas, and vice versa.

2. The available resources (time, manpower, money, etc.). Aim for the largest feasible sample size. The size should be adequate to yield 80% power. Do not unnecessarily increase the sample size unless the intention is to obtain greater power. If so, please mention the same in the methodology section.

3. The results of your pilot study. If you have conducted a pilot study, the prevalence obtained from that study should be taken as ‘p’. This will be much more accurate than any other external value.

 

Note 1: If you have multiple objectives, you must calculate the required sample size for each objective, then choose the largest sample size thus obtained. This will ensure adequate power for all objectives, else the study will lack power for one or more objectives. That is, you may not be able to detect a significant result where it actually exists because you failed to include enough subjects to detect it.
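
As a rough sketch of Note 1, the required sample size can be computed for each objective and the largest value retained. The two objectives and their ‘p’ and ‘d’ values below are hypothetical, chosen only to illustrate the step; Za is rounded to 2.

```python
import math


def sample_size(p, d, z=2):
    """N = z^2 * p * q / d^2, with p and d in percentage terms."""
    n = (z ** 2) * p * (100 - p) / (d ** 2)
    return math.ceil(round(n, 9))


# Hypothetical objectives: (assumed prevalence p, absolute precision d), in %.
objectives = {
    "objective 1": (20, 4),
    "objective 2": (40, 5),
}

per_objective = {name: sample_size(p, d) for name, (p, d) in objectives.items()}
required = max(per_objective.values())

print(per_objective)
print("Required sample size:", required)
```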

Note 2: It is advisable to mention a range rather than a single value for sample size. This is standard practice in the west, but not in India. A range may be obtained by calculating the sample size for different values of ‘p’.
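
A range of the kind described in Note 2 could be obtained as follows. This is a sketch under stated assumptions: ‘p’ runs over the 5% to 20% reported in the example (10 and 15 are added here as illustrative intermediate values), the relative precision is 20% of ‘p’, and Za is rounded to 2. The function name is illustrative.

```python
import math


def sample_size_relative(p, relative_pct=20, z=2):
    """Sample size when d is set as a percentage of p (p in percentage terms)."""
    d = relative_pct * p / 100
    n = (z ** 2) * p * (100 - p) / (d ** 2)
    return math.ceil(round(n, 9))


# 5 and 20 are the reported limits; 10 and 15 are illustrative midpoints.
prevalences = [5, 10, 15, 20]
sizes = {p: sample_size_relative(p) for p in prevalences}

print(sizes)
print(f"Sample size range: {min(sizes.values())} to {max(sizes.values())}")
```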

 

282 thoughts on “Sample size calculation: Cross-sectional studies”

  1. Sekartaji

    Dear Dr Roopesh,

    I would like to conduct a cross-sectional study, and I am having difficulty finding the formula to calculate my sample size because the population is quite large, about 211,857. I am going to survey the knowledge, health beliefs and intentions of female adolescents towards HPV vaccination, and no previous study has been done on this topic in my country. Could you please give me some advice on this matter?

    Your help is greatly appreciated.

    Sincerely,
    Sekartaji

    1. drroopesh Post author

      Dear Sekartaji,

      If I understand the question correctly, you want to know how to compute sample size from a population of 211,857 individuals.

      Please use the prevalence from the following (and similar) articles to estimate the required sample size using the formula for cross-sectional studies:
      https://www.ncbi.nlm.nih.gov/pubmed/24188759

      In order to obtain your sample, you might consider cluster or multi-stage sampling.

      Hope this helps.

      Regards,
      Dr. Roopesh

    1. drroopesh Post author

      Dear David,

      It is not ethical or practical to unnecessarily inflate the sample size for any study.

      The commonest reason for wanting to do so would be to increase the power of the study to detect even minor differences of interest.

      Another reason could be the desire to capture as much variation in the population as possible. However, this could be achieved by adopting a good sampling method.

      Regards,
      Dr. Roopesh

  2. Achanya

    How do I calculate the sample size for a study in which the cases will be matched with controls, given that a previous study reported a prevalence of 32%?

    1. drroopesh Post author

      Dear Achanya,

      Do you intend to have 1:1 matching, or higher?

      I hope you realize that in a case control study one is comparing proportions of outcome between cases and controls.
      Therefore, for sample size calculation, you need to provide proportions for both cases and controls.

      Regards,
      Dr. Roopesh

  3. Winfred Nelson

    When calculating the sample size for three communities using Slovin’s formula, if you add the totals for the three (for example, 1474) and calculate, you get about half the size (315), which you can then redistribute using the proportion formula. However, if you calculate for each of the communities separately, with populations of 350, 774 and 350, you get a total of 624. Now, if I am using mixed methods, what number should I interview: 315 or 624?
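
    Slovin’s formula, mentioned in this comment, is n = N / (1 + N*e^2). A minimal Python sketch follows. The margin of error e = 0.05 is an assumption (the comment does not state it), chosen because it reproduces the pooled figure of 315 quoted above; the function names are illustrative.

```python
import math


def slovin(population, e=0.05):
    """Slovin's formula: n = N / (1 + N * e^2), rounded up."""
    return math.ceil(population / (1 + population * e ** 2))


def proportional_allocation(total_sample, strata):
    """Share a total sample across strata in proportion to their size."""
    total_population = sum(strata)
    return [round(total_sample * size / total_population) for size in strata]


communities = [350, 774, 350]  # populations quoted in the comment
pooled = slovin(sum(communities))

print(pooled)
print(proportional_allocation(pooled, communities))
```

    Because each share is rounded separately, the allocated numbers may not always add up exactly to the pooled total and may need a manual adjustment of one or two subjects.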

  4. Winfred Nelson

    In fact, the design is exploratory sequential, so I will do a questionnaire survey, generalise the results, and based on that select my qualitative components (FGDs, in-depth interviews, etc.). The three communities are made up of farmers who all practice rainfed farming, but farmers from 2 of the communities also practice dry season farming because they use small-scale dams during the dry season. Again, what are my justifications for interviewing 315 and not 624? Is that okay, so that I do not incur unnecessary cost?

      1. Winfred

        Thanks very much Dr. Roopesh. Have downloaded the materials and take a critical look at them. If there are any issues thereafter , I will get back. Have a good day.

  5. Qusay

    I would like to conduct a study which hasn’t been done in my country, so how can I estimate a sample size? My study is on the influence of body mass index on liver size.
    Regards,

    1. drroopesh Post author

      Dear Qusay,

      Even though the study hasn’t been conducted in your country, it is possible to estimate sample size.

      From literature, identify the findings reported by other investigators. They would likely have reported several measures- AP diameter/ Transverse diameter/ Volume, etc. Determine which measure is of importance to your study, and note the relationship between BMI and that specific measure.

      Identify a study that was conducted in a setting similar to your own (even if in another country, factors like setting (rural/ urban); economic status (developing/ developed); etc. could be similar).

      Then determine what proportion of subjects in that study have the relationship of interest. Use that to estimate sample size using the formula provided in the article above.

      Hope this helps.

      Regards,
      Dr. Roopesh

  6. Anonymous

    Hi
    I am going to conduct a cross-sectional study on the prevalence of cancer in women around the age of menopause with an ovarian cyst, looking at a biochemical marker called Ca 125.
    I am still unable to calculate the sample size.

    1. drroopesh Post author

      Dear Someone,

      Please perform a detailed review of literature and determine what proportion of perimenopausal women with ovarian cysts have elevated Ca 125 levels.

      Use that proportion to estimate sample size by substituting in the formula provided in the article above.

      If you get a range, estimate sample size using the lowest proportion, and use that to conduct your study if feasible.

      Regards,
      Dr.Roopesh

  7. Solomon

    I am going to do a survey of the banking industry, but their population sizes are different from one another. How am I going to deal with that? Please help.

    1. drroopesh Post author

      Dear Solomon,

      You could try using cluster sampling method to conduct your survey. Each Bank would constitute a cluster, and you could perform sampling proportionate to size.

      If restricted to branches of a single bank, clusters could be determined on the basis of zones or regions, with business handled (in money terms- $, ₹, etc.) determining the proportionate size of each cluster.

      Hope this helps.
      Regards,
      Dr. Roopesh

  8. Boniface

    Dear Dr Roopesh,

    I am conducting a cross-sectional study on the prevalence of cardiomyopathy among diabetes patients. A similar study done in my country showed a prevalence of 40%. I used the above formula for cross-sectional studies with a relative precision of 20% (of 40%). My university research committee asked why I had chosen relative precision instead of absolute precision. Initially, when I was writing my proposal, I tried absolute precision and it gave me a high sample size of 334. When I used a relative precision of 20% (of 40%), it gave me 144, which I preferred (due to the limited study budget). How do you think I should answer the above question? And could you help me specifically with reasons for using relative precision instead of absolute precision?

  9. Joshua C

    I am conducting research on sleep disorders in children with enlarged adenoids and tonsils in a hospital in Nigeria. Kindly help me with the type of study design and the sample size calculation, since I could not find a similar study or prevalence.

  10. Ramkumar

    Hello Dr. Roopesh, I am planning to conduct a cross-sectional study of the clinico-pathological and demographic profile of TB cervical lymphadenopathy, without follow-up, for a minimum of 1 year … I don’t know how to calculate the sample size … previous studies exist, but they have different sample sizes … please help me to calculate a sample size of around 100.

      1. Anonymous

        objectives in the mean of demographic and clinico pathological profile , study population is op patients and in ward patients , outcome measures based final reports

      2. Anonymous

        THESIS PROTOCOL

        CLINICO-PATHOLOGICAL AND DEMOGRAPHIC PROFILE OF
        TUBERCULAR CERVICAL LYMPHADENOPATHY
        Thesis Protocol Submitted For
        DIPLOMATE OF NATIONAL BOARD
        (RESPIRATORY MEDICINE)

        AIMS AND OBJECTIVES

        PRIMARY OUTCOME

        • TO STUDY THE CLINICO-PATHOLOGICAL AND DEMOGRAPHIC
        PROFILE OF TUBERCULAR CERVICAL LYMPHADENOPATHY PATIENTS

        MATERIAL AND METHODS

        STUDY DESIGN

        The present study is proposed to be a cross-sectional study. It will be conducted at the NATIONAL INSTITUTE OF TB AND RESPIRATORY DISEASES among patients in both the OPD and IPD. Patients enrolled between Aug 2017 and Dec 2018 will be part of the study.

        STUDY METHOD

        Patients attending the OPD and patients in the IPD will be asked for a detailed history, and a thorough clinical examination will be done. This will be followed by all routine investigations and special tests such as the Mantoux test, USG abdomen, and FNAC of the lymph node with direct smear, cytopathological examination and culture for MTB, all done at NITRD. Finally, the reports will be analyzed as in the proforma.

        SAMPLE SIZE AND STUDY PERIOD

        The study will include the patients expected between Aug 2017 and Dec 2018 who give consent and are eligible for the study.

        CRITERIA FOR SELECTION OF PATIENTS

        Inclusion criteria:
        • All patients who agree to participate in the study.
        Exclusion criteria:
        • Patients who are not willing to participate in the study.
        • Patients with a primary diagnosis of other diseases (e.g. cancer, sarcoidosis, pyogenic infections, etc.).
        REVIEW OF LITERATURE
        DEMOGRAPHIC INCIDENCE;
        Mm rahman et al Out of 60 patients 40 were female and 20 were male and female male ratio was 2: 1. The most vulnerable age group was the 2nd decade 23(38.33%). The present study shows that the peak age incidence is 2nd decade of life (38.3%) and the 2nd highest incidence 3rd decade with 30%.

        Hussain et al out of 50 patients Male to female ratio is 2.1:1 most common during 2nd and 3rd decade of life (52% )with a peak incidence in the 2nd decade (32%).

        Devendra et al Out of 118 cases was found to be more prevalent in females as 30 out of 54(55.55%). In this study, we found out that TBL are commoner in 13-30 age groups, 83.33% .

        Vasuda et al out of 227 There were 113 (49.7%) female and 114 (50.3%) The maximum number [167 (73.6%)] of cases suggestive of cytomorphology of tubercular lymphadenitis were aged in the range of 11–30 years.

        Shaukat et al total 110 cases Out of these 42(38.1%) were males and 68(61.8%) were female. The majority of patients were in the age range between 10 to 30 years and next group belong to the 4th decade.
        Rasool et al Total 46 of which cases Female gender was found in the majority 28(61.87%) while male gender was 18(39.13%).
        Soumya et al A total of 63 patients were enrolled in the study of which 25 were males and 38 females The most commonly affected group in the study was 15–24 years age comprising of 57.1% (36 cases).

        Mohammed Ali et al: Of 115 cases, there were 71 males and 44 females. The male to female ratio in the present study was 1.61:1. The majority of patients affected were in the age group of 13 to 20 years (39.13%), followed by 21 to 30 years (28.70%). The least affected age group was 61 to 70 years (1.74%).

        Chaitali et al Data of 80 patients was analyzed in this study.Gender wise 57 (71.3%) were females and remaining 23 (28.7%) were males.

        Naresh et al Males 48% and females 52%. In 50 cases the disease commonly affected the affected were 2nd decade 18% and 3rd decade 8% respectively. Commonest age group affected is between 11and 20> 21, and 30 closely followed by 31 and 40 years .

        CLINICAL PRESENTATION

        Karthi et al, Majority did not have symptoms 16 cases (31.4%) out of 51 showed symptoms fever was the most common , seen in 31% of cases, followed by malaise in 18% . It was observed 8 cases (15.6%) out of 51 cases had a positive history contact with tb . It was observed that posterior triangle was the commonest to get involved (31.3%) followed by upper deep jugular (21.5%). Levels 1, 3 and 4 were equally involved.And the majority of nodes (78.4%) were 4 cm. It was seen in 41 cases out of total 51 cases (80.3%) had U/L involvement. The remaining (19.7%) had bilateral involvement. and multiple node involvement in 39 cases (76.5%) while 12 cases (23.5%) showed single. Matting was observed in 14 of the 51 cases (27.4%). discrete lymph nodes which was present in 37 of the 51 cases (29.7%).

        Mohankumar et al 18 cases (27.69%) out of 65 cases of tubercular showed presence of symptoms. It was observed that only 4 cases (6.15%) out of 65 cases had a positive history.It was observed that the majority of nodes affected in tuberculosis (80%) were less than 4 cm in size it was observed that Upper jugular group (level-2) was the commonest to get involved in tuberculosis (30.76%) .2-5 Among the cases only 15.39% cases presented with bilateralnode

        mmrahman et al Out of 60 patients BCG vaccination had a significant protective role; 19(31.67%) were vaccinated and 41 (68.33%) wereTuberculin test was positive in 44(73.34%) and negative in 2 (3.33%) and doubtful in 14 (23.33%).The common presentations were neck swelling 60 (100%), fever 40 (66.67%) and night sweat in 30(50%), wt loss 21(35%).

        Devendra et al In this study 1-2 cm size group were found to be having equal chances of tubercular and non-specific reactive lymphadenitis but 78.94% lymph nodes with size >2 cm were positive for tubercular lymphadenitis .Fever> anorexia>malaise>night sweats & weight loss was commoner symptoms in TBL

        Vasuda et al The study having 227 tb cervical lymphadenopathy pts
        The majority of the patients were otherwise healthy adults, and constitutional symptoms were present in 13% only. All the groups of cervical lymph node were involved including right and left cervical, posterior triangle, submental, submandibular, and supraclavicular regions.

        Zyedzulfiquer et al Study having 242 cases of tb cervical lymphadenopathy
        Most common constitutional symptoms are fever as wt loss(75%), night sweats(72%), LOA(45%).Most of the patients don’t have active contact only 28% had contact and 28% had past h/o tb treatment duration of lymphadenopathy in most of cases was less than 3 months.The size of Lymph Node was more than 1 cm and less than 2 cms in 70% of the patients. Gross appearance of Lymphadenopathy was multiple mattered in 65% of the patients with no tenderness in 78%

        Salman et al study population is 50 patients.Symptoms vary from 6 months to 2 yrs but m/c 7 wks to 3months 39 patients didn’t have any constitutional symptoms and remaining m/c had fever>malaise> LOA. H/O tb contact history was present in 19 patients. Examination showed b/l seen in 60% and location m/c post triangle(70%) f/b upper deep cervical(24%) and most of the lymphnode size was <1.15cm.

        Shaukat et al study population was 80 patients. In our study fever and weight loss are common complaint 52.7% and 63.6% respectively And b/l more common than unilateral and anterior group of nodes are more common than post group of nodes
        Rasool et al Multiple lymphadenitis was found in majority of the cases 26(56.53%), while 20(43.47%) cases were found with presentation.We found lymph node less than 3 CM found in 31(67.39%) cases and more on of single lymphadenitis than 3 CM were in15 (32.61%) cases. Fever was commonest clinical feature in 76% cases, following by swelling, abscess, solid nodes, weight loss, loss of appetite and others were noted with percentage of 55.69%, 39.13%, 45.65%, 58.69% and 21.73% respectively

        CYTO PATHOLOGICAL, CULTURE AND DIRECT SMEAR EXAMINATION
        Karthikeyan et al Out of the 51 histopathologically confirmed cases of tuberculous cervical lymphadenitis, a diagnosis of tuberculosis was made in 43 cases by FNAC. The other 7 cases were diagnosed as chronic non-specific lymphadenitis. There were no false positive cases on FNAC. 44 cases were true negative for tuberculosis. The sensitivity and specificity of FNAC for diagnosing tuberculous lymphadenitis is therefore 86% and 100% respectively .

        Mohan kumar et al In the present study, both sensitivity and specificity of FNAC for for tuberculosis sensitivity was only 86.20% and specificity was 100%.

        Mm Rahman et al: In this study, among 60 patients, 44 (73.34%) were tuberculin positive (more than 10 mm induration), 14 (23.33%) were doubtful (between 1-10 mm) and 2 (3.33%) were negative (no induration seen). Among the 60 patients with tuberculous cervical lymphadenitis, 51 (85%) had caseation.

        Vasuda et al: In this study, the cytomorphological features observed in the cases were caseating epithelioid granulomas [47.6% (108/227)], granulomatous lymphadenitis [33.9% (77/227)], necrotizing lymphadenitis [1.8% (4/227)], and necrotizing suppurative lymphadenitis [16.7% (38/227)]. ZN staining for AFB was done in all the cases. Smear positivity for Mycobacterium sp. by the conventional ZN method was 19.4% (44/227). AFB positivity was the maximum (44.7%) in necrotizing suppurative lymphadenitis.
        The appearance of aspirates found most commonly was blood mixed in 68.3% of cases, followed by whitish cheesy material in 21.1%, pus-like in 6.2%, and yellowish in 4.4%. AFB positivity was the maximum (42.8%) in pus-like aspirate.

        Salman et al The study having population of 50 cases of which 41(82%) cases have been confirmed by FNAC. AFB seen in by direct smear examination in 12 cases and 9(18%) needed excisinal biopsy to confirm the diagnosis.

        Soumyajit et al FNAC was diagnostic in 42 cases (73.7%) where epitheloid granuloma and Langhan’s cells with or without necrosis was seen. The aspirate from affected lymph nodes did not reveal AFB in most of the cases. Only 23 samples (40.4%) revealed AFB after ZN staining. FNAC was non specific in 15 samples which further required incision/ excision biopsy for diagnosis.

        PROFORMA

        CASE NO: OPD REG NO:
        NAME: FATHER/HUSBAND NAME:
        AGE: SEX:
        OCCUPATION: MARITAL STATUS:
        AREA:

        PRESENTING COMPLAINT: DURATION
        LYMPHNODE ENLARGEMENT:
        FEVER:
        COUGH:
        WEIGHT LOSS:
        LOSS OF APPETITE:
        CHEST PAIN:
        OTHERS POSITIVE HISTORY:

        PAST HISTORY:
        TUBERCULOSIS:
        HYPERTENSION:
        DIABETES:
        HIV:
        SURGICAL INTERVENTION:
        BLOOD TRANSFUSION:
        OTHER PAST SIGNIFICANT HISTORY:

        PERSONAL HISTORY:
        H/O SMOKING:
        H/O ALCOHOL:
        H/O DRUG ABUSE:
        BLADDER AND BOWEL COMPLAINT:
        H/O CONTACT WITH TB:
        NO OF CHILDREN:

        TREATMENT HISTORY:
        H/O ATT:
        ANY OTHER MEDICATION:

        GENERAL EXAMINATION:
        TEMPERATURE:
        B.P: PULSE: RESPIRATORY RATE:
        PALLOR: ICTERUS: CLUBBING: CYANOSIS: PEDAL EDEMA:
        BCG SCAR:
        LYMPHNODE :

        SYSTEMIC EXAMINATION
        CVS:

        RS:

        P/A:

        CNS:

        INVESTIGATIONS REPORTS;
        HB: TLC: DLC: ESR:
        Blood sugar(random): UREA: CREATININE:
        S.BILIRUBIN:Total- Direct- SGOT/SGPT/ALP:
        S.PROTEIN:Total- Albumin-
        URINE:Albumin- sugar- microscopy
        Sputum for AFB(D/S):
        X-ray CHEST:
        USG abdomen:
        FNAC report:
        AFB by D/S:
        CULTURE report:

        1. drroopesh Post author

          I am not sure I understand what exactly you intend to do.

          You will recruit patients with tuberculous cervical lymphadenopathy, and obtain some information- this much is clear.

          What is not clear is what question you are trying to answer by collecting that information. That is why I requested you to provide your research question in PICO format.

          Please note that unless you provide an answerable research question, I will be unable to provide additional assistance.

          Regards,
          Dr. Roopesh

  11. adaze woghiren

    Hello, I am trying to correlate two variables in estimating the severity of chronic liver disease. How do I go about calculating my sample size, since it is a cross-sectional study that I am conducting? Thanks.

    1. drroopesh Post author

      Dear Adaze,

      Please use the formula provided in the article above: 4pq/l^2.

      If you provide details of your objectives and outcome variables, I might be able to provide specific guidance.

      Please note that I will be very busy this week, so might not be able to respond before the weekend.

      Regards,
      Dr. Roopesh

  12. sara

    hello dr.
    My study is to identify the number of stem cells in a diabetic patient group and a non-diabetic group, and then compare the two groups. So is it a comparative cross-sectional design or a case-control design? And how can I estimate the sample size?

    1. drroopesh Post author

      Dear Sara,

      What is your research question? The study design is determined by the research question.

      Please formulate your research question using the PICO criteria and revert to me.

      Please note that I will be very busy over the coming week, hence might be unable to respond before the weekend.

      Regards,
      Dr. Roopesh

      1. sara

        thanks dr. for replying…
        my research question is:
        in mild gestational diabetic women, is the number and quality of the haematopoietic stem cells of umbilical cord blood affected compared to non-gestational diabetic women?

  13. bonifacelumori

    Dear Roopesh,

    I am still confused about sample size calculation. My study is on the prevalence of, and factors associated with, cardiomyopathy among diabetic patients. I wanted to use a prevalence of 67.8 (from a similar study done in my country). Please show me what your sample size would be, so that I can compare it with what I got (which I think is not correct). Use absolute precision and a 95% confidence interval.

    With regards,
    Boniface

  14. Kunle

    Hi, kindly clarify which formula I need to use to calculate the sample size for my study “Cryptosporidium parvum among HIV positive and Seronegative subjects attending National Hospital, Ilado”.
    The main objective is to compare the prevalence of C. parvum in these groups of people. The study design is comparative cross-sectional.

  15. Anonymous

    Dear dr Roopesh,
    I am Fawad Qazi, starting research on “cardiopulmonary fitness in DOW medical university” using the 1-mile walk test (Rockport test). I need your help to calculate the sample size.

    1. drroopesh Post author

      Dear Fawad,

      You will have to state your research question in PICO format, objective(s) and outcome measure(s).

      Please go through previous comments in this thread, and related articles on this blog as well.

      Regards,
      Dr. Roopesh

  16. annonymous

    Dear Dr Roopesh, I am conducting a comparative study of sexual abuse among adolescents in and out of school. I want a sample size of 520 for each group. What prevalence can I use to arrive at that using the sample size formula for comparing proportions? Please help. Thanks.

    1. drroopesh Post author

      Dear Anonymous,

      Please note that one does not decide the sample size in advance and then reverse engineer to determine the prevalence.

      What you need to do is determine the prevalence from literature, then use the prevalence values thus obtained to estimate sample size. This needs to be done for each objective. Finally, select the largest sample size estimate obtained as your required sample size.

      Regards,

      Dr. Roopesh

  17. Ann

    Dear Dr Roopesh,
    i would like to conduct a comparative cross sectional study comparing the mean of an analyte in 4 different cohort of patients, how do i calculate the sample size?
    Thanks.
    Regards

      1. salman karim

        Dear Dr Roopesh, please can you guide me about how we calculate sample size for behavioral sciences studies ( related to student psychology).

        thank you salman karim

        On Sat, Oct 28, 2017 at 5:55 AM, communitymedicine4asses wrote:

        > drroopesh commented: “Dear Ann, Please state your research question (in PICO format), and objective(s). Regards, Dr. Roopesh”

        1. drroopesh Post author

          Dear Salman Karim,

          The calculation depends upon the type of study and variables under consideration.

          The simplest approach is to determine sample size based on the type of study, as described here.

          The procedure would remain the same:
          1. State your research question (PICO format)- determines the study design
          2. State your objectives- provides information about the outcome variable(s) under consideration.
          3. From literature, determine values for outcome variable(s)
          4. Substitute values in appropriate formula
          5. Obtain required sample size.

          I hope this helps.

          Regards,
          Dr. Roopesh

  18. Hasan

    dear Dr. Roopesh
    I am going to conduct research to identify the factors affecting patient satisfaction with rehabilitation service quality. I am using a cross-sectional study design with a questionnaire, but I have faced challenges in calculating my sample size. Could you give me any ideas, please?

    1. drroopesh Post author

      Dear Priti,

      All epidemiological studies include comparison(s). Therefore, the study design is ‘Cross-Sectional Study’, not ‘Cross-Sectional Comparative Study’.

      You may use the formula provided in the article to estimate sample size.

      Regards,
      Dr. Roopesh

  19. priti sapkota

    Calculation of sample size:
    Based on the study conducted by Johncy SS, Samuel TV, Jayalakshmi MK, Dhanyakumar G, Bondade SY. Prevalence of respiratory and non-respiratory symptoms in female sweepers, the sample size will be calculated for two proportion cases and controls. The study shows respiratory symptoms cough in 13.3% of controls and 36.6% of cases.
    Hence, Prevalence in cases (P1) = 0.366
    Prevalence in Controls (P2) = 0.133
    q1= 1-P1 =1-0.366=0.634
    q2=1-P2= 1-0.133 = 0.867
    Zα/2 at 95% = 1.96
    Zβ at 80% power = 0.842
    Ṕ = (P1+P2)/2 = (0.366 + 0.133)/2 = 0.2495
    Ǭ = 1 - Ṕ = 1 - 0.2495 = 0.7505
    Sample size (n) = {Zα/2 * √(2ṔǬ) + Zβ * √(P1q1 + P2q2)}^2 / (P1-P2)^2

    = {1.96 * √(2×0.2495×0.7505) + 0.842 * √(0.366×0.634 + 0.133×0.867)}^2 / (0.366-0.133)^2
    = 53 in each group
    Adding around 10% for non-response, a total of 118 samples in which 59 in sanitation workers and 59 in comparison group will be enrolled.

    Is this calculation of sample size correct for a cross-sectional study?
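
    The arithmetic in this comment can be checked with a short Python sketch of the two-proportion formula it quotes. The function name is illustrative, and the sketch assumes Zα/2 = 1.96 and Zβ = 0.8416 (the standard normal value for 80% power), with the result rounded up to a whole subject per group.

```python
import math


def two_proportion_sample_size(p1, p2, z_alpha=1.96, z_beta=0.8416):
    """Per-group n for comparing two proportions (p1, p2 as decimals)."""
    p_bar = (p1 + p2) / 2
    q_bar = 1 - p_bar
    pooled_term = z_alpha * math.sqrt(2 * p_bar * q_bar)
    separate_term = z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    n = (pooled_term + separate_term) ** 2 / (p1 - p2) ** 2
    return math.ceil(n)


# Proportions quoted in the comment: 36.6% of cases, 13.3% of controls.
print(two_proportion_sample_size(0.366, 0.133))
```

    With these inputs the sketch returns 53 per group, in line with the figure given in the comment.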

    1. drroopesh Post author

      Dear Priti,

      Please note that in cross-sectional studies as well as case control studies, there is no need to adjust for non-response, since there is no follow-up, and those who don’t wish to participate are simply excluded from the study.

      Regards,
      Dr. Roopesh

  20. priti sapkota

    Thank you. Could you please provide an example calculation of sample size for a cross-sectional study where a comparison is used?

    1. drroopesh Post author

      Dear Priti,

      As I said earlier, comparisons are integral to epidemiological studies. For the purpose of sample size estimation, one only needs details of the variables in the formula described in the article.

      Having said that, one could estimate sample size based on the type of variables under study and the differences between them:
      difference between two means,
      difference between two proportions, and so on.

      A good example of how that works is using the free tool G*Power:
      http://www.gpower.hhu.de/en.html

      (Related paper, PDF: DOI 10.3758/BF03203630)

      You could also read the following article:
      https://communitymedicine4asses.wordpress.com/2014/04/18/sample-size-calculation-two-ways-of-approaching-it/

      I hope this helps.

      Regards,
      Dr. Roopesh

  21. Mira

    Dear Dr Roopesh,

    I’m conducting a cross-sectional study among a population of 162 workers. I calculated my sample size using the formula for two proportions (P1 and P2), but the sample size obtained is 263, which is bigger than the population. May I know how I can correct the sample size so that it is less than the population?

    Thank you and regards,
    Mira

  22. Mira

    Dear Dr Roopesh,

    I have a population of 162 workers for my study, but the calculated sample size is 263, which is bigger than my population. May I know how I can correct the sample size to be smaller than the population? Thank you.

  23. Hassan Benya

    I would like to conduct a cross-sectional study, but I am confused about how to calculate my sample size because the population is very large (about 1,055,964). I am doing a hypothetical study project titled “Is there a relationship between socio-economic status and the risk of acquiring hepatitis B infection in Freetown, Sierra Leone”. Kindly help me with this.

    1. drroopesh Post author

      Dear Hassan,

      The calculation of sample size remains largely unchanged. What you need to determine is the sampling method. Perhaps, multi-stage sampling will be suitable in your case.

      Regards,
      Dr. Roopesh

  24. hoodo

    Hello Dr,
    My study design is a cross-sectional study: I will collect milk samples from various milk vendors, and use interviews and observation to assess the milking process and milk handling practices of milk vendors and milk producers.
    Which sample size should I use?

  25. Ange

    Hi Dr. Roopesh

    I am doing a cross-sectional study to determine bone mass in the lower limb of post-menopausal women and compare it with bone mass at other sites in the same women. How do I calculate the sample size for this project?

  26. Pingback: How to calculate Sample Size with Epi Info 7: Cross-Sectional studies | communitymedicine4asses

  27. lulu

    I am doing a cross-sectional study about the level of fruits and vegetables among adolescents in day and boarding schools. How do I calculate the sample size if I don’t have the value of p?

    1. drroopesh Post author

      Dear Lulu,

      In case you don’t have a value of prevalence from literature, you may estimate the same from observations (yours and others’). Preferably, you must guess a maximum possible and minimum possible value, then calculate sample size using both values. That will give you a range of sample sizes. Choose one that is most feasible.
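
The approach of bracketing an unknown prevalence can be sketched in Python as follows. This is only an illustration: the function name `sample_size_4pq` is hypothetical, the 5% and 20% guesses are placeholders rather than values for any particular study, and the precision is assumed to be 20% of p, as in the article.

```python
import math

def sample_size_4pq(p_percent, relative_precision=0.20):
    """Sample size from N = 4pq / d^2, with p and q in percent."""
    q_percent = 100 - p_percent
    d = relative_precision * p_percent  # precision as a proportion of p
    n = 4 * p_percent * q_percent / d ** 2
    # Round away floating-point noise before rounding up to whole subjects.
    return math.ceil(round(n, 6))

# Hypothetical lowest and highest plausible prevalence (in %).
for guess in (5, 20):
    print(guess, sample_size_4pq(guess))
```

With these placeholder guesses the sketch gives 1900 and 400, so the lower prevalence drives the larger sample size. The feasible choice lies somewhere in that range.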

      I hope this helps.

      Regards,
      Dr. Roopesh

  28. Ernest Nwachukwu

    Dear Dr. Roopesh,

    I am carrying out a study to describe the mobility profile of community-dwelling older adults in a region with a population of about 146,647 older adults.

    Please could you explain to me how to calculate the appropriate sample size for the study.

    Kind regards.

  29. Ernest Nwachukwu

    Dear Dr Roopesh,

    Here is my research question in PICO format:
    What are the mobility profiles (patterns) of community-dwelling older adults in the southeastern part of Nigeria?

    P: community-dwelling older adults
    I: Test with the Short Physical Performance Battery and the 6-minute walk test. Then interview with the Preclinical Disability Scale and the Lower Extremity Functional Scale.
    C: none
    O: mobility profiles (i.e. no mobility limitation, preclinical mobility limitation, mild mobility limitation, moderate mobility limitation or severe mobility limitation) or performance in the test.

    I hope this helps you in guiding me through the calculation of the appropriate sample size. Please remember the population size of the older adults in this region is about 146,647.

    Kind regards!

    Ernest

      1. Ernest Nwachukwu

        Dear Dr Roopesh,

        Thank you so much for the link. I have gone through two of the articles and I have come up with the following research question in PICO format:

        “Among community-dwelling older adults, how prevalent is pre-clinical disability?”

        I hope I got it right this time.

        Please help me on how to calculate the appropriate sample size for this study taking the population of older adults in this region to be 146,647.

        Kind regards!

        1. drroopesh Post author

          Dear Ernest,

          Since your research question seeks to determine the prevalence of pre-clinical disability, the study design would be cross-sectional study.

          In order to estimate sample size, one would require the prevalence of pre-clinical disability in a similar population; or a rough estimate of prevalence from clinical experience.

          The formula 4pq/l^2 will yield the sample size for a cross sectional study, where
          p: prevalence of preclinical disability (in %)
          q: (100-p)
          l: relative precision (a proportion of p; up to a maximum of 20% of p).

          You could obtain values of p from various studies, and take the largest sample size that is practical for you.
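
As a worked illustration of the formula above (the value p = 20% is only an assumed figure for the arithmetic, not an estimate of the prevalence of preclinical disability):

```latex
\begin{aligned}
p &= 20, \quad q = 100 - p = 80, \quad l = 0.20 \times p = 4 \\
N &= \frac{4pq}{l^2} = \frac{4 \times 20 \times 80}{4^2} = \frac{6400}{16} = 400
\end{aligned}
```

Note that when l is fixed at 20% of p, the formula reduces to N = 100q/p, so a smaller value of p yields a larger sample size.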

          Please also go through the following:
          https://communitymedicine4asses.com/2018/06/23/how-to-calculate-sample-size-with-epi-info-7/

          Regards,
          Dr. Roopesh

          1. Ernest Nwachukwu

            Dear Dr. Roopesh,

            This has been most helpful to me. I have already invited several of my friends carrying out research works to visit this site.

            Thanks a million times.

            Regards
            Ernest

          2. Ernest Nwachukwu

            Dear Dr Roopesh,

            I am happy to visit this site again.

            I would like to know if there is a scholarly or widely accepted name for this formula for calculating sample size for cross-sectional studies: 4pq/l^2.

            Kind regards

            Ernest

            1. drroopesh Post author

              Dear Ernest,

              The formula doesn’t have a particular name. Nevertheless, it will be found in any biostatistics/ research methodology text dealing with sample size estimation.

              Regards,
              Dr. Roopesh

  30. Nilusha Gayan Mahakumbura

    Dear Dr. Roopesh,

    I’m carrying out a descriptive cross-sectional study and I want to take a sample from a finite population. Which sample size formulas are best for that?

    Thank you!

    best regards!

  31. Nneoma

    Dear Dr. Roopesh,
    I’m carrying out experimental work on the effect of aerobic exercise on the self-esteem of overweight and obese youth in university, and I need a good sample size calculation.
    Thanks

    1. drroopesh Post author

      Dear Nneoma,

      Apologies for the delay in responding.

      Please provide me your research question in PICO format, as well as objectives.

      Regards,
      Dr. Roopesh

  32. Nneoma

    Dear Roopesh
    ‘What effect does aerobic exercise have on the self-esteem and self-perceived body image of overweight and obese undergraduate students?’
    Sir, this is my research topic in PICO format. It is an experimental study that involves two groups.
    Thank you
    Nneoma

    1. drroopesh Post author

      Dear Nneoma,

      What is the comparison group? Are you comparing between overweight and obese students, or are they a single group that you will compare with another group (normal)?

      For calculation of sample size of an experimental study, you need to specify the type of RCT- superiority/ equivalence/ non-inferiority; and provide an estimate of the effect size (how much of a difference do you expect between the two groups?).

      Please share the link to the main reference article you wish to use estimates from.

      Regards,
      Dr. Roopesh

  33. Nneoma

    Dear Dr. Roopesh
    There are 2 groups: one undergoes exercise as the intervention while the other is a control group. It is an RCT that has both overweight and obese participants in each group. I’m really confused about the difference to expect between the two groups. As for the article, there seems to be no similar research in my country.
    Thanks
    Nneoma

  34. Nneoma

    Dear Dr. Roopesh
    There are 2 groups: one undergoes exercise as the intervention while the other is a control group. It is an RCT that has both overweight and obese participants in each group. I’m really confused about the difference to expect between the two groups. As for the article, there seems to be no similar research in my country. It is a non-inferiority RCT.
    Thanks
    Nneoma

    1. drroopesh Post author

      Dear Nneoma,

      Your outcome is self-esteem, which (I suspect) will be assessed by a tool that assigns scores. What I seek is the difference in scores between those who undergo exercise (intervention) and those in the control arm. You may obtain this information from existing studies (these need not have been done in your setting, but must have a similar study population in terms of eligibility criteria), or from clinical observation (you may guess the difference based on observations from practice).

      I hope this helps.

      Regards,
      Dr. Roopesh

      1. Nneoma

        Dear Roopesh
        I still couldn’t make anything out of it. Is there no other way to calculate the sample size for a non-inferiority RCT that involves only two groups? Is there any calculator or formula for it? Kindly check.
        Meanwhile, I found a similar study, but it had no sample size calculation. Participants were those who showed interest in the study, and the study was done without any particular sample size in mind.
        my regards,
        Nneoma.

        1. drroopesh Post author

          Dear Nneoma,

          Unfortunately, there is no alternative.

          However, I’ll try to simplify things for you:

          You are planning an RCT which has two arms, one of which receives exercise as the intervention, while the other is a control arm. The purpose is to see whether the intervention affects self-esteem.

          If the process of randomization is done properly, both arms should be similar with respect to known and unknown confounders. In simple terms, randomization will cause the overweight and obese individuals to be distributed evenly across both arms (this way, their influences will cancel out).

          Assuming you plan to use the Rosenberg self-esteem scale, you will possibly administer the tool before the start of intervention to determine baseline self-esteem scores. If randomization has been performed well, there shouldn’t be a significant difference between the two arms’ self-esteem scores.

          Next, you require the intervention arm to exercise for specified duration and intensity, while the control arm doesn’t. After some time you will stop the intervention. At this point, you will possibly measure the self-esteem scores once again.

          Unless self-esteem naturally declines with time, there should not be a significant difference between the two measurements of the control arm. However, there should be a difference between the intervention arm and control arm. It is the magnitude of this difference that is required for computation of sample size.

          Continuing with the Rosenberg self-esteem scale as our example, the scale has a maximum score of 30, with values less than 15 indicating low self-esteem. What you need to do is guess the scores before and after intervention between the two arms. You may find values reported by other researchers in general population. These could be taken as the baseline score in ordinary people. Ask yourself if the scores in your study population are likely to be higher or lower than those values. Take an educated guess and determine a value. Don’t worry too much about it being very accurate- it should be okay as long as you aren’t completely off the mark. This value is your baseline score (estimated). Now guess how much difference to self-esteem scores the intervention is likely to make over the duration of the study. Assume there is no change in the control arm. What is the difference between the scores of the intervention arm and the control arm? This is the difference you need to supply for calculation of sample size.

          I hope this helps.

          Regards,
          Dr. Roopesh

          1. Nneoma

            Dear Roopesh
            I’m very sorry for the late reply
            jhrba.com › articles
            The Effects of Physical Activity on Self-Esteem: A Comparative Study
            I hope this is worth it.
            Regards
            Nneoma

              1. drroopesh Post author

                Dear Nneoma,

                Based on the details in the article, and assuming you intend to conduct a parallel trial with equal allocation, the estimated sample size (power 80%, alpha error 5%) would be 13 subjects in each arm.

                Regards,
                Dr. Roopesh

                1. Nneoma

                  Dear Roopesh
                  Thank you so much for your help, but I still need the formula or the textbook or even a link so that I will be able to reference it. Thank you once again.
                  Regards,
                  Nneoma.

  35. monaelesely

    I’m starting a cross-sectional interventional study to detect the effect of kinesiotape on proprioception after ACL reconstruction surgery. I can’t find relevant literature. How can I calculate the sample size?

    1. drroopesh Post author

      Dear Monaelesely,

      A cross-sectional study will be inappropriate if you wish to establish/ investigate causality. A longitudinal study (or at least a pre-post design) is desirable.

      Regards,
      Dr. Roopesh

    1. drroopesh Post author

      Dear Monaelesely,

      The study has several flaws, the first being the study design. A cross-sectional study is one where each subject contributes a single observation only. In the study, subjects contributed more than one measurement. The study would be best described as quasi-experimental.

      There should have been controls in order to establish that the change observed was on account of the tape, and not the subjects practicing tasks before the second measurement. As such, there is a strong risk of bias.

      Convenience sampling further limits the generalizability of findings, since the sample wouldn’t be representative of the population from which it was drawn.

      The sample size calculation would have to be for a non-inferiority RCT, not a cross-sectional study.

      I recommend that you conduct a proper Randomized Controlled Trial, avoiding the errors committed by the authors of the article.

      Regards,
      Dr. Roopesh

  36. Aremu Olalekan

    Hello Dr. Roopesh,
    I am working on a dissertation titled “Assessment of quality of life and functional vision in children with visual impairment”. It is a cross-sectional descriptive study. I would like to know the appropriate sample size for this study and how to go about calculating it.
    Thank you

    1. drroopesh Post author

      Dear Aremu,

      Please state your objectives in PICO format. Then use the formula 4*pq/ l^2 mentioned in the article, using prevalence values from existing literature as ‘p’. ‘q’ is simply (1-p); and ‘l’ is 20% of ‘p’.

      Please go through the comment thread for details. If you still have doubts, feel free to let me know.

      Regards,
      Dr. Roopesh

  37. iqra ishrat

    Hi Dr. Roopesh,
    I am working on the title “Correlation between serum albumin levels and grades of esophageal varices in patients with chronic liver disease”. It is a cross-sectional study, but I don’t know how to calculate the sample size. The prevalence of chronic liver disease differs between countries: in the US about 2 million deaths occur annually, and in China about 400,000 patients die annually.
    Could you please guide me?
    Thank you.

    1. drroopesh Post author

      Dear Iqra,

      The calculation of sample size involves the estimation of a range of sample sizes, then choosing the most appropriate value based on feasibility, etc.

      What you mention are the absolute number of deaths due to chronic liver disease.

      What you need to calculate the sample size is the proportion of population with esophageal varices in chronic liver disease.

      I will be better able to guide you if you provide your study objectives, study population and research question (in PICO format).

      Regards,
      Dr. Roopesh

  38. Kaustav jain

    Hello Dr. Roopesh,

    I am doing a study to determine the relationship between frontal sinus pneumatization and different anatomic variants of the paranasal sinuses on maxillofacial CT.

    I will take maxillofacial CT scans of random patients and classify them on the basis of frontal sinus morphology (on CT scan) into 3 groups (aplasia/hypoplasia, medium and hyperplasia).
    Then, in each of the 3 groups, I will look for variations (on CT scan) such as upper and middle concha pneumatization, internal carotid artery dehiscence, nasal septal deviation, etc.

    Since I am doing this on normal individuals, just checking whether normal structures coexist, there is no prevalence. (Prevalence can be found in the literature, such as the prevalence of full pneumatization of the frontal sinus with deviated nasal septum, but I would be dividing patients into 3 groups and looking for multiple things in 1 group.)

    Please help me choose an appropriate sample size. Or should I just take p as 0.5 and calculate the sample size?
    Thank you.

  39. Abhijit Das

    Dear Dr. Roopesh Sir,
    One article on “Sample size calculation for agreement study, particularly cohen’s kappa estimation” will be beneficial.
    Thank you, sir.

  40. Davidson

    I am a bit confused about which sample size estimation formula to use for a study to determine the knowledge and practice of first aid by lay people. Do I use the quantitative or the qualitative formula for cross-sectional studies? Also, I have not come across a ‘prevalence’ of first aid practice. How does that fit into my calculation?

  41. Pingback: Sample Size Calculation: The Essentials (Part 1) | communitymedicine4all

  42. Pingback: Sample Size Calculation: The Essentials (Part 2) | communitymedicine4all

  43. Talha

    Dear Dr. Roopesh,
    I’m writing a synopsis titled “Efficacy Of Selective Laser Trabeculoplasty (SLT) in Primary Open Angle Glaucoma in Patients on a Single or No Topical Drug Regimen”, in which my objective is to measure the fall in intraocular pressure of the patients at 1, 3 and 6 months after SLT. My understanding is that it is a single-arm quasi-experimental study design. My 2 questions are:
    1. What would be an appropriate sampling technique for this study?
    2. What formula can be used for calculating the sample size?

    Thank you.

    1. drroopesh Post author

      Dear Talha,

      From your description I understand that there are two arms- single topical drug and no topical drug regimen. Please provide your research question in PICO format so that I may determine the appropriate study design. It is not evident that your study is a quasi-experimental design from your description.

      Regards,
      Roopesh

      1. Dr. Talha Nafees

        Dear Dr. Roopesh,
        Following is my understanding of PICO for synopsis.

        P(patient/population) = patients of primary open angle glaucoma visiting outpatient department of hospital.
        I(intervention) = SLT laser will be applied to the patient eyes after recording their intra-ocular pressure(IOP)
        C(comparison)= compared to the (IOP) of the patients before SLT laser application.
        O (outcome) = decrease in IOP of the patients following SLT laser.

        Thank you.

          1. Dr.Talha Nafees

            Dear Dr. Roopesh,
            Thank you so much for your kind help. I’ve determined it is a quasi-experimental study.
            One last thing about sample size: is this formula okay for calculating the sample size in this study?
            Sample size n = [DEFF*N*p*(1-p)] / [(d^2/(Z1-α/2)^2)*(N-1) + p*(1-p)]
            Thank you, sir.
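
To see how a formula of this shape behaves, here is a rough Python sketch that evaluates it as written in the comment. The function name and the inputs (N = 1000, p = 0.5, d = 0.05, DEFF = 1) are hypothetical values chosen only to show the mechanics. Here p and d are in decimal form, and whether the formula suits a given study design is a separate question.

```python
import math

def finite_population_sample_size(N, p, d, z=1.96, deff=1.0):
    """n = [DEFF*N*p*(1-p)] / [(d^2/z^2)*(N-1) + p*(1-p)]"""
    pq = p * (1 - p)
    n = deff * N * pq / ((d ** 2 / z ** 2) * (N - 1) + pq)
    return math.ceil(n)  # round up to whole subjects

# Hypothetical inputs: population 1000, p = 50%, absolute precision 5%.
print(finite_population_sample_size(N=1000, p=0.5, d=0.05))
```

With these assumed inputs the sketch returns 278. Because of the (N-1) term, the result can never exceed DEFF times the population size N.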

            1. drroopesh Post author

              Dear Dr. Talha,

              There are different types of quasi-experimental study designs and analytic approaches differ for each. Please note that a single group pretest posttest design without control is a poor design with many challenges to internal and external validity.

              Regarding the sample size formula you have mentioned, it is appropriate for cluster studies with two groups. Typically, such studies are population based, not hospital based. Since you have only one group, the formula is inappropriate.

              For a single group pretest-posttest design without control the sample size formula is

              n = 2 + (Z1-α/2 + Z1-β)^2 * S^2/d^2

              where S = Standard deviation,
              d= Relative precision
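
A minimal Python sketch of the formula above, under stated assumptions: the function name `pre_post_sample_size` is hypothetical, S and d are taken to be in the same units as the outcome, the Z values correspond to a two-sided alpha of 5% and 80% power, and the inputs S = 4 and d = 2 are made-up numbers for illustration only.

```python
import math

def pre_post_sample_size(s, d, z_alpha=1.96, z_beta=0.8416):
    """n = 2 + (Z1-alpha/2 + Z1-beta)^2 * S^2 / d^2, rounded up."""
    n = 2 + (z_alpha + z_beta) ** 2 * s ** 2 / d ** 2
    return math.ceil(n)

# Hypothetical standard deviation and precision, in the outcome's units.
print(pre_post_sample_size(s=4.0, d=2.0))
```

With these made-up inputs the sketch gives 34 subjects. Halving d roughly quadruples the main term, which is why the choice of d matters so much.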

              Thank you for your patience.

              Regards,
              Roopesh

            2. Anis

              Dear Dr. Talha Nafees,
              If you don’t mind, I want to know about the name of the formula for sample size that you mentioned before. What is the name of that formula?

              Thank you

  44. Seraj

    Hello Dr Roopesh
    I’m doing a cross-sectional study, and no studies have been done on the topic I’m working on,
    so I can’t determine the prevalence.
    Is there any way I can work out the sample size needed?

    1. drroopesh Post author

      Dear Seraj,

      If there are no published studies, you can conduct a pilot survey to obtain an estimate of the prevalence, then use that for calculation.

      Regards,
      Dr. Roopesh

