Let us consider the estimation of sample size for a cross-sectional study.

In order to estimate the required sample size, we need to know the following:

**p**: The prevalence of the condition/ health state. If the prevalence is 32%, it may be either used as such (32%), or in its decimal form (0.32).

**q**: i. When p is in percentage terms: (100-p)

ii. When p is in decimal terms: (1-p)

**d (or l)**: The precision of the estimate. This could either be the relative precision, or the absolute precision. This will be discussed later in this post.

**Za [Z alpha]**: The value of z from the probability tables. If the values are normally distributed, then 95% of the values will fall within approximately 2 (more precisely, 1.96) standard errors of the mean. The value of z corresponding to a 95% confidence level is therefore 1.96 (from the standard normal variate tables).

The formula for estimating sample size is given as:

N = (Za)^2 [p*q] / d^2

where the symbol ^ means ‘to the power of’ and * means ‘multiplied by’; that is, “Z-alpha squared into pq; upon d-square”

substituting the values of Za, we get:

N = (1.96)^2 [p*q] / d^2

We can round off the value of Za (1.96) to 2, to obtain:

N = (2)^2 [p*q] / d^2

or, N= 4pq/ d^2 that is, “4 pq by d-square”
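The formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the original post: the function name `sample_size` and the `percent` flag are assumptions made for the example. It shows that ‘p’ and ‘d’ must be on the same scale, and how rounding Za from 1.96 to 2 changes the result.

```python
import math

def sample_size(p, d, z=1.96, percent=True):
    """Sample size for estimating one proportion: N = z^2 * p * q / d^2.

    p and d must be on the same scale: both percentages (percent=True)
    or both decimals (percent=False).
    """
    q = (100 - p) if percent else (1 - p)
    n = z ** 2 * p * q / d ** 2
    # Round away floating-point noise before rounding up to a whole subject.
    return math.ceil(round(n, 9))

print(sample_size(20, 4))                            # Za = 1.96
print(sample_size(20, 4, z=2))                       # Za rounded to 2, i.e. 4pq/d^2
print(sample_size(0.20, 0.04, z=2, percent=False))   # same inputs in decimal form
```

Rounding Za up to 2 gives a slightly larger (more conservative) sample size than 1.96.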

**Example:**

I wish to conduct a cross-sectional study on awareness of Hepatitis B among school children. A literature search reveals that other investigators have reported knowledge to range from 5% to 20% among students of grades 6 through 8. What should the size of my sample be?

The formula requires us to input the value of d (precision). If the absolute precision is known, there is no problem. However, often we can only input a relative precision. Where do we get the value of relative precision from?

Typically, relative precision is taken as a proportion of ‘p’. The maximum permissible limit is 20% of ‘p’.

In the above example, if ‘p’ is 20%, then ‘d’ will be (20/100)*20= 0.2*20= 4 {Taking a relative precision of 20%}.

Here ‘d’ is the margin of error on either side of ‘p’. A ‘d’ of 4 therefore means that the estimate is expected to lie within +/- 4 percentage points of the true prevalence, with 95% confidence {‘d’ on either side of ‘p’–> +/- 4%: 16% to 24%}.

That is, by taking a relative precision of 20% of ‘p’, the study will be able to estimate the true awareness level to within 4 percentage points if the actual prevalence is close to 20%. If the actual prevalence is much lower than this, however, a margin of 4 percentage points is too coarse relative to ‘p’, and the study will be unable to estimate it accurately.

Therefore, the larger the value of ‘p’ (prevalence), the larger the value of ‘d’ (in absolute terms), keeping the relative precision fixed (say, at 20% of ‘p’). If the prevalence is 50%, ‘d’ (20% of ‘p’) would then be 0.2*50= 10 (as compared to ‘d’ = 4 when ‘p’ = 20%).

The reverse is also true: the smaller the value of ‘p’, the smaller the value of ‘d’. A smaller ‘d’ implies a larger sample size. Therefore, the choice of ‘p’ is crucial.
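As a worked step (a sketch, writing r for the relative precision expressed as a fraction, a symbol not used in the original post), substituting d = r·p into the rounded formula shows why a small ‘p’ inflates the sample size:

$$
N = \frac{4pq}{d^2} = \frac{4pq}{(rp)^2} = \frac{4q}{r^2 p}
$$

With r = 0.2 this simplifies to N = 100q/p, so for a fixed relative precision N grows as ‘p’ shrinks: p = 20% gives 100 × 80/20 = 400, while p = 5% gives 100 × 95/5 = 1900.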

We can now input the values in the formula to obtain the sample size:

For the calculation we will take ‘d’ as 4. This yields:

N= (4*20*80)/ (4*4)

= 400. This sample size will allow us to estimate a prevalence of about 20% to within +/- 4 percentage points (that is, 16-24%).

If we took ‘p’= 5, then the sample size would be:

N= (4*5*95)/(1*1) [‘d’= 0.2*5= 1]

= 1900. This sample size will allow us to estimate a prevalence of about 5% to within +/- 1 percentage point (that is, 4-6%).
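As a check on the two results, the margin of error can be back-calculated from a given sample size by rearranging the same formula (d = Za × √(pq/N)). The sketch below is illustrative only: the function name `margin_of_error` is an assumption, and Za is taken as 2 to match the rounded formula used above.

```python
import math

def margin_of_error(p, n, z=2.0):
    """Half-width of the confidence interval, in percentage points (p in %)."""
    return z * math.sqrt(p * (100 - p) / n)

for p, n in [(20, 400), (5, 1900)]:
    d = margin_of_error(p, n)
    print(f"p = {p}%, N = {n}: +/- {d:.1f} points ({p - d:.1f}% to {p + d:.1f}%)")
```

For p = 20% and N = 400 the margin works out to 4 percentage points, and for p = 5% and N = 1900 to 1 percentage point, matching the values of ‘d’ used in the calculations.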

So should I take ‘p’= 20% or ‘p’=5%?

That depends upon:

1. The location of the original study- if you are planning to conduct the study in an urban area, use the prevalence reported by studies conducted in urban areas, and vice versa.

2. The available resources (time, manpower, money, etc.). Aim for the largest feasible sample size. The size should be adequate to yield 80% power. Do not unnecessarily increase the sample size unless the intention is to obtain greater power. If so, please mention the same in the methodology section.

3. The results of your pilot study. If you have conducted a pilot study, the prevalence obtained from that study should be taken as ‘p’. This will be much more accurate than any other external value.

**Note 1**: *If you have multiple objectives, you must calculate the required sample size for each objective, then choose the largest sample size thus obtained. This will ensure adequate power for all objectives, else the study will lack power for one or more objectives.* That is, you may not be able to detect a significant result where it actually exists *because* you failed to include enough subjects to detect it.

**Note 2**: It is advisable to mention a range rather than a single value for sample size. This is standard practice in the west, but not in India. A range may be obtained by calculating the sample size for different values of ‘p’.

**Sekartaji**

Dear Dr Roopesh,

I would like to conduct a cross-sectional study, but I am having difficulty finding a formula to calculate my sample size because the population is quite large (about 211,857). I am going to survey the knowledge, health beliefs and intentions of female adolescents regarding HPV vaccination, and no previous study has been done on this topic in my country. Could you please advise me on this matter?

Your help is greatly appreciated.

Sincerely,

Sekartaji

**drroopesh** (Post author)

Dear Sekartaji,

If I understand the question correctly, you want to know how to compute sample size from a population of 211,857 individuals.

Please use the prevalence from the following (and similar) articles to estimate the required sample size using the formula for cross-sectional studies:

https://www.ncbi.nlm.nih.gov/pubmed/24188759

In order to obtain your sample, you might consider cluster or multi-stage sampling.

Hope this helps.

Regards,

Dr. Roopesh

**David Lazarus**

What are the possible reasons for increasing sample size for cross-sectional studies?

**drroopesh** (Post author)

Dear David,

It is not ethical or practical to unnecessarily inflate the sample size for any study.

The commonest reason for wanting to do so would be to increase the power of the study to detect even minor differences of interest.

Another reason could be the desire to capture as much variation in the population as possible. However, this could be achieved by adopting a good sampling method.

Regards,

Dr. Roopesh

**David Lazarus**

Thank you so much for the answer; it is well appreciated.

**Achanya**

How do I calculate the sample size for a study in which cases will be matched with controls, given that a previous study reported a prevalence of 32%?

**drroopesh** (Post author)

Dear Achanya,

Do you intend to have 1:1 matching, or higher?

I hope you realize that in a case control study one is comparing the proportion exposed between cases and controls.

Therefore, for sample size calculation, you need to provide proportions for both cases and controls.

Regards,

Dr. Roopesh

**Winfred Nelson**

When calculating sample size for three communities using Slovin’s formula, if you add up the total for the three (for example, 1474) and calculate, you get about half the size (315); you can then use the proportion formula to redistribute. However, if you were to calculate for each of the communities, with populations 350, 774 and 350, you get a total of 624. Now, if I am using mixed methods, what number should I interview: 315 or 624?
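The pooled calculation and the proportional redistribution described in this comment can be sketched as follows. This is an illustration under stated assumptions: Slovin's formula is taken as n = N / (1 + N·e²), and the margin of error e = 0.05 is assumed (the comment does not state it) because it reproduces the pooled figure of 315. The function names are hypothetical.

```python
import math

def slovin(population, e=0.05):
    """Slovin's formula: n = N / (1 + N * e^2). e = 0.05 is an assumed margin."""
    return math.ceil(population / (1 + population * e ** 2))

def allocate(total_sample, populations):
    """Split total_sample across groups in proportion to their populations,
    using largest-remainder rounding so the parts add up to total_sample."""
    total_pop = sum(populations)
    exact = [total_sample * n / total_pop for n in populations]
    parts = [math.floor(x) for x in exact]
    leftover = total_sample - sum(parts)
    # Give the leftover units to the groups with the largest fractional parts.
    order = sorted(range(len(exact)), key=lambda i: exact[i] - parts[i], reverse=True)
    for i in order[:leftover]:
        parts[i] += 1
    return parts

communities = [350, 774, 350]
n = slovin(sum(communities))
print(n, allocate(n, communities))
```

The allocation keeps each community's share of the sample equal to its share of the combined population.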

**Winfred Nelson**

In fact the design is exploratory sequential, so I will do a questionnaire survey, generalise the results, and based on that select participants for my qualitative work (FGDs, in-depth interviews, etc.). The three communities are made up of farmers who all practice rainfed farming, but farmers from 2 of the communities also practice dry season farming because they use small scale dams during the dry season. Again, what are my justifications for interviewing 315 and not 624? Is that okay, so that I do not incur unnecessary cost?

**drroopesh** (Post author)

Here’s a link to another useful document- by Creswell himself:

http://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1047&context=dberspeakers

You might also find the following useful: http://epubs.scu.edu.au/cgi/viewcontent.cgi?article=1069&context=comm_pubs

Regards,

Dr. Roopesh

**Winfred**

Thanks very much Dr. Roopesh. I have downloaded the materials and will take a critical look at them. If there are any issues thereafter, I will get back to you. Have a good day.

**Winfred Nelson**

EXPLANATORY SEQUENTIAL rather, by Creswell

**drroopesh** (Post author)

Dear Winfred Nelson,

Please go through the following document for clarity on Mixed Methods Research:

The sample size would be influenced by the choice of qualitative approach; as well as the relationship between quantitative and qualitative samples- identical, nested, parallel or multilevel.

Hope this helps.

Regards,

Dr. Roopesh

**Qusay**

I would like to conduct a study which hasn’t been done in my country, so how can I estimate the sample size? My study is on the influence of body mass index on liver size.

Regards,

**drroopesh** (Post author)

Dear Qusay,

Even though the study hasn’t been conducted in your country, it is possible to estimate sample size.

From literature, identify the findings reported by other investigators. They would likely have reported several measures- AP diameter/ Transverse diameter/ Volume, etc. Determine which measure is of importance to your study, and note the relationship between BMI and that specific measure.

Identify a study that was conducted in a setting similar to your own (even if in another country, factors like setting (rural/ urban); economic status (developing/ developed); etc. could be similar).

Then determine what proportion of subjects in that study have the relationship of interest. Use that to estimate sample size using the formula provided in the article above.

Hope this helps.

Regards,

Dr. Roopesh

**Anonymous**

Hi,

I am going to conduct a cross-sectional study on the prevalence of cancer in women around the age of menopause who have an ovarian cyst, looking at a biochemical marker called Ca 125.

I am still unable to calculate the sample size.

**drroopesh** (Post author)

Dear Someone,

Please perform a detailed review of literature and determine what proportion of perimenopausal women with ovarian cysts have elevated Ca 125 levels.

Use that proportion to estimate sample size by substituting in the formula provided in the article above.

If you get a range, estimate sample size using the lowest proportion, and use that to conduct your study if feasible.

Regards,

Dr. Roopesh

**Solomon**

I am going to do a survey of the banking industry, but the population sizes differ from one bank to another. How should I deal with that? Please help.

**drroopesh** (Post author)

Dear Solomon,

You could try using cluster sampling method to conduct your survey. Each Bank would constitute a cluster, and you could perform sampling proportionate to size.

If restricted to branches of a single bank, clusters could be determined on the basis of zones or regions, with business handled (in money terms- $, ₹, etc.) determining the proportionate size of each cluster.

Hope this helps.

Regards,

Dr. Roopesh

**Boniface**

Dear Dr Roopesh,

I am conducting a cross-sectional study on the prevalence of cardiomyopathy among diabetes patients. A similar study done in my country showed a prevalence of 40%. I used the above formula for cross-sectional studies with a relative precision of 20% (of 40%). My university research committee asked why I had chosen relative precision instead of absolute precision. Initially, when I was writing my proposal, I tried absolute precision and it gave me a large sample of 334. When I used a relative precision of 20% (of 40%), it gave me 144, which I preferred (due to the limited study budget). How do you think I should answer the above question? Could you also help me specifically with reasons for using relative precision instead of absolute precision?

**drroopesh** (Post author)

Dear Boniface,

Please read the article on relative and absolute precision:

https://communitymedicine4asses.wordpress.com/2014/12/30/relative-and-absolute-precision-in-sample-size-calculation/

The article has links to useful sources. You may benefit from reading both.

I hope this helps.

Regards,

Dr. Roopesh

**Joshua C**

I am conducting research on sleep disorders in children with enlarged adenoids and tonsils in a hospital in Nigeria. Kindly help me with the type of study design and the sample size calculation, since I could not find a similar study or prevalence.

**drroopesh** (Post author)

Dear Joshua,

Please state your research question (in PICO format) and objective(s).

Regards,

Dr. Roopesh

**Ramkumar**

Hello Dr. Roopesh, I am planning to conduct a cross-sectional study of the clinico-pathological and demographic profile of TB cervical lymphadenopathy, without follow-up, for a minimum of 1 year. I don’t know how to calculate the sample size. There are previous studies, but they have differing sample sizes. Please help me to calculate a sample size of around 100.

**drroopesh** (Post author)

Dear Ramkumar,

Please state your objective(s), study population and outcome measure(s).

Regards,

Dr. Roopesh

**Anonymous**

The objectives are to describe the demographic and clinico-pathological profile; the study population is OP patients and in-ward patients; the outcome measures are based on final reports.

**Anonymous**

THESIS PROTOCOL

CLINICO-PATHOLOGICAL AND DEMOGRAPHIC PROFILE OF TUBERCULAR CERVICAL LYMPHADENOPATHY

Thesis Protocol Submitted For

DIPLOMATE OF NATIONAL BOARD

(RESPIRATORY MEDICINE)

AIMS AND OBJECTIVES

PRIMARY OUTCOME

• TO STUDY THE CLINICO-PATHOLOGICAL AND DEMOGRAPHIC PROFILE OF TUBERCULAR CERVICAL LYMPHADENOPATHY PATIENTS

MATERIAL AND METHODS

STUDY DESIGN

The present study is proposed to be a cross-sectional study conducted at the NATIONAL INSTITUTE OF TB AND RESPIRATORY DISEASES among patients in both the OPD and IPD. Patients enrolled between Aug 2017 and Dec 2018 will be part of the study.

STUDY METHOD

Patients attending the OPD and patients in the IPD will be asked for a detailed history, and a thorough clinical examination will be done. This will be followed by all routine investigations and special tests such as the Mantoux test, USG abdomen, and FNAC of the lymph node with direct smear, cytopathological examination and culture for MTB, done at NITRD. Finally, the reports will be analyzed as per the proforma.

SAMPLE SIZE AND STUDY PERIOD

The expected patients in the study will be those presenting between Aug 2017 and Dec 2018 who give consent for the study and who are eligible for it.

CRITERIA FOR SELECTION OF PATIENTS

Inclusion criteria;

• All patients who agree to participate in the study.

Exclusion criteria;

• Patients who are not willing to participate in the study.

• Patients with primary diagnosis of other diseases(e.g: cancer,sarcoidosis, pyogenic infections & etc).

REVIEW OF LITERATURE

DEMOGRAPHIC INCIDENCE;

Mm rahman et al Out of 60 patients 40 were female and 20 were male and female male ratio was 2: 1. The most vulnerable age group was the 2nd decade 23(38.33%). The present study shows that the peak age incidence is 2nd decade of life (38.3%) and the 2nd highest incidence 3rd decade with 30%.

Hussain et al out of 50 patients Male to female ratio is 2.1:1 most common during 2nd and 3rd decade of life (52% )with a peak incidence in the 2nd decade (32%).

Devendra et al Out of 118 cases was found to be more prevalent in females as 30 out of 54(55.55%). In this study, we found out that TBL are commoner in 13-30 age groups, 83.33% .

Vasuda et al out of 227 There were 113 (49.7%) female and 114 (50.3%) The maximum number [167 (73.6%)] of cases suggestive of cytomorphology of tubercular lymphadenitis were aged in the range of 11–30 years.

Shaukat et al total 110 cases Out of these 42(38.1%) were males and 68(61.8%) were female. The majority of patients were in the age range between 10 to 30 years and next group belong to the 4th decade.

Rasool et al Total 46 of which cases Female gender was found in the majority 28(61.87%) while male gender was 18(39.13%).

Soumya et al A total of 63 patients were enrolled in the study of which 25 were males and 38 females The most commonly affected group in the study was 15–24 years age comprising of 57.1% (36 cases).

Mohammed ali et al 115 cases there were 71 males and 44 females. The male to female ratio in present study was 1.61:1. The majority of patients affected were in the age group of 13 to 20 years (39.13%) followed by 21 to 30 years (28.70%). The least affected age group was 61 to 70 years (1.74%).

Chaitali et al Data of 80 patients was analyzed in this study.Gender wise 57 (71.3%) were females and remaining 23 (28.7%) were males.

Naresh et al Males 48% and females 52%. In 50 cases the disease commonly affected the affected were 2nd decade 18% and 3rd decade 8% respectively. Commonest age group affected is between 11and 20> 21, and 30 closely followed by 31 and 40 years .

CLINICAL PRESENTATION

Karthi et al, Majority did not have symptoms 16 cases (31.4%) out of 51 showed symptoms fever was the most common , seen in 31% of cases, followed by malaise in 18% . It was observed 8 cases (15.6%) out of 51 cases had a positive history contact with tb . It was observed that posterior triangle was the commonest to get involved (31.3%) followed by upper deep jugular (21.5%). Levels 1, 3 and 4 were equally involved.And the majority of nodes (78.4%) were 4 cm. It was seen in 41 cases out of total 51 cases (80.3%) had U/L involvement. The remaining (19.7%) had bilateral involvement. and multiple node involvement in 39 cases (76.5%) while 12 cases (23.5%) showed single. Matting was observed in 14 of the 51 cases (27.4%). discrete lymph nodes which was present in 37 of the 51 cases (29.7%).

Mohankumar et al 18 cases (27.69%) out of 65 cases of tubercular showed presence of symptoms. It was observed that only 4 cases (6.15%) out of 65 cases had a positive history.It was observed that the majority of nodes affected in tuberculosis (80%) were less than 4 cm in size it was observed that Upper jugular group (level-2) was the commonest to get involved in tuberculosis (30.76%) .2-5 Among the cases only 15.39% cases presented with bilateralnode

mmrahman et al Out of 60 patients BCG vaccination had a significant protective role; 19(31.67%) were vaccinated and 41 (68.33%) wereTuberculin test was positive in 44(73.34%) and negative in 2 (3.33%) and doubtful in 14 (23.33%).The common presentations were neck swelling 60 (100%), fever 40 (66.67%) and night sweat in 30(50%), wt loss 21(35%).

Devendra et al In this study 1-2 cm size group were found to be having equal chances of tubercular and non-specific reactive lymphadenitis but 78.94% lymph nodes with size >2 cm were positive for tubercular lymphadenitis .Fever> anorexia>malaise>night sweats & weight loss was commoner symptoms in TBL

Vasuda et al The study having 227 tb cervical lymphadenopathy pts

The majority of the patients were otherwise healthy adults, and constitutional symptoms were present in 13% only. All the groups of cervical lymph node were involved including right and left cervical, posterior triangle, submental, submandibular, and supraclavicular regions.

Zyedzulfiquer et al Study having 242 cases of tb cervical lymphadenopathy

Most common constitutional symptoms are fever as wt loss(75%), night sweats(72%), LOA(45%).Most of the patients don’t have active contact only 28% had contact and 28% had past h/o tb treatment duration of lymphadenopathy in most of cases was less than 3 months.The size of Lymph Node was more than 1 cm and less than 2 cms in 70% of the patients. Gross appearance of Lymphadenopathy was multiple mattered in 65% of the patients with no tenderness in 78%

Salman et al study population is 50 patients.Symptoms vary from 6 months to 2 yrs but m/c 7 wks to 3months 39 patients didn’t have any constitutional symptoms and remaining m/c had fever>malaise> LOA. H/O tb contact history was present in 19 patients. Examination showed b/l seen in 60% and location m/c post triangle(70%) f/b upper deep cervical(24%) and most of the lymphnode size was <1.15cm.

Shaukat et al study population was 80 patients. In our study fever and weight loss are common complaint 52.7% and 63.6% respectively And b/l more common than unilateral and anterior group of nodes are more common than post group of nodes

Rasool et al Multiple lymphadenitis was found in majority of the cases 26(56.53%), while 20(43.47%) cases were found with presentation.We found lymph node less than 3 CM found in 31(67.39%) cases and more on of single lymphadenitis than 3 CM were in15 (32.61%) cases. Fever was commonest clinical feature in 76% cases, following by swelling, abscess, solid nodes, weight loss, loss of appetite and others were noted with percentage of 55.69%, 39.13%, 45.65%, 58.69% and 21.73% respectively

CYTO PATHOLOGICAL, CULTURE AND DIRECT SMEAR EXAMINATION

Karthikeyan et al Out of the 51 histopathologically confirmed cases of tuberculous cervical lymphadenitis, a diagnosis of tuberculosis was made in 43 cases by FNAC. The other 7 cases were diagnosed as chronic non-specific lymphadenitis. There were no false positive cases on FNAC. 44 cases were true negative for tuberculosis. The sensitivity and specificity of FNAC for diagnosing tuberculous lymphadenitis is therefore 86% and 100% respectively .

Mohan kumar et al In the present study, both sensitivity and specificity of FNAC for for tuberculosis sensitivity was only 86.20% and specificity was 100%.

Mm rahman et al In this study among 60 patients 44 (73.34%) were tuberculin positive (more than 10 mm induration), 14 (23.33%) were doubtful (between 1-10 mm) and 2 (3.33%) were negative (no induration seen). Among the 60 patients of tuberculous cervical lymphadenitis 51 (85%) had caseation.

Vasuda et al In this study, the cytomorphological features observed in the cases were caseating epithelioid granulomas [47.6% (108/227)], granulomatous lymphadenitis [33.9% (77/227)], necrotizing lymphadenitis [1.8% (4/227)], and necrotizing suppurative lymphadenitis [16.7% (38/227)]. ZN staining for AFB was done in all the cases. Smear positivity for Mycobacterium sp. by conventional ZN method was 19.4% (44/227). AFB positivity was the maximum (44.7%) in necrotizing suppurative lymphadenitis.

The appearance of aspirates found more commonly was blood mixed in 68.3% cases, followed by whitish cheesy material in 21.1%, pus-like in 6.2%, and yellowish in 4.4%. AFB positivity was the maximum (42.8%) in pus-like aspirate.

Salman et al The study having population of 50 cases of which 41(82%) cases have been confirmed by FNAC. AFB seen in by direct smear examination in 12 cases and 9(18%) needed excisinal biopsy to confirm the diagnosis.

Soumyajit et al FNAC was diagnostic in 42 cases (73.7%) where epitheloid granuloma and Langhan’s cells with or without necrosis was seen. The aspirate from affected lymph nodes did not reveal AFB in most of the cases. Only 23 samples (40.4%) revealed AFB after ZN staining. FNAC was non specific in 15 samples which further required incision/ excision biopsy for diagnosis.

PROFORMA

CASE NO: OPD REG NO:

NAME: FATHER/HUSBAND NAME:

AGE: SEX:

OCCUPATION: MARITAL STATUS:

AREA:

PRESENTING COMPLAINT: DURATION

LYMPHNODE ENLARGEMENT:

FEVER:

COUGH:

WEIGHT LOSS:

LOSS OF APPETITE:

CHEST PAIN:

OTHERS POSITIVE HISTORY:

PAST HISTORY:

TUBERCULOSIS:

HYPERTENSION:

DIABETES:

HIV:

SURGICAL INTERVENTION:

BLOOD TRANSFUSION:

OTHER PAST SIGNIFICANT HISTORY:

PERSONAL HISTORY:

H/O SMOKING:

H/O ALCOHOL:

H/O DRUG ABUSE:

BLADDER AND BOWEL COMPLAINT:

H/O CONTACT WITH TB:

NO OF CHILDREN:

TREATMENT HISTORY:

H/O ATT:

ANY OTHER MEDICATION:

GENERAL EXAMINATION:

TEMPERATURE:

B.P: PULSE: RESPIRATORY RATE:

PALLOR: ICTERUS: CLUBBING: CYANOSIS: PEDAL EDEMA:

BCG SCAR:

LYMPHNODE :

SYSTEMIC EXAMINATION

CVS:

RS:

P/A:

CNS:

INVESTIGATIONS REPORTS;

HB: TLC: DLC: ESR:

Blood sugar(random): UREA: CREATININE:

S.BILIRUBIN:Total- Direct- SGOT/SGPT/ALP:

S.PROTEIN:Total- Albumin-

URINE:Albumin- sugar- microscopy

Sputum for AFB(D/S):

X-ray CHEST:

USG abdomen:

FNAC report:

AFB by D/S:

CULTURE report:

**drroopesh** (Post author)

I am not sure I understand what exactly you intend to do.

You will recruit patients with tuberculous cervical lymphadenopathy, and obtain some information- this much is clear.

What is not clear is what question you are trying to answer by collecting that information. That is why I requested you to provide your research question in PICO format.

Please note that unless you provide an answerable research question, I will be unable to provide additional assistance.

Regards,

Dr. Roopesh

**adaze woghiren**

Hello, I am trying to correlate two variables in estimating the severity of chronic liver disease. How do I go about calculating my sample size, since it is a cross-sectional study I am conducting? Thanks.

**drroopesh** (Post author)

Dear Adaze,

Please use the formula provided in the article above: 4pq/ l^2.

If you provide details of your objectives and outcome variables, I might be able to provide specific guidance.

Please note that I will be very busy this week, so might not be able to respond before the weekend.

Regards,

Dr. Roopesh

**sara**

Hello Dr.,

My study is to identify the number of stem cells in a diabetic patient group and a non-diabetic group, and then compare the two groups. Is this a comparative cross-sectional design or a case-control design? And how can I estimate the sample size?

**drroopesh** (Post author)

Dear Sara,

What is your research question? The study design is determined by the research question.

Please formulate your research question using the PICO criteria and revert to me.

Please note that I will be very busy over the coming week, hence might be unable to respond before the weekend.

Regards,

Dr. Roopesh

**sara**

Thanks, Dr., for replying…

my research question is:

in mild gestational diabetic women, is the number and quality of the haematopoietic stem cells of umbilical cord blood affected compared to non-gestational diabetic women?

**bonifacelumori**

Dear Roopesh,

I am still confused about sample size calculation. My study is on the prevalence of, and factors associated with, cardiomyopathy among diabetic patients. I wanted to use a prevalence of 67.8% (from a similar study done in my country). Please show me what your sample size would be, so that I can compare it with what I got (which I think is not correct). Use absolute precision and a 95% confidence interval.

With regards,

Boniface

**drroopesh** (Post author)

Dear Boniface,

Please read page 59 of the following document:

I believe this will help resolve your doubts.

Regards,

Dr. Roopesh

**Kunle**

Hi, kindly clarify which formula I need to use to calculate the sample size for my study “Cryptosporidium parvum among HIV positive and Seronegative subjects attending National Hospital, Ilado”.

The main objective is to compare the prevalence of C. parvum in these groups of people. The study design is comparative cross-sectional.

**drroopesh** (Post author)

Dear Kunle,

The formula would be the same as mentioned in the article: 4pq/l^2.

Regards,

Dr. Roopesh

**Anonymous**

Dear Dr Roopesh,

I am Fawad Qazi, starting research on “cardiopulmonary fitness in DOW medical university” using the 1 mile walk test (Rockport test). I need your help to calculate the sample size.

**drroopesh** (Post author)

Dear Fawad,

You will have to state your research question in PICO format, objective(s) and outcome measure(s).

Please go through previous comments in this thread, and related articles on this blog as well.

Regards,

Dr. Roopesh

**annonymous**

Dear Dr Roopesh, I am conducting a comparative study of sexual abuse among adolescents in and out of school. I want a sample size of 520 for each group. What prevalence can I use to arrive at that using the sample size formula for comparing proportions? Please help. Thanks.

**drroopesh** (Post author)

Dear Anonymous,

Please note that one does not decide the sample size in advance and then reverse engineer to determine the prevalence.

What you need to do is determine the prevalence from literature, then use the prevalence values thus obtained to estimate sample size. This needs to be done for each objective. Finally, select the largest sample size estimate obtained as your required sample size.

Regards,

Dr. Roopesh

**Ann**

Dear Dr Roopesh,

I would like to conduct a comparative cross-sectional study comparing the mean of an analyte in 4 different cohorts of patients. How do I calculate the sample size?

Thanks.

Regards

**drroopesh** (Post author)

Dear Ann,

Please state your research question (in PICO format), and objective(s).

Regards,

Dr. Roopesh

**salman karim**

Dear Dr Roopesh, could you please guide me on how to calculate sample size for behavioral science studies (related to student psychology)?

Thank you, salman karim

**drroopesh** (Post author)

Dear Salman Karim,

The calculation depends upon the type of study and variables under consideration.

The simplest approach is to determine sample size based on the type of study, as described here.

The procedure would remain the same:

1. State your research question (PICO format)- determines the study design

2. State your objectives- provides information about the outcome variable(s) under consideration.

3. From literature, determine values for outcome variable(s)

4. Substitute values in appropriate formula

5. Obtain required sample size.

I hope this helps.

Regards,

Dr. Roopesh

**Anonymous**

Thank you so much

**Hasan**

Dear Dr. Roopesh,

I am going to conduct research to identify the factors affecting patient satisfaction with rehabilitation service quality. I am using a cross-sectional study design with a questionnaire, but I have faced challenges in calculating my sample size. Could you give me any ideas, please?

**drroopesh** (Post author)

Dear Hasan,

Please state your research question, objective(s) and outcome measure(s).

Regards,

Dr. Roopesh

**priti sapkota**

How do I calculate sample size for a cross-sectional comparative study?

**drroopesh** (Post author)

Dear Priti,

All epidemiological studies include comparison(s). Therefore, the study design is ‘Cross-Sectional Study’, not ‘Cross-Sectional Comparative Study’.

You may use the formula provided in the article to estimate sample size.

Regards,

Dr. Roopesh

**priti sapkota**

Calculation of sample size:

Based on the study conducted by Johncy SS, Samuel TV, Jayalakshmi MK, Dhanyakumar G, Bondade SY. Prevalence of respiratory and non-respiratory symptoms in female sweepers, the sample size will be calculated for two proportion cases and controls. The study shows respiratory symptoms cough in 13.3% of controls and 36.6% of cases.

Hence, Prevalence in cases (P1) = 0.366

Prevalence in Controls (P2) = 0.133

q1= 1-P1 =1-0.366=0.634

q2=1-P2= 1-0.133 = 0.867

Zα/2 at 95% = 1.96

Zβ at 80% power = 0.842

P̄ = (P1+P2)/2 = (0.366 + 0.133)/2 = 0.2495

Q̄ = 1 - P̄ = 1 - 0.2495 = 0.7505

Sample size (n) = {Zα/2 √(2 P̄ Q̄) + Zβ √(P1q1 + P2q2)}^2 / (P1-P2)^2

= {1.96 √(2×0.2495×0.7505) + 0.842 √(0.366×0.634 + 0.133×0.867)}^2 / (0.366-0.133)^2

= 53 in each group

Adding around 10% for non-response, a total of 118 participants will be enrolled: 59 sanitation workers and 59 in the comparison group.

Is this calculation of sample size correct for a cross-sectional study?
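The arithmetic in this comment can be checked with a short sketch. The function name `n_per_group` is illustrative, and the z-values are taken from Python's standard-library `statistics.NormalDist` rather than from printed tables, so they carry a few more decimal places than 1.96 and 0.842.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two proportions (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.842 for 80% power
    p_bar = (p1 + p2) / 2
    q_bar = 1 - p_bar
    numerator = (z_alpha * math.sqrt(2 * p_bar * q_bar)
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

print(n_per_group(0.366, 0.133))
```

With the proportions quoted above (36.6% and 13.3%), this gives 53 per group before any allowance for non-response.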

**drroopesh** (Post author)

Dear Priti,

The study design is Case Control Study.

Please go through the following:

You may use the following online tools to calculate sample size:

http://www.openepi.com/SampleSize/SSCC.htm

http://sampsize.sourceforge.net/iface/s3.html

Hope this helps.

Regards,

Dr. Roopesh

**drroopesh** (Post author)

Dear Priti,

Please note that in cross-sectional studies as well as case control studies, there is no need to adjust for non-response, since there is no follow-up, and those who don’t wish to participate are simply excluded from the study.

Regards,

Dr. Roopesh

priti sapkota: Thank you. Could you please provide an example calculation of sample size for a cross-sectional study where a comparison is used?

drroopesh (Post author): Dear Priti,

Like I said earlier, comparisons are integral to epidemiological studies. For the purpose of sample size estimation, one only needs details of the variables in the formula- described in the article.

Having said that, one could estimate sample size based on the type of variables under study, and differences between them-

difference between two means

difference between two proportions, and so on.

A good example of how that works is using the free tool G*Power:

http://www.gpower.hhu.de/en.html

You could also read the following article:

https://communitymedicine4asses.wordpress.com/2014/04/18/sample-size-calculation-two-ways-of-approaching-it/

I hope this helps.

Regards,

Dr. Roopesh

priti sapkota: Thank you very much.

Mira: Dear Dr Roopesh,

I’m conducting a cross-sectional study among a population of 162 workers. I calculated my sample size using the two-proportions (P1 and P2) formula, but the sample size obtained is 263, which is bigger than the population. How can I correct the sample size so that it is less than the population?

Thank you and regards,

Mira

Mira: Dear Dr Roopesh,

I have a population of 162 workers for my study, but the calculated sample size (263) is bigger than my population. How can I correct the sample size to be smaller than the population? Thank you.

drroopesh (Post author): Dear Mira,

I suggest you use finite population correction.

The following should help:

https://onlinecourses.science.psu.edu/stat414/print/book/export/html/264

http://www.statisticshowto.com/finite-population-correction-factor/

Regards,

Dr. Roopesh

michael: You can use the reduction formula

n/(1 + n/N)

263/(1 + 263/162) ≈ 100.2

n = your calculated sample size

N = the total population you have
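
The reduction formula in the comment above can be sketched as follows. This is an illustration only: the function name is hypothetical, and some texts use n/(1 + (n-1)/N) instead, which gives a slightly different value.

```python
from math import ceil


def finite_population_correction(n, population):
    """Shrink a sample size n computed for a large population to a known population size."""
    return n / (1 + n / population)


# Mira's figures: calculated n = 263, population N = 162
adjusted = finite_population_correction(263, 162)
print(round(adjusted, 1))  # 100.2
print(ceil(adjusted))      # 101, rounding up to whole subjects
```

The corrected size always stays below the population size, which is the property Mira was asking for.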

Hassan Benya: I would like to conduct a cross-sectional study but am somewhat confused about how to calculate my sample size, because the population is quite large (about 1,055,964). I am doing a hypothetical study project titled “Is there a relationship between socio-economic status and the risk of acquiring hepatitis B infection in Freetown, Sierra Leone”. Kindly help me with this.

drroopesh (Post author): Dear Hassan,

The calculation of sample size remains largely unchanged. What you need to determine is the sampling method. Perhaps, multi-stage sampling will be suitable in your case.

Regards,

Dr. Roopesh

hoodo: Hello Dr,

My study design is a cross-sectional study: collecting milk samples from various milk vendors, plus interviews and observation of the milking process and the milk handling practices of milk vendors and milk producers.

Which sample size should I use?

drroopesh (Post author): Dear Hoodo,

Thanks for writing in.

What is/are the objective(s) of your study?

Regards,

Dr. Roopesh

Ange: Hi Dr. Roopesh,

I am doing a cross-sectional study to determine bone mass in the lower limb of post-menopausal women and compare it with bone mass at other sites in the same women. How do I calculate the sample size for this project?

drroopesh (Post author): Dear Ange,

You might want to consider cluster sampling. The following might be useful:

http://www.statisticshowto.com/what-is-cluster-sampling/

Please read the following for a more detailed discussion of cluster sampling. Sample size calculation is discussed from page 12 onwards.

I hope this helps.

Regards,

Dr. Roopesh

lulu: I am doing a cross-sectional study about the level of fruits and vegetables among adolescents in day and boarding schools. How do I calculate the sample size if I don’t have the value of p?

drroopesh (Post author): Dear Lulu,

In case you don’t have a value of prevalence from literature, you may estimate the same from observations (yours and others’). Preferably, you must guess a maximum possible and minimum possible value, then calculate sample size using both values. That will give you a range of sample sizes. Choose one that is most feasible.

I hope this helps.

Regards,

Dr. Roopesh
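
The advice above (guess a minimum and a maximum prevalence, then compute a sample size for each) can be sketched as below. It uses the article's formula 4pq/l^2 with p in percent and l taken as 20% of p; the 5% and 20% guesses are borrowed from the article's own Hepatitis B example, and the function name is illustrative.

```python
from math import ceil


def cross_sectional_sample_size(p, relative_precision=0.20):
    """n = 4pq / l^2 with p in percent, q = 100 - p, l = relative_precision * p."""
    q = 100 - p
    l = relative_precision * p
    return ceil(4 * p * q / l ** 2)


# Lowest and highest plausible prevalence (in percent)
low_guess, high_guess = 5, 20
sizes = {p: cross_sectional_sample_size(p) for p in (low_guess, high_guess)}
print(sizes)  # {5: 1900, 20: 400}
```

The two results bracket the range of sample sizes, from which the most feasible value can be chosen.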

Krishna Subed: Could you give me the reference about when we can use relative precision as a % of the proportion? Thank you in advance.

drroopesh (Post author): Dear Krishna Subed,

Please find the references at the end of the article on relative and absolute precision:

https://communitymedicine4asses.com/2014/12/30/relative-and-absolute-precision-in-sample-size-calculation/

Regards,

Dr. Roopesh

Ernest Nwachukwu: Dear Dr. Roopesh,

I am carrying out a study to describe the mobility profile of community-dwelling older adults in a region with a population of about 146,647 older adults.

Please could you explain to me how to calculate the appropriate sample size for the study.

Kind regards.

drroopesh (Post author): Dear Ernest,

Please provide me with your research question in PICO format- that determines the study design.

Thanks!

Dr. Roopesh

Ernest Nwachukwu: Dear Dr Roopesh,

Here is my research question in PICO format:

What are the mobility profiles (patterns) of community-dwelling older adults in the southeastern part of Nigeria.

P: community-dwelling older adults

I: Test with Short Physical Performance Battery and 6 minutes walk test. Then interview with Preclinical disability scale and Lower Extremity Functional Scale.

C: none

O: mobility profiles (i.e. no mobility limitation, preclinical mobility limitation, mild mobility limitation, moderate mobility limitation or severe mobility limitation) or performance in the test.

I hope this helps you in guiding me through the calculation of the appropriate sample size. Please remember the population size of the older adults in this region is about 146,647.

Kind regards!

Ernest

drroopesh (Post author): Dear Ernest,

Please go through the article below for guidance on formulating a research question using the PICO criteria:

https://communitymedicine4asses.com/2013/08/18/how-to-formulate-a-research-question-the-pico-criteria/

Regards,

Dr. Roopesh

Ernest Nwachukwu: Dear Dr Roopesh,

Thank you so much for the link. I have gone through two of the articles and I have come up with the following research question in PICO format:

“Among community-dwelling older adults, how prevalent is pre-clinical disability?”

I hope I got it right this time.

Please help me on how to calculate the appropriate sample size for this study taking the population of older adults in this region to be 146,647.

Kind regards!

drroopesh (Post author): Dear Ernest,

Since your research question seeks to determine the prevalence of pre-clinical disability, the study design would be cross-sectional study.

In order to estimate sample size, one would require the prevalence of pre-clinical disability in a similar population; or a rough estimate of prevalence from clinical experience.

The formula 4pq/l^2 will yield the sample size for a cross sectional study, where

p: prevalence of preclinical disability (in %)

q: (100-p)

l: relative precision (a proportion of p; up to a maximum of 20% of p).

You could obtain values of p from various studies, and take the largest sample size that is practical for you.

Please also go through the following:

https://communitymedicine4asses.com/2018/06/23/how-to-calculate-sample-size-with-epi-info-7/

Regards,

Dr. Roopesh
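
As a worked step on the formula in the reply above: if l is written as a fraction r of p (the symbol r is introduced here for illustration and does not appear in the article), the expression simplifies as follows.

```latex
n = \frac{4pq}{l^{2}}, \qquad l = r\,p
\;\Longrightarrow\;
n = \frac{4pq}{r^{2}p^{2}} = \frac{4q}{r^{2}p}
\qquad\text{and, for } r = 0.2:\quad n = \frac{100\,q}{p}
```

For example, p = 20% and q = 80% give n = 100 × 80/20 = 400. Under this relative-precision convention, a smaller p calls for a larger sample.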

Ernest Nwachukwu: Dear Dr. Roopesh,

This has been most helpful to me. I have already invited several of my friends carrying out research works to visit this site.

Thanks a million times.

Regards

Ernest

drroopesh (Post author): Dear Ernest,

I am glad to have been of help to you. Do visit again!

Regards,

Dr. Roopesh

Ernest Nwachukwu: Dear Dr Roopesh,

I am happy to visit this site again.

I would like to know if there is a scholarly or widely accepted name for this formula for calculating sample size for cross-sectional studies: 4pq/l^2.

Kind regards

Ernest

drroopesh (Post author): Dear Ernest,

The formula doesn’t have a particular name. Nevertheless, it will be found in any biostatistics/ research methodology text dealing with sample size estimation.

Regards,

Dr. Roopesh

Nilusha Gayan Mahakumbura: Dear Dr. Roopesh,

I’m carrying out a descriptive cross-sectional study and I want to take a sample from a finite population. Which sample size formulas are best for that?

Thank you!

best regards!

drroopesh (Post author): Dear Nilusha,

Please see my response to Mira (31 Dec 2017) above regarding finite population.

Regards,

Dr. Roopesh

Nneoma: Dear Dr. Roopesh,

I’m carrying out experimental work on the effect of aerobic exercise on the self-esteem of overweight and obese university youth, and I need a good sample size calculation.

Thanks

drroopesh (Post author): Dear Nneoma,

Apologies for the delay in responding.

Please provide me your research question in PICO format, as well as objectives.

Regards,

Dr. Roopesh

Nneoma: Dear Roopesh,

‘What effect does aerobic exercise have on the self-esteem and self-perceived body image of overweight and obese undergraduate students?’

Sir, this is my research topic in PICO format. It is an experimental study that involves two groups.

Thank you

Nneoma

drroopesh (Post author): Dear Nneoma,

What is the comparison group? Are you comparing between overweight and obese students, or are they a single group that you will compare with another group (normal)?

For calculation of sample size of an experimental study, you need to specify the type of RCT- superiority/ equivalence/ non-inferiority; and provide an estimate of the effect size (how much of a difference do you expect between the two groups?).

Please share the link to the main reference article you wish to use estimates from.

Regards,

Dr. Roopesh

Nneoma: Dear Dr. Roopesh,

There are 2 groups, one undergoes exercise as intervention while the other is a control group. It is RCT that has both overweight and obese in each group. I’m really confused about the difference to expect from the two groups. As for the article there seems to be no similar research in my country.

Thanks

Nneoma

Nneoma: Dear Dr. Roopesh,

There are 2 groups, one undergoes exercise as intervention while the other is a control group. It is RCT that has both overweight and obese in each group. I’m really confused about the difference to expect from the two groups. As for the article there seems to be no similar research in my country. It is non inferiority RCT.

Thanks

Nneoma

drroopesh (Post author): Dear Nneoma,

Your outcome is self-esteem, which (I suspect) will be assessed by a tool that assigns scores. The difference in scores between those who undergo exercise (intervention) compared to those in the control arm is what I seek. You may obtain this information from existing studies (need not have been done in your setting, but must have similar study population (in terms of eligibility criteria)); or clinical observation (you may guess the difference based on observations from practice).

I hope this helps.

Regards,

Dr. Roopesh

Nneoma: Dear Roopesh,

I still couldn’t make something out of it. please is there no other way to calculate non inferiority RCT, that involves only two groups. Please is there any calculator or formula for it. Kindly check.

Meanwhile I found a similar study but there was no sample size calculation. participants were those who showed interest in the study, the study was done without any particular number of sample in mind.

my regards,

Nneoma.

drroopesh (Post author): Dear Nneoma,

Unfortunately, there is no alternative.

However, I’ll try to simplify things for you:

You are planning a RCT which has two arms- one of which receives exercise as the intervention, while the other is a control arm. The purpose is to see if the intervention affects self-esteem or not.

If the process of randomization is done properly, both arms should be similar with respect to known and unknown confounders. In simple terms, randomization will cause the overweight and obese individuals to be distributed uniformly in both arms (this way, their influences will get cancelled out).

Assuming you plan to use the Rosenberg self-esteem scale, you will possibly administer the tool before the start of intervention to determine baseline self-esteem scores. If randomization has been performed well, there shouldn’t be a significant difference between the two arms’ self-esteem scores.

Next, you require the intervention arm to exercise for specified duration and intensity, while the control arm doesn’t. After some time you will stop the intervention. At this point, you will possibly measure the self-esteem scores once again.

Unless self-esteem naturally declines with time, there should not be a significant difference between the two measurements of the control arm. However, there should be a difference between the intervention arm and control arm. It is the magnitude of this difference that is required for computation of sample size.

Continuing with the Rosenberg self-esteem scale as our example, the scale has a maximum score of 30, with values less than 15 indicating low self-esteem. What you need to do is guess the scores before and after the intervention in the two arms. You may find values reported by other researchers in the general population. These could be taken as the baseline score in ordinary people. Ask yourself if the scores in your study population are likely to be higher or lower than those values. Take an educated guess and determine a value. Don’t worry too much about it being very accurate; it should be okay as long as you aren’t completely off the mark. This value is your (estimated) baseline score.

Now guess how much difference the intervention is likely to make to self-esteem scores over the duration of the study. Assume there is no change in the control arm. What is the difference between the scores of the intervention arm and the control arm? This is the difference you need to supply for calculation of sample size.

I hope this helps.

Regards,

Dr. Roopesh
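
To illustrate how the guessed difference enters a calculation, here is a minimal sketch using the common normal-approximation formula for comparing two means, n per arm = 2(Zα/2 + Zβ)^2 × SD^2 / difference^2. This is a plain two-sided comparison, not a non-inferiority calculation. The function name, the standard deviation of 5 and the difference of 4 points are hypothetical values chosen only for the example; they are not taken from any study.

```python
from math import ceil
from statistics import NormalDist


def two_means_sample_size(difference, sd, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided comparison of two means, equal allocation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / difference ** 2)


# Hypothetical inputs: expected difference of 4 points, common SD of 5 points
n_per_arm = two_means_sample_size(difference=4, sd=5)
print(n_per_arm)
```

Halving the expected difference roughly quadruples the required sample, which is why the educated guess described above matters.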

Nneoma: Dear Roopesh,

I’m very sorry for the late reply

jhrba.com › articles

The Effects of Physical Activity on Self-Esteem: A Comparative Study

I hope this is worth it.

Regards

Nneoma

Nneoma: https://www.google.com/url?sa=t&source=web&rct=j&url=http://jhrba.com/en/articles/13221.html&ved=2ahUKEwjzgOaBjNLjAhUZHcAKHXz7DCYQFjABegQIARAB&usg=AOvVaw09rtKGB5HD72suzckaNxUS

I think this link is better.

drroopesh (Post author): Dear Nneoma,

Based on the details in the article, and assuming you intend to conduct a parallel trial with equal allocation, the estimated sample size (power 80%, alpha error 5%) would be 13 subjects in each arm.

Regards,

Dr. Roopesh

Nneoma: Dear Roopesh,

Thank you so much for your help, but I still need the formula or the textbook or even a link so that I will be able to reference it. Thank you once again.

Regards,

Nneoma.

drroopesh (Post author): Dear Nneoma,

Please mail me at communitymedicine4asses@yahoo.com for the above.

Regards,

Dr. Roopesh

monaelesely: I’m starting a cross-sectional interventional study to detect the effect of kinesiotape on proprioception after ACL reconstruction surgery. I can’t find relevant literature. How can I calculate the sample size?

drroopesh (Post author): Dear Monaelesely,

A cross-sectional study will be inappropriate if you wish to establish/ investigate causality. A longitudinal study (or at least a pre-post design) is desirable.

Regards,

Dr. Roopesh

monaelesely: Dear Dr. Roopesh,

I found this study. It is almost the same as the design that I want to use, but they used a convenience sample.

https://www.researchgate.net/publication/325275331_EFFECT_OF_KINESIOTAPING_ON_PROPRIOCEPTION_IN_PATIENTS_POST_ANTERIOR_CRUCIATE_LIGAMENT_RECONSTRUCTION_SURGERY

I would like to know if it is possible to calculate the sample size for my study, as it is almost the same.

appreciation

Monaelesely

drroopesh (Post author): Dear Monaelesely,

The study has several flaws, the first being the study design. A cross-sectional study is one where each subject contributes a single observation only. In the study, subjects contributed more than one measurement. The study would be best described as quasi-experimental.

There should have been controls in order to establish that the change observed was on account of the tape, and not the subjects practicing tasks before the second measurement. As such, there is a strong risk of bias.

Convenience sampling further limits the generalizability of findings, since the sample wouldn’t be representative of the population from which it was drawn.

The sample size calculation would have to be for a non-inferiority RCT, not a cross-sectional study.

I recommend that you conduct a proper Randomized Controlled Trial, avoiding the errors committed by the authors of the article.

Regards,

Dr. Roopesh

Aremu Olalekan: Hello Dr. Roopesh,

I am working on a dissertation titled “Assessment of quality of life and functional vision in children with visual impairment”. It is a descriptive cross-sectional study. I would like to know the appropriate sample size for this study and how to go about calculating it.

Thank you

drroopesh (Post author): Dear Aremu,

Please state your objectives in PICO format. Then use the formula 4*pq/ l^2 mentioned in the article, using prevalence values from existing literature as ‘p’. ‘q’ is simply (1-p); and ‘l’ is 20% of ‘p’.

Please go through the comment thread for details. If you still have doubts, feel free to let me know.

Regards,

Dr. Roopesh

iqra ishrat: Hi, dear Dr. Roopesh,

I am working on the title “Correlation between serum albumin levels and grades of esophageal varices in patients with chronic liver disease”. It’s a cross-sectional study, but I don’t know how to calculate the sample size. The prevalence of chronic liver disease differs between countries: in the US, 2 million deaths occur annually, and in China about 400,000 patients die annually.

could you please guide me

thankyou.

drroopesh (Post author): Dear Iqra,

The calculation of sample size involves the estimation of a range of sample sizes, then choosing the most appropriate value based on feasibility, etc.

What you mention is the absolute number of deaths due to chronic liver disease.

What you need to calculate the sample size is the proportion of population with esophageal varices in chronic liver disease.

I will be better able to guide you if you provide your study objectives, study population and research question (in PICO format).

Regards,

Dr. Roopesh

Kaustav jain: Hello Dr Roopesh,

I am doing a study to determine the relationship between frontal sinus pneumatization and different anatomic variants of the paranasal sinuses on maxillofacial CT.

I would take maxillofacial CT scans of random patients and classify them on the basis of frontal sinus morphology (on CT scan) into 3 groups (aplasia/hypoplasia, medium and hyperplasia).

Then, in each of the 3 groups, I would look for variations (on CT scan) such as upper and middle concha pneumatization, internal carotid artery dehiscence, nasal septal deviation, etc.

Since I am doing this on normal individuals, just checking whether normal structures coexist or not, there is no prevalence. (Prevalence can be found in the literature, such as the prevalence of full pneumatization of the frontal sinus with a deviated nasal septum, but I would be dividing patients into 3 groups and looking for multiple things in one group.)

Please help me choose an appropriate sample size, or should I just take p as 0.5 and calculate the sample size?

Thank you.

drroopesh (Post author): Dear Kaustav,

What is your research question, and what are your objectives? Sample size must be calculated for each objective separately.

Regards,

Dr. Roopesh

Abhijit Das: Dear Dr. Roopesh Sir,

One article on “Sample size calculation for agreement study, particularly cohen’s kappa estimation” will be beneficial.

Thank you, sir.

drroopesh (Post author): Dear Abhijit Das,

Thank you for the suggestion. I will write an article on that in the near future.

Regards,

Dr. Roopesh
