
How to solve a 2×2 contingency Chi-square

Disclaimer: This article is in response to a request from a reader, and is deliberately limited to a 2×2 contingency Chi-square as that is taught to MBBS students in Community Medicine. However, others may find value in this article as well.

Background Information:

The Chi-square test (Chi is pronounced ‘kai’, and rhymes with sky) is employed when one wants to compare the distribution of two or more categorical variables. When dealing with two categorical variables (say sex and marital status), each with two levels (Male/Female and Married/Unmarried), the data can be presented in a 2×2 contingency table (that is, a table containing two rows and two columns).

In essence, a chi-square test compares the observed frequencies with the frequencies expected if the variables were unrelated, to determine whether the observed difference could be due to chance. When the difference cannot reasonably be explained by chance, it is said to be statistically significant.

Procedure:

Performing a chi-square test involves the following steps:

Step 1: Define the Null and Alternate Hypotheses

Step 2: Calculate the Observed and Expected frequencies

Step 3: Calculate the Test Statistic

Step 4: Find the critical value

Step 5: Reject or Fail to Reject the Null Hypothesis

Solved Example

Question

A researcher measured the Body Mass Index (BMI) of 200 boys and 250 girls. While 50 boys were overweight, 120 girls were overweight. Is there a relationship between sex and being overweight?

Solution

First, we state the Null and Alternate hypotheses.

The Null hypothesis (H0) is: There is no difference in proportion of overweight between boys and girls.

Alternate hypothesis (H1) is: There is a difference in proportion of overweight between boys and girls.

Note: The Alternate hypothesis for a chi-square test is always two-sided (that is, it does not state that one is greater than the other).

Let us set up a 2×2 contingency table based on the data provided.

Table 1. Observed frequencies based on the problem

Sex      Overweight   Not overweight   Total
Boys     50           150              200
Girls    120          130              250
Total    170          280              450

Note: The chi-square test can be carried out only on the actual numbers (counts), not on percentages, proportions, means of observations, or other derived statistics.
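As a minimal sketch, the observed counts and their totals can be set up in Python as follows. The variable names (observed, row_totals, col_totals, grand_total) are illustrative assumptions and are not part of the original problem.

```python
# Observed counts from the problem (rows: sex, columns: overweight status).
# The "Not overweight" counts are obtained by subtraction from the group sizes.
observed = {
    "Boys": {"Overweight": 50, "Not overweight": 200 - 50},
    "Girls": {"Overweight": 120, "Not overweight": 250 - 120},
}

# Row totals: one per sex.
row_totals = {sex: sum(cells.values()) for sex, cells in observed.items()}

# Column totals: one per overweight status.
col_totals = {}
for cells in observed.values():
    for status, count in cells.items():
        col_totals[status] = col_totals.get(status, 0) + count

# Grand total: all children measured.
grand_total = sum(row_totals.values())

print(row_totals)   # {'Boys': 200, 'Girls': 250}
print(col_totals)   # {'Overweight': 170, 'Not overweight': 280}
print(grand_total)  # 450
```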

Next, we must calculate Expected frequencies.

The general format of a 2×2 contingency table is shown below

                Column 1   Column 2   Row Total
Row 1           A          B          A+B
Row 2           C          D          C+D
Column Total    A+C        B+D        A+B+C+D (Grand Total)

To obtain the Expected Frequency for a cell in the table, simply multiply the corresponding Row Total with the appropriate Column Total, then divide the product by the Grand Total.

Using this method, the Expected Frequencies in the problem can be calculated as follows

Cell A: (200×170)/450 = 75.6

Cell B: (200×280)/450 = 124.4

Cell C: (250×170)/450 = 94.4

Cell D: (250×280)/450 = 155.6

We now have an Expected value to match each observed value. The sum of the expected values must equal the sum of the observed values, which is a useful check. Adding the expected values, we get 75.6+124.4+94.4+155.6 = 450.0, which equals the Grand Total of 450.
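The same calculation can be sketched in Python. This is only an illustration of the Row Total × Column Total / Grand Total rule described above; the list layout and variable names are assumptions made for the example.

```python
# Observed counts: rows are Boys, Girls; columns are Overweight, Not overweight.
observed = [[50, 150], [120, 130]]

row_totals = [sum(row) for row in observed]        # [200, 250]
col_totals = [sum(col) for col in zip(*observed)]  # [170, 280]
grand_total = sum(row_totals)                      # 450

# Expected frequency for each cell = row total x column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 1) for e in row])
# [75.6, 124.4]
# [94.4, 155.6]

# Check: the expected frequencies add up to the grand total.
print(round(sum(sum(row) for row in expected), 1))  # 450.0
```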

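To calculate the Chi-square statistic, we need to know the difference between the Observed (O) and Expected (E) Frequency for each cell. For this purpose, we create a table as shown below.

Cell    Observed (O)   Expected (E)   O-E     (O-E)^2   (O-E)^2/E
A       50             75.6           -25.6   655.36    8.66
B       150            124.4          25.6    655.36    5.26
C       120            94.4           25.6    655.36    6.94
D       130            155.6          -25.6   655.36    4.21
Total   450            450.0          0.0               25.07

Note: The sum of (O-E) always equals zero. This is a useful check.

The sum of (O-E)^2/E over all four cells gives the chi-square statistic. Here, the chi-square statistic = 8.66+5.26+6.94+4.21 = 25.07.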
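For readers who want to verify the arithmetic, here is a short Python sketch that sums (O-E)^2/E over the four cells. The function name chi_square_statistic is hypothetical, and no continuity correction is applied, in keeping with the hand calculation. Because the code keeps the expected frequencies unrounded, it gives about 25.01 rather than the hand-calculated 25.07; the small difference comes only from rounding the intermediate values.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, sum of O-E) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    deviation_sum = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency for this cell (kept unrounded).
            e = row_totals[i] * col_totals[j] / grand_total
            deviation_sum += o - e
            statistic += (o - e) ** 2 / e
    return statistic, deviation_sum


# Rows: Boys, Girls; columns: Overweight, Not overweight.
statistic, deviation_sum = chi_square_statistic([[50, 150], [120, 130]])

print(round(statistic, 2))        # 25.01
print(abs(deviation_sum) < 1e-9)  # True: the (O-E) values sum to zero
```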

To determine if the chi-square statistic is statistically significant, we must next consult the chi-square distribution table for the appropriate degrees of freedom. The degrees of freedom equal (number of rows in the contingency table minus one) × (number of columns in the contingency table minus one), not counting the row and column containing the totals. For a 2×2 contingency table the degrees of freedom equal (2-1) × (2-1) = 1.

Statistical significance is usually said to be present when the p-value is less than 0.05. Thus, we need to look up the chi-square distribution table and see the table value corresponding to 1 degree of freedom and probability 0.05. From the chi-square distribution table, we see that the critical value is 3.84. Thus, if the chi-square test statistic is equal to or greater than 3.84, we must reject the Null hypothesis.

In the present problem the chi-square statistic is 25.07, which is greater than the critical value of 3.84. Therefore, we reject the Null hypothesis that there is no difference in proportion of overweight between boys and girls. We may also state that there is a statistically significant difference in proportion of overweight between boys and girls; or that girls (120/250 = 48%) are significantly more likely to be overweight than boys (50/200 = 25%).
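The table lookup and the decision can also be checked with Python's standard library. The sketch below is valid for 1 degree of freedom only, where a chi-square variable is the square of a standard normal variable, so the critical value is the square of the two-sided normal cut-off. The variable names are illustrative assumptions.

```python
import math
from statistics import NormalDist

alpha = 0.05        # significance level
chi_square = 25.07  # statistic from the hand calculation

# For 1 degree of freedom only: critical value = (two-sided z cut-off) squared.
z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
critical_value = z ** 2

# Upper-tail probability of the chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi_square / 2))

reject_null = chi_square >= critical_value

print(round(critical_value, 2))  # 3.84
print(p_value < alpha)           # True
print(reject_null)               # True
```

For tables larger than 2×2 the degrees of freedom exceed 1, and this shortcut no longer applies; a chi-square table or statistical software should be used instead.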

Useful Links:

Link to document describing how to perform chi-squared test manually as well as using software:

https://web.pdx.edu/~newsomj/uvclass/ho_chisq.pdf

Link to BMJ article on Chi-squared test:

https://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one/8-chi-squared-tests

Link to previous post containing links to video tutorials on chi-squared test:

https://communitymedicine4all.com/2013/08/04/chi-square-test-tutorials/