Disclaimer: This article is primarily intended for an online group of postgraduate students in Community Medicine that I am involved with. The group was created to provide supplemental instruction to members on topics of common interest. Instruction is in bite-sized portions, since all members are busy PG students. Conceptual understanding is emphasized. Membership of that (WhatsApp) group is by invitation only. However, others interested in participating in the discussions and related activities in Google Classroom may indicate the same by sending me a message on Facebook.
So far we have looked at some attributes that form the basis of classification of variables. In order to choose the appropriate statistical test, one needs to consider the following:
Is the dependent variable continuous or categorical?
When examining the attributes of the dependent variable, one must first determine if it is continuous (numerical), or categorical (names/ labels).
Note: When assessing if a variable is numerical or not, ask yourself if any numbers associated with the variable are meaningful or not.
Explanation: Often, we assign numeric codes to represent the various responses to an item. This is done for ease of data entry, and does not alter the fundamental attributes of the variable. For instance, the variable ‘Sex’ having responses ‘Male’ and ‘Female’ may be coded as ‘1’ and ‘2’ respectively. Thus, the values associated with the variable ‘Sex’ will be 1 and 2. However, these numbers do not mean anything in themselves, since they were (artificially) assigned to the responses. One cannot meaningfully take their arithmetic mean: one could compute the mean and report it, but how does one explain a value of 1.4 (say) for Sex? One could just as well have assigned 555 and 666 to the responses, since the numbers are useful only in that they signify particular responses. By looking at the number, one is able to determine the response, and nothing more.
In contrast, the weight of a person is meaningful by itself, as an expression of the amount of gravitational pull exerted by Earth on the person. It is a measure, not a code for something else. Therefore, one may meaningfully take the arithmetic mean of the weights of 100 people.
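The point about codes versus measurements can be illustrated with a small sketch. The responses, weights, and variable names below are made up purely for illustration. The same five responses are coded in two different ways, and the "mean" changes with the arbitrary coding scheme, whereas the mean of a measured quantity does not depend on any such choice.

```python
from statistics import mean

# Hypothetical responses to the item 'Sex' for five participants.
responses = ["Male", "Female", "Female", "Male", "Female"]

# Two equally valid coding schemes: the numbers are only labels.
coding_a = {"Male": 1, "Female": 2}
coding_b = {"Male": 555, "Female": 666}

# The "mean" depends entirely on which codes were chosen,
# so it says nothing meaningful about the variable 'Sex'.
mean_a = mean(coding_a[r] for r in responses)
mean_b = mean(coding_b[r] for r in responses)
print("Mean of codes (scheme A):", mean_a)
print("Mean of codes (scheme B):", mean_b)

# Weight is a measurement, not a code, so its mean is interpretable.
weights_kg = [62.5, 70.0, 58.2, 81.3, 66.0]  # hypothetical weights
mean_weight = mean(weights_kg)
print("Mean weight (kg):", round(mean_weight, 1))
```

Both coding schemes identify the responses equally well, which is all a code is meant to do.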
If continuous, does it follow a normal distribution?
Where the variable is continuous, one must determine whether it is normally distributed or not. This is required so that one may judge whether the assumptions for parametric tests of significance are satisfied. (Parametric tests of significance assume that the data follow a particular distribution, most commonly the normal distribution, and are applied when that assumption holds. By extension, non-parametric tests of significance do not make this assumption, and are applied when the data do not follow a normal distribution, for example when the data are skewed.)
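As a rough illustration of what "skewed" means in practice, the sketch below compares the mean with the median and computes a simple skewness coefficient for two made-up data sets. This is only an informal screen under assumed data. The helper `skewness` is written here for illustration, and in real work a histogram, a Q-Q plot, or a formal test (for example the Shapiro-Wilk test) would usually be considered as well.

```python
from statistics import mean, median, pstdev

def skewness(values):
    """Population skewness: the mean of cubed z-scores.

    Roughly 0 for symmetric data, positive for a long right tail,
    negative for a long left tail.
    """
    m = mean(values)
    s = pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)

# Hypothetical data sets, chosen only to show the contrast.
symmetric = [48, 50, 52, 54, 56, 58, 60]
right_skewed = [1, 1, 2, 2, 3, 4, 20]

for name, data in [("symmetric", symmetric), ("right_skewed", right_skewed)]:
    print(name,
          "mean:", round(mean(data), 2),
          "median:", median(data),
          "skewness:", round(skewness(data), 2))
```

In the symmetric set the mean and median coincide and the skewness is zero. In the right-skewed set a single large value pulls the mean well above the median, which is a hint that the normality assumption may not hold.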
Is the independent variable continuous or categorical?
As with the dependent variable, one needs to ascertain if the independent variable is continuous or categorical.
If categorical, how many levels does the variable have?
When dealing with categorical variables, particular attention must be paid to the number of levels. In the case of the variable ‘Sex’ described above, there were two levels: ‘Male’ and ‘Female’. Categorical variables with two levels are also called binary variables. Categorical variables may also have more than two levels. For example, the variable ‘Blood Group’ has the following levels: ‘A’, ‘B’, ‘O’, ‘AB’. The choice of test of statistical significance depends upon the number of levels of the (independent) categorical variable.
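Counting the levels of a categorical variable is straightforward once the data are entered. The sketch below uses made-up responses, and the helper names `count_levels` and `level_type` are hypothetical, introduced only for this illustration.

```python
from collections import Counter

def count_levels(values):
    """Return each distinct level of a categorical variable with its count."""
    return Counter(values)

def level_type(n_levels):
    """Describe a categorical variable by its number of levels."""
    if n_levels == 2:
        return "binary"
    if n_levels > 2:
        return "more than two levels"
    return "fewer than two levels"

# Hypothetical responses for two categorical variables.
sex = ["Male", "Female", "Female", "Male", "Female"]
blood_group = ["A", "O", "B", "O", "AB", "A", "O"]

for name, values in [("Sex", sex), ("Blood Group", blood_group)]:
    levels = count_levels(values)
    print(f"{name}: {len(levels)} levels ({level_type(len(levels))})")
    print("  counts:", dict(levels))
```

Note that the count reflects only the levels actually observed in the data. A level that exists in principle but has no respondents in the sample will not appear.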
Are the independent variable and dependent variable paired?
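Does the response/ outcome/ dependent variable depend upon the explanatory/ independent variable, in the sense that each value of one is linked to a specific value of the other? If there is such a dependency (typically because both values were measured on the same participant), then they are called ‘paired’. Typical examples of paired variables are pre-post variables. There, the pre-intervention values influence the post-intervention values.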
Example: Weight before and after a diet program are ‘paired’. However, weight of male participants versus female participants in a weight-loss program are not ‘paired’.
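The difference between the two situations in the example shows up in how the data are laid out. In the sketch below (participant IDs and weights are invented for illustration), the before and after weights can be matched person by person, so a within-person change can be computed. The weights of male and female participants come from different people, possibly in groups of different sizes, so only group summaries can be compared.

```python
from statistics import mean

# Paired: each participant contributes one 'before' and one 'after' value.
before_kg = {"P01": 82.0, "P02": 90.5, "P03": 76.0, "P04": 88.0}
after_kg = {"P01": 79.5, "P02": 87.0, "P03": 75.0, "P04": 84.5}

# Because the values are linked by participant, we can take differences.
changes = [after_kg[p] - before_kg[p] for p in before_kg]
mean_change = mean(changes)
print("Within-person changes (kg):", changes)
print("Mean change (kg):", mean_change)

# Not paired: two separate groups of people, with no one-to-one link.
male_kg = [82.0, 90.5, 88.0]
female_kg = [76.0, 68.5]

# There is no sensible way to subtract value by value here;
# we can only compare summaries of the two groups.
difference_in_means = mean(male_kg) - mean(female_kg)
print("Difference in group means (kg):", round(difference_in_means, 2))
```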
In subsequent articles, we will see how these considerations come together to determine the appropriate statistical test for significance testing.