

Non-parametric tests

 

Theory

Non-parametric tests have a great advantage over t-tests: they do not assume any particular distribution for the underlying data population. However, because t-tests, which assume a normal distribution, are usually more powerful than non-parametric tests, you may want to check your data's underlying distribution. The Kolmogorov-Smirnov test is available in GenEx and tests whether the data population conforms to a normal distribution. If it comes out as TRUE, it might be better to use a t-test instead. If the data is not normally distributed, you might take the logarithm of the data and test again for normality. One should have at least 10 measurements in each group for the test to make a decent approximation, and at least 20 for the p-value to be reliable.
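As an illustration of the normality check described above, the following minimal Python sketch computes a Kolmogorov-Smirnov statistic against a normal distribution and shows how a log transformation can bring skewed data closer to normality. It is not GenEx code: the function name and the expression values are hypothetical, and fitting the mean and standard deviation from the same data is an assumption of this sketch. Only the statistic is computed; how GenEx converts it to a p-value is not shown here.

```python
import math
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to the data."""
    xs = sorted(values)
    n = len(xs)
    # Assumption of this sketch: the normal parameters are estimated from the data.
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

# Hypothetical, right-skewed expression values (not taken from GenEx).
raw = [1.2, 1.5, 1.9, 2.4, 3.1, 4.0, 5.6, 8.3, 13.0, 27.5, 61.0, 140.0]
logged = [math.log10(v) for v in raw]

d_raw = ks_statistic_normal(raw)
d_log = ks_statistic_normal(logged)
print(f"KS statistic, raw data:   {d_raw:.3f}")
print(f"KS statistic, log10 data: {d_log:.3f}")
```

A smaller statistic means the data follows the fitted normal curve more closely, so for these values the log-transformed data is the better candidate for a t-test.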

 

How to

Open the Statistics tab among the analysis tabs at the top of the main window, and press the Non-parametric tests button to load the analysis into the Control panel. The analysis can only be loaded if groups of samples have been defined in the data set. If the data has been transposed, make sure that there are groups of genes. The tests compare the difference of means of two groups, and the groups are defined in the Data manager under the Groups tab.

 

    

 

There is a check box list with all defined groups to the left in the Control panel. Select exactly two of these; they are the groups whose means will be compared in the selected test. Also select the genes that you are interested in from the check box list to the right. If you want to know whether the two group means are equal or not, select a 2-tail test with the radio buttons. Select a 1-tail test if you only want to know whether one group has a higher expression than the other group.

 

    

 

There are two different non-parametric tests for comparing the means of two groups: Mann-Whitney's test is used for unpaired data, and Wilcoxon's test for paired data. Unpaired data is when the samples are independent of each other, e.g. if 100 subjects are distributed at random in two groups before they are treated (drug/placebo), one treatment per group, and tested once. Paired data is when there are pairs of samples that depend on each other, e.g. if 100 subjects are distributed at random in two groups, and tested both before and after they have received treatment (drug/placebo). This way, you have two paired measures for each subject and should therefore use Wilcoxon's test. Otherwise, use Mann-Whitney's test. Press the Run button to run the analysis.

 

The result table includes the results of a Kolmogorov-Smirnov test, which tests whether the data population is normally distributed. KS is the Kolmogorov-Smirnov test statistic and KS P-Value its corresponding p-value. A high KS p-value indicates that the data is consistent with a normal distribution, while a low p-value indicates that it is not. This is summarized in the row Norm. dist. (Normally distributed), which is green and states TRUE if the Kolmogorov-Smirnov test indicates that the data is normally distributed, or is red and states FALSE if not. The non-parametric tests do not assume anything about the underlying distribution of the data, but if the Kolmogorov-Smirnov test indicates that the data is normally distributed, it might be better to use a t-test, which has more power.

 

A concept used in both tests is rank, which is calculated by ordering all values from smallest to largest. The smallest value will have rank 1, the second smallest rank 2, and so on. If two values are equal and would otherwise have e.g. ranks 25 and 26, then both will have rank 25.5. If the Mann-Whitney test was selected, the report displays the mean of the ranks within each group (MeanRank A/B), the Mann-Whitney test statistic (U), a standard normal deviate (Z), and the sum of all the ranks within each group (Total Rank). The result table for Wilcoxon's test includes the number of experimental measurements (Count), the sum of ranks higher than the hypothetical value (T+), the sum of ranks lower than the hypothetical value (T-), the sum of T+ and T- (W), a standard normal deviate (Z), and the Spearman correlation coefficient that indicates how the groups are correlated.
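The ranking procedure and the reported quantities can be sketched in Python as follows. This is a textbook-style illustration under stated assumptions, not the GenEx implementation: the function names are hypothetical, the Z value uses the plain normal approximation without a tie correction, the hypothetical value for Wilcoxon's test is taken to be a paired difference of zero, and zero differences are dropped. GenEx may handle these details differently.

```python
import math

def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2  # mean of the 1-based positions in the tie
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def mann_whitney(group_a, group_b):
    """Rank sums, U for group A, and a normal-approximation Z (no tie correction)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    total_a, total_b = sum(ranks[:n_a]), sum(ranks[n_a:])
    u_a = total_a - n_a * (n_a + 1) / 2
    z = (u_a - n_a * n_b / 2) / math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return total_a, total_b, u_a, z

def wilcoxon_signed_rank(before, after):
    """Count, T+ and T- from ranked absolute paired differences (zeros dropped)."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return len(diffs), t_plus, t_minus

# Two equal values that would occupy ranks 2 and 3 both receive rank 2.5.
print(average_ranks([3.1, 1.0, 2.5, 2.5]))  # [4.0, 1.0, 2.5, 2.5]

# Hypothetical unpaired and paired measurements.
print(mann_whitney([1, 2, 3], [4, 5, 6]))
print(wilcoxon_signed_rank([10, 12, 9, 11], [12, 11, 9, 15]))
```

The mean rank of a group is its rank sum divided by the number of values in the group.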

 

 

 

The most important row for practical purposes is probably the p-value, P (1-tail) or P (2-tail) depending on whether a 1-tail or 2-tail test was selected in the Control panel. The p-value is the probability that, given that the null hypothesis is true, you would obtain data at least as extreme as the data that was actually observed. A low p-value indicates that the null hypothesis should be rejected, and that there is indeed a difference of means between the two groups. If more than one gene is tested at the same time, you are performing multiple testing, with an increased risk of finding differences between groups purely by chance. Remember to use the corrected threshold p-value given in the message dialog.
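To make the relationship between Z, the two kinds of p-value and a corrected threshold concrete, here is a small Python sketch. The function names are hypothetical. The 1-tail value assumes the observed difference lies in the direction you hypothesized. The Bonferroni correction is used purely as an example of a multiple testing correction; this page does not state which correction GenEx applies, so always use the threshold shown in the message dialog.

```python
from statistics import NormalDist

def p_values_from_z(z):
    """One- and two-tailed p-values for a standard normal deviate."""
    upper_tail = 1.0 - NormalDist().cdf(abs(z))
    return upper_tail, min(1.0, 2.0 * upper_tail)

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold when n_tests genes are tested together."""
    return alpha / n_tests

# Hypothetical values: Z = 1.96 from one test, 10 genes tested at alpha = 0.05.
p_one_tail, p_two_tail = p_values_from_z(1.96)
threshold = bonferroni_threshold(0.05, 10)

print(f"P (1-tail) = {p_one_tail:.4f}")
print(f"P (2-tail) = {p_two_tail:.4f}")
print(f"Corrected threshold = {threshold:.4f}")
print("Significant after correction:", p_two_tail < threshold)
```

With these numbers the 2-tail p-value is close to 0.05, which would count as significant for a single gene but not once the threshold is corrected for ten genes.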