Choose sample size
(This feature is only available in GenEx Pro/Enterprise)
Theory
Distinguishing small differences between populations requires a large sample size. This analysis performs a simulation based on estimated population standard deviations, the desired significance level, and other parameters to estimate the sample size necessary to detect a specific difference. This is useful, for example, for defining a hypothesis for a confirmatory study based on preliminary results from an earlier exploratory study. The simulation assumes that the means of two sample data sets (Group A and Group B) will be compared and that the difference of the means will be estimated.
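The idea of estimating detectability by simulation can be sketched in Python as follows. This is a minimal illustration and not GenEx's actual algorithm: it assumes normally distributed data and an unpaired, two-tailed Z-test in which the entered standard deviations are treated as known. The function name simulated_power and all of its parameters are hypothetical.

```python
import random
from statistics import NormalDist


def simulated_power(diff, sd_a, sd_b, n, alpha=0.05, trials=2000, seed=0):
    """Estimate by simulation how often a true difference `diff` is detected.

    Assumptions (not taken from GenEx): normal data, unpaired two-tailed
    Z-test, standard deviations treated as known.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the difference between the two group means
    se = (sd_a ** 2 / n + sd_b ** 2 / n) ** 0.5
    detected = 0
    for _ in range(trials):
        group_a = [rng.gauss(0.0, sd_a) for _ in range(n)]
        group_b = [rng.gauss(diff, sd_b) for _ in range(n)]
        z = (sum(group_b) / n - sum(group_a) / n) / se
        if abs(z) > z_crit:
            detected += 1
    return detected / trials


if __name__ == "__main__":
    # Estimated power for a true difference of one standard deviation
    for n in (5, 10, 20, 50):
        print(n, simulated_power(diff=1.0, sd_a=1.0, sd_b=1.0, n=n))
```

The estimated power rises with the group size, which is the relationship the analysis reports for the settings entered in the dialog.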
How to
Open the Exp. Design tab among the analysis tabs at the top of the main window, and press the Choose sample size for future experiment button to load the analysis.

Enter the standard deviations of the two sample data sets into the text boxes SD (Group A) and SD (Group B). These values are typically obtained from an earlier exploratory study. In the text box Number of samples in each group, specify the group sizes to be reported in the resulting table (see below), separated by semicolons. The default values are 5, 10, 15, 20, 30, and 50. You can also select the power levels to be reported in the resulting figure and table; the default values are 99, 95, 90, and 80%. The available confidence levels for the future experiment are 99%, 95%, and 90%, corresponding to significance levels of 0.01, 0.05, and 0.10.
Choose a 1 tail test if you want to work under the assumption that the mean of one group is strictly larger than the other. Choose a 2 tail test if you do not want to make any such assumption and only evaluate the difference of the means. The latter is a more conservative approach and thus safer unless you are very confident in the assumptions of the 1 tail test. A Paired test is appropriate if each sample in Group A is related to a corresponding sample in Group B, e.g. if Group A contains samples from individuals before treatment and Group B contains samples from the same individuals after treatment. In other cases, use the Unpaired test.
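Why the 2 tail test is the more conservative choice can be seen from the critical values of the test statistic. The sketch below uses the standard normal distribution for illustration. The helper critical_z is hypothetical and is not part of GenEx.

```python
from statistics import NormalDist


def critical_z(alpha, tails):
    """Critical value of a standard normal test statistic.

    `alpha` is the significance level (e.g. 0.05 for 95% confidence),
    `tails` is 1 or 2.
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha between both ends of the distribution
    return NormalDist().inv_cdf(1 - alpha / tails)


if __name__ == "__main__":
    for alpha in (0.10, 0.05, 0.01):
        one = critical_z(alpha, tails=1)
        two = critical_z(alpha, tails=2)
        print(f"alpha={alpha:.2f}  1 tail: {one:.3f}  2 tail: {two:.3f}")
```

At the same significance level the two-tailed critical value is always the larger one, so a larger observed difference (or more samples) is needed before the result is declared significant.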
The standard deviation fully describes the spread of the data only if the underlying distributions of the populations of Group A and Group B are normally distributed. If you have confirmed that the data in your exploratory study are normally distributed, or if you have strong reason to believe so, you may regard the standard deviation as appropriately estimated and proceed with a t-test. Otherwise, running a Z-test is a more conservative and thus safer test, but the power of the Z-test is much weaker than that of the t-test. However, by the central limit theorem the sample means are approximately normally distributed when the sample sizes are large enough, so the t-test may be appropriate even if the underlying distributions are not normal. For these reasons the t-test is appropriate for most data sets and is often the selected approach. Press the Run button to see the results in the form of a graph and a table.

The smallest difference between the populations that can be detected as significant decreases as the number of samples increases. Also, the higher the required power of the test, the larger the number of samples needed. The graph gives an overview of the number of samples needed for hypothesis testing in a confirmatory study, given the standard deviations and the other settings entered in the dialog.

The table holds the numerical values of the detectable difference between the groups at the given confidence level. The columns represent the power levels (Power [%]) specified in the dialog (see above), and the rows represent the specified sample sizes (Number of samples in each group). The table gives more detailed values of the necessary number of samples, complementing the overview given in the graph above.
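A table with the same layout can be approximated in closed form, as sketched below. This is an assumption-laden illustration, not the calculation GenEx performs: it uses the textbook normal approximation for an unpaired test, so the values will differ from the simulated GenEx results, especially for a t-test with small groups. The function detectable_difference is hypothetical, and both standard deviations are set to 1.0 purely as an example.

```python
from statistics import NormalDist


def detectable_difference(sd_a, sd_b, n, power, alpha=0.05, tails=2):
    """Smallest difference of means detectable with the given power.

    Normal approximation for an unpaired test with n samples per group.
    """
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / tails)  # critical value of the test
    z_power = nd.inv_cdf(power)              # quantile for the required power
    se = (sd_a ** 2 / n + sd_b ** 2 / n) ** 0.5
    return (z_alpha + z_power) * se


if __name__ == "__main__":
    sizes = [5, 10, 15, 20, 30, 50]  # default group sizes in the dialog
    powers_pct = [99, 95, 90, 80]    # default power levels in percent
    print("   n" + "".join(f"{p:>8d}%" for p in powers_pct))
    for n in sizes:
        cells = "".join(
            f"{detectable_difference(1.0, 1.0, n, p / 100):>9.3f}"
            for p in powers_pct
        )
        print(f"{n:>4d}{cells}")
```

Reading along a row shows the detectable difference shrinking as the required power is relaxed, and reading down a column shows it shrinking as the group size grows.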
