Select Data for Analysis
Open the Data manager either by pressing the Data manager button or by choosing Data Manager in the Tools menu; both are found in the main window. Data is selected under the Data selection tab, which is shown by default when the Data manager opens. Initially, all rows (samples) and columns (genes) are active, which is indicated by a green tick next to each row/column name in the Row and Column tabs. To deactivate a row/column, mark it in the Data manager and press the Deactivate sel. button to the right (sel. = selection). Deactivated rows/columns are indicated with a red cross. To reactivate them, mark them in the Data manager and press the Activate sel. button.
In the top right of the Data manager there are aids for selection. The Select all and Deselect all buttons let you select/deselect all rows or columns, depending on which of the tabs is open. The Select 1,3,5,... and Select 2,4,6,... buttons let you select every second row/column, starting from the first or the second one respectively. If you have defined groups of rows, you can choose one of the groups in the drop-down list and press Select Group. By selecting and deactivating suspected outliers, you can test the effect they have on the results of the analysis.
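
The selection aids and the active/inactive state described above can be pictured with a small sketch. This is only an illustration in Python, not the application's own code; the function names select_every_second and set_active and the sample names are hypothetical.

```python
# Hypothetical model of the selection aids; not the application's own code.

def select_every_second(n_items, start):
    """Return 0-based indices of items start, start+2, ... (start is 1-based)."""
    return list(range(start - 1, n_items, 2))

def set_active(active, selection, state):
    """Return a copy of the activation flags with the selected items set to state."""
    updated = list(active)
    for i in selection:
        updated[i] = state
    return updated

samples = ["s1", "s2", "s3", "s4", "s5"]
active = [True] * len(samples)               # everything is active by default

odd = select_every_second(len(samples), 1)   # like "Select 1,3,5,..."
even = select_every_second(len(samples), 2)  # like "Select 2,4,6,..."

# Deactivate a suspected outlier (here the third sample) before rerunning an analysis.
active = set_active(active, [2], False)
active_samples = [s for s, a in zip(samples, active) if a]
```

Running the same analysis once with all samples and once with active_samples is the kind of comparison that reveals how much a suspected outlier influences the result.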

Rows can also be classified as either Training or Test samples, where Training is the default setting. This is used in some classification methods, where the model is built from the data in the training samples and then tested on the test samples. For example, you could have some samples that are known to be either sick or healthy, and other samples whose classification is unknown. The model should be based on the known samples, with the gene expression as input and the classification as output. The model can thereafter be used to classify the unknown samples. If you are not using these methods, the Training/Test settings have no impact.
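
The Training/Test idea can be sketched as follows. The nearest-centroid classifier used here is an assumption chosen for brevity, not necessarily one of the classification methods offered by the program, and all names and numbers are made up for illustration.

```python
# Illustrative sketch: a nearest-centroid classifier stands in for
# whatever classification method is actually used.

def centroid(rows):
    """Column-wise mean of a list of equally long value lists."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fit(training):
    """Build one centroid per class from (expression_values, label) pairs."""
    by_label = {}
    for values, label in training:
        by_label.setdefault(label, []).append(values)
    return {label: centroid(rows) for label, rows in by_label.items()}

def predict(model, values):
    """Return the label whose centroid is closest to the given values."""
    def sq_dist(center):
        return sum((v - c) ** 2 for v, c in zip(values, center))
    return min(model, key=lambda label: sq_dist(model[label]))

# Training samples: gene expression as input, known classification as output.
training = [
    ([1.0, 0.9], "healthy"),
    ([1.2, 1.1], "healthy"),
    ([3.0, 3.2], "sick"),
    ([2.8, 3.1], "sick"),
]
# Test samples: classification unknown.
test = [[1.1, 1.0], [2.9, 3.0]]

model = fit(training)                            # uses Training rows only
predictions = [predict(model, v) for v in test]  # applied to Test rows
```

Only the training rows influence the model; the test rows are passed through it afterwards, which mirrors the role split described above.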
The columns, on the other hand, can be classified as either Predictor or Response, where Predictor is the default setting. This is also used in some classification methods, to distinguish between columns containing input and columns containing output. The model is still based on the Training samples, but it uses only the data in the columns classified as Predictor as input. The output of the model should mimic the data in the columns classified as Response. This is only needed when the output data cannot be put in a classification column, e.g. because the data should be scaled, which is not possible in classification columns.
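
How the row and column roles combine can be sketched as below. The data layout, the column names and the helper split_by_role are hypothetical and only illustrate which values would serve as model input and output.

```python
# Hypothetical layout: each row has a role, each column has a role.
columns = ["geneA", "geneB", "dose"]
column_role = {"geneA": "Predictor", "geneB": "Predictor", "dose": "Response"}

rows = [
    {"name": "s1", "role": "Training", "values": [0.5, 1.5, 10.0]},
    {"name": "s2", "role": "Training", "values": [0.7, 1.1, 12.0]},
    {"name": "s3", "role": "Test", "values": [0.6, 1.3, 11.0]},
]

def split_by_role(rows, columns, column_role, row_role):
    """Return (inputs, outputs) for rows with the given role.

    Inputs come from Predictor columns, outputs from Response columns.
    """
    pred_idx = [i for i, c in enumerate(columns) if column_role[c] == "Predictor"]
    resp_idx = [i for i, c in enumerate(columns) if column_role[c] == "Response"]
    chosen = [r for r in rows if r["role"] == row_role]
    inputs = [[r["values"][i] for i in pred_idx] for r in chosen]
    outputs = [[r["values"][i] for i in resp_idx] for r in chosen]
    return inputs, outputs

# A model would be fitted on these two tables ...
train_inputs, train_outputs = split_by_role(rows, columns, column_role, "Training")
# ... and evaluated on these.
test_inputs, test_outputs = split_by_role(rows, columns, column_role, "Test")
```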

It is also possible to scale the data under the Data selection tab.
