

Support Vector Machine

(This feature is only available in GenEx Enterprise)

 

Theory

A support vector machine (SVM) is a multivariate model that takes the expression profiles of one or several genes from the training samples as input, and the data in one or several classification columns as output. Its purpose is to separate the data space so that unknown samples can be classified. The separation is made by a subspace with one dimension fewer than the original input (for example, a 2D plane within a cube for a 3D system, or a line within a square for a 2D system), positioned so that its distance to the nearest points in your data set is maximized. If the data has been transposed, exchange all instances of "gene" for "sample", and vice versa, in the following description.
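The idea of choosing the separation with the largest distance to the data can be illustrated with a minimal Python sketch. It is not GenEx code: the function names, the four training points, and the two candidate lines are hypothetical, and the sketch only compares two given lines rather than searching for the best one.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin(w, b, points):
    """Smallest absolute distance from any point to the hyperplane."""
    return min(abs(signed_distance(w, b, p)) for p in points)

def classify(w, b, x):
    """Assign class +1 or -1 depending on the side of the hyperplane."""
    return 1 if signed_distance(w, b, x) >= 0 else -1

# Hypothetical expression values of two genes in four training samples.
training = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]

# Two candidate separating lines (weights, offset); both split the two groups.
line_a = ((1.0, 1.0), -5.5)  # x + y = 5.5
line_b = ((1.0, 0.0), -2.5)  # x = 2.5

margin_a = margin(*line_a, training)
margin_b = margin(*line_b, training)
```

Both lines separate the two groups, but line_a lies further from its nearest training points than line_b does, so it is the kind of separation an SVM prefers.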

 

    

 

How to

Open the Networks tab among the analysis tabs at the top of the main window, and press the Support vector machine button to load the analysis into the Control panel. The basic SVM settings let you choose which classification column to use for training (Classification) and set the maximum number of iterations (Max iterations). Max iterations aborts the process if the training fails to converge and would otherwise loop indefinitely. The Load network and Save network buttons let you store a trained model and use it to classify new data without training again.
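The role of an iteration cap and of saving and loading a trained model can be sketched in Python as follows. This is an illustration under stated assumptions, not the algorithm GenEx uses: the update rule is a simplified hinge-style step without regularization, and train, save_model, load_model, predict, the JSON file format, and the sample data are all hypothetical.

```python
import json
import os
import tempfile

def train(samples, labels, max_iterations=1000, step=1.0):
    """Repeat passes over the data until one pass changes nothing or the cap is hit."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for iteration in range(1, max_iterations + 1):
        updates = 0
        for x, y in zip(samples, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score < 1:  # sample is misclassified or inside the margin
                weights = [w + step * y * xi for w, xi in zip(weights, x)]
                bias += step * y
                updates += 1
        if updates == 0:
            return {"weights": weights, "bias": bias,
                    "iterations": iteration, "converged": True}
    # The cap was reached: stop instead of looping forever.
    return {"weights": weights, "bias": bias,
            "iterations": max_iterations, "converged": False}

def save_model(model, path):
    """Store the trained model so it can be reused later."""
    with open(path, "w") as handle:
        json.dump(model, handle)

def load_model(path):
    """Read back a model stored by save_model."""
    with open(path) as handle:
        return json.load(handle)

def predict(model, x):
    """Classify a new sample with a stored model, without retraining."""
    score = sum(w * xi for w, xi in zip(model["weights"], x)) + model["bias"]
    return 1 if score >= 0 else -1

samples = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]
model = train(samples, labels, max_iterations=100)

path = os.path.join(tempfile.mkdtemp(), "svm_model.json")
save_model(model, path)
restored = load_model(path)
predicted = predict(restored, (4.5, 3.5))
```

If the cap is set too low, the returned model is flagged as not converged, which is a signal to raise the limit or inspect the data rather than trust the result.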

 

Press the Advanced button to open the advanced settings. Here you can change the Kernel type, which determines how the subspace is calculated, and define which Norm case to use in the vector calculations (L2 is the Euclidean norm). The values can be described as follows.
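For readers unfamiliar with the two terms, the following Python sketch shows what a kernel and a norm compute in general. The linear, polynomial, and RBF kernels and the L1 norm are common textbook choices used here as assumptions; they are not a listing of the options GenEx offers, and all function names and default parameter values are hypothetical.

```python
import math

def l1_norm(v):
    """L1 norm: sum of absolute values."""
    return sum(abs(x) for x in v)

def l2_norm(v):
    """L2 (Euclidean) norm: square root of the sum of squares."""
    return math.sqrt(sum(x * x for x in v))

def linear_kernel(a, b):
    """Plain dot product; gives a flat separating subspace."""
    return sum(x * y for x, y in zip(a, b))

def polynomial_kernel(a, b, degree=2, offset=1.0):
    """Dot product raised to a power; allows curved separations."""
    return (linear_kernel(a, b) + offset) ** degree

def rbf_kernel(a, b, gamma=0.5):
    """Similarity that decays with the squared Euclidean distance."""
    diff = [x - y for x, y in zip(a, b)]
    return math.exp(-gamma * l2_norm(diff) ** 2)
```

A kernel replaces the dot product between two samples, so changing it changes the shape of the separation, while the norm determines how vector lengths and distances are measured.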

 

 

    

 

Press the Run button to see the results. The results are displayed as a list, with the training data first and the test data, together with their classified values, at the bottom. If you have two-dimensional data, a plot is displayed showing what the space looks like. In the example below, six samples were defined as test samples and all of them were classified correctly (yellow cells). The test samples are shown as squares in the plot.
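The layout of such a result list can be mimicked in a short Python sketch. It assumes a hypothetical, already trained linear rule (the weights and bias are invented for illustration) and made-up sample names and values; it does not reproduce GenEx output.

```python
def classify(values, weights=(0.0, 1.0), bias=-2.0):
    """Hypothetical trained linear rule: class 'B' on the positive side, else 'A'."""
    score = sum(w * v for w, v in zip(weights, values)) + bias
    return "B" if score >= 0 else "A"

def build_results(training, test):
    """List training samples first, then test samples with their classified value."""
    rows = []
    for name, values, known in training:
        rows.append({"sample": name, "role": "training",
                     "class": known, "correct": None})
    for name, values, known in test:
        predicted = classify(values)
        rows.append({"sample": name, "role": "test",
                     "class": predicted, "correct": predicted == known})
    return rows

# (sample name, expression values of two genes, known class)
training = [("S1", (1.0, 1.0), "A"), ("S2", (2.0, 1.0), "A"),
            ("S3", (4.0, 4.0), "B"), ("S4", (5.0, 4.0), "B")]
test = [("T1", (1.5, 0.5), "A"), ("T2", (4.5, 3.5), "B")]

rows = build_results(training, test)
n_correct = sum(1 for row in rows if row["correct"])
```

Comparing the classified value of each test sample with its known class corresponds to the highlighted cells in the result list.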