Self Organizing Map
(This feature is only available in GenEx Pro/Enterprise)
Theory
The basic idea behind a self-organizing map (SOM) is to set up a structure of interconnected processing units ("neurons") that compete for the signal.

The input is either genes or samples. If genes are classified, the input vectors are the expressions of the genes in the samples. If there are n samples, each input vector has n elements:
Xgene = (x1, x2, x3, ..., xn)
In the network, each node has a specific position, given by its (i, j) coordinate, and contains a vector of weights of the same dimension as the input vectors:
Wij = (w1, w2, w3, ..., wn)
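The two kinds of vectors can be pictured as arrays. The following is only a minimal sketch in Python with NumPy, not GenEx code: the array names, the sizes and the use of Euclidean distance to compare an input vector with the node weights are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples = 4        # n: length of every input vector
n_genes = 6          # number of genes to classify (hypothetical)
rows, cols = 3, 3    # lattice size (Y size, X size)

# One input vector per gene: Xgene = (x1, ..., xn)
inputs = rng.random((n_genes, n_samples))

# One weight vector per node (i, j): Wij = (w1, ..., wn)
weights = rng.random((rows, cols, n_samples))

# Distance between one input vector and the weights of every node
# (Euclidean distance is an assumption of this sketch)
x = inputs[0]
distances = np.linalg.norm(weights - x, axis=2)   # shape (rows, cols)

# Lattice coordinate of the node whose weights lie closest to x
i, j = np.unravel_index(np.argmin(distances), distances.shape)
print("closest node:", (int(i), int(j)))
```

Because every node holds a weight vector of the same length as the inputs, any input can be compared with any node directly.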
Unlike many other types of networks, a SOM does not need a target output to be specified. Instead, in the area of the lattice where the node weights match the input vector, the weights are selectively optimized to resemble more closely the data of the class that the input vector belongs to. Starting from an initial distribution of random weights, and over many iterations, the SOM eventually settles into a map of stable zones. The zones act as feature classifiers: any new, previously unseen input vector presented to the network will stimulate nodes in the zone with similar weight vectors.
Training the SOM to model the training data as closely as possible takes several steps, repeated over many iterations:
1. Each node's weights are initialized.
2. A vector is chosen at random from the set of training data and presented to the lattice.
3. Every node is examined to find the one whose weight vector is most similar to the input vector. The winning node is commonly known as the Best Matching Unit (BMU).
4. The neighbors of the BMU are identified. The size of the neighborhood starts large, typically set to the 'radius' of the lattice, but diminishes with each iteration.
5. Each neighboring node's weights are adjusted to make them more like the input vector. The closer a node is to the BMU, the more its weights get altered.
6. Repeat from step 2 for a fixed number of iterations.
The range of the neighborhood (step 4) and the amount of adjustment (step 5) both decrease during the training from initial values set by the user. This ensures coarse adjustments in the first phase of the training and fine tuning toward the end.
Because the training of a SOM involves many random events, every SOM will be different even for the same data, but the features and classification potential of the SOMs are typically preserved.
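The six training steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the GenEx implementation: the function names train_som and find_bmu are hypothetical, similarity is measured by Euclidean distance, both the learning rate and the neighborhood radius shrink linearly, and the influence on neighbors follows a Gaussian of the lattice distance to the BMU.

```python
import numpy as np

def find_bmu(weights, x):
    """Step 3: lattice coordinate of the node whose weights are closest to x."""
    dist = np.linalg.norm(weights - x, axis=2)
    return np.unravel_index(np.argmin(dist), dist.shape)

def train_som(data, rows, cols, n_steps=300, alpha0=0.4, radius0=None, seed=0):
    """Train a small SOM and return its weights, shape (rows, cols, n)."""
    rng = np.random.default_rng(seed)
    n = data.shape[1]
    if radius0 is None:
        radius0 = max(rows, cols)          # start with the whole lattice
    # Step 1: initialize each node's weights at random
    weights = rng.random((rows, cols, n))
    # Lattice coordinates of every node, shape (rows, cols, 2)
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for step in range(n_steps):
        # Linear decay (an assumption): coarse changes first, fine tuning last
        frac = 1.0 - step / n_steps
        alpha = alpha0 * frac
        radius = max(radius0 * frac, 0.5)
        # Step 2: pick a training vector at random
        x = data[rng.integers(len(data))]
        # Step 3: find the Best Matching Unit
        bmu = find_bmu(weights, x)
        # Step 4: nodes within the current radius are the neighbors
        d = np.linalg.norm(grid - np.array(bmu), axis=2)
        influence = np.exp(-(d ** 2) / (2 * radius ** 2))
        influence[d > radius] = 0.0
        # Step 5: move neighbors toward x, more strongly near the BMU
        weights += alpha * influence[..., None] * (x - weights)
    # Step 6 is the loop itself: a fixed number of iterations
    return weights

# Hypothetical training data: two groups of ten vectors, three values each
data_rng = np.random.default_rng(42)
data = np.vstack([
    data_rng.uniform(0.0, 0.3, (10, 3)),
    data_rng.uniform(0.7, 1.0, (10, 3)),
])
som = train_som(data, rows=4, cols=4)
print("cell of first vector:", find_bmu(som, data[0]))
print("cell of last vector: ", find_bmu(som, data[-1]))
```

Calling train_som again with a different seed gives a different map for the same data, which is why a map that works well is worth keeping.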
How to
Open the Networks tab among the analysis tabs at the top of the main window, and press the Kohonen Self Organizing Maps button to load the analysis into the Control panel.

The X size and Y size text boxes define the number of columns and rows in the map, respectively. Each zone belongs to one row and one column, and represents a node. A good starting point is a square map (X size = Y size) with a side equal to the number of training samples. A SOM can have many more cells than there are variables, which creates a sparse SOM with empty regions between the classified objects. Alternatively, a SOM with very few cells can be used, which forces objects to share cells and thereby form groups.
The No. of Neighbors defines how many of the neighboring nodes are affected in the learning process when the winning node is updated. The more neighbors, the more of the map is affected in each iteration. The initial number of neighbors should typically equal the side of the SOM. α (alpha) is the learning rate and should be between 0 and 1. A lower learning rate allows the network to converge more rapidly, but the chance of a non-optimal solution is larger. A higher value of α reduces the risk of ending up in a local minimum, but at the expense of training speed. α = 0.4 is usually a good first choice. The No. of steps (iterations) depends on the number of training samples, the expected resolution and how smooth the surface needs to be. Often a few hundred steps are sufficient, but this depends on the size of the data.
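To get a feeling for how the starting values play out over a run, the sketch below prints the learning rate and the number of neighbors at a few steps. It assumes a linear decrease, which may differ from the schedule GenEx uses internally; linear_decay is a hypothetical helper, and the values (a 10 x 10 map, α = 0.4, 300 steps) are only examples.

```python
def linear_decay(initial, step, n_steps, floor=0.0):
    """Value of a training parameter at a given step, shrinking linearly."""
    value = initial * (1.0 - step / n_steps)
    return max(value, floor)

side = 10            # X size = Y size (example value)
n_steps = 300        # No. of steps
alpha0 = 0.4         # initial learning rate
neighbors0 = side    # initial No. of Neighbors = side of the SOM

for step in (0, 75, 150, 225, 299):
    alpha = linear_decay(alpha0, step, n_steps)
    # Keep at least one neighbor so the winning node's surroundings still move
    neighbors = round(linear_decay(neighbors0, step, n_steps, floor=1))
    print(f"step {step:3d}: alpha = {alpha:.3f}, neighbors = {neighbors}")
```

With more steps the same decrease is spread over a longer run, so each individual adjustment is smaller and the map changes more gradually.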

You can decide how the data are presented by ticking the check boxes in the Control panel. View grid shows the samples distributed over the different zones in a figure. The coloring of the samples is defined in the Data manager under the Colors & Symbols tab, though the symbol is always a circle. SOMs obtained for the same data are shown below. Only training data are used to generate the SOM model. Once the model is built or loaded (see below), both training and test data are placed in the SOM.



A good SOM should be saved for future classification of test data by pressing the Save Network button in the Control panel. To use a network on test data, press the Load network button, select the desired network and press Run. The result is displayed as a grid figure. A saved network can also be loaded for further training when more data become available, and shared with other GenEx users.
If the Real-time view check box is ticked, you get to see the distribution of the samples in the map change during the learning phase. It is advised to turn off this feature when large data sets are used to train the SOM. Tick the Show labels check box if you want the sample/gene names to be visible in the grid.
Warning: If you train a successful SOM, don't forget to save it! You will never get the exact same map back in a new training.
References
A Scholarpedia article on the Kohonen network