Hierarchical agglomerative clustering is one of the most common methods for grouping data. A hierarchical agglomerative classification can be constructed with the following general algorithm:
1. Find the two closest objects and merge them into a cluster.
2. Find and merge the next two closest points, where a point is either an individual object or a cluster of objects.
3. If more than one cluster remains, return to step 2.
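The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the program's own implementation: it assumes average linkage and Euclidean distance, and the names `euclidean`, `average_linkage`, `agglomerate` and the `profiles` values are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, data):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(data):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples."""
    clusters = [(i,) for i in range(len(data))]  # every object starts alone
    merges = []
    while len(clusters) > 1:  # step 3: repeat while more than one cluster remains
        # steps 1 and 2: find the two closest points (objects or clusters)
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], data),
        )
        merges.append((a, b, average_linkage(a, b, data)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)  # the merged cluster replaces its two parts
    return merges


# Hypothetical expression profiles for four objects (illustrative values only).
profiles = [[1.0, 1.0], [1.0, 2.0], [5.0, 5.0], [9.0, 9.0]]
for a, b, d in agglomerate(profiles):
    print(a, b, round(d, 3))
```

The sketch recomputes every pairwise distance on each pass, which keeps it short but slow; it is meant only to make the merge order explicit.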
Open the Cluster tab among the analysis tabs at the top of the main window, and press the Hierarchical Clustering button to load the analysis into the Control panel.
Pressing the Run button performs the clustering with the default settings: average linkage as the clustering method and Euclidean distance as the distance measure. Press the Advanced button to select a different clustering method or distance measure.
The following clustering methods are available.
The distances between objects can also be measured in different ways. For continuous data, such as gene expression measured as copy numbers or Cq values, the most common measures are the following.
The following are used to measure the distances for discrete data.
You can also use the following as distance measures for both continuous and discrete data.
A drop-down list in the Advanced Control panel lets you choose the orientation of the clustering tree. The figures below use the default, Left, which means that the tree branches to the left.
When performing hierarchical agglomerative clustering, it is good practice to analyze the data with a few different methods to verify that the main clusters predicted are independent of the method used, and to gain experience of which method suits the particular data best. Below, average linkage, complete linkage, and the Ward algorithm all predict three main clusters.
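The kind of cross-check described above can also be sketched outside the program. The example below is a sketch under stated assumptions, not the program's implementation: it uses the Lance-Williams update rules for average linkage, complete linkage, and Ward's method, and it applies Ward's rule to squared Euclidean distances. The function names and the `samples` values are hypothetical.

```python
import math


def lance_williams(method, ni, nj, nk, dki, dkj, dij):
    """Distance from cluster k to the merger of clusters i and j."""
    if method == "average":
        return (ni * dki + nj * dkj) / (ni + nj)
    if method == "complete":
        return max(dki, dkj)
    if method == "ward":  # assumes squared Euclidean distances
        n = ni + nj + nk
        return ((ni + nk) * dki + (nj + nk) * dkj - nk * dij) / n
    raise ValueError(f"unknown method: {method}")


def cluster(data, method, n_clusters):
    """Merge until n_clusters remain; return the groups as a set of frozensets."""
    squared = method == "ward"
    members = {i: [i] for i in range(len(data))}
    dist = {}  # keys are (smaller_id, larger_id)
    for i in members:
        for j in members:
            if i < j:
                d2 = sum((x - y) ** 2 for x, y in zip(data[i], data[j]))
                dist[(i, j)] = d2 if squared else math.sqrt(d2)
    next_id = len(data)  # ids of merged clusters continue after the objects
    while len(members) > n_clusters:
        (i, j), dij = min(dist.items(), key=lambda kv: kv[1])
        ni, nj = len(members[i]), len(members[j])
        for k in members:
            if k in (i, j):
                continue
            dki = dist.pop((min(k, i), max(k, i)))
            dkj = dist.pop((min(k, j), max(k, j)))
            dist[(k, next_id)] = lance_williams(
                method, ni, nj, len(members[k]), dki, dkj, dij
            )
        del dist[(i, j)]
        members[next_id] = members.pop(i) + members.pop(j)
        next_id += 1
    return {frozenset(m) for m in members.values()}


# Hypothetical data: three well-separated groups of three objects each.
samples = [
    [0, 0], [0, 1], [1, 0],
    [10, 10], [10, 11], [11, 10],
    [20, 0], [20, 1], [21, 0],
]
results = {m: cluster(samples, m, 3) for m in ("average", "complete", "ward")}
agree = all(r == results["average"] for r in results.values())
print("methods agree:", agree)
```

If the methods disagree on real data, that is a hint that the main clusters depend on the linkage choice and should be interpreted with care.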
Note that data can be clustered as groups of genes or as groups of samples. Genes that form a cluster have similar expression, while samples that are, for example, negative or positive for a disease should fall into different groups if suitable expression markers are measured. Transpose the data to switch between classification of genes and classification of samples.
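As an illustration of what transposing does to the data, the short sketch below swaps rows and columns of a small matrix. The layout (samples in rows, genes in columns) and the values are assumptions made for the example, not a statement about how the program stores its data.

```python
# Hypothetical expression matrix: rows are samples, columns are genes.
samples_by_genes = [
    [24.1, 30.2, 27.5],  # sample 1
    [23.8, 31.0, 27.9],  # sample 2
]

# zip(*rows) pairs up the i-th value of every row, turning columns into rows.
genes_by_samples = [list(column) for column in zip(*samples_by_genes)]

print(genes_by_samples)
```

Clustering the rows of the first matrix groups samples; clustering the rows of the transposed matrix groups genes.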
The nodes in the cluster trees are clickable. Clicking a node switches the orientation of its branches, which lets you arrange the dendrogram as you prefer.
G. N. Lance and W. T. Williams (1967). A general theory of classificatory sorting strategies. I. Hierarchical systems. The Computer Journal, 9(4), pp. 373-380.
J. H. Ward (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), pp. 236-244.