

PCA

(This feature is only available in GenEx Pro/Enterprise)

 

Theory

We cannot plot more than three genes in a traditional scatter plot, because we have no way to visualize more than three dimensions. For studies based on more than three genes, if we want to account for all of them in the analysis, we must use methods that project the multidimensional information into a space of lower dimension, such as two or three dimensions. One way of doing this is by means of principal components (PCs).

 

Principal components are orthogonal vectors that define a space of lower dimension that accounts for the maximum variation in the original space. The original space has a higher dimension and is made up of the gene expression profiles. It is therefore possible to classify samples, as well as genes, in scatter plots based on the principal components, and the scatter plots will reflect most of the information in the original, higher-dimensional data set.
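
To make the idea concrete, the following is a minimal sketch of how principal components, scores and loadings can be computed outside GenEx with NumPy. It is not GenEx's implementation: the data matrix is synthetic, samples are assumed to be in rows and genes in columns, and the only preprocessing assumed is mean-centering of each gene.

```python
import numpy as np

# Synthetic example data (made up): 6 samples in rows, 4 genes in columns.
rng = np.random.default_rng(0)
data = rng.normal(size=(6, 4))

# Mean-center each gene so the components describe variation around the mean.
centered = data - data.mean(axis=0)

# Singular value decomposition of the centered data.
# The rows of vt are orthonormal directions in gene space, ordered by the
# amount of variation they account for (singular values s are descending).
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# Loadings: one column per principal component, one row per gene.
loadings = vt.T

# Scores: coordinates of each sample along each principal component.
scores = centered @ loadings

# The first two columns are what a PC1 vs. PC2 scatter plot would show.
scores_2d = scores[:, :2]
```

Plotting the two columns of `scores_2d` against each other gives the same kind of two-dimensional view of the samples as described here; transposing `data` first would give the corresponding view of the genes.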

 

How to

Open the PCA tab among the analysis tabs at the top of the main window, and press the PCA button to load the analysis into the Control panel.

 

    

 

In the drop-down list in the Control panel, it is possible to select the number of principal components to be calculated. By choosing Auto, GenEx uses statistical indicators to estimate the optimum number of principal components that adequately describe the data. If the Plot principal components check box is ticked, plots showing the scores and loadings will be presented with the result. Press the Run button to see the results.

 

    

 

The result is shown in a plot with all sample scores (or genes' scores if the data set is transposed) plotted against the first two principal components, PC1 and PC2. The figure below shows a PCA plot from a study in which the expression of several genes was measured as a function of time after a perturbation. The data in the scatter plot are not clustered, but rather reveal continuous changes.

 

    

 

After analysis, some additional features appear in the Control panel. There are radio buttons for Scores and Loadings, and depending on which one of them is selected, different buttons appear below.

 

    

 

The Labels button toggles between showing and hiding the sample/gene names. You can also hold the mouse pointer over a single symbol to reveal the sample/gene name. The symbols and colors in the scatter plots are as specified in the Data manager under the Colors & Symbols tab.

 

You can easily get a PCA plot showing the genes' scores without transposing the data (or samples' scores if the data set is already transposed). Select the Loadings radio button and press the Run button. The scatter plot reveals three tight clusters of genes colored in blue, red and green, respectively.

 

    

 

Pressing the View Scores button shows the scores of the principal components in a table. These reflect the importance of the samples (genes if data set is transposed) in defining the principal components.

 

    

 

To view the loadings, select the Loadings radio button and press the Show Loadings button that appears below. The loadings reflect the importance of the genes (samples if data set is transposed) in defining the principal components.

 

    

 

Pressing the Reconstruction button shows the reconstruction of the measured data. The measured data are shown as the blue line and the reconstructed data as the green line; the more they overlap, the better the reconstruction. Reconstruction improves as more principal components are used; the number can be set manually in the No. of components drop-down list.
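
The effect of the number of components on the reconstruction can be illustrated with a short NumPy sketch. This is an illustration under stated assumptions, not GenEx's own routine: the helper `reconstruct` is hypothetical, the data are synthetic, and mean-centering per gene is assumed as the only preprocessing.

```python
import numpy as np

def reconstruct(data, n_components):
    """Rebuild the data from its first n_components principal components.

    Hypothetical helper for illustration; rows are samples, columns are genes.
    """
    mean = data.mean(axis=0)
    centered = data - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]        # leading principal directions
    scores = centered @ components.T      # project onto those directions
    return scores @ components + mean     # map back to the original space

# Synthetic example data (made up): 8 samples, 5 genes.
rng = np.random.default_rng(1)
data = rng.normal(size=(8, 5))

# Reconstruction error (Frobenius norm) for 1 to 5 components.
errors = [np.linalg.norm(data - reconstruct(data, k)) for k in range(1, 6)]
```

The values in `errors` never increase as components are added, and with all five components the reconstruction matches the measured data up to rounding, which corresponds to the two lines in the plot overlapping completely.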

     

    

 

The amount of information in the original data that is accounted for in a space of lower dimension is contained in the eigenvalues. Pressing the View Eigenvalues button shows the eigenvalues as well as the accounted variation in two tables. In this particular example, 61 % of the variation in the data is accounted for by PC1, 90.45 % by PC1 and PC2 together, 96.82 % by PC1, PC2 and PC3 together, etc.
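
The relation between eigenvalues and accounted variation can be sketched as follows. The example assumes that the eigenvalues are those of the covariance matrix of the mean-centered data, which may differ from the scaling GenEx applies; the data are synthetic, so the percentages will not match the ones quoted above.

```python
import numpy as np

# Synthetic example data (made up): 10 samples, 4 genes with different spread.
rng = np.random.default_rng(2)
data = rng.normal(size=(10, 4)) * np.array([3.0, 2.0, 1.0, 0.5])

# Covariance matrix of the genes (columns are the variables).
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# eigvalsh returns eigenvalues in ascending order; reverse so PC1 comes first.
eigenvalues = np.linalg.eigvalsh(cov)[::-1]

# Percentage of the total variation accounted for by each component,
# and the running total for PC1, PC1+PC2, PC1+PC2+PC3, ...
explained = 100 * eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained)
```

Here `explained` corresponds to the variation accounted for by each component on its own, and `cumulative` to the running totals of the kind listed in the example above.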

 

    

 

Different PCA plots can be shown by plotting the data against different principal components. Select the principal components in the two upper drop-down lists and press the 2D plot button next to them. If at least three PCs have been calculated, GenEx offers to show 3-dimensional scatter plots. Select which of the principal components you want to plot the data against and press the 3D plot button. A PC1 vs. PC2 vs. PC3 plot of the genes in this study is shown below.

 

    

 

For projects that contain multiple data files, files are analyzed simultaneously by augmented principal component analysis. Results are shown for the data file selected in the drop-down list at the bottom right.