

Reference Genes

 

When comparing different samples, the expression data usually needs to be normalized, which can be done in a number of ways. One can normalize to the sample amount, e.g. volume of blood or serum, number of cells, or the number of genomic copies, which is becoming more popular. Since the amount of RNA per cell varies a lot, an alternative is to normalize to the amount of total RNA. However, most RNA is ribosomal RNA and only a few percent is mRNA. Further, none of the normalization procedures above accounts for variation in RNA extraction yield and reverse transcription efficiency. Therefore, most experts in the field recommend normalizing to the expression of reference genes. The first step is to identify which gene or genes to use as references. Experience shows that there is no such thing as a general reference gene that is suitable for every type of tissue, in every stage of development, and under different conditions such as disease. It is therefore necessary to identify proper reference genes for every new study.

 

The common approach to this problem is to measure the expression of several candidate reference genes in a number of representative samples, and to select the gene(s) that show the least variation as reference(s) for the particular study. Some companies, like TATAA Biocenter, offer panels of candidate reference genes for different species, making such studies more straightforward. The expression of the candidate reference genes is then compared by various statistical methods to identify the most suitable reference gene(s) for the particular study. GenEx offers two such methods: geNorm and NormFinder.
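As a rough illustration of selecting the candidate that shows the least variation, the sketch below ranks candidates by the standard deviation of their Cq values across samples. This is only a naive first screen and not what geNorm or NormFinder compute: it assumes that all samples were loaded with comparable amounts of material. The gene names, the Cq values and the function name rank_by_sd are hypothetical.

```python
from statistics import stdev

# Hypothetical Cq values: one list per candidate gene, one value per sample.
cq = {
    "GAPDH": [18.0, 18.5, 19.5, 17.0],
    "ACTB": [20.0, 20.1, 19.9, 20.0],
    "B2M": [22.0, 22.5, 21.5, 22.0],
}


def rank_by_sd(cq_by_gene):
    """Return gene names sorted from least to most variable (sample SD of Cq)."""
    return sorted(cq_by_gene, key=lambda gene: stdev(cq_by_gene[gene]))


ranking = rank_by_sd(cq)
print(ranking)  # the least variable candidate comes first
```

Because raw Cq values also vary with the amount of starting material, the pairwise and model-based methods described in the following paragraphs are generally preferred over this simple ranking.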

 

geNorm calculates and compares the so-called M-value of all candidate genes, eliminates the gene with the highest M-value, and repeats the process until only two genes remain. The M-value describes the variation of a gene compared to all other candidate genes. The last remaining pair of candidates is recommended as the optimum pair of reference genes (Vandesompele et al. (2002), Genome Biology 3 0034.1-0034.11). It is assumed that the candidate genes are not co-regulated.
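The stepwise elimination can be sketched as follows. This is a minimal illustration and not the GenEx implementation. It follows the definition in Vandesompele et al. (2002), where the M-value of a gene is the average, over all other candidates, of the standard deviation of the log-transformed expression ratios across samples. The sketch assumes that the input is Cq values and that PCR efficiency is 100 %, so that a difference of Cq values stands in for a log2 ratio. The gene names, the data and the function names m_values and genorm_pair are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical Cq values: one list per candidate gene, one value per sample.
cq = {
    "REF_A": [20.0, 21.0, 19.0, 20.5],
    "REF_B": [22.1, 23.0, 21.0, 22.5],
    "REF_C": [25.0, 26.5, 24.0, 25.0],
    "REF_D": [18.0, 17.0, 19.0, 20.5],
}


def m_values(cq_by_gene):
    """M-value per gene: mean SD of pairwise Cq differences to every other gene."""
    m = {}
    for gene, values in cq_by_gene.items():
        pairwise_sds = []
        for other, other_values in cq_by_gene.items():
            if other == gene:
                continue
            # Cq difference = log2 expression ratio (up to sign) at 100 % efficiency.
            diffs = [a - b for a, b in zip(values, other_values)]
            pairwise_sds.append(stdev(diffs))
        m[gene] = mean(pairwise_sds)
    return m


def genorm_pair(cq_by_gene):
    """Drop the gene with the highest M-value until two genes remain."""
    remaining = dict(cq_by_gene)
    while len(remaining) > 2:
        m = m_values(remaining)
        worst = max(m, key=m.get)
        del remaining[worst]
    return sorted(remaining)


print(m_values(cq))
print(genorm_pair(cq))
```

Because the M-value is built from pairwise differences, a shift that affects all genes in a sample equally (for example a different loading amount) cancels out. For the same reason, two co-regulated genes would appear stable relative to each other, which is why the assumption of no co-regulation matters.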

 

NormFinder is another algorithm that attempts to find the optimum reference genes among a group of candidate reference genes. In contrast to geNorm, it can also take sample groupings into account, such as untreated/treatment1/treatment2 or different strains. It calculates both the within-group variance, which describes the stability of the gene expression within each group, and the between-group variance, which describes the stability of the gene expression between the groups. The result is an optimum pair of reference genes. The resulting pair may have compensating expression, e.g. one gene is slightly overexpressed in one group while the other gene is correspondingly underexpressed in the same group (Andersen et al. (2004), Cancer Research 64 5245-5250).
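The idea of compensating expression can be illustrated with a small sketch. It is not the NormFinder estimator, which is model-based and described in Andersen et al. (2004). It only computes a plain within-group variance and a plain variance of the group means for two hypothetical genes and for their average. The data are assumed to be on a logarithmic scale, and all names and values are made up for illustration.

```python
from statistics import mean, variance

# Hypothetical log-scale expression values, grouped by treatment.
gene_x = {"untreated": [20.0, 20.2, 19.8], "treated": [20.5, 20.7, 20.3]}
gene_y = {"untreated": [22.0, 22.2, 21.8], "treated": [21.5, 21.7, 21.3]}


def group_stats(values_by_group):
    """Return (mean within-group variance, variance of the group means)."""
    within = mean(variance(v) for v in values_by_group.values())
    between = variance([mean(v) for v in values_by_group.values()])
    return within, between


def average_pair(a, b):
    """Sample-wise average of two genes, keeping the grouping."""
    return {g: [(x + y) / 2 for x, y in zip(a[g], b[g])] for g in a}


x_within, x_between = group_stats(gene_x)
y_within, y_between = group_stats(gene_y)
pair_within, pair_between = group_stats(average_pair(gene_x, gene_y))

print("gene X:", x_within, x_between)
print("gene Y:", y_within, y_between)
print("pair  :", pair_within, pair_between)
```

In this constructed example each gene on its own differs between the groups, but the two shifts are in opposite directions, so the average of the pair shows almost no between-group variation.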

 

Note: The GenEx algorithm and the NormFinder Microsoft Excel script handle the data scale differently. GenEx assumes that the data is on a logarithmic scale, such as Cq values or fold changes, which is the most common case among GenEx users. If the data is on a linear scale you must indicate this, see NormFinder. The NormFinder Microsoft Excel script, on the other hand, assumes that the data is on a linear scale and by default converts it to a logarithmic scale unless you change the settings.
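To make the distinction between the two scales concrete, the sketch below converts a Cq value to a linear relative quantity and back to a log2 value. It assumes 100 % PCR efficiency (a doubling per cycle) and an arbitrary calibrator Cq. The function names and the numbers are hypothetical and are not part of GenEx or the NormFinder script.

```python
import math


def cq_to_linear(cq, calibrator_cq, efficiency=2.0):
    """Linear relative quantity of a sample versus a calibrator Cq.

    efficiency=2.0 is an assumption corresponding to 100 % PCR efficiency.
    """
    return efficiency ** (calibrator_cq - cq)


def linear_to_log2(quantity):
    """Bring a linear relative quantity back to a log2 scale."""
    return math.log2(quantity)


linear = cq_to_linear(22.0, calibrator_cq=20.0)
print(linear)                  # linear scale: a quarter of the calibrator amount
print(linear_to_log2(linear))  # logarithmic scale: two cycles later than the calibrator
```

Passing values on the wrong scale to either tool would mean that variances are computed on quantities that differ by this transformation, so it is worth checking the scale setting before running the analysis.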