This is Steve Mount's web page for gene annotation and splice site selection. The page was inspired by my review article in the American Journal of Human Genetics (PubMed; Journal).

I provide links on gene annotation (including cDNA alignment), genefinding, and splicing signals (including splice site prediction), along with specific comments on splice site consensus, ESEs and microexons. These pages, collectively accessible as www.RNAinfo.org, are a bit out of date right now (April 2005) but are under active revision.

Gene annotation incorporates cDNA data (including ESTs), sequence similarity, and computational predictions based on the recognition of probable splice sites and coding regions. The state of the art was recently surveyed by the Genome Annotation Assessment Project (GASP1), the results of which were published in a special issue of Genome Research.
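As a rough illustration of where the "recognition of probable splice sites" begins, the sketch below scans a DNA sequence for the GT and AG dinucleotides that begin and end most introns. The function name and the example sequence are hypothetical, and this is not the method of any tool listed on this page; real genefinders score much more sequence context than these two bases.

```python
def candidate_splice_sites(seq):
    """Return (donors, acceptors): 0-based positions of GT and AG dinucleotides.

    Hypothetical helper for illustration only. A donor position marks the
    G of a GT (first base of a possible intron); an acceptor position marks
    the A of an AG (second-to-last base of a possible intron).
    """
    seq = seq.upper()
    donors = []
    acceptors = []
    for i in range(len(seq) - 1):
        dinucleotide = seq[i:i + 2]
        if dinucleotide == "GT":
            donors.append(i)
        elif dinucleotide == "AG":
            acceptors.append(i)
    return donors, acceptors


# Made-up example sequence, not taken from any real gene.
donors, acceptors = candidate_splice_sites("CAGGTAAGTTTTCAGG")
print(donors, acceptors)
```

Almost every hit from a scan like this is a false positive, which is why annotation also draws on cDNA alignments and sequence similarity rather than on splice site matches alone.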

Ensembl - Ensembl is a joint project between EMBL-EBI and the Sanger Centre to develop a software system that produces and maintains automatic annotation of eukaryotic genomes.

Gene Ontology Consortium - The goal of the Gene Ontology Consortium is to produce a dynamic controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.

Oak Ridge National Laboratory Computational Biosciences Section - A project whose stated mission is to address fundamental questions in the life sciences and provide information and analytical resources to the wider biology research community.

TIGR Databases - The Institute for Genomic Research (TIGR) Databases are a collection of curated databases containing DNA and protein sequence, gene expression, cellular role, protein family, and taxonomic data for microbes, plants and humans.

Celera - Celera Genomics, a division of PE Corporation, produced the fruit fly and human genome sequences. They are currently working on the mouse genome and have announced a plan to move into proteomics.

SIM4 - SIM4 is described by Florea et al.

The Intronerator - a collection of tools for exploring the molecular biology and genomics of C. elegans with a special emphasis on alternative splicing. This is specific to C. elegans, and does more than align cDNAs.

A list of selected documented microexons is available. Very small exons, or microexons, pose special problems for gene annotation. They are difficult to recognize using computational genefinding methods, and can even confound the alignment of cDNA and genomic sequences. Furthermore, because microexons are very often the site of alternative splicing, an understanding of how they are recognized (and regulated) is key to understanding gene expression.
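A minimal sketch of how an annotation might be screened for microexons, assuming exons are given as half-open (start, end) coordinate pairs. The function name, the coordinates, and the 25-nucleotide default cutoff are all assumptions made for illustration; this page does not define a length threshold for microexons.

```python
def find_microexons(exons, max_len=25):
    """Return the exons whose length is at most max_len nucleotides.

    exons   -- list of (start, end) pairs, 0-based, end exclusive (assumed format)
    max_len -- hypothetical cutoff; choose a value suited to your own analysis
    """
    return [(start, end) for start, end in exons if end - start <= max_len]


# Made-up coordinates for a four-exon transcript.
exons = [(0, 120), (300, 309), (500, 650), (900, 906)]
print(find_microexons(exons))
```

A filter like this only finds microexons that are already annotated. The harder problem described above, recognizing them in genomic sequence or placing them correctly in a cDNA alignment, is not addressed by it.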