RNAinfo: Splicing signals Genefinding Splice site consensus
ESEs Genome Annotation Alternative Splicing RNA links

Splice Site Consensus

It is well established that nearly all splice sites conform to consensus sequences (matrices). These consensus sequences include nearly invariant dinucleotides at each end of the intron: GT at the 5' end and AG at the 3' end.

Splice sites of U2 (major class) introns in pre-mRNA generally conform to the following consensus sequences:
3' splice sites: CAG|G
5' splice sites: MAG|GTRAGT where M is A or C and R is A or G
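
As a rough illustration of how these consensus strings can be read, the sketch below counts how many positions of a candidate site agree with a consensus written in IUPAC codes. The function name `consensus_matches` and the constants are hypothetical and introduced only for this example. A simple match count treats every position equally, so it is much cruder than the weight matrices referred to above.

```python
# IUPAC codes used on this page (M = A or C, R = A or G, Y = C or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "M": "AC", "R": "AG", "Y": "CT"}

FIVE_PRIME = "MAG|GTRAGT"   # 5' splice site consensus from the text
THREE_PRIME = "CAG|G"       # 3' splice site consensus from the text


def consensus_matches(consensus, site):
    """Count positions where `site` agrees with an IUPAC `consensus`.

    The '|' character marks the exon/intron boundary and is ignored.
    """
    consensus = consensus.replace("|", "")
    site = site.replace("|", "").upper()
    if len(consensus) != len(site):
        raise ValueError("site and consensus must have the same length")
    return sum(base in IUPAC[code] for code, base in zip(consensus, site))


if __name__ == "__main__":
    for candidate in ("CAG|GTAAGT", "AAG|GTGAGT", "TTG|GTACGA"):
        print(candidate, consensus_matches(FIVE_PRIME, candidate))
```

A perfect 5' splice site scores 9 here, one point per position; real sites typically deviate at one or more positions outside the GT dinucleotide.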

The most common class of nonconsensus splice sites consists of 5' splice sites with a GC dinucleotide (Wu and Krainer 1999). GC sites conform extremely well to the standard consensus sequences at other positions: 42 of 44 sites have a consensus G residue at both position -1 and position 5, and GC-AG introns have an enhanced match to the 3' splice site consensus (Thanaraj and Clark 2001). It is reasonable to assume that GC sites are recognized by the standard (U2-dependent) spliceosome.

The second class of exception to the splice site consensus is U12 introns, a minor class of rare introns with splice site sequences that are very different from the standard consensus but very similar to each other (reviewed by Burge et al. 1999 and Tarn and Steitz 1997). U12 introns can be identified by highly conserved sequences at the 5' splice site (RTATCCTY; R = A or G; Y = C or T) and at the branch site (TCCTRAY). U12 introns are found in many eukaryotes, including Drosophila melanogaster and Arabidopsis, but not C. elegans.
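
The categories described so far (standard GT-AG, GC-AG, and U12) can be told apart with a few pattern checks. The following is a minimal sketch, not an established classifier: the function name `classify_intron` and the 40-nucleotide branch-site search window are assumptions made for illustration, and the sketch also assumes that the RTATCCTY motif begins at the first nucleotide of the intron.

```python
import re

# Motifs quoted in the text, expanded from IUPAC codes (R = A/G, Y = C/T).
U12_DONOR = re.compile(r"^[AG]TATCCT[CT]")   # RTATCCTY at the intron start
U12_BRANCH = re.compile(r"TCCT[AG]A[CT]")    # TCCTRAY


def classify_intron(intron, branch_window=40):
    """Assign an intron sequence (5' to 3', DNA or RNA) to a rough class.

    branch_window is a hypothetical parameter: the number of nucleotides
    at the 3' end of the intron searched for the U12 branch site motif.
    """
    intron = intron.upper().replace("U", "T")
    if U12_DONOR.match(intron):
        if U12_BRANCH.search(intron[-branch_window:]):
            return "U12-like"
        return "U12-like donor, no branch site match"
    ends = intron[:2] + "-" + intron[-2:]
    if ends in ("GT-AG", "GC-AG"):
        return ends
    return "nonconsensus (" + ends + ")"


if __name__ == "__main__":
    print(classify_intron("GTAAGTCCCCCCTTTTTTCAG"))
    print(classify_intron("ATATCCTTAAAAAATCCTAACAAAAAAAAAC"))
```

Checking the U12 donor motif first matters because U12 introns are recognized by their internal motifs rather than by their terminal dinucleotides alone.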

Finally, there are a small number of nonconsensus sites that fit into neither of the two categories mentioned above. Many reports of such variant splice sites can be traced to errors in annotation or interpretation, polymorphic differences between the sources of cDNA and genomic sequence, inclusion of pseudogene sequences, or failure to account for somatic mutation. However, there are many examples of sites that match the consensus very poorly, including cryptic splice sites that are used when a nearby natural site is inactivated by mutation (see, for example, Roca et al. 2003), and experimental work has established that splicing in vivo is possible even without the core dinucleotide GU or AG.

Splice site consensus sequences vary by species. Consensus sequences for humans and flies are reported here. A survey (making the point that species with more introns have weaker splice sites) was published by Irimia et al. (2007).

Splice site predictors are available on the web.
I recommend SplicePort.
In addition to splice site prediction, the web site allows you to browse the features that contribute to the strength (or weakness) of any given site. Right now, feature browsing is only available for mammalian sites (using a classifier trained on human data), but you can carry out splice site prediction on Arabidopsis as well.
For high-throughput assessment of splice sites I recommend GeneSplicer. For analysis of other species on the web I recommend NetGene (available through the Center for Biological Sequence Analysis at the Department of Biotechnology, The Technical University of Denmark). These programs use information in the region flanking a splice site. If you wish to evaluate only the core splice site in order to assess its strength independent of additional signals, then I recommend MaxEntScan, which looks at nine nucleotides at the 5' splice site or 23 nucleotides at the 3' splice site.
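
To prepare input for a core-site scorer of this kind, the nine- and 23-nucleotide windows have to be cut out of the surrounding sequence. The sketch below shows one way to do that. It assumes the 5' window is 3 exonic plus 6 intronic nucleotides and the 3' window is 20 intronic plus 3 exonic nucleotides; that split is my understanding of the usual convention and is not stated above, so check it against the tool's own documentation. The function names and 0-based coordinates are hypothetical.

```python
def donor_window(seq, intron_start):
    """Return the 9-mer around a 5' splice site.

    intron_start is the 0-based index of the first intron nucleotide.
    Assumed layout: 3 exon nucleotides followed by 6 intron nucleotides.
    """
    start, end = intron_start - 3, intron_start + 6
    if start < 0 or end > len(seq):
        raise ValueError("not enough flanking sequence for a 9-mer")
    return seq[start:end]


def acceptor_window(seq, exon_start):
    """Return the 23-mer around a 3' splice site.

    exon_start is the 0-based index of the first nucleotide of the
    downstream exon. Assumed layout: 20 intron nucleotides followed
    by 3 exon nucleotides.
    """
    start, end = exon_start - 20, exon_start + 3
    if start < 0 or end > len(seq):
        raise ValueError("not enough flanking sequence for a 23-mer")
    return seq[start:end]


if __name__ == "__main__":
    exon1 = "ACGCAG"
    intron = "GTAAGT" + "C" * 12 + "AG"   # toy 20-nucleotide intron
    exon2 = "GTTT"
    seq = exon1 + intron + exon2
    print(donor_window(seq, len(exon1)))
    print(acceptor_window(seq, len(exon1) + len(intron)))
```

Raising an error when the window runs off the sequence avoids silently scoring a truncated site.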