RNAinfo

Steve's Links:
Home Page
email - Steve Mount
Model Organisms
Quick Links
BSCI410 (class)

Splicing Signals

Splicing signals can be divided into those at the splice sites per se and auxiliary or contextual signals, such as exonic splicing enhancers (ESEs), intronic splicing enhancers (ISEs), exonic splicing silencers (ESSs) and intronic splicing silencers (ISSs).

It is well established that nearly all splice sites conform to consensus sequences. These consensus sequences include nearly invariant dinucleotides at each end of the intron (GT at the 5' end and AG at the 3' end) and generally resemble MAG|GTRAGT at the 5' splice site and CAG|G at the 3' splice site, where | marks the exon-intron boundary, M = A or C, and R = A or G.
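To make the consensus notation concrete, here is a minimal Python sketch that counts how many positions of a candidate site agree with MAG|GTRAGT or CAG|G. The function and table names are hypothetical, and a simple match count is only a rough proxy for splice site strength, not one of the scoring methods discussed below.

```python
# IUPAC ambiguity codes needed for the consensus strings above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC",  # A or C
    "R": "AG",  # A or G (purine)
    "Y": "CT",  # C or T (pyrimidine)
}

# Consensus strings from the text; "|" marks the exon-intron boundary.
FIVE_PRIME_CONSENSUS = "MAG|GTRAGT"  # 3 exonic + 6 intronic positions
THREE_PRIME_CONSENSUS = "CAG|G"      # 3 intronic + 1 exonic position


def consensus_matches(site: str, consensus: str) -> int:
    """Count positions at which `site` agrees with `consensus`.

    The boundary mark "|" is ignored; `site` must be the same length as the
    consensus without it. Unknown bases such as N never count as a match.
    """
    pattern = consensus.replace("|", "")
    site = site.upper()
    if len(site) != len(pattern):
        raise ValueError(f"expected {len(pattern)} nt, got {len(site)}")
    return sum(base in IUPAC[code] for base, code in zip(site, pattern))


print(consensus_matches("CAGGTAAGT", FIVE_PRIME_CONSENSUS))  # 9
print(consensus_matches("TTGGTAAAT", FIVE_PRIME_CONSENSUS))  # 6
print(consensus_matches("TAGG", THREE_PRIME_CONSENSUS))      # 3
```

Treating every position equally is the simplest possible choice; real predictors weight positions by how strongly they are conserved.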

The most common class of nonconsensus splice sites consists of 5' splice sites with a GC dinucleotide in place of GT (Wu and Krainer 1999). GC sites conform extremely well to the standard consensus sequence at other positions: 42 of 44 such sites have a consensus G residue at both position -1 (the last nucleotide of the exon) and position 5 of the intron. It is therefore reasonable to assume that GC sites are recognized by the standard (U2-dependent) spliceosome.
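The position numbering used above can be illustrated with a second sketch. It assumes a 9-nt donor window laid out as exon positions -3 to -1 followed by intron positions +1 to +6 (a convention chosen here for illustration, with no position 0), then pulls out the donor dinucleotide and checks for G at -1 and +5. The function names are hypothetical.

```python
def donor_base(window: str, pos: int) -> str:
    """Return the base at splice-site position `pos` of a 9-nt donor window.

    Assumed layout: exon positions -3, -2, -1 followed by intron positions
    +1 .. +6, so there is no position 0.
    """
    if len(window) != 9:
        raise ValueError("donor window must be 9 nt")
    if pos == 0 or not -3 <= pos <= 6:
        raise ValueError("position must be in -3..-1 or +1..+6")
    index = pos + 3 if pos < 0 else pos + 2
    return window[index].upper()


def describe_donor(window: str) -> dict:
    """Report the donor dinucleotide and whether G is present at -1 and +5."""
    dinucleotide = donor_base(window, 1) + donor_base(window, 2)
    return {
        "dinucleotide": dinucleotide,  # "GT" for most sites, "GC" for the main exception
        "g_at_minus1": donor_base(window, -1) == "G",
        "g_at_plus5": donor_base(window, 5) == "G",
    }


print(describe_donor("CAGGCAAGT"))
# {'dinucleotide': 'GC', 'g_at_minus1': True, 'g_at_plus5': True}
```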

The second class of exception to the splice site consensus is U12 introns, a minor class of introns whose splice site sequences are very different from the standard consensus but very similar to each other (reviewed by Burge et al. 1999 and Tarn and Steitz 1997). U12 introns can be identified by highly conserved sequences at the 5' splice site (RTATCCTY; R = A or G, Y = C or T) and at the branch site (TCCTRAY). U12 introns are found in many eukaryotes, including Drosophila melanogaster and Arabidopsis, but not in C. elegans.
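The U12 motifs can be turned into regular expressions by expanding the ambiguity codes. The sketch below is a crude screen, not a classifier: it assumes the 5' motif begins at the first intron nucleotide and accepts a branch-site match anywhere downstream, whereas the real branch site lies a short distance upstream of the 3' splice site. The function names are hypothetical.

```python
import re

# Ambiguity codes appearing in the U12 motifs, expanded to regex classes.
IUPAC_CLASSES = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}

U12_FIVE_PRIME = "RTATCCTY"  # 5' splice site motif from the text
U12_BRANCH = "TCCTRAY"       # branch site motif from the text


def motif_regex(motif: str) -> re.Pattern:
    """Compile an IUPAC motif into a regular expression."""
    return re.compile("".join(IUPAC_CLASSES[code] for code in motif))


FIVE_PRIME_RE = motif_regex(U12_FIVE_PRIME)
BRANCH_RE = motif_regex(U12_BRANCH)


def looks_like_u12(intron: str) -> bool:
    """Crude screen for U12-type introns.

    Assumes `intron` starts at the first intron nucleotide. Requires the 5'
    motif at the very start and a branch-site motif somewhere after it.
    """
    intron = intron.upper()
    if not FIVE_PRIME_RE.match(intron):
        return False
    return BRANCH_RE.search(intron, len(U12_FIVE_PRIME)) is not None


print(looks_like_u12("ATATCCTT" + "G" * 10 + "TCCTAAC" + "TTTTTTAC"))  # True
print(looks_like_u12("GTAAGTCCCCCTTTTCAG"))                           # False
```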

Finally, there are a small number of nonconsensus sites that fit into neither of the two categories mentioned above. Many reports of such variant splice sites can be traced to errors in annotation or interpretation, polymorphic differences between the sources of cDNA and genomic sequence, inclusion of pseudogene sequences, or failure to account for somatic mutation. However, there are many examples of sites that match the consensus very poorly, and experimental work has established that 5' splice sites do not absolutely require GT, and 3' splice sites do not absolutely require AG, to be recognized in vivo.

Splice site prediction
Splice site predictors are available on the web.
I recommend SplicePort.
In addition to splice site prediction, the web site allows you to browse the features that contribute to the strength (or weakness) of any given site. Right now, feature browsing is only available for mammalian sites (using a classifier trained on human data), but you can carry out splice site prediction on Arabidopsis as well.

For high throughput assessment of splice sites I recommend GeneSplicer.

For analysis of other species on the web I recommend NetGene (available through the Center for Biological Sequence Analysis at the Department of Biotechnology, The Technical University of Denmark).

These programs use information in the region flanking a splice site. If you wish to evaluate only the core splice site, in order to assess its strength independent of additional signals, then I recommend MaxEntScan, which looks at nine nucleotides at the 5' splice site (three exonic and six intronic) and 23 nucleotides at the 3' splice site (20 intronic and three exonic).
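If you are preparing input for a core-site scorer such as MaxEntScan, the 9-mer and 23-mer windows can be cut from a genomic sequence roughly as follows. This sketch assumes 0-based Python string indices and hypothetical function and argument names; it only extracts the windows and does not reproduce MaxEntScan's scoring.

```python
def five_prime_window(seq: str, intron_start: int) -> str:
    """9-mer for a 5' splice site: 3 exonic + 6 intronic nucleotides.

    `intron_start` is the 0-based index of the first intron nucleotide.
    """
    start, end = intron_start - 3, intron_start + 6
    if start < 0 or end > len(seq):
        raise ValueError("window extends past the end of the sequence")
    return seq[start:end].upper()


def three_prime_window(seq: str, exon_start: int) -> str:
    """23-mer for a 3' splice site: 20 intronic + 3 exonic nucleotides.

    `exon_start` is the 0-based index of the first exon nucleotide after the
    intron's terminal AG.
    """
    start, end = exon_start - 20, exon_start + 3
    if start < 0 or end > len(seq):
        raise ValueError("window extends past the end of the sequence")
    return seq[start:end].upper()


# Toy pre-mRNA: exon 1, a 26-nt intron (GT ... AG), exon 2.
exon1, intron, exon2 = "AAACAG", "GTAAGT" + "T" * 18 + "AG", "GTTCC"
seq = exon1 + intron + exon2
print(five_prime_window(seq, len(exon1)))                 # CAGGTAAGT
print(three_prime_window(seq, len(exon1) + len(intron)))  # 'T' * 18 + 'AGGTT'
```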