email - Steve Mount
Splicing signals can be divided into those at the splice sites per se and auxiliary or contextual signals, such as exonic splicing enhancers (ESEs), intronic splicing enhancers (ISEs), exonic splicing silencers (ESSs) and intronic splicing silencers (ISSs).
It is well established that nearly all splice sites conform to consensus sequences. These consensus sequences
include nearly invariant dinucleotides at each end of the intron, GT at
the 5' end of the intron and AG at the 3' end of the intron, and
generally resemble MAG|GTRAGT at the 5' splice site and CAG|G at the 3'
splice site (M = A or C; R = A or G; | marks the exon-intron boundary).
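As a rough illustration of what conforming to these consensus sequences means, here is a minimal Python sketch that counts how many positions of a candidate site agree with the consensus strings quoted above. The function names and the simple match count are my own assumptions for illustration; this is not how any of the predictors discussed on this page score sites.

```python
# IUPAC nucleotide codes used by the consensus strings in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC",    # M = A or C
    "R": "AG",    # R = A or G
    "Y": "CT",    # Y = C or T
    "N": "ACGT",  # N = any base
}

DONOR_CONSENSUS = "MAGGTRAGT"  # MAG|GTRAGT: 3 exon bases, then 6 intron bases
ACCEPTOR_CONSENSUS = "CAGG"    # CAG|G: 3 intron bases, then 1 exon base


def consensus_matches(site, consensus):
    """Count positions where `site` agrees with an IUPAC consensus string."""
    site = site.upper().replace("U", "T")
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(base in IUPAC[code] for base, code in zip(site, consensus))


def has_invariant_dinucleotides(intron):
    """True if the intron starts with GT and ends with AG."""
    intron = intron.upper().replace("U", "T")
    return intron.startswith("GT") and intron.endswith("AG")
```

For example, `consensus_matches("CAGGTAAGT", DONOR_CONSENSUS)` counts a match at all nine positions, while an intron beginning with GC fails `has_invariant_dinucleotides` even if the rest of the site fits the consensus.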
The most common class of nonconsensus splice sites
consists of 5' splice sites with a GC dinucleotide (Wu
and Krainer 1999). GC sites conform extremely well to the standard consensus sequences at other positions. 42 of
44 sites have a consensus G residue at both position -1 and position
5. It is reasonable to assume that GC sites are recognized by the standard
The second class of exception to
splice site consensus is U12 introns, a minor class of rare introns with
splice site sequences that are very different from the standard consensus,
but which are very similar to each other (reviewed by Burge
et al. 1999 and Tarn
and Steitz 1997). U12 introns can be identified by highly conserved
sequences at the 5' splice site (RTATCCTY; R = A or G; Y = C or T) and
branch site (TCCTRAY). U12 introns are found in many eukaryotes, including
Drosophila melanogaster and Arabidopsis, but not
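The two U12 motifs given above can be turned into a simple screen for candidate U12 introns. The sketch below is only a heuristic under stated assumptions: it assumes the 5' splice site motif begins at the first base of the intron, and it looks for the branch site motif anywhere in the last 40 nucleotides of the intron. The 40-nucleotide window and the function names are hypothetical choices of mine, not values taken from the cited reviews.

```python
import re

# Translate the IUPAC codes used in the text into regex character classes.
IUPAC_CLASSES = {"R": "[AG]", "Y": "[CT]"}


def iupac_to_regex(motif):
    """Compile an IUPAC motif such as RTATCCTY into a regular expression."""
    return re.compile("".join(IUPAC_CLASSES.get(ch, ch) for ch in motif))


U12_DONOR = iupac_to_regex("RTATCCTY")   # 5' splice site motif from the text
U12_BRANCH = iupac_to_regex("TCCTRAY")   # branch site motif from the text


def looks_like_u12(intron, branch_window=40):
    """Heuristic flag for a candidate U12 intron.

    Requires the donor motif at the very start of the intron (assumption)
    and the branch motif within the last `branch_window` bases (assumption).
    """
    intron = intron.upper().replace("U", "T")
    donor_ok = U12_DONOR.match(intron) is not None
    branch_ok = U12_BRANCH.search(intron[-branch_window:]) is not None
    return donor_ok and branch_ok
```

A screen like this will miss U12 introns whose motifs deviate from the strict consensus, so a hit is best treated as a candidate to examine further and a miss as uninformative.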
Finally, there are a small number of nonconsensus sites
that fit into neither of the two categories mentioned above. Many reports
of such variant splice sites can be traced to errors in annotation or
interpretation, polymorphic differences between the sources of cDNA and
genomic sequence, inclusion of pseudogene sequences, or failure to account
for somatic mutation. However, there are many examples of sites that match
the consensus very poorly, and experimental work has established that 5'
splice sites do not absolutely require GT, and 3' splice sites do not
absolutely require AG, to be recognized in vivo.
Splice site prediction
Splice site predictors are available on the web.
I recommend SplicePort.
In addition to splice site prediction, the web site allows you to browse the features that contribute to the strength (or weakness) of any given site. Right now, feature browsing is only available for mammalian sites (using a classifier trained on human data), but you can carry out splice site prediction on Arabidopsis as well.
For high-throughput assessment of splice sites I recommend GeneSplicer.
For analysis of other species on the web I recommend NetGene (available through the Center for Biological Sequence Analysis at the Department of Biotechnology, The Technical University of Denmark).
These programs use information in the region flanking a splice site. If you wish to evaluate only the core splice site in order to assess its strength independent of additional signals, then I recommend MaxEntScan, which looks at nine nucleotides at the 5' splice site or 23 nucleotides at the 3' splice site.
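If you want to prepare input for a core-site scorer of this kind, the following Python sketch cuts out the nine-nucleotide and 23-nucleotide windows around a known exon-intron boundary. It assumes the layout I understand MaxEntScan to use (3 exon bases plus 6 intron bases at the 5' splice site; 20 intron bases plus 3 exon bases at the 3' splice site), so check the tool's own documentation before relying on it. The function names and 0-based coordinates are hypothetical choices for this example, and the code only extracts windows; it does not compute any score.

```python
def donor_window(seq, intron_start):
    """9-mer around a 5' splice site: 3 exon bases + first 6 intron bases.

    `intron_start` is the 0-based index of the first intron base.
    """
    start, end = intron_start - 3, intron_start + 6
    if start < 0 or end > len(seq):
        raise ValueError("not enough flanking sequence for a 9-mer")
    return seq[start:end]


def acceptor_window(seq, exon_start):
    """23-mer around a 3' splice site: last 20 intron bases + 3 exon bases.

    `exon_start` is the 0-based index of the first base of the downstream exon.
    """
    start, end = exon_start - 20, exon_start + 3
    if start < 0 or end > len(seq):
        raise ValueError("not enough flanking sequence for a 23-mer")
    return seq[start:end]
```

Raising an error when the flanking sequence is too short, rather than padding the window, keeps a truncated site from being scored as if it were complete.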