How can I search for an AS event by its VastDB ID?
You can search AS events by ID using the search box in the top menu bar. Alternatively, you can use the search boxes on the home page to search events by gene or by genomic coordinates.
How is the inclusion level (PSI) of a given AS event quantified?
AS event quantification is performed using vast-tools. vast-tools uses different modules to quantify cassette exons, microexons, alternative 5' and 3' splice sites, and intron retention (reflected in the ‘vast-tools module’ field in the ‘VastDB Features’ section of each event). For detailed information about how the quantification works, please refer to the Supplementary Information of Irimia et al., Cell 2014.
Why are there some events detected in vast-tools, but not included in VastDB?
Not all events in the vast-tools library are included in VastDB. The database contains only AS events that show a minimum level of alternative usage, so some of your events of interest from vast-tools may not be displayed here. For more details, please see the next question.
What AS events are displayed in VastDB?
VastDB displays AS events detected and quantified in vast-tools that show a minimum level of alternative usage. This is defined following Irimia et al., Cell 2014: a given sequence is considered alternatively spliced if its inclusion level (PSI) is 10 ≤ PSI ≤ 90 in at least 10% of the samples with sufficient read coverage, and/or if its PSI range is ≥ 25 across all samples with sufficient read coverage.
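The criterion above can be sketched in a few lines of Python. This is an illustration only, not the code used to build VastDB: the function name and parameters are hypothetical, and it assumes the input list already contains only PSI values (0 to 100) from samples with sufficient read coverage.

```python
def is_alternative(psis, min_fraction=0.10, min_range=25.0):
    """Apply the two alternativity rules to a list of PSI values (0-100).

    `psis` is assumed to hold only samples with sufficient read coverage.
    """
    if not psis:
        return False
    # Rule 1: 10 <= PSI <= 90 in at least 10% of the samples.
    n_intermediate = sum(1 for p in psis if 10 <= p <= 90)
    enough_intermediate = n_intermediate / len(psis) >= min_fraction
    # Rule 2: PSI range (max - min) of at least 25 across the samples.
    wide_range = (max(psis) - min(psis)) >= min_range
    return enough_intermediate or wide_range
```

For example, an event with PSIs of 0, 5, 95 and 100 has no intermediate values but passes on the range rule.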
What do the colors and block thickness in the UCSC track mean?
The colors signify the different types of AS events, whereas the block thickness indicates the type of sequence.
- For any individual cassette exon event (including microexons), the C1, A and C2 exons are all represented. The alternative exon (A) is thus the one in the middle.
  - Blue: simple cassette exon. “Simple” cassette exons are those for which ≥95% of the reads used to quantify their PSI come from the three reference exon-exon junctions (C1A, AC2 and C1C2). Corresponds to “S” or “MIC_S” in ‘Average complexity’.
  - Purple: cassette exon event of intermediate complexity. These are alternative exons for which ≥50% and <95% of the reads used to quantify their PSI come from the three reference exon-exon junctions. Corresponds to “C1” or “C2” in ‘Average complexity’.
  - Red: complex cassette exon event, for which <50% of the reads used to quantify their PSI come from the three reference exon-exon junctions. Corresponds to “C3”, “ME” or “MIC_M” in ‘Average complexity’.
  - Black: groups of multiple neighboring cassette exon events. Black tracks are informative only and do not link to any page in VastDB.
- For Intron Retention events: Orange track. Thick blocks correspond to the intronic sequence, and the thin blocks to the adjoining exons (C1 and C2).
- For Alternative 3' and 5' splice site choice events: Dark Green and Light Green, respectively. In both cases, the thick block corresponds to the alternative sequence, whereas the thin blocks are the constant exonic sequences (C1 and C2). For these events, at least two tracks are shown: one for sequence exclusion (the most internal splice site; EventID-1/N) and the other(s) for sequence inclusion.
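The cassette exon color thresholds can be summarized as a small lookup. The sketch below is illustrative only: the function name and the color strings are hypothetical, the input is assumed to be the fraction of PSI-quantifying reads that come from the C1A, AC2 and C1C2 junctions, and a value of exactly 95% is treated as simple.

```python
def cassette_track_color(ref_fraction):
    """Map the fraction (0-1) of reads from reference junctions to a track color."""
    if not 0.0 <= ref_fraction <= 1.0:
        raise ValueError("ref_fraction must be between 0 and 1")
    if ref_fraction >= 0.95:
        return "blue"    # simple ("S" / "MIC_S")
    if ref_fraction >= 0.50:
        return "purple"  # intermediate complexity ("C1" / "C2")
    return "red"         # complex ("C3" / "ME" / "MIC_M")
```

Black tracks are not covered here because they group several events rather than describe a single one.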
How are the splice site scores calculated?
These scores were calculated using score5.pl and score3.pl from Yeo and Burge, 2004. This method uses a position weight matrix and calculates deviation from the consensus. For 5' splice sites, three exonic and six intronic positions surrounding the exon-intron junction were analyzed, and for 3' splice sites, 20 intronic and 3 exonic positions were analyzed.
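To make the analyzed positions concrete, the sketch below extracts the two sequence windows described above (9 nt for the 5' splice site, 23 nt for the 3' splice site). It does not reproduce the scoring performed by score5.pl and score3.pl; the function names are hypothetical, and sequences are assumed to be given 5' to 3' on the transcript strand.

```python
def five_prime_window(exon, downstream_intron):
    """Last 3 exonic nt followed by the first 6 intronic nt (9 nt in total)."""
    if len(exon) < 3 or len(downstream_intron) < 6:
        raise ValueError("need at least 3 exonic and 6 intronic nucleotides")
    return exon[-3:] + downstream_intron[:6]


def three_prime_window(upstream_intron, exon):
    """Last 20 intronic nt followed by the first 3 exonic nt (23 nt in total)."""
    if len(upstream_intron) < 20 or len(exon) < 3:
        raise ValueError("need at least 20 intronic and 3 exonic nucleotides")
    return upstream_intron[-20:] + exon[:3]
```

The resulting strings are the kind of fixed-length input that a splice site scoring script would then evaluate.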
How is the ORF impact predicted?
The pipeline to predict ORF impact is described in Irimia et al., 2014. Several things must be kept in mind when using this information as is:
- The prediction is based on the impact that the specific alternative sequence is likely to have when included or excluded from the transcript in isolation. That is, if there are other associated AS events (e.g. mutually exclusive or coordinated exons), the prediction may not be accurate.
- ORF impact descriptions for Alternative Splice Sites (ALTA/ALTD events) are still under development and are thus less reliable. Specifically, only the location of the alternative sequence (CDS, UTRs) and whether or not it causes a frameshift when included/excluded (i.e. whether its length is a multiple of three) are considered for these predictions at the moment. As with any other dataset in VastDB, use at your own risk.
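The frameshift check mentioned for ALTA/ALTD events amounts to a divisibility test. The sketch below is a simplified illustration, not the VastDB pipeline: the function name and the location labels ("CDS", "5UTR", "3UTR") are hypothetical, and sequences that only partially overlap the CDS are not handled.

```python
from typing import Optional


def alt_site_frameshift(alt_length: int, location: str) -> Optional[bool]:
    """Return True if including/excluding the sequence shifts the reading frame.

    Returns None when the sequence lies outside the CDS, where a frameshift
    is not meaningful. `location` labels are assumptions for this example.
    """
    if alt_length <= 0:
        raise ValueError("alt_length must be a positive number of nucleotides")
    if location != "CDS":
        return None
    # A length that is not a multiple of three changes the reading frame.
    return alt_length % 3 != 0
```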
How should I interpret the domain information?
Domain information is currently only available for cassette exons.
When an exon (either C1, A or C2) overlaps a PROSITE or PFAM domain, the following information is shown:
Dom_ID = Dom_Name = Type_Overlap(%Dom_Overlap = %Exon_Overlap)
The meaning of each field is explained below:
- Dom_ID: Domain ID in either PROSITE or PFAM databases. For PROSITE, domains with ID P0* (high frequency motifs) are excluded.
- Dom_Name: Domain name as provided by PROSITE or PFAM databases.
- Type_Overlap: There are four possible ways in which an exon can overlap a protein domain:
  - The whole exonic sequence fully overlaps with a domain (FE, Full Exon).
  - The whole domain is fully encoded within an exon (WD, Whole Domain).
  - The upstream (5') part of the exon overlaps the domain (PU, Partial Upstream).
  - The downstream (3') part of the exon overlaps the domain (PD, Partial Downstream).
- %Dom_Overlap: percent of the domain encoded by the exon.
- %Exon_Overlap: percent of the exon that overlaps the domain.
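If you need to process these entries programmatically, a parser along the following lines may help. It is a sketch under stated assumptions: the exact spacing and number formatting in VastDB output are not specified here, so the pattern tolerates optional spaces and percent signs, and the example entry in the usage note is made up.

```python
import re
from typing import NamedTuple


class DomainOverlap(NamedTuple):
    dom_id: str
    dom_name: str
    overlap_type: str   # FE, WD, PU or PD
    pct_domain: float   # %Dom_Overlap
    pct_exon: float     # %Exon_Overlap


# Dom_ID = Dom_Name = Type_Overlap(%Dom_Overlap = %Exon_Overlap)
_PATTERN = re.compile(
    r"^\s*(?P<id>[^=]+?)\s*=\s*(?P<name>[^=]+?)\s*=\s*"
    r"(?P<type>FE|WD|PU|PD)\s*\(\s*"
    r"(?P<dom>\d+(?:\.\d+)?)\s*%?\s*=\s*"
    r"(?P<exon>\d+(?:\.\d+)?)\s*%?\s*\)\s*$"
)


def parse_domain_overlap(entry: str) -> DomainOverlap:
    """Split one domain entry into its five fields."""
    match = _PATTERN.match(entry)
    if match is None:
        raise ValueError(f"unrecognized domain entry: {entry!r}")
    return DomainOverlap(
        match["id"], match["name"], match["type"],
        float(match["dom"]), float(match["exon"]),
    )
```

For instance, a hypothetical entry such as `PF00069 = Pkinase = FE(12.5 = 100)` would be read as an exon that lies entirely within the domain and encodes 12.5% of it.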
How are the primers for RT-PCR validation designed?
Primers are designed automatically using Primer3 (optimal primer length = 21 nt; optimal Tm = 61 °C). As a general rule, primers are located in the C1 and C2 exonic sequences, so two RT-PCR products will be produced: a shorter one (from C1 to C2, skipping the A sequence) and a longer one (including the A sequence). Their sizes are provided in ‘Band lengths’.
To minimize PCR amplification bias towards shorter amplicons (i.e. over-representation of the skipping form) and, at the same time, optimize the visualization in agarose gels, primers are designed based on the size relationship between the two predicted amplicons. This is based on the following rules:
- Alternative sequence LE < 15 nt => optimal skipping band size = 100 nt.
- Alternative sequence 15 ≤ LE < 25 nt => optimal skipping band size = 110 nt.
- Alternative sequence 25 ≤ LE < 40 nt => optimal skipping band size = 120 nt.
- Alternative sequence 40 ≤ LE < 65 nt => optimal skipping band size = 140 nt.
- Alternative sequence 65 ≤ LE < 100 nt => optimal skipping band size = 175 nt.
- Alternative sequence 100 ≤ LE < 200 nt => optimal skipping band size = 250 nt.
- Alternative sequence 200 ≤ LE < 300 nt => optimal skipping band size = 300 nt.
- Alternative sequence 300 ≤ LE < 1000 nt => optimal skipping band size = 350 nt.
- Alternative sequence LE ≥ 1000 nt => primers not designed. A three-primer strategy is recommended.
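The rules above form a simple lookup table, sketched below. This is an illustration rather than the actual primer design code: the function names are hypothetical, a length of exactly 1000 nt is treated as "primers not designed", and the returned sizes are the optimal targets, not the final band lengths reported for a given event.

```python
import bisect

# Exclusive upper bounds of each LE interval and the matching skipping band size.
_LE_UPPER_BOUNDS = [15, 25, 40, 65, 100, 200, 300, 1000]
_SKIPPING_SIZES = [100, 110, 120, 140, 175, 250, 300, 350]


def optimal_skipping_band(le):
    """Return the optimal skipping band size (nt), or None if no primers are designed."""
    if le < 1:
        raise ValueError("le must be a positive length in nucleotides")
    index = bisect.bisect_right(_LE_UPPER_BOUNDS, le)
    if index >= len(_SKIPPING_SIZES):
        return None  # very long alternative sequence: three-primer strategy
    return _SKIPPING_SIZES[index]


def optimal_band_pair(le):
    """Return (skipping, inclusion) target sizes; inclusion adds the A sequence."""
    skipping = optimal_skipping_band(le)
    if skipping is None:
        return None
    return skipping, skipping + le
```

A 27-nt microexon, for example, would target a 120-nt skipping band and a 147-nt inclusion band.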
What are the quality scores (QC) in the PSI plots?
As provided by vast-tools; from the README: Quality scores, and number of corrected inclusion and exclusion reads (qual@inc,exc):
- Score 1: Read coverage, based on actual reads (as used in Irimia et al., Cell 2014):
  - For EX: OK/LOW/VLOW: (i) ≥20/15/10 actual reads (i.e. before mappability correction) mapping to all exclusion splice junctions, OR (ii) ≥20/15/10 actual reads mapping to one of the two groups of inclusion splice junctions (upstream or downstream of the alternative exon), and ≥15/10/5 to the other group of inclusion splice junctions.
  - For EX (microexon module): OK/LOW/VLOW: (i) ≥20/15/10 actual reads mapping to the sum of exclusion splice junctions, OR (ii) ≥20/15/10 actual reads mapping to the sum of inclusion splice junctions.
  - For INT: OK/LOW/VLOW: (i) ≥20/15/10 actual reads mapping to the sum of skipping splice junctions, OR (ii) ≥20/15/10 actual reads mapping to one of the two inclusion exon-intron junctions (the 5' or 3' of the intron), and ≥15/10/5 to the other inclusion splice junctions.
  - For ALTD and ALTA: OK/LOW/VLOW: (i) ≥40/20/10 actual reads mapping to the sum of all splice junctions involved in the specific event.
  - For any type of event: SOK: same thresholds as OK, but a total number of reads ≥100.
  - For any type of event: N: does not meet the minimum threshold (VLOW).
- Score 2: Read coverage, based on corrected reads (similar values as per Score 1).
- Score 3: Read coverage, based on uncorrected reads mapping only to the reference C1A, AC2 or C1C2 splice junctions (similar values as per Score 1). Always NA for intron retention events.
- Score 4: Imbalance of reads mapping to inclusion splice junctions (only for exon skipping events quantified by the splice site-based or transcript-based modules). For intron retention events: numbers of reads mapping to the upstream exon-intron junction, downstream intron-exon junction, and exon-exon junction, in the format A=B=C.
  - OK: the ratio between the total number of reads supporting inclusion for splice junctions upstream and downstream of the alternative exon is < 2.
  - B1: the ratio between the total number of reads supporting inclusion for splice junctions upstream and downstream of the alternative exon is > 2 but < 5.
  - B2: the ratio between the total number of reads supporting inclusion for splice junctions upstream and downstream of the alternative exon is > 5.
  - Bl/Bn: low/no read coverage for splice junctions supporting inclusion.
- Score 5: Complexity of the event (only for exon skipping events quantified by the splice site-based or transcript-based modules). For intron retention events: p-value of a binomial test of balance between reads mapping to the upstream and downstream exon-intron junctions, modified by reads mapping to a 200-bp window in the centre of the intron (see Braunschweig et al., 2014).
  - S: percent of complex reads (i.e. those inclusion- and exclusion-supporting reads that do not map to the reference C1A, AC2 or C1C2 splice junctions) is < 5%.
  - C1: percent of complex reads is > 5% but < 20%.
  - C2: percent of complex reads is > 20% but < 50%.
  - C3: percent of complex reads is > 50%.
  - NA: low coverage event.
- inc,exc: total number of reads, corrected for mappability, supporting inclusion and exclusion.
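As an illustration of how the quality field could be handled in a script, the sketch below splits a `qual@inc,exc` string and classifies the Score 5 complexity for exon skipping events. It makes several assumptions: that the five scores are comma-separated before the `@`, that values falling exactly on a threshold (5%, 20%, 50%) belong to the more complex class, and that the function names and the example string are hypothetical.

```python
def parse_quality(field):
    """Split 'S1,S2,S3,S4,S5@inc,exc' into (list of 5 scores, inc, exc)."""
    scores_part, separator, reads_part = field.partition("@")
    if not separator:
        raise ValueError("missing '@' between scores and read counts")
    scores = scores_part.split(",")
    if len(scores) != 5:
        raise ValueError("expected five quality scores")
    # Corrected read counts supporting inclusion and exclusion.
    inc, exc = (float(value) for value in reads_part.split(","))
    return scores, inc, exc


def complexity_class(pct_complex):
    """Classify the percent of complex reads into S, C1, C2 or C3."""
    if not 0 <= pct_complex <= 100:
        raise ValueError("pct_complex must be between 0 and 100")
    if pct_complex < 5:
        return "S"
    if pct_complex < 20:
        return "C1"
    if pct_complex < 50:
        return "C2"
    return "C3"
```

A made-up field such as `SOK,OK,LOW,B1,C2@150,20.5` would yield the five scores plus 150 inclusion and 20.5 exclusion reads.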
Where do the protein structures come from and what do the different colors mean?
ENSEMBL protein isoforms including at least one of the C1, A and C2 exons for cassette exon events have been mapped to protein structures from the same gene in the Protein Data Bank using sequence alignment. The best structural match is shown in the database, prioritizing structures containing the A exon.
For cassette exon events with no PDB hits, the structure of the longest ENSEMBL protein isoform was modeled using Phyre2 (Kelley et al., 2015).
Red residues correspond to the A exon of the event, while bright orange corresponds to the C1 exon and pale orange to the C2 exon. The rest of the protein is shown in grey in the case of structures retrieved from the PDB, and in light blue for models.