VastDB 1.0 has been released

Published on 01 September 2017, 12:53 by Transdevo


Our article describing the first full release of VastDB, together with an extensive analysis of the data, has been accepted for publication in Genome Research. The accepted manuscript can be found here.


RELEASE NOTES FOR VERSION 1.0:

Sample panel

This database release includes data for the following species and genome assemblies:

  • Chicken (galGal3):
    • Main dataset: 61 samples, grouped in 39 data points.

Features

AS event features

For each AS event, the following features are displayed:

  • General information:
    • Ensembl Gene ID
    • Gene symbol
    • Genomic coordinates of the alternative exon (A) and the flanking upstream (C1) and downstream (C2) constitutive exons.
    • Length of the alternatively spliced sequence.
    • Sequences of the 3' and 5' splice sites of the AS sequence.
    • Strength scores of the 3' and 5' splice sites (calculated according to Yeo and Burge, 2004)
    • Sequences of the alternative sequence (A) and the flanking upstream (C1) and downstream (C2) constitutive exons.
    • Vast-tools internal features (event complexity, mappability confidence).
    • Impact of the AS event on the ORF.
    • Domain overlap for PROSITE and PFAM domains, for the alternative sequence (A) and the flanking upstream (C1) and downstream (C2) constitutive exons.
    • Average intrinsic disorder rate of the protein region encoded by the alternative sequence (A) and the flanking upstream (C1) and downstream (C2) constitutive exons.
    • Protein structure with the alternative sequence (A) and the flanking upstream (C1) and downstream (C2) sequences highlighted. Where possible, structures were retrieved from the Protein Data Bank. Otherwise, they correspond to computational models produced using Phyre2 (Kelley et al., Nat. Prot., 2015)
    • Degree and betweenness centrality of the gene in protein-protein interaction networks (data from Ellis et al., Mol Cell, 2012)
    • Associated events (AS events with overlapping genomic coordinates)
    • Conservation relationships with other species (including genomic coordinates of the orthologous event, PSI pattern description and Event ID, if possible).
    • Suggestions for RT-PCR validation.
  • Genomic context, with screenshot and link to the VastDB track in the UCSC Genome Browser.
  • Inclusion profile across the sample catalogue, with interactive selection of samples and read coverage stringency.
Inclusion values are expressed as PSI (Percentage Spliced In), which is the percentage of transcript molecules for the gene that include the alternative sequence. Sample subgroups (data points) are coloured according to the biological origin of the samples in the subgroup. All PSI values have been quantified using vast-tools. PSI values and quality scores for the individual samples in each subgroup are displayed when hovering the mouse over the corresponding data point.
  • Inclusion profile in additional datasets, as static images.
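
As a rough illustration of the PSI definition above, the following Python sketch computes a PSI value from read counts supporting inclusion and exclusion of the alternative sequence. It is a simplified assumption for illustration only: the function name and its inputs are hypothetical, and the actual quantification performed by vast-tools is more involved than this ratio.

```python
from typing import Optional


def percent_spliced_in(inclusion_reads: float, exclusion_reads: float) -> Optional[float]:
    """Return PSI (0-100) from inclusion and exclusion read counts.

    Hypothetical helper: a plain ratio of reads, not the vast-tools procedure.
    Returns None when there is no read coverage, since PSI is undefined then.
    """
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None  # no coverage: PSI cannot be estimated
    return 100.0 * inclusion_reads / total


if __name__ == "__main__":
    # 30 reads support inclusion, 10 support exclusion
    print(percent_spliced_in(30, 10))
```

Returning `None` for events without coverage, rather than 0, keeps "not measured" distinct from "fully skipped", which is the same distinction the read coverage stringency selector is meant to expose.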


Gene features

For each gene, the following features are displayed:

  • General information:
    • Biotype (from Ensembl)
    • Gene symbol
    • Gene name
    • Genomic coordinates
    • Genomic assembly
  • Genomic context, with screenshot and link to the VastDB track in the UCSC Genome Browser.
  • Expression profile across the sample catalogue, with interactive selection of samples.
Expression values are given as cRPKM (corrected Reads Per Kilobase per Million reads). Sample subgroups (data points) are coloured according to the biological origin of the samples in the subgroup. Expression values in cRPKM and raw read counts for individual samples are displayed when hovering the mouse over the corresponding data point.
  • Expression profile in additional datasets, as static images.
  • Summary of AS events in the gene, with two sections:
    • A list of the events, with their length and genomic coordinates.
    • An interactive plotting area to compare the PSI profiles of two or more events.
Events are toggled on/off with the tick boxes in the list, or with the quick buttons to select or deselect all events. Samples are toggled on/off using the tissue catalogue immediately above the plotting area, or with the quick buttons to select or deselect all samples.
  • Orthologous genes in other VastDB species.
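
To make the cRPKM unit used for the expression profiles more concrete, here is a minimal Python sketch of a reads-per-kilobase-per-million calculation. It assumes that the "correction" consists of using the number of uniquely mappable positions of the transcript in place of its full length; that assumption, the function name and the parameter names are illustrative and are not taken from the release notes.

```python
def crpkm(read_count: float, mappable_positions: float, total_mapped_reads: float) -> float:
    """Reads per kilobase of mappable sequence per million mapped reads.

    Hypothetical helper. Assumption: the correction replaces transcript
    length with the count of uniquely mappable positions.
    """
    if mappable_positions <= 0 or total_mapped_reads <= 0:
        raise ValueError("mappable_positions and total_mapped_reads must be positive")
    kilobases = mappable_positions / 1e3        # effective length in kb
    millions = total_mapped_reads / 1e6         # library size in millions of reads
    return read_count / kilobases / millions


if __name__ == "__main__":
    # 500 reads on a gene with 2,000 mappable positions, 10 million mapped reads
    print(crpkm(500, 2000, 10_000_000))
```

Normalising by both effective length and library size is what makes values comparable across genes and across samples, which is why the raw read counts are shown alongside cRPKM rather than instead of it.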