Pipeline Description
This page describes the dataflow of the distributed pipeline and lists the analyses involved, along with their maintainers.
For an overview of the structure and functioning of the central file repository, including validation services and status/control services, see PipelineRepository.
For a step-by-step guide for analysis implementors, see AnalysisImplementationHowTo.
Latest pipeline: Pipeline000
View pipeline running statuses
Initial Steps
| Order | Wiki Page | Tag | Who? | Description |
| 1 | | - | Seq.Center | BAC sequences are uploaded to GenBank, and a GenBank accession is obtained. |
| 2 | | - | Seq.Center | The BAC is uploaded to SGN. |
| 3 | | - | SGN | SGN generates a batch of sequences to annotate. |
| 4 | | seq | | SGN runs vector and contamination screens (chloroplast, mitochondrial and human sequences) and performs other quality control, such as comparing in vitro (from FPC data) and in silico restriction fragment sizes. The actual GenBank submission will also be quality-checked: the sequences will be compared and the presence of the keywords (ITAG and TOMGEN) verified. |
| 5 | | repeats | | SGN runs RepeatMasker with tomato-derived and other repeat databases. This comes before the other pipeline steps so that some of them have the option of using the repeat-masked BAC sequence. |
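The step 4 check that compares in vitro and in silico restriction fragment sizes could be sketched roughly as follows. This is a minimal illustration, not SGN's actual implementation: the function names, the 5% size tolerance, and the treatment of the BAC as a linear sequence without vector are all assumptions, and HindIII (A^AGCTT) is used only as an example enzyme.

```python
def in_silico_fragments(sequence, site, cut_offset):
    """Return fragment lengths from cutting a linear sequence at every
    occurrence of `site`; `cut_offset` is the cut position within the site."""
    seq = sequence.upper()
    site = site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def fragments_agree(in_silico, in_vitro, tolerance=0.05):
    """Compare two fragment-size lists after sorting.

    The relative tolerance is an assumed value standing in for
    gel measurement error; it is not taken from the pipeline.
    """
    if len(in_silico) != len(in_vitro):
        return False
    pairs = zip(sorted(in_silico), sorted(in_vitro))
    return all(abs(a - b) <= tolerance * max(a, b) for a, b in pairs)


# Toy sequence with two HindIII sites (AAGCTT, cut after the first A).
toy_bac = "GGGGAAGCTTCCCCCAAGCTTTT"
predicted = in_silico_fragments(toy_bac, "AAGCTT", cut_offset=1)
print(predicted)  # [5, 11, 7]
print(fragments_agree(predicted, [7, 5, 11]))  # True
```

A real comparison would also have to account for vector fragments and for bands too small or too similar in size to resolve on a gel.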
Distributed Analyses
Initial
| Order | Wiki Page | Tag | Who? | Description |
| 6 | | tblastx_mimulus, tblastx_potato | | TBLASTX versus Mimulus and potato sequences |
| | | blastx_ath, blastx_swissprot, blastx_sol, blastx_plants, blastx_uniprot, blastx_pfamb, blastx_sptg | Ghent | BLASTF (script from WUR) versus protein data sets |
| | | blastn_ecoli, blastn_chloro, blastn_mito, blastn_human | | BLASTN of nucleotide sequences versus E. coli, chloroplast, mitochondrial, and human sequences |
| | | transcripts_tomato, transcripts_sol | | GenomeThreader: BAC sequences versus tomato and Solanaceae transcript sequences |
| | | sgn_unigenes | | GenomeThreader: BAC/contig sequences versus all SGN unigene sequences |
| | | sgn_markers | | GenomeThreader: genomic sequences versus SGN marker sequences |
| | | augustus | | AUGUSTUS gene finder, run in ab-initio mode |
| | | genemark_ath | Remy/MIPS | GeneMark ab-initio gene finder |
| | | glimmerhmm_ath, glimmerhmm_tomato | Erwin/WUR | GlimmerHMM ab-initio gene finder |
| | | geneid_ath, geneid_tomato | Francisco/SGN | GeneID ab-initio gene finder |
| | | rfam | | Rfam |
| | | trnascanse | | tRNAscan-SE tRNA finder |
Eugene
| Order | Wiki Page | Tag | Who? | Description |
| 7 | | eugene | | Eugene gene predictor |
| | | renaming | MIPS/Manuel | Renaming and versioning of Eugene gene models |
Functional
| Order | Wiki Page | Tag | Who? | Description |
| 8 | | blastp_ath_pep, blastp_rice_pep, blastp_swissprot | | Proteins from Eugene predictions, functionally annotated with BLASTP |
| | | interpro | | Proteins from Eugene predictions, functionally annotated with InterPro |
| | | go | | Assign GO terms to functional annotations |
| | | targetp, signalp | SGN | Functional annotation with TargetP and SignalP |
| | | rpsblast | | |
| | | tmhmm | | |
| | | sgn_loci | | Functional annotation using the SGN genes DB |
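The ordering implied by the Order column in the tables above (steps 4 to 8) can be pictured as a dependency graph in which analyses at the same step may run in parallel. The sketch below uses a handful of the tags listed on this page; the exact dependency edges are assumptions made for illustration (for example, which analyses Eugene actually consumes, or that `go` waits for `interpro`), and `parallel_batches` is a hypothetical helper, not part of the pipeline code.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is an analysis tag; the set lists the tags it waits for.
# ASSUMPTION: these edges are illustrative, inferred from the step order.
DEPENDENCIES = {
    "seq": set(),
    "repeats": {"seq"},
    "augustus": {"repeats"},
    "geneid_tomato": {"repeats"},
    "transcripts_tomato": {"repeats"},
    "eugene": {"augustus", "geneid_tomato", "transcripts_tomato"},
    "renaming": {"eugene"},
    "interpro": {"renaming"},
    "go": {"interpro"},
}


def parallel_batches(dependencies):
    """Group analyses into batches; everything in one batch can run
    concurrently once all earlier batches have finished."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


for number, batch in enumerate(parallel_batches(DEPENDENCIES), start=1):
    print(number, ", ".join(batch))
```

With these assumed edges, the three step 6 analyses land in a single batch after `repeats`, which mirrors how the distributed analyses can be run independently at different sites.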
Integration and Deliverables
Done by SGN. See ITAG pipeline releases
Comments
Infernal Comments
Infernal's runtime is likely too long for it to be integrated tractably into an iterative whole-genome annotation pipeline. The alternatives are either to BLAST the Rfam seed sequences against the genome sequence or to use the appropriate specialist software (tRNAscan-SE, snoscan). Further comments can be found on the Infernal page. The Sanger Centre provides a Perl script (Rfam Scan) that performs an Rfam BLAST pre-screen of any sequence to be searched with Infernal. Using it significantly reduces the runtime of Rfam-based annotation.
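The idea behind a BLAST pre-screen can be illustrated with a short sketch: BLAST hits from the Rfam seed sequences are padded and merged into candidate windows, and only those windows would then be passed to the slower Infernal search. This is not the Rfam Scan script's actual logic; the function name, the 0-based half-open coordinates, and the 100 bp flank are assumptions chosen for the example.

```python
def candidate_windows(hits, seq_length, flank=100):
    """Pad each (start, end) BLAST hit by `flank` bases and merge
    overlapping or touching windows.

    Coordinates are 0-based, half-open; `flank` is an assumed value.
    """
    padded = sorted(
        (max(0, start - flank), min(seq_length, end + flank))
        for start, end in hits
    )
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical hits on a 100 kb BAC sequence.
hits = [(1000, 1080), (1150, 1220), (90000, 90070)]
windows = candidate_windows(hits, seq_length=100_000)
print(windows)  # [(900, 1320), (89900, 90170)]

searched = sum(end - start for start, end in windows)
print(searched / 100_000)  # fraction of the sequence left for the slow search
```

In this toy case fewer than 1% of the bases remain to be searched, which is where the runtime saving comes from; the trade-off is that families without a BLAST-detectable hit are missed.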
