Pipeline Description

This page describes the dataflow of the distributed pipeline and lists the analyses involved, together with their maintainers.

For an overview of the structure and functioning of the central file repository, including validation services and status/control services, see PipelineRepository.

For a step-by-step guide for analysis implementors, see AnalysisImplementationHowTo.

Latest pipeline: Pipeline000

View pipeline running statuses

Initial Steps

Order | Wiki Page | Tag | Who? | Description
1 | AnUploadGenbank000 | - | Seq.Center | BAC sequences are uploaded to GenBank, and a GenBank accession is obtained.
2 | AnUploadSgn000 | - | Seq.Center | The BAC is uploaded to SGN.
3 | BatchFormat000 | - | SGN | SGN generates a batch of sequences to annotate.
4 | AnSgnChecks000 | seq | SGN | SGN runs vector screens and contamination screens (chloroplast, mitochondrial and human sequences) and performs other quality control, such as comparing in vitro (from FPC data) and in silico restriction fragment sizes. The actual submission to GenBank will also be quality checked: the sequences will be compared and the presence of the keywords (ITAG and TOMGEN) ensured.
5 | AnSgnRepmask000 | repeats | SGN | SGN runs RepeatMasker with tomato-derived and other repeat databases. This step comes before the other pipeline steps so that some of them have the option of using the repeat-masked BAC sequence.
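The comparison of in vitro and in silico restriction fragment sizes mentioned for AnSgnChecks000 can be illustrated with a small sketch. This is not the SGN implementation: the function names, the toy sequence, the choice of HindIII (recognition site AAGCTT, cut after the first A) and the 5% tolerance are all assumptions made for illustration, and real FPC band data would need more careful matching than a pairwise comparison of sorted sizes.

```python
import re


def in_silico_fragments(sequence, site, cut_offset):
    """Fragment lengths from digesting a linear sequence at every `site`.

    `cut_offset` is the cut position measured from the start of the site.
    """
    sequence = sequence.upper()
    # A lookahead finds overlapping occurrences of the site as well.
    pattern = re.compile("(?=" + re.escape(site.upper()) + ")")
    cuts = [m.start() + cut_offset for m in pattern.finditer(sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


def fragments_agree(in_silico, in_vitro, tolerance=0.05):
    """True if both digests have the same number of fragments and each
    size-sorted pair differs by at most `tolerance` (relative)."""
    if len(in_silico) != len(in_vitro):
        return False
    pairs = zip(sorted(in_silico), sorted(in_vitro))
    return all(abs(a - b) <= tolerance * max(a, b) for a, b in pairs)


if __name__ == "__main__":
    # Hypothetical toy sequence with two HindIII sites (A^AGCTT, offset 1).
    toy_bac = "GGGAAGCTTCCCCAAGCTTTT"
    predicted = in_silico_fragments(toy_bac, "AAGCTT", 1)
    print(predicted)
    # Hypothetical in vitro band sizes for the same clone.
    print(fragments_agree(predicted, [4, 7, 10]))
```

A check of this kind would flag a BAC whose assembled sequence predicts a different number of fragments, or clearly different sizes, than the fingerprint data.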

Distributed Analyses

Initial

Order | Wiki Page | Tag | Who? | Description
6 | AnTBlastX000 | tblastx_mimulus, tblastx_potato | PSB Ghent | TBLASTX versus Mimulus and potato sequences
 | AnBlastX000 | blastx_ath, blastx_swissprot, blastx_sol, blastx_plants, blastx_uniprot, blastx_pfamb, blastx_sptg | Ghent | BLASTF (script from WUR) versus protein data sets
 | AnBlastN000 | blastn_ecoli, blastn_chloro, blastn_mito, blastn_human | SGN | BLASTN of nucleotide sequences versus E. coli, chloroplast, mitochondrial, and human sequences
 | AnEST000 | transcripts_tomato, transcripts_sol | CAB Napoli | GenomeThreader: BAC sequences versus tomato and Solanaceae transcript sequences
 | AnSgnUnigenes000 | sgn_unigenes | SGN | GenomeThreader: BAC/contig sequences versus all SGN unigene sequences
 | AnSgnMarkers000 | sgn_markers | SGN | GenomeThreader: genomic sequences versus SGN marker sequences
 | AnAUGUSTUS000 | augustus | SGN | AUGUSTUS gene finder, run in ab initio mode
 | AnGeneMark000 | genemark_ath | Remy/MIPS | GeneMark ab initio gene finder
 | AnGlimmerHMM000 | glimmerhmm_ath, glimmerhmm_tomato | Erwin/WUR | GlimmerHMM ab initio gene finder
 | AnGeneID000 | geneid_ath, geneid_tomato | Francisco/SGN | GeneID ab initio gene finder
 | AnRfam000 | rfam | Imperial | Rfam
 | AnTRNAScanSE000 | trnascanse | SGN | tRNAscan-SE tRNA finder
 | AnRnaseq454000 | rnaseq_454 | Imperial | Mapping of 454 transcript reads

Eugene

Order | Wiki Page | Tag | Who? | Description
7 | AnEugene000 | eugene | PSB Ghent | Eugene gene predictor
 | AnRename000 | renaming | MIPS/Manuel | Renaming and versioning of Eugene gene models

Functional

Order | Wiki Page | Tag | Who? | Description
8 | AnBlastP000 | blastp_ath_pep, blastp_rice_pep, blastp_swissprot | India | Proteins from Eugene predictions functionally annotated with BLASTP
 | AnInterpro000 | interpro | Imperial | Proteins from Eugene predictions functionally annotated with InterPro
 | AnGo000 | go | MPIZ | Assign GO terms to functional annotations
 | AnHumanReadableDescription000 | human_readable_description | MPIZ | Assign human-readable descriptions to annotations
 | AnTargetSignalP000 | targetp, signalp | SGN | Functional annotation with TargetP and SignalP
 | AnRPSBlast000 | rpsblast | MPIZ | 
 | AnTmHmm000 | tmhmm | SGN | 
 | AnSGNLoci000 | sgn_loci | SGN | Functional annotation with the SGN genes DB
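As a rough illustration of the dataflow, the order numbers, wiki page names and tags from the tables on this page can be written down as a small data structure. This is only a sketch, not how the pipeline repository represents things: the dictionary layout and the helper functions are hypothetical, rows without their own order number are assumed to share the order of the first row in their table, and each order is assumed to finish before the next one starts.

```python
# Orders, wiki page names and tags are copied from the tables on this page.
# Assumption: rows without an order number share the order of the first
# row in their table.
PIPELINE_STAGES = {
    1: {"AnUploadGenbank000": []},
    2: {"AnUploadSgn000": []},
    3: {"BatchFormat000": []},
    4: {"AnSgnChecks000": ["seq"]},
    5: {"AnSgnRepmask000": ["repeats"]},
    6: {
        "AnTBlastX000": ["tblastx_mimulus", "tblastx_potato"],
        "AnBlastX000": ["blastx_ath", "blastx_swissprot", "blastx_sol",
                        "blastx_plants", "blastx_uniprot", "blastx_pfamb",
                        "blastx_sptg"],
        "AnBlastN000": ["blastn_ecoli", "blastn_chloro", "blastn_mito",
                        "blastn_human"],
        "AnEST000": ["transcripts_tomato", "transcripts_sol"],
        "AnSgnUnigenes000": ["sgn_unigenes"],
        "AnSgnMarkers000": ["sgn_markers"],
        "AnAUGUSTUS000": ["augustus"],
        "AnGeneMark000": ["genemark_ath"],
        "AnGlimmerHMM000": ["glimmerhmm_ath", "glimmerhmm_tomato"],
        "AnGeneID000": ["geneid_ath", "geneid_tomato"],
        "AnRfam000": ["rfam"],
        "AnTRNAScanSE000": ["trnascanse"],
        "AnRnaseq454000": ["rnaseq_454"],
    },
    7: {"AnEugene000": ["eugene"], "AnRename000": ["renaming"]},
    8: {
        "AnBlastP000": ["blastp_ath_pep", "blastp_rice_pep",
                        "blastp_swissprot"],
        "AnInterpro000": ["interpro"],
        "AnGo000": ["go"],
        "AnHumanReadableDescription000": ["human_readable_description"],
        "AnTargetSignalP000": ["targetp", "signalp"],
        "AnRPSBlast000": ["rpsblast"],
        "AnTmHmm000": ["tmhmm"],
        "AnSGNLoci000": ["sgn_loci"],
    },
}


def order_of_tag(tag):
    """Return the order number of the analysis that produces `tag`."""
    for order, analyses in PIPELINE_STAGES.items():
        for tags in analyses.values():
            if tag in tags:
                return order
    raise KeyError(tag)


def next_stage(completed_orders):
    """Return (order, analysis names) for the lowest order not yet
    completed, or None when every order is done."""
    for order in sorted(PIPELINE_STAGES):
        if order not in completed_orders:
            return order, sorted(PIPELINE_STAGES[order])
    return None


if __name__ == "__main__":
    print(order_of_tag("repeats"))
    print(next_stage({1, 2, 3, 4, 5}))
```

Under these assumptions, once orders 1 to 5 are complete, all of the order 6 analyses become eligible to run, and Eugene (order 7) follows them.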

Integration and Deliverables

Done by SGN. See ITAG pipeline releases

Comments

Infernal Comments


Pipeline000 (last edited 2010-12-07 10:12:49 by KathrinKlee)