Summary
Tag | transcripts_tomato/transcripts_sol
Owner | CAB Group
Input | BAC sequences from seq
Output | GenomeThreader-based analysis of tomato/other Solanaceae ESTs, in GFF3 format
This analysis runs GenomeThreader against the BAC sequences and reports the GenomeThreader output in GFF3 format.
Input
The BAC sequence FASTA files produced by AnSgnChecks000.
Requirements
- Files must be in ITAG standard FASTA format.
Processing
GenomeThreader is used to compute spliced alignments of each EST against the S. lycopersicum BAC sequences; the resulting alignments are made available in GFF3 format.
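The alignment step can be pictured as one GenomeThreader (`gth`) run per BAC and EST dataset. The sketch below only builds the command line; it uses the basic `gth` options for a genomic template (`-genomic`), cDNA/EST input (`-cdna`), GFF3 output (`-gff3out`) and an output file (`-o`). The helper name `gth_command` and the file names are hypothetical, and the exact options CAB passes to `gth` are not documented on this page.

```python
def gth_command(bac_fasta: str, est_fasta: str, out_gff3: str) -> list[str]:
    """Build a gth command line that spliced-aligns the ESTs in est_fasta
    against the BAC sequence(s) in bac_fasta and writes GFF3 to out_gff3.

    Hypothetical helper: only basic gth options are shown, not CAB's settings.
    """
    return [
        "gth",
        "-genomic", bac_fasta,   # genomic template: the BAC sequence(s)
        "-cdna", est_fasta,      # cDNA/EST sequences to splice-align
        "-gff3out",              # GFF3 instead of the default text report
        "-o", out_gff3,          # write the report to this file
    ]


# Hypothetical accession and file names, for illustration only.
cmd = gth_command("AC123456.1.fa", "tomato_ests.masked.fa",
                  "AC123456.1.transcripts_tomato.gff3")
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building the argument list separately from running it keeps the command easy to log and to test.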
Parameter settings
- Alignments were filtered using thresholds of 98% identity and 95% coverage, for both tomato species and other Solanaceae (as announced by e-mail on 22 Jan 2007 to the sol-bioinformatics, itag and sol-steering lists).
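A minimal sketch of the filter, assuming the two thresholds are minimums (an alignment is kept only if identity ≥ 98% and coverage ≥ 95%) and that both values are already available as percentages for each alignment. The `Alignment` record, `passes_filter` and the EST identifiers are hypothetical; how CAB computes identity and coverage is not stated here.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """One EST-to-BAC spliced alignment (hypothetical record type)."""
    est_id: str
    identity: float  # percent identity of the aligned region (0-100)
    coverage: float  # percent of the EST length covered by the alignment (0-100)


MIN_IDENTITY = 98.0  # assumed to be a minimum threshold
MIN_COVERAGE = 95.0  # assumed to be a minimum threshold


def passes_filter(aln: Alignment,
                  min_identity: float = MIN_IDENTITY,
                  min_coverage: float = MIN_COVERAGE) -> bool:
    """Keep an alignment only if it meets both thresholds."""
    return aln.identity >= min_identity and aln.coverage >= min_coverage


alignments = [
    Alignment("EST_A", identity=99.1, coverage=97.0),
    Alignment("EST_B", identity=98.5, coverage=80.0),  # fails coverage
    Alignment("EST_C", identity=92.0, coverage=99.0),  # fails identity
]
kept = [a.est_id for a in alignments if passes_filter(a)]
print(kept)  # ['EST_A']
```

The same thresholds apply to both the tomato and the other-Solanaceae datasets, so one function serves both.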
GFF3 release
The GFF3 output is validated with the online GMOD validation service at http://www.gmod.org/gff3. Each GFF3 file is named after the BAC's GenBank ID, as in the following examples:
BACGenBankid.transcripts_tomato.itag000.batch001.v1.gff3
BACGenBankid.transcripts_sol.itag000.batch001.v1.gff3
for ITAG pipeline 000 and batch file 001.
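Before a file is submitted to the online validator, a few structural checks can be run locally. This is a minimal sketch, not a replacement for the GMOD validator: it only checks the `##gff-version 3` header and that each feature line has the nine tab-separated GFF3 columns with integer start ≤ end. The function name and the example feature lines are made up for illustration.

```python
def gff3_precheck(lines: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    if not lines or not lines[0].startswith("##gff-version 3"):
        problems.append("line 1: missing '##gff-version 3' header")
    for n, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.strip() == "##FASTA":
            break  # everything after ##FASTA is sequence data
        if not line or line.startswith("#"):
            continue  # blank lines, ## directives and # comments
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append(f"line {n}: expected 9 columns, got {len(cols)}")
            continue
        start, end = cols[3], cols[4]
        if not (start.isdigit() and end.isdigit()):
            problems.append(f"line {n}: start/end must be integers")
        elif int(start) > int(end):
            problems.append(f"line {n}: start > end")
    return problems


# Made-up feature lines: the second has start > end.
example = [
    "##gff-version 3",
    "AC123456.1\tGenomeThreader\tcDNA_match\t100\t450\t.\t+\t.\tID=aln1",
    "AC123456.1\tGenomeThreader\tcDNA_match\t900\t800\t.\t+\t.\tID=aln2",
]
print(gff3_precheck(example))  # ['line 3: start > end']
```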
Filenames
- Two GFF3 files are produced for each BAC sequence in the submission, describing the spliced alignments of tomato ESTs and of other Solanaceae ESTs respectively, named as follows:
<acc>.<ver>.transcripts_tomato.spliced_alignment.itag<pipever>.batch<batchnum>.v<ver>.gff3
<acc>.<ver>.transcripts_sol.spliced_alignment.itag<pipever>.batch<batchnum>.v<ver>.gff3
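The naming convention can be captured in a small helper that builds and parses these names. This sketch makes a few assumptions: the two `<ver>` fields are read as two different values (the accession version and the file version), the pipeline and batch numbers are zero-padded to three digits as in the `itag000`/`batch001` examples, and an accession is letters followed by digits. The function names and the accession are hypothetical.

```python
import re

DATASETS = ("transcripts_tomato", "transcripts_sol")


def gff3_filename(acc: str, acc_version: int, dataset: str,
                  pipeline_version: int, batch: int, file_version: int) -> str:
    """Build <acc>.<ver>.<dataset>.spliced_alignment.itag<pipever>.batch<batchnum>.v<ver>.gff3."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    # Pipeline and batch numbers are zero-padded to three digits (assumption).
    return (f"{acc}.{acc_version}.{dataset}.spliced_alignment."
            f"itag{pipeline_version:03d}.batch{batch:03d}.v{file_version}.gff3")


FILENAME_RE = re.compile(
    r"^(?P<acc>[A-Z]+\d+)\.(?P<acc_version>\d+)\."
    r"(?P<dataset>transcripts_tomato|transcripts_sol)\.spliced_alignment\."
    r"itag(?P<pipeline_version>\d{3})\.batch(?P<batch>\d{3})\."
    r"v(?P<file_version>\d+)\.gff3$"
)


def parse_gff3_filename(name: str) -> dict:
    """Split a file name back into its fields (all returned as strings)."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a transcripts GFF3 file name: {name}")
    return m.groupdict()


name = gff3_filename("AC123456", 1, "transcripts_tomato", 0, 1, 1)
print(name)
# AC123456.1.transcripts_tomato.spliced_alignment.itag000.batch001.v1.gff3
print(parse_gff3_filename(name)["batch"])  # 001
```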
Details of EST data source and CAB processing
EST sequences are downloaded from the dbEST division of GenBank (the current release is updated to October 2008). Sequences from the following sources are currently available:
- TOMATO:
SOLLC = Solanum lycopersicum;
SOLHA = Solanum habrochaites;
SOLPN = Solanum pennellii;
SOLLP = Solanum lycopersicum X Solanum pimpinellifolium;
- OTHER_SOLANACEAE:
SOLTU = Solanum tuberosum;
SOLCH = Solanum chacoense;
CAPAN = Capsicum annuum;
CAPCH = Capsicum chinense;
TOBAC = Nicotiana tabacum;
NICBE = Nicotiana benthamiana;
NICSY = Nicotiana sylvestris;
NICAT = Nicotiana attenuata;
NICLS = Nicotiana langsdorffii x Nicotiana sandera;
PETHY = Petunia x hybrida;
- OTHER_RELATED_SPECIES (RUBIACEAE):
COFCA = Coffea canephora;
COFAR = Coffea arabica;
- Vector contamination is trimmed: RepeatMasker detects and masks vector sequences using NCBI's vector database (October 2008 update).
- Low-complexity subsequences and simple repeats are masked using RepeatMasker.
- The vector-cleaned and masked ESTs of each dataset are spliced-aligned against the S. lycopersicum BAC sequences.
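The data preparation described above can be sketched in two parts: routing ESTs to a dataset by organism code, and building the RepeatMasker commands. Assumptions, all labeled here and in the comments: tomato codes feed `transcripts_tomato` and other Solanaceae codes feed `transcripts_sol`; Rubiaceae ESTs are left unrouted because this page does not say which file they go to; masking runs as two RepeatMasker passes (a vector pass with `-lib` and `-nolow`, then a low-complexity/simple-repeat pass with `-noint`). Vector trimming after masking is not shown. The function names and file names are hypothetical.

```python
from typing import Optional

# Organism codes as grouped on this page.
TOMATO = {"SOLLC", "SOLHA", "SOLPN", "SOLLP"}
OTHER_SOLANACEAE = {"SOLTU", "SOLCH", "CAPAN", "CAPCH", "TOBAC",
                    "NICBE", "NICSY", "NICAT", "NICLS", "PETHY"}
RUBIACEAE = {"COFCA", "COFAR"}


def dataset_for(code: str) -> Optional[str]:
    """Map an organism code to the GFF3 dataset tag it feeds.

    Assumption: Rubiaceae (and unknown) codes return None, because the
    page does not say which transcripts file, if any, they go to.
    """
    if code in TOMATO:
        return "transcripts_tomato"
    if code in OTHER_SOLANACEAE:
        return "transcripts_sol"
    return None


def masking_commands(est_fasta: str, vector_lib: str) -> list[list[str]]:
    """Assumed two-pass masking: vectors first, then low complexity/simple repeats.

    RepeatMasker writes masked output to <input>.masked, so the second
    pass reads the first pass's output.
    """
    # Pass 1: mask vectors from a custom library; skip low-complexity masking.
    vector_pass = ["RepeatMasker", "-lib", vector_lib, "-nolow", est_fasta]
    # Pass 2: mask only low-complexity DNA and simple repeats.
    lowcomp_pass = ["RepeatMasker", "-noint", est_fasta + ".masked"]
    return [vector_pass, lowcomp_pass]


print(dataset_for("SOLPN"))  # transcripts_tomato
print(dataset_for("COFCA"))  # None
for cmd in masking_commands("tomato_ests.fa", "vectors.fa"):
    print(" ".join(cmd))
```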
Current BAC uploading and annotation at CAB
EST-to-BAC alignments are released to the SGN repository on request, following the BAC batch files.
CAB publishes all annotated S. lycopersicum BAC sequences at http://biosrv.cab.unina.it/GBrowse/.
The annotation is currently updated automatically each time a BAC is released in the GenBank database.
On 22.01.07, 123 BACs were annotated.
On 03.05.07, 129 BACs were annotated.
On 15.10.07, 381 BACs were annotated.
On 21.10.07, 493 BACs were annotated.
On 22.01.08, 586 BACs were annotated.
On 29.10.09, 1307 BACs were annotated.
