This page contains minutes from meetings of the International Tomato Annotation Group.
March 29, 2007, VRVS virtual meeting
Minutes by Lukas Mueller.
The VRVS discussion revolved mainly around the mechanics of the pipeline, which most participants seem to have almost completely worked out by now. One limiting factor was that EuGene has not yet been run on the first batch of BACs; it was decided that this will be done as an action item in the next couple of weeks. The other line of discussion was the training dataset for gene finders. Ghent is taking the lead there and will produce a set that project members will have to hand-verify. Ghent will send more information when it is available.
The next meeting was scheduled for April 12, at 8am EST (which is an hour later than the last meeting).
April 12, 2007, VRVS virtual meeting
Status Reports
- Rob: The ITAG pipeline management software has been updated to include rigorous GFF3 file format validation. The pipeline status viewer page has also been improved (the order of analyses now roughly corresponds to their dependencies).
- Stephane: EuGene analysis is now implemented in the pipeline and uploads GFF3. Next week, will also start uploading multi-FASTA files with the CDS and mRNA sequences.
- Remy: GeneMark analysis. Waiting for the trained version.
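For illustration, the kind of check meant by GFF3 file format validation might look like the minimal Python sketch below. This is not the ITAG pipeline's actual validator: the function name validate_gff3_lines is hypothetical, and the sketch covers only a few rules from the GFF3 specification (version header, nine tab-separated columns, 1-based start <= end, valid strand and phase values, and a phase on every CDS feature).

```python
from typing import Iterable, List


def validate_gff3_lines(lines: Iterable[str]) -> List[str]:
    """Return a list of human-readable problems found in GFF3 text.

    Hypothetical helper for illustration; it checks a small subset of
    the GFF3 rules and is not the ITAG pipeline's validator.
    """
    lines = list(lines)
    errors = []
    if not lines or not lines[0].startswith("##gff-version 3"):
        errors.append("line 1: missing '##gff-version 3' header")
    for n, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence data
        if not line or line.startswith("#"):
            continue  # blank lines, comments, and directives
        cols = line.split("\t")
        if len(cols) != 9:
            errors.append(f"line {n}: expected 9 columns, found {len(cols)}")
            continue
        _seqid, _source, ftype, start, end, _score, strand, phase, _attrs = cols
        if not (start.isdigit() and end.isdigit()):
            errors.append(f"line {n}: start and end must be integers")
        elif int(start) < 1 or int(start) > int(end):
            errors.append(f"line {n}: need 1 <= start <= end")
        if strand not in {"+", "-", ".", "?"}:
            errors.append(f"line {n}: invalid strand {strand!r}")
        if phase not in {"0", "1", "2", "."}:
            errors.append(f"line {n}: invalid phase {phase!r}")
        elif ftype == "CDS" and phase == ".":
            errors.append(f"line {n}: CDS features require a phase")
    return errors
```

An empty list means none of these checks failed; it does not prove that a file is fully valid GFF3.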
Training Set
- Enough data (BACs) should be available at this point. Ghent will generate a training set, which everybody will verify as a collaborative project.
Actions for next meeting
- Improve training set - Ghent to take the lead and send out annotations to verify.
- Prepare a batch with all available BACs that can be sent to the pipeline (need GenBank ID etc.); it will be used for the training set.
- other groups to participate in manual validation of annotations!
- upload protein FASTA and CDS FASTA from EuGene annotations
- finish batch 0 through the entire pipeline (finish protein steps).
- include a validation score with the gene annotation to flag abnormal length (too short, too long?): the ratio of the predicted protein length to the length of its best match in Arabidopsis.
- tRNAScan to be implemented by SGN
- low priority: BLASTIF to be implemented by SGN/Ghent/India.
- Erwin to produce a unirepeat set that is filtered for proteins.
- produce either a GFF with the masked regions or a soft-masked FASTA file (masked regions marked by letter case). Soft-masking is preferred, but both will be produced: the soft-masked file will be made available in addition to the GFF.
- sequences with X in them should not be included in the batches. They need to be re-uploaded by the projects and obtain a new version number.
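Three of the action items above (the length-based validation score, soft-masking, and the exclusion of sequences containing X) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: all function names are hypothetical, the 0.8 and 1.2 thresholds are invented placeholders rather than values agreed at the meeting, masked regions are assumed to be 0-based half-open intervals, and lower case is assumed to mark masked bases, as is the common convention.

```python
from typing import Iterable, Tuple


def length_ratio(predicted_len: int, best_hit_len: int) -> float:
    """Ratio of predicted protein length to the best Arabidopsis hit length."""
    if predicted_len <= 0 or best_hit_len <= 0:
        raise ValueError("protein lengths must be positive")
    return predicted_len / best_hit_len


def classify_length(ratio: float, low: float = 0.8, high: float = 1.2) -> str:
    """Flag abnormal lengths; the default thresholds are assumptions."""
    if ratio < low:
        return "too_short"
    if ratio > high:
        return "too_long"
    return "ok"


def softmask(seq: str, regions: Iterable[Tuple[int, int]]) -> str:
    """Lower-case the masked regions and upper-case everything else.

    Regions are assumed to be 0-based, half-open (start, end) intervals.
    """
    chars = list(seq.upper())
    for start, end in regions:
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)


def contains_x(seq: str) -> bool:
    """True if the sequence has an X and should be left out of a batch."""
    return "X" in seq.upper()
```

For example, a 300-residue prediction whose best hit is 400 residues long gives a ratio of 0.75, which the placeholder thresholds would flag as too short.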
December 5, 2007, Pow Wow Now phone conference
- Stephane:
Training update - EuGene is still performing at 60% accuracy. Francesco has trained GeneID and will send the data to Stephane.
- Lukas and Stephane:
FGENESH is not worth the cost of training, given how many other ab initio predictors will be available.
- Stephane and Daniel:
Can we have some gene models before the 14th? Stephane and Jeffery will try to have the predictions by Friday so that InterPro can be run over the weekend.
- Lukas and Stephane:
Let's have another conference call on Monday just to check that things are fine.
All ITAG results will be put on the SGN and MIPS tomato genome browsers in time for the 14th. The SGN browser will change its tracks to ITAG tracks and hide other ab initio predictions by default, showing just the EuGene consensus prediction.
Gene IDs will be sequentially added and not based on the position on the sequence. With each new release new sequential numbers are used and no number is ever reused.
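The numbering rule above (sequential, independent of position, never reused) could be sketched as a simple counter that persists across releases. This is an illustrative sketch only: the class name, the "ITAG" prefix, and the zero-padded width are hypothetical, and the minutes do not specify an identifier format.

```python
from typing import List


class GeneIdAllocator:
    """Hands out sequential gene IDs and never reissues a number.

    The ID format (prefix plus zero-padded counter) is an assumption
    made for illustration, not a format defined in the minutes.
    """

    def __init__(self, last_issued: int = 0, prefix: str = "ITAG", width: int = 6):
        self.last_issued = last_issued  # highest number used in any release
        self.prefix = prefix
        self.width = width

    def allocate(self, count: int) -> List[str]:
        """Return `count` new IDs, continuing after the last issued number."""
        if count < 0:
            raise ValueError("count must be non-negative")
        first = self.last_issued + 1
        ids = [f"{self.prefix}{n:0{self.width}d}" for n in range(first, first + count)]
        self.last_issued += count
        return ids
```

Because the counter only moves forward, genes that are dropped in a later release leave gaps in the numbering rather than freeing their IDs for reuse.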
Ensembl
We should collaborate with them if they are going to do plants. We will lose visibility unless Ensembl properly integrates our data and makes ITAG visible. The plan is to arrange a phone conference with the Ensembl group on the 13th.
