Summary
Tag | human_readable_description |
Owner | Kathrin Klee <klee AT SPAMFREE mpipz DOT mpg DOT de> |
Input | Protein sequence FASTA files from tag "renaming" |
Output | FASTA file with human readable descriptions |
Processing
Tool to assign human readable descriptions: AHRD (Automatic assignment of Human Readable Descriptions)
AHRD is a tool to assign human readable descriptions to uncharacterized protein sequences.
To do so, the tool uses the BlastP results (Swissprot, TAIR, trEMBL), the InterProScan results and the predicted Gene Ontology (GO) terms.
The top 200 BLAST hits (ranked by e-value) from each database (Swissprot, TAIR, trEMBL) are selected. A scoring algorithm, based on a lexical scoring of the individual "words" in each hit's description and on the predicted GO terms, ranks all candidate description lines, and the best-scoring line is selected. Domain names are then extracted from the InterProScan results and appended to the description line.
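The per-database hit selection can be sketched in Python as below. This is a minimal illustration, not AHRD's own code: the `BlastHit` record and `top_hits_per_database` function are hypothetical names, and the hits are assumed to be already parsed from the BlastP output.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class BlastHit(NamedTuple):
    """Hypothetical record for one parsed BlastP hit."""
    database: str      # e.g. "Swissprot", "TAIR" or "trEMBL"
    subject_id: str    # identifier of the matched protein
    description: str   # description line of the matched protein
    evalue: float
    bitscore: float


def top_hits_per_database(hits: Iterable[BlastHit], n: int = 200) -> Dict[str, List[BlastHit]]:
    """Group hits by database and keep the n hits with the smallest e-value in each."""
    grouped: Dict[str, List[BlastHit]] = defaultdict(list)
    for hit in hits:
        grouped[hit.database].append(hit)
    return {
        db: sorted(db_hits, key=lambda h: h.evalue)[:n]
        for db, db_hits in grouped.items()
    }
```

Selecting per database, rather than from one pooled list, keeps a database with many weak hits from crowding out the others.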
In the end a description line is selected for each protein sequence that:
- comes from a high-scoring BLAST match
- contains words occurring frequently in the descriptions of the highest-scoring BLAST matches
- does not contain meaningless "fill words"
- contains words also occurring in any GO terms assigned to the query protein
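The four selection criteria can be illustrated with a deliberately simplified scorer. This is a sketch under stated assumptions, not AHRD's formula: AHRD's actual weights and fill-word filters are configurable and differ. Here the `FILL_WORDS` set and the equal weighting of relative bit score, word frequency and GO overlap are hypothetical choices made for illustration.

```python
import re
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical fill-word list for illustration only; AHRD's own filters are configurable.
FILL_WORDS = {"protein", "putative", "uncharacterized", "hypothetical",
              "family", "like", "similar", "to", "of", "the"}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def informative_tokens(description: str) -> List[str]:
    """Drop fill words, keeping only words that carry meaning."""
    return [t for t in tokenize(description) if t not in FILL_WORDS]


def rank_descriptions(hits: Sequence[Tuple[str, float]],
                      go_terms: Sequence[str]) -> List[Tuple[str, float]]:
    """Rank (description, bit score) pairs, best candidate first."""
    if not hits:
        return []
    max_bit = max(bit for _, bit in hits)
    # Each word is weighted by the relative bit scores of the hits whose descriptions contain it.
    word_weight: Counter = Counter()
    for desc, bit in hits:
        for tok in set(informative_tokens(desc)):
            word_weight[tok] += bit / max_bit
    max_weight = max(word_weight.values(), default=1.0)
    go_words = {tok for term in go_terms for tok in tokenize(term)}

    def score(hit: Tuple[str, float]) -> float:
        desc, bit = hit
        toks = informative_tokens(desc)
        if not toks:
            # Only fill words: score 0.0, so any informative description ranks above it.
            return 0.0
        lexical = sum(word_weight[t] / max_weight for t in toks) / len(toks)
        go_overlap = sum(t in go_words for t in toks) / len(toks)
        return bit / max_bit + lexical + go_overlap

    return sorted(hits, key=score, reverse=True)
```

In this toy model a hit such as "Uncharacterized protein" can have the highest bit score and still lose, because after fill-word removal it has no words left to score.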
Input
- Protein sequences
- BlastP results from Swissprot, TAIR, trEMBL
- InterProScan results
- Gene ontology terms
Output
Output format: FASTA
The description line of each protein contains an evaluation of the assigned human readable description.
Here is an example and how to interpret it:
>Solyc00g005910.1.1 Endonuclease/exonuclease/phosphatase (AHRD V1 *-*- A2Q500_MEDTR); contains Interpro domain(s) IPR015706 RNA-directed DNA polymerase (reverse transcriptase), related
The evaluation can be found within the brackets: (AHRD V1 *-*- A2Q500_MEDTR)
Interpretation:
[AHRD V1]: Version of the tool (AHRD) used to assign the human readable description
[*-*-]: Significance of the HRD (see below for interpretation)
[A2Q500_MEDTR]: Identifier of the protein from which the HRD was transferred
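To extract these fields programmatically, a regular expression over the header line is enough. The pattern below is an assumption derived only from the single example header shown above, so headers from other AHRD versions or configurations may need a different pattern; `parse_ahrd_header` is a hypothetical helper name.

```python
import re
from typing import Dict, Optional

# Pattern assumed from the example header in this document; other AHRD versions may differ.
HEADER_RE = re.compile(
    r"^>(?P<protein_id>\S+)\s+"
    r"(?P<description>.*?)\s+"
    r"\(AHRD V(?P<version>\d+)\s+(?P<quality>[*-]{4})\s+(?P<source_id>[^)\s]+)\)"
    r"(?:;\s*contains Interpro domain\(s\)\s+(?P<domains>.+))?$"
)


def parse_ahrd_header(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Split an AHRD FASTA header into its parts; return None if it does not match."""
    match = HEADER_RE.match(line.strip())
    # "domains" is None when the header has no InterPro suffix.
    return match.groupdict() if match else None
```

The description is matched non-greedily, so it ends at the first " (AHRD V" and any parentheses inside the InterPro domain names are left to the `domains` group.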
Significance of HRD [*-*-]:
Character | Criterion | Criterion fulfilled | Criterion not fulfilled |
1 | Bit score of the BLAST result is > 50 and e-value is < 1e-10 | * | - |
2 | Overlap of the BLAST result is > 60% | * | - |
3 | Top token score of the assigned HRD is > 0.5 | * | - |
4 | Gene Ontology terms found in description line | * | - |
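The significance code can be decoded, or rebuilt from raw values, with the thresholds from the table. This is a sketch: the function names are hypothetical, and expressing the overlap as a fraction (0.6 for 60%) is an assumption, not necessarily how AHRD stores it internally.

```python
from typing import Dict

# Criteria in the order of the four characters of the significance code (see table above).
CRITERIA = (
    "bit score > 50 and e-value < 1e-10",
    "overlap of the BLAST result > 60%",
    "top token score of assigned HRD > 0.5",
    "Gene Ontology terms found in description line",
)


def decode_quality_code(code: str) -> Dict[str, bool]:
    """Map each criterion to True ('*', fulfilled) or False ('-', not fulfilled)."""
    if len(code) != len(CRITERIA) or any(ch not in "*-" for ch in code):
        raise ValueError(f"not a 4-character AHRD significance code: {code!r}")
    return {criterion: ch == "*" for criterion, ch in zip(CRITERIA, code)}


def encode_quality_code(bitscore: float, evalue: float, overlap: float,
                        top_token_score: float, go_in_description: bool) -> str:
    """Build the code from raw values; overlap is a fraction (0.6 == 60%)."""
    checks = (
        bitscore > 50 and evalue < 1e-10,
        overlap > 0.6,
        top_token_score > 0.5,
        go_in_description,
    )
    return "".join("*" if ok else "-" for ok in checks)
```

For the example header, decoding "*-*-" reports that criteria 1 and 3 are fulfilled while the overlap and GO criteria are not.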
