Summary

Tag

human_readable_description

Owner

Kathrin Klee <klee AT SPAMFREE mpipz DOT mpg DOT de>

Input

Protein sequence fasta files from tag "renaming"
BlastP results from Swissprot, TAIR, trEMBL
InterProScan results
Gene ontology terms

Output

Protein sequences in fasta format with human readable descriptions

Processing

Tool to assign human readable descriptions: AHRD (Automatic assignment of Human Readable Descriptions)

AHRD is a tool that assigns human readable descriptions to uncharacterized protein sequences.
To do so, it uses the BlastP results (Swissprot, TAIR, trEMBL), the InterProScan results and the predicted gene ontology terms.

The top 200 blast results (ranked by e-value) from each database (Swissprot, TAIR, trEMBL) are chosen. A scoring algorithm based on lexical scoring of individual "words" and on predicted GO terms ranks all description lines, and the best-scoring description line is selected. The domain names are then extracted from the InterProScan results and appended to the description line.
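The first step, keeping the best blast results per database, can be sketched as follows. This is a minimal illustration and not AHRD's own code: the function name `top_hits_per_database`, the dictionary keys (`database`, `description`, `evalue`) and the toy hits are all assumptions made for the example. The lexical and GO-based scoring is not reproduced here.

```python
from collections import defaultdict


def top_hits_per_database(hits, limit=200):
    """Return, per database, the `limit` hits with the smallest e-values."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["database"]].append(hit)
    return {
        database: sorted(db_hits, key=lambda hit: hit["evalue"])[:limit]
        for database, db_hits in grouped.items()
    }


# Made-up toy hits, for illustration only (not real blast output).
example_hits = [
    {"database": "Swissprot", "description": "toy description A", "evalue": 1e-5},
    {"database": "Swissprot", "description": "toy description B", "evalue": 1e-40},
    {"database": "TAIR", "description": "toy description C", "evalue": 1e-12},
]

# limit=1 keeps the example small; the text above uses 200.
selected = top_hits_per_database(example_hits, limit=1)
```

Sorting by e-value in ascending order puts the most significant hits first, so slicing the sorted list keeps the best `limit` results for each database.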

In the end a description line is selected for each protein sequence that:

Input

Output

Output format: fasta

The description line of each protein contains an evaluation of the assigned human readable description.
Here is an example and how to interpret it:

>Solyc00g005910.1.1 Endonuclease/exonuclease/phosphatase (AHRD V1 *-*- A2Q500_MEDTR); contains Interpro domain(s)  IPR015706  RNA-directed DNA polymerase (reverse transcriptase), related 

The evaluation can be found within the brackets: (AHRD V1 *-*- A2Q500_MEDTR)
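For downstream scripts it can be handy to split such a description line into its parts. The following is a minimal sketch that assumes every annotated header follows the layout of the example above; the pattern and the function name `parse_ahrd_header` are illustrative and not part of AHRD.

```python
import re

# Assumed layout: >ID description (AHRD V<n> <4-char code> <source>); optional InterPro text
HEADER_PATTERN = re.compile(
    r"^>(?P<protein_id>\S+)\s+"
    r"(?P<description>.*?)\s+"
    r"\(AHRD V(?P<version>\d+) (?P<quality>[*-]{4}) (?P<source>[^)\s]+)\)"
    r"(?:;\s*(?P<interpro>.*))?$"
)


def parse_ahrd_header(line):
    """Split a fasta description line into its fields; return None if it does not match."""
    match = HEADER_PATTERN.match(line.strip())
    return match.groupdict() if match else None


example = (
    ">Solyc00g005910.1.1 Endonuclease/exonuclease/phosphatase "
    "(AHRD V1 *-*- A2Q500_MEDTR); contains Interpro domain(s)  IPR015706  "
    "RNA-directed DNA polymerase (reverse transcriptase), related "
)
parsed = parse_ahrd_header(example)
```

With the example line, `parsed["quality"]` holds the four-character evaluation code and `parsed["source"]` holds the identifier that follows it. Headers that do not carry an AHRD evaluation return `None` instead of raising an error.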
Interpretation:

Significance of HRD [*-*-]:

Character | Criteria                                                  | Criteria fulfilled | Criteria not fulfilled
1         | Bit score of the blast result is >50 and e-value is <e-10 | *                  | -
2         | Overlap of the blast result is >60%                       | *                  | -
3         | Top token score of assigned HRD is >0.5                   | *                  | -
4         | Gene ontology terms found in description line             | *                  | -
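The four-character code can be translated back into the criteria listed above. This is a small sketch under the assumption that the code always has exactly four characters, each either `*` or `-`; the names `CRITERIA` and `decode_quality` are made up for this example.

```python
# Criteria in the order of the four characters, as listed above.
CRITERIA = (
    "Bit score of the blast result is >50 and e-value is <e-10",
    "Overlap of the blast result is >60%",
    "Top token score of assigned HRD is >0.5",
    "Gene ontology terms found in description line",
)


def decode_quality(code):
    """Map each criterion to True ('*', fulfilled) or False ('-', not fulfilled)."""
    if len(code) != len(CRITERIA) or set(code) - {"*", "-"}:
        raise ValueError(f"unexpected evaluation code: {code!r}")
    return {criterion: char == "*" for criterion, char in zip(CRITERIA, code)}


result = decode_quality("*-*-")
for criterion, fulfilled in result.items():
    print("fulfilled    " if fulfilled else "not fulfilled", criterion)
```

For the example code `*-*-`, the first and third criteria are reported as fulfilled and the second and fourth as not fulfilled.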

AnHumanReadableDescription000 (last edited 2010-12-07 10:51:47 by KathrinKlee)