Summary
Tag | repeats
Owner | SGN
Input | the vector-screened sequences, from the seq analysis
External Data | MIPS repeat collection, SGN UniRepeats collection
Output | repeat-masked sequences
The repeats analysis repeat-masks each sequence in the batch using RepeatMasker. It runs RepeatMasker twice with different settings and produces a hard-masked and a soft-masked version of the output of each run.
One run, the "regular" run, uses the ftp://ftp.sgn.cornell.edu/tomato_genome/repeats/mipsREdat_8.8_solanaceae_TE.tfa.gz repeat library, and includes the -nolow option to disable masking of low-complexity repeats. The other run, called "stringent", uses as its repeat library ftp://ftp.sgn.cornell.edu/tomato_genome/repeats/mipsREdat_8.8_eudico_TEs.tfa.gz and does not include the -nolow option.
Stephane suggested that the less stringently masked "regular" output would be more appropriate as input to gene finders.
Input
BAC sequences from the seq analysis.
Filenames
- masked:fasta
- softmasked:fasta
- masked_stringent:fasta
- softmasked_stringent:fasta
Requirements
- Input files must be in ITAG standard FASTA format
Processing
This analysis runs RepeatMasker on each BAC sequence file, which creates a new FASTA sequence file containing the BAC sequence, with repetitive sequences masked as either N's (for hard-masked) or lower-case (for soft-masked).
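The relationship between the two masking styles can be illustrated with a minimal Python sketch. It assumes the soft-masked sequence is already held in memory as a string. The function names `hard_mask_from_soft` and `masked_fraction` are hypothetical and are not part of the ITAG pipeline, which takes its masked output from RepeatMasker itself.

```python
def hard_mask_from_soft(sequence: str) -> str:
    """Replace every lower-case (soft-masked) base with 'N'."""
    return "".join("N" if base.islower() else base for base in sequence)


def masked_fraction(sequence: str) -> float:
    """Fraction of bases that are masked, as 'N' or as lower-case.

    Note: an 'N' that was already present in the unmasked sequence
    (for example a gap) is counted as masked by this simple check.
    """
    if not sequence:
        return 0.0
    masked = sum(1 for base in sequence if base == "N" or base.islower())
    return masked / len(sequence)
```

A helper like this may be useful for a quick sanity check that a soft-masked file and its hard-masked counterpart mask the same positions.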
Description
RepeatMasker version open-3.1.5
Parameters
regular: run as RepeatMasker -q -nolow -gff -lib mips_repeat_set.fasta -xsmall -parallel 2 bac_seq_file
stringent: run as RepeatMasker -q -gff -lib mips_and_uni_combined.fasta -xsmall -parallel 2 bac_seq_file
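As a rough illustration of how the two runs differ, the following Python sketch assembles the two command lines from a small settings table. The options and library file names are copied from the commands on this page. The names `RUNS`, `build_command` and `run_repeatmasker` are hypothetical, and the sketch assumes a `RepeatMasker` executable is on the PATH. It is not the pipeline's actual driver code.

```python
import subprocess

# Per-run settings taken from the command lines on this page.
RUNS = {
    "regular": {"nolow": True, "lib": "mips_repeat_set.fasta"},
    "stringent": {"nolow": False, "lib": "mips_and_uni_combined.fasta"},
}


def build_command(run_name: str, bac_seq_file: str) -> list[str]:
    """Return the RepeatMasker argument list for the named run."""
    run = RUNS[run_name]
    cmd = ["RepeatMasker", "-q"]
    if run["nolow"]:
        # Only the regular run disables low-complexity masking.
        cmd.append("-nolow")
    cmd += ["-gff", "-lib", run["lib"], "-xsmall", "-parallel", "2", bac_seq_file]
    return cmd


def run_repeatmasker(run_name: str, bac_seq_file: str) -> None:
    """Run RepeatMasker for one BAC file (assumes it is installed)."""
    subprocess.run(build_command(run_name, bac_seq_file), check=True)
```

Keeping the settings in one table makes the single difference in options between the runs (the presence of -nolow) and the difference in repeat library easy to see side by side.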
Output
- repeat-masked sequence, in both hard-masked and soft-masked multi-fasta formats
Filenames
- masked:fasta
- softmasked:fasta
Requirements
- Hard-masked output files are formatted as:
>AC123456.1 C01HBa0001A01.1
TAGGAAAACGAAAGTCTAAACATCGTCTCAAAGACATGTAGTACATAAGATTGATTAGGG
TGTAGCTTGTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTCATCCATAAAACATTTGAAAACAGGCTCA
AAAGGGATATTTTTTATGGTATGCAATGACCAAGTTAAAAAATGACATAAAAGGGCTAGA
- Soft-masked output files are formatted as:
>AC123456.1 C01HBa0001A01.1
TAGGAAAACGAAAGTCTAAACATCGTCTCAAAGACATGTAGTACATAAGATTGATTAGGG
TGTAGCTTGTaactgacgctcgatgcacgtaatgatagcgctcgctagcatcgatatgca
actgactatgccgcgctagcgccgccgcgcgcgctagctacgatgcatcaatatttttta
ctgactatgcatcgatcgcTTCATCCATAAAACATTTGAAAACAGGCTCAAAGGGATATT
TTTTATGGTATGCAATGACCAAGTTAAAAAATGACATAAAAGGGCTAGA
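To inspect files in the formats shown above, a small reader can be sketched in Python. This is an illustrative sketch only: `read_fasta` and `masked_intervals` are hypothetical helper names, the reader keeps only the first word of each header line as the sequence identifier, and it treats both N and lower-case bases as masked.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse multi-FASTA text into {identifier: sequence}."""
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the identifier.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def masked_intervals(sequence: str) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of masked bases."""
    intervals = []
    start = None
    for i, base in enumerate(sequence):
        is_masked = base == "N" or base.islower()
        if is_masked and start is None:
            start = i
        elif not is_masked and start is not None:
            intervals.append((start, i))
            start = None
    if start is not None:
        intervals.append((start, len(sequence)))
    return intervals
```

Comparing the intervals from a hard-masked file with those from the matching soft-masked file is one possible way to confirm that the two outputs of a run agree.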
Comments
CategoryAnalysisDescription CategoryAnSgnRepmask
