Summary

  • Tag: repeats
  • Owner: SGN
  • Input: the vector-screened sequences, from the seq analysis
  • External Data: MIPS repeat collection, SGN UniRepeats collection
  • Output: repeat-masked sequences

The repeats analysis repeat-masks each sequence in the batch using RepeatMasker. It makes hard and soft-masked versions of the output of two different RepeatMasker runs.

The first run, called "regular", uses the ftp://ftp.sgn.cornell.edu/tomato_genome/repeats/mipsREdat_8.8_solanaceae_TE.tfa.gz repeat library and includes the -nolow option, which disables masking of low-complexity repeats. The second run, called "stringent", uses ftp://ftp.sgn.cornell.edu/tomato_genome/repeats/mipsREdat_8.8_eudico_TEs.tfa.gz as its repeat library and does not include the -nolow option, so low-complexity repeats are masked as well.

Stephane suggested that the less stringently masked "regular" output is the more appropriate input for gene finders.

Input

  • BAC sequences from the seq analysis.

Filenames

  • masked:fasta
  • softmasked:fasta
  • masked_stringent:fasta
  • softmasked_stringent:fasta

Requirements

  • Input files must be in ITAG standard FASTA format

Processing

  • This analysis runs RepeatMasker on each BAC sequence file. Each run produces a new FASTA file containing the BAC sequence, with repetitive sequences masked either as Ns (hard-masked) or as lower-case letters (soft-masked).
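The relationship between the two masking styles can be illustrated with a short Python sketch. It assumes that a hard-masked sequence is equivalent to its soft-masked counterpart with every lower-case base replaced by N; the page does not say how the pipeline actually produces the hard-masked files, and the function name `hard_mask` is hypothetical.

```python
def hard_mask(soft_masked_seq: str) -> str:
    """Return a hard-masked copy of a soft-masked sequence.

    Assumption: lower-case letters mark repeats (soft masking), and hard
    masking replaces exactly those positions with 'N'.
    """
    return "".join("N" if base.islower() else base for base in soft_masked_seq)


# A short made-up fragment: ten unmasked bases, eight soft-masked, seven unmasked.
soft = "TGTAGCTTGTaactgacgTTCATCC"
print(hard_mask(soft))
```

The sequence length is unchanged by this conversion, so coordinates in the hard-masked and soft-masked versions stay comparable.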

Description

Parameters

  • regular: run as RepeatMasker -q -nolow -gff -lib mips_repeat_set.fasta -xsmall -parallel 2 bac_seq_file
  • stringent: run as RepeatMasker -q -gff -lib mips_and_uni_combined.fasta -xsmall -parallel 2 bac_seq_file
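As a rough sketch of how the two invocations differ, the Python below builds both argument lists from a single helper. The helper name `repeatmasker_command` and its parameters are hypothetical; the flags and library file names are copied from the command lines given here, and nothing is executed.

```python
from typing import List


def repeatmasker_command(bac_seq_file: str, repeat_lib: str,
                         mask_low_complexity: bool, parallel: int = 2) -> List[str]:
    """Build a RepeatMasker argument list (hypothetical helper, does not run it)."""
    cmd = ["RepeatMasker", "-q"]
    if not mask_low_complexity:
        # -nolow disables masking of low-complexity repeats.
        cmd.append("-nolow")
    cmd += ["-gff", "-lib", repeat_lib, "-xsmall",
            "-parallel", str(parallel), bac_seq_file]
    return cmd


# The less stringent run: low-complexity masking is switched off.
regular = repeatmasker_command("bac_seq_file", "mips_repeat_set.fasta",
                               mask_low_complexity=False)
# The stringent run: a different library, low-complexity masking left on.
stringent = repeatmasker_command("bac_seq_file", "mips_and_uni_combined.fasta",
                                 mask_low_complexity=True)

print(" ".join(regular))
print(" ".join(stringent))
```

Keeping the commands as lists rather than strings means they could be passed to `subprocess.run` without shell quoting, should one choose to run them.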

Output

  • repeat-masked sequence, in both hard-masked and soft-masked multi-fasta formats

Filenames

  • masked:fasta
  • softmasked:fasta

Requirements

  • Hard-masked output files are formatted as:
    >AC123456.1  C01HBa0001A01.1
    TAGGAAAACGAAAGTCTAAACATCGTCTCAAAGACATGTAGTACATAAGATTGATTAGGG
    TGTAGCTTGTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTCATCCATAAAACATTTGAAAACAGGCTCA
    AAAGGGATATTTTTTATGGTATGCAATGACCAAGTTAAAAAATGACATAAAAGGGCTAGA
     
  • Soft-masked output files are formatted as:
    >AC123456.1  C01HBa0001A01.1
    TAGGAAAACGAAAGTCTAAACATCGTCTCAAAGACATGTAGTACATAAGATTGATTAGGG
    TGTAGCTTGTaactgacgctcgatgcacgtaatgatagcgctcgctagcatcgatatgca
    actgactatgccgcgctagcgccgccgcgcgcgctagctacgatgcatcaatatttttta
    ctgactatgcatcgatcgcTTCATCCATAAAACATTTGAAAACAGGCTCAAAGGGATATT
    TTTTATGGTATGCAATGACCAAGTTAAAAAATGACATAAAAGGGCTAGA
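
A consumer of these files might locate the masked regions with something like the following Python sketch. It is an illustration only: the function names are hypothetical, and it assumes that any run of N or lower-case letters is a masked region, which cannot distinguish hard masking from genuinely unknown bases.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a multi-FASTA file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def masked_intervals(seq: str) -> List[Tuple[int, int]]:
    """Return 0-based, end-exclusive spans of N or lower-case runs."""
    return [m.span() for m in re.finditer(r"[a-zN]+", seq)]


# Made-up record in the soft-masked layout shown above.
example = [
    ">AC123456.1  C01HBa0001A01.1",
    "TAGGAAAACG",
    "TGTAGCTTGTaactg",
    "acgtTTCATCC",
]
for header, seq in read_fasta(example):
    print(header, masked_intervals(seq))
```

The same functions apply to the hard-masked files, since runs of N are matched by the same pattern.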
     

Comments


CategoryAnalysisDescription CategoryAnSgnRepmask

AnSgnRepmask000 (last edited 2010-06-28 20:06:08 by RobertBuels)