How To Implement a Pipeline Analysis

  1. Create an account on this wiki if you don't already have one.
  2. Make sure your name and organization are listed on the AboutItag page.

  3. If you have not already done so, email RobertBuels to arrange access to the PipelineRepository. In your email, choose a login name beginning with itag (example: itagmips).

  4. Choose an analysis tag: a short name for your analysis, made up of ASCII letters, numbers and underscores. This string will be used in the filesystem, in status displays, etc. For example, fgenesh_tomato is the tag for the analysis that runs a version of FGENESH trained for tomato. For more examples, see Pipeline000.

  5. Make sure your analysis is listed on Pipeline000, and make a wiki page for it similar to the other analyses.

  6. Create an analysis definition file for your new analysis and upload it to the central repository for the current version of the pipeline (Pipeline000). An example file: itag/pipeXXX/analysis_defs/interpro.def.txt. The format of the analysis definition file is specified in PipelineRepository, Analysis Definition Files. This file specifies which files your analysis will produce and which other analyses it depends on for its input.

  7. Within 5 minutes after you create the definition file, the pipeline management code will automatically create your analysis's directory and give you write permission for it. If the automatic creation does not work, email RobertBuels for a fix.

  8. Implement your analysis on your computer. Roughly, your analysis will probably do the following, in order:
    1. Check the PipelineStatusWebService periodically until it reports that your analysis is ready. This means that all of your analysis's inputs (as you defined them in the definition file) are available.

    2. Download your analysis's input files to a temporary directory on your computer (see the ITAG email list for how to set up automatic authentication)
    3. Upload a control.txt file saying that your analysis is running (see PipelineRepository, Control Files)

    4. Run your analysis and format your output files, including correct filenames (see PipelineGeneral, File Naming)

    5. Upload the analysis result files to the correct directory in the repository (see PipelineRepository, Directory Structure)

    6. Generate and upload a manifest.txt and/or an md5sums.txt file (see PipelineRepository, Validation)

    7. Remove the running advisory from your analysis's control file, either by deleting the file or overwriting it with a new one that does not contain running.

  9. Test that the PipelineStatusWebService and the human-readable pipeline status page are reporting the correct running status for your analysis based on what files you have uploaded and the contents of these files. Correct conditions for each running status are specified at PipelineStatusWebService, Data Format. If your analysis's status does not seem to be reported correctly, email RobertBuels and he will fix it.

  10. Tighten your server-side validation. Email RobertBuels with either a rigorous specification of the expected output of your analysis or a standalone Perl script that checks its output files. He will integrate more rigorous checking of your output files into the pipeline management code.
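
The analysis tag rule from step 4 is easy to check before you register a tag. The following is a minimal sketch in Python, assuming that "ASCII letters, numbers and underscores" means the character set A-Z, a-z, 0-9 and _ and that an empty tag is not allowed. The function name is_valid_analysis_tag is hypothetical and is not part of the pipeline code.

```python
import re

# Assumption: a tag is one or more characters from A-Z, a-z, 0-9 and "_".
# An explicit character class is used instead of \w, which would also
# accept non-ASCII letters.
TAG_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_analysis_tag(tag: str) -> bool:
    """Return True if the whole tag consists of ASCII letters, digits and underscores."""
    return TAG_PATTERN.fullmatch(tag) is not None


if __name__ == "__main__":
    for candidate in ("fgenesh_tomato", "fgenesh-tomato", "interpro 2"):
        print(candidate, is_valid_analysis_tag(candidate))
```

Only the first candidate, fgenesh_tomato, passes; the other two contain a hyphen and a space.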

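The sub-steps of step 8 can be sketched as a small driver script. This is only an illustration under several assumptions: the status check, download and upload operations are passed in as hypothetical callables (get_status, download_inputs, upload), because the real transport and authentication are described elsewhere (PipelineStatusWebService, ITAG email list); the status value "ready", a control.txt containing only the word running, and md5sum-style lines in md5sums.txt are all guesses. Check PipelineStatusWebService, Data Format and PipelineRepository, Control Files and Validation for the authoritative formats.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable


def wait_until_ready(get_status: Callable[[], str], poll_seconds: float) -> None:
    """Sub-step 1: poll until the status callable returns "ready" (assumed value)."""
    while get_status() != "ready":
        time.sleep(poll_seconds)


def write_md5sums(result_dir: Path, out_name: str = "md5sums.txt") -> Path:
    """Sub-step 6: write one "<md5 hex>  <filename>" line per result file.

    Assumption: the repository accepts the line layout printed by md5sum.
    """
    lines = []
    for path in sorted(result_dir.iterdir()):
        if path.is_file() and path.name != out_name:
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            lines.append(f"{digest}  {path.name}\n")
    out_path = result_dir / out_name
    out_path.write_text("".join(lines))
    return out_path


def run_once(
    get_status: Callable[[], str],
    download_inputs: Callable[[Path], None],
    upload: Callable[[Path], None],
    run_analysis: Callable[[Path, Path], None],
    work_dir: Path,
    poll_seconds: float = 300.0,
) -> None:
    """Run sub-steps 1 to 7 once, using the caller-supplied callables."""
    wait_until_ready(get_status, poll_seconds)        # 1. wait for inputs

    input_dir = work_dir / "input"
    result_dir = work_dir / "results"
    input_dir.mkdir(parents=True, exist_ok=True)
    result_dir.mkdir(parents=True, exist_ok=True)
    download_inputs(input_dir)                        # 2. fetch input files

    control = work_dir / "control.txt"
    control.write_text("running\n")                   # 3. assumed control file content
    upload(control)

    run_analysis(input_dir, result_dir)               # 4. produce output files

    for path in sorted(result_dir.iterdir()):         # 5. upload result files
        if path.is_file():
            upload(path)

    upload(write_md5sums(result_dir))                 # 6. upload checksums

    control.write_text("")                            # 7. overwrite without "running"
    upload(control)
```

Passing the network operations in as callables keeps the sketch independent of how you actually reach the repository, and lets you test the ordering of the steps locally with stand-in functions before touching the real pipeline.
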
AnalysisImplementationHowTo (last edited 2008-12-15 18:19:42 by RobertBuels)