Homework Assignment #3


The following steps will help you learn about Galaxy tools for "next generation" sequence data.

  1. Obtain a Galaxy login (your email address), then perform the following steps, which can be saved as a history item named, say, "assignment 3".
  2. Upload the file of Illumina reads at:

    http://www.bx.psu.edu/~webb/reads.fastqsanger

    You must tell Galaxy that the file's format is "fastqsanger".

  3. Upload the FastA-format sequence file at:

    http://www.bx.psu.edu/~webb/chrM.fa

  4. Map the reads onto chrM using BWA.
  5. Select the reads that map uniquely (namely, those whose SAM lines contain the string "XT:A:U").
  6. Convert SAM -> BAM.
  7. Generate pileup from BAM dataset.
  8. Filter pileup (under SAM Tools), where:
    "Only report variants?" -> Yes
    "Convert coordinates to intervals?" -> No
    "Print total number of differences?" -> Yes
    "Print quality and base string?" -> No
  9. Select positions where the reads have at least 3 differences from chrM.fa. Namely, use "Filter and Sort" -> Filter, with the condition "c10 > 2".
  10. Save your history by selecting "Create New" under "Options" (extreme upper right), then select "Saved Histories" under "Options" and change the name of the new history item.
  11. Turn the set of commands into a workflow; use "Extract workflow" under "Options" (top right). Give informative names, such as "reads" and "reference", to the two input sets in the workflow. This can be done by editing the workflow and clicking on the boxes for those inputs. The purpose is so that the two kinds of input can then be distinguished in Step 14.
  12. Clone the workflow, and edit the next-to-last step ("Filter pileup") so that it will not report positions with coverage lower than 8.
  13. Get the file "reads.fq" (format is fastqsanger) from http://www.bx.psu.edu/~webb/.
  14. Run the modified workflow on that set of reads and the chrM.fa reference.
  15. Write (in a plain text file) a short paragraph describing the results of that computation and explain what you see. Send your report by email to Qingyu Wang (qzw102@psu.edu) by noon on Monday, September 13.
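To see what step 5 amounts to outside Galaxy, here is a minimal Python sketch. It assumes BWA's SAM output, in which the optional tag XT:A:U marks a uniquely mapped read; select_unique_reads is a hypothetical helper written for illustration, not a Galaxy tool. Unlike a plain substring search, it checks only the optional-tag fields (the 12th tab-separated column onward) and keeps header lines so the result is still valid SAM.

```python
from typing import Iterable, Iterator


def select_unique_reads(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM header lines plus alignments tagged XT:A:U.

    Hypothetical helper: assumes BWA-style SAM, where XT:A:U means "unique".
    """
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines (@HD, @SQ, ...) are kept so the output is still SAM.
            yield line
            continue
        # The 11 mandatory SAM fields come first; optional tags start at index 11.
        fields = line.rstrip("\n").split("\t")
        if "XT:A:U" in fields[11:]:
            yield line
```

For example, list(select_unique_reads(open("bwa_output.sam"))) (file name hypothetical) returns the header lines followed by only the uniquely mapped alignments; lines with fewer than 12 fields are dropped.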
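Step 9's condition "c10 > 2" keeps rows whose tenth tab-separated column is greater than 2, i.e. positions with at least 3 differences. The following is a minimal Python sketch of that filter, assuming the tab-separated output of "Filter pileup" from step 8 and taking the assignment's word that column 10 holds the difference count; filter_by_column is a hypothetical helper, not Galaxy's Filter implementation.

```python
from typing import Iterable, Iterator


def filter_by_column(rows: Iterable[str], column: int = 10,
                     threshold: int = 2) -> Iterator[str]:
    """Yield tab-separated rows whose 1-based `column` holds an int > threshold.

    Hypothetical helper mirroring the Galaxy condition "c10 > 2".
    """
    index = column - 1  # Galaxy's c1, c2, ... are 1-based; Python is 0-based
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) <= index:
            continue  # skip rows too short to have the requested column
        try:
            value = int(fields[index])
        except ValueError:
            continue  # skip header or non-numeric rows
        if value > threshold:
            yield row
```

The column is given 1-based so it reads the same as Galaxy's "c10"; rows where that column is missing or not an integer are skipped rather than raising an error.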