Study Guide for Bioinformatics I, Fall 2008

Overview: There will be two parts for the exam. One deals with phylogenetic reconstruction, and is not discussed here. Below is a list of topics relevant to the other part of the exam.

First, I promised I would ask you to use the dynamic programming algorithm to find an optimal global alignment of two short sequences (say, both of length 4) given particular scores for all possible alignment columns, and show your work. Second, the exam may well ask for short and not overly formal definitions of some terms, such as "computational problem", "instance of a computational problem", "greedy algoritm", "NP-complete problem", "performance guarantee", "alignment", "global alignment", and "local alignment". Finally, you will be asked to write an essay on one problem from a list of topics discussed in class, probably including the following. Thus, it might be a good idea to pick one of the subjects listed and prepare an answer in your less-busy time over the next 3-4 weeks, rather than trying to "cram" for the test at the last minute.

  • Consider the following informal problem: find a motif that appears in each member of a given set of DNA sequences. Give two distinct formalizations of the problem in terms of optimizing a specified score over a set of specified objects. Compare the computational efficiencies of a brute-force optimization for the two formalizations. Finally, sketch a greedy method for the same problem.
  • Sketch how blast and psi-blast work, describe the conditions under which psi-blast outperforms blastp, and explain the reasons for this difference.
  • Explain in detail how compute an optimal alignment of two sequences using space that is proportional to the length of the shorter sequence. Show that your procedure works as advertized.
  • Define the problem of sorting by reversals, and discuss at least two heuristics for its solution, including one with an instance-indendent performance guarantee that you verify. Also, show that the other heuristic has no such performance guarantee.
  • Discuss Hidden Markov Models, how they are analysed, and how they have been used in bioinformatics.
  • Explain how NP-completeness results and graph algorithms lead to a formalization of the Sequencing by Hybridization Problem (i.e., reconstructing a sequence from the set of all its k-mers for some k) that can be solved efficiently.