Study Guide for Bioinformatics I

  • Define computational problem, instance of a computational problem, and algorithm for a computational problem.
  • Informally, what is a greedy method, and what is a branch-and-bound method?
  • Consider the following informal problem: find a motif that appears in each member of a given set of DNA sequences. Give two distinct formalizations of the problem in terms of optimizing a specified score over a set of specified objects. Compare the computational efficiencies of a brute-force optimization for the two formalizations. Sketch a greedy method for the same problem.
  • Informally, what is meant by an NP-complete problem?
  • What is the problem of sorting by reversals?
  • Give a simple greedy algorithm for sorting by reversals and show that it has no (instance-independent) performance guarantee.
  • (Advanced.) Sketch a method for sorting by reversals that has a (constant) performance guarantee, and verify that guarantee.
  • What is an alignment between two sequences? What is the difference between a global and a local alignment?
  • What is the general principle behind dynamic programming algorithms? Explain a dynamic progamming algororithm for aligning two DNA sequences.
  • Be able to apply the dynamic programming algorithm to two sequences (say, of lengths around 4), given particular scores for all possible alignment columns.
  • Be able to sketch how blast works. Under what conditions will psi-blast outperform blastp? Explain
  • Explain how NP-completeness results and graph algorithms lead to a formalization of the Sequencing by Hybridization Problem (i.e., reconstructing a sequence from the set of all its k-mers for some k) that can be solved efficiently.
  • What is a Hidden Markov Model, and what is the decoding problem for HMMs. Be able to give precise definitions. Discuss two examples of bioinformatics problems where HMMs provide a good approach.