## Study Guide for Bioinformatics I

Define
*computational problem*,
*instance* of a computational problem, and
*algorithm* for a computational problem.
Informally, what is a *greedy* method, and what is a
*branch-and-bound* method?
Consider the following informal problem: find a motif that appears in each
member of a given set of DNA sequences.
Give two distinct formalizations of the problem in terms of optimizing a
specified score over a set of specified objects.
Compare the computational efficiencies of a brute-force optimization for the
two formalizations.
Sketch a greedy method for the same problem.
Informally, what is meant by an *NP-complete* problem?
What is the problem of *sorting by reversals*?
Give a simple greedy algorithm for sorting by reversals and show that
it has no (instance-independent) *performance guarantee*.
(Advanced.) Sketch a method for sorting by reversals that has a (constant)
performance guarantee, and verify that guarantee.
What is an *alignment* between two sequences?
What is the difference between a *global* and a *local* alignment?
What is the general principle behind *dynamic programming* algorithms?
Explain a dynamic progamming algororithm for aligning two DNA sequences.
Be able to apply the dynamic programming algorithm to two sequences (say, of
lengths around 4), given particular scores for all possible alignment columns.
Be able to sketch how *blast* works. Under what conditions will
*psi-blast* outperform *blastp*? Explain
Explain how NP-completeness results and graph algorithms lead to a
formalization of the Sequencing by Hybridization Problem (i.e., reconstructing
a sequence from the set of all its *k*-mers for some *k*)
that can be solved efficiently.
What is a *Hidden Markov Model*, and what is the
*decoding problem* for HMMs. Be able to give precise definitions.
Discuss two examples of bioinformatics problems where HMMs provide a good
approach.