## Study Guide for Bioinformatics I, Fall 2008

*Overview:*
There will be two parts for the exam. One deals with phylogenetic
reconstruction, and is not discussed here. Below is a list of topics relevant
to the other part of the exam.

First, I promised I would ask you to
use the dynamic programming algorithm to find an optimal global alignment
of two short sequences (say, both of length 4) given particular scores
for all possible alignment
columns, and show your work. Second, the exam may well ask for
short and not overly formal definitions of some terms, such as
"computational problem",
"instance of a computational problem",
"greedy algoritm",
"NP-complete problem",
"performance guarantee",
"alignment",
"global alignment", and
"local alignment".
Finally, you will be asked to write an essay on one problem from a list of
topics discussed in class, probably including the following.
Thus, it might be a good idea to pick one of the subjects listed and prepare an
answer in your less-busy time over the next 3-4 weeks, rather than trying to
"cram" for the test at the last minute.

Consider the following informal problem: find a motif that appears in
each member of a given set of DNA sequences.
Give two distinct formalizations of the problem in terms of optimizing a
specified score over a set of specified objects.
Compare the computational efficiencies of a brute-force optimization for the
two formalizations. Finally, sketch a greedy method for the same problem.
Sketch how *blast* and *psi-blast* work, describe the
conditions under which *psi-blast* outperforms *blastp*, and
explain the reasons for this difference.
Explain in detail how compute an optimal alignment of two sequences using
space that is proportional to the length of the shorter sequence.
Show that your procedure works as advertized.
Define the problem of sorting by reversals, and discuss at least two heuristics for its solution, including one with an instance-indendent performance
guarantee that you verify. Also, show that the other heuristic has no such
performance guarantee.
Discuss Hidden Markov Models, how they are analysed, and how they have been
used in bioinformatics.
Explain how NP-completeness results and graph algorithms lead to a
formalization of the Sequencing by Hybridization Problem (i.e., reconstructing
a sequence from the set of all its *k*-mers for some *k*)
that can be solved efficiently.