Some Concepts from Chapters 4 and 5
Bioinformatics typically deals with combinatorial
(also called discrete) problems, rather than continuous
problems as in calculus.
For each instance of a problem, the set of possible solutions is finite
(though frequently exponentially larger than the size of the data for that
Given an informally stated computational problem, such as that of finding
the best motif that is shared by a set of given DNA sequences, we can (try to)
give a precise formulation of the problem, which frequently takes the form
of maximizing or minimizing a specific "score" or "cost" function that
associates a numerical quality measure with each possible solution.
(Indeed, for the motif-finding problem, Chapter 4 gives two different
We can frequently think of the process of "trying every possible solution"
as walking around a tree-shaped "solution space", some entire branches can
be skipped without sacrificing an optimal solution. (Sometimes called a
branch and bound strategy.)
For some of these combinatorial optimization problems, it may be that the only
algorithms that are guaranteed to give an optimal solution for any problem
instance sometimes take an amount of time that is exponentially larger than the
size of the data. A large number of combinatorial optimization problems have
been shown to be NP-complete, which probably implies that they cannot
be solved in time proportional to any fixed polynomial (such as the 17-th power)
of the instance size. For a more precise (but longer) discussion, see pp. 49-51
of the book.
For many bioinformatics problems, we can devise a "greedy" strategy, i.e.,
one that builds up a proposed solution by making early decisions that are not
later reversed. Typically (but not always) these methods produce sub-optimal
answers for many or most instances.
The hope is that they "work well in practice", though it may be very difficult
to show this convincingly.
Computational methods that do not always produce an optimal solution are
frequently called heuristic. (Often used as a noun, as in, "We solved
it with a heuristic.")
Some of these have a performance guarantee, e.g., they can be guaranteed
to produce output that is within a fixed percent of optimal.
Example. Consider the following problem (with no attempted biological
Given: A set of n positive numbers.
Desired output: Divide the numbers into two sets, where the
difference between the sums of the sets is as small as possible.
(Given a set of blocks of different heights, put them into two piles that are
as close as possible to the same height.)
Devise a greedy method that attempts to find a solution, and find an instance
where it gives a sub-optimal solution.