Our material on phylogenetics in bioinformatics was roughly divided into five "units". Each unit below lists the topics you should especially focus on, followed by one or a few goals or questions.
1. concepts of trees and inferences based on trees
- trees as hypotheses of evolutionary history and shared ancestry
- HOMOPLASY: convergence, parallelism, reversal
- gene trees I: orthology, paralogy
- monophyly
- inference of ancestral states using ACCTRAN
* be able to "read" a phylogenetic tree and draw correct inferences about the monophyly of groups of organisms or sequences
* given a tree and a set of data for a character, be able to infer the ancestral states of that character using ACCTRAN (accelerated transformation)
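The ACCTRAN procedure can be sketched in Python for a single unordered character. This is a minimal sketch, assuming a rooted, fully bifurcating tree stored as a dictionary from each internal node name to its two children; the node names and the helper functions `fitch_downpass` and `acctran` are hypothetical. It first runs Fitch's tips-to-root pass to get each node's preliminary state set and the tree length. It then moves root-to-tips, keeping the parent's state whenever that state is in a node's preliminary set and otherwise placing the change on the branch directly above the node, which pushes changes toward the root as ACCTRAN does.

```python
def fitch_downpass(children, leaf_states, node, prelim):
    """Post-order (tips-to-root) pass: fill `prelim` with Fitch preliminary state sets
    and return the number of changes (parsimony length) in this subtree."""
    if node not in children:
        prelim[node] = {leaf_states[node]}
        return 0
    left, right = children[node]
    length = fitch_downpass(children, leaf_states, left, prelim)
    length += fitch_downpass(children, leaf_states, right, prelim)
    common = prelim[left] & prelim[right]
    if common:
        prelim[node] = common  # children can agree: no extra change needed
    else:
        prelim[node] = prelim[left] | prelim[right]  # disagreement costs one change
        length += 1
    return length


def acctran(children, leaf_states, root, root_state=None):
    """Return (parsimony length, state assigned to every node, list of changes)."""
    prelim = {}
    length = fitch_downpass(children, leaf_states, root, prelim)
    if root_state is None:
        root_state = min(prelim[root])  # arbitrary tie-break; an outgroup usually decides
    if root_state not in prelim[root]:
        raise ValueError("root_state is not in the root's preliminary set")
    final = {root: root_state}
    changes = []
    stack = [root]
    while stack:  # pre-order (root-to-tips) pass
        parent = stack.pop()
        for child in children.get(parent, []):
            if final[parent] in prelim[child]:
                final[child] = final[parent]  # keep the parent's state: no change here
            else:
                # Change on this branch, as close to the root as possible (ACCTRAN);
                # for multistate characters, min() is an arbitrary tie-break.
                final[child] = min(prelim[child])
                changes.append((parent, child, final[parent], final[child]))
            stack.append(child)
    return length, final, changes


# Hypothetical example: outgroup Out has state 0; ingroup taxa A, B, C have 1, 0, 1.
tree = {"root": ["Out", "X"], "X": ["A", "Y"], "Y": ["B", "C"]}
states = {"Out": 0, "A": 1, "B": 0, "C": 1}
length, final, changes = acctran(tree, states, "root", root_state=0)
print(length, changes)  # 2 [('root', 'X', 0, 1), ('Y', 'B', 1, 0)]
```

In this example ACCTRAN explains the pattern as one gain on the branch leading to X followed by a reversal in B. DELTRAN would instead place two parallel gains, in A and in C; both reconstructions have length 2.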
2. methods of building phylogenetic trees
- parsimony, distance, likelihood: compared and contrasted
- the basic approaches, similarities and differences
- standard (nonparametric) bootstrap in phylogenies: use and interpretation
- strengths and weaknesses of each of the major methods
- PHYLIP as an intro to computer programs for phylogeny
* be able to perform and interpret a small parsimony analysis by hand, as we did in class, or a small analysis in PHYLIP using any of the main approaches, including the bootstrap
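The standard (nonparametric) bootstrap resamples alignment columns with replacement, builds a tree from each pseudoreplicate, and reports how often each clade appears among the replicate trees. The sketch below assumes an ungapped alignment held in a dictionary; the function names are hypothetical, and tree building itself is left out, so the clade sets in the usage example are made up. In PHYLIP the corresponding steps are handled by SEQBOOT (generating replicates) and CONSENSE (summarizing the replicate trees).

```python
import random
from collections import Counter


def bootstrap_alignment(alignment, rng):
    """Return one pseudoreplicate: alignment columns (sites) resampled with replacement.
    `alignment` maps taxon name -> aligned sequence; all sequences have equal length."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]  # same length as original
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


def clade_support(replicate_clades):
    """Percentage of replicate trees that contain each clade.
    Each item is the set of clades (frozensets of taxon names) found in one replicate tree."""
    counts = Counter(clade for clades in replicate_clades for clade in clades)
    return {clade: 100.0 * n / len(replicate_clades) for clade, n in counts.items()}


rng = random.Random(1)
aln = {"A": "ACGTAC", "B": "ACGTTC", "C": "ACTTTC", "D": "GCTTTA"}
replicate = bootstrap_alignment(aln, rng)  # would be passed to a tree-building method

# Made-up clade sets from three replicate trees (tree building is not shown):
trees = [{frozenset("AB")}, {frozenset("AB")}, {frozenset("AC")}]
support = clade_support(trees)
print(round(support[frozenset("AB")], 1))  # 66.7
```

Because whole columns are resampled, each replicate keeps the pattern of states across taxa at a site intact; only the number of times each site is counted changes.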
3. distance models of sequence evolution
* contrast the different distance models of nucleotide (or protein) sequence evolution. What are some of their advantages and disadvantages?
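As a sketch of how two of the simplest distance models correct the raw proportion of differences for multiple substitutions at the same site, here are the Jukes-Cantor (JC69) and Kimura two-parameter (K2P) distances. It assumes equal-length, ungapped, uppercase DNA sequences; the function names are illustrative.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def p_distance(seq1, seq2):
    """Uncorrected proportion of sites that differ (no multiple-hit correction)."""
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jukes_cantor(seq1, seq2):
    """JC69 distance: d = -(3/4) ln(1 - (4/3) p).
    Assumes equal base frequencies and a single substitution rate."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        raise ValueError("p >= 0.75: sequences saturated, JC69 distance undefined")
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(seq1, seq2):
    """K2P (K80) distance: d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q),
    where P and Q are the proportions of transition and transversion differences."""
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    P = transitions / len(seq1)
    Q = transversions / len(seq1)
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("sequences too divergent: K2P distance undefined")
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


# One transition (A->G) versus one transversion (A->C) in four sites: p = 0.25 in both.
for other in ("GAAA", "CAAA"):
    print(other, round(jukes_cantor("AAAA", other), 4), round(kimura_2p("AAAA", other), 4))
# GAAA 0.3041 0.3466
# CAAA 0.3041 0.3171
```

JC69 treats every difference alike, while K2P gives different corrections to transitions and transversions at the same p, at the cost of estimating one more quantity. Both corrections become undefined once the log argument reaches zero, which is one practical disadvantage for highly divergent sequences.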
4. maximum likelihood as a general tool for hypothesis testing
* What is the likelihood ratio test, and how is it used to test a wide variety of hypotheses about sequence evolution, such as hypotheses about rates of evolution, the monophyly of groups of organisms or sequences, or the similarity of the branching histories of two trees?
* be able to outline or diagram the goals and basic steps of a parametric bootstrap analysis, and its use in hypothesis testing in sequence studies.
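The likelihood ratio statistic compares the maximized log-likelihoods of a null model nested within a more general model, LR = 2(ln L1 - ln L0), and is usually compared with a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters. When that approximation is doubtful, a parametric bootstrap builds the null distribution directly: simulate many datasets under the fitted null model, recompute LR for each, and see how extreme the observed LR is. The sketch below assumes the log-likelihoods have already been computed by a phylogenetics program; the function names and all numbers, including the simulated LR values, are hypothetical.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Likelihood ratio statistic LR = 2 (ln L1 - ln L0) for nested models."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1,
    computed with the standard library only."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(m, y) = exp(-y) * sum_{i=0}^{m-1} y^i / i!, with m = df / 2
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: Q(m + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{m} y^(i - 1/2) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(y))
    for i in range(1, df // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def parametric_bootstrap_p(observed_lr, simulated_lrs):
    """Fraction of null-simulated LR values at least as large as the observed one,
    using the (count + 1) / (n + 1) convention so the p-value is never exactly 0."""
    exceed = sum(lr >= observed_lr for lr in simulated_lrs)
    return (exceed + 1) / (len(simulated_lrs) + 1)


# Hypothetical molecular-clock test on 5 taxa: clock (null) vs. no clock, df = 5 - 2 = 3.
lr = lrt_statistic(lnl_null=-1000.0, lnl_alt=-995.0)
print(lr)  # 10.0
print(round(chi2_sf(lr, 3), 4))
# Hypothetical LR values from datasets simulated under the fitted clock model:
simulated = [1.2, 3.5, 0.4, 11.3, 2.2, 5.1, 0.9, 7.7, 4.0]
print(parametric_bootstrap_p(lr, simulated))  # 0.2
```

In a real analysis the simulated list would hold hundreds of values, each obtained by simulating an alignment under the null model and refitting both models to it; nine values are used here only to keep the arithmetic visible.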
5. further concepts and their application
- gene families II, reconciled gene trees
- long-branch attraction: conditions, causes
* The phylogenetic analyses of invertebrate animals presented in class were a good example of a dataset where different methods gave different results, yet exploring those differences led to a better understanding of the history of the sequences. What were some of the "take-home lessons" from this example?
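Reconciling a gene tree with a species tree (the "reconciled gene trees" topic in unit 5) is often done by LCA mapping: each gene-tree node is mapped to the most recent common ancestor, in the species tree, of the species found in its subtree, and a node is inferred to be a duplication when it maps to the same species-tree node as one of its children. The sketch below assumes both trees are rooted and bifurcating and stored as child dictionaries; the function name, trees, gene names, and species labels are all made up for illustration.

```python
def reconcile(gene_children, gene_root, gene_species, species_children):
    """LCA-map each gene-tree node onto the species tree and flag duplications.
    Trees are dicts from internal node name to its two children; leaves are absent as keys.
    `gene_species` maps each gene (leaf) to the species it was sampled from."""
    sp_parent = {c: p for p, kids in species_children.items() for c in kids}

    def path_to_root(node):
        path = [node]
        while node in sp_parent:
            node = sp_parent[node]
            path.append(node)
        return path

    def lca(a, b):
        above_a = set(path_to_root(a))
        for node in path_to_root(b):  # first shared node walking up from b
            if node in above_a:
                return node
        raise ValueError(f"{a} and {b} are not in the same species tree")

    mapping, duplications = {}, []

    def visit(g):  # post-order: children are mapped before their parent
        if g not in gene_children:
            mapping[g] = gene_species[g]
            return
        left, right = gene_children[g]
        visit(left)
        visit(right)
        mapping[g] = lca(mapping[left], mapping[right])
        # Duplication if the node maps to the same species node as one of its children.
        if mapping[g] in (mapping[left], mapping[right]):
            duplications.append(g)

    visit(gene_root)
    return mapping, duplications


species = {"S_root": ["mammals", "chicken"], "mammals": ["human", "mouse"]}
genes = {"g_root": ["g_dup", "c1"], "g_dup": ["gA", "gB"],
         "gA": ["h1", "m1"], "gB": ["h2", "m2"]}
leaf_species = {"h1": "human", "m1": "mouse", "h2": "human", "m2": "mouse", "c1": "chicken"}
mapping, duplications = reconcile(genes, "g_root", leaf_species, species)
print(duplications)  # ['g_dup']
```

In this example h1 and h2 (and m1 and m2) are paralogs because they diverged at the duplication g_dup, whereas h1 and m1 are orthologs because they diverged at a speciation node (gA, mapped to the mammal ancestor).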