Our material on phylogenetics in bioinformatics was roughly divided into five "units". The topics you should especially focus on are listed under each unit, followed by one or a few goals or questions.
1. concepts of trees and inferences based on trees
- trees as hypotheses of evolutionary history and shared ancestry
- HOMOPLASY: convergence, parallelism, reversal
- gene trees I: orthology, paralogy
- inference of ancestral states using ACCTRAN
* be able to "read" a phylogenetic tree, and draw correct inferences about the monophyly of groups of organisms or sequences
* given a tree and a set of data for a given character, be able to infer the ancestral states of the character using the method of ACCTRAN
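For practice with ancestral state inference, here is a minimal Python sketch of one common way to get an ACCTRAN-style reconstruction for a single unordered character: a Fitch downpass that builds preliminary state sets, then an uppass in which each node keeps its parent's state whenever the preliminary set allows it, so changes are placed as close to the root as possible. The nested-tuple tree format, the function names, and the example taxa are assumptions made for illustration, and the procedure we used in class may differ in detail. For binary (0/1) characters the only arbitrary choice is at an ambiguous root; multistate characters can need extra tie-breaking that this sketch does not handle.

```python
# A leaf is a taxon name (str); an internal node is a (left, right) tuple.

def fitch_downpass(node, tip_states, prelim):
    """Post-order pass: record a preliminary state set per node, return the change count."""
    if isinstance(node, str):
        prelim[node] = {tip_states[node]}
        return 0
    left, right = node
    changes = (fitch_downpass(left, tip_states, prelim)
               + fitch_downpass(right, tip_states, prelim))
    shared = prelim[left] & prelim[right]
    if shared:
        prelim[node] = shared                       # children agree: no change needed
    else:
        prelim[node] = prelim[left] | prelim[right]  # children disagree: one change
        changes += 1
    return changes

def acctran_uppass(node, parent_state, prelim, final):
    """Pre-order pass: keep the parent's state if allowed, otherwise change right here."""
    if parent_state in prelim[node]:
        final[node] = parent_state
    else:
        # Assumption: arbitrary tie-break (smallest state) if several states remain.
        final[node] = min(prelim[node])
    if not isinstance(node, str):
        for child in node:
            acctran_uppass(child, final[node], prelim, final)

def acctran(tree, tip_states):
    """Return (tree length, dict mapping every node to its assigned state)."""
    prelim, final = {}, {}
    length = fitch_downpass(tree, tip_states, prelim)
    acctran_uppass(tree, None, prelim, final)  # None forces a choice from the root set
    return length, final

# Hypothetical example: two outgroup taxa (O1, O2) and three ingroup taxa.
tree = ("O1", ("O2", ("A", ("B", "C"))))
states = {"O1": 0, "O2": 0, "A": 1, "B": 0, "C": 1}
length, final = acctran(tree, states)
for node, state in final.items():
    print(node, "->", state)
print("tree length:", length)
```

In this example the character needs two steps. The sketch places a gain on the branch leading to the (A, (B, C)) group and a reversal on the branch to B, rather than two parallel gains in A and C, which is the kind of early-change reconstruction ACCTRAN is meant to favor.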
2. methods of building phylogenetic trees
- parsimony, distance, likelihood compared and contrasted
- the basic approaches, similarities and differences
- standard (nonparametric) bootstrap in phylogenies: use and interpretation
- strengths and weaknesses of each of the major methods
- PHYLIP as an intro to computer programs for phylogeny
* be able to perform and interpret a small parsimony analysis by hand, as we did in class, or an analysis using any of the main approaches, including bootstrap, with PHYLIP
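To make the nonparametric bootstrap concrete, the sketch below shows its two mechanical pieces in Python: resampling alignment columns with replacement to build a pseudo-replicate data set (the job PHYLIP's seqboot program does), and tallying how often each clade appears across the replicate trees. The tree-building step in between is left out; the function names, the dictionary-based alignment format, and the toy data are assumptions for illustration only.

```python
import random
from collections import Counter

def resample_columns(alignment, rng):
    """Build one bootstrap pseudo-replicate: draw columns with replacement.

    alignment maps taxon name -> aligned sequence (all the same length).
    """
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}

def clade_support(replicate_clades):
    """Fraction of replicate trees that contain each clade.

    replicate_clades is a list with one entry per replicate tree; each entry
    is a set of clades, and each clade is a frozenset of taxon names.
    """
    counts = Counter(clade for clades in replicate_clades for clade in clades)
    n_reps = len(replicate_clades)
    return {clade: count / n_reps for clade, count in counts.items()}

# Hypothetical toy alignment.
alignment = {"A": "ACGTAC", "B": "ACGTTC", "C": "TCGAAC"}
rng = random.Random(1)  # fixed seed so the run is repeatable
replicate = resample_columns(alignment, rng)
print(replicate)

# Hypothetical clades recovered from four replicate trees.
AB, AC = frozenset("AB"), frozenset("AC")
support = clade_support([{AB}, {AB}, {AC}, {AB}])
print(support[AB], support[AC])
```

A support value is the proportion of pseudo-replicates in which a group is recovered. It measures how consistently the data support that group under resampling, not the probability that the group is real.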
3. distance models of sequence evolution
* contrast the different distance models for sequence (or protein) evolution. What are some advantages and disadvantages?
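As a study aid for contrasting distance models, the sketch below computes three standard pairwise distances for nucleotide sequences: the uncorrected p-distance, the Jukes-Cantor (JC69) distance d = -(3/4) ln(1 - 4p/3), and the Kimura two-parameter (K2P) distance d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions. The function names and example sequences are assumptions for illustration, and sites with gaps or ambiguity codes are simply skipped.

```python
import math

PURINES = {"A", "G"}

def transition_transversion(seq1, seq2):
    """Return (P, Q): proportions of sites differing by a transition / transversion."""
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: ignore gaps and ambiguity codes
        sites += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1    # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1  # purine <-> pyrimidine
    return transitions / sites, transversions / sites

def p_distance(P, Q):
    """Uncorrected proportion of differing sites."""
    return P + Q

def jc69_distance(P, Q):
    """Jukes-Cantor: one rate for all substitutions, corrects for multiple hits."""
    return -0.75 * math.log(1.0 - 4.0 * (P + Q) / 3.0)

def k2p_distance(P, Q):
    """Kimura two-parameter: separate rates for transitions and transversions."""
    return -0.5 * math.log((1.0 - 2.0 * P - Q) * math.sqrt(1.0 - 2.0 * Q))

# Hypothetical pair of sequences: 2 transitions and 1 transversion in 20 sites.
s1 = "ACGTACGTACGTACGTACGT"
s2 = "GTTTACGTACGTACGTACGT"
P, Q = transition_transversion(s1, s2)
print("p  :", round(p_distance(P, Q), 4))
print("JC :", round(jc69_distance(P, Q), 4))
print("K2P:", round(k2p_distance(P, Q), 4))
```

Both corrected distances exceed the raw p-distance because they estimate substitutions hidden by multiple hits at the same site. Both formulas also break down (the logarithm becomes undefined) when sequences are very divergent, which is one disadvantage worth remembering.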
4. maximum likelihood as a general tool for hypothesis testing
* what is the likelihood ratio test, and how is it used to test a wide variety of hypotheses about sequence evolution, such as rates of evolution, monophyly of groups of sequences, similarity of branching history of two trees, etc.?
* be able to outline or diagram the goals and basic steps in a parametric bootstrap analysis and its use in hypothesis testing in sequence studies.
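The two testing tools in this unit can be sketched in a few lines of Python. The likelihood ratio statistic is 2(lnL_alt - lnL_null); when the null model is nested within the alternative, it is commonly compared to a chi-square distribution with degrees of freedom equal to the number of extra free parameters. When that approximation is doubtful (for example, when comparing tree topologies), a parametric bootstrap builds the null distribution by simulation instead: fit the null model, simulate many data sets under it, compute the statistic on each, and ask how often the simulated values reach the observed one. The function names, the example log-likelihoods, and the add-one convention in the bootstrap p-value are assumptions for illustration; the simulation step itself is not shown.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2, then a recurrence that adds 2 df per step.
    if df % 2 == 1:
        k, tail = 1, math.erfc(math.sqrt(half))
    else:
        k, tail = 2, math.exp(-half)
    while k < df:
        k += 2
        tail += half ** (k / 2 - 1) * math.exp(-half) / math.gamma(k / 2)
    return tail

def likelihood_ratio_test(lnL_null, lnL_alt, df):
    """Return (statistic, chi-square p-value) for nested models."""
    stat = 2.0 * (lnL_alt - lnL_null)
    return stat, chi2_sf(stat, df)

def parametric_bootstrap_p(observed_stat, simulated_stats):
    """P-value from statistics computed on data simulated under the null model.

    Assumption: the add-one convention counts the observed value as one draw,
    so the p-value is never exactly zero.
    """
    exceed = sum(1 for s in simulated_stats if s >= observed_stat)
    return (exceed + 1) / (len(simulated_stats) + 1)

# Hypothetical log-likelihoods: null model vs. alternative with one extra parameter.
stat, p = likelihood_ratio_test(lnL_null=-1005.0, lnL_alt=-1000.0, df=1)
print("LRT statistic:", stat, "chi-square p-value:", p)

# Hypothetical statistics from four null simulations (real analyses use many more).
print("bootstrap p-value:", parametric_bootstrap_p(stat, [1.0, 2.0, 3.0, 12.0]))
```

The same statistic feeds both routes; the only difference is where the null distribution comes from, theory (chi-square) or simulation (parametric bootstrap).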
5. further concepts and their application
- gene families II, reconciled gene trees
- long-branch attraction: conditions, causes
* The phylogenetic analyses of invertebrate animals presented in class were a good example of a dataset where different methods gave different results, but where exploring those differences led to a better understanding of the history of the sequences. What were some "take home lessons" from this example?