Properties of Phylogenetic Methods

Consistency
The tendency of the method to converge on the correct answer as the sample size is increased (in phylogenetics, consistent methods converge on the correct tree as more characters are added to the alignment).
Efficiency
The ability of the method to recover the correct answer with limited amounts of data. If the sample size is held constant, the more efficient estimator will have smaller variance.
Power
1) Synonymous with efficiency, or 2) the probability that a test will reject a hypothesis when it is false.
Lack of Bias
The tendency of the method, given limited data, to produce results that vary randomly around the correct answer rather than deviating from it systematically.
Tolerance
The ability of a method to recover the correct answer under conditions that are not consistent with its a priori assumptions.
Informativeness
The ability of the method to provide informative descriptive measures. For phylogenetic methods these measures might include tree topology, branch lengths, and measures of reliability.
Scalability
The ability to perform well with both small and large datasets, both in terms of numbers of taxa and of characters.
Speed
The length of time a method takes to solve a problem.
Practicality
Some methods with otherwise desirable properties may be impractical for routine use because they require excessive computation, very large amounts of memory, or other resources.

Consistency

Felsenstein (1978) documented the inconsistency of parsimony and uncorrected distance methods under certain conditions.

Those conditions involve unequal branch lengths with short internal branches and some long external branches.

The "Felsenstein Zone" - If one thinks of an abstract "tree space" as the set of all possible tree topologies (including branch lengths) for a given phylogeny, the Felsenstein Zone is the region where a phylogenetic method becomes inconsistent.

Consider nucleotide evolution on a four-taxon tree:

Under some conditions convergent evolution (i.e., homoplasy) is to be expected.

When the number of possible character-states is limited and external branches are long (i.e., have many character-state changes for each character), convergent evolution is expected.

Character-state changes that occur on branch xy will give rise to synapomorphic character states shared by B and C, i.e., by the group (BC).

    We only know that these are synapomorphies for (BC) because we assume that O is the outgroup, but the mapping of characters won't be affected by this. So if we think of the tree as a four-taxon unrooted tree, character-state changes on xy support the bipartition ((A,O)(B,C)).

    Remember that we normally only have direct information about the character states at the branch ends, i.e., for A,B,C and O.

On tree 1, because the branches yB and yC are not long compared with branch xy, many of the character states present at y should still be present in the sequences determined for B and C.

By contrast, on tree 2, the branch xy is short compared with the branches yB and xA.

    The total number of character-state changes between O and C will be small.

    There will be a large number of character-state changes on branches xA and yB.

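    Because nucleotide data include only four character states, a random substitution has a substantial chance of producing a character state that matches the state in one of the other taxa. Even between two fully randomized sequences, about 25% of sites are expected to match by chance (assuming equal base frequencies).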

    If branches xA and yB become long enough, the character states in A and B will begin to resemble each other.
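The effect of having only four character states can be made concrete with a small calculation. The sketch below assumes the Jukes-Cantor model (equal base frequencies and equal rates for all substitutions), which these notes do not specify; the function name jc_match_fraction is made up for illustration. It gives the expected fraction of sites at which two sequences still match when they are separated by d substitutions per site.

```python
import math

def jc_match_fraction(d):
    """Expected fraction of identical sites between two sequences separated
    by d substitutions per site, assuming the Jukes-Cantor model."""
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)

# The expected identity falls toward 0.25 but never drops below it.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"d = {d:>4}: expected identity = {jc_match_fraction(d):.3f}")
```

Under this assumption the expected identity levels off at one quarter however long the branches become. Those chance matches are the raw material that parsimony can misread as shared derived states.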

Because parsimony favors the reconstruction that requires the smallest number of character-state changes, under these conditions parsimony will favor a tree that places A and B together.

This inconsistency of phylogenetic methods under conditions of greatly unequal branch lengths was described by Felsenstein (1978). It is often called long branch attraction, although the effects are much more complex than simply a tendency for long branches to group together.

However, if branches xA and yB are not saturated for base substitutions, B and C will still share character states that resulted from character-state changes that occurred on xy.
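The four-taxon argument above can be checked with a small simulation. This is only a sketch under stated assumptions: sites evolve independently under the Jukes-Cantor model, branches xA and yB share one "long" length, and branches xO, xy and yC share one "short" length. The function names and the particular branch lengths are chosen for illustration and do not come from Felsenstein (1978). For four taxa, parsimony prefers the topology supported by the largest number of informative sites, so counting those sites is enough.

```python
import math
import random

STATES = "ACGT"

def change_prob(d):
    """Jukes-Cantor chance that a site's state differs across a branch of
    length d (expected substitutions per site)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

def evolve(state, d, rng):
    """Return the state at the far end of a branch of length d."""
    if rng.random() < change_prob(d):
        return rng.choice([s for s in STATES if s != state])
    return state

def parsimony_support(n_sites, long_len, short_len, seed=0):
    """Count parsimony-informative sites favoring each unrooted topology.

    True tree: O and A join node x, B and C join node y.
    Branches xA and yB have length long_len; xO, xy, yC have short_len.
    """
    rng = random.Random(seed)
    support = {"(A,O)(B,C)": 0, "(A,B)(O,C)": 0, "(A,C)(O,B)": 0}
    for _ in range(n_sites):
        x = rng.choice(STATES)
        o = evolve(x, short_len, rng)
        a = evolve(x, long_len, rng)
        y = evolve(x, short_len, rng)
        b = evolve(y, long_len, rng)
        c = evolve(y, short_len, rng)
        # A site is informative only if two states each occur in two taxa.
        if a == o and b == c and a != b:
            support["(A,O)(B,C)"] += 1
        elif a == b and o == c and a != o:
            support["(A,B)(O,C)"] += 1
        elif a == c and o == b and a != o:
            support["(A,C)(O,B)"] += 1
    return support

if __name__ == "__main__":
    print("equal branches  :", parsimony_support(20000, 0.1, 0.1))
    print("Felsenstein zone:", parsimony_support(20000, 1.5, 0.05))
```

With equal branch lengths, most informative sites should support the true bipartition ((A,O)(B,C)). With long xA and yB and short remaining branches, sites supporting ((A,B)(O,C)) should dominate, and adding more sites only strengthens the wrong answer, which is what inconsistency means.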

Efficiency

An efficient phylogenetic method can find a useful phylogenetic tree with a small number of characters.

Every parameter that needs to be estimated requires a certain amount of information to be estimated. Thus parameter-rich methods require more data than do few-parameter methods.

Lake's method of invariants seems to solve some phylogenetic problems, but requires enormous amounts of data to find a tree.
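The link between the amount of data and the variance of an estimate can be illustrated with a deliberately simple case. It is an assumption of this sketch, not one of the phylogenetic methods discussed here: estimating the proportion of sites that differ between two sequences, with sites treated as independent. The function name and the example proportion of 0.2 are made up for illustration.

```python
import math

def proportion_standard_error(p, n_sites):
    """Binomial standard error of an observed proportion p of differing
    sites, assuming n_sites independent sites."""
    return math.sqrt(p * (1.0 - p) / n_sites)

# Each fourfold increase in the number of sites halves the standard error.
for n_sites in (100, 400, 1600, 6400):
    se = proportion_standard_error(0.2, n_sites)
    print(f"{n_sites:>5} sites: standard error = {se:.4f}")
```

A method that must estimate many such quantities at once needs enough characters to pin down each of them, which is one way to read the cost of parameter-rich methods.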

Bias

Tolerance

Informativeness

Practicality

Even when a method has desirable properties in other ways, it may not be practical to use under all conditions. The most commonly recognized problem that can make some methods impractical is that of computational speed. Methods that require a large number of calculations, such as maximum likelihood, tend to require so much computer time that they can be applied in practice only to small- to medium-sized datasets. Several factors influence whether or not a given

Other methods perform well with small numbers of taxa, but are not practical for large numbers of taxa. Hadamard conjugation requires the calculation of matrices that increase in size with the number of taxa. At present these matrices are practical to compute only with relatively few taxa.
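How quickly those matrices grow can be sketched with a few lines of arithmetic. The sketch assumes the usual formulation in which the Hadamard matrix has one row and one column per bipartition of the taxa, giving order 2^(n-1) for n taxa, and it assumes the matrix is stored densely at 8 bytes per entry. Both function names are hypothetical, and the notes themselves do not state these sizes.

```python
def hadamard_order(n_taxa):
    """Order of the Hadamard matrix, assuming one row per bipartition of
    the taxa: 2 ** (n_taxa - 1)."""
    return 2 ** (n_taxa - 1)

def dense_matrix_bytes(n_taxa, bytes_per_entry=8):
    """Memory for a dense square matrix of that order (an assumed layout)."""
    order = hadamard_order(n_taxa)
    return order * order * bytes_per_entry

for n_taxa in (4, 8, 12, 16, 20):
    print(f"{n_taxa:>2} taxa: order {hadamard_order(n_taxa):>7}, "
          f"dense storage {dense_matrix_bytes(n_taxa):.3e} bytes")
```

Under these assumptions each added taxon doubles the order of the matrix and quadruples its dense storage, which is why the method is limited to relatively few taxa.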

Scalability

Speed


* Felsenstein, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27:401-410.

Felsenstein, J. 1983. Models for inferring phylogenies: a statistical view. Pp. 315-334 in J. Felsenstein (ed.), Numerical Taxonomy: Proceedings of a NATO Advanced Study Institute. Springer-Verlag, Berlin.

Hillis, D.M., J.P. Huelsenbeck, and C.W. Cunningham. 1994. Application and accuracy of molecular phylogenies. Science 264:671-676.

* Huelsenbeck, J.P., and D.M. Hillis. 1993. Success of phylogenetic methods in the four-taxon case. Syst. Biol. 42:247-264.

Huelsenbeck, J.P., and D.M. Hillis. 1992. Signal, noise, and reliability in molecular phylogenetic analyses. J. Heredity 83:189-195.

Porkess, R. 1991. The Harper Collins Dictionary of Statistics. Harper Collins, New York, NY.