Methods and theory of phylogenetic inference
Davis, Jerrold I.; Nixon, Kevin C.; Little, Damon P.
The limits of conventional cladistic analysis: the 500-terminal rbcL matrix, sources of variation in the estimation of branch support, and some gentle criticisms of supertree methods.
Two published analyses of the 500-terminal rbcL matrix (known as zilla) failed to yield most-parsimonious trees. Analytical methods developed since then find such trees easily. Even so, it is widely believed that the shortest trees for this matrix cannot be found with conventional search methods on a single PC in a reasonable time. In fact, the matrix is amenable to conventional analysis, and relatively short preliminary analyses can help identify the most efficient settings for such searches. Two-stage analyses of the zilla matrix are consistently more efficient than one-stage searches. In these analyses, relatively intensive branch swapping is applied to the best tree sets obtained from a large number of less intensive searches. For zilla, one of the most efficient searches is a two-stage analysis that holds 50 trees during the first stage. Second-stage swapping (with 2000 trees held in memory) is then applied to the best (shortest) 5% of all tree sets obtained in the first stage. Under these conditions, about one-fourth of the tree sets subjected to second-stage swapping yield most-parsimonious trees. Data sets with substantially more terminals than the three-gene matrix are beyond the current limits of conventional cladistic analysis on a single PC. Nevertheless, these methods are likely to remain useful in combination with more recently developed methods such as tree fusion, sectorial searches, and the parsimony ratchet. Estimates of branch support from methods such as the bootstrap and the jackknife are sensitive to a variety of analytical factors. It is therefore problematic to compare support for similar groupings obtained from analyses that include partially overlapping sets of taxa and that were conducted with nonidentical settings. Supertree methods that use surrogate data of this sort introduce error while failing to circumvent the analytical challenges associated with large matrices.
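The two-stage search strategy described above can be illustrated with a small sketch. It is not the authors' software or their exact procedure. It makes the following assumptions:

- Trees are scored by Fitch parsimony on unrooted binary trees.
- Each first-stage replicate starts from a random-addition (Wagner) tree.
- Nearest-neighbour interchange (NNI) stands in for the more intensive branch swapping, such as TBR, used on real matrices.
- "Hold" means the maximum number of distinct, equally short trees kept in memory per swapping run.

The function names (`two_stage`, `swap`, and so on) and all parameter names are hypothetical. The defaults copy the settings reported for zilla (50 trees held in stage one, the best 5% of tree sets, 2000 trees held in stage two), but on a toy matrix they only demonstrate the control flow.

```python
import random

def clone(adj):
    return {k: set(v) for k, v in adj.items()}

def edges(adj):
    # each undirected edge once, in a fixed order (assumes taxa are non-numeric str, internal nodes int)
    return [(u, v) for u in adj for v in sorted(adj[u], key=str) if str(u) < str(v)]

def fitch_length(adj, matrix):
    # Fitch length of an unrooted binary tree (adj: node -> set of neighbours), rooted at any taxon present
    root = next(n for n in adj if n in matrix)
    def down(node, parent):
        if node in matrix:  # terminal: one singleton state set per character
            return [{s} for s in matrix[node]], 0
        (left, cl), (right, cr) = [down(c, node) for c in adj[node] if c != parent]
        sets, cost = [], cl + cr
        for a, b in zip(left, right):
            common = a & b
            sets.append(common or a | b)  # empty intersection: take the union...
            cost += not common            # ...at the price of one extra step
        return sets, cost
    (nbr,) = adj[root]
    sets, cost = down(nbr, root)
    return cost + sum(s not in st for s, st in zip(matrix[root], sets))

def random_addition(matrix, rng):
    # Wagner tree: add taxa in random order, each at its shortest position
    taxa = list(matrix)
    rng.shuffle(taxa)
    adj = {0: set(taxa[:3]), **{t: {0} for t in taxa[:3]}}
    for n, t in enumerate(taxa[3:], start=1):  # n = id of the new internal node
        trials = []
        for u, v in edges(adj):  # try inserting t on every edge u-v
            tr = clone(adj)
            tr[u], tr[v] = (tr[u] - {v}) | {n}, (tr[v] - {u}) | {n}
            tr[n], tr[t] = {u, v, t}, {n}
            trials.append(tr)
        adj = min(trials, key=lambda tr: fitch_length(tr, matrix))
    return adj

def tree_key(adj, matrix):
    # topology id: the nontrivial splits, each stored as the side lacking a reference taxon
    ref, everyone = min(matrix), frozenset(matrix)
    def below(node, parent):
        if node in matrix:
            return frozenset([node])
        return frozenset().union(*(below(c, node) for c in adj[node] if c != parent))
    key = set()
    for u, v in edges(adj):
        if u not in matrix and v not in matrix:  # internal edges only
            side = below(v, u)
            key.add(everyone - side if ref in side else side)
    return frozenset(key)

def nni_neighbours(adj, matrix):
    # the two nearest-neighbour interchanges across each internal edge u-v
    for u, v in edges(adj):
        if u in matrix or v in matrix:
            continue
        a = next(x for x in sorted(adj[u], key=str) if x != v)
        for c in [x for x in sorted(adj[v], key=str) if x != u]:
            tr = clone(adj)  # swap subtree a (on u's side) with subtree c (on v's side)
            tr[u], tr[v] = (tr[u] - {a}) | {c}, (tr[v] - {c}) | {a}
            tr[a], tr[c] = (tr[a] - {u}) | {v}, (tr[c] - {v}) | {u}
            yield tr

def swap(trees, matrix, hold):
    # NNI swapping that keeps at most `hold` distinct, equally short trees in memory
    best = min(fitch_length(t, matrix) for t in trees)
    held = {}
    for t in trees:
        if fitch_length(t, matrix) == best and len(held) < hold:
            held.setdefault(tree_key(t, matrix), t)
    queue = list(held.values())
    while queue:
        for nb in nni_neighbours(queue.pop(), matrix):
            length = fitch_length(nb, matrix)
            if length < best:  # shorter tree: discard everything held so far
                best, held, queue = length, {}, []
            if length == best:
                key = tree_key(nb, matrix)
                if key not in held and len(held) < hold:
                    held[key] = nb
                    queue.append(nb)
    return best, list(held.values())

def two_stage(matrix, replicates=40, hold1=50, top_fraction=0.05, hold2=2000, seed=1):
    rng = random.Random(seed)
    # stage 1: many cheap replicates, each holding at most hold1 trees
    stage1 = sorted((swap([random_addition(matrix, rng)], matrix, hold1)
                     for _ in range(replicates)), key=lambda r: r[0])
    # stage 2: swap again, with a larger hold, on the best fraction of tree sets
    n_best = max(1, round(top_fraction * replicates))
    stage2 = [swap(trees, matrix, hold2) for _, trees in stage1[:n_best]]
    return min(s for s, _ in stage2), stage1[0][0], stage2
```

Consider the hypothetical six-taxon matrix `{"A": "110101", "B": "110101", "C": "010001", "D": "000000", "E": "001010", "F": "001010"}`. Calling `two_stage` on it should report a best length of 6. The six binary characters are mutually compatible, so each needs only one step. On a real matrix, any benefit of stage two comes from the larger hold, which lets swapping continue across plateaus of equally short trees that the smaller stage-one limit cuts off.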
1 - Cornell University, L.H. Bailey Hortorium, Department of Plant Biology, Ithaca, New York, 14853, U.S.A.
Presentation Type: Symposium
Location: Alpine A (Snowbird Center)
Date: Tuesday, August 3rd, 2004
Time: 4:15 PM