Methods and theory of phylogenetic inference
Sanderson, MJ, Driskell, Amy.
King Kong versus Godzilla: supertree and supermatrix phylogenetics.
Available sequence data are not uniformly distributed among taxa. Complete genomes or large numbers of sequences are available for a few model organisms, whereas only one or a few genes are available across large numbers of species. Construction of synthetic phylogenetic hypotheses from these data is confounded by this sampling bias regardless of whether a supermatrix or supertree approach is used. In the former, all sequences are concatenated into a single matrix with many missing entries; in the latter, a tree is constructed from each gene, but the taxa are only partly shared among trees. We examined the efficacy of these two strategies in a sample of 185,000 green plant proteins from GenBank. Sequences were culled to 853 putative single-copy, phylogenetically informative sets of homologs. From these, a smaller subset of 254 sets was extracted, spanning 69 green plant species. A protein parsimony bootstrap majority-rule tree was constructed for each gene, and these trees were used as inputs to MRP (matrix representation with parsimony) supertree analysis. A single large concatenated data set of ~96,000 sites, including 84% missing data, was also constructed and analyzed with protein parsimony. Both analyses recovered many conventionally supported nodes in green plants, but each included some ambiguous results and clear mistakes. Overall, despite the remarkably high level of missing information, the supermatrix tree was more in line with present understanding of green plant relationships.
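To make the two data layouts described above concrete, here is a minimal Python sketch. It is an illustration only, not the authors' pipeline. The taxon names, toy sequences, and function names (`build_supermatrix`, `missing_fraction`, `mrp_matrix`) are hypothetical. The supermatrix step pads taxa absent from a gene with `?`. The supertree step uses the standard MRP coding, in which each clade of each source tree becomes one binary character: `1` for taxa inside the clade, `0` for taxa in that tree but outside the clade, and `?` for taxa the tree does not sample. Either matrix would then be passed to a parsimony program, which is not shown.

```python
from typing import Dict, FrozenSet, List, Tuple

Alignment = Dict[str, str]  # taxon -> aligned sequence for one gene
SourceTree = Tuple[FrozenSet[str], List[FrozenSet[str]]]  # (taxa in tree, clades)


def build_supermatrix(genes: List[Alignment], taxa: List[str]) -> Dict[str, str]:
    """Concatenate per-gene alignments; taxa lacking a gene get '?' padding."""
    rows: Dict[str, List[str]] = {t: [] for t in taxa}
    for aln in genes:
        length = len(next(iter(aln.values())))  # all rows of a gene share one length
        for t in taxa:
            rows[t].append(aln.get(t, "?" * length))
    return {t: "".join(parts) for t, parts in rows.items()}


def missing_fraction(matrix: Dict[str, str]) -> float:
    """Fraction of cells in the matrix that are coded as missing ('?')."""
    cells = "".join(matrix.values())
    return cells.count("?") / len(cells)


def mrp_matrix(trees: List[SourceTree], taxa: List[str]) -> Dict[str, str]:
    """Encode source trees as an MRP matrix, one column per clade."""
    rows: Dict[str, List[str]] = {t: [] for t in taxa}
    for tree_taxa, clades in trees:
        for clade in clades:
            for t in taxa:
                if t not in tree_taxa:
                    rows[t].append("?")  # taxon not sampled in this gene tree
                elif t in clade:
                    rows[t].append("1")  # taxon inside the clade
                else:
                    rows[t].append("0")  # taxon in the tree, outside the clade
    return {t: "".join(r) for t, r in rows.items()}


# Hypothetical toy data: two genes with only partly overlapping taxa.
taxa = ["A", "B", "C", "D"]
gene1 = {"A": "MK", "B": "MK", "C": "ML"}
gene2 = {"B": "AAA", "C": "AAT", "D": "GAT"}
supermatrix = build_supermatrix([gene1, gene2], taxa)

# Each gene tree is given as its taxon set plus its non-trivial clades.
tree1 = (frozenset("ABC"), [frozenset("AB")])
tree2 = (frozenset("BCD"), [frozenset("BC")])
mrp = mrp_matrix([tree1, tree2], taxa)

if __name__ == "__main__":
    for t in taxa:
        print(t, supermatrix[t], mrp[t])
    print("missing fraction:", missing_fraction(supermatrix))
```

In both layouts the uneven taxon sampling shows up as `?` cells. The supermatrix carries them at the level of individual sites, while the MRP matrix carries them at the level of whole clades.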
1 - University of California, Davis, Plant Biology, One Shields Ave., Davis, California, 95616, USA
Presentation Type: Symposium
Location: Alpine A (Snowbird Center)
Date: Tuesday, August 3rd, 2004
Time: 4:45 PM