Journal: Systematic Biology
It is thought that speciation in phytophagous insects is often due to colonization of novel host plants, because radiations of plant and insect lineages are typically asynchronous. Recent phylogenetic comparisons have supported this model of diversification for both insect herbivores and specialized pollinators. An exceptional case where contemporaneous plant-insect diversification might be expected is the obligate mutualism between fig trees (Ficus species, Moraceae) and their pollinating wasps (Agaonidae, Hymenoptera). The ubiquity and ecological significance of this mutualism in tropical and subtropical ecosystems have long intrigued biologists, but the systematic challenge posed by >750 interacting species pairs has hindered progress toward understanding its evolutionary history. In particular, taxon sampling and analytical tools have been insufficient for large-scale cophylogenetic analyses. Here, we sampled nearly 200 interacting pairs of fig and wasp species from across the globe. Two supermatrices were assembled: on average, wasps had sequences from 77% of 6 genes (5.6 kb), figs had sequences from 60% of 5 genes (5.5 kb), and overall 850 new DNA sequences were generated for this study. We also developed a new analytical tool, Jane 2, for event-based phylogenetic reconciliation analysis of very large data sets. Separate Bayesian phylogenetic analyses for figs and fig wasps under relaxed molecular clock assumptions indicate Cretaceous diversification of crown groups and contemporaneous divergence for nearly half of all fig and pollinator lineages. Event-based cophylogenetic analyses further support the codiversification hypothesis. Biogeographic analyses indicate that the present-day distribution of fig and pollinator lineages is consistent with a Eurasian origin and subsequent dispersal, rather than with Gondwanan vicariance. Overall, our findings indicate that the fig-pollinator mutualism represents an extreme case among plant-insect interactions of coordinated dispersal and long-term codiversification. [Biogeography; coevolution; cospeciation; host switching; long-branch attraction; phylogeny.].
Horseshoe crabs (Xiphosura) are traditionally regarded as sister group to the clade of terrestrial chelicerates (Arachnida). This hypothesis has been challenged by recent phylogenomic analyses, but the non-monophyly of Arachnida has consistently been disregarded as artifactual. We re-evaluated the placement of Xiphosura among chelicerates using the most complete phylogenetic data set to date, expanding outgroup sampling, and including data from whole genome sequencing projects. In spite of uncertainty in the placement of some arachnid clades, all analyses show Xiphosura consistently nested within Arachnida as the sister group to Ricinulei (hooded tick spiders). It is apparent that the radiation of arachnids is an old one and occurred over a brief period of time, resulting in several consecutive short internodes, and thus is a potential case for the confounding effects of incomplete lineage sorting (ILS). We simulated coalescent gene trees to explore the effects of increasing levels of ILS on the placement of horseshoe crabs. In addition, common sources of systematic error were evaluated, as well as the effects of fast-evolving partitions and the dynamics of problematic long branch orders. Our results indicated that the placement of horseshoe crabs cannot be explained by missing data, compositional biases, saturation, or ILS. Interrogation of the phylogenetic signal showed that the majority of loci favor the derived placement of Xiphosura over a monophyletic Arachnida. Our analyses support the inference that horseshoe crabs represent a group of aquatic arachnids, comparable to aquatic mites, breaking a long-standing paradigm in chelicerate evolution and altering previous interpretations of the ancestral transition to the terrestrial habitat. Future studies testing chelicerate relationships should approach the task with a sampling strategy where the monophyly of Arachnida is not held as the premise.
Dendroscope 3 is a new program for working with rooted phylogenetic trees and networks. It provides a number of methods for drawing and comparing rooted phylogenetic networks, and for computing them from rooted trees. The program can be used interactively or in command-line mode. The program is written in Java, use of the software is free, and installers for all 3 major operating systems can be downloaded from www.dendroscope.org. [Phylogenetic trees; phylogenetic networks; software.].
Genome-scale data offer the opportunity to clarify phylogenetic relationships that are difficult to resolve with few loci, but they can also identify genomic regions with evolutionary history distinct from that of the species history. We collected whole-genome sequence data from 29 taxa in the legume genus Medicago, then aligned these sequences to the M. truncatula reference genome to confidently identify 87,596 variable homologous sites. We used this data set to estimate phylogenetic relationships among Medicago species, to investigate the number of sites needed to provide robust phylogenetic estimates, and to identify specific genomic regions supporting topologies in conflict with the genome-wide phylogeny. Our full genomic data set resolves relationships within the genus that were previously intractable. Sub-sampling the data reveals considerable variation in phylogenetic signal and power in smaller subsets of the data. Even when sampling 5,000 sites, no random sample of the data supports a topology identical to that of the genome-wide phylogeny. Phylogenetic relationships estimated from 500-site sliding windows revealed genome regions supporting several alternative species relationships among recently diverged taxa, consistent with the expected effects of deep coalescence or introgression in the recent history of Medicago.
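The 500-site windowing step described above can be illustrated with a minimal sketch. It assumes the alignment is held as a plain dictionary of equal-length strings and that windows advance by a fixed step; the function name `sliding_windows`, the `step` parameter, and the choice to drop a trailing partial window are illustrative assumptions, not details taken from the study.

```python
from typing import Dict, Iterator, Tuple


def sliding_windows(
    alignment: Dict[str, str], width: int = 500, step: int = 500
) -> Iterator[Tuple[int, Dict[str, str]]]:
    """Yield (start, sub-alignment) pairs for windows of `width` sites.

    `alignment` maps taxon name -> aligned sequence (hypothetical layout).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("alignment must be non-empty with equal-length sequences")
    n_sites = lengths.pop()
    # Only full-width windows are produced; a trailing partial window is dropped.
    for start in range(0, n_sites - width + 1, step):
        yield start, {
            name: seq[start:start + width] for name, seq in alignment.items()
        }
```

Each sub-alignment could then be passed to any tree-estimation program, and the resulting window topologies compared with the genome-wide tree.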
A general morphometric method for describing shape variation in a sample consisting of landmarks and multiple outline shapes is developed in this article. A distance metric is developed for such data and is used to embed the data in a low-dimensional Euclidean space. The Euclidean space is used to generate summary statistics such as mean and principal shape variation which are implicitly represented in the original space using elements of the sample. A new distance metric for outline shapes is proposed based on Procrustes distance that does not require the extraction of discrete points along the curve. The outline distance metric can be naturally combined with distances between landmarks. A method for aligning outlines and multiple outlines is developed that minimizes the distance metric. The method is compared with semilandmarks on synthetic data and 2 real data sets. Outline methods produce useful and valid results when suitably constrained by landmarks and are useful visualization aids, but questions remain about their suitability for answering biological questions until appropriate distance metrics can be biologically validated. [Morphometrics; outline analysis; semilandmark.].
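As a point of reference for the distance-then-embedding workflow described above, the following sketch computes the ordinary landmark-based partial Procrustes distance and embeds a distance matrix with classical multidimensional scaling. It is not the outline metric proposed in the article, which avoids discrete points along the curve; the function names and the use of classical MDS as the embedding step are assumptions made for illustration.

```python
import numpy as np


def procrustes_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Partial Procrustes distance between two (k landmarks x d dims) arrays."""
    # Remove location and scale: center on the centroid, scale to unit size.
    x0 = x - x.mean(axis=0)
    y0 = y - y.mean(axis=0)
    x0 = x0 / np.linalg.norm(x0)
    y0 = y0 / np.linalg.norm(y0)
    # Best orthogonal transformation of y0 onto x0 (reflections allowed),
    # obtained from the SVD of the cross-covariance matrix.
    u, _, vt = np.linalg.svd(y0.T @ x0)
    return float(np.linalg.norm(x0 - y0 @ (u @ vt)))


def classical_mds(dist: np.ndarray, n_dims: int = 2) -> np.ndarray:
    """Embed a symmetric distance matrix in `n_dims` Euclidean dimensions."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centered squared distances give a Gram (inner-product) matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))
```

Summary statistics such as a mean shape or principal axes of variation would then be computed on the embedded coordinates rather than on the raw landmark data.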
Correct rooting of the angiosperm radiation is both challenging and necessary for understanding the origins and evolution of physiological and phenotypic traits in flowering plants. The problem is known to be difficult due to the large genetic distance separating flowering plants from other seed plants and the sparse taxon sampling among basal angiosperms. Here, we provide further evidence for concern over substitution model misspecification in analyses of chloroplast DNA sequences. We show that support for Amborella as the sole representative of the most basal angiosperm lineage is founded on sequence site patterns poorly described by time-reversible substitution models. Improving the fit between sequence data and substitution model identifies Trithuria, Nymphaeaceae, and Amborella as surviving relatives of the most basal lineage of flowering plants. This finding indicates that aquatic and herbaceous species dominate the earliest extant lineage of flowering plants. [Trithuria inconspicua; chloroplast genome; angiosperm origins; heterotachy; base compositional heterogeneity; data model fit.].
A key challenge for biologists is to document and explain global patterns of diversification in a wide range of environments. Here, we explore patterns of continental-scale diversification in a groundwater species-rich clade, the superfamily Aselloidea (Pancrustacea, Isopoda). Our analyses supported a constant diversification rate during most of the course of Aselloidea evolution, until 4-15 Ma when diversification rates started to decrease. This constant accumulation of lineages challenges the view that groundwater species diversification in temperate regions might have been primarily driven by major changes in physical environment leading to the extinction of surface populations and subsequent synchronous isolation of multiple groundwater populations. Rather than acting synchronously over broad geographic regions, factors causing extinction of surface populations and subsequent reproductive isolation of groundwater populations may act in a local and asynchronous manner, thereby resulting in a constant speciation rate over time. Our phylogeny also revealed several cases of parapatric distributions among closely related surface-water and groundwater species suggesting that species diversification could also arise from a process of disruptive selection along the surface-subterranean environmental gradient. Our results call for reevaluating the spatial scale and timing of factors causing diversification events in groundwater.
The evolution of cetaceans, from their early transition to an aquatic lifestyle to their subsequent diversification, has been the subject of numerous studies. However, while the higher-level relationships among cetacean families have been largely settled, several aspects of the systematics within these groups remain unresolved. Problematic clades include the oceanic dolphins (37 spp.), which have experienced a recent rapid radiation, and the beaked whales (22 spp.), which have not been investigated in detail using nuclear loci. The combined application of high-throughput sequencing with techniques that target specific genomic sequences provides a powerful means of rapidly generating large volumes of orthologous sequence data for use in phylogenomic studies. To elucidate the phylogenetic relationships within the Cetacea, we combined sequence capture with Illumina sequencing to generate data for ∼3200 protein-coding genes for 68 cetacean species and their close relatives including the pygmy hippopotamus. By combining data from >38,000 exons with existing sequences from 11 cetaceans and seven outgroup taxa, we produced the first comprehensive comparative genomic dataset for cetaceans, spanning 6,527,596 aligned base pairs and 89 taxa. Phylogenetic trees reconstructed with maximum likelihood and Bayesian inference of concatenated loci, as well as with coalescence analyses of individual gene trees, produced mostly concordant and well-supported trees. Our results completely resolve the relationships among beaked whales as well as the contentious relationships among oceanic dolphins, especially the problematic subfamily Delphininae. We carried out Bayesian estimation of species divergence times using MCMCTree, and compared our complete dataset to a subset of clocklike genes. Analyses using the complete dataset consistently showed less variance in divergence times than the reduced dataset. In addition, integration of new fossils (e.g., Mystacodon selenensis) indicates that the diversification of Crown Cetacea began before the Late Eocene and the divergence of Crown Delphinidae as early as the Middle Miocene.
Phylogenetic inference is generally performed on the basis of multiple sequence alignments (MSA). Because errors in an alignment can lead to errors in tree estimation, there is a strong interest in identifying and removing unreliable parts of the alignment. In recent years several automated filtering approaches have been proposed, but despite their popularity, a systematic and comprehensive comparison of different alignment filtering methods on real data has been lacking. Here, we extend and apply recently introduced phylogenetic tests of alignment accuracy on a large number of gene families and contrast the performance of unfiltered vs. filtered alignments in the context of single gene phylogeny reconstruction. Based on multiple genome-wide empirical and simulated datasets, we show that the trees obtained from filtered MSAs are on average worse than those obtained from unfiltered MSAs. Furthermore, alignment filtering often leads to an increase in the proportion of well-supported branches that are actually wrong. We confirm that our findings hold for a wide range of parameters and methods. While our results suggest that light filtering (up to 20% of alignment positions) has little impact on tree accuracy and may save some computation time, contrary to widespread practice, we do not generally recommend the use of current alignment filtering methods for phylogenetic inference. By providing a way to rigorously and systematically measure the impact of filtering on alignments, the methodology set forth here will guide the development of better filtering algorithms.
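To make concrete what column filtering does and how the fraction of removed positions (the quantity behind the "up to 20%" figure above) can be tracked, here is a minimal sketch of a gap-threshold filter. It is a deliberately simple stand-in and does not reproduce the algorithm of any published filtering tool; the function name, the dictionary layout, and the default threshold are hypothetical.

```python
from typing import Dict, Tuple


def filter_gappy_columns(
    alignment: Dict[str, str], max_gap_fraction: float = 0.5
) -> Tuple[Dict[str, str], float]:
    """Drop columns whose gap fraction exceeds `max_gap_fraction`.

    Returns the filtered alignment and the fraction of columns removed.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    n_sites = len(seqs[0])
    if any(len(seq) != n_sites for seq in seqs):
        raise ValueError("all sequences must have the same aligned length")
    # Keep a column when the share of '-' characters is within the threshold.
    keep = [
        i for i in range(n_sites)
        if sum(seq[i] == "-" for seq in seqs) / len(seqs) <= max_gap_fraction
    ]
    filtered = {
        name: "".join(seq[i] for i in keep) for name, seq in zip(names, seqs)
    }
    removed = 1.0 - len(keep) / n_sites if n_sites else 0.0
    return filtered, removed
```

Reporting the removed fraction alongside the filtered alignment makes it easy to compare trees from filtered and unfiltered inputs at a known filtering intensity.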
Modeling discrete phenotypic traits for either ancestral character state reconstruction or morphology-based phylogenetic inference suffers from ambiguities of character coding, homology assessment, dependencies, and selection of adequate models. These drawbacks occur because trait evolution is driven by two key processes - hierarchical and hidden - which are not accommodated simultaneously by the available phylogenetic methods. The hierarchical process refers to the dependencies between anatomical body parts, while the hidden process refers to the evolution of gene regulatory networks underlying trait development. Herein, I demonstrate that these processes can be efficiently modeled using structured Markov models equipped with hidden states, which resolves the majority of the problems associated with discrete traits. Integration of structured Markov models with anatomy ontologies can adequately incorporate the hierarchical dependencies, while the use of the hidden states accommodates hidden evolution of gene regulatory networks and substitution rate heterogeneity. I assess the new models using simulations and theoretical synthesis. The new approach solves the long-standing “tail color problem,” in which the trait is scored for species with tails of different colors or no tails. It also presents a previously unknown issue called the “two-scientist paradox,” in which the nature of coding the trait and the hidden processes driving the trait’s evolution are confounded; failing to account for the hidden process may result in a bias, which can be avoided by using hidden state models. All this provides a clear guideline for coding traits into characters. This paper gives practical examples of using the new framework for phylogenetic inference and comparative analysis.
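One standard building block for structured Markov models is the amalgamation of two independently evolving characters into a single rate matrix via a Kronecker sum. The sketch below applies it to a tail (absent/present) and a tail color (red/blue) character; the rate values are invented for illustration, and the parameterization is an assumption rather than the one used in the paper. In the joint model, the two "absent" states differ only in an unobservable color, which is the sense in which they act as hidden states.

```python
import numpy as np


def two_state_rate_matrix(forward: float, backward: float) -> np.ndarray:
    """Rate matrix for a binary character (rows sum to zero)."""
    return np.array([[-forward, forward], [backward, -backward]])


def amalgamate_independent(q_a: np.ndarray, q_b: np.ndarray) -> np.ndarray:
    """Kronecker sum: joint rate matrix of two independent characters.

    Joint states are ordered (a0,b0), (a0,b1), (a1,b0), (a1,b1), ...
    """
    eye_a = np.eye(q_a.shape[0])
    eye_b = np.eye(q_b.shape[0])
    return np.kron(q_a, eye_b) + np.kron(eye_a, q_b)


# Hypothetical rates, chosen only to make the structure visible.
q_tail = two_state_rate_matrix(0.1, 0.2)    # absent <-> present
q_color = two_state_rate_matrix(0.3, 0.4)   # red <-> blue
q_joint = amalgamate_independent(q_tail, q_color)
states = ["absent/red", "absent/blue", "present/red", "present/blue"]
```

In `q_joint`, entries that would require both characters to change at the same instant (for example `absent/red` to `present/blue`) are zero, which is the defining property of this kind of amalgamation; dependencies between characters would be expressed by modifying individual rates in the joint matrix.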