Angiosperm Anchored Phylogenomics
Targeted enrichment of DNA libraries prior to next-generation sequencing permits the assembly of datasets with hundreds or thousands of putatively orthologous loci from across the genome for sophisticated species tree inference. A potentially transformative goal for the angiosperm phylogenomics community is agreement on a common set of target loci, which would remove the marker development step and maximize data combinability across studies. With collaborators (especially Alan and Emily Lemmon), we have identified and validated a candidate set of loci suitable for target enrichment across angiosperms.
We began with the 959 genes previously identified by other researchers as apparently single-copy in the genomes of Arabidopsis, poplar, grape, and rice. We identified the corresponding exons in Arabidopsis, found that 3050 of the exons were above the threshold size necessary for enrichment, and narrowed these down further to 1721 exons that are ≥55% similar between Arabidopsis and rice. Using these two taxa as references, we then identified orthologous regions in 33 complete angiosperm genomes (representing the phylogenetic breadth of angiosperms, including Amborella) and in nine low-coverage genomes that we produced for non-model angiosperms. Of these exons, 499 had an average copy number ≤1.2 and occurred in ≥85% of the genomes.
The sequences for those 499 exons in 26 genomes were used to produce a custom Agilent SureSelect Target Enrichment Kit. We enriched genomic libraries for 50 angiosperms, representing orders from across the phylogeny (Poales, Brassicales, Dipsacales, Caryophyllales, Proteales, and Magnoliales). The kit had an average enrichment success of 93.6% of the targets across the 50 species (range = 82.4–98.2%), and assembled loci averaged 903 bases in length (range = 428–2766).
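The locus-selection filters described above (exon size, Arabidopsis-rice similarity, average copy number, and occurrence across genomes) can be illustrated with a minimal Python sketch. This is not the pipeline actually used in the project. The `Exon` class, the function names, and the `MIN_LENGTH` value are hypothetical (the text does not state the size threshold), and the sketch assumes that average copy number is computed only over genomes in which the exon was found.

```python
from dataclasses import dataclass
from typing import Dict, List

MIN_LENGTH = 120        # HYPOTHETICAL value: the text gives no size threshold
MIN_SIMILARITY = 0.55   # >= 55% similar between Arabidopsis and rice
MAX_MEAN_COPIES = 1.2   # average copy number <= 1.2
MIN_OCCURRENCE = 0.85   # found in >= 85% of the genomes


@dataclass
class Exon:
    exon_id: str
    length: int              # exon length in bases
    similarity: float        # Arabidopsis-rice similarity, as a fraction
    copies: Dict[str, int]   # genome name -> number of copies found


def occurrence(exon: Exon, n_genomes: int) -> float:
    """Fraction of the surveyed genomes containing at least one copy."""
    present = sum(1 for count in exon.copies.values() if count > 0)
    return present / n_genomes


def mean_copies(exon: Exon) -> float:
    """Mean copy number over genomes where the exon was found (assumption)."""
    found = [count for count in exon.copies.values() if count > 0]
    return sum(found) / len(found) if found else 0.0


def select_targets(exons: List[Exon], n_genomes: int) -> List[Exon]:
    """Keep only the exons that pass all four filters."""
    kept = []
    for exon in exons:
        if exon.length < MIN_LENGTH:
            continue
        if exon.similarity < MIN_SIMILARITY:
            continue
        if occurrence(exon, n_genomes) < MIN_OCCURRENCE:
            continue
        if mean_copies(exon) > MAX_MEAN_COPIES:
            continue
        kept.append(exon)
    return kept
```

Calling `select_targets(exons, n_genomes)` with one `Exon` record per candidate returns the subset that would go forward to probe design. Whether copy number should instead be averaged over all genomes, including those lacking the exon, is a design choice that the text leaves open.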
Variability in the 499 target regions (the exons; with an average pairwise distance of 75.3%) produced a phylogeny for 91 taxa (the 50 enriched species, the 33 angiosperms with complete genomes used in the earlier step, and nine non-model angiosperms with low-coverage genomes). This phylogeny was broadly consistent with previously published whole-plastid phylogenies where those agreed with previous functional-nuclear-phylogenomics results; where the previous plastid and nuclear results disagreed, it mostly echoed the latter. With the addition of the flanking regions, relationships in each of seven shallow clades were fully resolved and strongly supported. We are actively collaborating with a number of labs to produce data for their study systems. Please send an email to Austin if you are interested in establishing a collaboration.
Mast, A. R., Lemmon, A., Buddenhagen, C. and Lemmon, E. August 2014. Anchored phylogenomics in angiosperms: Maximizing data combinability through coordinated locus selection. Presented at Botany 2014 Conference, Boise, ID.
Buddenhagen, C., Lemmon, A., Lemmon, E. and Mast, A. R. August 2014. Anchored phylogenomics in angiosperms: Maximizing data combinability through coordinated locus selection. Presented at Evolution 2014 Conference, Raleigh, NC.
Mast, A. R., Buddenhagen, C. and Lemmon, A. June 2013. Anchored Phylogenomics in the Angiosperms. Presented at Evolution 2013 Conference, Snowbird, UT.