Phylogenetic systematics of Juncaceae, alignments associated with phylogenetic trees
Data files
Feb 16, 2026 version, files total 72.48 MB
- 353_gene_fastas.zip (1.15 MB)
- 353_iqtree_genes_cat (10.66 MB)
- plast_genes.zip (818.71 KB)
- plast_iq-concat (19.85 MB)
- plast_maxtips_aln_trim_concat (39.63 MB)
- plast_maxtips_aln_trim.zip (351.51 KB)
- README.md (8.66 KB)
- sample_information.csv (6.47 KB)
Abstract
Juncaceae has needed a taxonomic revision for some time. Specifically, the genus Juncus s.l. is known to be paraphyletic, as five small southern hemisphere genera have been shown repeatedly to be nested within it. In 2022, a new classification was proposed, based on phylogenies built from one nuclear and three chloroplast regions sequenced across much of Juncaceae, meant to resolve the paraphyletic nature of Juncus s.l. It created six new genera and was criticized on the basis that the genera proposed are not necessarily monophyletic, given the limited nature of the phylogenetic analyses available and the fact that the generic circumscriptions draw heavily on the morphology-based classifications found in the latest monograph of Juncaceae from 2002. We assembled a dataset consisting of hundreds of chloroplast and nuclear loci, using the Angiosperms 353 target capture probe set, to assess the monophyly of the newly proposed genera of Juncaceae. We found that the proposed genera mostly represent monophyletic groups, but that the proposed genus Juncinella is nested within the proposed genus Boreojuncus. We additionally found that Juncinella capitata was placed as sister to either Luzula or Oreojuncus and should be recognized as a monotypic genus. Finally, Australojuncus cyperoides and Verojuncus chlorocephalus were recovered outside of their morphologically assigned genera and require further investigation to be placed confidently. Here, we propose taxonomic revisions to rectify the issues stated above, but also find that further research is necessary, particularly to correctly place the South African annual taxa in Juncus s.l.
Dataset DOI: 10.5061/dryad.x3ffbg810
Associated publication: Kenny, R. J., L. Z. Drábková, and D. Potter. 2026. Phylogenetic systematics of Juncaceae. American Journal of Botany 113: XXXX.
Description of the data and file structure
This dataset contains multiple sequence alignments used as input for phylogenetic analyses of the rush family Juncaceae (Poales). The study used the Angiosperms 353 target capture probe set combined with genome skimming to generate nuclear and plastid sequence data across the family, with the goal of assessing the monophyly of genera proposed by Brožová et al. (2022) and Proćków and Záveská Drábková (2023). Data were generated from 81 species of Juncaceae plus outgroup taxa in Cyperaceae and Thurniaceae. DNA was extracted from herbarium specimens and silica-dried field collections, and additional sequence reads were downloaded from publicly available sequence read archives. Sequence reads were processed using HybPiper version 2.1.6 to extract Angiosperms 353 nuclear loci and plastid coding sequences. Alignments were generated using MAFFT version 7 (L-INS-i) and trimmed using trimAl version 1.4.rev15. Three separate datasets were analyzed: (1) Angiosperms 353 nuclear loci, (2) plastid coding sequences from off-target reads, and (3) an extended plastid dataset combining our plastid data with publicly available matK, rbcL, rpoC1, and trnA-psbA sequences downloaded from GenBank.
Individual sequences within alignment files are labeled with species names, accession numbers, or sample codes, all of which can be cross-referenced using the file sample_information.csv.
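The cross-referencing described above can be sketched in Python as follows. This is a minimal illustration, not part of the authors' pipeline: the function names are hypothetical, the sample codes in the demo are invented, and it assumes that a FASTA label matches the Tube_lab column exactly. Labels that are species names or accession numbers would need to be matched against Sci_name or Accession instead.

```python
import csv
import io


def read_fasta_labels(handle):
    """Return the sequence labels (text after '>') in file order."""
    return [line[1:].strip() for line in handle if line.startswith(">")]


def load_sample_table(handle, key="Tube_lab"):
    """Index the rows of sample_information.csv by the chosen key column."""
    return {row[key]: row for row in csv.DictReader(handle)}


def cross_reference(labels, table):
    """Map each label to its Sci_name, or None when no row matches."""
    return {
        label: (table[label]["Sci_name"] if label in table else None)
        for label in labels
    }


# In-memory demo with invented sample codes (assumption: not real data).
demo_fasta = io.StringIO(">S01\nACGT\n>S02\nAC-T\n")
demo_table = io.StringIO("Tube_lab,Sci_name\nS01,Juncus effusus\n")
matches = cross_reference(
    read_fasta_labels(demo_fasta), load_sample_table(demo_table)
)
print(matches)
```

With real files, the two `io.StringIO` objects would be replaced by `open(...)` handles on an alignment file and on sample_information.csv. Labels that come back as `None` are the ones to check against the other identifier columns.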
Files and variables
File: 353_gene_fastas.zip
Description: A zipped folder containing 199 individual aligned and trimmed FASTA files, one per locus, for the Angiosperms 353 nuclear gene analysis. Each file represents a single nuclear locus that passed filtering criteria (present in at least 50% of taxa, no paralog warnings). Alignments were generated using MAFFT version 7 (L-INS-i) and trimmed using trimAl with the -gappyout parameter. Sequences within each file are labeled by sample code. These alignments were used as input for individual gene tree estimation with IQ-TREE2 and subsequently summarized using ASTRAL III to produce the multispecies coalescent tree (Figure 1 in the associated publication). The dataset includes 70 ingroup species and 37 outgroup taxa, with 14,981 individual sequences across all loci, 127,894 alignment sites, 81% variable sites, 65% parsimony informative sites, and 6% missing data.
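Summary figures such as the proportions of variable sites, parsimony-informative sites, and missing data can be recomputed for any alignment in this dataset. The sketch below uses one common set of definitions and is not necessarily how the reported values were calculated. The function name is hypothetical, sequences are assumed to be already parsed into a dictionary, and treating gaps, N, X, and ? as missing is an assumption.

```python
from collections import Counter

# Characters counted as missing data (assumption: gaps and fully ambiguous codes).
MISSING = set("-?NX")


def alignment_stats(seqs):
    """Summarize an alignment given as {label: aligned sequence}."""
    rows = [s.upper() for s in seqs.values()]
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("sequences are not aligned to the same length")

    variable = informative = missing = 0
    for column in zip(*rows):
        counts = Counter(c for c in column if c not in MISSING)
        missing += len(column) - sum(counts.values())
        # Variable: at least two different observed states in the column.
        if len(counts) >= 2:
            variable += 1
        # Parsimony informative: at least two states, each seen in >= 2 sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1

    return {
        "sites": length,
        "variable": variable,
        "informative": informative,
        "missing_fraction": missing / (length * len(rows)),
    }
```

Partially ambiguous IUPAC codes (for example R or Y) are counted here as ordinary states, which can slightly inflate the variable-site count relative to tools that resolve them.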
File: 353_iqtree_genes_cat
Description: A single concatenated FASTA file containing all 199 Angiosperms 353 nuclear loci, produced by concatenating the individual trimmed gene alignments from 353_gene_fastas.zip. This file was used as input for the concatenated maximum likelihood analysis in IQ-TREE2 version 2.2.6. The matrix was initially partitioned by gene, and ModelFinder was used to merge partitions and select substitution models. Clade support was assessed using 1000 ultrafast bootstraps. Sequences are labeled by sample code.
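Concatenating per-locus alignments into one matrix with by-gene partition boundaries can be sketched as below. The tool the authors used for this step is not stated here, so this only illustrates the general technique. The function name is hypothetical, and padding absent samples with gap characters is an assumption.

```python
def concatenate(alignments):
    """Join per-locus alignments into one supermatrix.

    alignments: {locus_name: {label: aligned sequence}}
    Returns (matrix, partitions), where matrix maps each label to its
    concatenated sequence and partitions lists (locus, start, end) with
    1-based inclusive coordinates.
    """
    taxa = sorted({label for aln in alignments.values() for label in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for locus, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"locus {locus} is not aligned")
        length = lengths.pop()
        for taxon in taxa:
            # Samples missing from this locus are padded with gaps.
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((locus, start, start + length - 1))
        start += length
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions
```

The partition tuples carry the information a by-gene partition file needs (locus name plus start and end columns). The gap padding is also one reason concatenated matrices show more missing data than the individual loci.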
File: plast_genes.zip
Description: A zipped folder containing 59 individual aligned and trimmed FASTA files, one per locus, for the plastid analysis. Each file represents a single plastid coding sequence extracted from off-target reads using HybPiper with a target file composed of 87 coding sequences from the Juncus effusus chloroplast genome (GenBank accession MW366789.1). Loci present in fewer than 50% of taxa were excluded. Alignments were generated using MAFFT version 7 (L-INS-i) and trimmed using trimAl with the -gappyout parameter. The dataset includes 81 ingroup species and 32 outgroup taxa. On average, locus alignments contained sequences from 86 individuals and 2,729 sites, with 25% parsimony informative sites. Sequences are labeled by taxon name.
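The 50% occupancy threshold applied to both the nuclear and plastid loci can be expressed as a simple filter. The sketch below is an assumption about how such a filter might look, not the authors' script. The function name is hypothetical, and a sample is counted as present only if its sequence contains at least one non-gap character.

```python
def filter_by_occupancy(alignments, n_taxa, min_fraction=0.5):
    """Keep loci with sequence data for at least min_fraction of n_taxa.

    alignments: {locus_name: {label: aligned sequence}}
    """
    kept = {}
    for locus, aln in alignments.items():
        # Count samples that have real data, ignoring all-gap rows.
        present = sum(1 for seq in aln.values() if seq.strip("-"))
        if present / n_taxa >= min_fraction:
            kept[locus] = aln
    return kept
```

Here `n_taxa` is the total number of samples in the study rather than the number of sequences in a given file, so the same denominator is used for every locus.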
File: plast_iq-concat
Description: A single concatenated FASTA file containing all 59 plastid loci, produced by concatenating the individual trimmed gene alignments from plast_genes.zip. This file was used as input for the concatenated maximum likelihood analysis in IQ-TREE2 version 2.2.6. One partition was used and ModelFinder selected the best-fit substitution model. Clade support was assessed using 1000 ultrafast bootstraps (Figure 2 in the associated publication). The alignment contains 174,096 sites, 44% variable sites, 25% parsimony informative sites, and 22% missing data. Sequences are labeled by sample code.
File: plast_maxtips_aln_trim.zip
Description: A zipped folder containing 59 individual aligned and trimmed FASTA files for the extended plastid analysis. This dataset combines our plastid sequences with publicly available matK, rbcL, rpoC1, and trnA-psbA sequences downloaded from GenBank using the R package phylotaR version 1.3.0. The dataset includes 730 samples representing 226 ingroup taxa (51% of species in Juncaceae), plus outgroup taxa. Alignments were generated using MAFFT version 7 (L-INS-i) and trimmed using trimAl with the -automated1 parameter. Recovery varied across loci: rbcL (689 sequences), matK (263), rpoC1 (189), and trnA-psbA (149). The alignment has 87% missing data because most publicly downloaded samples are represented only by the four commonly sequenced markers. Sample names and GenBank accession numbers for downloaded sequences are listed in Appendix S1 of the associated publication. Sequences are labeled by sample code.
File: plast_maxtips_aln_trim_concat
Description: A single concatenated FASTA file containing all 59 loci from the extended plastid analysis, produced by concatenating the individual trimmed gene alignments from plast_maxtips_aln_trim.zip. This file was used as input for the concatenated maximum likelihood analysis in IQ-TREE2 version 2.2.6. One partition was used and ModelFinder selected the best-fit substitution model. Clade support was assessed using 1000 ultrafast bootstraps (Figure 3 in the associated publication). The alignment contains 53,441 sites, 66% variable sites, 39% parsimony informative sites, and 87% missing data. Sequences are labeled by sample code.
File: sample_information.csv
Description: A CSV file containing the sample codes (Tube_lab), species names, accession numbers, and herbarium voucher information, for use in cross-referencing sequence labels.
Variables:
- Herbarium: Herbarium code where the voucher specimen is stored, if applicable
- Sci_name: Scientific name of species sampled
- Voucher: Herbarium voucher information
- Source: Herbarium sample, silica dried tissue, or sequence download
- Accession: SRA accession number if applicable
- GenesWithSeqs_353: Number of sequences recovered for the Angiosperms 353 regions (NA = missing data)
- GenesWithSeqs_plast: Number of sequences recovered for the plastid regions
- Tube_lab: Sample code used for DNA extraction and analysis scripts
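The variables listed above can be read with Python's standard csv module. The following sketch averages locus recovery per tissue source while skipping NA entries. The function names are hypothetical, and it assumes that the column headers match the variable names listed here and that missing values are written exactly as NA.

```python
import csv
from collections import defaultdict


def parse_count(value):
    """Convert a recovery count to int, returning None for NA or blank."""
    value = (value or "").strip()
    return None if value in ("", "NA") else int(value)


def mean_recovery_by_source(handle, column="GenesWithSeqs_353"):
    """Average the chosen count column for each value of Source."""
    counts = defaultdict(list)
    for row in csv.DictReader(handle):
        n = parse_count(row[column])
        if n is not None:  # skip samples with no data for this column
            counts[row["Source"]].append(n)
    return {source: sum(v) / len(v) for source, v in counts.items()}
```

Passing `column="GenesWithSeqs_plast"` gives the same summary for the plastid regions. For example, `mean_recovery_by_source(open("sample_information.csv", newline=""))` would run it on the file in this dataset.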
Code/software
Documentation for reproducing the analyses from these data can be found at https://github.com/reedjohnkenny/Juncaceae_353_analysis/tree/master/
Key software used in generating these data files:
- HybPiper version 2.1.6 (Johnson et al., 2016): extraction and assembly of target loci from sequencing reads
- MAFFT version 7 (Katoh and Standley, 2013): multiple sequence alignment (L-INS-i method)
- trimAl version 1.4.rev15 (Capella-Gutiérrez et al., 2009): alignment trimming (-gappyout for 353 and plastid datasets; -automated1 for extended plastid dataset)
- IQ-TREE2 version 2.2.6 (Minh et al., 2020): maximum likelihood tree inference and ModelFinder model selection
- ASTRAL III (Zhang et al., 2018): multispecies coalescent species tree estimation from gene trees
- TreeShrink version 1.3.7 (Mai and Mirarab, 2018): detection and removal of outlier long branches in gene trees
- SPAdes (Bankevich et al., 2012): de novo assembly of loci
- Trimmomatic (Bolger et al., 2014): read trimming and adapter removal
- phylotaR version 1.3.0 (Bennett et al., 2018): retrieval of orthologous sequences from GenBank
Access information
Other publicly accessible locations of the data:
Raw demultiplexed and trimmed sequence reads can be found at GenBank Project PRJNA1196610. Voucher information and accession numbers are provided in Appendix 1 of the associated publication. Additional publicly available sequences used in the extended plastid analysis were downloaded from GenBank and the European Nucleotide Archive; accession numbers are listed in Appendix S1 of the associated publication.
