This README.txt file was generated on 23 October, 2020 by Joel Nitta

-------------------
GENERAL INFORMATION
-------------------

Title of Dataset: Data from: A taxonomic and molecular survey of the pteridophytes of the Nectandra Cloud Forest Reserve, Costa Rica

Author Information

Principal Investigator: Joel H. Nitta
Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan
joelnitta@gmail.com

Associate or Co-investigator: Atsushi Ebihara
Department of Botany, National Museum of Nature and Science, 4-1-1 Amakubo, Tsukuba 305-0005, Japan
ebihara@kahaku.go.jp

Associate or Co-investigator: Alan R. Smith
University Herbarium, University of California, Berkeley. 1001 Valley Life Sciences Bldg. #2465. Berkeley, California, 94720, U.S.A.
arsmith@berkeley.edu

Date of data collection: 2008–2018

Geographic location of data collection: Nectandra Cloud Forest Reserve, Costa Rica

Information about funding sources or sponsorship that supported the collection of the data: Funding provided in part by the Nectandra Institute and Japan Society for the Promotion of Science (Kakenhi grant no. 15K07204)

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

Licenses/restrictions placed on the data, or limitations of reuse: CC0 1.0 Universal (CC0 1.0)

Recommended citation for the data: Nitta JH, Ebihara A, Smith AR (2020) Data from: A taxonomic and molecular survey of the pteridophytes of the Nectandra Cloud Forest Reserve, Costa Rica. Dryad Digital Repository. https://doi.org/10.5061/dryad.bnzs7h477

Citation for and links to publications that cite or use the data: Nitta JH, Ebihara A, Smith AR (2020) A taxonomic and molecular survey of the pteridophytes of the Nectandra Cloud Forest Reserve, Costa Rica. PLoS ONE.
FIXME: ADD DOI WHEN AVAILABLE

Code for analyzing the data is available on GitHub: https://github.com/joelnitta/nectandra_ferns

--------------------
DATA & FILE OVERVIEW
--------------------

File list (filenames, directory structure (for zipped files) and brief description of all data files):

• costa_rica_richness.csv: Data on species richness of pteridophytes in protected areas in Costa Rica.
• cyatheaceae_rbcL.fasta: Aligned rbcL sequences of family Cyatheaceae from the Nectandra Cloud Forest Reserve, Costa Rica and all available sequences on GenBank in FASTA format.
• cyatheaceae_rbcL.tre: Phylogenetic tree of family Cyatheaceae from the Nectandra Cloud Forest Reserve, Costa Rica and all available rbcL sequences on GenBank in Newick format.
• grammitidoideae_rbcL.fasta: Aligned rbcL sequences of subfamily Grammitidoideae from the Nectandra Cloud Forest Reserve, Costa Rica and all available sequences on GenBank in FASTA format.
• grammitidoideae_rbcL.tre: Phylogenetic tree of subfamily Grammitidoideae from the Nectandra Cloud Forest Reserve, Costa Rica and all available rbcL sequences on GenBank in Newick format.
• JNG4254.fasta: DNA sequence in FASTA format of rbcL gene from Amauropelta atrovirens (C. Chr.) Salino & T.E. Almeida (Nitta 2237).
• nectandra_gb_template.sbt: Plain text file (submit-block object) containing metadata related to GenBank submission.
• nectandra_DNA_accessions.csv: DNA accession numbers and specimen accession numbers of pteridophytes from the Nectandra Cloud Forest Reserve, Costa Rica.
• nectandra_rbcL.fasta: Newly generated rbcL sequences of pteridophytes from the Nectandra Cloud Forest Reserve, Costa Rica in FASTA format.
• nectandra_rbcL.phy: Aligned rbcL sequences of pteridophytes from the Nectandra Cloud Forest Reserve, Costa Rica in PHYLIP format.
• nectandra_rbcL.treefile: Phylogenetic tree of pteridophytes from the Nectandra Cloud Forest Reserve, Costa Rica in Newick format.
• nectandra_specimens.csv: Specimen data of pteridophytes from the Nectandra Cloud Forest Reserve, Costa Rica collected by Joel Nitta.
• ppgi_taxonomy.csv: Taxonomic system of Pteridophyte Phylogeny Group I (2016) for pteridophytes at the genus level and above.
• seqids.txt: Newly assigned GenBank accession numbers for sequences generated by this project.

Additional related data collected that was not included in the current data package:

• rbcL_clean_sporos.fasta: rbcL sequences of pteridophytes of Moorea, French Polynesia [Nitta et al. (2017); https://doi.org/10.5061/dryad.df59g].
• ESM1.csv: A list of native fern and lycophyte taxa (species, subspecies and varieties; 721 taxa total) in Japan [Ebihara and Nitta (2019); https://doi.org/10.5061/dryad.4362p32].
• FernGreenListV1.01E.xls: List of Japanese fern and lycophyte species including scientific name, endemic status, conservation status, and other taxonomic data [Ebihara and Nitta (2019); https://doi.org/10.5061/dryad.4362p32].
• rbcl_mrbayes.nex: NEXUS file used for phylogenetic analysis of Japanese fern and lycophyte taxa with MrBayes [Ebihara and Nitta (2019); https://doi.org/10.5061/dryad.4362p32].

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

Description of methods used for collection/generation of data:

Surveys of pteridophytes (i.e., ferns and lycophytes) were carried out over three field seasons (January 2008, 2011, and 2013; 37 days total) at the Nectandra Cloud Forest Reserve, Costa Rica. Most specimens were collected along trails through the reserve. Epiphytes were collected from fallen trees or tree branches, or up to 2 m on tree trunks. Permits for collection were obtained from the Costa Rican government (SINAC No. 04941 and Cites 2014-CR 1006/SJ (#S 1045)). The first set of voucher specimens was deposited at UC, with duplicates at CR, GH, and TI. Herbarium codes follow Thiers (2020). Leaf tissue was preserved on silica gel for DNA extraction.
Spores of selected taxa were observed with a standard compound light microscope.

DNA was extracted with the DNeasy Plant Mini Kit following the manufacturer’s protocol (Qiagen). One specimen per taxon was sampled for morphologically distinct taxa, and up to five specimens per taxon for taxa that are more difficult to identify using standard keys and morphological characters. The plastid rbcL gene was amplified using the PCR primers and thermocycler settings of Schuettpelz and Pryer (2007). PCR products were purified with Exo-STAR enzyme (GE Healthcare) and sequenced using the BigDye Terminator v3.1 Cycle Sequencing Kit (ThermoFisher) with two internal primers, ESRBCL654R and ESRBCL628F (Schuettpelz and Pryer 2007), in addition to the amplification primers. The resulting AB1 trace files were imported into Geneious (Kearse et al. 2012) and assembled into contigs, and the consensus sequences were exported in FASTA format.

A multi-sequence alignment was generated using MAFFT (Katoh et al. 2002), and a phylogenetic tree was inferred using IQ-TREE with automatic model selection (Nguyen et al. 2015). For a small number of genera that were not supported as monophyletic in the original phylogenetic analysis (Cyathea and Lellingeria), all available rbcL sequences for closely related taxa (at the family or subfamily level, respectively) were downloaded from GenBank and aligned in combination with the newly generated sequences from Nectandra with MAFFT, and a phylogenetic tree was inferred using FastTree on default settings (Price, Dehal, and Arkin 2009, 2010). Molecular analysis was performed under permits R-CM-RN-001-2014-OT-CONAGEBIO and R-CM-RN-002-2017-OT-CONAGEBIO.

For additional methodological details, see Nitta JH, Ebihara A, Smith AR (2020).

--------------------------
DATA-SPECIFIC INFORMATION
--------------------------

costa_rica_richness.csv: Data on species richness of pteridophytes in protected areas in Costa Rica. Compiled by Joel Nitta based on references in the “citation” column.
Number of variables: 12
Number of cases/rows: 6
Variable list:
• name: Abbreviated name of site.
• full_name: Full name of site.
• min_el_m: Minimum elevation of site in meters.
• max_el_m: Maximum elevation of site in meters.
• area_ha: Area of site in hectares.
• richness: Number of species occurring at the site.
• richness_per_ha: Number of species per hectare occurring at the site.
• holdridge_type: Holdridge (1967) life-zone type.
• citation: Reference for data.
• citation_number: Reference number in manuscript.
• latitude: Latitude in decimal-degrees.
• longitude: Longitude in decimal-degrees.
Missing data codes: Missing data have no values (nothing entered between commas in the CSV file).
Specialized formats or other abbreviations used: None.

--------------------------

cyatheaceae_rbcL.fasta: Aligned rbcL sequences of family Cyatheaceae from the Nectandra Cloud Forest Reserve, Costa Rica and all available sequences on GenBank in FASTA format. Species from Nectandra in family Dicksoniaceae included as outgroup. Sequences aligned using MAFFT (Katoh et al. 2002). Numbers after species names are GenBank accession numbers for sequences downloaded from GenBank or J. H. Nitta specimen collection numbers for sequences newly obtained by this study. 323 sequences; 1309 bp; 185 parsimony-informative sites.

--------------------------

cyatheaceae_rbcL.tre: Phylogenetic tree of family Cyatheaceae from the Nectandra Cloud Forest Reserve, Costa Rica and all available rbcL sequences on GenBank in Newick format. Species from Nectandra in family Dicksoniaceae included as outgroup. Tree inferred using FastTree (Price, Dehal, and Arkin 2009, 2010). Numbers after species names are GenBank accession numbers for sequences downloaded from GenBank or J. H. Nitta specimen collection numbers for sequences newly obtained by this study. Numbers at nodes indicate local support values computed with the Shimodaira–Hasegawa test. 323 tips; 281 internal nodes.
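
The sequence counts and alignment lengths reported for the FASTA alignments (for example, 323 sequences and 1309 bp for cyatheaceae_rbcL.fasta) can be checked with a short script. The following is a minimal sketch in Python, not part of the original analysis code; the function names read_fasta and alignment_summary and the two example records are hypothetical and chosen only for illustration.

```python
from io import StringIO


def read_fasta(handle):
    """Return a list of (name, sequence) tuples from an open FASTA file."""
    records = []
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def alignment_summary(records):
    """Return the number of sequences and the sorted distinct sequence lengths."""
    lengths = sorted({len(seq) for _, seq in records})
    return len(records), lengths


# Made-up two-sequence alignment (hypothetical names, not from the dataset).
example = StringIO(">Taxon_a_AB000001\nATG-C\n>Taxon_b_Nitta_1\nATGGC\n")
n_seqs, lengths = alignment_summary(read_fasta(example))
print(n_seqs, lengths)
```

To apply this to a real file, replace the StringIO object with an open file handle, e.g. open("cyatheaceae_rbcL.fasta"). An aligned file should yield a single length; according to this README, that would be 323 sequences of 1309 bp for the Cyatheaceae alignment.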

--------------------------

grammitidoideae_rbcL.fasta: Aligned rbcL sequences of subfamily Grammitidoideae from the Nectandra Cloud Forest Reserve, Costa Rica and all available sequences on GenBank in FASTA format. Sequences aligned using MAFFT (Katoh et al. 2002). Species from Nectandra in subfamily Polypodioideae included as outgroup. Numbers after species names are GenBank accession numbers for sequences downloaded from GenBank or J. H. Nitta specimen collection numbers for sequences newly obtained by this study. 751 sequences; 1314 bp; 441 parsimony-informative sites.

--------------------------

grammitidoideae_rbcL.tre: Phylogenetic tree of subfamily Grammitidoideae from the Nectandra Cloud Forest Reserve, Costa Rica and all available rbcL sequences on GenBank in Newick format. Species from Nectandra in subfamily Polypodioideae included as outgroup. Tree inferred using FastTree (Price, Dehal, and Arkin 2009, 2010). Numbers after species names are GenBank accession numbers for sequences downloaded from GenBank or J. H. Nitta specimen collection numbers for sequences newly obtained by this study. Numbers at nodes indicate local support values computed with the Shimodaira–Hasegawa test. 751 tips; 676 internal nodes.

--------------------------

JNG4254.fasta: DNA sequence in FASTA format of rbcL gene from Amauropelta atrovirens (C. Chr.) Salino & T.E. Almeida (Nitta 2237).

--------------------------

nectandra_DNA_accessions.csv: DNA accession numbers and specimen accession numbers of pteridophytes from the Nectandra Cloud Forest Reserve, Costa Rica.

Number of variables: 2
Number of cases/rows: 235
Variable list:
• genomic_id: Genomic accession number assigned during DNA extraction, of the form “JNG” plus a four-digit number. Unique values.
• specimen_id: Specimen accession number assigned to each specimen in nectandra_specimens.csv. Integer (not unique).
Missing data codes: No missing data.
Specialized formats or other abbreviations used: None.
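
The constraints stated for nectandra_DNA_accessions.csv (genomic_id is “JNG” plus a four-digit number and unique; specimen_id is an integer and may repeat) can be verified programmatically. The following is a minimal sketch in Python using only the standard library; the function name check_accessions and the example rows are hypothetical, and any columns other than genomic_id and specimen_id are ignored.

```python
import csv
import re
from collections import Counter
from io import StringIO

# "JNG" followed by exactly four digits, as described in this README.
GENOMIC_ID = re.compile(r"JNG\d{4}")


def check_accessions(handle):
    """Return a list of human-readable problems found in the accession table."""
    rows = list(csv.DictReader(handle))
    problems = []
    for row in rows:
        if not GENOMIC_ID.fullmatch(row["genomic_id"]):
            problems.append(f"bad genomic_id format: {row['genomic_id']}")
        if not row["specimen_id"].isdigit():
            problems.append(f"non-integer specimen_id: {row['specimen_id']}")
    # genomic_id must be unique; specimen_id is allowed to repeat.
    counts = Counter(row["genomic_id"] for row in rows)
    for genomic_id, n in counts.items():
        if n > 1:
            problems.append(f"duplicate genomic_id: {genomic_id}")
    return problems


# Made-up rows for illustration only (not taken from the dataset).
example = StringIO(
    "genomic_id,specimen_id\n"
    "JNG0001,10\n"
    "JNG0002,10\n"
    "JNG12,11\n"
)
problems = check_accessions(example)
print(problems)
```

For the real file, pass open("nectandra_DNA_accessions.csv", newline="") instead of the StringIO object; an empty list means no problems were found.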

--------------------------

nectandra_gb_template.sbt: Plain text file (submit-block object) containing metadata related to GenBank submission (author names and contact information). Generated using template at https://submit.ncbi.nlm.nih.gov/genbank/template/submission/

Specialized formats or other abbreviations used: Submit-block object format.

--------------------------

nectandra_rbcL.fasta: Newly generated rbcL sequences of pteridophytes from the Nectandra Cloud Forest Reserve, Costa Rica in FASTA format. All species included occur at the Nectandra Cloud Forest Reserve, Costa Rica; a small number of sequences are from specimens collected elsewhere. Sequence names correspond to ‘genomic_id’ in nectandra_DNA_accessions.csv. 186 sequences; shortest sequence 466 bp; longest sequence 1309 bp; mean sequence length 1292 bp. Exported from Geneious project folder “Clean Sporos Trimmed Genbank Submission” (raw Geneious project file not included in this dataset).

--------------------------

nectandra_rbcL.phy: Aligned rbcL sequences of pteridophytes from the Nectandra Cloud Forest Reserve, Costa Rica in PHYLIP format. 191 sequences; 1309 bp; 591 parsimony-informative sites.

--------------------------

nectandra_rbcL.treefile: Phylogenetic tree of pteridophytes from the Nectandra Cloud Forest Reserve, Costa Rica in Newick format inferred with IQ-TREE (Nguyen et al. 2015). Values at each node indicate SH-aLRT support (%) / UFboot support (%). 191 tips; 189 internal nodes.

--------------------------

nectandra_specimens.csv: Specimen data of pteridophytes from the Nectandra Cloud Forest Reserve, Costa Rica collected by Joel Nitta. Encoding: UTF-8.

Number of variables: 23
Number of cases/rows: 320
Variable list:
• specimen_id: Unique specimen identification number (integer).
• specimen: Voucher specimen number.
• genus: Genus.
• specific_epithet: Specific epithet.
• infraspecific_rank: Infraspecific rank.
• infraspecific_name: Infraspecific name.
• certainty: Degree of taxonomic certainty if not completely certain.
• species: Species (genus plus specific epithet).
• taxon: Species plus infraspecific name.
• scientific_name: Taxon plus its author.
• author: Author of the species.
• var_author: Author of the variety.
• country: Country of origin.
• locality: General area of collection.
• site: Specific site where collected.
• observations: Observations about specimen.
• elevation: Elevation in m.
• latitude: Latitude in decimal-degrees.
• longitude: Longitude in decimal-degrees.
• collector: Name of collector.
• other_collectors: Names of other collectors if present.
• herbaria: Codes of herbaria where voucher specimens are lodged.
• date_collected: Date collected in YYYY-MM-DD format.
Missing data codes: Missing or non-applicable data have no values (nothing entered between commas in the CSV file).
Specialized formats or other abbreviations used: Herbaria codes follow Index Herbariorum (Thiers 2020), except for “Nectandra”, which indicates the private herbarium at the Nectandra Cloud Forest Reserve.

--------------------------

ppgi_taxonomy.csv: Taxonomic system of Pteridophyte Phylogeny Group I (2016) for pteridophytes at the genus level and above. Updated with one new genus (Hiya).

Number of variables: 6
Number of cases/rows: 338
Variable list:
• class: Class.
• order: Order.
• suborder: Suborder.
• family: Family.
• subfamily: Subfamily.
• genus: Genus.
Missing data codes: Non-applicable data have no values (nothing entered between commas in the CSV file).
Specialized formats or other abbreviations used: None.

--------------------------

seqids.txt: Newly assigned GenBank accession numbers for sequences generated by this project. Received via email from GenBank admin (gb-admin@ncbi.nlm.nih.gov) 2020-10-21. Tab-separated text file without column names.
Number of variables: 2
Number of cases/rows: 186
Variable list:
• (first column): Name of sequence submission file followed by genomic ID number, separated by a space.
• (second column): GenBank accession number.
Missing data codes: No missing data.
Specialized formats or other abbreviations used: None.

--------------------------

CHANGE LOG

---

2020-10-23

costa_rica_richness.csv: Change richness for Nectandra from 176 to 175 after excluding a single non-native species, Macrothelypteris torresiana. Accordingly, change richness_per_ha for Nectandra from 1.113924 to 1.107595. Change reference numbers to reflect updated reference numbers in MS.

cyatheaceae_rbcL.fasta: The previous version mistakenly contained sequences of Grammitidoideae rather than Cyatheaceae. Replace with sequences of Cyatheaceae.

cyatheaceae_rbcL.tre: Update tree file after re-running phylogenetic analysis.

grammitidoideae_rbcL.fasta: Change name of sequence “Mycopteris_taxifolia_Nitta_707” to “Mycopteris_costaricensis_Nitta_707”.

grammitidoideae_rbcL.tre: Update tree file after re-running phylogenetic analysis.

nectandra_DNA_accessions.csv: Add GenBank accession numbers for sequences newly generated by this study (those starting with “MW”).

nectandra_gb_template.sbt: Newly added file.

nectandra_rbcL.fasta: Remove two sequences (“JNG3448”, “JNG3479”) that were excluded from the final analysis.

nectandra_rbcL.phy: Update alignment file after re-running phylogenetic analysis.

nectandra_rbcL.treefile: Update tree file after re-running phylogenetic analysis.

nectandra_specimens.csv: Add columns “author” (author of the species) and “var_author” (author of the variety). Change Mycopteris taxifolia (L.) Sundue to Mycopteris costaricensis (Rosenst.) Sundue. Change all instances of “TNS” in “herbaria” column to “TI”. Change value of “certainty” for “Polyphlebium sp1” (Nitta 123) from “aff” to nothing (NA entry).

README.txt: Update README with these changes.

seqids.txt: Newly added file.
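
The change in richness_per_ha recorded in the change log follows from dividing richness by area_ha. The following Python sketch reproduces the two reported values. Note that the area of 158 ha used here is an assumption inferred from the reported ratios (176 / 1.113924 is approximately 158); it is not stated in this README and should be confirmed against the area_ha column of costa_rica_richness.csv. The function name richness_per_ha is hypothetical and simply mirrors the column name.

```python
def richness_per_ha(richness, area_ha):
    """Number of species per hectare."""
    return richness / area_ha


# Assumption: area inferred from the ratios reported in the change log,
# not stated in this README.
AREA_HA = 158

old_value = round(richness_per_ha(176, AREA_HA), 6)  # before excluding the non-native species
new_value = round(richness_per_ha(175, AREA_HA), 6)  # after excluding it
print(old_value, new_value)  # 1.113924 1.107595
```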

--------------------------

REFERENCES

Ebihara, Atsushi, and Joel H. Nitta. 2019. “An Update and Reassessment of Fern and Lycophyte Diversity Data in the Japanese Archipelago.” Journal of Plant Research 132 (6): 723–38. https://doi.org/10.1007/s10265-019-01137-3.

Holdridge, L. R. 1967. Life Zone Ecology. San José, Costa Rica: Tropical Science Center.

Katoh, Kazutaka, Kazuharu Misawa, Keiichi Kuma, and Takashi Miyata. 2002. “MAFFT: A Novel Method for Rapid Multiple Sequence Alignment Based on Fast Fourier Transform.” Nucleic Acids Research 30 (14): 3059–66. https://doi.org/10.1093/nar/gkf436.

Kearse, Matthew, Richard Moir, Amy Wilson, Steven Stones-Havas, Matthew Cheung, Shane Sturrock, Simon Buxton, et al. 2012. “Geneious Basic: An Integrated and Extendable Desktop Software Platform for the Organization and Analysis of Sequence Data.” Bioinformatics 28 (12): 1647–9. https://doi.org/10.1093/bioinformatics/bts199.

Nguyen, Lam-Tung, Heiko A. Schmidt, Arndt von Haeseler, and Bui Quang Minh. 2015. “IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies.” Molecular Biology and Evolution 32 (1): 268–74. https://doi.org/10.1093/molbev/msu300.

Nitta, Joel H., Jean-Yves Meyer, Ravahere Taputuarai, and Charles C. Davis. 2017. “Life Cycle Matters: DNA Barcoding Reveals Contrasting Community Structure Between Fern Sporophytes and Gametophytes.” Ecological Monographs 87 (2): 278–96. https://doi.org/10.1002/ecm.1246.

Price, Morgan N., Paramvir S. Dehal, and Adam P. Arkin. 2009. “FastTree: Computing Large Minimum Evolution Trees with Profiles Instead of a Distance Matrix.” Molecular Biology and Evolution 26 (7): 1641–50. https://doi.org/10.1093/molbev/msp077.

Price, Morgan N., Paramvir S. Dehal, and Adam P. Arkin. 2010. “FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments.” PLoS ONE 5 (3): e9490. https://doi.org/10.1371/journal.pone.0009490.

Pteridophyte Phylogeny Group I. 2016. “A Community-Derived Classification for Extant Lycophytes and Ferns.” Journal of Systematics and Evolution 54 (6): 563–603. https://doi.org/10.1111/jse.12229.

Schuettpelz, Eric, and Kathleen M. Pryer. 2007. “Fern Phylogeny Inferred from 400 Leptosporangiate Species and Three Plastid Genes.” Taxon 56 (4): 1037–50. https://doi.org/10.2307/25065903.

Thiers, Barbara. 2020. “Index Herbariorum: A Global Directory of Public Herbaria and Associated Staff.” NYBG Steere Herbarium. http://sweetgum.nybg.org/science/ih/.