Genomic diversity and population structure of teosintes (Zea spp.) and its conservation implications
Data files
Sep 20, 2023 version files 522.53 MB
- meta_T3604_33929_all.txt
- README.md
- T3604_33929_all.bed
- T3604_33929_all.bim
- T3604_33929_all.fam
- T3604_33929_all.map
- T3604_33929_all.ped
Abstract
The wild species of the genus Zea, commonly named teosintes, comprise nine different taxa distributed from northern Mexico to Costa Rica. Although this genus has been extensively studied from morphological, ecogeographical and genetic points of view, most contributions have been limited to a few populations and taxa. To understand the great variability that exists between and within teosinte species, it is necessary to include the vast majority of known populations. In this context, the objective of this work was to evaluate the genomic diversity and structure of 276 teosinte populations. Molecular analyses were performed with 3,604 plants and data from 33,929 SNPs. The levels of genetic diversity by taxonomic group show a marked difference between species, races and sections; the highest values of genomic diversity were found in ssp. parviglumis and ssp. mexicana, and the lowest in section Luxuriantes and in ssp. huehuetenangensis of section Zea. The structure results show strong genetic differentiation among all the taxonomic groups considered. For ssp. parviglumis and ssp. mexicana, the taxa with the largest number of populations, a marked genomic differentiation was found that is consistent with their geographic distribution patterns. These results showed a loss of diversity in several teosinte populations, making a strong case for further collection and for ex situ and in situ conservation. This study also highlights the importance of integrating genomic diversity and structure into conservation and management applications.
README: Genomic diversity and population structure of teosintes (Zea spp.) and its conservation implications
https://doi.org/10.5061/dryad.2547d7wxp
The Mexican Agreement on the Determination of Maize's Centers of Origin and Diversity states that research should be carried out to characterize and monitor the genetic diversity of maize wild relatives at the population level. The Teosintes Monitoring Program (https://biodiversidad.gob.mx/genes/monitoreo-teocintles) was conceived to fulfill this task by focusing on the closest wild relatives of maize, commonly known as teosintes. The data presented here are part of the Teosintes Monitoring Program.
Plant material for this study was obtained from 276 teosinte populations representing each of the known Zea species and subspecies (except Zea vespertilio, which was recently described and for which no seed samples were available for the present study) and their races, throughout their entire geographical distribution from northern Mexico to western Nicaragua. The accessions were provided by the Instituto de Manejo y Aprovechamiento de los Recursos Fitogenéticos (IMAREFI) of the Centro Universitario de Ciencias Biológicas y Agropecuarias (CUCBA) of the Universidad de Guadalajara, Jalisco, Mexico, and by the International Maize and Wheat Improvement Center (CIMMYT). Plants were grown from seeds in greenhouse conditions at CUCBA, Jalisco, Mexico, during 2014 and 2015.
Description of the data and file structure
File contents are as follows:
T3604_33929_all* (plink files .bed, .bim, .fam, .ped, .map):
Genotyping data in Plink format (.bed, .bim, .fam, .ped, .map)
Individual sample names (e.g. "BIXC_100_10", second column of the .fam file) correspond to the column "Sample_name" in the metadata file (meta_T3604_33929_all.txt). Each sample name is composed of a short alphabetical code for the sampling population (accession) followed by the DNA sample name, separated by "_".
Family IDs (first column of the .fam file) correspond to the epithet of the species or subspecies name (e.g. "parviglumis" for Zea mays ssp. parviglumis, "perennis" for Zea perennis).
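As an illustration of the naming convention above, the following is a minimal Python sketch that reads rows in the standard six-column PLINK .fam layout and splits each sample name into its population code and DNA sample name. It assumes that the population code is everything before the first "_" (the example "BIXC_100_10" suggests the DNA sample name can itself contain "_"). The function names and the example rows are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def parse_fam_line(line):
    """Parse one whitespace-delimited .fam row into the fields used here."""
    fields = line.split()
    family_id, sample_name = fields[0], fields[1]
    # Assumption: the population code is the text before the first "_";
    # the remainder is the DNA sample name, which may contain "_" itself.
    pob_code, dna_sample_code = sample_name.split("_", 1)
    return {
        "family_id": family_id,
        "sample_name": sample_name,
        "pob_code": pob_code,
        "dna_sample_code": dna_sample_code,
    }


def read_fam(path):
    """Read every non-empty row of a .fam file, e.g. T3604_33929_all.fam."""
    with open(path) as handle:
        return [parse_fam_line(line) for line in handle if line.strip()]


# Illustrative rows only; the pairing of codes and epithets is made up.
example_rows = [
    "parviglumis BIXC_100_10 0 0 0 -9",
    "perennis XXXX_200_3 0 0 0 -9",
]
records = [parse_fam_line(row) for row in example_rows]

# Number of samples per family ID (species or subspecies epithet).
taxon_counts = Counter(record["family_id"] for record in records)
```

On the real file, `read_fam("T3604_33929_all.fam")` would be expected to return one record per genotyped plant.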
meta_T3604_33929_all.txt
Metadata of each individual sample.
Column names as follows:
- DNASample_code: sample name of the DNA sample used in wetlab
- Library_plate: id of the library plate where the sample was sequenced
- POB_CODE: Short id of the sampling locality (single accession) from where the sample originated
- POB_NUMBER: Numeric id of the sampling locality (single accession) from where the sample originated
- Accesion: Accession id at the Seed Bank
- Race: Race for Zea mays ssp. parviglumis and Zea mays ssp. mexicana
- Taxon: Species or subspecies
- Locality: Locality description where the accession was sampled
- Municipality: Municipality where the accession was sampled
- State: State where the accession was sampled
- Country: Country where the accession was sampled
- Altitude: Altitude where the accession was sampled, in meters above sea level
- Latitude: Latitude where the accession was sampled, in decimal degrees
- Longitude: Longitude where the accession was sampled, in decimal degrees
- Sampling_date: Year when the accession was sampled
- Sample_name: Sample ID for each of the individual plants that were sequenced, composed of the POB_CODE and the DNASample_code, separated by "_".
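The column list above can be used to index the metadata by Sample_name and to tally plants per population. The sketch below is a minimal example under two assumptions that the README does not state: that the metadata file is tab-delimited and that its header row uses exactly the column names listed. The helper names and the toy rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_metadata(handle, delimiter="\t"):
    """Index metadata rows by Sample_name (tab delimiter is an assumption)."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return {row["Sample_name"]: row for row in reader}


def samples_per_population(metadata):
    """Count individual plants for each (Taxon, POB_CODE) pair."""
    counts = defaultdict(int)
    for row in metadata.values():
        counts[(row["Taxon"], row["POB_CODE"])] += 1
    return dict(counts)


def inconsistent_names(metadata):
    """List Sample_name values that are not POB_CODE + "_" + DNASample_code."""
    return [
        name
        for name, row in metadata.items()
        if name != row["POB_CODE"] + "_" + row["DNASample_code"]
    ]


# Toy table with a subset of the documented columns; values are made up.
toy_table = (
    "DNASample_code\tPOB_CODE\tTaxon\tSample_name\n"
    "100_10\tBIXC\ttaxon_A\tBIXC_100_10\n"
    "100_11\tBIXC\ttaxon_A\tBIXC_100_11\n"
)
metadata = load_metadata(io.StringIO(toy_table))
population_sizes = samples_per_population(metadata)
bad_names = inconsistent_names(metadata)
```

For the real file, the same functions could be applied to a handle opened with `open("meta_T3604_33929_all.txt", newline="")`, and the keys of the result compared with the second column of the .fam file.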
Sharing/Access information
This study is part of the Teosintes Monitoring Program (https://biodiversidad.gob.mx/genes/monitoreo-teocintles). These data were created and made available to help monitor, manage and conserve teosinte genetic diversity.
Code/Software
The sequence data and the genotypic database of SNPs were processed in the Tassel-5-GBS Production Pipeline software, using ZeaGBSv2.7 as the reference draft. Further filtering was done with Plink 1.9.
Methods
Plant material for this study was obtained from 276 teosinte populations representing each of the known Zea species and subspecies (except Zea vespertilio, which was recently described and for which no seed samples were available for the present study) and their races, throughout their entire geographical distribution from northern Mexico to western Nicaragua. The accessions were provided by the Instituto de Manejo y Aprovechamiento de los Recursos Fitogenéticos (IMAREFI) of the Centro Universitario de Ciencias Biológicas y Agropecuarias (CUCBA) of the Universidad de Guadalajara, Jalisco, Mexico, and by the International Maize and Wheat Improvement Center (CIMMYT). The number of individual plants per population was 30 for 20 type populations and 15 for the rest (256 populations). Plants were grown from seeds in greenhouse conditions at CUCBA, Jalisco, Mexico, during 2014 and 2015. The molecular biology work was carried out by the Laboratorio de Genética de la Conservación at the Jardín Botánico of the Instituto de Biología, Universidad Nacional Autónoma de México (UNAM).
Library preparation and sequencing for Genotyping-By-Sequencing (GBS) were performed at the Institute for Genomic Diversity (Cornell University, Ithaca, NY, USA) following a GBS protocol. DNA was digested with ApeKI, a methylation-sensitive restriction enzyme with a 5 base-pair (bp) recognition site. The resulting fragments were ligated to Illumina HiSeq 2500 sequencing adapters and to adapters with sequence barcodes unique to each individual sample. GBS libraries were made in 96-sample plates (96-plex, with 95 samples and one empty cell at a random position).
The sequence data and the genotypic database of SNPs were processed in the Tassel-5-GBS Production Pipeline software, using as reference the draft ZeaGBSv2.7 Production TOPM (Tags On Physical Map), which contains genotypes from a collection of more than 60,000 maize samples. A total of 955,690 SNPs distributed throughout the genome were called, of which 955,120 mapped to chromosomes 1–10 and 570 did not map to any chromosome. These initial SNP data were then filtered in Tassel by: (1) number of reads (Set Low Depth Genos to Missing, with a minimum value of 2); (2) minor allele frequency of at least 5% (MAF > 0.05); and (3) loci present in at least 60% of the individuals. The resulting dataset of 136,212 SNPs went through a second filtering stage in Plink 1.9, keeping only SNPs in approximate linkage equilibrium and loci present in at least 80% of the individuals (--indep-pairwise 50 10 0.2 --geno 0.2). Quality control for teosinte individuals excluded duplicated individuals and the individuals with the most missing data. The final dataset used for downstream analyses and presented here in Plink format includes 33,929 SNPs for 3,604 teosinte plants.
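To make the per-locus thresholds concrete, here is a minimal pure-Python sketch of a minor allele frequency filter and a call-rate filter applied to a toy genotype table. It is not the Tassel or Plink implementation and it does not perform linkage-disequilibrium pruning. The coding of genotypes as 0, 1 or 2 copies of one allele with None for missing calls, the strict MAF > 0.05 boundary, and all names and values are assumptions made for illustration.

```python
def call_rate(genotypes):
    """Fraction of individuals with a non-missing genotype at one locus."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Minor allele frequency computed from the non-missing diploid calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    # Each called individual carries two alleles; g counts one of them.
    allele_freq = sum(called) / (2 * len(called))
    return min(allele_freq, 1 - allele_freq)


def passes_filters(genotypes, min_maf=0.05, min_call_rate=0.8):
    """Keep a locus if it is called often enough and is polymorphic enough."""
    return (
        call_rate(genotypes) >= min_call_rate
        and minor_allele_frequency(genotypes) > min_maf
    )


# Toy loci for ten individuals (made-up values).
toy_loci = {
    "locus_a": [0, 1, 2, 1, 0, 1, 2, 0, 1, 1],            # kept
    "locus_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],            # monomorphic
    "locus_c": [0, 1, None, None, None, 1, 2, 0, 1, 1],   # 70% called
    "locus_d": [0, 1, None, None, 2, 1, 2, 0, 1, 1],      # 80% called
}
kept_loci = [name for name, calls in toy_loci.items() if passes_filters(calls)]
```

The default `min_call_rate=0.8` mirrors the 80% presence criterion of the Plink stage; passing `min_call_rate=0.6` would mirror the earlier Tassel stage.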