Data from: Genome-wide diversity in lowland and highland maize landraces from southern South America: Population genetics insights to assist conservation
Data files
Nov 29, 2024 version files 64.68 MB
Complementary_tables_VCF_files_Dominguez_et_al.zip (64.68 MB)
README.md (4.56 KB)
Abstract
Maize (Zea mays ssp. mays L.) landraces are traditional American crops with high genetic variability that constitute a source of original alleles for conventional maize breeding. Northern Argentina, one of the southernmost regions of traditional maize cultivation in the Americas, harbours around 57 races traditionally grown in two regions with contrasting environmental conditions, namely the Andean mountains in the Northwest and the tropical grasslands and Atlantic Forest in the Northeast. These races encounter diverse threats to their genetic diversity and persistence in their regions of origin, with climate change standing out as one of the major challenges. In this work, we use genome-wide SNPs derived from ddRADseq to study the genetic diversity of individuals representing the five groups previously described for this area. This allowed us to distinguish two clearly differentiated gene pools, the Highland Northwestern maize (HNWA) and the Floury Northeastern maize (FNEA). Subsequently, we employed Essential Biodiversity Variables at the genetic level, as proposed by the Group on Earth Observations Biodiversity Observation Network (GEO BON), to evaluate the conservation status of these two groups. This assessment encompassed genetic diversity (Pi), inbreeding coefficient (F), and effective population size (Ne). FNEA showed low Ne values and high F values, while HNWA showed low Ne values and low Pi values, indicating that further genetic erosion is imminent for these landraces. Outlier detection methods allowed identification of putative adaptive genomic regions, consistent with previously reported flowering-time loci and chromosomal regions displaying introgression from the teosinte Zea mays ssp. mexicana. Finally, species distribution models were obtained for two future climate scenarios, showing a notable reduction in the potential planting area of HNWA and a shift in the cultivation areas of FNEA.
These results suggest that maize landraces from Northern Argentina may be unable to cope with climate change. Therefore, active conservation policies are advisable.
https://doi.org/10.5061/dryad.5dv41nsg7
Description of the data and file structure
This dataset comprises four VCF files derived from ddRADseq data generated and analysed in Dominguez et al. Metadata of samples are included in Supplementary table 1 of Dominguez et al. (also available here). Raw data for this study are available at the Sequence Read Archive (SRA), PRJNA1073562.
Files and variables
File: Complementary_tables_VCF_files_Dominguez_et_al.zip
Description:
Complementary table A. Unfiltered VCF file obtained with Stacks v1.42 (Catchen et al., 2013). The parameters used were: -m 3 (minimum depth of coverage), -M 2 (distance allowed between stacks), -n 3 (distance allowed between catalog loci).
Complementary table B. Filtered VCF file. Filtering was performed with VCFtools (Danecek et al., 2011). The parameters used were: a maximum proportion of missing data of 35% (--max-missing 0.65); a minimum number of times that an allele appears over all individuals at a given site equal to 4 (--mac 4); a mean depth value greater than or equal to 8 per individual (--minDP 8); a minimum distance between sites equal to 200 bp (--thin 200).
Complementary table C. Filtered and imputed VCF file. Imputation was carried out with Beagle (Browning et al., 2018).
Complementary table D. Filtered, imputed, and annotated VCF file. Annotation was performed with SnpEff (Cingolani et al., 2012).
Supplementary table 1. Data of individuals sequenced by ddRADseq. A priori classification was based on Lia et al. (2009), Bracco et al. (2016), López et al. (2021) and Rivas et al. (2022). Individuals unequivocally assigned to the FNEA and HNWA genetic clusters by STRUCTURE and DAPC methods (membership coefficients or assignment probabilities > 0.75) (Figure 2C and D) are marked in orange and green, respectively. FNEA: Floury maize of Northeastern Argentina. PNEA: Popcorn of Northeastern Argentina. HNWA: Highland maize of Northwestern Argentina. LNWA: Lowland maize of Western Argentina. PNWA: Popcorn of Northwestern Argentina. VAV: ID of the “N.I. Vavilov” Plant Genetic Resource Laboratory, Faculty of Agronomy, University of Buenos Aires. ARZM: ID of the “Banco Activo de Germoplasma INTA Pergamino”. Coordinates are provided in decimal degrees. n/a=not available.
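The site-level thresholds listed for Complementary table B can be illustrated with a small, self-contained Python sketch. It is not the VCFtools implementation: the function names are hypothetical, the genotype strings are toy data, sites are assumed to be biallelic, and the handling of a site lying exactly 200 bp from the previous one is an assumption. The depth filter (--minDP) is not reproduced here.

```python
def site_passes(genotypes, max_missing=0.65, mac=4):
    """Check one biallelic site against the call-rate and minor-allele-count thresholds.

    genotypes: list of GT strings such as "0/0", "0/1", "1/1" or "./.".
    max_missing follows the --max-missing convention: it is the minimum
    proportion of called genotypes (0.65 means at most 35% missing).
    """
    if not genotypes:
        return False
    called = [gt for gt in genotypes if "." not in gt]
    if len(called) / len(genotypes) < max_missing:
        return False
    # Split each called genotype into its two alleles and count them.
    alleles = [a for gt in called for a in gt.replace("|", "/").split("/")]
    minor_count = min(alleles.count("0"), alleles.count("1"))
    return minor_count >= mac


def thin_positions(positions, min_distance=200):
    """Keep sites so that consecutive kept sites are at least min_distance bp apart.

    Assumption: a site exactly min_distance away from the last kept site is retained.
    Positions are expected to come from a single chromosome.
    """
    kept = []
    for pos in sorted(positions):
        if not kept or pos - kept[-1] >= min_distance:
            kept.append(pos)
    return kept


# Toy site: 10 individuals, 3 missing (70% called), alternate allele seen 4 times.
example_site = ["0/1"] * 4 + ["0/0"] * 3 + ["./."] * 3
print(site_passes(example_site))
print(thin_positions([100, 150, 300, 450, 520]))
```

In this toy example the site is retained because 7 of 10 genotypes are called and the minor allele is observed four times; removing one heterozygote would drop it below the --mac threshold.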
References:
Bracco, M., Cascales, J., Hernández, J.C., Poggio, L., Gottlieb, A.M., & Lia, V.V. (2016). Dissecting maize diversity in lowland South America: Genetic structure and geographic distribution models. BMC Plant Biology, 16(1), 186. doi: 10.1186/s12870-016-0874-5.
Browning, B.L., Zhou, Y., & Browning, S.R. (2018). A One-Penny Imputed Genome from Next-Generation Reference Panels. American Journal of Human Genetics, 103, 338–348.
Catchen, J., Hohenlohe, P.A., Bassham, S., Amores, A., & Cresko, W.A. (2013). Stacks: an analysis tool set for population genomics. Molecular Ecology, 22, 3124–3140.
Cingolani, P., Platts, A., Wang, L.L., Coon, M., Nguyen, T., Wang, L., Land, S.J., Lu, X., & Ruden, D.M. (2012). A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly, 6, 80–92.
Danecek, P., Auton, A., Abecasis, G., Albers, C.A., Banks, E., DePristo, M.A., Handsaker, R.E., Lunter, G., Marth, G.T., Sherry, S.T., McVean, G., & Durbin, R. (2011). The variant call format and VCFtools. Bioinformatics, 27, 2156–2158.
Lia, V.V., Poggio, L., & Confalonieri, V.A. (2009). Microsatellite variation in maize landraces from Northwestern Argentina: Genetic diversity, population structure and racial affiliations. Theoretical and Applied Genetics, 119, 1053–1067.
López, M.G., Fass, M., Rivas, J.G., Carbonell-Caballero, J., Vera, P., Puebla, A., Defacio, R., Dopazo, J., Paniego, N., Hopp, H.E., & Lia, V.V. (2021). Plastome genomics in South American maize landraces: chloroplast lineages parallel the geographic structuring of nuclear gene pools. Annals of Botany, 128(1), 115–125.
Rivas, J.G., Gutiérrez, A.V., Defacio, R.A., Schimpf, J., Vicario, A.L., Hopp, H.E., Paniego, N.B., & Lia, V.V. (2022). Morphological and genetic diversity of maize landraces along an altitudinal gradient in the Southern Andes. PLoS ONE, 17(12), e0271424. doi: 10.1371/journal.pone.0271424.
This dataset includes four VCF files generated from ddRADseq data analysed in Dominguez et al. The study sequenced 87 maize (Zea mays ssp. mays L.) individuals representing various genetic and morphological landrace groups from the Northeast and Northwest regions of Argentina. The raw sequencing data are accessible at the Sequence Read Archive (SRA) under the project ID PRJNA1073562.
SNP calling was conducted using Stacks v1.42 (Catchen et al., 2013), resulting in Complementary table A. Reads from each sample were aligned to the maize B73 reference genome (version V4) available at MaizeGDB (https://www.maizegdb.org/genome/assembly/Zm-B73-REFERENCE-GRAMENE-4.0) using Bowtie 2 (Langmead et al., 2012). The initial VCF file was filtered with VCFtools (Danecek et al., 2011), producing Complementary table B.
The filtered VCF file was subsequently imputed with Beagle (Browning et al., 2018), generating Complementary table C, and annotated with SnpEff (Cingolani et al., 2012), resulting in Complementary table D.
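For readers who wish to repeat the filtering step on the unfiltered file, the following Python sketch assembles a VCFtools command line from the thresholds reported for Complementary table B. The input and output file names are hypothetical, and the use of --recode and --recode-INFO-all to write a new VCF is an assumption; the exact invocation used by the authors is not stated in this record.

```python
import shlex


def build_vcftools_filter_command(input_vcf, output_prefix):
    """Return the VCFtools argument list for the reported filtering thresholds."""
    return [
        "vcftools",
        "--vcf", input_vcf,          # uncompressed VCF input
        "--max-missing", "0.65",     # at most 35% missing genotypes per site
        "--mac", "4",                # minor allele count of at least 4
        "--minDP", "8",              # depth threshold of 8, as reported
        "--thin", "200",             # at least 200 bp between retained sites
        "--recode",                  # assumption: write the filtered sites to a new VCF
        "--recode-INFO-all",         # assumption: keep INFO fields in the output
        "--out", output_prefix,
    ]


# Hypothetical file names, used only for illustration.
cmd = build_vcftools_filter_command("complementary_table_A.vcf", "complementary_table_B")
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) on a system where VCFtools is installed; printing it first makes it easy to check the flags against the description above.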
