The heritability of size in a wild annual plant population with hierarchical size structure
Data files
May 11, 2023 version files (125.65 MB total)
- chr3.vcf (878.49 KB)
- chr6.vcf (951.79 KB)
- chr9.vcf (482.36 KB)
- Phen367.txt (4.77 KB)
- Phenotypes_of_genotyped_plants.xlsx (19.53 KB)
- Phenotypes_QuadratsA-C.xlsx (13.92 KB)
- pop367.bed (935.74 KB)
- pop367.bim (294.82 KB)
- pop367.fam (7.34 KB)
- pop367.map (630.46 KB)
- populations.snps.vcf (121.41 MB)
- README.md (15.41 KB)
Jun 22, 2024 version files (407.20 MB total)
- Impatiens_367_genotyped_individuals.vcf (405.14 MB)
- Impatiens_capensis_geneome_assembly_MCG3086_report.pdf (2.03 MB)
- K_means.py (1.63 KB)
- LDAK_code.txt (1.13 KB)
- Linearmodel_wt_f(ht_lf_no).py (793 B)
- Phenotypes_Random_sample_n_97.xlsx (12.02 KB)
- Plant_weights_extreme_sample.txt (8.42 KB)
- Position_map.txt (5.21 KB)
- README.md (2.99 KB)
- Snakefile (2.45 KB)
- VCFR_code.R (921 B)
Jul 19, 2024 version files (1.11 GB total)
- Impatiens_367_genotyped_individuals.vcf (405.14 MB)
- Impatiens_capensis_geneome_assembly_MCG3086_report.pdf (2.03 MB)
- Impatiens_capensis_hirise.fasta (704.68 MB)
- K_means.py (1.63 KB)
- LDAK_code.txt (1.13 KB)
- Linearmodel_wt_f(ht_lf_no).py (793 B)
- Phenotypes_Random_sample_n_97.xlsx (12.02 KB)
- Plant_weights_extreme_sample.txt (8.42 KB)
- Position_map.txt (5.21 KB)
- README.md (3.30 KB)
- Snakefile (2.45 KB)
- VCFR_code.R (921 B)
Abstract
The relative magnitude of additive genetic versus residual variation for fitness traits is important in models for predicting the rate of evolution and population persistence in response to changes in the environment. In many annual plants, lifetime reproductive fitness is correlated with end-of-season plant biomass, which can vary significantly from plant to plant in the same population. We measured end-of-season plant biomass and obtained SNP genotypes of plants in a dense, natural population of the annual plant species Impatiens capensis with hierarchical size structure. These data were used to estimate the amount of heritable variation for position in the size hierarchy and for plant biomass. The additive genetic variances for position in the size hierarchy and for plant biomass were both significantly different from zero. These results are discussed in relation to theory for the heritability of fitness in natural populations and to ecological factors that potentially influence heritable variation for fitness in this species.
Title of Data Set: Heritability of fitness in a wild annual plant population with hierarchical size structure (Note: This data set was updated on 19 June 2024 after we discovered several quality control issues with the genomic data. We apologize for any inconvenience.)
This data set includes measures of plant weight (fresh biomass in g) from 97 plants collected at random in three quadrats in the study population (random sample), and from 367 plants collected in five additional quadrats where large and small plants were sampled (extreme sample).
The 367 plants in the extreme sample were genotyped using genotyping-by-sequencing, and the vcf file of their SNP genotypes is also included in this Dryad repository. The BAM files used for constructing this vcf file, along with the Impatiens capensis reference genome, are deposited in GenBank under SRA accession number PRJNA945897.
Description of the data and file structure
File: Impatiens_capensis_geneome_assembly_MCG3086_report.pdf HiRise scaffolding report for the reference genome.
File: Phenotypes_Random_sample_n_97.xlsx Plant height (cm), leaf number per plant, and plant weight (fresh biomass in g) from a sample of 97 plants of Impatiens capensis (random sample) in an Excel file.
File: Plant_weights_extreme_sample.txt Plant weight (fresh biomass in g) from a sample of 367 plants (186 small and 181 large plants: see text of article) of Impatiens capensis (extreme sample) in a text file with 3 columns. The first two columns combined are plant identifiers and correspond to plant identifiers in the file “Impatiens_367_genotyped_individuals.vcf”.
File: Impatiens_367_genotyped_individuals.vcf Genotypic data used along with data in the file “Plant_weights_extreme_sample.txt” to estimate the heritability of plant weight and position in the size hierarchy using the scripts in the file “LDAK_code.txt”.
File: Position_map.txt Positions (quadrats) of each genotyped plant.
Code
File: Simulation_Extreme_Sampling.R R program to simulate extreme sampling to examine the potential bias in estimating heritability and its significance level. Uses data from
File: LDAK_code.txt Code used with the LDAK software version 5.2 (https://dougspeed.com/ldak/) to estimate the heritability of plant weight and position in the size hierarchy.
File: K_means.py Code for K means analysis of plant weight data in random sample.
File: VCFR_code.R Code for quality control. See article.
File: Snakefile Code for processing raw sequence reads and producing bam files for input into STACKS. Used to automate the processing of the sequence data, including trimming adapters, creating sorted .bam files, and removing duplicates.
File: Linearmodel_wt_f(ht_lf_no).py Used to analyze the relationship between individual plant biomass and plant height and leaf number per plant via linear model analysis.
File: Impatiens_capensis_hirise.fasta This file contains all assembled scaffolds for the Impatiens capensis genome assembly. Scaffolds 1 through 9 and scaffold 11 account for all but 27 of the 36,044 genotyped regions that map to the HiRise assembly (scaffolds 10 and 12 account for only 1 and 26, respectively).
Study population
The study population of Impatiens capensis is in Glen Sutton, Quebec, Canada (45° 02′ 37″ N, 72° 32′ 57″ W). The plants occur in damp soil within an irregularly shaped area of ca. 150 m², beneath a canopy of a mixed, mature deciduous-evergreen (Acer saccharum-Tsuga canadensis) forest. I. capensis plants in this population form nearly pure stands that emerge as a near-continuous carpet of seedlings on the forest floor. The density of individuals remains high (ca. 200-250 per m²) at the end of the season. Early-season seedling density was at least twice as high. In I. capensis, the Pearson correlation between chasmogamous seed production and end-of-season biomass is r = 0.95, and between overall seed production and biomass is r = 0.92 (Waller, 1979). Small plants have been shown to produce no chasmogamous flowers or fruit at all (Waller, 1979), thus making position in the size hierarchy an interesting fitness component for study.
Impatiens capensis reference genome
From a single plant collected in the study population, 10 g of young leaves were harvested and frozen in liquid nitrogen. Genomic DNA was extracted from this tissue and sequenced by Dovetail Genomics/Cantata Bio LLC (Scotts Valley, California). For each Dovetail Omni-C library, chromatin was fixed in place with formaldehyde in the nucleus and then extracted. Fixed chromatin was digested with DNase I, and chromatin ends were repaired and ligated to a biotinylated bridge adapter, followed by proximity ligation of adapter-containing ends (Jordan Zhang, Dovetail Genomics, pers. comm.). After proximity ligation, crosslinks were reversed and the DNA was purified. Purified DNA was treated to remove biotin that was not internal to ligated fragments, and sequencing libraries were generated using NEBNext Ultra enzymes and Illumina-compatible adapters (Jordan Zhang, Dovetail Genomics, pers. comm.). Biotin-containing fragments were isolated using streptavidin beads before PCR enrichment of each library. The library was sequenced on an Illumina HiSeqX platform to produce approximately 30x sequence coverage.
The de novo assembly and Dovetail Omni-C library reads were used as input data for HiRise, a software pipeline designed specifically for using proximity ligation data to scaffold genome assemblies (Putnam et al., 2016). Dovetail Omni-C library sequences were aligned to the draft input assembly using bwa (https://github.com/lh3/bwa). The separations of Dovetail Omni-C read pairs mapped within draft scaffolds were analyzed by HiRise to produce a likelihood model for genomic distance between read pairs, and the model was used to identify and break putative mis-joins, to score prospective joins, and to make joins (Jordan Zhang, Dovetail Genomics, pers. comm.).
Plant biomass distribution in a random sample
At the end of the 2020 growing season for Impatiens capensis (15 September), we recorded plant height and number of leaves for 97 plants collected at random in three haphazardly placed 1 m² quadrats within the study population. All 97 plants were weighed fresh to the nearest 0.01 g, and linear regression was used to establish a predictive relationship for plant biomass based on plant height and leaf number per plant (r² = 0.72, P < 0.001).
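The regression described above can be sketched as an ordinary least-squares fit of biomass on height and leaf number. The following is a minimal illustration in plain Python, not the contents of Linearmodel_wt_f(ht_lf_no).py; the function name `fit_ols` and the example measurements are hypothetical and are not taken from the data files.

```python
def fit_ols(predictors, response):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    predictors: list of tuples, e.g. (height_cm, leaf_number)
    response:   list of values, e.g. fresh biomass in g
    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    Assumes the predictors are not collinear and the response is not constant.
    """
    X = [(1.0, *map(float, row)) for row in predictors]  # add intercept column
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, response))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * xi for b, xi in zip(beta, x)) for x in X]
    mean_y = sum(response) / len(response)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(response, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in response)
    return beta, 1.0 - ss_res / ss_tot


# Hypothetical measurements for five plants (illustration only).
heights_cm = [30.0, 45.0, 60.0, 52.0, 38.0]
leaf_counts = [8, 10, 20, 14, 12]
weights_g = [1.9, 2.6, 5.3, 3.8, 2.7]

coefs, r2 = fit_ols(list(zip(heights_cm, leaf_counts)), weights_g)
print("intercept, height slope, leaf slope:", coefs)
print("r squared:", round(r2, 3))
```

With the real data, the predictors and response would come from the three measurement columns of Phenotypes_Random_sample_n_97.xlsx.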
Size (biomass) inequality was examined by ranking plants from lightest weight to heaviest and graphing cumulative biomass against rank order and calculating the Gini coefficient, a measure of inequality of resource distribution (Dorfman, 1979). K-means clustering (Pedregosa et al., 2011) of the biomasses of 97 randomly sampled plants was used to determine the level of support for distinct plant size clusters, as well as the biomass cutoff that separates the clusters.
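The two size-structure summaries above can be illustrated with a short sketch. The Gini coefficient is computed here with the rank-based formula on sorted weights; other estimators (for example with a small-sample correction) exist, and the one used in the study may differ. The clustering step is a plain-Python two-cluster version of Lloyd's algorithm for one-dimensional data, used as a stand-in for the scikit-learn implementation cited in the text. The function names and the example weights are hypothetical.

```python
def gini(values):
    """Gini coefficient from the rank-ordered values (0 = perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * sum(xs)) - (n + 1.0) / n


def two_means_1d(values, max_iter=100):
    """Two-cluster k-means (Lloyd's algorithm) for one-dimensional data.

    Returns (small_center, large_center, cutoff), where cutoff is the
    midpoint between the two cluster centers.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("need at least two distinct values")
    for _ in range(max_iter):
        cut = (lo + hi) / 2.0
        small = [v for v in values if v <= cut]   # nearest to the lower center
        large = [v for v in values if v > cut]    # nearest to the upper center
        new_lo = sum(small) / len(small)
        new_hi = sum(large) / len(large)
        if (new_lo, new_hi) == (lo, hi):          # assignments have stabilized
            break
        lo, hi = new_lo, new_hi
    return lo, hi, (lo + hi) / 2.0


# Hypothetical fresh weights in g (illustration only).
weights_g = [0.5, 0.6, 0.7, 0.8, 5.0, 5.5, 6.0]
print("Gini coefficient:", round(gini(weights_g), 3))
print("centers and cutoff:", two_means_1d(weights_g))
```

The cutoff returned by the clustering step plays the role of the biomass threshold that separates small from large plants.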
Genotyping-by-sequencing and population genetics analyses
Leaf tissue from the small and large sampled plants was collected and preserved for DNA extraction using silica gel. DNA from samples of the preserved leaves was extracted and processed for genotyping-by-sequencing (GBS) (Elshire et al., 2011) at the University of Wisconsin Biotechnology Center. Genomic DNA was digested with the restriction enzyme ApeKI and ligated to adapters and barcodes to create the GBS libraries. A NovaSeq 6000 sequencer was used to obtain paired-end (150 bp) sequence reads from the libraries. The depth of coverage was approximately 180x.
Raw reads were demultiplexed and filtered for Illumina adapter sequences and PCR duplicates with process_radtags and clone_filter (STACKS version 2.60; Rochette, Rivera-Colón and Catchen, 2019). The demultiplexed and filtered reads were then aligned to the reference genome using the Burrows-Wheeler Aligner (Li and Durbin, 2009) as implemented in bwakit version 0.7.12 and converted to bam files using SAMtools version 1.13 (Li et al., 2009). SNPs with quality scores > 30 were identified with gstacks (STACKS version 2.60) and further processed with populations (STACKS version 2.60), which was used to filter out loci not found in at least 95% of individuals in each of the five quadrats sampled and loci where the minor allele frequency was < 0.01. The STACKS populations program was also used to calculate population genetics statistics (nucleotide diversity and Wright’s Fst), to report the results of SNP filtering, and to produce a vcf file of SNP genotypes for each sampled individual. Prior to the estimation of the genomic relationship matrix (GRM; see below), we used the R packages vcfR (version 1.12; Knaus & Grünwald, 2017) and SNPfiltR (version 1.01; DeRaad, 2022) in R version 4.1.1 (R Core Team, 2021) to filter out SNPs with low read depths (< 7). To avoid spurious associations that might inflate relationship estimates in the GRM, SNPs with r² (squared correlation coefficient between the alleles at two loci) > 0.05 within 1000 kb windows were filtered out with LDAK (version 5.2; https://dougspeed.com/ldak; Speed et al., 2020), which used bed genotype files as input, created from the original vcf file using PLINK (version 1.9; Chang et al., 2015).
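The three per-SNP quantities behind the filters above (proportion of individuals genotyped, minor allele frequency, and pairwise r²) can be illustrated with a small sketch. It assumes genotypes coded as 0/1/2 copies of one allele with None for missing calls, and it computes r² as the squared Pearson correlation of those dosages; these are assumptions for illustration and do not reproduce the internals of STACKS or LDAK. The thresholds are the ones quoted in the text. The sketch also ignores the per-quadrat application of the 95% criterion and the 1000 kb window.

```python
MIN_CALL_RATE = 0.95   # locus must be genotyped in at least 95% of individuals
MIN_MAF = 0.01         # loci with minor allele frequency below this are dropped
MAX_R2 = 0.05          # SNP pairs with r squared above this are thinned


def call_rate(dosages):
    """Proportion of individuals with a non-missing genotype."""
    return sum(g is not None for g in dosages) / len(dosages)


def minor_allele_frequency(dosages):
    """Frequency of the less common allele among called genotypes."""
    called = [g for g in dosages if g is not None]
    p = sum(called) / (2.0 * len(called))
    return min(p, 1.0 - p)


def ld_r2(a, b):
    """Squared Pearson correlation between two dosage vectors."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    mean_a = sum(x for x, _ in pairs) / n
    mean_b = sum(y for _, y in pairs) / n
    s_aa = sum((x - mean_a) ** 2 for x, _ in pairs)
    s_bb = sum((y - mean_b) ** 2 for _, y in pairs)
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in pairs)
    if s_aa == 0 or s_bb == 0:      # monomorphic locus: correlation undefined
        return 0.0
    return s_ab * s_ab / (s_aa * s_bb)


# Hypothetical genotypes for six individuals at two SNPs (illustration only).
snp1 = [0, 1, 2, 1, 0, 2]
snp2 = [0, 1, 2, 1, None, 2]
keep_snp1 = call_rate(snp1) >= MIN_CALL_RATE and minor_allele_frequency(snp1) >= MIN_MAF
keep_snp2 = call_rate(snp2) >= MIN_CALL_RATE and minor_allele_frequency(snp2) >= MIN_MAF
print("keep snp1:", keep_snp1, "keep snp2:", keep_snp2)
print("pair exceeds r2 threshold:", ld_r2(snp1, snp2) > MAX_R2)
```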
Analyses of genetic variance and heritable variation for position in the size hierarchy and plant biomass
We treated size either as a threshold trait and analyzed it on a liability scale (small plants versus large plants), as is done in some agricultural genetic studies for traits such as secondary compound content and disease resistance (Merrick et al., 2023), or we directly assayed biomass. We used four different methods to estimate the heritability of plant size, all of which are implemented in LDAK. The first two methods, Phenotype Correlation–Genotype Correlation (PCGC) and TetraHer, use a liability threshold model, such that the binary outcome (small versus large) indicates whether the unobserved liability is above or below a threshold. Both PCGC and TetraHer estimate heritability on the liability scale by measuring the extent to which the estimated relatedness between pairs of individuals correlates with their estimated liabilities. PCGC considers all pairs of individuals and measures pairwise relatedness based on allelic correlations (inferred from the GRM). By contrast, TetraHer considered only the 3,606 pairs of individuals identified as having at least 17.5% identity by descent (IBD) using the KING kinship analysis software (Manichaikul et al., 2010). Both methods adjust for ascertainment (i.e., the fact that our sample was enriched for large plants relative to the natural population). The prevalence of large plants was determined as the proportion of large plants observed in the random sample of 97 plants, as classified by K-means clustering (see above). We additionally obtained estimates of heritability on the observed scale (a continuous variable). For this we used restricted maximum likelihood (REML) and QuantHer, which are similar to PCGC and TetraHer.
We note that sampling weights from the smallest and largest plants for REML and PCGC constitutes “extreme sampling”, but this has been shown previously to have a minimal biasing effect on heritability estimation (Golan et al., 2014). Nevertheless, we also conducted our own simulation analyses to gauge the possible effects of the sampling protocol on our own results (see Supplementary Methods for details).
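To show why both the population prevalence and the sample composition enter a liability-scale analysis, the sketch below implements a commonly used transformation of a heritability estimated on the observed 0/1 scale in an ascertained sample to the liability scale. It is not the PCGC or TetraHer estimator and is not taken from LDAK_code.txt. The prevalence and the observed-scale estimate in the example are hypothetical; the sample fraction of 181/367 follows from the counts of large and small plants given in the file description.

```python
from statistics import NormalDist


def observed_to_liability(h2_observed, prevalence, sample_fraction):
    """Convert observed-scale heritability of a binary trait to the liability scale.

    h2_observed:     heritability estimated on the 0/1 scale in the sample
    prevalence:      proportion of the category of interest in the population (K)
    sample_fraction: proportion of that category in the analyzed sample (P)

    Uses h2_liability = h2_observed * [K(1-K)]^2 / (z^2 * P(1-P)), where z is
    the standard normal density at the liability threshold for prevalence K.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    z = std_normal.pdf(threshold)
    k, p = prevalence, sample_fraction
    return h2_observed * (k * (1.0 - k)) ** 2 / (z * z * p * (1.0 - p))


# Hypothetical inputs: prevalence and observed-scale estimate are made up.
prevalence_large = 0.20            # assumed share of large plants in the population
fraction_large = 181 / 367         # large plants in the extreme sample
print(observed_to_liability(0.30, prevalence_large, fraction_large))
```

When the sample fraction equals the prevalence, the expression reduces to the unascertained case, h2_observed * K(1-K) / z².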
References
Anderson, J. T. (2016). Plant fitness in a rapidly changing world. New Phytologist, 210, 81-87.
Bérénos, C., Ellis, P. A., Pilkington, J. G., & Pemberton, J. M. (2014). Estimating quantitative genetic parameters in wild populations: A comparison of pedigree and genomic approaches. Molecular Ecology, 23(14), 3434-3451.
Bontemps, A., Lefèvre, F., Davi, H., & Oddou‐Muratorio, S. (2016). In situ marker‐based assessment of leaf trait evolutionary potential in a marginal European beech population. Journal of Evolutionary Biology, 29(3), 514-527.
Burt, A. (1995). The evolution of fitness. Evolution, 49(1), 1-8.
Castellanos, M. C., González‐Martínez, S. C., & Pausas, J. G. (2015). Field heritability of a plant adaptation to fire in heterogeneous landscapes. Molecular Ecology, 24(22), 5633-5642.
Chang, C. C., Chow, C. C., Tellier, L. C., Vattikuti, S., Purcell, S. M., & Lee, J. J. (2015). Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience, 4(1), s13742-015.
Chen, J., Glémin, S., & Lascoux, M. (2017). Genetic diversity and the efficacy of purifying selection across plant and animal species. Molecular biology and evolution, 34(6), 1417-1428.
Day, P. D., Pellicer, J., & Kynast, R. G. (2012). Orange balsam (Impatiens capensis Meerb., Balsaminaceae): A re-evaluation by chromosome number and genome size. The Journal of the Torrey Botanical Society, 139(1), 26-33.
DeRaad, D. A. (2022). SNPfiltR: An R package for interactive and reproducible SNP filtering. Molecular Ecology Resources, 22(6), 2443-2453.
Donohue, K. (2002). Germination timing influences natural selection on life‐history characters in Arabidopsis thaliana. Ecology, 83(4), 1006-1016.
Dorfman, R. (1979). A formula for the Gini coefficient. The Review of Economics and Statistics, 61, 146-149.
Elshire, R. J., Glaubitz, J. C., Sun, Q., Poland, J. A., Kawamoto, K., Buckler, E. S., & Mitchell, S. E. (2011). A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PloS One, 6(5), e19379.
Fisher, R. A. (1930). The genetical theory of natural selection. Clarendon Press, UK.
Gienapp, P., Fior, S., Guillaume, F., Lasky, J. R., Sork, V. L., & Csilléry, K. (2017). Genomic quantitative genetics to study evolution in the wild. Trends in Ecology & Evolution, 32(12), 897-908.
Golan, D., Lander, E. S., & Rosset, S. (2014). Measuring missing heritability: Inferring the contribution of common variants. Proceedings of the National Academy of Sciences, 111(49), E5272-E5281.
Gomulkiewicz, R., & Holt, R. D. (1995). When does evolution by natural selection prevent extinction? Evolution, 49, 201-207.
Harper, J. L. (1977). Population biology of plants. New York, NY: Academic Press.
Hill, W. G., & Robertson, A. (1968). Linkage disequilibrium in finite populations. Theoretical and Applied Genetics, 38, 226-231.
Hill, W. G., & Zhang, X. S. (2009). Maintaining genetic variation in fitness. In J. van der Werf, H. U. Graser, R. Frankham, & C. Gondro (Eds.), Adaptation and Fitness in Animal Populations (pp. 67-85). Dordrecht: Springer. https://doi.org/10.1007/978-1-4020-9005-9_5
Hohenlohe, P. A., Bassham, S., Etter, P. D., Stiffler, N., Johnson, E. A., & Cresko, W. A. (2010). Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags. PLoS Genetics, 6(2), e1000862.
Johnston, S. E., Chen, N., & Josephs, E. B. (2022). Taking quantitative genomics into the wild. Proceedings of the Royal Society B, 289, 20221930.
Knaus, B. J., & Grünwald, N. J. (2017). vcfR: A package to manipulate and visualize variant call format data in R. Molecular Ecology Resources, 17(1), 44-53.
Kruuk, L. E., Clutton-Brock, T. H., Slate, J., Pemberton, J. M., Brotherstone, S., & Guinness, F. E. (2000). Heritability of fitness in a wild mammal population. Proceedings of the National Academy of Sciences, 97(2), 698-703.
Kulbaba, M. W., Sheth, S. N., Pain, R. E., Eckhart, V. M., & Shaw, R. G. (2019). Additive genetic variance for lifetime fitness and the capacity for adaptation in an annual plant. Evolution, 73(9), 1746-1758.
Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25(14), 1754-1760.
Mackay, T. F., & Lyman, R. F. (2005). Drosophila bristles and the nature of quantitative genetic variation. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1459), 1513-1527.
Merilä, J., & Sheldon, B. C. (1999). Genetic architecture of fitness and nonfitness traits: Empirical patterns and development of ideas. Heredity, 83, 103-109.
Mitchell-Olds, T. (1986). Quantitative genetics of survival and growth in Impatiens capensis. Evolution, 40(1), 107-116.
Mitchell-Olds, T., & Bergelson, J. (1990). Statistical genetics of an annual plant, Impatiens capensis. II. Genetic basis of quantitative variation. Genetics, 124(2), 407-415.
Mousseau, T. A., & Roff, D. A. (1987). Natural selection and the heritability of fitness components. Heredity, 59(2), 181-197.
Peschel, A. R., Boehm, E. L., & Shaw, R. G. (2021). Estimating the capacity of Chamaecrista fasciculata for adaptation to change in precipitation. Evolution, 75(1), 73-85.
Peschel, A. R., & Shaw, R. G. (2024). Comparing the predicted versus realized rate of adaptation of Chamaecrista fasciculata to climate change. The American Naturalist, 203(1), 14-27.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
Putnam, N. H., O'Connell, B. L., Stites, J. C., Rice, B. J., Blanchette, M., Calef, R., & Green, R. E. (2016). Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Research, 26(3), 342-350.
Price, T., & Schluter, D. (1991). On the low heritability of life‐history traits. Evolution, 45(4), 853-861.
R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Rochette, N. C., Rivera‐Colón, A. G., & Catchen, J. M. (2019). Stacks 2: Analytical methods for paired‐end sequencing improve RADseq‐based population genomics. Molecular Ecology, 28(21), 4737-4754.
Schmitt, J., Eccleston, J., & Ehrhardt, D. W. (1987). Dominance and suppression, size-dependent growth and self-thinning in a natural Impatiens capensis population. The Journal of Ecology, 75, 651-665.
Schwaegerle, K. E., & Levin, D. A. (1991). Quantitative genetics of fitness traits in a wild population of Phlox. Evolution, 45(1), 169-177.
Schwinning, S., & Weiner, J. (1998). Mechanisms determining the degree of size asymmetry in competition among plants. Oecologia, 113, 447-455.
Shaw, R. G., Platenkamp, G. A., Shaw, F. H., & Podolsky, R. H. (1995). Quantitative genetics of response to competitors in Nemophila menziesii: a field experiment. Genetics, 139(1), 397-406.
Shaw, R. G., & Etterson, J. R. (2012). Rapid climate change and the rate of adaptation: insight from experimental quantitative genetics. New Phytologist, 195(4), 752-765.
Shaw, R. G., & Shaw, F. H. (2014). Quantitative genetic study of the adaptive process. Heredity, 112(1), 13-20.
Sheth, S. N., Kulbaba, M. W., Pain, R. E., & Shaw, R. G. (2018). Expression of additive genetic variance for fitness in a population of partridge pea in two field sites. Evolution, 72(11), 2537-2545.
Seppey, M., Manni, M., & Zdobnov, E. M. (2019). BUSCO: assessing genome assembly and annotation completeness. Gene prediction: methods and protocols, 227-245.
Silvertown, J., & Charlesworth, D. (2009). Introduction to plant population biology. John Wiley & Sons.
Speed, D., Holmes, J., & Balding, D. J. (2020). Evaluating and improving heritability models using summary statistics. Nature Genetics, 52(4), 458-462.
Stanton‐Geddes, J., Yoder, J. B., Briskine, R., Young, N. D., & Tiffin, P. (2013). Estimating heritability using genomic data. Methods in Ecology and Evolution, 4(12), 1151-1158.
Stevens, L., Goodnight, C. J., & Kalisz, S. (1995). Multilevel selection in natural populations of Impatiens capensis. The American Naturalist, 145(4), 513-526.
Thomas, S. C., & Bazzaz, F. A. (1993). The genetic component in plant size hierarchies: norms of reaction to density in a Polygonum species. Ecological Monographs, 63(3), 231-249.
Turner, M. D., & Rabinowitz, D. (1983). Factors affecting frequency distributions of plant mass: the absence of dominance and suppression in competing monocultures of Festuca paradoxa. Ecology, 64(3), 469-475.
Waller, D. M. (1985). The genesis of size hierarchies in seedling populations of Impatiens capensis Meerb. New Phytologist, 100(2), 243-260.
Weiner, J. (1985). Size hierarchies in experimental populations of annual plants. Ecology, 66(3), 743-752.
Weiner, J. A. (1988). The influence of competition on plant reproduction. In J. L. Doust & J. D. Doust (Eds.), Plant reproductive ecology: patterns and strategies (pp. 228-245). Oxford Univ. Press, NY.
Weiner, J. (1990). Asymmetric competition in plant populations. Trends in Ecology & Evolution, 5(11), 360-364.
Yang, J., Lee, S. H., Goddard, M. E., & Visscher, P. M. (2011). GCTA: a tool for genome-wide complex trait analysis. The American Journal of Human Genetics, 88(1), 76-82.
LDAK v. 5.2 package of programs (https://dougspeed.com/ldak/).
STACKS version 2.60
bwakit v. 0.7.12
SAMtools v. 1.13
R version 4.1.1
vcfR v. 1.12 (an R package)
SNPfiltR v. 1.01 (an R package)
Plink v. 1.9