The role of cytonuclear interactions in plant adaptation across a Populus hybrid zone
Data files
Sep 23, 2025 version files 126.13 MB
- Code_Populus_CNI.zip (126.12 MB)
- README.md (9.06 KB)
Abstract
Co-adaptation of cytoplasmic and nuclear genomes is critical to physiological function for many species. However, hybridization can disrupt this co-adaptation, leading to a mismatch between maternally inherited cytoplasmic genomes and biparentally inherited nuclear genomes. Few studies have examined the consequences of cytonuclear interactions for physiological function across environments. Here, we quantify the degree of co-introgression between chloroplast and nuclear-chloroplast (N-cp) genes across repeated hybrid zones and its consequences for physiological function across environments. We use whole-genome resequencing and common garden experiments with clonally replicated genotypes sampled across the natural hybrid zone between Populus trichocarpa and P. balsamifera. We use geographic clines to test for co-introgression of the chloroplast genome with N-cp and non-interacting nuclear genes. Co-introgression of chloroplast and N-cp genes was limited, although contact zone-specific patterns suggest that local environments may influence co-introgression. Combining ancestry estimates with phenotypic data across common gardens revealed that mismatches between chloroplast and nuclear ancestry can influence physiological performance, but the strength and direction of these effects vary depending on the environment. Overall, this study highlights the importance of cytonuclear interactions for adaptation and the role of the environment in modifying the effect of those interactions.
The role of cytonuclear interactions in plant adaptation across a Populus hybrid zone
This repository contains the code, data, and analyses used in the study of how chloroplast and nuclear genome interactions contribute to adaptation across Populus hybrid zones. It includes chloroplast genome assembly, phylogenetic inference, environmental association tests, geographic cline analyses, and trait modeling in common garden experiments.
--------------------------------------------------------------------------------
Code_Populus_CNI.zip
1. Chloroplast Genome Assembly and Phylogeny
1.1 Assembly (NOVOPlasty)
Folder: 1.1_Assembly_NOVO_Plasty/ – Input files for NOVOPlasty assembly.
- config_poplar – Configuration file with assembly parameters:
  - Type: chloro
  - Genome range: 140,000–170,000 bp
  - k-mer: 33
  - Variance detection: Enabled
  - Sequencing: Illumina PE, 151 bp reads, 300 bp insert, PCR-free
- rbcL_seed.fasta – FASTA file used to seed the assembly; region NC_009143.1:55716–57143 of the P. trichocarpa chloroplast genome.
- P_trichocarpa_plastid.fasta – Complete P. trichocarpa chloroplast genome (NC_009143.1), used as the reference.
Sequencing data:
Genomic libraries were sequenced in 2 × 150 bp format on an S4 flow cell of an Illumina NovaSeq 6000 instrument, with 64 samples per lane.
--------------------------------------------------------------------------------
1.2 Phylogeny
Folder: 1.2_Phylogeny/ – Alignment files, scripts, and results for chloroplast genome phylogenetic inference.
Software versions:
- MAFFT v7.481
- IQ-TREE v1.0
Files:
- Phylogeny_ALL.fasta – Chloroplast genome sequences for all genotypes.
- Phylogeny_ALL_outputmafft.fasta – MAFFT-aligned chloroplast genomes.
- Phylogeny_ALL_outputmafft2.fasta – Same alignment as above, with Salix group labeled as outgroup.
- Phylogeny_ALL_outputmafft.fasta.contree – Consensus tree from IQ-TREE analysis.
- iqtree.sh – Shell script for IQ-TREE:
```bash
iqtree -s ALL_samples_Outgroups/Phylogeny_ALL_outputmafft2.fasta \
-m TVM+F+R2 \
-bb 1000 \
-redo
```
- `-s` – Input alignment file
- `-m TVM+F+R2` – Substitution model
- `-bb 1000` – Ultrafast bootstrap (1000 replicates)
- `-redo` – Overwrite previous runs
--------------------------------------------------------------------------------
1.3 Fixed Differences
Folder: 1.3_Fixed_differences/ – Identifies fixed nucleotide and amino acid differences across chloroplast haplotypes.
Software: R v4.3.1
Packages:
- Biostrings
- seqinr
- dplyr
- GenomicRanges
- rtracklayer
- readr
Files:
- Fixed_differences.R – Estimates fixed differences from alignment.
- Poplar_540_plus_Reference.fasta – Alignment of all chloroplast genomes with reference sequence.
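The idea of a fixed nucleotide difference can be illustrated with a minimal sketch. This is not the logic of Fixed_differences.R, which runs in R; it only shows the criterion: a site counts as fixed when each of two groups carries a single base and the two bases differ. The function name `fixed_differences` and the tab-separated `group<TAB>aligned_sequence` input are assumptions made for this example.

```bash
# Report alignment columns where two groups are each monomorphic but differ.
# Hypothetical input on stdin: "group<TAB>aligned_sequence" per line, with
# exactly two group labels and sequences of equal length.
fixed_differences() {
  awk -F'\t' '
    {
      grp = $1; s = toupper($2); len = length(s)
      if (!(grp in seen)) { seen[grp] = 1; order[++ngrp] = grp }
      for (i = 1; i <= len; i++) {
        base = substr(s, i, 1)
        key = grp SUBSEP i
        if (!(key in allele)) allele[key] = base        # first base seen for this group and site
        else if (allele[key] != base) allele[key] = "*" # group is polymorphic at this site
      }
    }
    END {
      a = order[1]; b = order[2]
      for (i = 1; i <= len; i++) {
        x = allele[a SUBSEP i]; y = allele[b SUBSEP i]
        # skip polymorphic sites, gaps, and ambiguous bases
        if (x !~ /^[ACGT]$/ || y !~ /^[ACGT]$/) continue
        if (x != y) printf "%d\t%s=%s\t%s=%s\n", i, a, x, b, y
      }
    }
  '
}
```

Each output line gives the 1-based alignment position followed by the base carried by each group. Sites with gaps or ambiguity codes in either group are skipped in this sketch, which may be stricter or looser than the repository script.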
--------------------------------------------------------------------------------
2. Environmental Influence on Chloroplast Ancestry
Folder: 2_Environmental_influence_chloroplast_ancestry/ – Tests whether climate predicts chloroplast ancestry.
Software: R v4.3.1
Packages:
- ggplot2
- dplyr
- logistf
- gridExtra
Files:
- Binomial_regressions.R – Logistic regression: chloroplast ancestry ~ climate variables.
- Climate_chloroplast.xlsx – Environmental, geographic, and ancestry variables per genotype:
  - MAT – Mean Annual Temperature (°C)
  - TD – Continentality (°C)
  - MAP – Mean Annual Precipitation (mm)
  - PAS – Precipitation as Snow (mm)
  - CMD – Climatic Moisture Deficit (mm)
  - RH – Relative Humidity (%)
  - Chlorotype – Chloroplast haplotype (0 = P. balsamifera, 1 = P. trichocarpa)
  - Transect – Contact zone identifier
  - Distance_along_transect – Distance from reference point along transect (km)
  - Latitude, Longitude – Coordinates
  - P_trichocarpa_Ancestry – Proportion of P. trichocarpa nuclear ancestry
--------------------------------------------------------------------------------
3. Geographic Clines
3.1 Local Ancestry Inference
Folder: 3.1_Local_Ancestry_inference/ – Scripts for estimating local ancestry per transect using whole genome resequencing data and LOTER.
Software:
- vcftools v0.1.17
- LOTER v1.0 (CLI)
- datamash v1.8
- bash (GNU)
Files:
- 3.1.1_parsing_vcf_files.sh – Extracts reference individuals, SNP positions, and per-individual VCFs for admixed genotypes.
- 3.1.2_LOTER.sh – Runs LOTER for each admixed individual and chromosome.
- 3.1.3_diploid_matrix.sh – Converts LOTER outputs to 0/1/2 diploid matrices per chromosome.
- 3.1.4_combined_CHRs.sh – Combines per-chromosome matrices into a genome-wide file.
- Admixed_IDs_all_transects, Balsam_IDs_all_transects, Tricho_IDs_all_transects – ID lists for each ancestry group.
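The haplotype-to-diploid conversion can be sketched as follows. This is an illustration, not the contents of 3.1.3_diploid_matrix.sh (which also relies on datamash). It assumes a whitespace-delimited matrix with one row per haplotype, the two haplotypes of an individual on consecutive rows, and one column per SNP coded 0 or 1. The function name `haplotypes_to_diploid` is hypothetical.

```bash
# Collapse a haplotype-level ancestry matrix into diploid dosages (0/1/2).
# Assumed input on stdin: one row per haplotype, two consecutive rows per
# individual, one 0/1 column per SNP.
haplotypes_to_diploid() {
  awk '
    NR % 2 == 1 {                       # first haplotype of a pair: store it
      for (i = 1; i <= NF; i++) first[i] = $i
      nf = NF
      next
    }
    {                                   # second haplotype: add to the stored row
      if (NF != nf) { print "column count mismatch at row " NR > "/dev/stderr"; exit 1 }
      for (i = 1; i <= NF; i++) printf "%d%s", first[i] + $i, (i < NF ? " " : "\n")
    }
  '
}
```

Each output row corresponds to one individual, and each value counts how many of its two haplotypes were assigned to the ancestry coded 1 at that SNP.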
--------------------------------------------------------------------------------
3.2 Estimating Ancestry per Gene
Folder: 3.2_Estimating_Ancestry_genes/ – Extracts ancestry for nuclear–chloroplast (N-cp) and non-interacting (non-N-cp) genes.
Software: R v4.3.1
Packages:
- dplyr
- data.table
Files:
- Ancestry_N_cp_genes.R – Estimates ancestry for N-cp genes.
- Ancestry_non_interac_genes.R – Estimates ancestry for non-N-cp genes.
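One plausible way to summarize local ancestry for a gene is to average the diploid dosage over the SNPs inside the gene's coordinates and rescale it to 0–1. The sketch below does only that; the R scripts in this folder may summarize ancestry differently. The function name `gene_ancestry`, its arguments, and the `position<TAB>dosage` input are assumptions made for this example.

```bash
# Mean ancestry (0-1 scale) across SNPs falling inside one gene.
# Usage: gene_ancestry START STOP < snps.tsv
# Hypothetical input on stdin: "position<TAB>dosage" per SNP for one
# individual and chromosome, where dosage is 0, 1, or 2.
gene_ancestry() {
  awk -F'\t' -v start="$1" -v stop="$2" '
    $1 >= start && $1 <= stop { sum += $2; n++ }   # keep SNPs within the gene
    END {
      if (n == 0) print "NA"                       # no SNPs overlap the gene
      else printf "%.2f\n", sum / (2 * n)          # divide by 2 to rescale dosage to 0-1
    }
  '
}
```

On this scale, 0 and 1 correspond to the two homozygous ancestry states and 0.5 to a heterozygous one.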
--------------------------------------------------------------------------------
3.3 Geographic Cline Modeling (HZAR)
Folder: 3.3_HZAR/ – Geographic cline modeling with HZAR.
Software: R v4.3.1
Packages:
- MCMCpack
- hzar
- dplyr
- parallel
Files:
- 3.3.1_Geo_clines_N_cps.R – Fits clines for N-cp genes.
- 3.3.2_Geo_clines_non_n_cp_genes.R – Fits clines for non-N-cp genes.
- 3.3.3_Mean_cline_param_non_n_cp.R – Calculates mean cline parameters for non-N-cp genes.
- Data_frame_Ncp_clines_HZAR.csv – Geographic and ancestry data for N-cp genes: ID, Transect, Longitude, Latitude, Plasid_ID (0 = P. balsamifera, 1 = P. trichocarpa), Distance_along_transect, ancestry columns for each N-cp gene (0 = P. balsamifera homozygous, 0.5 = heterozygous, 1 = P. trichocarpa homozygous).
- Data_frame_non_n_cp_genes_clines_HZAR.csv – Geographic and ancestry data for non-N-cp genes: ID, Transect, Longitude, Latitude, Plasid_ID, Distance_along_transect, ancestry columns for each non-N-cp gene (0 = P. balsamifera homozygous, 0.5 = heterozygous, 1 = P. trichocarpa homozygous).
--------------------------------------------------------------------------------
3.4 Co-introgression Test
Folder: 3.4_Co_introgression_test/ – Tests whether the cline parameters of N-cp genes are more similar to the chloroplast cline than those of non-N-cp genes.
Software: R v4.3.1
Packages:
- MCMCpack
- hzar
- dplyr
- parallel
Files:
- 3.4.1_Co_introgression.R – Performs co-introgression analysis.
- df_for_clines_genome_wide_chloroplast.xlsx – Summary for all gene clines: ID, Transect, TD, Distance_along_transect, non.N.cp_genes_mean_ancestry, Chlorotype (0 = P. balsamifera, 1 = P. trichocarpa).
--------------------------------------------------------------------------------
Additional file:
- HaversineFormula.R – Computes pairwise geographic distances between sampling sites.
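For orientation, the haversine formula itself can be sketched in a few lines. This is a standalone illustration, not the code in HaversineFormula.R. It assumes a spherical Earth with a mean radius of 6371 km, and the function name `haversine_km` is hypothetical.

```bash
# Great-circle distance in km between two points given in decimal degrees.
# Usage: haversine_km LAT1 LON1 LAT2 LON2
# Assumes a spherical Earth with mean radius 6371 km.
haversine_km() {
  awk -v lat1="$1" -v lon1="$2" -v lat2="$3" -v lon2="$4" 'BEGIN {
    pi = atan2(0, -1); rad = pi / 180; R = 6371
    dlat = (lat2 - lat1) * rad           # latitude difference in radians
    dlon = (lon2 - lon1) * rad           # longitude difference in radians
    a = sin(dlat / 2) ^ 2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2) ^ 2
    printf "%.2f\n", 2 * R * atan2(sqrt(a), sqrt(1 - a))
  }'
}
```

For example, `haversine_km 0 0 0 1` returns the length of one degree of longitude at the equator. Calling the function for every pair of sampling sites yields a pairwise distance matrix.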
--------------------------------------------------------------------------------
4. Phenotypic Data – Linear Mixed Models
Folder: 4_Phenotypic_data_LMM/ – Mixed-model analysis of phenotypic traits in common gardens.
Software: R v4.3.1
Packages:
- ggplot2
- lme4
- dplyr
- sjPlot
Files:
- 4.1_Linear_Mixed_Models.R – Fits linear mixed-effects models for phenotypic traits across gardens.
- 4.2_Heritability.R – Estimates broad-sense heritability of traits using linear mixed models.
- Phenotypic_Data_CNI.xlsx – Trait, ancestry, and collection metadata.
Columns:
- PLANT_ID – Individual plant identifier
- Transect_SL – Sampling contact zone name
- GARDEN – Common garden site
- BLOCK, mBlock – Garden block identifiers
- MAT – Mean Annual Temperature (°C)
- MAP – Mean Annual Precipitation (mm)
- TD – Continentality (°C)
- RH – Relative Humidity (%)
- CMD – Climatic Moisture Deficit (mm)
- PAS – Precipitation as Snow (mm)
- Nuclear_ancestry – Proportion of P. trichocarpa nuclear ancestry
- Chloroplast_Ancestry – Chloroplast haplotype (0 = P. balsamifera, 1 = P. trichocarpa)
- gsw – Stomatal conductance (mmol m⁻² s⁻¹)
- PS2 – Photosystem II quantum efficiency
- ETR – Electron transport rate (µmol m⁻² s⁻¹)
- Fs – Steady-state fluorescence
- Fm – Maximum fluorescence
- d13C – Carbon isotope discrimination (‰)
- N – Leaf nitrogen content (%)
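As an illustration of how the ancestry columns above could be combined, the sketch below appends a simple cytonuclear mismatch value to each row. The mismatch definition used here, |Chloroplast_Ancestry − Nuclear_ancestry|, is an assumption for this example and is not necessarily the measure used in 4.1_Linear_Mixed_Models.R. The sketch also assumes the spreadsheet has been exported to a comma-separated file with a header row and no quoted fields. The function name `add_mismatch` and the output column `Mismatch` are hypothetical.

```bash
# Append a cytonuclear mismatch column to a CSV export of the phenotype table.
# Assumptions: comma-separated input with a header row, no quoted fields, and
# mismatch defined as |Chloroplast_Ancestry - Nuclear_ancestry| (illustrative only).
add_mismatch() {
  awk -F',' -v OFS=',' '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i      # map header names to column numbers
      if (!("Nuclear_ancestry" in col) || !("Chloroplast_Ancestry" in col)) {
        print "required columns not found" > "/dev/stderr"; exit 1
      }
      print $0, "Mismatch"
      next
    }
    {
      d = $(col["Chloroplast_Ancestry"]) - $(col["Nuclear_ancestry"])
      if (d < 0) d = -d                          # absolute difference
      printf "%s,%.2f\n", $0, d
    }
  '
}
```

A value near 0 indicates that chloroplast and nuclear ancestry agree, and a value near 1 indicates a chloroplast from one species on a nuclear background dominated by the other.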
