Characterizing a lethal mitonuclear incompatibility in naturally hybridizing Xiphophorus swordtails
Data files
Nov 20, 2023 version files 15.12 GB
- ACUA_TLMC_CHAF.tar.gz
- admixem_simulation_files.tar.gz
- Ancestryinfer_files.tar.gz
- ASE_F1s.tar.gz
- CALL_admixmap.tar.gz
- CALL_mother_embryo.tar.gz
- CHPL.tar.gz
- codeml.tar.gz
- Data_Flowcytometry_xipho.csv
- F2_embryo_morphometrics_respirometry.tar.gz
- F2_NDUFA13_histology_slides.targ.gz
- GREMLIN.tar.gz
- mitoqpcr_ND1_Nup43_diluted.liver.DNA.csv
- new_F2_juveniles.tar.gz
- Oroboros.tar.gz
- parental_species_embryo_staging.xlsx
- phylogenetics.tar.gz
- PRM_percentmalinche_WLLPQSGEQPYK.csv
- RaptorX-Contact.tar.gz
- README.md
- RNAseq_data_Payne_etal_mitoDMI.csv
- rotenone_summary.csv
- SIFT.tar.gz
- structural_models.tar.gz
Abstract
The evolution of reproductive barriers is the first step in the formation of new species and can help us understand the diversification of life on Earth. These reproductive barriers often take the form of “hybrid incompatibilities,” where alleles derived from two different species no longer interact properly in hybrids. Theory predicts that hybrid incompatibilities may be more likely to arise at rapidly evolving genes and that incompatibilities involving multiple genes should be common, but there has been sparse empirical data to evaluate these predictions. Here, we describe a mitonuclear incompatibility involving three genes in physical contact within respiratory Complex I of naturally hybridizing swordtail fish species. Individuals homozygous for mismatched protein combinations fail to complete embryonic development or die as juveniles, while those heterozygous for the incompatibility have reduced Complex I function and unbalanced representation of parental alleles in the mitochondrial proteome. We find that the impacts of different genetic interactions on survival are non-additive, highlighting subtle complexity in the genetic architecture of hybrid incompatibilities. Finally, we document the evolutionary history of the genes involved, showing signals of accelerated evolution and the first case of an incompatibility transferred between species via hybridization.
README: Characterizing a lethal mitonuclear incompatibility in naturally hybridizing Xiphophorus swordtails
https://doi.org/10.5061/dryad.j3tx95xmx
This dataset includes a variety of data types, all of which aim to comprehensively explore a complex, lethal mitonuclear incompatibility in hybridizing swordtails of the genus Xiphophorus. This includes large amounts of local ancestry information from Xiphophorus hybrids, simulation input and output files, structures from in silico modelling of proteins, phylogenetic analyses, and miscellaneous tabular data from organismal and morphological measurements.
In general, the local ancestry information is presented either as the raw output of the ancestryinfer pipeline (https://doi.org/10.1111/1755-0998.13175) or as a processed genotype file. The former is identified by the naming convention "ancestry-probs-par[1 or 2].txt", where 1 is X. birchmanni and 2 is X. malinche. The latter is identified by the convention "genotypes" and includes coordinates from the X. birchmanni reference genome as columns, individuals as rows, and genotypes coded as 0 (birchmanni), 1 (heterozygote), or 2 (malinche or cortezi, as applicable).
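As a rough illustration of how the 0/1/2 genotype coding described above can be summarized, the following Python sketch computes the proportion of non-birchmanni ancestry per individual. The tab delimiter, the "id" column, the column labels, the "NA" missing-data code, and the function name hybrid_index are all assumptions made for this example; check the actual files for their layout before adapting it.

```python
import csv
import io

# Hypothetical excerpt of a "genotypes" file. The tab delimiter, the "id"
# column, the coordinate labels, and "NA" for missing calls are assumptions
# for illustration only, not the documented layout of the deposited files.
EXAMPLE = (
    "id\tchr-13:100\tchr-13:200\tchr-13:300\tchr-13:400\n"
    "fish_A\t0\t0\t1\t2\n"
    "fish_B\t2\t2\t2\tNA\n"
    "fish_C\t1\t1\t1\t1\n"
)


def hybrid_index(handle, delimiter="\t", missing="NA"):
    """Return {individual: proportion of alleles coded as non-birchmanni}.

    Genotypes are coded 0 (birchmanni), 1 (heterozygote), 2 (malinche or
    cortezi), so each called site contributes 0, 1, or 2 out of 2 alleles.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader)  # skip the header row of genome coordinates
    result = {}
    for row in reader:
        if not row:
            continue
        name, calls = row[0], row[1:]
        values = [int(c) for c in calls if c != missing]
        # Individuals with no called sites get None rather than a ratio.
        result[name] = sum(values) / (2 * len(values)) if values else None
    return result


index = hybrid_index(io.StringIO(EXAMPLE))
for name, value in index.items():
    print(name, value)
```

To use it on a real file, pass an open file handle in place of the io.StringIO object and adjust the delimiter and missing-data code to match that file.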
Description of the data and file structure
Data are grouped into folders based on the relevant biological group and analysis, as listed below:
- ACUA_TLMC_CHAF - ancestryinfer output for hybrid populations Acuapa (ACUA), Tlatemaco (TLMC), and Chahuaco Falls (CHAF), which are included in Extended Data and Supplemental Figures.
- admixem_simulation_files - input files for simulations performed using the program admixem (https://doi.org/10.1093/bioinformatics/btv700).
- Ancestryinfer_files - files necessary to run ancestryinfer, thus connecting the raw fastq files present in the NIH Sequence Read Archive (SRA) to the ancestry-probs files presented here. Includes two parental genomes and ancestry-informative marker (AIM) files.
- ASE_F1s - read count information used to test for allele-specific expression (ASE) of the genes ndufs5 and ndufa13 in liver samples of first-generation (F1) hybrids.
- CALL_admixmap - ancestry probabilities and genotype files from the Calnali Low (CALL) hybrid population, referred to as the "admixture mapping population" in the manuscript. These data were used in partial correlation tests of mitonuclear associations, as well as the inference of selection and dominance coefficients of the identified incompatibilities.
- CALL_mother_embryo - data used in connecting developmental stage to mitonuclear genotypes in the CALL population. This includes ancestry probabilities and genotype information for mothers and dissected embryos, plus developmental stages assigned to each of those embryos.
- CHPL - new data from an X. birchmanni x X. cortezi hybrid zone named Chapulhuacanito (CHPL). Other X. birchmanni x X. cortezi hybrid data used in the paper are available in an existing Dryad repository (https://doi.org/10.5061/dryad.pzgmsbcmn).
- codeml - output files from the command-line tool codeml, used to infer evolutionary rates of mitonuclear genes in the genus Xiphophorus.
- F2_embryo_morphometrics_respirometry - files relevant to the combined analysis of morphometrics and measurement of oxygen consumption via Loligo respirometry in whole X. birchmanni x X. malinche F2 hybrids. Data include raw ancestry probability files from three sequencing runs, an ancestry HMM configuration file used in local ancestry calling, a genotype csv summarizing ancestry at the relevant mitonuclear loci, phenotype information gained from pictures and video through a dissecting microscope, the output of the MicroResp software accompanying the Loligo respirometer (named SDR_800_<F2brood>_trimmed_blanksubtract_CALC.xlsx), and a summary file of all the respirometry data.
- F2_NDUFA13_histology_slides - .svs files (viewable in QuPath, https://qupath.github.io/) of H&E-stained slides showing whole-body sagittal sections of hybrids with varying ndufa13 genotype, plus a tabular file of the measurements performed on these slides. Note that because of large file sizes, only the slides used in the paper's statistics are provided, but all measurements are included in the csv.
- GREMLIN - alignments of mitonuclear genes generated by iterative PSI-BLAST and quality filtering, used as input to the GREMLIN web portal to test for signals of intra/intermolecular coevolution.
- new_F2_juveniles - ancestry data from second-generation (F2) X. birchmanni x X. malinche hybrids sequenced as part of the project. Note that other F2 hybrid genotypes included in this paper are available in another Dryad dataset: https://doi.org/10.5061/dryad.z8w9ghx82.
- Oroboros - information in tabular format from the Oroboros O2K respirometry trials of parental and F1 mitochondria, including the time it took for each individual to reach a peak in respiration after relevant substrate additions (O2K_time_to_peak.csv), the raw oxygen consumption rates at each stage in the multi-substrate, uncoupler, and inhibitor titration (SUIT) protocol (oroboros-raw-April_2021_050721.xlsx) and the flux control factor (FCF) ratios calculated based on those data (oroboros-FCF_data_051021_wdate.csv).
- phylogenetics - alignments for four mitonuclear genes (two duplicated with and without introns), plus a whole-mtDNA alignment, and the RAxML output files generated from those alignments.
- RaptorX-Contact - input fasta and output folder from the coevolution-based protein structure prediction performed by RaptorX-Contact.
- SIFT - input alignments of Complex I genes used for the prediction of (un)tolerated mutations using the online SIFT server (https://sift.bii.a-star.edu.sg/www/SIFT_aligned_seqs_submit.html), as well as the input files used to specify the relevant substitutions between X. birchmanni and X. malinche.
- structural_models - PDB 3D structure files for Complex I genes predicted with RaptorX and MODELLER, housed separately in subfolders named as such. The MODELLER folder also includes summary files of the distance between residues with substitutions between X. birchmanni and X. malinche, calculated from the X. birchmanni structure.
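For readers who want to recompute residue distances from the PDB files in structural_models, the sketch below parses alpha-carbon coordinates using the standard fixed-column PDB ATOM record layout and measures the distance between two residues. Using alpha-carbon distances is an assumption made here; the deposited summary files may use a different definition (for example, minimum distance between any atoms). The helper names, chain labels, residue numbers, and coordinates are hypothetical.

```python
import math


def atom_line(serial, name, res, chain, resseq, x, y, z):
    """Build a fixed-column PDB ATOM record (illustrative helper only)."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, res, chain, resseq, x, y, z)


def ca_coordinates(pdb_text, chain):
    """Return {residue number: (x, y, z)} for alpha carbons on one chain."""
    coords = {}
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name, column 22 the chain ID,
        # columns 23-26 the residue number, columns 31-54 the coordinates.
        if (line.startswith("ATOM") and line[12:16].strip() == "CA"
                and line[21] == chain):
            resseq = int(line[22:26])
            coords[resseq] = (float(line[30:38]),
                              float(line[38:46]),
                              float(line[46:54]))
    return coords


# Made-up coordinates, not taken from any deposited structure.
EXAMPLE_PDB = "\n".join([
    atom_line(1, " N", "ALA", "A", 10, 1.000, 1.000, 1.000),
    atom_line(2, " CA", "ALA", "A", 10, 0.000, 0.000, 0.000),
    atom_line(3, " CA", "LYS", "A", 25, 3.000, 4.000, 0.000),
    atom_line(4, " CA", "TRP", "B", 7, 0.000, 0.000, 12.000),
])

chain_a = ca_coordinates(EXAMPLE_PDB, "A")
chain_b = ca_coordinates(EXAMPLE_PDB, "B")

within_chain = math.dist(chain_a[10], chain_a[25])   # same subunit
between_chains = math.dist(chain_a[10], chain_b[7])  # across subunits
print(within_chain, between_chains)
```

For a real structure, read the file contents into a string and pass the chain identifiers and residue numbers of the substituted sites.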
In addition, there are some singleton files included without a folder:
- Data_Flowcytometry_xipho.csv - mean TMRE fluorescence intensity of F1 hybrids and parentals as identified by flow cytometry of dissociated fin cells, used to check mitochondrial membrane integrity.
- mitoqpcr_ND1_Nup43_diluted.liver.DNA.csv - results of qPCR on DNA isolated from livers of pure X. birchmanni, X. malinche, and F1 hybrids. Downstream analyses tested the ratio of a mitochondrial gene, ND1, to a single-copy nuclear gene, Nup43.
- parental_species_embryo_staging.xlsx - tabular data of developmental staging from X. birchmanni and X. malinche broods.
- PRM_percentmalinche_WLLPQSGEQPYK.csv - the proportion of X. malinche allele-specific peptides identified in a Parallel Reaction Monitoring (PRM) experiment conducted on five F1 hybrids.
- RNAseq_data_Payne_etal_mitoDMI.csv - normalized counts of four Complex I genes from RNAseq of pure X. birchmanni and X. malinche, plus F1 hybrids.
- rotenone_summary.csv - survival information over time of X. birchmanni and X. malinche fry exposed to rotenone, a Complex I inhibitor.
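The ratio of ND1 to Nup43 mentioned for the mitoqpcr file can be approximated from qPCR cycle thresholds with a standard delta-Ct calculation. The sketch below is one common way to do this, assuming perfect amplification efficiency (a doubling per cycle); it is not necessarily the calculation used in the manuscript, and the function name and Ct values are invented for illustration.

```python
from statistics import mean


def relative_mtdna_ratio(ct_mito, ct_nuclear, efficiency=2.0):
    """Estimate mitochondrial:nuclear template ratio from Ct values.

    ct_mito and ct_nuclear are lists of technical-replicate Ct values for
    the mitochondrial (e.g. ND1) and nuclear (e.g. Nup43) amplicons.
    efficiency=2.0 assumes the product doubles every cycle.
    """
    delta_ct = mean(ct_nuclear) - mean(ct_mito)
    return efficiency ** delta_ct


# Invented Ct values: the mitochondrial target crosses threshold 10 cycles
# earlier than the nuclear target, implying about 2**10 more template.
ratio = relative_mtdna_ratio(ct_mito=[15.0, 15.2, 14.8],
                             ct_nuclear=[25.0, 25.1, 24.9])
print(round(ratio))
```

Comparing this ratio among X. birchmanni, X. malinche, and F1 samples gives a relative measure of mitochondrial DNA copy number per nuclear genome copy.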
Sharing/Access information
Further data from the proteomics experiment are available on the PRIDE database at https://doi.org/10.6019/PXD046217.
Raw FastQ files used for ancestry inference are available in the SRA archive under BioProjects PRJNA744894, PRJNA746324, PRJNA610049, PRJNA361133, and PRJNA745218.
Please contact the corresponding authors on the associated manuscript for any clarification or help with data sharing!
Code/Software
Code specifically associated with the manuscript can be found at https://github.com/Schumerlab/mitonuc_DMI and scripts more generally used by the authors for processing ancestryinfer files can be found at https://github.com/Schumerlab/Lab_shared_scripts.
Methods
This project was based on genomic DNA samples collected from live fish in Hidalgo, Mexico and lab populations in Stanford, California, followed by low-coverage sequencing and local ancestry inference using an HMM-based approach. The resulting ancestry information was used for QTL and admixture mapping to identify mitonuclear associations, and to estimate selection on particular genotype combinations. We also genotyped embryos dissected from pregnant females after performing developmental staging, respirometry measurements, and morphometrics. We isolated mitochondria from adult heterozygotes, then used these samples to perform respirometry with an Oroboros O2K respirometer and a Parallel Reaction Monitoring proteomics experiment. Using inferred protein sequences from the relevant species, we performed in silico modeling of Complex I genes of interest in RaptorX and MODELLER, analyzed evolutionary rates using codeml, and tested for historical introgression using a combination of phylogenetics on PacBio mitochondrial genomes and simulations of incomplete lineage sorting (ILS) vs. gene flow. Further details are available in the online-only methods and supplemental information of the associated manuscript.