
Raw mass-standardized ionomic data from seven fish species and raw transcriptome sequences for mosquitofish inhabiting the Tar Creek Superfund Site in OK, USA


Coffin, John; Kelley, Joanna; Jeyasingh, Punidan; Tobler, Michael (2022), Raw mass-standardized ionomic data from seven fish species and raw transcriptome sequences for mosquitofish inhabiting the Tar Creek Superfund Site in OK, USA, Dryad, Dataset,


Our understanding of the mechanisms mediating the resilience of organisms to environmental change remains limited. Heavy metals negatively affect processes at all biological scales, yet organisms inhabiting contaminated environments must maintain homeostasis to survive. Tar Creek in Oklahoma, USA, contains high concentrations of heavy metals and an abundance of Western mosquitofish (Gambusia affinis), though several other fish species persist at lower abundance. To test hypotheses about the mechanisms mediating the persistence and abundance of mosquitofish in Tar Creek, we integrated ionomic data from seven resident fish species with transcriptomic data from mosquitofish. We predicted that mosquitofish minimize uptake of heavy metals more than other Tar Creek fish and induce transcriptional responses to detoxify metals that enter the body, allowing them to persist in Tar Creek at higher density than species that may lack these responses. Tar Creek populations of all seven fish species accumulated heavy metals, suggesting that mosquitofish cannot block uptake more efficiently than other species. We found population-level gene expression differences between mosquitofish in Tar Creek and nearby unpolluted sites. These differences occurred primarily in the gill, where genes involved in lowering the transfer of metal ions from the blood into cells and in mitigating free radicals were upregulated. However, many differentially expressed genes were not in known metal response pathways, suggesting that multifarious selective regimes and/or previously undocumented pathways could affect tolerance in mosquitofish. Our systems-level study identified well-characterized and putatively new mechanisms that enable mosquitofish to inhabit heavy metal-contaminated environments.


The data in this dataset were collected in a two-part field experiment at the Tar Creek Superfund Site in Ottawa County, Oklahoma, USA, from 2017 to 2018. In the first part, we sampled the fish communities of Tar Creek (polluted) and a nearby unpolluted reference watershed (Coal Creek) and measured the whole-body concentrations of numerous biologically relevant elements (i.e., ionomes). In the second part, we sampled populations of Western mosquitofish (Gambusia affinis) in Tar Creek and in two unpolluted reference sites (Coal Creek and Little Elm Creek) and generated transcriptomes to characterize gene expression variation related to chronic heavy metal exposure.

The ionomic dataset was obtained by digesting dried fish specimens in a 2:1 mixture of trace metal grade HNO₃ and H₂O₂ and analyzing the digests with an inductively coupled plasma optical emission spectrometer (ICP-OES), which provided raw absolute abundances (in µg) of all elements surveyed. We then mass-standardized these data to obtain the concentration of each element in µg per mg of dried tissue (µg/mg). These concentrations are the data archived in this dataset, under the "Ionomics_Data" tab.
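
Mass standardization amounts to dividing each element's absolute abundance (µg) by the specimen's dry mass (mg). The Python sketch below only illustrates that arithmetic; the function name, element symbols, and numbers are hypothetical, and the archived "Ionomics_Data" values are already standardized, so this step does not need to be repeated on them.

```python
def mass_standardize(abundance_ug: dict[str, float], dry_mass_mg: float) -> dict[str, float]:
    """Convert absolute element abundances (ug) to concentrations (ug/mg dry mass).

    Hypothetical helper for illustration only; not the authors' code.
    """
    if dry_mass_mg <= 0:
        raise ValueError("dry mass must be positive")
    # Divide every element's absolute abundance by the same specimen dry mass.
    return {element: ug / dry_mass_mg for element, ug in abundance_ug.items()}


# Hypothetical specimen: 40 mg dry mass with made-up element abundances (ug).
specimen = {"Zn": 2.0, "Cd": 0.04}
conc = mass_standardize(specimen, dry_mass_mg=40.0)
# conc["Zn"] is 0.05 ug/mg; multiplying by 1000 gives ug/g (equivalently mg/kg).
```

Because 1 g = 1000 mg, values in µg/mg can be multiplied by 1000 to compare them with studies that report µg/g.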

The transcriptomic dataset was obtained by extracting RNA from the gill, liver, and brain tissues of female G. affinis, reverse transcribing it into cDNA, and sequencing the resulting libraries to generate paired-end 101 bp raw reads. The raw transcriptomes are archived on GenBank (BioProject accession: PRJNA707024). We then trimmed low-quality reads and adapter sequences and removed reads shorter than 50 bp. To reduce bias due to differences in library size, we randomly subsampled reads from samples with abnormally high read counts. The remaining reads were mapped to the Xiphophorus maculatus reference genome, and we generated a matrix of read counts for each of the 27,266 genes in each sample. The raw count matrix is archived in this dataset in columns A-BE of the "Gene_Exp_Counts_Matrix" tab. We also used likelihood ratio tests (LRTs) to compare the expression of each gene between Tar Creek and both unpolluted sites; the results and pertinent statistics of the LRT for each gene are presented in columns BF-CI of the same tab.
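
A likelihood ratio test compares the maximized log-likelihood of a full model (for example, one that includes a site term) with that of a reduced model without that term. Under the null hypothesis, 2(ℓ_full − ℓ_reduced) is approximately chi-square distributed, with degrees of freedom equal to the difference in the number of parameters. The source does not state which software or count model was used (negative binomial GLMs, as in DESeq2 or edgeR, are common for RNA-seq counts), so the Python sketch below shows only the final test step, starting from two already-fitted log-likelihoods. The function names and example values are hypothetical. To stay dependency-free, the sketch supports only df = 1 and even df, using closed-form chi-square tail probabilities.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Closed forms only (a simplification for this sketch):
    df == 1 uses the complementary error function;
    even df uses exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if x < 0:
        raise ValueError("statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        half = x / 2.0
        term = 1.0   # the i = 0 term of the Poisson sum
        total = 1.0
        for i in range(1, df // 2):
            term *= half / i  # builds (x/2)^i / i! incrementally
            total += term
        return math.exp(-half) * total
    raise NotImplementedError("this sketch supports df == 1 or even df only")


def likelihood_ratio_test(loglik_full: float, loglik_reduced: float, df: int) -> tuple[float, float]:
    """Return (LRT statistic, p-value) for nested models fitted by maximum likelihood."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    stat = max(stat, 0.0)  # guard against tiny negative values from numerical fitting
    return stat, chi2_sf(stat, df)


# Hypothetical fitted log-likelihoods for one gene.
stat, p = likelihood_ratio_test(-100.0, -105.0, df=2)
# stat == 10.0 and p == exp(-5), about 0.0067
```

For instance, dropping a site factor with three levels (Tar Creek, Coal Creek, Little Elm Creek) from a model would give df = 2. In practice, `scipy.stats.chi2.sf` covers any df and would replace the hand-written tail function.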

Usage Notes

A README has been included as a separate tab alongside the ionomic and transcriptomic data. Please refer to it for explanations of both datasets.
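
The raw counts (columns A-BE) and the LRT results (columns BF-CI) share the "Gene_Exp_Counts_Matrix" tab, so it can help to convert the spreadsheet column letters into 0-based positions before splitting the table. The Python helper below is a hypothetical sketch and is not part of the dataset. Check the README tab for what each individual column holds.

```python
def column_index(letters: str) -> int:
    """Convert a spreadsheet column label ('A', 'Z', 'AA', ...) to a 0-based index."""
    label = letters.strip().upper()
    if not label:
        raise ValueError("empty column label")
    index = 0
    for ch in label:
        if not "A" <= ch <= "Z":
            raise ValueError(f"not a column label: {letters!r}")
        # Base-26 with digits A=1 .. Z=26 (bijective numbering, no zero digit).
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def column_range(first: str, last: str) -> range:
    """Inclusive spreadsheet column range as 0-based positions."""
    return range(column_index(first), column_index(last) + 1)


# Column ranges stated for the "Gene_Exp_Counts_Matrix" tab.
COUNTS_COLUMNS = column_range("A", "BE")  # raw count matrix
LRT_COLUMNS = column_range("BF", "CI")    # LRT results and statistics
```

With pandas, the same split can be done at read time, e.g. `pd.read_excel(path, sheet_name="Gene_Exp_Counts_Matrix", usecols="A:BE")`. Here `path` is a placeholder for the downloaded workbook, which is assumed to be an .xlsx file. An already-loaded frame can instead be indexed with `df.iloc[:, list(COUNTS_COLUMNS)]`.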


National Science Foundation, Award: IOS-1557860

National Science Foundation, Award: IOS-1931657