Data from: Introgression between invasive and native blue mussels (genus Mytilus) in the central California hybrid zone.

Saarman, Norah P.; Pogson, Grant H.

Published Jul 31, 2015 on Dryad. https://doi.org/10.5061/dryad.53d34

Data files

Jul 31, 2015 version; files total 41.22 MB

adegenet_package_v.1.4-2_input_file.csv (1.31 MB)
INTROGRESS_v.1.22_input_file_(admix_data).csv (1.53 MB)
INTROGRESS_v.1.22_input_file_(locus_data) .csv (14.12 KB)
INTROGRESS_v.1.22_input_file_(parent_1_data).csv (71.12 KB)
INTROGRESS_v.1.22_input_file_(parent_2_data).csv (110.13 KB)
NewHybrids_v.1_input_file.txt (286.23 KB)
SNPs_from_STACKS_populations.pl_script.tsv.zip (3.31 MB)
STRUCTURE_v.2.3.4_input_file.tsv (1.45 MB)
tags_from_STACKS_populations.pl_script.tsv.zip (33.14 MB)

Abstract

The ecological and genetic factors determining the extent of introgression between species in secondary contact zones remain poorly understood. Here, we investigate the relative importance of isolating barriers and the demographic expansion of invasive Mytilus galloprovincialis on the magnitude and the direction of introgression with the native Mytilus trossulus in a hybrid zone in central California. We use double-digest restriction-site-associated DNA sequencing (ddRADseq) to genotype 1,337 randomly selected single nucleotide polymorphisms and accurately distinguish early and advanced generation hybrids for the first time in the central California Mytilus spp. hybrid zone. Weak levels of introgression were observed in both directions but were slightly more prevalent from the native M. trossulus into the invasive M. galloprovincialis. Few early and advanced backcrossed individuals were observed across the hybrid zone, confirming the presence of strong barriers to interbreeding. Heterogeneous patterns of admixture across the zone of contact were consistent with the colonization history of M. galloprovincialis, with more extensive introgression in northern localities furthest away from the putative site of introduction in southern California. These observations reinforce the importance of dynamic spatial and demographic expansions in determining patterns of introgression between close congeners, even in those with high dispersal potential and well-developed reproductive barriers. Our results suggest that the threat posed by invasive M. galloprovincialis is more ecological than genetic, as it has displaced, and continues to displace, the native M. trossulus from much of central and southern California.

NewHybrids_v.1_input_file

This is the input file for the NewHybrids v1 (Anderson & Thompson 2002; Anderson 2008) analyses, which we used to identify pure species and hybrids and to classify hybrids into categories (i.e., F1, F2, backcross with M. galloprovincialis, or backcross with M. trossulus). This file includes only diagnostic SNPs. We used Jeffreys-type priors and a burn-in of 10,000 sweeps followed by 50,000 sweeps in five separate runs. Convergence of Q across runs was checked, as in the STRUCTURE analyses.

INTROGRESS_v.1.22_input_file_(locus_data)

This is file 1 (locus data) of 4 input files needed for running the R package INTROGRESS v1.22 (Gompert & Buerkle 2010). Together, these files allow maximum-likelihood estimation of each individual's ancestry (the multilocus hybrid index) and genomic cline analysis, which compares admixture at a single locus to average admixture across the rest of the genome. We used the parametric approach, first estimating a multilocus hybrid index and then fitting clines in genotype frequencies at individual SNPs as a function of the neutral expectation derived from the genome-wide hybrid index.

INTROGRESS_v.1.22_input_file_(parent_1_data)

This is file 2 (parent 1 data) of 4 input files needed for running the R package INTROGRESS v1.22 (Gompert & Buerkle 2010). Together, these files allow maximum-likelihood estimation of each individual's ancestry (the multilocus hybrid index) and genomic cline analysis, which compares admixture at a single locus to average admixture across the rest of the genome. We used the parametric approach, first estimating a multilocus hybrid index and then fitting clines in genotype frequencies at individual SNPs as a function of the neutral expectation derived from the genome-wide hybrid index.

INTROGRESS_v.1.22_input_file_(parent_2_data)

This is file 3 (parent 2 data) of 4 input files needed for running the R package INTROGRESS v1.22 (Gompert & Buerkle 2010). Together, these files allow maximum-likelihood estimation of each individual's ancestry (the multilocus hybrid index) and genomic cline analysis, which compares admixture at a single locus to average admixture across the rest of the genome. We used the parametric approach, first estimating a multilocus hybrid index and then fitting clines in genotype frequencies at individual SNPs as a function of the neutral expectation derived from the genome-wide hybrid index.

INTROGRESS_v.1.22_input_file_(admix_data)

This is file 4 (admix data) of 4 input files needed for running the R package INTROGRESS v1.22 (Gompert & Buerkle 2010). Together, these files allow maximum-likelihood estimation of each individual's ancestry (the multilocus hybrid index) and genomic cline analysis, which compares admixture at a single locus to average admixture across the rest of the genome. We used the parametric approach, first estimating a multilocus hybrid index and then fitting clines in genotype frequencies at individual SNPs as a function of the neutral expectation derived from the genome-wide hybrid index.
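To illustrate what a maximum-likelihood hybrid index means, the following is a minimal Python sketch, not the INTROGRESS implementation. It assumes biallelic SNPs, takes the reference-allele frequencies in the two parental samples as given, and finds by grid search the ancestry proportion h that maximizes the likelihood of one admixed individual's genotypes. The function name, the grid search, and the convention that h = 1 means pure parent 1 are all assumptions made for illustration; the format of the actual input files is described in the INTROGRESS documentation.

```python
import math


def hybrid_index(genotypes, p1_freqs, p2_freqs, grid_size=1001):
    """Grid-search maximum-likelihood hybrid index for one individual.

    genotypes: per-locus count (0, 1 or 2) of the reference allele,
               or None where the genotype is missing.
    p1_freqs, p2_freqs: reference-allele frequency at each locus in
               parent 1 and parent 2 (hypothetical inputs).
    Returns h in [0, 1], read here as the proportion of ancestry from
    parent 1 (an arbitrary convention chosen for this sketch).
    """
    eps = 1e-9  # keeps log() finite when a frequency is exactly 0 or 1
    best_h, best_ll = 0.0, -math.inf
    for i in range(grid_size):
        h = i / (grid_size - 1)
        ll = 0.0
        for g, p1, p2 in zip(genotypes, p1_freqs, p2_freqs):
            if g is None:
                continue  # missing genotypes contribute nothing
            # Probability that a single allele copy is the reference allele
            q = h * p1 + (1.0 - h) * p2
            q = min(max(q, eps), 1.0 - eps)
            ll += g * math.log(q) + (2 - g) * math.log(1.0 - q)
        if ll > best_ll:
            best_h, best_ll = h, ll
    return best_h
```

With fully diagnostic loci (frequency 1 in one parent and 0 in the other), an individual heterozygous at every locus receives h = 0.5 under this sketch, as expected for an F1 hybrid.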

adegenet_package_v.1.4-2_input_file

This is the input file needed to perform a principal components analysis with the "adegenet" package version 1.4-2 (Jombart et al 2008) in the R version 3.1.0 environment (R Core Team, 2013). The results from this analysis were used to inform definitions of parental genotypes.
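As a rough illustration of how a principal components analysis separates parental genotypes, the sketch below computes each individual's score on the first principal component of a small allele-count matrix. It is written in plain Python with power iteration and is not the adegenet procedure: it only centers the data, assumes no missing genotypes, and the function name and toy data layout are hypothetical.

```python
import math


def first_pc_scores(genotypes, iterations=200):
    """Scores of each individual on the first principal component.

    genotypes: list of rows (individuals), each a list of allele counts
               (0, 1 or 2) with no missing values.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Center each locus (column) on its mean
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    X = [[row[j] - means[j] for j in range(m)] for row in genotypes]

    # Power iteration on X^T X from a fixed starting vector
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, v)) for row in X]
        v_new = [sum(X[i][j] * scores[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(w * w for w in v_new))
        if norm == 0.0:
            break  # no variance along the current direction
        v = [w / norm for w in v_new]

    # Project the centered genotypes onto the leading axis
    return [sum(x * w for x, w in zip(row, v)) for row in X]
```

Individuals from the two parental species are expected to fall at opposite ends of this axis, with admixed individuals in between.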

STRUCTURE_v.2.3.4_input_file

This is the input file needed to run STRUCTURE v2.3.4 (Pritchard et al 2000; Falush et al 2003; Falush et al 2007), which we used to identify pure species and hybrids. STRUCTURE jointly assigns individuals probabilistically to the two parental classes without prior input. STRUCTURE analyses used all 1,337 SNPs that passed our primary filters. We used a burn-in of 50,000 sweeps and then ran 100,000 sweeps in five separate runs, checking for convergence of the estimated membership coefficient (Q) across runs.
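One simple way to check convergence of Q across runs is sketched below in Python. It assumes the per-individual Q values for one of the two clusters have already been extracted from each run's output into plain lists (parsing STRUCTURE output is not shown), and it reports the largest disagreement for any individual between any two runs. Because cluster labels can swap between runs, each run is first aligned to the first run. The function names and the disagreement measure are assumptions for illustration, not the procedure used by the authors.

```python
def align_to_reference(run, reference):
    """Flip Q to 1 - Q if that brings the run closer to the reference run."""
    flipped = [1.0 - q for q in run]
    d_same = sum(abs(a - b) for a, b in zip(run, reference))
    d_flip = sum(abs(a - b) for a, b in zip(flipped, reference))
    return flipped if d_flip < d_same else run


def max_q_disagreement(runs):
    """Largest range of any individual's Q across runs (two clusters only).

    runs: list of runs, each a list of Q values in the same individual order.
    """
    reference = runs[0]
    aligned = [align_to_reference(run, reference) for run in runs]
    worst = 0.0
    for i in range(len(reference)):
        values = [run[i] for run in aligned]
        worst = max(worst, max(values) - min(values))
    return worst
```

A small value (for example, a few hundredths) indicates that the runs agree on each individual's assignment; what counts as small enough is a judgment call.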

tags_from_STACKS_populations.pl_script

This is an output of the populations.pl script from the Stacks pipeline, used to retain only SNPs scored in at least 75% of samples (parameter -r 0.75). This yielded a total of 1,337 SNPs. The original file name was "batch.8.catalog.tags.tsv". More information on the file format can be found at http://catchenlab.life.illinois.edu/stacks/comp/populations.php.

SNPs_from_STACKS_populations.pl_script

This is an output of the populations.pl script from the Stacks pipeline, used to retain only SNPs scored in at least 75% of samples (parameter -r 0.75). This yielded a total of 1,337 SNPs. The original file name was "batch.8.catalog.snps.tsv". More information on the file format can be found at http://catchenlab.life.illinois.edu/stacks/comp/populations.php. We compressed the file for convenience.
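The 75% threshold described for both Stacks files can be pictured with the short Python sketch below. It is not the Stacks code and does not read the Stacks file formats; it assumes a hypothetical genotype table held in memory, with one row per sample, one column per locus, and None marking a missing call. Stacks may apply its threshold within each population, whereas this sketch pools all samples, following the wording above.

```python
def filter_by_call_rate(genotypes, min_rate=0.75):
    """Return the indices of loci genotyped in at least min_rate of samples.

    genotypes: list of rows (samples), each a list with one entry per
               locus; None marks a missing genotype.
    """
    n_samples = len(genotypes)
    n_loci = len(genotypes[0])
    kept = []
    for j in range(n_loci):
        # Count the samples with a called genotype at locus j
        called = sum(1 for row in genotypes if row[j] is not None)
        if called / n_samples >= min_rate:
            kept.append(j)
    return kept
```

With four samples, a locus called in three of them (exactly 75%) is kept and a locus called in only two is dropped.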