Files for phylogenetics, structure, and migration rate analyses for the bivalve Aequiyoldia eightsii
Data files
Sep 09, 2020 version files 249.87 MB
- DAPC_script.r
- Input1_COI_sequences.nex
- Input2_NextRad_all.phy
- Input3_NextRad_All.stru
- Input4_NextRad_mag.stru
- Input5_NextRad_antarctica.stru
- Input6_NextRad_genepop
- Input7_NextRad_All_DAPC.str
- populations_yoldias.txt
Abstract
The Antarctic Circumpolar Current (ACC) dominates the open-ocean circulation of the Southern Ocean, and both isolates and connects the Southern Ocean biodiversity. However, the impact on biological processes of other Southern Ocean currents is less clear. Adjacent to the West Antarctic Peninsula (WAP), the ACC flows offshore in a northeastward direction, whereas the Antarctic Peninsula Coastal Current (APCC) follows a complex circulation pattern along the coast, with topographically-influenced deflections depending on the area. Using genomic data, we estimated genetic structure and migration rates between populations of the benthic bivalve Aequiyoldia eightsii from the shallows of southern South America and the WAP to test the role of the ACC and the APCC in its dispersal. We found strong genetic structure across the ACC (between southern South America and Antarctica) and moderate structure between populations of the West Antarctic Peninsula. Migration rates along the WAP were consistent with the APCC being important for species dispersal. Along with supporting current knowledge about ocean circulation models at the WAP, migration from the tip of the Antarctic Peninsula to the Bellingshausen Sea highlights the complexities of Southern Ocean circulation. This study provides novel biological evidence of a role of the APCC as a driver of species dispersal and highlights the power of genomic data for aiding in the understanding of complex oceanographic processes.
Methods
Genomic DNA was extracted with the Qiagen DNeasy Blood & Tissue kit, following the manufacturer's protocol. Mitochondrial DNA data were collected to conduct a preliminary assessment of phylogeographic patterns and to confirm a single evolutionary lineage, given that multiple lineages have been documented previously [12]. A 629 bp fragment of the cytochrome c oxidase subunit I (COI) gene was amplified using the universal primers from [17]. Final concentrations of the PCR components per 25 μL reaction were as follows: 25 ng template DNA, 0.25 μM of each primer, 0.625 units of GoTaq DNA polymerase (Promega, Madison, WI, USA), 0.1 mM of each dNTP, 2.5 μL of 10× reaction buffer, and 2.5 mM MgCl2. Amplification parameters were as follows: 95 °C for 2 min, followed by 35 cycles of 95 °C for 30 s, 48 °C for 30 s, and 72 °C for 90 s, and a final extension at 72 °C for 7 min. Purification and sequencing were conducted at MACROGEN Inc. (South Korea). Chromatograms were edited in CodonCode Aligner 8.0.2 (Dedham, MA, USA). Sequences were imported into BioEdit 7.0.5.2 [18], aligned using the Clustal W algorithm available within BioEdit, and checked by eye. All sequences were deposited in GenBank (accession numbers MT176643-MT176683).
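As a quick check on the amplification program above, the total hold time can be worked out from the stated steps. This is only an arithmetic sketch: it assumes the 7 min step at 72 °C is a single final extension run once after the 35 cycles, and it ignores ramping time between temperatures. The constant and function names are illustrative.

```python
# Hold times (seconds) taken from the amplification parameters in the text.
INITIAL_DENATURATION_S = 2 * 60          # 95 °C for 2 min
CYCLE_STEPS_S = {
    "denaturation_95C": 30,              # 95 °C for 30 s
    "annealing_48C": 30,                 # 48 °C for 30 s
    "extension_72C": 90,                 # 72 °C for 90 s
}
N_CYCLES = 35
FINAL_EXTENSION_S = 7 * 60               # assumed: run once, after cycling


def program_hold_time_s():
    """Sum of all hold times; ramping between temperatures is not included."""
    per_cycle = sum(CYCLE_STEPS_S.values())
    return INITIAL_DENATURATION_S + N_CYCLES * per_cycle + FINAL_EXTENSION_S


total_s = program_hold_time_s()
print(f"{total_s} s = {total_s / 60:.1f} min")  # 5790 s = 96.5 min
```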
Genome-wide data were obtained through the Nextera-tagmented reductively-amplified DNA protocol (nextRAD; SNPSaurus LLC, Eugene, Oregon) of [19] to provide adequate genetic resolution at the spatial scale analyzed. Genomic DNA was converted into nextRAD genotyping-by-sequencing libraries. Genomic DNA was first fragmented with Nextera DNA Flex reagent (Illumina, Inc.), which also ligates short adapter sequences to the ends of the fragments. The Nextera reaction was scaled for fragmenting 24 ng of genomic DNA. Fragmented DNA was then amplified for 27 cycles at 74 °C, with one of the primers matching the adapter and extending 10 nucleotides into the genomic DNA with the selective sequence GTGTAGAGCC. Thus, only fragments starting with a sequence that can be hybridized by the selective sequence of the primer are efficiently amplified. The nextRAD libraries were sequenced on a HiSeq 4000 with one lane of 150 bp reads (University of Oregon).
Reads obtained from the nextRAD protocol were processed using the ipyRAD pipeline ver. 7.30 [20]. Base calls with a quality score < 20 were converted into Ns, and any read with > 5 Ns was discarded. Illumina adaptors and restriction sequences were removed during filtering. Filtered reads within a sample were clustered using a similarity threshold of 90%. Error rate and heterozygosity were estimated from the loci clusters for each individual, and the averages were used to establish consensus sequences. Clusters with a sequencing depth < 6 were discarded, and only clusters with two alleles were retained (to avoid potential paralogous loci). Consensus loci built within samples were subsequently clustered among samples using a similarity threshold of 90% and then aligned (a maximum of eight indels allowed). Loci with heterozygous alleles shared across more than 50% of individuals were also discarded.
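The read-level quality filter described above (base calls with quality < 20 converted to N, reads with > 5 Ns discarded) can be illustrated with a minimal sketch. This is not the ipyRAD implementation. It assumes Phred+33 quality encoding, and the function names and toy reads are hypothetical.

```python
def mask_low_quality(seq, qual, min_q=20, offset=33):
    """Replace each base whose Phred score is below min_q with 'N'.

    Assumes Phred+33 encoded quality strings (offset=33).
    """
    return "".join(
        base if (ord(q) - offset) >= min_q else "N"
        for base, q in zip(seq, qual)
    )


def filter_reads(records, min_q=20, max_n=5):
    """Mask low-quality bases, then drop reads with more than max_n Ns."""
    kept = []
    for name, seq, qual in records:
        masked = mask_low_quality(seq, qual, min_q)
        if masked.count("N") <= max_n:
            kept.append((name, masked))
    return kept


# Toy reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
reads = [
    ("r1", "ACGTACGTAC", "IIIIIIIIII"),  # no low-quality bases
    ("r2", "ACGTACGTAC", "I#I#I#I#I#"),  # 5 Ns after masking: kept
    ("r3", "ACGTACGTAC", "#######III"),  # 7 Ns after masking: discarded
]
print(filter_reads(reads))
# [('r1', 'ACGTACGTAC'), ('r2', 'ANGNANGNAN')]
```

Note that a read with exactly 5 Ns is retained, since the stated criterion discards only reads with more than 5.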
Because analyses differ in their sensitivity to missing data, different genomic data sets were produced to maximize the amount of genetic information available for each analysis. The data sets differ in the level of coverage/missing data allowed, which was set by changing the minimum number of samples per locus for output in ipyRAD’s step 7.
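The effect of the minimum-samples-per-locus threshold can be sketched on a toy genotype matrix. This is an illustration only, not the ipyRAD code: it assumes genotypes are stored per sample as lists with -9 as the missing-data code (the STRUCTURE convention), and the sample names, values, and function names are hypothetical.

```python
MISSING = -9  # assumed missing-data code, as commonly used in STRUCTURE files


def samples_with_data(genotypes_by_sample, locus_index):
    """Count samples that have a non-missing genotype at one locus."""
    return sum(
        1 for calls in genotypes_by_sample.values()
        if calls[locus_index] != MISSING
    )


def filter_loci(genotypes_by_sample, min_samples):
    """Return indices of loci genotyped in at least min_samples samples."""
    n_loci = len(next(iter(genotypes_by_sample.values())))
    return [
        i for i in range(n_loci)
        if samples_with_data(genotypes_by_sample, i) >= min_samples
    ]


# Toy data: 4 samples x 3 loci.
toy = {
    "s1": [0, 1, -9],
    "s2": [0, -9, -9],
    "s3": [1, 1, -9],
    "s4": [-9, 1, 2],
}
print(filter_loci(toy, min_samples=1))  # [0, 1, 2]
print(filter_loci(toy, min_samples=3))  # [0, 1]
print(filter_loci(toy, min_samples=4))  # []
```

Raising the threshold lowers the proportion of missing data but retains fewer loci, which is the trade-off behind producing a separate data set for each analysis.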
Usage notes
Description of input files:
"Input1_COI_Sequences.nex" contains the mitochondrial DNA COI sequences used to build a haplotype network in POPART (Leigh and Bryant 2015).
"Input2_NextRad_All.phy" is a concatenated data set of full-length reads for 41 individuals that includes both the SNPs and the invariable sites. These data were used to estimate an ML tree with RAxML v.8.1.16 (Stamatakis 2014).
"Input3_NextRad_All.stru" is a STRUCTURE file created from the complete data set (41 individuals), retaining only loci with data for a minimum of 10 individuals. It was used to estimate population subdivision with the program STRUCTURE (Pritchard et al. 2000).
"Input4_NextRad_mag.stru" is a subset of individuals used to run STRUCTURE analysis on samples from South America only. It used a minimum of 7 individuals for a locus to be included in the data set.
"Input5_NextRad_antarctica.stru" is a subset of individuals used to run STRUCTURE analysis on samples from Antarctica only. It used a minimum of 7 individuals for a locus to be included in the data set.
"Input6_NextRad_all_genepop" is a genepop file created from file "Input3_NextRad_All.stru" and used to estimate asymmetrical gene flow among populations with the program divMigrate-Online (Sundqvist et al. 2016).
File "Input7_NextRad_All_DAPC.stru" is a STRUCTURE (.stru) input file that can be converted in R into a genlight object to perform the discriminant analysis of principal components (DAPC). This file contains 811 SNPs and was generated by filtering out all loci represented by fewer than 20 individuals.
File "populations_yoldias.txt" contains the population assignments to be used in the DAPC analyses.