Skip to main content

Strong genetic structure in a widespread estuarine crab: A test of potential versus realized dispersal

Cite this dataset

Tepolt, Carolyn et al. (2020). Strong genetic structure in a widespread estuarine crab: A test of potential versus realized dispersal [Dataset]. Dryad.


Aim: Genetic structure has proven difficult to predict for marine and estuarine species with multi-day pelagic larval durations, since many disperse far less than expected based on passive transport models. In such cases, the gap between potential and realized dispersal may result from larval behaviors that evolved to facilitate retention and settlement in favorable environments. Behavior is predicted to play a particularly key role in structuring truly estuarine species, which often moderate their behavior to remain within their natal estuaries. In such systems, this restricted dispersal may lead to high divergence, local adaptation, and eventual speciation across their range. Here, we test whether a geographically widespread estuarine crab, known to have behavior promoting larval retention, exhibits high population structure despite a 2-4 week larval duration.

Location: Atlantic and Gulf Coasts of North America Taxon: White-fingered mud crab, Rhithropanopeus harrisii

Methods: Population genomic analyses across nine estuaries from New Hampshire to Louisiana using 12,638 transcriptome-derived SNPs.

Results: We found highly differentiated genetic signatures among all nine estuaries, separated by 200-5,000 km of coastline. Estimates of gene flow suggest that migration is low and largely symmetrical between sites. We also observed deep phylogenetic divides corresponding to major biogeographic breaks.

Main conclusions: These results indicate substantial and longstanding constraints to dispersal in the species’ native range, likely arising from the emergence of geological and oceanographic barriers and sustained by behavior that promotes estuarine retention during larval development. This work supports the idea that larval behavior promoting estuarine retention can be reflected in substantial genetic structure even in species with multi-week pelagic larval durations. Such behavior-restricted dispersal has implications for predicting adaptation and spread in estuarine species, many of which have been introduced outside their native ranges.


Sample Collection

White-fingered mud crabs (R. harrisii) were collected from nine estuaries across 5,000 km of their native North American range:Terrebonne Bay, LA; Apalachicola River, FL; St. Lucie River, FL; Pellicer Creek, FL; Ashley River, SC; Chesapeake Bay, VA; Mullica River, NJ; Moonakis River, MA; and Squamscott River, NH . Samples were collected between July-October 2015 in brackish reaches of estuaries, using passive collectors filled with oyster shells that were deployed in 0.5-3 m of water for ~2 months (for more sampling details, see Tepolt et al., 2019). Adult crabs from each site were returned to the laboratory, where they were kept in common conditions through two molts (45-90 days), to mediate some of the influence of the local environment on mRNA expression. During this time, crabs were maintained separately in Percival incubators set to 20°C and an 8am:8pm light:dark cycle. Each crab was kept in approximately 60 mL of 15 Instant Ocean seawater in one well of an 18-well plastic parts box and fed every other day with Hikari Crab Cuisine; 4-8 hours after feeding, the water was fully exchanged. Crabs were sacrificed by removing their carapaces and immediately preserving the ventral body of the crab (minus the carapace and most hepatopancreas/gonad tissue) in RNALater. Tissue was stabilized for at least 24 h at 4°C before archiving at -80°C.


Extraction & Sequencing

RNALater-preserved samples were thawed and the thoracic ganglia carefully dissected. Total RNA was extracted from ganglia using TRIzol (Invitrogen, Carlsbad, CA, USA) with 1-bromo-3-chloropropane per the manufacturer's recommended protocol (Simms, Cizdziel, & Chomczynski, 1993). RNA was quantified using the broad-range RNA assay on a Qubit 3.0 fluorometer (Invitrogen), and up to 4 μg of RNA were used to prepare cDNA libraries with Ilumina's TruSeq Stranded mRNA Library Prep Kit (Illumina, San Diego, CA, USA). Libraries were sent to the University of Utah's High Throughput Genomics Core Facility, where they were quantified and pooled into groups of 12 multiplexed samples. Multiplexed samples were run on 7 lanes of an Illumina HiSeq 2000 in 50-bp single-end reads.


Sequence Processing and Transcriptome Assembly and Annotation

The resulting raw sequences were cleaned and trimmed using Trim Galore! v0.4.0 (, a wrapper for Cutadapt (Martin, 2011). A quality cutoff of Phred 20 was used, and reads were only retained if they were 20 bp long after adapter removal and quality trimming. A subset of 3-6 individuals per site was used to assemble a de novo R. harrisii transcriptome in Trinity v.2.0.6 (Grabherr et al., 2011; Haas et al., 2013). Reads were normalized in silico for a maximum coverage of 50, and a kmer size of 21 with a minimum kmer coverage of 2 reads was used. Only transcripts larger than 200 bp were retained. To assess the read support for these raw transcripts, we used bowtie2 v2.2.9 and RSEM v1.1.17 via Trinity to generate ExN50 data (Langmead & Salzberg, 2013; We retained only transcripts with at least 2 transcripts per kilobase million (TPM).


All transcripts in this set were annotated by comparing the translated transcript sequences in all 6 reading frames against GenBank's non-redundant protein database using BLAST v.2.2.29 (Altschul, Gish, Miller, Myers, & Lipman, 1990). We used an e-value cutoff of 0.00001 and took the top hit as the annotation. Blast2GO was used to add functional annotation (GO terms and functions) to all annotated transcripts (Conesa et al., 2005). After annotation, we further filtered the transcriptome to minimize contamination by the R. harrisii holobiont with MEGAN6 using default parameters (Huson, Auch, Qi, & Schuster, 2007), excluding transcripts with best hits to green plants, fungi, bacteria, viruses, archaea, and eukaryotic microorganisms. Finally, we retained only one representative isoform for each transcript.


Mapping & SNP Detection

Trimmed reads were mapped back to the filtered R. harrisii transcriptome using the Burrows-Wheeler Aligner v.0.7.12 with an allowed mismatch fraction of 0.01, seed length of 25, and a maximum of four allowed mismatches in the seed sequence (Li & Durbin, 2009). Picard v.2.5.0 was used to sort reads, identify and mark duplicate read sequences, and index the resulting bam files ( Deduplicated reads were merged and base quality scores were recalibrated to minimize lane- and machine-specific bias using the BQSR tool in the Genome Analysis Toolkit v.3.7 (GATK; McKenna et al., 2010). As known reference SNPs are unavailable for this species, we first generated a set of 262,192 high-confidence SNPs (Phred 20) from the unrecalibrated data using the HaplotypeCaller tool; these SNPs were then used as reference “known” SNPs per GATK recommendations for reference-free systems (DePristo et al., 2011). After recalibration, reads were split into individual sample files, indexed, and genotyped at all positions on a per-sample basis using GATK’s gVCF approach. All individual gVCF files were then used for joint genotyping across all samples, using a cutoff of Phred 20 to retain only high-quality SNPs (DePristo et al., 2011). These SNPs were further filtered with the SelectVariants tool to retain only biallelic SNPs, then hard-filtered using a custom filter per the GATK recommendations for samples where known reference SNPs are unavailable.


Hard-filtering resulted in 333,281 SNPs, which were exported from GATK while retaining individual genotype quality data. These SNPs were further filtered using a custom python script. First, only individual SNP genotypes with genotype quality Phred 20 and coverage ≥ 10 reads were retained; otherwise the site was coded as missing in that individual. After removing low-quality individual genotypes, SNPs were further filtered as follows: each SNP had to pass the quality filter above in all but 2 samples / population in all populations; the alternative allele had to occur ≥ 2 times over all individuals; and heterozygosity had to be ≤ 0.7 to screen out potential paralogs. In addition, we selected nuclear SNPs by removing all SNPs in a set of 31 mitochondrial transcripts identified by running a local blastn search against a database of 16 full-length Portunid mitochondrial genomes in BLAST+ (Camacho et al., 2009). At this point, we removed any individuals missing high-quality genotype data for >15% of SNPs. The resulting panel of high-quality SNPs was screened for linkage disequilibrium using the VCFtools --geno-r2 and --interchrom-geno-r2 options (Danecek et al. 2011). All but one SNP, chosen at random, from each group linked at R2 ≥ 0.9 were removed from the final SNP set used in downstream analysis.



Altschul, S., Gish, W., Miller, W., Myers, E., & Lipman, D. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215, 403–410.

Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., & Madden, T. L. (2009). BLAST plus: architecture and applications. BMC Bioinformatics, 10, 1.

Conesa, A., Götz, S., García-Gómez, J. M., Terol, J., Talón, M., & Robles, M. (2005). Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics, 21, 3674–3676.

Danecek, P., Auton, A., Abecasis, G., Albers, C. A., Banks, E., DePristo, M. A., ... 1000 Genomes Project Analysis Group. (2011).The Variant Call Format and VCFtools. Bioinformatics, 27, 2156–2158.

DePristo, M. A., Banks, E., Poplin, R., Garimella, K. V, Maguire, J. R., Hartl, C., … Daly, M. J. (2011). A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics, 43, 491–498.

Grabherr, M. G., Haas, B. J., Yassour, M., Levin, J. Z., Thompson, D. A., Amit, I., … Regev, A. (2011). Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology, 29, 644–652.

Haas, B. J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P. D., Bowden, J., … Regev, A. (2013). De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protocols, 8, 1494–1512.

Huson, D., Auch, A., Qi, J., & Schuster, S. (2007). MEGAN analysis of metagenome data. Genome Research, 17, 377–386.

Langmead, B., & Salzberg, S. L. (2013). Fast gapped-read alignment with Bowtie 2. Nature Methods, 9, 357–359.

Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754–1760.

Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.Journal, 17, 10–12.

McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., … DePristo, M. A. (2010). The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research, 20, 1297–303.

Simms, D., Cizdziel, P. E., & Chomczynski, P. (1993). TRIzol: A new reagent for optimal single-step isolation of RNA. Focus, 15, 99–102.

Tepolt, C. K., Blakeslee, A. M. H., Fowler, A. E., Torchin, M. E., Darling, J. A., Miller, A. W., Ruiz, G. M. (2019) Recent introductions reveal differential susceptibility to parasitism across an evolutionary mosaic. Evolutionary Applications, 13, 545-558.


Methods text from the associated paper in the Journal of Biogeography ("Strong genetic structure in a widespread estuarine crab: A test of potential versus realized dispersal"), published by Wiley; this text is reproduced here with permission and falls under the journal's copyright restrictions.


Smithsonian Institute for Biodiversity Genomics, Award: Biodiversity Genomics Postdoctoral Fellowship

Smithsonian Institute for Biodiversity Genomics, Award: Biodiversity Genomics sequencing award

Smithsonian Institute for Biodiversity Genomics, Award: Biodiversity Genomics Postdoctoral Fellowship