Peripheral budding following range expansion explains diversity and distribution of one‐sided Livebearing fish
Data files (version of Feb 02, 2026; 6.81 MB total)
- data.zip (6.19 MB)
- inputs.zip (591.49 KB)
- README.md (10.12 KB)
- scripts.zip (17.44 KB)
Abstract
Peripheral budding occurs when populations diverge from a widespread parental population and speciate along its periphery, facilitated by the interaction of ecological and geographic barriers. This phenomenon produces species that contrast in range size and ecological tolerance and can lead to confounding phylogenies. Here we examine patterns of peripheral budding in the Jenynsia lineata species complex using a genomic approach based on RAD sequencing. The J. lineata species complex is a group of live‐bearing fish in South America that shows signals of peripheral budding: asymmetric range sizes, with J. lineata being widespread, and a confounding, unresolved phylogeny. Our goals were to classify the J. lineata species complex, delimit species within the complex using multiple approaches, and identify signals of introgression to better understand the underlying evolutionary patterns. We collected 85 samples from the species complex for DNA extraction and performed RAD sequencing to generate genome‐wide molecular markers for phylogenetic analyses. We found evidence of six distinct genetic groups within the complex and delimited at least five species, including a new species of Jenynsia in northern Argentina along the periphery of J. lineata. Jenynsia lineata was recovered as the most recently diverged species in our phylogeny. This placement, together with the observed patterns of introgression between species, suggests that peripheral budding facilitated speciation in the J. lineata species complex following a range expansion of a parental J. lineata. Our results show genomic patterns associated with peripheral budding and support the utility of peripheral budding as a framework for understanding confounding phylogenetic patterns.
Dataset DOI: 10.5061/dryad.3ffbg79xg
Description of the data and file structure
Caudal fin tissues were collected from one-sided livebearer fish (Jenynsia sp.) in Argentina, Uruguay, and Brazil between 2013 and 2017 and preserved in 100% ethanol. DNA was extracted from these tissues, and genome-wide markers were generated with a double-digest restriction-site associated DNA sequencing (ddRAD-seq) protocol. All raw genomic sequences (fastq files) are available in the NCBI Sequence Read Archive (SRA) (BioProject: PRJNA822935 [Accession: SRX14732231–SRX14732245; SRX14732201–SRX14732206; SRX14732208–SRX14732216; SRX14732189–SRX14732191] and BioProject: PRJNA1238845 [Accession: SAMN47485203–SAMN47485265]). An 83-individual dataset and a 94-individual dataset were used in these analyses. The 83-individual dataset includes sequences from all species of the Jenynsia lineata species complex (onca, luxata, lineata, and darwini). The 94-individual dataset includes all sequences from the 83-individual dataset plus sequences from a genomic outgroup, J. obscura.
Files and variables
File: data.zip
Description: The data folder contains the data files used by the *.R scripts, as well as raw sampling data and the output probabilities from the DELINEATE analysis.
Files: {species/population}.thetas.idx.pestPG
Description: Output files from diversity.sh. Each file reports several diversity metrics for one population/species. These files are used in the Jenynsia_genetic_diversity.R script.
Variables:
- (indexStart,indexStop): start and stop location in the index
- (firstPos_withData,lastPos_withData): first and last position with data within the window
- (WinStart,WinStop): start and stop location in the chromosome
- Chr: contig for the window
- WinCenter: center position of the window
- tW: Watterson's θ
- tP: pairwise θ
- tF: Fu and Li’s θF
- tH: Fay and Wu’s θH
- tL: Fu and Li’s θL
- Tajima: Tajima's D
- fuf: Fu and Li's F
- fud: Fu and Li's D
- fayh: Fay and Wu's H
- zeng: Zeng's E
- nSites: number of sites with data in that window
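Because tW and tP in a .pestPG file are sums over the sites in each window, per-site estimates are obtained by dividing by nSites. The sketch below is a minimal, hypothetical helper (load_pestpg and per_site_diversity are not part of ANGSD or this repository); it assumes the tab-separated layout and "#"-prefixed header described above and pools all windows in one file.

```python
import csv


def load_pestpg(handle):
    """Parse an ANGSD .pestPG table (tab-separated, '#'-prefixed header) into dicts.

    Assumption: the first line is the header listed in this README.
    """
    header = handle.readline().lstrip("#").rstrip("\r\n").split("\t")
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if fields:  # skip blank lines
            rows.append(dict(zip(header, fields)))
    return rows


def per_site_diversity(rows):
    """Pool all windows and return (Watterson's theta, pairwise theta) per site."""
    total_sites = sum(float(row["nSites"]) for row in rows)
    if total_sites == 0:
        raise ValueError("no sites with data")
    theta_w = sum(float(row["tW"]) for row in rows) / total_sites
    theta_pi = sum(float(row["tP"]) for row in rows) / total_sites
    return theta_w, theta_pi
```

For example, open one {species/population}.thetas.idx.pestPG file, pass the handle to load_pestpg, and pass the rows to per_site_diversity to get genome-wide per-site values for that population.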
File: jenynsia_data_mapping.csv
Description: Information about where samples were collected. Used in Jenynsia_map.R.
Variables:
- order: row id number
- id: unique id for each sample
- full_id: unique sample id followed by letters identifying the population
- population: assigned population/species
- lat: latitude (WGS84) of the sampled body of water
- long: longitude (WGS84) of the sampled body of water
- population_map: population identifier used for plotting sampling locations
Files: plink.{#}.Q.csv
Description: Output from Jenynsia_admixture.sh. Each file gives the ancestry proportions of every individual for each of the K clusters, where K is the number (#) in the file name. Individuals (1-83) are in the same order as in popmap.txt.
Variables:
- Each column is a different cluster, from 1 to K
- Each row is a different individual
File: popmap.txt
Description: Population map of the collected samples
Variables:
- Column 1: sample ID
- Column 2: population identifier
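A minimal sketch for summarizing an ADMIXTURE Q file by population. It assumes that plink.{#}.Q.csv has no header row, that its rows follow popmap.txt order, and that popmap.txt is whitespace-separated with the two columns above. The helper names are hypothetical; admixture_plot.R remains the repository's plotting script.

```python
import csv
from collections import defaultdict


def read_popmap(lines):
    """Return (sample_id, population) pairs in file order (assumed whitespace-separated)."""
    pairs = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            pairs.append((parts[0], parts[1]))
    return pairs


def read_q(lines):
    """Return one list of ancestry proportions per individual (assumes no header row)."""
    return [[float(value) for value in row] for row in csv.reader(lines) if row]


def mean_ancestry_by_population(popmap, q_rows):
    """Average each cluster's ancestry proportion within each population."""
    if len(popmap) != len(q_rows):
        raise ValueError("popmap and Q file list different numbers of individuals")
    sums = {}
    counts = defaultdict(int)
    for (_sample, pop), proportions in zip(popmap, q_rows):
        totals = sums.setdefault(pop, [0.0] * len(proportions))
        for k, value in enumerate(proportions):
            totals[k] += value
        counts[pop] += 1
    return {pop: [total / counts[pop] for total in totals] for pop, totals in sums.items()}
```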
File: Delineate_tree_probability_output.json.zip
Description: This file is the output of delineate.sh from the scripts folder. It gives the probabilities of the different trees from the DELINEATE analysis.
File: inputs.zip
Description: The inputs folder contains the inputs for the genomic analyses. It also includes the PHYLIP and NEXUS files from the phylogenetic analyses.
File: bfd_{test}.xml
Description: These files are used as inputs in the BFD.sh analyses.
File: {species/population}.txt
Description: These files list the samples in each species/population. Files named popmap# list the species used in the analyses with the outgroup (# = 94) or without it (# = 83).
File: delin_guide_tree_final.nex
Description: This file is the guide tree used in the delineate.sh analysis. It is in nexus format.
File: delin_input_final.tsv
Description: This file is the metadata used in the delineate.sh analysis.
Variables:
- lineage: the population groupings (lineages) used in the SNAPP analysis
- species: the species tested for each population lineage
- status: 1 means the species has previously been recognized as distinct; 0 means there is currently no strong evidence as to whether it is a distinct species
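Before running delineate.sh, a quick consistency check of the metadata table can catch typos. This is a hypothetical helper, not part of DELINEATE; it assumes delin_input_final.tsv is tab-separated with a header row naming the lineage, species, and status columns, and that each lineage appears once.

```python
import csv


def check_delineate_table(lines):
    """Validate a lineage/species/status table and split lineages by status.

    Assumption: tab-separated with a header row; status is "1" or "0".
    Returns (lineages with status 1, lineages with status 0).
    """
    reader = csv.DictReader(lines, delimiter="\t")
    missing = {"lineage", "species", "status"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    seen = set()
    known, unknown = [], []
    for row in reader:
        lineage = row["lineage"]
        if lineage in seen:
            raise ValueError(f"duplicate lineage: {lineage}")
        seen.add(lineage)
        if row["status"] == "1":
            known.append(lineage)
        elif row["status"] == "0":
            unknown.append(lineage)
        else:
            raise ValueError(f"status must be 0 or 1, got {row['status']!r}")
    return known, unknown
```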
File: dsuite.fb.txt
Description: This file is the f-branch output of the dsuite.sh analysis, describing the f-branch statistics for the branches of the tree.
File: {dataset}.nexus
Description: These files are in standard NEXUS format. Files with the number 30, 83, or 94 correspond to the 30-, 83-, and 94-individual datasets, respectively. svd.nexus was used as input to the SVDquartets analysis.
File: {dataset}.phy
Description: These files are in standard PHYLIP format. Files with the number 83 or 94 correspond to the 83- and 94-individual datasets, respectively.
File: snapp_remove.txt
Description: This is a list of individuals that were not included in the 30-individual dataset analyses.
File: snapp.xml
Description: This file is the input file for the scripts/SNAPP.sh analysis.
File: svd.newick.txt
Description: This file contains the tree from the SVDquartets analysis in Newick format; it is used as input to the Dsuite analysis.
File: taxapartitions.txt
Description: This file is the input file for the SVDquartets analysis. The file contains set information for each species that corresponds to the order of species in the svd.newick.txt file.
File: scripts.zip
Description: The scripts folder contains the scripts used in the analyses. The code in the *.sh files was written for use on an HPC system running Sun Grid Engine.
Code/software
Genomic analyses
dDocent
This code was run in a conda environment
dDocent v2.9.4 needs to be installed https://ddocent.com/bioconda/
fastq files must be in the working directory (see SRA accession information above)
For the 83- and 30-individual datasets, do not include the J. obscura fastq files in the working directory
For the 94-individual dataset, include all fastq files
Run
scripts/ddocent.sh
After running on the 83 individuals, rename the outputs to TotalRawSNPs_83.vcf and ddocent_Final_83.recode.vcf
After running on the 94 individuals, rename the outputs to TotalRawSNPs_94.vcf and ddocent_Final_94.recode.vcf
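The renaming step can be scripted so the 83- and 94-individual outputs are never overwritten. This is a hypothetical sketch that assumes dDocent's default output names are TotalRawSNPs.vcf and Final.recode.vcf; check the names in your run directory before using it.

```python
from pathlib import Path

# Assumed default dDocent output names (verify in your run directory).
DEFAULT_OUTPUTS = {
    "TotalRawSNPs.vcf": "TotalRawSNPs_{n}.vcf",
    "Final.recode.vcf": "ddocent_Final_{n}.recode.vcf",
}


def rename_ddocent_outputs(workdir, n_individuals):
    """Rename the dDocent outputs with the dataset size; refuse to overwrite."""
    workdir = Path(workdir)
    renamed = []
    for old_name, template in DEFAULT_OUTPUTS.items():
        src = workdir / old_name
        dst = workdir / template.format(n=n_individuals)
        if not src.exists():
            raise FileNotFoundError(src)
        if dst.exists():
            raise FileExistsError(dst)
        src.rename(dst)
        renamed.append(dst.name)
    return renamed
```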
Filtering VCF files
vcftools and vcffilter need to be installed in conda
run scripts/Jenynsia_filter.sh
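A quick, hypothetical sanity check for the filtering step is to count samples and variant records in the raw and filtered VCF files. This sketch only reads the standard VCF header and record lines (plain or gzip-compressed); it does not reproduce any filter applied by Jenynsia_filter.sh.

```python
import gzip
from pathlib import Path


def count_vcf(path):
    """Return (number of samples, number of variant records) in a .vcf or .vcf.gz file."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    n_samples, n_records = 0, 0
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            if line.startswith("#CHROM"):
                # Columns after the nine fixed ones (CHROM ... FORMAT) are samples.
                n_samples = max(0, len(line.rstrip("\r\n").split("\t")) - 9)
            elif line.strip():
                n_records += 1
    return n_samples, n_records
```

For example, comparing count_vcf on TotalRawSNPs_83.vcf and on inputs/filtered_final_83.recode.vcf shows how many records the filters removed.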
File conversion
Download vcf2phylip (https://github.com/edgardomortiz/vcf2phylip) to convert the VCF files to NEXUS, FASTA, and PHYLIP formats
run
for 83 individuals dataset
python vcf2phylip.py -i inputs/filtered_final_83_LD_pruned.recode.vcf -f -n -b
for 30 individuals dataset
python vcf2phylip.py -i inputs/filtered_final_30_nomiss_LD_pruned_sub1000.vcf -f -n -b
for 94 individuals dataset
python vcf2phylip.py -i inputs/filtered_final_94_LD_pruned.recode.vcf -f -n -b
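To check a converted alignment before the tree searches, the sketch below reports the number of taxa and sites and the fraction of missing characters per sample. It is a hypothetical helper that assumes the sequential PHYLIP layout (a "ntaxa nsites" first line, then one "name sequence" line per taxon) and treats N, -, and ? as missing.

```python
def summarize_phylip(lines):
    """Return (n_taxa, n_sites, {name: fraction of missing sites}).

    Assumption: sequential PHYLIP with one line per taxon; N, - and ? are missing.
    """
    it = iter(lines)
    n_taxa, n_sites = (int(x) for x in next(it).split())
    missing = {}
    for line in it:
        if not line.strip():
            continue
        name, seq = line.split(None, 1)
        seq = seq.replace(" ", "").strip()
        if len(seq) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, found {len(seq)}")
        missing[name] = sum(c in "N-?" for c in seq.upper()) / n_sites
    if len(missing) != n_taxa:
        raise ValueError(f"expected {n_taxa} taxa, found {len(missing)}")
    return n_taxa, n_sites, missing
```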
PCA
ipyrad needs to be installed in conda
compress vcf file
bgzip -c inputs/filtered_final_83.recode.vcf > inputs/filtered_final_83.recode.vcf.gz
index the compressed file
tabix inputs/filtered_final_83.recode.vcf.gz
then run the Python file ipyrad_converter_pca.py
followed by ipyrad_pca.py
A PCA figure will be output
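For readers unfamiliar with genotype PCA, the sketch below shows the core idea on a complete 0/1/2 genotype matrix: centre each SNP, then take the leading eigenvector of the individual-by-individual cross-product matrix (here by power iteration) to get PC1 scores. It is a generic illustration, not ipyrad's implementation; ipyrad handles missing data and other details itself, whereas this sketch assumes no missing genotypes.

```python
import math
import random


def pc1_scores(genotypes, iterations=200, seed=1):
    """Return PC1 scores (one per individual) for a complete 0/1/2 genotype matrix."""
    n = len(genotypes)
    m = len(genotypes[0])
    # Centre each SNP column on its mean genotype.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Individual-by-individual cross-product matrix G = X X^T.
    g = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)] for i in range(n)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iterations):
        w = [sum(g[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(val * val for val in w))
        if norm == 0:
            return [0.0] * n
        v = [val / norm for val in w]
    # Scores = eigenvector * sqrt(eigenvalue), i.e. U * S from the SVD of X.
    eigenvalue = sum(v[i] * sum(g[i][k] * v[k] for k in range(n)) for i in range(n))
    scale = math.sqrt(max(eigenvalue, 0.0))
    return [val * scale for val in v]
```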
Admixture
ADMIXTURE v1.3.0
PLINK needs to be installed in conda
run scripts/Jenynsia_admixture.sh
Outputs CV errors and the files for plotting each K in R
Use admixture_plot.R to plot figure in R
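When ADMIXTURE is run with cross-validation it reports lines such as "CV error (K=2): 0.52". A common convention is to favour the K with the lowest CV error. This hypothetical sketch collects those lines from the log output of Jenynsia_admixture.sh (log file names are not specified here) and returns that K.

```python
import re

# Matches ADMIXTURE cross-validation lines, e.g. "CV error (K=3): 0.48301".
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.eE+-]+)")


def cv_errors(lines):
    """Return {K: CV error} from ADMIXTURE log lines."""
    errors = {}
    for line in lines:
        match = CV_LINE.search(line)
        if match:
            errors[int(match.group(1))] = float(match.group(2))
    return errors


def best_k(errors):
    """Return the K with the lowest cross-validation error."""
    if not errors:
        raise ValueError("no CV error lines found")
    return min(errors, key=errors.get)
```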
IQtree and RAXML
IQ-TREE 2.4.0 and RAxML v8 need to be installed in conda
Run
iqtree.sh
Run
raxml.sh
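Both programs can write branch support as internal-node labels in Newick trees (IQ-TREE, for example, writes "SH-aLRT/UFBoot" pairs when both are requested). This hypothetical sketch pulls those labels out so well-supported nodes can be counted; it assumes numeric labels directly after a closing parenthesis and does not handle named internal nodes. The threshold is a user choice, not a value taken from this study.

```python
import re

# Internal-node labels follow a closing parenthesis, e.g. ")95:0.01" or ")80.5/97:0.2".
NODE_LABEL = re.compile(r"\)([0-9.]+(?:/[0-9.]+)*)")


def support_values(newick):
    """Return one tuple of support values per labelled internal node."""
    return [tuple(float(x) for x in m.group(1).split("/")) for m in NODE_LABEL.finditer(newick)]


def count_well_supported(newick, threshold=70.0, index=-1):
    """Return (nodes whose chosen support value >= threshold, labelled nodes).

    index selects which value to use when a label holds several (default: last).
    """
    values = support_values(newick)
    return sum(1 for v in values if v[index] >= threshold), len(values)
```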
SVDQuartets
generate file for SVDQuartets
cat inputs/filtered_final_83_LD_pruned.recode.min4.nexus inputs/taxapartitions.txt > inputs/svd.nexus
PAUP* v4.0a168 needs to be installed and launched
Run
exe inputs/svd.nexus
Then run
svdq taxpartition=fish showScores=no seed=1234568 bootstrap nreps=1000 treeFile=svd.tre;
To save the consensus tree in Newick format, run
savetree
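The svdq command refers to a taxon partition named "fish", which is defined in the NEXUS sets block appended to the alignment. As a hypothetical sketch (index_ranges and taxpartition_block are illustrative names), such a block can be generated from a population map, assuming the popmap order matches the row order of the NEXUS matrix, because taxon numbers refer to matrix rows.

```python
def index_ranges(indices):
    """Collapse sorted 1-based taxon numbers into NEXUS ranges, e.g. [1, 2, 3, 5] -> '1-3 5'."""
    ranges = []
    start = prev = indices[0]
    for i in indices[1:]:
        if i == prev + 1:
            prev = i
            continue
        ranges.append(f"{start}-{prev}" if start != prev else str(start))
        start = prev = i
    ranges.append(f"{start}-{prev}" if start != prev else str(start))
    return " ".join(ranges)


def taxpartition_block(popmap_pairs, name="fish"):
    """Build a NEXUS sets block with one taxon set per population.

    Assumption: popmap_pairs ([(sample, population), ...]) is in matrix row order.
    """
    groups = {}
    for i, (_sample, pop) in enumerate(popmap_pairs, start=1):
        groups.setdefault(pop, []).append(i)
    parts = [f"    {pop}: {index_ranges(idx)}" for pop, idx in groups.items()]
    return "begin sets;\n  taxpartition " + name + " =\n" + ",\n".join(parts) + ";\nend;\n"
```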
SNAPP
Beast2 v2.7.7 needs to be downloaded
Run 2 times
scripts/SNAPP.sh
BFD*
Beast2 v2.6.7 needs to be downloaded
Run
scripts/BFD.sh
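BFD* compares species-delimitation models through their log marginal likelihoods; the usual summary is 2 ln(BF) = 2 × (lnML of a model − lnML of the reference model), where positive values favour the model. This sketch assumes you have already read one log marginal likelihood per bfd_{test}.xml run; the model names below are hypothetical, and interpreting magnitudes (for example, values above 10 are often called decisive) is a convention, not a result of this study.

```python
def bayes_factors(marginal_lls, reference):
    """Return 2 ln(BF) for every model relative to `reference`.

    marginal_lls maps model name -> log marginal likelihood.
    """
    base = marginal_lls[reference]
    return {model: 2.0 * (ll - base) for model, ll in marginal_lls.items()}


def rank_models(marginal_lls):
    """Return model names from highest to lowest log marginal likelihood."""
    return sorted(marginal_lls, key=marginal_lls.get, reverse=True)
```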
DELINEATE
DELINEATE v1.2.3 needs to be installed in conda
The guide tree is generated from the consensus tree of the two SNAPP runs using the TreeAnnotator application
Run scripts/delineate.sh
The output of the tree probability analysis can be found in the data folder as Delineate_tree_probability_output.json.zip
TreeMix
compress vcf file
bgzip -c inputs/filtered_final_94.recode.vcf > inputs/filtered_final_94.recode.vcf.gz
index the compressed file
tabix inputs/filtered_final_94.recode.vcf.gz
Run python file ipyrad_converter_treemix.py
Followed by ipyrad_treemix.py
Save output in folder named "tree" within the data folder for further analysis in R
To plot use python file ipyrad_treemix_plot.py
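The ipyrad converter produces the TreeMix input for you; to show what that input contains, here is a minimal, hypothetical sketch of the format: a header line of population names, then one line per SNP with "allele1,allele2" counts per population. It assumes diploid alternate-allele dosages (0, 1, 2, or None for missing) as input.

```python
def allele_counts(genotypes):
    """Turn diploid alt-allele dosages (0, 1, 2; None = missing) into (ref, alt) counts."""
    called = [g for g in genotypes if g is not None]
    alt = sum(called)
    return 2 * len(called) - alt, alt


def treemix_lines(populations, snp_counts):
    """Format per-population allele counts as TreeMix input lines.

    snp_counts: one list per SNP of (count1, count2) tuples, in `populations` order.
    """
    lines = [" ".join(populations)]
    for snp in snp_counts:
        if len(snp) != len(populations):
            raise ValueError("each SNP needs one count pair per population")
        lines.append(" ".join(f"{a},{b}" for a, b in snp))
    return lines
```

TreeMix reads this table gzip-compressed, so the joined lines would be written with gzip before running it.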
Dsuite
Download dsuite v0.5.r53 https://github.com/millanek/Dsuite
Uses the svd.newick.txt tree output from the SVDquartets analysis
Run scripts/dsuite.sh, which outputs dsuite.fb.txt
To plot use dtools.py, which is part of the dsuite software
run
python3 dtools.py dsuite.fb.txt inputs/svd.newick.txt
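Dsuite's statistics build on ABBA-BABA site patterns. As a simplified illustration (not Dsuite's code, and not the f-branch statistic itself), Patterson's D can be computed from derived-allele frequencies in P1, P2, P3, and the outgroup: ABBA = (1 − p1)·p2·p3·(1 − pO), BABA = p1·(1 − p2)·p3·(1 − pO), and D = (ΣABBA − ΣBABA) / (ΣABBA + ΣBABA). Positive D indicates excess allele sharing between P2 and P3.

```python
def d_statistic(sites):
    """Patterson's D from derived-allele frequencies.

    sites: iterable of (p1, p2, p3, p_out) frequencies at biallelic SNPs.
    """
    abba = baba = 0.0
    for p1, p2, p3, po in sites:
        abba += (1 - p1) * p2 * p3 * (1 - po)
        baba += p1 * (1 - p2) * p3 * (1 - po)
    if abba + baba == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / (abba + baba)
```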
Nucleotide Diversity
ANGSD needs to be installed in conda
Run
diversity.sh
Plotting is done in R using Jenynsia_genetic_diversity.R
Access information
Other publicly accessible locations of the data:
- https://github.com/TD-Lab-NotreDame/Peripheral_budding_Jenynsia_lineata_complex
- All raw genomic sequences (fastq files) are available in the NCBI sequence read archive (SRA) (BioProject: PRJNA822935 [Accession: SRX14732231–SRX14732245; SRX14732201–SRX14732206; SRX14732208–SRX14732216; SRX14732189–SRX14732191] and BioProject: PRJNA1238845 [Accession: SAMN47485203–SAMN47485265]).
