Peripheral budding following range expansion explains diversity and distribution of one‐sided livebearing fish

Berg, Rachel; Aguilera, Gastón; Goyenola, Guillermo; Petry, Ana C.; Meyer, Axel; Torres‐Dowdall, Julián

Published Feb 02, 2026 on Dryad. https://doi.org/10.5061/dryad.3ffbg79xg

Data files

Feb 02, 2026 version files (6.81 MB)

data.zip (6.19 MB)
inputs.zip (591.49 KB)
README.md (10.12 KB)
scripts.zip (17.44 KB)

Abstract

Peripheral budding occurs when populations diverge from a widespread parental population and speciate along its periphery, facilitated by the interaction of ecological and geographic barriers. This phenomenon results in species that contrast in range size and ecological tolerance and can lead to confounding phylogenies. Here we examine patterns of peripheral budding in the Jenynsia lineata species complex using a genomic approach via RAD sequences. The J. lineata species complex is a group of live‐bearing fish in South America that shows signals of peripheral budding through asymmetric range sizes, J. lineata being widespread, and with a confounding and unresolved phylogeny. Our goal was to adequately classify the J. lineata species complex, delimit species within the complex via multiple approaches, and identify signals of introgression to better understand the underlying evolutionary patterns. We collected 85 samples from the species complex for DNA extraction and performed RAD sequencing to generate genome‐wide molecular markers for phylogenetic analyses. We found evidence of six distinct genetic groups within the complex and delimited at least five species, with a new species of Jenynsia in Northern Argentina along the periphery of J. lineata. Jenynsia lineata was recovered as the most recently diverged species in our phylogeny. This placement, along with observed patterns of introgression between species, suggests peripheral budding to have facilitated speciation in the J. lineata species complex, following a range expansion of a parental J. lineata. Our results show genomic patterns associated with peripheral budding and support the utility of using peripheral budding to better understand confounding phylogenetic patterns.

Dataset DOI: 10.5061/dryad.3ffbg79xg

Description of the data and file structure

Caudal fin tissues were collected from one-sided livebearer fish (Jenynsia sp.) in Argentina, Uruguay, and Brazil between 2013 and 2017. Tissues were preserved in 100% ethanol. DNA was extracted and sequenced following a double-digest restriction-site-associated DNA sequencing (ddRAD-seq) protocol. All raw genomic sequences (fastq files) are available in the NCBI sequence read archive (SRA) (BioProject: PRJNA822935 [Accession: SRX14732231–SRX14732245; SRX14732201–SRX14732206; SRX14732208–SRX14732216; SRX14732189–SRX14732191] and BioProject: PRJNA1238845 [Accession: SAMN47485203–SAMN47485265]). An 83 individuals dataset and a 94 individuals dataset were used in these analyses. The 83 individuals dataset includes sample sequences from all species of the Jenynsia lineata species complex (J. onca, J. luxata, J. lineata, and J. darwini). The 94 individuals dataset includes all sequences from the 83 individuals dataset plus sequences from a genomic outgroup, J. obscura.

Files and variables

File: data.zip

Description: The data folder contains data files used for *.R scripts. This folder also contains raw data about the sampling and output probabilities from the Delineate analysis.

Files: {species/population}.thetas.idx.pestPG

Description: Output files from diversity.sh. Describes several diversity metrics for each population/species. Files used in Jenynsia_genetic_diversity.R script.

Variables:

(indexStart,indexStop): start and stop location in the index
(firstPos_withData,lastPos_withData): first and last position with data within the window
(WinStart,WinStop): start and stop location in the chromosome
Chr: contig for the window
WinCenter: center position of the window
tW: Watterson's θ
tP: pairwise θ
tF: Fu and Li’s θF
tH: Fay and Wu’s θH
tL: Fu and Li’s θL
Tajima: Tajima's D estimation
fuf: Fu and Li’s F
fud: Fu and Li’s D
fayh: Fay and Wu’s H
zeng: Zeng's E
nSites: number of sites with data in that window
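
As a rough illustration of how these columns can be combined, the sketch below reads a table in this layout and derives per-site estimates by dividing the summed tP and tW values by the summed nSites. It assumes the file is tab-delimited with a single header line, that tP and tW are window sums, and that nSites is a suitable denominator; the two rows of sample data are invented for the example and do not come from the dataset.

```python
import csv
import io

# Synthetic two-window example in the assumed *.thetas.idx.pestPG layout.
# All numbers are made up for illustration only.
SAMPLE = (
    "#(indexStart,indexStop)(firstPos_withData,lastPos_withData)(WinStart,WinStop)\t"
    "Chr\tWinCenter\ttW\ttP\ttF\ttH\ttL\tTajima\tfuf\tfud\tfayh\tzeng\tnSites\n"
    "(0,100)(1,100)(0,100)\tcontig_1\t50\t0.8\t0.5\t0.9\t0.4\t0.45\t-1.2\t-0.9\t-0.7\t0.1\t-0.3\t100\n"
    "(0,50)(1,50)(0,50)\tcontig_2\t25\t0.3\t0.25\t0.3\t0.2\t0.22\t-0.4\t-0.2\t-0.1\t0.05\t-0.1\t50\n"
)


def per_site_diversity(handle):
    """Return (sum of tP / sum of nSites, sum of tW / sum of nSites)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    col = {name: i for i, name in enumerate(header)}
    tp_sum = tw_sum = 0.0
    sites = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        n = int(row[col["nSites"]])
        if n == 0:
            continue  # windows without data contribute nothing
        tp_sum += float(row[col["tP"]])
        tw_sum += float(row[col["tW"]])
        sites += n
    if sites == 0:
        raise ValueError("no sites with data")
    return tp_sum / sites, tw_sum / sites


pi, theta_w = per_site_diversity(io.StringIO(SAMPLE))
print(f"pairwise theta per site: {pi:.5f}")
print(f"Watterson's theta per site: {theta_w:.5f}")
```

To apply this to a real file, pass an opened file handle instead of the io.StringIO object. The plotting itself is done in Jenynsia_genetic_diversity.R, which may summarise the columns differently.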

File: jenynsia_data_mapping.csv

Description: Information about where samples were collected. Used in Jenynsia_map.R.

Variables:

order: row id number
id: unique id for each sample
full_id: unique id number along with population information letters
population: assigned population/species
lat: latitude (WGS84) of the sampled body of water
long: longitude (WGS84) of the sampled body of water
population_map: population identifier used for plotting sampling locations

Files: plink.{#}.Q.csv

Description: Output from Jenynsia_admixture.sh. Ancestry proportions of each k (#) cluster for each individual. popmap.txt gives the order of the individuals, 1-83.

Variables:

Each column represents a different cluster, 1:k(#)
Each row is a different individual
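
The row and column layout described above can be turned into per-individual cluster assignments with a few lines of Python. This is only a sketch: it assumes the CSV has no header row and that the sample IDs are supplied in the same order as the rows (for example, read from popmap.txt). The K=3 values, the sample IDs, and the 0.5 threshold are hypothetical.

```python
import csv
import io

# Hypothetical K=3 ancestry table for four individuals (illustrative values).
Q_TEXT = "0.90,0.05,0.05\n0.10,0.80,0.10\n0.02,0.03,0.95\n0.40,0.35,0.25\n"
SAMPLE_IDS = ["ind1", "ind2", "ind3", "ind4"]  # hypothetical; same order as rows


def assign_clusters(q_handle, sample_ids, threshold=0.5):
    """Map each sample to (cluster number or None, highest ancestry proportion).

    Clusters are numbered from 1. A sample whose highest proportion is below
    `threshold` gets None, marking it as admixed under this arbitrary cutoff.
    """
    rows = [[float(x) for x in row] for row in csv.reader(q_handle) if row]
    if len(rows) != len(sample_ids):
        raise ValueError("number of rows does not match number of sample IDs")
    assignments = {}
    for sample_id, props in zip(sample_ids, rows):
        best = max(range(len(props)), key=props.__getitem__)
        label = best + 1 if props[best] >= threshold else None
        assignments[sample_id] = (label, props[best])
    return assignments


assignments = assign_clusters(io.StringIO(Q_TEXT), SAMPLE_IDS)
for sample_id, (cluster, prop) in assignments.items():
    print(sample_id, cluster, prop)
```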

File: popmap.txt

Description: Population map of the collected samples

Variables:

Column 1: sample ID
Column 2: population identifier
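
A small reader for this two-column layout might look like the following. It assumes whitespace-separated columns and no header line; the sample IDs and the rows shown are placeholders, not the contents of the real popmap.txt.

```python
import io
from collections import Counter

# Hypothetical popmap contents: sample ID, then population identifier.
POPMAP_TEXT = """\
S01\tlineata
S02\tlineata
S03\tonca
S04\tdarwini
"""


def read_popmap(handle):
    """Return (sample IDs in file order, {sample ID: population})."""
    order, pops = [], {}
    for line_no, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(fields)}")
        sample_id, population = fields
        order.append(sample_id)
        pops[sample_id] = population
    return order, pops


order, pops = read_popmap(io.StringIO(POPMAP_TEXT))
counts = Counter(pops.values())
print(counts.most_common())
```

Keeping the file order alongside the mapping is useful because other outputs, such as the plink.{#}.Q.csv tables, identify individuals only by row position.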

File: Delineate_tree_probability_output.json.zip

Description: This file is output from delineate.sh from the scripts folder. The file describes the different tree probabilities from the delineate analysis.

File: inputs.zip

Description: The inputs folder contains inputs for genomic analyses. Phylip and nexus files are also included from phylogenetic analyses.

File: bfd_{test}.xml

Description: These files are used as inputs in the BFD.sh analyses.

File: {species/population}.txt

Description: These files contain a list of samples for each species/population. Files named popmap# list the samples used in the analyses with (94) or without (83) the outgroup.

File: delin_guide_tree_final.nex

Description: This file is the guide tree used in the delineate.sh analysis. It is in nexus format.

File: delin_input_final.tsv

Description: This file is the metadata used in the delineate.sh analysis.

Variables:

lineage: These are the different population groupings for the SNAPP analysis
species: The species being tested for the several population lineages
status: 1 indicates that the species has previously been recognized as distinct; 0 indicates that there is currently no strong evidence for or against it being a distinct species.
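
To see which lineages enter the analysis with a fixed species assignment and which are left open, the three columns can be split by status. The sketch below assumes a tab-separated file whose header row uses exactly the column names lineage, species, and status; the lineage and species names are placeholders.

```python
import csv
import io

# Hypothetical metadata in the lineage / species / status layout.
TSV_TEXT = (
    "lineage\tspecies\tstatus\n"
    "lineage_A\tspecies_1\t1\n"
    "lineage_B\tspecies_1\t1\n"
    "lineage_C\tspecies_2\t0\n"
)


def split_by_status(handle):
    """Return ({species: [lineages with status 1]}, [lineages with status 0])."""
    recognized, unresolved = {}, []
    for row in csv.DictReader(handle, delimiter="\t"):
        status = row["status"].strip()
        if status == "1":
            recognized.setdefault(row["species"], []).append(row["lineage"])
        elif status == "0":
            unresolved.append(row["lineage"])
        else:
            raise ValueError(f"unexpected status {status!r} for {row['lineage']}")
    return recognized, unresolved


recognized, unresolved = split_by_status(io.StringIO(TSV_TEXT))
print("recognized:", recognized)
print("unresolved:", unresolved)
```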

File: dsuite.fb.txt

Description: This file is an output file describing the branches from the dsuite.sh analysis.

File: {dataset}.nexus

Description: These files are in standard nexus format. Files with the number 30, 83, or 94 correspond to the 30 individuals, 83 individuals, and 94 individuals datasets. svd.nexus was used as input to the SVDquartets analysis.

File: {dataset}.phy

Description: These files are in standard phylip format. Files with the number 83 or 94 correspond to the 83 individuals and 94 individuals datasets.

File: snapp_remove.txt

Description: This is a list of individuals that were not included in the 30 individuals dataset analyses.

File: snapp.xml

Description: This file is the input file for the scripts/SNAPP.sh analysis.

File: svd.newick.txt

Description: This file is the input file for the SVDquartets analysis. The file contains a tree in newick format of the Jenynsia species.

File: taxapartitions.txt

Description: This file is the input file for the SVDquartets analysis. The file contains set information for each species that corresponds to the order of species in the svd.newick.txt file.

File: scripts.zip

Description: The scripts folder contains the scripts used in analyses. The code in the *.sh files was written for use on an HPC system running Sun Grid Engine.

Code/software

Genomic analyses

dDocent

This code was run in a conda environment
dDocent v2.9.4 needs to be installed https://ddocent.com/bioconda/
fastq files must be in the working directory (see SRA accession information above)
For the 83 and 30 individuals datasets, do not include the J. obscura samples in the working directory
For the 94 individuals dataset, include all fastq files
Run
scripts/ddocent.sh
Rename output to TotalRawSNPs_83.vcf and ddocent_Final_83.recode.vcf after running on 83 individuals
Rename output to TotalRawSNPs_94.vcf and ddocent_Final_94.recode.vcf after running on 94 individuals

Filtering VCF files

vcftools and vcffilter need to be installed in conda
run scripts/Jenynsia_filter.sh

File conversion

download vcf2phylip https://github.com/edgardomortiz/vcf2phylip to convert to nexus, fasta, and phylip formats
run
for 83 individuals dataset
python vcf2phylip.py -i inputs/filtered_final_83_LD_pruned.recode.vcf -f -n -b
for 30 individuals dataset
python vcf2phylip.py -i inputs/filtered_final_30_nomiss_LD_pruned_sub1000.vcf -f -n -b
for 94 individuals dataset
python vcf2phylip.py -i inputs/filtered_final_94_LD_pruned.recode.vcf -f -n -b
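
The three conversions differ only in the input VCF, so they can be generated from one table. The sketch below only builds and prints the command lines; it assumes the VCF files sit in the inputs folder and that vcf2phylip.py is in the working directory, and the helper name build_commands is hypothetical.

```python
import shlex

# Input VCF for each dataset, as used in the conversion commands above.
DATASETS = {
    "83": "inputs/filtered_final_83_LD_pruned.recode.vcf",
    "30": "inputs/filtered_final_30_nomiss_LD_pruned_sub1000.vcf",
    "94": "inputs/filtered_final_94_LD_pruned.recode.vcf",
}


def build_commands(script="vcf2phylip.py"):
    """Return {dataset: argv list} with the same flags for every dataset."""
    return {
        name: ["python", script, "-i", vcf, "-f", "-n", "-b"]
        for name, vcf in DATASETS.items()
    }


commands = build_commands()
for name, argv in commands.items():
    print(f"# {name} individuals dataset")
    print(shlex.join(argv))
```

To execute the commands rather than print them, each argv list could be passed to subprocess.run(argv, check=True).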

PCA

ipyrad needs to be installed in conda
compress vcf file
bgzip -c inputs/filtered_final_83.recode.vcf > inputs/filtered_final_83.recode.vcf.gz
index the compressed file
tabix inputs/filtered_final_83.recode.vcf.gz
then run the python file ipyrad_converter_pca
followed by ipyrad_pca.py
A PCA figure will be output

Admixture

ADMIXTURE v1.3.0
PLINK needs to be installed in conda
run scripts/Jenynsia_admixture.sh
Outputs CV errors and files for plotting each K in R
Use admixture_plot.R to plot figure in R
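
One way to compare the cross-validation errors across K is to collect the "CV error" lines that ADMIXTURE writes when cross-validation is enabled and pick the K with the lowest value. The sketch below assumes those lines have been gathered into one text (for example with grep over the log files); where Jenynsia_admixture.sh actually writes them is not specified here, and the error values are invented.

```python
import re

# Hypothetical CV error lines, one per K (values are illustrative only).
LOG_TEXT = """\
CV error (K=2): 0.61234
CV error (K=3): 0.55310
CV error (K=4): 0.57002
"""

CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def cv_errors(text):
    """Return {K: cross-validation error} for every matching line."""
    return {int(k): float(err) for k, err in CV_LINE.findall(text)}


def best_k(errors):
    """Return the K with the lowest cross-validation error."""
    if not errors:
        raise ValueError("no CV error lines found")
    return min(errors, key=errors.get)


errors = cv_errors(LOG_TEXT)
chosen = best_k(errors)
print(errors)
print("lowest CV error at K =", chosen)
```

The lowest CV error is a common heuristic rather than a definitive criterion, so it is worth inspecting the ancestry plots for neighbouring K values as well.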

IQ-TREE and RAxML

IQ-TREE 2.4.0 and RAxML v8 need to be installed in conda
Run
iqtree.sh
Run
raxml.sh

SVDQuartets

generate file for SVDQuartets
cat inputs/filtered_final_83_LD_pruned.recode.min4.nexus inputs/taxapartitions.txt > inputs/svd.nexus
PAUP v.4.0.a168 needs to be installed and launched
Run
exe inputs/svd.nexus
Then run
svdq taxpartition=fish showScores=no seed=1234568 bootstrap nreps=1000 treeFile=svd.tre;

To save the consensus tree in newick format run
savetree

SNAPP

Beast2 v2.7.7 needs to be downloaded
Run 2 times
scripts/SNAPP.sh

BFD*

Beast2 v2.6.7 needs to be downloaded
Run
scripts/BFD.sh

DELINEATE

DELINEATE v1.2.3 needs to be installed in conda
The guide tree is the consensus tree of the two SNAPP runs, generated with the TreeAnnotator application
Run scripts/delineate.sh
Output of the tree probability analysis can be found in the data folder under Delineate_tree_probability_output.json

TreeMix

compress vcf file
bgzip -c inputs/filtered_final_94.recode.vcf > inputs/filtered_final_94.recode.vcf.gz
index the compressed file
tabix inputs/filtered_final_94.recode.vcf.gz
Run python file ipyrad_converter_treemix.py
Followed by ipyrad_treemix.py

Save output in folder named "tree" within the data folder for further analysis in R
To plot use python file ipyrad_treemix_plot.py

Dsuite

Download dsuite v0.5.r53 https://github.com/millanek/Dsuite
Uses svd.newick.txt output from SVDquartets analysis
Run scripts/dsuite.sh
Outputs dsuite.fb.txt
To plot use dtools.py, which is part of the Dsuite software
run
python3 dtools.py dsuite.fb.txt inputs/svd.newick.txt

Nucleotide Diversity

ANGSD needs to be downloaded in conda
Run
diversity.sh
Plotting is done in R using Jenynsia_genetic_diversity.R

Access information

Other publicly accessible locations of the data:

https://github.com/TD-Lab-NotreDame/Peripheral_budding_Jenynsia_lineata_complex
All raw genomic sequences (fastq files) are available in the NCBI sequence read archive (SRA) (BioProject: PRJNA822935 [Accession: SRX14732231–SRX14732245; SRX14732201–SRX14732206; SRX14732208–SRX14732216; SRX14732189–SRX14732191] and BioProject: PRJNA1238845 [Accession: SAMN47485203–SAMN47485265]).
