Data from: Patterns of genomic heterogeneity in a classic field cricket mosaic hybrid zone
Data files
Feb 05, 2025 version files 5.82 GB
- 10368.fasta (848.91 KB)
- 1101.fasta (560.90 KB)
- 1121.fasta (357.37 KB)
- 1145.fasta (653.94 KB)
- 1275.fasta (840.47 KB)
- 136.fasta (809.68 KB)
- 14741.fasta (395.20 KB)
- 1638.fasta (838.25 KB)
- 211.fasta (648.45 KB)
- 2658.fasta (850.75 KB)
- 3136.fasta (586.25 KB)
- 3344.fasta (542.17 KB)
- 3422.fasta (477.16 KB)
- 3432.fasta (821.32 KB)
- 3555.fasta (568.45 KB)
- 367.fasta (838.96 KB)
- 3732.fasta (824.56 KB)
- 3968.fasta (485.84 KB)
- 402.fasta (563.39 KB)
- 4205.fasta (362.10 KB)
- 5021.fasta (436.27 KB)
- 5052.fasta (502.16 KB)
- 5136.fasta (568.44 KB)
- 5177.fasta (443.76 KB)
- 518.fasta (925.63 KB)
- 5214.fasta (589.36 KB)
- 541.fasta (670.02 KB)
- 5556.fasta (192.14 KB)
- 5777.fasta (606.29 KB)
- 580.fasta (580.35 KB)
- 586.fasta (558 KB)
- 5961.fasta (814.66 KB)
- 6030.fasta (525.46 KB)
- 618.fasta (832.89 KB)
- 625.fasta (568.03 KB)
- 6271.fasta (309.95 KB)
- 6571.fasta (861.66 KB)
- 714.fasta (841.76 KB)
- 7164.fasta (294.19 KB)
- 726.fasta (380.68 KB)
- 8026.fasta (463.12 KB)
- 8257.fasta (858.48 KB)
- 8322.fasta (660.63 KB)
- 8612.fasta (678.22 KB)
- 87.fasta (895.25 KB)
- 90.fasta (600.75 KB)
- 9839.fasta (492.19 KB)
- alldb.blastp (1.70 MB)
- allstats.txt (445 B)
- Complete_data_species_Dxy_Fis_Pi_9_26_24_new.csv (7.18 KB)
- complete_draft_genome_gryllus.gff (3.83 GB)
- Concat_file.fasta (13.96 MB)
- cricket_IMa_input.txt (161 B)
- cricket_IMa_output_final_run.txt (1.19 MB)
- final_noseq.gff (1.88 GB)
- final.all.maker.proteins.fasta (6.51 MB)
- final.all.maker.transcripts.fasta (17.43 MB)
- GryllusFragmentsAnnotated.gff (42.04 MB)
- PutativeGeneFunctions.txt (31.73 KB)
- Rcode_gryllus_DxyFst.R (25.78 KB)
- Read_me_FASTA.txt (576 B)
- README.md (3.56 KB)
Abstract
Barriers to gene exchange can be semi-permeable; some genes are expected to freely flow across species boundaries whereas others, under divergent selection or responsible for reproductive isolation, might not. Genome scans in recently diverged species have identified divergent genomic regions, a pattern that has often been interpreted as islands of restricted introgression in a background of relatively free gene exchange (“genomic islands of speciation”). Areas of high differentiation, most located on the X chromosome (females XX, males X0), have been identified in the hybridizing field crickets Gryllus firmus and Gryllus pennsylvanicus. These species were assumed to follow an islands of speciation model, with highly differentiated areas interpreted as areas of reduced introgression. We sequenced the G. firmus genome to localize previously studied SNPs and sample a larger area around them in 8 allopatric populations (4 of each species). We use these data to test expectations for the islands model, in which non-introgressing areas should have both high absolute and relative differentiation. We find that in the allopatric populations, the areas with high relative differentiation (mostly X-linked), previously interpreted as non-introgressing, do not have high absolute differentiation as would be expected under the “islands model.” We also show that the estimated divergence time based on nuclear DNA is about 4× older than that estimated based on mtDNA (800K vs 200K years ago). We discuss the implications of our results for introgression into allopatric populations.
README: Patterns of genomic heterogeneity in a classic Gryllus mosaic hybrid zone
This dataset contains all the loci used in the study, the R code, the data used in the R code, supplementary tables and figures, and the annotated draft genome for Gryllus firmus.
Description of the data and file structure
There are five supplementary tables and one supplementary figure (on Zenodo as Supplemental Information):
Table 1: IMa information for each locus including mutation rate and length of reads
Table 2: Assembly details from Supernova
Table 3: Gene annotation for genes that fall inside sequenced loci
Table 4: Autosomal non-introgressing genes with low Tajima’s D, all located on chromosome LG14
Table 5: Primers used for longPCR of the 48 loci
Figure 1: Flow-chart for the Gryllus genome annotation
We also included the IMa3 input file (cricket_IMa_input.txt) and the output file with the results (cricket_IMa_output_final_run.txt) as "data".
Fasta_files:
These are the sequenced loci (~5K basepairs) for 4 populations of Gryllus firmus and 4 populations of G. pennsylvanicus. See paper for more information. The population sequences for each of the 47 loci are in DnaSP FASTA format; see Read_me_FASTA.txt for more information.
All of these data are deposited in GenBank; see paper for accession numbers.
We also included one file of the concatenated loci used to build the tree (79 individuals, 34 loci): Concat_file.fasta.
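As a quick way to check a downloaded locus file, the sketch below reads a FASTA file and reports how many sequences it holds and their length range. It is written in Python with the standard library only and is not part of the authors' pipeline. It assumes plain FASTA records (a `>` header line followed by one or more sequence lines); any DnaSP-specific conventions described in Read_me_FASTA.txt are not handled, and the function names `read_fasta` and `summarize` are hypothetical.

```python
from pathlib import Path


def read_fasta(path):
    """Return a dict mapping each header (without '>') to its full sequence.

    Assumes plain FASTA; if two records share a header, the later one
    replaces the earlier one.
    """
    records = {}
    header = None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


def summarize(records):
    """Count the sequences and report the shortest and longest length."""
    lengths = [len(seq) for seq in records.values()]
    return {
        "n_sequences": len(records),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
    }
```

For example, `summarize(read_fasta("10368.fasta"))` gives the sequence count for one locus; if the files are aligned, the minimum and maximum lengths should be equal.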
The annotated draft genome is provided in the .gff files.
Genome file with annotation for scaffolds that contain the loci studied: GryllusFragmentsAnnotated.gff
Putative gene functions for annotated genes: PutativeGeneFunctions.txt
The complete draft genome: complete_draft_genome_gryllus.gff
Additional files related to genome annotations: allstats.txt, alldb.blastp, final_noseq.gff
Fasta files for annotated proteins and transcripts are in: final.all.maker.proteins.fasta and final.all.maker.transcripts.fasta
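Because the larger .gff files are several gigabytes, it can help to inspect them without a genome browser. The following Python sketch streams a GFF file line by line and tallies the feature types in the third column. It assumes the files follow the standard tab-separated GFF3 layout, including the `##FASTA` directive that marks an embedded sequence section; this has not been verified against the deposited files, and `count_feature_types` is a hypothetical helper name.

```python
from collections import Counter


def count_feature_types(path):
    """Tally the feature type (column 3) of every annotation line.

    The file is read one line at a time, so memory use stays small even
    for multi-gigabyte inputs.
    """
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # everything after this directive is sequence, not annotation
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines, comments, and other directives
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3:
                counts[fields[2]] += 1
    return counts
```

Running `count_feature_types("GryllusFragmentsAnnotated.gff").most_common()` on the smallest annotation file first is a reasonable check before processing the complete draft genome.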
Sharing/Access information
Data were derived from the following sources:
Whole genome sequence -> primer design for long PCR based on previously described SNPs -> long sequencing of 48 individuals for each species (96 total).
Code/Software (uploaded as "data")
The R code used in the analyses for Dxy and Fst, as well as the dataset itself.
data: Complete_data_species_Dxy_Fis_Pi_9_26_24_new.csv - this data set contains all of the studied loci ("gene"); values were calculated with DnaSP (http://www.ub.edu/dnasp/). The labels refer to: introgressing status ("Intro", yes or no). Whether loci are X-linked or not ("sex"). Chromosome location of each locus ("Chromosome"). The total number of sequenced nucleotides ("N_sites_no_gaps"). Number of G. pennsylvanicus sequenced individuals ("Pnumber"). Number of G. firmus sequenced individuals ("Fnumber"). Total number of individuals ("Ntot"). Number of G. pennsylvanicus populations and G. firmus populations sampled ("Ppops" and "Fpops" respectively). The number of haplotypes observed in G. firmus and G. pennsylvanicus ("F_hap_N" and "P_hap_N"). The total number of variable sites in G. firmus and G. pennsylvanicus ("F_S" and "P_S"). The total number of mutations "eta" for G. firmus and G. pennsylvanicus ("F_eta" and "P_eta"). The nucleotide diversity, pi, for G. firmus and G. pennsylvanicus ("F_Pi" and "P_Pi"). And several measures of population differentiation calculated with the DnaSP command "gene flow and genetic differentiation", including relative differentiation ("Fst") and absolute differentiation ("Dxy").
R code used: Rcode_gryllus_DxyFst.R
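The authors' analysis lives in the R script above. As an independent way to look at the table, the Python sketch below groups loci by one column and averages the numeric columns of interest. The default column names ("Intro", "Fst", "Dxy") are taken from the description above; how the values are encoded in the file (for example "yes"/"no" for "Intro") is an assumption, and `mean_by_group` is a hypothetical helper, not the authors' method.

```python
import csv
from statistics import mean


def mean_by_group(path, group_col="Intro", value_cols=("Fst", "Dxy")):
    """Average each value column within each level of the grouping column.

    Cells that are missing or not numeric (for example "NA") are skipped.
    """
    groups = {}
    # utf-8-sig drops a byte-order mark if a spreadsheet program added one
    with open(path, newline="", encoding="utf-8-sig") as handle:
        for row in csv.DictReader(handle):
            key = (row.get(group_col) or "").strip()
            for col in value_cols:
                try:
                    value = float(row[col])
                except (KeyError, TypeError, ValueError):
                    continue  # missing column, short row, or non-numeric cell
                groups.setdefault(key, {}).setdefault(col, []).append(value)
    return {
        key: {col: mean(values) for col, values in cols.items()}
        for key, cols in groups.items()
    }
```

Calling `mean_by_group("Complete_data_species_Dxy_Fis_Pi_9_26_24_new.csv")` returns one entry per "Intro" level, which allows a rough comparison of relative (Fst) and absolute (Dxy) differentiation between the two classes of loci; passing `group_col="sex"` gives the same summary for X-linked versus other loci.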
Usage notes
Files can be opened with Word, Excel, and R; the .gff files need to be opened with genomic programs such as Geneious or another genome browser.