
Discovery of facultative parthenogenesis in a New World crocodile

Cite this dataset

Levine, Brenna; Booth, Warren (2023). Discovery of facultative parthenogenesis in a New World crocodile [Dataset]. Dryad.


Over the past two decades, there has been an astounding growth in the documentation of vertebrate facultative parthenogenesis (FP). This unusual reproductive mode has been documented in birds, non-avian reptiles (specifically lizards and snakes), and elasmobranch fishes. Part of this growth among vertebrate taxa is attributable to awareness of the phenomenon itself and to advances in molecular genetics/genomics and bioinformatics, and our understanding has developed considerably as a result. Nonetheless, questions remain as to its occurrence outside of these vertebrate lineages, most notably in Chelonia (turtles) and Crocodylia (crocodiles, alligators, and gharials). The latter group is particularly interesting because, unlike all vertebrates in which FP has previously been documented, crocodilians lack sex chromosomes and their sex determination is controlled by temperature. Here, using whole-genome sequencing data, we provide the first evidence of FP in a crocodilian, the American Crocodile, Crocodylus acutus. The data support terminal fusion automixis as the reproductive mechanism, a finding that suggests a common evolutionary origin of FP across reptiles, crocodilians, and birds. With FP now documented in the two main branches of extant archosaurs, this discovery offers tantalizing insights into the possible reproductive capabilities of the extinct archosaurian relatives of crocodilians and birds, notably members of Pterosauria and Dinosauria.


DNA extracted from the mother and fetus using a Qiagen DNeasy Blood & Tissue kit was sent to Novogene (Sacramento, CA) for whole-genome sequencing on an Illumina platform (NovaSeq 6000 PE150). Novogene mapped the raw sequences to the saltwater crocodile (Crocodylus porosus) reference genome and identified single nucleotide polymorphisms (SNPs) using the following SAMtools command: mpileup -m 2 -F 0.002 -d 10.

Using the parameters of Card et al. as a guideline, variants were filtered using VCFtools v0.1.16, the R package vcfR, and bedtools v2.30.0, with the following criteria: (1) indels were excluded; (2) individuals with a read depth of less than 5 were excluded; (3) variants with a Phred quality score below 30 were excluded; (4) non-biallelic SNPs were excluded; (5) SNPs with significant statistical biases were removed using the hard filter ‘MQ < 40.0’; (6) SNPs were thinned to avoid the potential effects of linkage by randomly selecting one variant per 10 kb, 25 kb, and 50 kb region of the saltwater crocodile (Crocodylus porosus) genome, resulting in three sets of filtered VCFs. Prior to thinning, SNPs that were not found in both individuals were removed using the bedtools ‘intersect’ function.
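The thinning in step (6) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name thin_snps, the use of fixed non-overlapping windows per contig, and the toy coordinates are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def thin_snps(snps, window_bp, seed=0):
    """Randomly keep one SNP per non-overlapping window on each contig.

    snps: iterable of (contig, 1-based position) tuples.
    window_bp: window size in base pairs (e.g. 10_000, 25_000, 50_000).
    Hypothetical helper; fixed windows are an assumption, not the source's method.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for contig, pos in snps:
        # Positions 1..window_bp fall in window 0, the next block in window 1, etc.
        bins[(contig, (pos - 1) // window_bp)].append((contig, pos))
    # Draw one SNP at random from every occupied window.
    return sorted(rng.choice(members) for members in bins.values())


# Toy coordinates (invented for the example).
example = [("chr1", 100), ("chr1", 9_500), ("chr1", 10_001),
           ("chr1", 24_000), ("chr2", 42)]
thinned_10kb = thin_snps(example, 10_000)
thinned_25kb = thin_snps(example, 25_000)
```

Running the function once per window size yields three thinned SNP sets, mirroring the three sets of filtered VCFs described above.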

For each set of VCFs, files were prepared for input to the program ParthenoGenius. First, maternal and offspring VCFs were merged into a single VCF with BCFtools. PGDspider was then used to convert each merged VCF into Structure file format. Structure files were modified to remove column two, which contained Structure PopData values, so that the resulting files contained only the sample IDs (mother = M2, offspring = P1) and the genotypes at each SNP retained after filtering. These files were then converted into CSV format.
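The column removal and CSV conversion could look roughly like the following Python sketch. It assumes whitespace-delimited Structure data rows with no marker-name header, where column one is the sample ID and column two is PopData; the exact layout ParthenoGenius expects is not specified here, and structure_to_csv is a hypothetical helper.

```python
import csv
import io


def structure_to_csv(structure_text):
    """Drop the PopData column (column two) and emit comma-separated rows.

    Assumes each non-empty line is: sample_id popdata genotype genotype ...
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in structure_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Keep the sample ID, skip PopData, keep every genotype column.
        writer.writerow([fields[0]] + fields[2:])
    return out.getvalue()


# Invented toy input: two rows per diploid individual, four SNPs.
demo = "M2 1 1 3 2 2\nM2 1 1 3 4 2\nP1 1 1 3 2 2\nP1 1 1 3 2 2\n"
csv_text = structure_to_csv(demo)
```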

ParthenoGenius was used to test for evidence and mode of parthenogenesis in each of the three thinned SNP data sets. Briefly, ParthenoGenius is a Python program that first performs a homozygosity scan: it counts the maternal homozygous loci at which the offspring’s genotype is not identical to the mother’s and compares this count to the number expected from genotyping error alone, based on a per-base genotyping error rate. This comparison determines whether discordance between maternal and offspring genotypes at maternal homozygous loci is more likely due to genotyping error or to the presence of paternal alleles. If the count is less than the number expected from genotyping error alone (i.e., the offspring shares the mother’s genotype at all or nearly all of the mother’s homozygous loci), the offspring is called as a parthenogen, and the proportion of maternal homozygous loci at which the offspring’s genotype differs is recorded by ParthenoGenius as an updated estimate of the per-base error rate for the subsequent heterozygosity scan. This assumes that the genotyping error rate is consistent across the genome. If parthenogenesis is supported, ParthenoGenius then scans maternal heterozygous loci to identify those at which the offspring has retained heterozygosity for maternal alleles. If the number of maternal heterozygous loci at which the offspring is heterozygous is less than the number expected from genotyping error alone under a null hypothesis of gametic duplication (i.e., the offspring is homozygous for maternal alleles at all or nearly all of the maternal heterozygous loci), the mode of parthenogenesis is called as gametic duplication. Alternatively, if the number exceeds that expected from genotyping error alone (i.e., the offspring has retained heterozygosity at some or all maternal heterozygous loci), the mode of parthenogenesis is called as automixis (although there is no test to distinguish terminal from central fusion automixis). A per-base error rate of 0.1% was used for the initial homozygosity scan of the three thinned data sets, representing a conservative estimate of the genotyping error rate, on par with that of a filtered SNP array.
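The two-scan decision logic described above can be sketched in Python as follows. This is an illustrative reconstruction from the description, not ParthenoGenius's actual code: the function names, the genotype representation (a tuple of two alleles per locus), and the handling of ties (a count exactly equal to the expectation) are assumptions.

```python
def is_het(genotype):
    """True if the two alleles of a diploid genotype differ."""
    return genotype[0] != genotype[1]


def same_genotype(a, b):
    """Compare genotypes ignoring allele order."""
    return sorted(a) == sorted(b)


def call_parthenogenesis(mother, offspring, error_rate=0.001):
    """Hypothetical sketch of the homozygosity and heterozygosity scans.

    mother, offspring: equal-length lists of (allele, allele) tuples,
    one entry per SNP, in the same locus order.
    """
    pairs = list(zip(mother, offspring))
    hom_pairs = [(m, o) for m, o in pairs if not is_het(m)]
    het_pairs = [(m, o) for m, o in pairs if is_het(m)]

    # Scan 1: discordance at maternal homozygous loci vs. error expectation.
    mismatches = sum(1 for m, o in hom_pairs if not same_genotype(m, o))
    expected_mismatches = error_rate * len(hom_pairs)
    if not mismatches < expected_mismatches:
        return {"parthenogen": False, "mode": None, "error_rate": error_rate}

    # Observed discordance becomes the error rate for the second scan.
    updated_rate = mismatches / len(hom_pairs)

    # Scan 2: retained heterozygosity at maternal heterozygous loci.
    retained = sum(1 for m, o in het_pairs if same_genotype(m, o))
    expected_retained = updated_rate * len(het_pairs)
    # Assumption: a tie is treated as gametic duplication.
    mode = "automixis" if retained > expected_retained else "gametic duplication"
    return {"parthenogen": True, "mode": mode, "error_rate": updated_rate}


# Invented toy data: 2000 maternal homozygous and 100 heterozygous loci.
mother = [("A", "A")] * 2000 + [("A", "G")] * 100
offspring = ([("A", "G")] + [("A", "A")] * 1999      # one discordant locus
             + [("A", "G")] * 30 + [("G", "G")] * 70)  # 30 loci stay heterozygous
result = call_parthenogenesis(mother, offspring)
```

In this toy case, one discordant locus is fewer than the two expected at a 0.1% error rate, so the offspring is called a parthenogen, and the 30 retained heterozygous loci far exceed the error expectation, so the sketch reports automixis.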


University of Tulsa