Data from: The biogeographic history of neosuchian crocodiles and the impact of saltwater tolerance variability
Data files
Sep 14, 2023 version files 254.60 MB
Abstract
Extant neosuchian crocodiles are represented by only 24 taxa that are confined to the tropics and subtropics. However, at other intervals during their 200 million-year evolutionary history, the clade reached considerably higher levels of species-richness, matched by more widespread distributions. Neosuchians have occupied numerous habitats and niches, ranging from dwarf riverine forms to large marine predators. Despite numerous previous studies, several unsolved questions remain with respect to their biogeographic history, including the geographic origins of major groups, e.g., Eusuchia and Neosuchia itself. We carried out the most comprehensive biogeographic analysis of Neosuchia to date, based on a multivariate K-means clustering approach followed by the application of two ancestral area estimation methods (BioGeoBEARS and Bayesian Ancestral Location Estimation) applied to two recently published phylogenies. Our results placed the origin of Neosuchia in north-western Pangaea, with subsequent radiations into Gondwana. Eusuchia probably emerged in the European archipelago during the Late Jurassic/Early Cretaceous, followed by dispersal to the North American and Asian landmasses. We show that putative transoceanic dispersal events are statistically significantly less likely to happen in alligatoroids. This finding is consistent with the saltwater intolerant physiology of extant alligatoroids, bolstering inferences of such intolerance in their ancestral lineages.
README
GENERAL INFORMATION
1. Title of Dataset: Groh_2023_croc_biogeo
2. Author information:
A. Principal Investigator Contact Information
Name: Sebastian S. Groh
Institution: Dpt. of Earth Sciences, University College London
Address: Gower Street, London, WC1E 6BT, UK
Email: sebastian.s.groh@gmail.com
B. Co-investigators Contact Information
Name: Paul Upchurch
Institution: Dpt. of Earth Sciences, University College London
Address: Gower Street, London, WC1E 6BT, UK
Email: p.upchurch@ucl.ac.uk
Name: Paul M. Barrett
Institution: Dpt. of Earth Sciences, Natural History Museum
Address: Cromwell Road, London, SW7 5BD, UK
Email: p.barrett@nhm.ac.uk
Name: Julia J. Day
Institution: Dpt. of Genetics, Environment and Evolution, University College London
Address: Gower Street, London, WC1E 6BT, UK
Email: j.day@ucl.ac.uk
3. Dates of data collection: 2015-04-01 to 2022-08-31,
collected in person in museum collections and from online sources
(see Groh et al. 2021 and Groh et al. 2022 for more information on how the in-person collection took place)
4. Funding sources: Financial support for this study was provided by the
London NERC DTP (Ref NE/L002485/1) and the UCL Bogue Fellowship.
SHARING/ACCESS INFORMATION
1. Restrictions: please do not reuse the data or code without citation.
2. A previous version of the phylogenetic dataset can be found in:
Groh, S.S., Upchurch, P., Barrett, P.M. and Day, J.J., 2020. The phylogenetic relationships of neosuchian crocodiles and their implications for the convergent evolution of the longirostrine condition. Zoological Journal of the Linnean Society, 188(2), pp.473-506.
The time-scaled trees and updated phylogenies can be found in:
Groh, S. S., Upchurch, P., Barrett, P. M., & Day, J. J. (2022). How to date a crocodile: estimation of neosuchian clade ages and a comparison of four time‐scaling methods. Palaeontology, 65(2), e12589.
3. Recommended citation for this dataset:
Please cite the main paper this dataset is linked to.
Groh, S.S., Upchurch, P., Barrett, P.M. and Day, J.J., 2023. The biogeographic history of neosuchian crocodiles and the impact of saltwater tolerant/intolerant physiologies. Royal Society Open Science [in press]
DATA & FILE OVERVIEW
File List and descriptions
Groh_2023_croc_biogeo_CLADE_DEFINITIONS.pdf [definition of all the clade names used in the paper, for both the G21 and SB21 phylogenies]
Dataset.1: K-means analysis (Groh_2023_croc_biogeo_KMEANS_FILES.zip)
|- Groh_2023_croc_biogeo_kmeans_areas.pdf [palaeomaps with the occurrences plotted and centres of the nine areas per time bin]
|- Groh_2023_croc_biogeo_kmeans_Button_et_al_2017_analysis_script.txt [K-means analysis script from Button et al. 2017. All run on R 4.2.2. Each time period needs a different input file, for example Groh_2023_croc_biogeo_Lower_Cretaceous_coordsall.txt for the Lower Cretaceous]
|- Groh_2023_croc_biogeo_kmeans_results.xlsx [statistical result values; each row corresponds to a different time bin in the dataset 'Groh_2023_croc_biogeo_palaeocoordinates_data.xlsx'; each column is a different number of potential geographic clusters. Each percentage value is the total percentage of data explained by the number of clusters]
|- Groh_2023_croc_biogeo_kmeans_script_for_plotting_result_maps.txt [Script to plot the result coordinates on a palaeomap. All run on R 4.2.2. The input coordinate files are in Groh_2023_croc_biogeo_result_coordinates. Georeferenced tifs can be obtained e.g. via PALEOMAP in GPlates (see paper for citations). Each plot needs one georeferenced tif for the time period, and the area and coordinates files for that time period (e.g., for the Upper Cretaceous: Upper_Cre.tif; Groh_2023_croc_biogeo_Upper_Cre_area_coords9.txt; Groh_2023_croc_biogeo_Upper_Cretaceous_coordsall.txt)]
|- Groh_2023_croc_biogeo_palaeocoordinates_data.xlsx [palaeocoordinates used in the K-means analysis; each sheet represents a different time bin, and each row a different occurrence with the palaeocoordinates used (palaeolatitude and palaeolongitude)]
|- A. RESULTS (Groh_2023_croc_biogeo_result_coordinates)
| |- Groh_2023_croc_biogeo_Lower_Cretaceous_area_coords9.txt [coordinates for the centres of the nine identified areas]
| |- Groh_2023_croc_biogeo_Lower_Cretaceous_coordsall.txt [all occurrence coordinates for the time bin]
| |- Groh_2023_croc_biogeo_Neogene_area_coords9.txt [coordinates for the centres of the nine identified areas]
| |- Groh_2023_croc_biogeo_Neogene_coordsall.txt [all occurrence coordinates for the time bin]
| |- Groh_2023_croc_biogeo_Paleogene_area_coords9.txt [coordinates for the centres of the nine identified areas]
| |- Groh_2023_croc_biogeo_Paleogene_coordsall.txt [all occurrence coordinates for the time bin]
| |- Groh_2023_croc_biogeo_Upper_Cre_area_coords9.txt [coordinates for the centres of the nine identified areas]
| |- Groh_2023_croc_biogeo_Upper_Cretaceous_coordsall.txt [all occurrence coordinates for the time bin]
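The percentages in Groh_2023_croc_biogeo_kmeans_results.xlsx are described above as the share of the data explained by each number of clusters. The Python sketch below illustrates one common way to obtain such a value: the between-cluster sum of squares as a percentage of the total sum of squares. It is not the archived R script from Button et al. 2017, and it may differ from it in detail. The farthest-point initialisation, the treatment of palaeolatitude and palaeolongitude as planar coordinates, all function names, and the toy points are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (palaeolatitude, palaeolongitude), treated as planar here


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two coordinate pairs."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def centroid(points: Sequence[Point]) -> Point:
    """Arithmetic mean of a non-empty set of points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def assign(points: Sequence[Point], centres: Sequence[Point]) -> List[int]:
    """Index of the nearest centre for every point (ties go to the lower index)."""
    return [min(range(len(centres)), key=lambda j: sq_dist(p, centres[j]))
            for p in points]


def kmeans(points: Sequence[Point], k: int,
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm with a deterministic farthest-point start (an assumption)."""
    centres: List[Point] = [points[0]]
    while len(centres) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    for _ in range(max_iter):
        labels = assign(points, centres)
        updated: List[Point] = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centre if a cluster ends up empty.
            updated.append(centroid(members) if members else centres[j])
        if updated == centres:
            break
        centres = updated
    return centres, assign(points, centres)


def percent_explained(points: Sequence[Point], centres: Sequence[Point],
                      labels: Sequence[int]) -> float:
    """100 * (1 - within-cluster SS / total SS)."""
    grand = centroid(points)
    total_ss = sum(sq_dist(p, grand) for p in points)
    if total_ss == 0:
        raise ValueError("all points are identical; total sum of squares is zero")
    within_ss = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return 100.0 * (1.0 - within_ss / total_ss)


# Hypothetical occurrences, NOT taken from the dataset.
toy_points: List[Point] = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                           (50.0, 50.0), (50.0, 51.0), (51.0, 50.0)]
explained_by_k: Dict[int, float] = {}
for k in (1, 2, 3):
    centres, labels = kmeans(toy_points, k)
    explained_by_k[k] = percent_explained(toy_points, centres, labels)
```

Computing the value for a range of k per time bin gives one row of a table laid out like the results spreadsheet, with one column per candidate number of clusters.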
Dataset.2: Biogeography files (Groh_2023_croc_biogeo_BIOGEOGRAPHY_FILES.zip)
|- A. Groh et al. 2022 files (Groh_2023_croc_biogeo_G21_files)
| |- a. Results of the BALE analysis (Groh_2023_croc_biogeo_BALE_G21_results)
| | |- i. Results for the cal3 trees (Groh_2023_croc_biogeo_BALE_G21_cal3)
| | | |- Groh_2023_croc_biogeo_BALE_G21_cal3_Run01.AncStates.txt
| | | |- Groh_2023_croc_biogeo_BALE_G21_cal3_Run02.AncStates.txt
| | | |- Groh_2023_croc_biogeo_BALE_G21_cal3_Run03.AncStates.txt
| | | |- Groh_2023_croc_biogeo_BALE_G21_cal3_Run04.AncStates.txt
| | | |- Groh_2023_croc_biogeo_BALE_G21_cal3_Run05.AncStates.txt
| | | |- Groh_2023_croc_biogeo_BALE_Results_G21_cal3.xlsx [concatenated results of the analysis. Each row represents one of the major neosuchian groups (according to the definitions in Groh_2023_croc_biogeo_CLADE_DEFINITIONS). The columns represent the geographic locations (abbreviations the same as in Figure 1 of the main paper), the palaeolatitude and palaeolongitude of the reconstructed location of the node, the age of the node and the number of the node in the phylogenetic tree used]
| | |- ii. Results for the FBD trees (Groh_2023_croc_biogeo_BALE_G21_FBD)
| | | |- Groh_2023_croc_biogeo_BALE_G21_FBD_Run01.AncStates.txt
| | | |- Groh_2023_croc_biogeo_BALE_G21_FBD_Run02.AncStates.txt
| | | |- Groh_2023_croc_biogeo_BALE_G21_FBD_Run03.AncStates.txt
| | | |- Groh_2023_croc_biogeo_BALE_G21_FBD_Run04.AncStates.txt
| | | |- Groh_2023_croc_biogeo_BALE_G21_FBD_Run05.AncStates.txt
| | | |- Groh_2023_croc_biogeo_BALE_Results_G21_FBD.xlsx [concatenated results of the analysis. Each row represents one of the major neosuchian groups (according to the definitions in Groh_2023_croc_biogeo_CLADE_DEFINITIONS). The columns represent the geographic locations (abbreviations the same as in Figure 1 of the main paper), the palaeolatitude and palaeolongitude of the reconstructed location of the node, the age of the node and the number of the node in the phylogenetic tree used]
| | |- Groh_2023_croc_biogeo_BALE_Code.txt
| |- b. Results of the Distance-based BGB analysis (Groh_2023_croc_biogeo_BGB_Distance_G21_results)
| | |- i. Results for the cal3 trees (Groh_2023_croc_biogeo_Distance_cal3_G21)
| | | |- Groh_2023_croc_biogeo_BAYAREALIKE.pdf
| | | |- Groh_2023_croc_biogeo_DEC.pdf
| | | |- Groh_2023_croc_biogeo_DIVALIKE.pdf
| | | |- Groh_2023_croc_biogeo_restable_AIC_rellike_formatted.txt
| | | |- Groh_2023_croc_biogeo_restable_AICc_rellike_formatted.txt
| | | |- Groh_2023_croc_biogeo_teststable.txt
| | |- ii. Results for the FBD trees (Groh_2023_croc_biogeo_Distance_FBD_G21)
| | | |- Groh_2023_croc_biogeo_BAYAREALIKE.pdf
| | | |- Groh_2023_croc_biogeo_DEC.pdf
| | | |- Groh_2023_croc_biogeo_DIVALIKE.pdf
| | | |- Groh_2023_croc_biogeo_restable_AIC_rellike_formatted.txt
| | | |- Groh_2023_croc_biogeo_restable_AICc_rellike_formatted.txt
| | | |- Groh_2023_croc_biogeo_teststable.txt
| | |- Groh_2023_croc_biogeo_distances_explanation.txt [explanation for distance calculations and area abbreviations]
| | |- Groh_2023_croc_biogeo_distances_matrix.txt [geographic distances for the distance-based BGB analysis]
| | |- Groh_2023_croc_biogeo_timeperiods.txt [time periods used for the distance-based BGB analysis]
| |- c. Results of the Unconstrained BGB analysis (Groh_2023_croc_biogeo_BGB_Unconstrained_G21_results)
| | |- i. Results for the cal3 trees (Groh_2023_croc_biogeo_Unconstrained_cal3)
| | | |- Groh_2023_croc_biogeo_BAYAREALIKE.pdf
| | | |- Groh_2023_croc_biogeo_DEC.pdf
| | | |- Groh_2023_croc_biogeo_DIVALIKE.pdf
| | | |- Groh_2023_croc_biogeo_restable_AIC_rellike_formatted.txt
| | | |- Groh_2023_croc_biogeo_restable_AICc_rellike_formatted.txt
| | | |- Groh_2023_croc_biogeo_teststable.txt
| | |- ii. Results for the FBD trees (Groh_2023_croc_biogeo_Unconstrained_FBD)
| | | |- Groh_2023_croc_biogeo_BAYAREALIKE.pdf
| | | |- Groh_2023_croc_biogeo_DEC.pdf
| | | |- Groh_2023_croc_biogeo_DIVALIKE.pdf
| | | |- Groh_2023_croc_biogeo_restable_AIC_rellike_formatted.txt
| | | |- Groh_2023_croc_biogeo_restable_AICc_rellike_formatted.txt
| | | |- Groh_2023_croc_biogeo_teststable.txt
| |- Groh_2023_croc_biogeo_BGB_Distance_script.txt [Code used in the Distance-based BGB analysis. All run on R 4.2.2. The following data files were used with this script: Groh_2023_croc_biogeo_NeosuchiaGeog_G21.data, Groh_2023_croc_biogeo_tree_G21_*.newick [*FBD or cal3 depending on the analysis], Groh_2023_croc_biogeo_distances_matrix.txt, Groh_2023_croc_biogeo_timeperiods.txt]
| |- Groh_2023_croc_biogeo_BGB_Unconstrained_script.txt [Code used in the Unconstrained BGB analysis. All run on R 4.2.2. The following data files were used with this script: Groh_2023_croc_biogeo_NeosuchiaGeog_G21.data, Groh_2023_croc_biogeo_tree_G21_*.newick [*FBD or cal3 depending on the analysis]]
| |- Groh_2023_croc_biogeo_NeosuchiaCoord_G21.txt [coordinates for the BALE analysis]
| |- Groh_2023_croc_biogeo_NeosuchiaGeog_G21.data [geographic data for the BGB analysis]
| |- Groh_2023_croc_biogeo_tree_G21_cal3.newick [cal3 tree file]
| |- Groh_2023_croc_biogeo_tree_G21_FBD.newick [FBD tree file]
|- B. Stockdale & Benton 2021 files (Groh_2023_croc_biogeo_SB21_files)
| |- a. Results of the BALE analysis (Groh_2023_croc_biogeo_BALE_SB21)
| | |- i. Results for the cal3 trees (Groh_2023_croc_biogeo_BALE_cal3)
| | | |- Groh_2023_croc_biogeo_BALE_cal3_Results.xlsx [concatenated results of the analysis. Each row represents one of the major neosuchian groups (according to the definitions in Groh_2023_croc_biogeo_CLADE_DEFINITIONS). The columns represent the geographic locations (abbreviations the same as in Figure 1 of the main paper), the palaeolatitude and palaeolongitude of the reconstructed location of the node, the age of the node and the number of the node in the phylogenetic tree used]
| | | |- Groh_2023_croc_biogeo_BALE_cal3_Run01.AncStates.txt
| | | |- Groh_2023_croc_biogeo_BALE_cal3_Run02.AncStates.txt
| | | |- Groh_2023_croc_biogeo_BALE_cal3_Run03.AncStates.txt
| | | |- Groh_2023_croc_biogeo_BALE_cal3_Run04.AncStates.txt
| | | |- Groh_2023_croc_biogeo_BALE_cal3_Run05.AncStates.txt
| | |- ii. Results for the FBD trees (Groh_2023_croc_biogeo_BALE_FBD)
| | | |- Groh_2023_croc_biogeo_BALE_FBD_Results.xlsx [concatenated results of the analysis. Each row represents one of the major neosuchian groups (according to the definitions in Groh_2023_croc_biogeo_CLADE_DEFINITIONS). The columns represent the geographic locations (abbreviations the same as in Figure 1 of the main paper), the palaeolatitude and palaeolongitude of the reconstructed location of the node, the age of the node and the number of the node in the phylogenetic tree used]
| | | |- Groh_2023_croc_biogeo_BALE_FBD_Run01.AncStates.txt
| | | |- Groh_2023_croc_biogeo_BALE_FBD_Run02.AncStates.txt
| | | |- Groh_2023_croc_biogeo_BALE_FBD_Run03.AncStates.txt
| | | |- Groh_2023_croc_biogeo_BALE_FBD_Run04.AncStates.txt
| | | |- Groh_2023_croc_biogeo_BALE_FBD_Run05.AncStates.txt
| |- b. Results of the Distance-based BGB analysis (Groh_2023_croc_biogeo_BGB_Distance_SB21)
| | |- i. Results for the cal3 trees (Groh_2023_croc_biogeo_BGB_Distance_cal3_SB21)
| | | |- Groh_2023_croc_biogeo_BAYAREALIKE.pdf
| | | |- Groh_2023_croc_biogeo_DEC.pdf
| | | |- Groh_2023_croc_biogeo_DIVALIKE.pdf
| | | |- Groh_2023_croc_biogeo_restable_AIC_rellike_formatted.txt
| | | |- Groh_2023_croc_biogeo_restable_AICc_rellike_formatted.txt
| | | |- Groh_2023_croc_biogeo_teststable.txt
| | |- ii. Results for the FBD trees (Groh_2023_croc_biogeo_BGB_Distance_FBD_SB21)
| | | |- Groh_2023_croc_biogeo_BAYAREALIKE.pdf
| | | |- Groh_2023_croc_biogeo_DEC.pdf
| | | |- Groh_2023_croc_biogeo_DIVALIKE.pdf
| | | |- Groh_2023_croc_biogeo_restable_AIC_rellike_formatted.txt
| | | |- Groh_2023_croc_biogeo_restable_AICc_rellike_formatted.txt
| | | |- Groh_2023_croc_biogeo_teststable.txt
| | |- Groh_2023_croc_biogeo_distances_explanation.txt [explanation for distance calculations and area abbreviations]
| | |- Groh_2023_croc_biogeo_distances_matrix.txt [distances between land masses used for the distance-based BGB analysis]
| | |- Groh_2023_croc_biogeo_timeperiods.txt [time periods used for the distance-based BGB analysis]
| |- c. Results of the Unconstrained BGB analysis (Groh_2023_croc_biogeo_BGB_Unconstrained_SB21)
| | |- i. Results for the cal3 trees (Groh_2023_croc_biogeo_Unconstrained_cal3_SB21)
| | | |- Groh_2023_croc_biogeo_BAYAREALIKE.pdf
| | | |- Groh_2023_croc_biogeo_DEC.pdf
| | | |- Groh_2023_croc_biogeo_DIVALIKE.pdf
| | | |- Groh_2023_croc_biogeo_restable_AIC_rellike_formatted.txt
| | | |- Groh_2023_croc_biogeo_restable_AICc_rellike_formatted.txt
| | | |- Groh_2023_croc_biogeo_teststable.txt
| | |- ii. Results for the FBD trees (Groh_2023_croc_biogeo_Unconstrained_FBD_SB21)
| | | |- Groh_2023_croc_biogeo_BAYAREALIKE.pdf
| | | |- Groh_2023_croc_biogeo_DEC.pdf
| | | |- Groh_2023_croc_biogeo_DIVALIKE.pdf
| | | |- Groh_2023_croc_biogeo_restable_AIC_rellike_formatted.txt
| | | |- Groh_2023_croc_biogeo_restable_AICc_rellike_formatted.txt
| | | |- Groh_2023_croc_biogeo_teststable.txt
| |- Groh_2023_croc_biogeo_BGB_Distance_script.txt [Code used in the Distance-based BGB analysis. All run on R 4.2.2. The following data files were used with this script: Groh_2023_croc_biogeo_NeosuchiaGeogSB.data, Groh_2023_croc_biogeo_tree_SB21_*.newick [*FBD or cal3 depending on the analysis], Groh_2023_croc_biogeo_distances_matrix.txt, Groh_2023_croc_biogeo_timeperiods.txt]
| |- Groh_2023_croc_biogeo_BGB_Unconstrained_script.txt [Code used in the Unconstrained BGB analysis. All run on R 4.2.2. The following data files were used with this script: Groh_2023_croc_biogeo_NeosuchiaGeogSB.data, Groh_2023_croc_biogeo_tree_SB21_*.newick [*FBD or cal3 depending on the analysis]]
| |- Groh_2023_croc_biogeo_NeosuchiaCoordSB.txt [coordinates for the BALE analysis]
| |- Groh_2023_croc_biogeo_NeosuchiaGeogSB.data [geographic data for the BGB analysis]
| |- Groh_2023_croc_biogeo_tree_SB21_cal3.newick [cal3 tree file]
| |- Groh_2023_croc_biogeo_tree_SB21_FBD.newick [FBD tree file]
Dataset.3: Transoceanic dispersal files (Groh_2023_croc_biogeo_TRANSOCEANIC_DISPERSAL_FILES.zip)
|- Groh_2023_croc_biogeo_transoceanic_dispersals.docx [word document detailing all potential transoceanic dispersal events counted]
|- Groh_2023_croc_biogeo_transoceanic_G21_Distance_cal3.png
|- Groh_2023_croc_biogeo_transoceanic_G21_Distance_FBD.png
|- Groh_2023_croc_biogeo_transoceanic_G21_Unconstrained_FBD.png
|- Groh_2023_croc_biogeo_transoceanic_Results_Summary.xlsx [Sheet 1: results of the chi-squared calculations. The top row is the group whose significance is measured, against the group in the second row. For example, cell D3 shows that there is a significantly different number of trans-oceanic dispersal events in Alligatoroidea compared to Crocodylia (if minimum dispersal is assumed for both). Each row represents a different set of trees.
Sheet 2: the Excel sheet used for the actual calculations. The yellow cells are the cells where values are input; the sheet calculates the rest, giving the two chi-squared values for different scenarios (trans-oceanic dispersal events in Alligatoroidea significantly different than expected; Crocodylia/saltwater-tolerant group significantly different than expected)]
|- Groh_2023_croc_biogeo_transoceanic_SB21_Distance_cal3.png
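For readers who want to check the kind of calculation summarised in Groh_2023_croc_biogeo_transoceanic_Results_Summary.xlsx outside Excel, the Python sketch below computes a generic chi-squared goodness-of-fit statistic from observed and expected event counts. It does not reproduce the spreadsheet, and the spreadsheet may derive its expected values differently. The counts are hypothetical. The rule that expected events are proportional to each group's share, the share values, and all names are assumptions made for illustration. The 3.841 threshold is the standard critical value for one degree of freedom at the 0.05 level.

```python
from typing import List, Sequence

# Standard chi-squared critical value for 1 degree of freedom at alpha = 0.05.
CHI2_CRIT_DF1_P05 = 3.841


def expected_counts(total_events: float, shares: Sequence[float]) -> List[float]:
    """Expected events per group if events were spread in proportion to `shares`.

    The proportional rule is an assumption of this sketch.
    """
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return [total_events * s for s in shares]


def chi_squared(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Pearson statistic: sum over groups of (observed - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, NOT taken from the dataset:
# [events in the focal group, events in the comparison group]
observed = [1, 19]
# Hypothetical shares of the two groups (e.g. their share of lineages).
shares = [0.25, 0.75]

expected = expected_counts(sum(observed), shares)  # [5.0, 15.0]
stat = chi_squared(observed, expected)             # 16/5 + 16/15
significant = stat > CHI2_CRIT_DF1_P05
```

With two groups there is one degree of freedom, so a statistic above 3.841 indicates a departure from the expected split at the 0.05 level.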
PACKAGE VERSIONS USED FOR ANALYSES
All R analyses were run in R 4.2.2. Package versions:
ape 5.6.2
beepr 1.3
BioGeoBEARS 1.1.2
Claddis 0.6.6
cladoRcpp 0.15.1
cluster 2.1.4
devtools 2.4.5
FD 1.0.12.1
foreach 1.5.2
GenSA 1.1.7
ggplot2 3.3.6
paleoMap 0.0.0.9001
paleotree 3.4.4
parallel 4.2.2
phytools 1.2
raster 3.6.23
rexpokit 0.26.6.7
rgdal 1.6.7
snow 0.4.4
tictoc 1.1
METHODOLOGICAL INFORMATION
Methods are explained in detail in Groh, S.S., Upchurch, P., Barrett, P.M. and Day, J.J., 2023. The biogeographic history of neosuchian crocodiles and the impact of saltwater tolerant/intolerant physiologies. Royal Society Open Science [in press]
Methods
For methodological details, see Groh, S.S., Upchurch, P., Barrett, P.M. and Day, J.J., 2023. The biogeographic history of neosuchian crocodiles and the impact of saltwater tolerant/intolerant physiologies [under review].