Data for: Latitudinal gradients in the species diversity of Japanese earthworms
Data files
May 14, 2026 version files 891.41 KB
- 28Sfortr2.nwk (33.53 KB)
- 3genesfortr2andASAP.nwk (33.08 KB)
- a_diversity_Faith_Lum.csv (4.35 KB)
- a_diversity_Faith_Meg.csv (8.91 KB)
- a-gLum.csv (1.32 KB)
- All_samples_used_in_this_study.xlsx (396.79 KB)
- b_diversity_Lum_sor.csv (4.29 KB)
- b_diversity_Meg_sor.csv (8.96 KB)
- beta_diversity.txt (8.21 KB)
- COIfortr2.nwk (32.97 KB)
- COItreeforPTP.nwk (106.17 KB)
- distribution_Meg_no0.001.csv (3.58 KB)
- f-aLum.csv (2.51 KB)
- f-gLum.csv (2.55 KB)
- f-sLum.csv (3.70 KB)
- gamma_alpha_diversity.txt (3.23 KB)
- Genetic_differentiation.txt (476 B)
- ITSfortr2.nwk (32.89 KB)
- Latitudinal_distribution_range.txt (311 B)
- MOTUtable_Lum01.csv (1.45 KB)
- MOTUtable_Meg01.csv (30.10 KB)
- MOTUtableA21sites_Lum.csv (577 B)
- MOTUtableA21sites_Lum01.csv (563 B)
- MOTUtableA22sites_Meg.csv (6.22 KB)
- MOTUtableA22sites_Meg01.csv (6.20 KB)
- MOTUtableB18sites_Lum.csv (497 B)
- MOTUtableB18sites_Lum01.csv (497 B)
- MOTUtableB33sites_Meg.csv (8.77 KB)
- MOTUtableB33sites_Meg01.csv (8.73 KB)
- MOTUtableC16sites_Lum.csv (450 B)
- MOTUtableC16sites_Lum01.csv (442 B)
- MOTUtableC16sites_Meg.csv (4.82 KB)
- MOTUtableC16sites_Meg01.csv (4.81 KB)
- MOTUtableD18sites_Meg.csv (5.30 KB)
- MOTUtableD18sites_Meg01.csv (5.28 KB)
- MOTUtableD7sites_Lum01.csv (254 B)
- MOTUtableDEF7sites_Lum.csv (254 B)
- MOTUtableE17sites_Meg.csv (5.07 KB)
- MOTUtableE17sites_Meg01.csv (5.05 KB)
- MOTUtableF20sites_Meg.csv (5.75 KB)
- MOTUtableF20sites_Meg01.csv (5.73 KB)
- MOTUtableregion_Lum.csv (197 B)
- MOTUtableregion_Lum01.csv (187 B)
- MOTUtableregion_Meg.csv (2.61 KB)
- MOTUtableregion_Meg01.csv (2.52 KB)
- OTUs_used_from_Genbank.xlsx (46.49 KB)
- PGLS.txt (2.09 KB)
- README.md (9.63 KB)
- s-aLum.csv (2.47 KB)
- s-gLum.csv (2.51 KB)
- tr2sp_BEAST_Burn-in125000000.nex (4.86 KB)
- traits.csv (21.80 KB)
- traitsgenedist.csv (1.39 KB)
Abstract
Although a latitudinal gradient of species diversity is known for diverse animal groups, few detailed studies on this topic exist for soil animals. Our study aimed to clarify the formation process of latitudinal gradients in the species diversity of soil animals in terms of dispersal and evolutionary processes using terrestrial earthworms. We used 4074 earthworms collected from 131 sites in the Japanese archipelago between 31.7°N and 45.1°N. Because morphological classification was difficult for these earthworms, especially for juveniles, we employed a DNA-based method for species delimitation. We used the DNA data for phylogenetic diversity indices and population genetic analysis in subsequent community-ecological analyses. We analysed latitudinal changes in local and regional species diversity and the replacement and nested structures among species assemblages using phylogenetic diversity indices and examined whether environmental factors and the ecological traits of earthworms related to dispersal ability contribute to geographic diversity patterns. Our earthworm samples comprised 113 megascolecid and nine lumbricid molecular operational taxonomic units (MOTUs). The species assemblage of Megascolecidae presented higher γ-, β- and α-diversity at lower latitudes affected by temperature, precipitation and snow depth. Overall β-diversity was greater at lower latitudes, reflecting greater spatial turnover due to greater γ-diversity with more local species at lower latitudes. In contrast, relatively high nestedness was observed at highest latitudes, where γ-diversity and overall β-diversity were lowest and spatial turnover was minimal, suggesting that high-latitude species assemblages were formed through the range expansion of a subset of species from lower latitudes. Such range expansion was likely facilitated for potentially parthenogenetic species. 
In contrast, Lumbricidae presented greater γ-diversity at higher latitudes and a converse nested structure from high to low latitudes. Our study demonstrated that DNA-based species delimitation is necessary to understand the exact geographic diversity pattern and its formation process for organisms whose morphological classification is difficult. The contradistinctive species-assemblage patterns between related earthworm groups might be considered to have resulted from their low dispersal ability, different biogeographical histories, and characteristic topography in the Japanese Archipelago, which covers a latitudinally wide range.
Description of the data and file structure
Raw data used in the study
OTUs_used_from_Genbank.xlsx: The OTUs obtained from the GenBank database. In the outgroup column, "+" marks OTUs used as outgroups and "-" marks OTUs not used as outgroups. The family names identified by Claident are shown; when Claident could not identify the family of an OTU, the morphological identification result is shown instead and marked with "*". In the PTPsp column, "n/a" marks OTUs used as outgroups. In the tr2sp and family name columns, "n/a" marks OTUs not used for tr2sp or Claident.
All_samples_used_in_this_study.xlsx: All samples used in this study. Samples with "n/a" in the sample ID and/or site ID columns were not used for community analysis. In each analysis column, "○" marks samples used for that analysis and "-" marks samples not used for it. In the PTPsp column, "n/a" marks OTUs used as outgroups. In the tr2sp column, "n/a" marks OTUs not used for tr2sp. In the species name column, "-" means that the sample was not identified. The names of species previously reported to be potentially parthenogenetic (Kobayashi, 1938; Ishizuka and Minagoshi, 2014; Yanagido and Yanagido, 2016; Minamiya, 2017; https://japanese-mimizu.jimdofree.com/; accessed January 21, 2026) are indicated with "*". In the male pore column, "○" indicates that the individual has a male pore, "×" indicates that the individual does not have a male pore, and "-" indicates that the individual was not examined. For example, ○/× indicates that the individual has a male pore on the left and no male pore on the right. In the intestinal cecum column, "-" means that the individual was not examined. In the ITS-28S accession number column, sequences are shown when the sequences are fewer than 100, and "-" means that the ITS-28S sequence was not obtained for the individual.
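The symbol conventions of the male pore column can be decoded mechanically. The following is a minimal sketch rather than part of the deposited scripts: the function name, the return format, and the rule that a single symbol applies to both sides are assumptions made for illustration.

```python
# Hypothetical helper: the symbols come from the column description above,
# but the function name and the return format are illustrative assumptions.
MALE_PORE_STATES = {"○": True, "×": False, "-": None}


def parse_male_pore(cell):
    """Return (left, right) male-pore states: True, False, or None (not examined)."""
    parts = [p.strip() for p in cell.strip().split("/")]
    if len(parts) == 1:
        # Assumption: a single symbol describes both sides of the individual.
        parts = parts * 2
    if len(parts) != 2:
        raise ValueError(f"unexpected male pore code: {cell!r}")
    left, right = (MALE_PORE_STATES[p] for p in parts)
    return left, right


print(parse_male_pore("○/×"))  # (True, False): pore on the left, none on the right
```

An unknown symbol raises a KeyError, which makes unexpected codes visible instead of silently treating them as missing.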
Phylogenetic trees used for species delimitation
COItreeforPTP.nwk: The phylogenetic tree constructed from COI gene sequences and used for PTP analysis.
3genesfortr2andASAP.nwk: The phylogenetic tree constructed from COI, ITS and 28S gene sequences and used for tr2 and ASAP analyses.
COIfortr2.nwk: The phylogenetic tree constructed from COI gene sequences and used for tr2 analysis.
ITSfortr2.nwk: The phylogenetic tree constructed from ITS gene sequences and used for tr2 analysis.
28Sfortr2.nwk: The phylogenetic tree constructed from 28S gene sequences and used for tr2 analysis.
The data used in statistical analysis
The test for gamma- and alpha-diversity
In the MOTU tables, the letters in the left column indicate site IDs. ElevationVariance_radius100km indicates the variance in elevation within a radius of 100 km. Other variables in the csv files are explained in the traits.csv description.
gamma_alpha_diversity.txt: The script for calculating Faith's PD and performing GLM.
tr2sp_BEAST_Burn-in125000000.nex: The species tree constructed in this study.
MOTUtableA22sites_Meg.csv: The data of Megascolecidae at each site in region A for calculating Faith's PD at each site.
MOTUtableB33sites_Meg.csv: The data of Megascolecidae at each site in region B for calculating Faith's PD at each site.
MOTUtableC16sites_Meg.csv: The data of Megascolecidae at each site in region C for calculating Faith's PD at each site.
MOTUtableD18sites_Meg.csv: The data of Megascolecidae at each site in region D for calculating Faith's PD at each site.
MOTUtableE17sites_Meg.csv: The data of Megascolecidae at each site in region E for calculating Faith's PD at each site.
MOTUtableF20sites_Meg.csv: The data of Megascolecidae at each site in region F for calculating Faith's PD at each site.
MOTUtableregion_Meg.csv: The data of Megascolecidae at each region for calculating Faith's PD at each region.
MOTUtableA21sites_Lum.csv: The data of Lumbricidae at each site in region A for calculating Faith's PD at each site.
MOTUtableB18sites_Lum.csv: The data of Lumbricidae at each site in region B for calculating Faith's PD at each site.
MOTUtableC16sites_Lum.csv: The data of Lumbricidae at each site in region C for calculating Faith's PD at each site.
MOTUtableDEF7sites_Lum.csv: The data of Lumbricidae at each site in regions D, E and F for calculating Faith's PD at each site.
MOTUtableregion_Lum.csv: The data of Lumbricidae at each region for calculating Faith's PD at each region.
a_diversity_Faith_Meg.csv: The alpha-diversity data of Megascolecidae used for GLM.
a_diversity_Faith_Lum.csv: The alpha-diversity data of Lumbricidae used for GLM.
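Faith's PD, which the MOTU tables in this section feed into, is the total branch length of the subtree spanned by the taxa present at a site. The sketch below illustrates the index on a made-up toy tree. It is not the authors' script: the tree, the taxon names and the counts are invented, and it uses the root-inclusive convention, which may differ from the convention in gamma_alpha_diversity.txt.

```python
# Toy rooted tree stored as child -> (parent, branch length).
# All names and lengths are invented; they are not from the study's species tree.
TREE = {
    "n1": ("root", 1.0),
    "n2": ("root", 2.0),
    "sp_a": ("n1", 0.5),
    "sp_b": ("n1", 0.5),
    "sp_c": ("n2", 1.5),
}


def faith_pd(present, tree):
    """Sum the lengths of all branches on the paths from the present taxa to the root."""
    visited = set()
    total = 0.0
    for taxon in present:
        node = taxon
        # Walk towards the root, counting each branch only once.
        while node in tree and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


# One hypothetical row of a site-by-MOTU table (counts per MOTU).
site_row = {"sp_a": 3, "sp_b": 0, "sp_c": 1}
present = [sp for sp, count in site_row.items() if count > 0]
print(faith_pd(present, TREE))  # 5.0 = (0.5 + 1.0) + (1.5 + 2.0)
```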
The test for beta-diversity
In the MOTU tables, the letters in the left column indicate site IDs. ElevationVariance_radius100km indicates the variance in elevation within a radius of 100 km. Other variables in the csv files are explained in the traits.csv description.
beta_diversity.txt: The script for calculating beta-diversity indices and performing GLM.
tr2sp_BEAST_Burn-in125000000.nex: The species tree constructed in this study.
MOTUtableA22sites_Meg01.csv: The data of Megascolecidae at each site in region A for calculating beta-diversity indices.
MOTUtableB33sites_Meg01.csv: The data of Megascolecidae at each site in region B for calculating beta-diversity indices.
MOTUtableC16sites_Meg01.csv: The data of Megascolecidae at each site in region C for calculating beta-diversity indices.
MOTUtableD18sites_Meg01.csv: The data of Megascolecidae at each site in region D for calculating beta-diversity indices.
MOTUtableE17sites_Meg01.csv: The data of Megascolecidae at each site in region E for calculating beta-diversity indices.
MOTUtableF20sites_Meg01.csv: The data of Megascolecidae at each site in region F for calculating beta-diversity indices.
MOTUtableregion_Meg01.csv: The data of Megascolecidae for calculating beta-diversity indices between regions.
MOTUtable_Meg01.csv: The data of Megascolecidae for calculating beta-diversity indices between sites.
MOTUtableA21sites_Lum01.csv: The data of Lumbricidae at each site in region A for calculating beta-diversity indices.
MOTUtableB18sites_Lum01.csv: The data of Lumbricidae at each site in region B for calculating beta-diversity indices.
MOTUtableC16sites_Lum01.csv: The data of Lumbricidae at each site in region C for calculating beta-diversity indices.
MOTUtableD7sites_Lum01.csv: The data of Lumbricidae at each site in region D for calculating beta-diversity indices.
MOTUtableregion_Lum01.csv: The data of Lumbricidae for calculating beta-diversity indices between regions.
MOTUtable_Lum01.csv: The data of Lumbricidae for calculating beta-diversity indices between sites.
b_diversity_Meg_sor.csv: The beta diversity data of Megascolecidae used for GLM.
b_diversity_Lum_sor.csv: The beta diversity data of Lumbricidae used for GLM.
f-sLum.csv, f-aLum.csv, f-gLum.csv, s-aLum.csv, s-gLum.csv, a-gLum.csv: The data used in post hoc tests for vegetation in Lumbricidae.
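The abstract describes overall β-diversity as split into spatial turnover and nestedness. As a hedged illustration of that idea, the sketch below applies the pairwise Sørensen partition (β_sor = β_sim + β_sne) to presence/absence sets. It is the taxonomic version only, whereas the study used phylogenetic diversity indices, so it should not be read as the method in beta_diversity.txt. The MOTU labels are invented, and reading the "01" file suffix as presence/absence coding is an assumption.

```python
def sorensen_partition(site1, site2):
    """Split pairwise Sorensen dissimilarity into turnover and nestedness parts.

    site1 and site2 are non-empty sets of the taxa present at each site.
    """
    if not site1 or not site2:
        raise ValueError("both sites must contain at least one taxon")
    a = len(site1 & site2)  # taxa shared by both sites
    b = len(site1 - site2)  # taxa found only at site 1
    c = len(site2 - site1)  # taxa found only at site 2
    beta_sor = (b + c) / (2 * a + b + c)      # overall dissimilarity
    beta_sim = min(b, c) / (a + min(b, c))    # turnover (Simpson) component
    beta_sne = beta_sor - beta_sim            # nestedness-resultant component
    return beta_sor, beta_sim, beta_sne


# Hypothetical MOTU labels, for illustration only.
nested = sorensen_partition({"m1", "m2", "m3", "m4"}, {"m1", "m2"})
replaced = sorensen_partition({"m1", "m2"}, {"m1", "m3"})
print(nested)    # turnover is 0: the second site is a subset of the first
print(replaced)  # nestedness is 0: one taxon is swapped for another
```

In the first pair all dissimilarity is attributed to nestedness, and in the second all of it is attributed to turnover, which mirrors the contrast the abstract draws between high and low latitudes.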
The test for the differences in latitudinal distribution range among species
Variables in distribution_Meg_no0.001.csv are explained in the traits.csv description.
beta_diversity.txt: The script for calculating beta-diversity indices and performing GLM.
Latitudinal_distribution_range.txt: The script for performing GLM. The latitudinal distribution range of each MOTU was used as the response variable, and either the highest latitude or the mean latitude among the collection sites of each MOTU was used as the explanatory variable.
distribution_Meg_no0.001.csv: The data of Megascolecidae used for GLM.
The test to examine whether the latitudinal distribution and habitat ranges were explained by earthworm traits
PGLS.txt: The script for performing PGLS. The latitudinal distribution and habitat ranges were used as response variables, and the habitat layer, reproductive mode and body weight were used as explanatory variables.
traits.csv: The data used for PGLS. Empty cells mean that the trait values were not obtained; these were not included in the analysis. "LN_" in variable names means that the variables were ln-transformed. Latitude_high_Latitude_low: latitudinal distribution range. Latitude_high, Latitude_low (°N): highest or lowest latitude across the collection sites for each MOTU, respectively. MaxTemp_high_MinTemp_low: the difference between the highest annual maximum temperature and the lowest annual minimum temperature across the sites where each MOTU was collected. MaxTemp_high, MinTemp_low (°C): the highest annual maximum temperature or the lowest annual minimum temperature across the sites where each MOTU was collected, respectively. TotalRain_high_TotalRain_low, SnowDepth_high_SnowDepth_low: the differences between the maximum and minimum values across the collection sites for each MOTU. TotalRain_high, TotalRain_low (mm), SnowDepth_high, SnowDepth_low (cm): highest or lowest values across the collection sites for each MOTU, respectively. Vegetation: the number of different vegetation types from which each MOTU was collected.
tr2sp_BEAST_Burn-in125000000.nex: The species tree constructed in this study.
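The range variables defined for traits.csv follow a simple pattern: a "high" value, a "low" value, and their difference, optionally ln-transformed under an "LN_" prefix. The sketch below shows that arithmetic for latitude. The input latitudes are invented, the exact column name "LN_Latitude_high_Latitude_low" is an assumption based on the prefix rule, and how zero-width ranges are handled before the ln transform is not stated in the description and is left out here.

```python
import math


def latitude_range_row(site_latitudes):
    """Derive range variables for one MOTU from the latitudes of its collection sites."""
    high = max(site_latitudes)
    low = min(site_latitudes)
    lat_range = high - low
    row = {
        "Latitude_high": high,
        "Latitude_low": low,
        "Latitude_high_Latitude_low": lat_range,
    }
    # Hypothetical column name; ln is only defined for positive ranges.
    if lat_range > 0:
        row["LN_Latitude_high_Latitude_low"] = math.log(lat_range)
    return row


# Invented collection-site latitudes (°N) for a single MOTU.
row = latitude_range_row([33.5, 35.0, 38.25])
print(row["Latitude_high_Latitude_low"])  # 4.75
```

The same high, low and difference pattern would apply to the temperature, rainfall and snow depth variables, with the appropriate site-level values in place of latitude.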
The test to examine whether the genetic differentiation was explained by earthworm traits
AmongPopratio in traitsgenedist.csv indicates the proportion of molecular variance among populations. Other variables are explained in the traits.csv description.
Genetic_differentiation.txt: The script for performing GLM.
traitsgenedist.csv: The data used for GLM.
