Vegetation structure and climate shape mountain arthropod distributions across trophic levels
Data files
Aug 02, 2024 version files 482.59 KB
- Data_Dryad.zip (471.90 KB)
- README.md (10.69 KB)
Abstract
Arthropods play a vital role in ecosystems, yet their distributions remain poorly understood, particularly in mountainous regions. This study delves into the modeling of the distribution of 31 foliar arthropod genera in the French Alps, using a comprehensive approach encompassing multi-trophic sampling, community DNA metabarcoding, and random forest models. The results underscore the significant importance of vegetation structure, such as herbaceous vegetation density, and forest density and heterogeneity, along with climate, in shaping the distributions of most arthropods. These responses to environmental gradients are consistent across trophic groups, with the exception of nectarivores, whose distributions are more sensitive to landscape structure and water availability. By leveraging community DNA metabarcoding, this study sheds light on the understudied drivers of arthropod distributions, emphasizing the importance of modeling across diverse trophic groups to anticipate arthropod responses to global change.
https://doi.org/10.5061/dryad.wdbrv15xr
This dataset contains the data and scripts necessary to reproduce all analyses in the manuscript "Vegetation structure and climate shape mountain arthropod distributions across trophic levels". The study models the distribution of 31 foliar arthropod genera in the French Alps, combining multi-trophic sampling, community DNA metabarcoding, and random forest models.
Description of the data and file structure
Data used in the main analyses:
- CO1c.MTROPH.rds, a 'metabarlist' sensu the metabaR package: an R list of four tables describing reads, MOTUs, PCRs, and sample metadata, respectively. From this list, the script 1_Prepare_mblists_for_models_2.R produces the inputs needed for the distribution models and other analyses (e.g. the genus-plot matrix).
- $reads is a table of class matrix giving the number of reads of each of the 3632 MOTUs (columns) in each of the 237 biological replicates (rows). Each cell holds the read count of one MOTU in one biological replicate, with 0 corresponding to no reads.
For the main analyses, MOTUs are aggregated by genus and biological replicates by plot, in script 1_Prepare_mblists_for_models_2.R.
- $motus is a table of class data.frame with one MOTU per row and its attributes in columns. Here we provide only the attributes that are relevant to the analyses.
- $count = total number of reads of the focal MOTU
- $kingdom, $phylum, $class, $order_name, $family_name, $genus_name, $species_name = taxonomic information of the focal MOTU. Sequences that cannot be assigned to a given taxonomic level are NA in these columns.
- $path = full taxonomic path of the focal MOTU
- $taxid_by_db.fwhF2fwhR2nCOI.DB = taxid of the focal MOTU, from the database of the reference sequences for the CO1 region extracted from the EMBL database version 142 by in silico PCR (ecoPCR, Ficetola et al., 2010) using the “fwhF2” & “fwhR2n” primer sequences (see Materials and Methods of the manuscript for more details)
- $sequence = sequence of the focal MOTU
- $FINAL_TROPHIC_CLASS = uncorrected trophic class. Sequences that cannot be assigned to a trophic class are NA in this column (e.g. because knowledge of the taxon's ecology is insufficient, or because the sequence has not been assigned to a sufficiently precise taxonomic level).
- $FINAL_T.CL_REF = reference for the trophic class. Sequences that cannot be assigned to a trophic class are NA in this column.
- $FINAL_T.CL_MATCH_LEVEL = taxonomic level at which the trophic class was assigned. Sequences that cannot be assigned to a trophic class are NA in this column.
- $pcrs is a table of class data.frame describing each PCR (row) and its attributes (columns).
- $experiment = name of the experiment
- $primer_fwd, $primer_rev = forward and reverse primer sequences
- $tag_fwd, $tag_rev = forward and reverse tag sequences
- $plate_no, $plate_col, $plate_row = number and coordinates of the sample on PCR plate
- $type, $control_type = all PCRs are samples at this stage (type="sample", i.e. obtained from a biological sample), so $control_type is NA throughout.
- $sample_id = name of the biological replicate
- $samples is a table of class data.frame describing each biological replicate (row) and its attributes (columns).
- $type = all are bulks
- $idplot, $codeplot = names of the plot where the biological replicates were taken
- $codeTube, $sample_id = identifiers of the biological replicates
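The aggregation described above (MOTUs summed by genus, biological replicates summed by plot) can be sketched in a few lines. This is a language-agnostic illustration in Python, not the authors' R script: the identifiers in the toy data are hypothetical, and both summing read counts and skipping MOTUs without a genus assignment are assumptions about what 1_Prepare_mblists_for_models_2.R does.

```python
from collections import defaultdict


def aggregate_reads(reads, motu_genus, replicate_plot):
    """Sum a replicate-by-MOTU read table into a plot-by-genus table.

    reads: {replicate_id: {motu_id: read_count}}
    motu_genus: {motu_id: genus name or None} (None mirrors NA in $genus_name)
    replicate_plot: {replicate_id: plot code}
    """
    table = defaultdict(lambda: defaultdict(int))
    for replicate, counts in reads.items():
        plot = replicate_plot[replicate]
        for motu, n_reads in counts.items():
            genus = motu_genus.get(motu)
            if genus is None:
                # Assumption: MOTUs without a genus assignment are skipped.
                continue
            table[plot][genus] += n_reads
    return {plot: dict(genera) for plot, genera in table.items()}


# Toy data with hypothetical identifiers (not taken from the dataset).
reads = {
    "rep1": {"motu_a": 10, "motu_b": 0, "motu_c": 5},
    "rep2": {"motu_a": 2, "motu_b": 7, "motu_c": 1},
    "rep3": {"motu_a": 0, "motu_b": 3, "motu_c": 4},
}
motu_genus = {"motu_a": "GenusX", "motu_b": "GenusX", "motu_c": None}
replicate_plot = {"rep1": "plotA", "rep2": "plotA", "rep3": "plotB"}

genus_by_plot = aggregate_reads(reads, motu_genus, replicate_plot)
print(genus_by_plot)
```

In the real data, the replicate-to-plot mapping would come from $samples ($sample_id and $codeplot) and the MOTU-to-genus mapping from $motus ($genus_name).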
- raw_environmental_data.rds, a table of class data.frame of the raw environmental predictors.
- $HOBO_T_seasonality = T. seasonality. We computed the annual temperature range as the standard deviation of the monthly mean temperatures * 100, from sensors permanently recording soil temperature in the centre of ORCHAMP plots. NA corresponds to lost sensors.
- $HOBO_avT_coldQ = Average T. cold quarter. We computed the average temperature of the three coldest months of the year to characterize frost exposure, from sensors permanently recording soil temperature in the centre of ORCHAMP plots. NA corresponds to lost sensors.
- $GDD_airT.sum.mean = Growing Degree Days (GDD). We calculated GDD to characterize heat accumulation, as the sum of the average daily degrees above zero accumulated over the growing season each year, averaged over 10 years (i.e., 2009-2018), and modeled in the first soil horizon (to 10cm depth), from the SAFRAN-SURFEX/ISBA-Crocus-MEPRA reanalysis.
- $solar.radiation.sum.mean = Solar radiation. We considered the cumulative solar radiation over each growing season, averaged over the period 2009-2018, from the SAFRAN-SURFEX/ISBA-Crocus-MEPRA reanalysis.
- $mean_MO_2mm = Soil organic matter (SOM)
- $mean_pH = Soil pH
- $RU_mm_10cm = We obtained topsoil water retention capacity (TWRC) using the coarse element content and the water retention capacities, estimated by pedotransfer functions, at field capacity and the wilting point. NA in this column corresponds to data not collected (due to missing Coarse elements content).
- $Pierrosite_pourc_10cm = The weight content (%) of coarse elements (Coarse elements content) in the topsoil (0-10cm) was obtained after air-drying, sieving to 2mm and weighing the fraction > 2mm (coarse fragments). NA in this column corresponds to data not collected.
- $tot.nb.contacts = To measure the density of the herbaceous stratum, we calculated the total number of contacts for each plot (Veg. density), from 300 pin-point positions per plot.
- $CWM.SLAm = community weighted mean of specific leaf area (CWM SLA), from species-level mean traits from our measurements, which covered over 85% of the recorded species
- $CWM.Height = community weighted mean of plant height (CWM Height), from species-level mean traits from our measurements, which covered over 85% of the recorded species
- $CWM.LANDOLT_MOIST = community weighted mean of the Landolt moisture value of the herbaceous stratum (CWM Moisture).
- $surf.terr.small, $surf.terr.big = We characterized forest density by calculating the sum of the basal area per square meter of small (diameter < 7.5cm) and of large trees (Surf. of small trees and Surf. of large trees, respectively).
- $Gini_small = We characterized the heterogeneity of the forest structure by applying the Gini coefficient to all tree trunks’ diameter classes combined (Forest struct. heterogeneity)
- $eauvive = proximity of watercourses (Prox. of waterc.), which represents the area covered by watercourses within a 100m radius, using the BD TOPAGE "cours d'eau" shapefile (https://geo.data.gouv.fr/fr/datasets/665ec762431a3aa7d8c34f9ca058c1e2792dcb7e).
- $Shannon_landscape = indicator of landscape heterogeneity (Landcover heterogeneity) = Shannon entropy diversity index of the relative proportion of landscape elements from OSO land cover map at 10m, in a 100m radius around each plot
- $Shannon_clust30m, $Shannon_clust200m = We collected aerial images from the IGN BDOrtho database (https://geoservices.ign.fr/bdortho), which provides color (RGB) and near-infrared (IR) reflectance at 20cm resolution. For each plot, we extracted images at 30m and at 100m to cover the heterogeneity within plots and in their vicinity, respectively. The goal was to identify spectral clusters representing visual patterns in the images that are consistently observed across all plot landscapes. For each plot, we then computed the relative cover of different spectral clusters and derived spectral heterogeneity using the Shannon index within and around the plot (Plot spectral heterogeneity and Plot vici. spectral heterogeneity, respectively).
For more details see the manuscript methods.
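Several of the predictors above are defined by short formulas. The sketch below illustrates four of them in Python (the authors' own code is in R) on toy inputs. It rests on assumptions that the manuscript methods should confirm: the standard deviation is the sample standard deviation (as R's sd() computes), the three coldest months need not be consecutive, the Shannon index uses the natural logarithm, and the Gini coefficient is computed as the mean absolute difference divided by twice the mean.

```python
import math
from statistics import mean, stdev


def temperature_seasonality(monthly_means):
    """Standard deviation of monthly mean temperatures, times 100."""
    # Assumption: sample standard deviation (n - 1 denominator).
    return stdev(monthly_means) * 100


def cold_quarter_mean(monthly_means):
    """Mean of the three coldest monthly mean temperatures."""
    # Assumption: the three coldest months need not be consecutive.
    return mean(sorted(monthly_means)[:3])


def shannon_index(covers):
    """Shannon entropy (natural log) of relative cover proportions."""
    total = sum(covers)
    proportions = [c / total for c in covers if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def gini(values):
    """Gini coefficient: mean absolute difference over twice the mean."""
    n = len(values)
    pairwise = sum(abs(a - b) for a in values for b in values)
    return pairwise / (2 * n * n * mean(values))


# Toy inputs (hypothetical values, not taken from the dataset).
monthly = [-4.0, -3.0, 0.0, 3.0, 7.0, 11.0, 13.0, 12.0, 9.0, 4.0, 0.0, -3.0]
print(temperature_seasonality(monthly))
print(cold_quarter_mean(monthly))
print(shannon_index([40, 30, 20, 10]))  # e.g. cover of four land-cover classes
print(gini([5, 5, 10, 20, 40]))         # e.g. trunk diameters
```

A Shannon index of 0 means a single class covers the whole area, and a Gini coefficient of 0 means all trunks have the same diameter.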
- CO1c_genus_trophic_class_corrections.csv, genus-level corrections of the trophic assignment of the 31 modeled genera. For more details see the manuscript methods and supplementary materials, Appendix 5. In this table, when the trophic class assigned at the family level was already adequate, NA appears in the genus-level trophic class correction columns.
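The NA convention in the corrections table amounts to a fallback rule: use the genus-level correction when one is given, otherwise keep the class assigned at the family level. A minimal Python sketch of that rule follows. The field names, genus names, and trophic class labels are hypothetical placeholders, not the actual column names or values of the CSV file.

```python
def corrected_trophic_class(family_level_class, genus_level_correction):
    """Return the genus-level correction if present, else the family-level class."""
    if genus_level_correction is None:  # None mirrors NA in the corrections table
        return family_level_class
    return genus_level_correction


# Hypothetical records: field names and values are placeholders.
genera = [
    {"genus": "GenusX", "family_class": "herbivore", "correction": None},
    {"genus": "GenusY", "family_class": "predator", "correction": "omnivore"},
]

final_classes = {
    g["genus"]: corrected_trophic_class(g["family_class"], g["correction"])
    for g in genera
}
print(final_classes)
```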
Data used in the supplementary analyses:
- arthro_sampling_dates.csv, table of class data.frame of the precise dates of sampling
- $codeplot = name of the plot
- $Sampling_date = sampling date in format DD/MM/YYYY
- $Julian_day = sampling date in Julian day
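The link between $Sampling_date and $Julian_day can be sketched as follows, assuming that "Julian day" means the day of the year (1 to 365 or 366), as is common in ecological datasets. The dates in the example are hypothetical.

```python
from datetime import datetime


def julian_day(date_string):
    """Day of the year for a date written as DD/MM/YYYY."""
    return datetime.strptime(date_string, "%d/%m/%Y").timetuple().tm_yday


# Hypothetical dates, not taken from arthro_sampling_dates.csv.
print(julian_day("01/01/2019"))
print(julian_day("15/07/2019"))
```

Parsing with an explicit "%d/%m/%Y" format avoids the day/month swap that can occur when dates such as 03/07/2019 are read with a default, locale-dependent parser.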
- environmental_data_i.rds, a data frame produced by script 2_Prepare_env_data_for_models.R, with missing environmental data imputed, provided so that the supplementary analyses can be run directly.
Sharing/Access information
All data of ORCHAMP projects are available directly or on demand via https://orchamp.osug.fr/
Code/Software
We performed all analyses in the manuscript using R version 4.2.2 via the scripts we provide here.
The scripts should be run in numerical order, from 1 to 8. Users should set the object "root" to their working directory.
All paths are defined at the beginning of each script and are relative. They can be changed to suit user preferences, or the scripts can be run in a directory with the standard folder architecture (1_scripts/, 2_data/, 3_results/, and 4_figures/).
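As a convenience, the folder architecture named above can be created before unpacking the archive. The sketch below does this in Python under a chosen root directory and also shows how scripts can be ordered by their numeric prefix. The helper names are hypothetical, and which files belong in which folder is an assumption: check the paths at the top of each script.

```python
import tempfile
from pathlib import Path

FOLDERS = ["1_scripts", "2_data", "3_results", "4_figures"]


def make_project_tree(root):
    """Create the four expected folders under root and return their names."""
    root = Path(root)
    for name in FOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def run_order(script_names):
    """Order scripts by numeric prefix, leaving out the sourced 0_ helper file."""
    runnable = [s for s in script_names if not s.startswith("0_")]
    return sorted(runnable, key=lambda s: int(s.split("_", 1)[0]))


# Demonstration in a temporary directory; replace with your own root.
with tempfile.TemporaryDirectory() as tmp:
    created = make_project_tree(tmp)
print(created)

ordered = run_order(["3_Random_forests_genera.R", "0_FUNCTIONS.R", "5_Figures_RF.R"])
print(ordered)
```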
Scripts used in the main analyses:
- 0_FUNCTIONS.R, script containing all necessary self-made functions, sourced at the beginning of each of the other scripts
- 1_Prepare_mblists_for_models_2.R & 2_Prepare_env_data_for_models.R, preparation scripts
- 3_Random_forests_genera.R, random forests
- 4_Synthese_RF_on_modeled_genus.R, preparation for figures
- 5_Figures_RF.R, main figures of the paper
Scripts used in the supplementary analyses:
- 6_OTU_and_community_level_analyses_aphadiv.R, supplementary analysis of alpha diversity
- 7_OTU_and_community_level_analyses_aphadiv_Figures.R, figures for the supplementary analysis of alpha diversity
- 8_OTU_and_community_level_analyses_NMDS_2.R, supplementary analysis of beta diversity, including the corresponding figure
