Data from: Model-aided climate adaptation for future maize in the U.S.
Data files
Mar 01, 2024 version files 110.69 GB
- f2050.db.zip
- f2100.db.zip
- ideotype_data.tar.zip
- present.db.zip
- README.md
- weatherdata.tar.gz
Abstract
Over the next three decades, rising population and changing dietary preferences are expected to increase food demand by 25-75%. At the same time, the climate is also changing, with potentially drastic impacts on food production. Breeding new crop characteristics and adjusting management practices are critical avenues to mitigate yield loss and sustain yield stability under a changing climate. In this study, we use a mechanistic crop model (MAIZSIM) to identify high-performing trait and management combinations that maximize yield and yield stability for different agroclimate regions in the US under present and future climate conditions. We show that morphological traits such as total leaf area and phenological traits such as grain-filling start time and duration are key properties that impact yield and yield stability; different combinations of these properties can lead to multiple high-performing strategies under present-day climate conditions. We also demonstrate that high performance under present-day climate does not guarantee high performance under future climate. Weakened trade-offs between canopy leaf area and reproductive start time under a warmer future climate led to shifts in high-performing strategies, allowing strategies with higher total leaf area and later grain-filling start time to better buffer yield loss and out-compete strategies with a smaller canopy leaf area and earlier reproduction. These results demonstrate that focused effort is needed to breed plant varieties to buffer yield loss under future climate conditions as these varieties may not currently exist, and showcase how information from process-based models can complement breeding efforts and targeted management to increase agricultural resilience.
README: Data in support of: Model-Aided Climate Adaptation for Future Maize in the U.S.
https://doi.org/10.5061/dryad.8w9ghx3v1
This dataset contains processed meteorological data, databases containing the hourly output from 100-member ensembles of simulated maize growing seasons driven by the meteorological data, and post-processed output used to generate figures. We also archived a snapshot of the code used to generate the figures at the time of publication.
Maize simulation model
We used a process-based crop simulation model, MAIZSIM, to carry out our simulations across site-years in US maize growing regions. MAIZSIM is a mechanistic crop model that simulates key physiological, phenological, and physical processes such as leaf gas-exchange, canopy radiative transfer, carbon partitioning, water relations, nitrogen dynamics, and phenological development at the organ level to explain the whole-plant responses and crop level phenomena (Kim et al., 2012).
Meteorological data
We assembled hourly inputs of temperature, relative humidity, solar radiation, and precipitation throughout the growing season (roughly defined as February 1st to November 30th) as meteorological drivers for our simulations. First, we used data from the United States Department of Agriculture National Agricultural Statistics Service (USDA-NASS, https://www.nass.usda.gov/Data_and_Statistics/index.php) to select rain-fed (less than 25% irrigation) maize cultivation sites with greater than 10,000 acres of maize planted. Next, we identified weather stations near these maize cultivation sites with available hourly data from both the Integrated Surface Hourly Data Base (https://www.ncei.noaa.gov/products/landbased-station/integrated-surface-database) and the National Solar Radiation Data Base (https://nsrdb.nrel.gov/data-sets/archives) over the years 1961-2005. We excluded site-years with more than two consecutive hours of missing data and gap-filled any remaining missing data by linearly interpolating between the data points immediately before and after each gap. Finally, we excluded sites with fewer than 15 years of data to ensure sufficient sampling to assess inter-annual climate variability (Soltani and Hoogenboom, 2003; Van Wart, Grassini, and Cassman, 2013). With this method, we compiled 1160 site-years of meteorological data spanning a total of 60 sites, with 15-27 years of weather data available at each site. See Fig. S2 in the manuscript for the final simulation sites.
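The exclusion and gap-filling rule described above can be sketched as follows. This is a minimal illustration, not the code used to build the dataset: the function names are hypothetical, missing hours are assumed to be represented as `None`, and the handling of gaps at the very start or end of a series (copying the nearest valid value) is an assumption, since the text does not specify it.

```python
from typing import List, Optional


def longest_gap(values: List[Optional[float]]) -> int:
    """Length of the longest run of consecutive missing (None) hours."""
    longest = current = 0
    for v in values:
        current = current + 1 if v is None else 0
        longest = max(longest, current)
    return longest


def fill_gaps(values: List[Optional[float]], max_gap: int = 2):
    """Return a gap-filled copy, or None if the site-year should be excluded.

    A series is excluded when it has more than `max_gap` consecutive
    missing hours; shorter gaps are filled by linear interpolation
    between the valid points on either side.
    """
    if longest_gap(values) > max_gap:
        return None
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # index of the first valid value after the gap (or n)
        left = filled[start - 1] if start > 0 else None
        right = filled[end] if end < n else None
        if left is None and right is None:
            return None  # nothing to interpolate from
        span = end - start + 1
        for k in range(start, end):
            if left is None:        # gap at the start: assumption, copy next value
                filled[k] = right
            elif right is None:     # gap at the end: assumption, copy previous value
                filled[k] = left
            else:
                filled[k] = left + (right - left) * (k - start + 1) / span
    return filled
```

For example, `fill_gaps([10.0, None, None, 16.0])` fills the two missing hours by interpolation, while a series with three consecutive missing hours returns `None` and would be dropped.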
Description of the data and file structure
Simulation output
Simulation output is stored in three SQLite database files, which contain hourly output for all variables over a growing season. The databases are provided in compressed zip files. We include code that queries these databases in our ideotype repository (https://doi.org/10.5281/zenodo.10606505).
○ Simulations using present-day meteorology: present.db.zip
○ Simulations using meteorology adjusted for 2100: f2100.db.zip
○ Simulations using meteorology adjusted for 2050: f2050.db.zip
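As a rough sketch of how one of these archives might be opened, the snippet below extracts a `.db` file from a zip archive and lists the tables it contains using Python's standard `zipfile` and `sqlite3` modules. The function names are hypothetical, and the name of the database file inside each archive is an assumption (the first member ending in `.db` is used). The `demo` function builds a tiny stand-in archive so the sketch can be run without downloading the real files, which are very large.

```python
import sqlite3
import tempfile
import zipfile
from pathlib import Path


def extract_database(zip_path, out_dir):
    """Extract the first '.db' member of the archive and return its path."""
    with zipfile.ZipFile(zip_path) as zf:
        members = [name for name in zf.namelist() if name.endswith(".db")]
        if not members:
            raise FileNotFoundError(f"no .db file found in {zip_path}")
        return Path(zf.extract(members[0], path=out_dir))


def list_tables(db_path):
    """Return the names of all tables in an SQLite database file."""
    conn = sqlite3.connect(str(db_path))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


def demo():
    """Build a tiny stand-in archive and list its tables (illustration only)."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        db = tmp / "demo.db"
        conn = sqlite3.connect(str(db))
        conn.execute(
            "CREATE TABLE params (run_name TEXT, cvar INTEGER, param TEXT, value REAL)"
        )
        conn.commit()
        conn.close()
        archive = tmp / "demo.db.zip"
        with zipfile.ZipFile(archive, "w") as zf:
            zf.write(db, arcname="demo.db")
        extracted = extract_database(archive, tmp / "unzipped")
        return list_tables(extracted)
```

With the real data, `extract_database("present.db.zip", "data")` followed by `list_tables(...)` would be the equivalent calls, assuming enough free disk space for the uncompressed database.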
See below for the database schema:
Parameter Table (params)
Entry Name | Long Description | Unit |
---|---|---|
run_name | run name of simulation experiments, part of primary key | |
cvar | cultivar number that represents specific param combinations | |
param | perturbed parameter | |
value | parameter value | |
Site information table (site_info)
Entry Name | Long Description | Unit |
---|---|---|
site | site key | |
state | state | |
lat | latitude | |
lon | longitude | |
years | years of data | |
area | area of maize grown in the county at this location | acres |
perct_irri | percent of agricultural land irrigated in the county at this location | % |
texture | soil texture classification | |
Weather data table (weadata)
Entry Name | Long Description | Unit |
---|---|---|
year | year | |
site | site key | |
jday | julian day | |
time | hour in 24hr format | |
date | date in MM/DD/YYYY format | |
solar | solar radiation in units of Watt-hours per meter squared | Wh/m2 |
temp | surface temperature in units of Celsius | C |
precip | precipitation in units of millimeters per hour | mm/hr |
rh | relative humidity in units of percent | % |
co2 | carbon dioxide concentration in units of parts per million | ppm |
vpd | vapor pressure deficit in kilopascal | kPa |
Simulations table (sims)
Entry Name | Long Description | Unit |
---|---|---|
cvar | cultivar number | |
year | year | |
site | site key | |
run_name | run category (present, 2050, or 2100) | |
jday | julian day | |
time | hour in 24hr format | |
date | date in MM/DD/YYYY format | |
leaves | leaf number | |
leaves_mature | number of mature leaves | |
leaves_dropped | number of dropped leaves | |
LA_perplant | total leaf area | m2 |
LA_dead | leaf area of dead leaves | m2 |
LAI | total leaf area index | m2 m-2 |
leaf_wp | leaf water potential | MPa |
temp_soil | soil temperature in Celsius | C |
temp_air | air temperature in Celsius | C |
temp_canopy | leaf canopy temperature in Celsius | C |
ET_dmd | evapotranspiration demand | g plant-1 hr-1 |
ET_sply | evapotranspiration supply | g plant-1 hr-1 |
Pn | whole plant net photosynthesis | g C plant-1 hr-1 |
Pg | whole plant gross photosynthesis | g C plant-1 hr-1 |
resp | respiration | g C plant-1 hr-1 |
av_gs | average stomatal conductance | mol m-2 s-1 |
LAI_sun | leaf area index of sunlit leaves | m2 m-2 |
LAI_shade | leaf area index of shaded leaves | m2 m-2 |
PFD_sun | photon flux density for sunlit leaves | |
PFD_shade | photon flux density for shaded leaves | |
An_sun | net photosynthesis of sunlit leaves | umol CO2 m-2 s-1 |
An_shade | net photosynthesis of shaded leaves | umol CO2 m-2 s-1 |
Ag_sun | gross photosynthesis of sunlit leaves | umol CO2 m-2 s-1 |
Ag_shade | gross photosynthesis of shaded leaves | umol CO2 m-2 s-1 |
gs_sun | stomatal conductance of sunlit leaves | mol m-2 s-1 |
gs_shade | stomatal conductance of shaded leaves | mol m-2 s-1 |
VPD | vapor pressure deficit in kilopascal | kPa |
Nitr | total N in plant | mg N plant-1 |
N_Dem | N demand | g N plant-1 |
NUpt | N uptake from soil | g N plant-1 |
LeafN | Leaf N content | g N m-2 |
PCRL | Rate at which carbon would be supplied to growing roots in a soil slab if all potential shoot growth had been satisfied | g day-1 |
DM_total | dry mass total | g plant-1 |
DM_shoot | dry mass of shoot | g plant-1 |
DM_ear | dry mass of ear | g plant-1 |
DM_leaf | dry mass of leaves | g plant-1 |
DM_stem | dry mass of stems | g plant-1 |
DM_root | dry mass of roots | g plant-1 |
AvailW | available water | |
solubleC | longterm C pool | |
Pheno | phenology stage | |
Log Information Table (log_init)
Entry name | Long description |
---|---|
run_name | run name of simulation experiments |
init_yml | init.yaml file used for experiment |
path_inits | path where inits are stored for experiment |
path_params | path where params are stored for experiment |
path_jobs | path where jobs are stored for experiment |
path_sims | path where sims are stored for experiment |
path_maizsim | path pointing to the maizsim directory used |
siteyears | path pointing to the siteyears file used |
site_info | path pointing to the site_info file used |
site_summary | path pointing to site_summary file used |
pdate | planting date set for simulations |
version | ideotype version - git hash |
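As an illustration of how the schema above can be queried, the sketch below uses Python's built-in `sqlite3` module to find the peak ear dry mass (`DM_ear`) for each cultivar, site, and year in the `sims` table. Table and column names follow the schema above; the function names, the column types, the example site key, and the stored `run_name` string (`"present"`) are assumptions, and taking the seasonal maximum of `DM_ear` is only an illustrative summary, not necessarily the yield metric used in the manuscript. The authors' own query code is in the ideotype repository.

```python
import sqlite3

# Peak ear dry mass per cultivar, site, and year for one run category.
PEAK_EAR_QUERY = """
SELECT cvar, site, year, MAX(DM_ear) AS peak_ear
FROM sims
WHERE run_name = ?
GROUP BY cvar, site, year
ORDER BY cvar, site, year
"""


def peak_ear_mass(conn, run_name):
    """Return (cvar, site, year, peak DM_ear in g plant-1) rows."""
    return conn.execute(PEAK_EAR_QUERY, (run_name,)).fetchall()


def demo_connection():
    """In-memory stand-in holding only the columns this query needs.

    The values are made up for illustration; they are not from the dataset.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sims (cvar INTEGER, year INTEGER, site TEXT, "
        "run_name TEXT, jday INTEGER, time INTEGER, DM_ear REAL)"
    )
    rows = [
        (0, 1990, "000001", "present", 200, 12, 50.0),
        (0, 1990, "000001", "present", 250, 12, 120.5),
        (1, 1990, "000001", "present", 250, 12, 98.0),
        (0, 1990, "000001", "f2050", 250, 12, 110.0),
    ]
    conn.executemany("INSERT INTO sims VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return conn
```

Against the real databases, the same function could be called with a connection opened on the extracted database file, for example `sqlite3.connect("present.db")`, assuming that is the name of the extracted file. Because the `sims` table holds hourly output for every ensemble member and site-year, aggregating in SQL rather than loading the full table into memory is likely the more practical approach.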
Post-processed simulation output
ideotype_data.tar.zip
Post-processed data is stored in comma-delimited text files (csv) grouped in folders by data type, packaged in a tarball and compressed. These post-processed files were created using the "ideotype" repository (https://doi.org/10.5281/zenodo.10606505) and are used to make the plots in the manuscript using Python notebooks from the "upscale" repository (https://doi.org/10.5281/zenodo.10601180).
See below for a brief description of the contents of each subfolder:
● climate_cluster: Simulation sites clustered by climate features (temp, vpd, precip, etc.). Used to analyze optimal crop features in different climate regions. Columns are defined as in the Site information table above.
● files: .yml files that include file paths pointing to files required to set up different simulation runs.
● inits: maizsim .yml initial files for different simulation runs.
● logs: maizsim log file from a specific run, used for debugging purposes.
● nass: USDA NASS data on corn planting area, yield, irrigated acreage, etc. Units for corn_yield are bushels per acre; units for corn_area are hectares. Latitude and longitude are in degrees.
● params: Default parameters (params_control.csv), perturbed parameters for present-day climate (params_present.csv), and top 100 optimal parameters (params_opt.csv) used as input in the MAIZSIM model.
● sims: Small subset of maizsim simulation output for testing purpose. Columns are defined as in the Simulations table above.
● sites: Simulation site locations and climate info summary (site_summary.csv), raw weather stations info (stations_info_9110.csv), weather station code mapping between WBAN & USAF (stations_wban_usaf.csv). The site summary file combines station information with area of corn grown in county (acres), area of irrigation in county (acres), and soil texture (see soils texture definitions under soils below).
● siteyears: All site-year combinations for fixed planting date simulation (siteyears_control_fixpd.csv), subset of site-year combinations for testing purpose. The test data includes a column for planting date (pdate).
● soils: SSURGO soil info (soils_nass.csv) and soil texture (soil_nass_texture.csv) for NASS sites. Soil textures are defined as Cl: Clay, SiCl: Silty Clay, SaCl: Sandy Clay, ClLo: Clay Loam, SiClLo: Silty Clay Loam, SaClLo: Sandy Clay Loam, Lo: Loam, SiLo: Silty Loam, SaLo: Sandy Loam, Si: Silt, LoSa: Loamy Sand, Sa: Sand. Columns of the soil info (soils_nass.csv) are defined as follows:
Soils Data Table
column name | Long Description | unit |
---|---|---|
cokey | A non-connotative string of characters used to uniquely identify a record in the Component table. | |
chkey | A non-connotative string of characters used to uniquely identify a record in the Horizon table. | |
prcent | The percentage of the component of the mapunit. | percent |
slope_r | Representative value of the difference in elevation between two points, expressed as a percentage of the distance between those points. | percent |
slope | High value of the difference in elevation between two points, expressed as a percentage of the distance between those points. | percent |
hzname | The concatenated string of four kinds of symbols (five data elements) used to distinguish different kinds of layers in the soil. | |
depth | The distance from the top of the soil to the upper boundary of the soil horizon. | cm |
awc | The amount of water that an increment of soil depth, inclusive of fragments, can store that is available to plants. AWC is expressed as a volume fraction, and is commonly estimated as the difference between the water contents at 1/10 or 1/3 bar (field capacity) and 15 bars (permanent wilting point) tension and adjusted for salinity, and fragments. | cm/cm |
clay | Clay mineral particles less than 0.002mm in equivalent diameter as a weight percentage of the less than 2.0mm fraction. | percent |
silt | Silt mineral particles 0.002 to 0.05mm in equivalent diameter as a weight percentage of the less than 2.0mm fraction. | percent |
sand | Sand particles 0.05 to 2.0mm in equivalent diameter as a weight percentage of the less than 2.0mm fraction. | percent |
OM | The amount by weight of decomposed plant and animal residue expressed as a weight percentage of the less than 2 mm soil material. | percent |
dbthirdbar | The oven dry weight of the less than 2 mm soil material per unit volume of soil at a water tension of 1/3 bar. | g/cm3 |
th33 | The volumetric content of soil water retained at a tension of 1/3 bar (33 kPa, field capacity, saturation), expressed as a percentage of the whole soil (need to divide by 100). | g/cm3 |
th1500 | The volumetric content of soil water retained at a tension of 15 bars (1500 kPa, wilting point), expressed as a percentage of the whole soil. | g/cm3 |
bd | (th33-th1500)/100 | percent |
lat | latitude | degrees |
lon | longitude | degrees |
● strategies_cluster: Parameter values for a subset of strategy clusters. Parameters are used as input in the MAIZSIM model.
● test_data: Small subset of data required to set up a test environment in MAIZSIM.
● wea: Climate summary (temp, rh, precip, etc.) for all site-years. Columns are defined as in the Weather data table above.
Meteorology Data
weatherdata.tar.gz
Meteorology data is stored in individual tab-delimited text files, one per site-year. Each file name consists of the 6-digit site code followed by the year (SITECODE_YEAR.txt). There are three sets of meteorology files: one for present day, and two idealized representations of future climate. The text files are packaged in a tarball and compressed.
○ Control - observed historical meteorology data
○ f2050 - idealized 2050 climate projection with temperature, VPD, and precipitation perturbations following CMIP6 model scaling patterns
○ f2100 - idealized 2100 climate projection with temperature, VPD, and precipitation perturbations following CMIP6 model scaling patterns
Each file represents one site-year and has the following columns:
jday - Julian date from January 1st
date - in MM/DD/YYYY format
hour - in 24h format
solrad - solar radiation in units of Watt-hours per meter squared (Wh/m2)
temp - surface temperature in units of Celsius (C)
precip - precipitation in units of millimeters per hour (mm/hr)
rh - relative humidity in units of percent (%)
co2 - carbon dioxide concentration in units of parts per million (ppm)
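A minimal sketch of reading one of these site-year files with the Python standard library is shown below. It assumes the columns appear in the order listed above and that file names match the SITECODE_YEAR.txt pattern exactly. Whether the files include a header row is not stated here, so that is left as a parameter. The function names and sample values are hypothetical.

```python
import csv
import io
import re

# Column order as listed in this README (assumed to match the files).
COLUMNS = ["jday", "date", "hour", "solrad", "temp", "precip", "rh", "co2"]
NUMERIC = ["solrad", "temp", "precip", "rh", "co2"]
NAME_PATTERN = re.compile(r"^(\d{6})_(\d{4})\.txt$")


def parse_name(filename):
    """Split 'SITECODE_YEAR.txt' into (site code string, year integer)."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return match.group(1), int(match.group(2))


def read_site_year(stream, has_header=False):
    """Read tab-delimited rows into dicts, converting measurements to float.

    jday, date, and hour are kept as strings because their exact
    formatting in the files is not verified here.
    """
    reader = csv.reader(stream, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row if the file has one
    records = []
    for fields in reader:
        if not fields:
            continue  # ignore blank lines
        record = dict(zip(COLUMNS, fields))
        for key in NUMERIC:
            record[key] = float(record[key])
        records.append(record)
    return records


# Made-up sample line, for illustration only.
SAMPLE = "32\t02/01/1990\t0\t0.0\t-1.5\t0.0\t80.0\t400.0\n"
```

For a real file, `read_site_year(open(path, newline=""))` would replace the `io.StringIO(SAMPLE)` stand-in, and `parse_name` recovers the site code and year from the file name. The site code is kept as a string so that any leading zeros are preserved.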
Sharing/Access information
The MAIZSIM model is available on GitHub (https://github.com/USDA-ARS-ACSL/MAIZSIM).
Meteorology data was assembled from two publicly accessible sources: the Integrated Surface Hourly Data Base (https://www.ncei.noaa.gov/products/landbased-station/integrated-surface-database) and the National Solar Radiation Data Base (https://nsrdb.nrel.gov/data-sets/archives).
Code/Software
We have archived two repositories of Python code: one that processes the data, and a second that analyzes the data. Both can be loaded as Python packages. The archived code can be accessed from Zenodo at the following DOIs.
ideotype - this package primarily processes the simulation output (https://doi.org/10.5281/zenodo.10606505).
upscale - this package analyzes output and produces graphs (https://doi.org/10.5281/zenodo.10601180).
Methods
This data was generated using the MAIZSIM model as described in the associated manuscript. The raw model output is stored in a database format and we also include the code that processed the model output into intermediate files which were then analyzed.