# Auburn Soybean Disease Image Dataset (ASDID)

This dataset contains 2D photographs of diseased soybean leaves captured during the 2020 and 2021 Alabama soybean seasons using a Canon EOS 7D Mark II Digital SLR Camera and a Motorola Moto Z2 Play Smartphone. Images were taken in fields at the EV Smith Agricultural Research Station (Tallassee, Alabama), the Cullars Rotation (Auburn, Alabama), and the Brewton Agricultural Research Unit (Brewton, Alabama). Across both seasons, a total of 9,981 original images were collected in eight disease/deficiency categories: (1) healthy-looking plants, and plants displaying the symptoms of (2) bacterial blight, (3) cercospora leaf blight, (4) downy mildew, (5) frogeye leaf spot, (6) soybean rust, (7) target spot, and (8) potassium deficiency. For each category, leaves were either photographed at various canopy heights while still attached to the plant in the field, or detached from the plant and immediately photographed while laid flat on the ground in trimmed grass or on a white surface.

Images were collected with the goal of developing a Convolutional Neural Network (CNN)-based automated classifier of digital images of soybean diseases. Please see our article in Computers and Electronics in Agriculture (COMPAG), entitled ‘Soybean Disease Identification Using Original Field Images and Transfer Learning with Convolutional Neural Networks’, for our original implementation of model building and analysis using this image dataset. Note that 9,594 of the images were used for our analysis, and these images are clearly denoted in the file structure (the dataset also includes additional unused images). We experimented with a variety of CNN model building approaches, including transfer learning, data engineering, and data augmentation. Our best performing model was based on the DenseNet201 base architecture.
It achieved an overall testing accuracy of 96.8% when trained from scratch. Exploring full or partial freezing of core weights did not improve DenseNet201 model performance. Directly manipulating the diversity of subject backgrounds in the digital images also did not improve performance. Increasing representational parity across disease classes via data augmentation did provide a substantial performance boost.

## Description of the data and file structure

All images are contained in zipped folders, each named after its disease category (e.g. bacterial_blight.zip or target_spot.zip). There are eight original zipped folders, one for each disease category used in our analysis. Each image within a disease folder is named with the respective disease and a corresponding number (e.g. bacterial_blight_1.jpg or target_spot_15.jpg). All images are .jpg files in horizontal orientation. There are 9,648 original images that were used in the analysis. The used image totals are 484 for bacterial blight, 1,598 for cercospora leaf blight, 652 for downy mildew, 1,540 for frogeye leaf spot, 1,632 for healthy (asymptomatic), 1,034 for potassium deficiency, 1,627 for soybean rust, and 1,081 for target spot. There are also three additional zipped folders containing 387 images that were not used in our analysis (unused_cercospora_leaf_blight.zip, unused_healthy.zip, unused_soybean_rust.zip): 114 additional cercospora leaf blight images, 118 additional healthy images, and 155 additional soybean rust images.

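
The folder-per-category layout described above lends itself to a simple labeled index. The following is a minimal sketch, not part of our original analysis code: it assumes the archives have been downloaded into a single local directory, takes the class label from each archive's file name, and lists the .jpg members without extracting them. The function names `index_archive` and `index_dataset` are hypothetical, and the internal folder layout of each archive is not assumed.

```python
from pathlib import Path
from zipfile import ZipFile

# Assumption: images carry a .jpg (or .jpeg) extension, in any letter case.
IMAGE_SUFFIXES = {".jpg", ".jpeg"}


def index_archive(zip_path):
    """Return (label, sorted image member names) for one category archive."""
    zip_path = Path(zip_path)
    label = zip_path.stem  # e.g. "target_spot" from "target_spot.zip"
    with ZipFile(zip_path) as archive:
        members = [
            name
            for name in archive.namelist()
            if Path(name).suffix.lower() in IMAGE_SUFFIXES
            # Skip hidden files such as macOS "._*" resource forks.
            and not Path(name).name.startswith(".")
        ]
    return label, sorted(members)


def index_dataset(root, include_unused=False):
    """Map each category label to the image members of its archive.

    Archives whose names start with "unused_" are skipped unless
    include_unused is True.
    """
    index = {}
    for zip_path in sorted(Path(root).glob("*.zip")):
        label, members = index_archive(zip_path)
        if label.startswith("unused_") and not include_unused:
            continue
        index[label] = members
    return index
```

Calling `index_dataset("path/to/downloads")` returns a dictionary keyed by category, and the length of each list can be compared against the per-category totals given above.
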
## File list

- bacterial_blight.zip: 484 images
- cercospora_leaf_blight.zip: 1598 images
- downey_mildew.zip: 652 images
- frogeye.zip: 1540 images
- healthy_disease.zip: 1632 images
- potassium_deficiency.zip: 1034 images
- soybean_rust.zip: 1627 images
- target_spot.zip: 1081 images
- unused_cercospora_leaf_blight.zip: 114 images
- unused_healthy.zip: 118 images
- unused_soybean_rust.zip: 155 images

## Sharing/access information

Please cite our article published in Computers and Electronics in Agriculture (COMPAG), entitled ‘Soybean Disease Identification Using Original Field Images and Transfer Learning with Convolutional Neural Networks’.

DOI: 10.1016/j.compag.2022.107449