A high-throughput multispectral imaging system for museum specimens

Cite this dataset

Chan, Wei-Ping et al. (2022). A high-throughput multispectral imaging system for museum specimens [Dataset]. Dryad.


We present an economical imaging system with integrated hardware and software to capture multispectral images of Lepidoptera with high efficiency. This method facilitates the comparison of colors and shapes among species at fine and broad taxonomic scales and may be adapted for other insect orders with greater three-dimensionality. Our system can image both the dorsal and ventral sides of pinned specimens. Together with our processing pipeline, the descriptive data can be used to systematically investigate multispectral colors and shapes based on full-wing reconstruction and a universally applicable ground plan that objectively quantifies wing patterns for species with different wing shapes (including tails) and venation systems. Basic morphological measurements, such as body length, thorax width, and antenna size, are generated automatically. This system can exponentially increase the amount and quality of trait data extracted from museum specimens.


Processed data

These data include but are not limited to all parameters generated during image processing, gridded multispectral reflectance, wing shapes, and the measurements of body size and antennae. The detailed data structure can be found on the GitHub repository.

 Map of archived materials, protocols, and tutorials

To prevent potential conflicts, scripts intended for the cluster and scripts intended for local machines are provided in separate online protocols and in separate repositories on GitHub. The online protocols and source code are summarized below. [Protocol] marks a step that has corresponding step-by-step instructions online; [Cluster] marks a script that runs better on the cluster; [Local] marks a script designed for local machines, with relatively low CPU and memory demands.

     Raw data: files described in the following format [[Folder/File name]]: descriptions

       [[Methodology_imaging_records.csv]]: A file recording image names and the barcode of imaged specimens

       [[Drawer_img_nef]]: Drawer images in RAW (*.NEF) format (total 35 images)

o   Five sets of images: Method_1-1_dorsal, Method_1-1_ventral, Method_1-2_dorsal, Method_1-2_ventral, Method_1-r_ventral (with a scale bar placed upside-down)

       [[Drawer_img_tiff]]: Drawer images in linearized 16-bit (*.tiff) format (total 35 images)

       [[manual_bounding_box_par]]: Manually corrected bounding boxes

       [[spp_img_inspection]]: Specimen images for visualization (*.jpg)

o   [[Problematic]]: Problematic images that need to be corrected manually

       [[spp_img_reMask_tiff_done]]: Specimen images (*.tiff) after the mask correction

       [[spp_first_level_product]]: The initial descriptive data or ‘first-level products’ (*AllBandsMask.mat). See Methods for the detailed data structure

       [[spp_RGB_Imgs]]: Images used for manual fore- and hindwing segmentation

o   [[Seg_done]]: Images for which segmentation is complete (*.jpg)

o   [[Segmented]]: The fore- and hindwing segmentation parameters (*.json)

       [[spp_segmentation_analysis]]: Segmented images after inspection and manual correction

o   [[wing_segmentation_img]]: The visualizations of image segmentation (*.jpg)

o   [[wing_shape_morph-seg]]: The results of image segmentation (*morph-seg.mat)

o   [[morphology_analysis_spp_preference_table_template.csv]]: A table generated according to the images in the ‘wing_segmentation_img’ folder, which is later used for inspection

o   [[morphology_analysis_spp_preference_table.csv]]: The result after manual inspection, which records the condition of different body parts of a specimen

o   [[reflectance_table]]: The reflectance data for all body parts of all specimens

       [[spp_wing_grids_generation]]: Generated wing grids and processed data

o   [[inspect_imgs]]: The visualization (*.jpg) of wing grids (no correction was needed in these results)

o   [[spp_wing_parameters]]: Processed wing data. The original folder name is kept here.

o   [[wing_matrix_visualization]]: The summarized multispectral reflectance (NIR [740], fNIR [940], F, FinRGB, PolDiff, UV, UVF, white, whitePol1, whitePol2) according to wing grids.

       [[spp_second-level_product]]: The processed ‘second-level products’ (*_d-v_gridsPars*.mat). See Methods for the detailed data structure

       [[group_summary]]: The summary statistics for specified groups

o   [[specimen_groups.csv]]: A table specifying groups

o   [[specimen_groups_group_barcode_list.json]]: The group table in JSON format

o   [[summary_matrices]]: The summary results according to the group table (*._summary.mat)

o   [[summary_visualization]]: The summary visualization for each group (*.png)

o   [[shp_tail_adv_vis]]: Replot wing shape and tails by scripts for advanced visualization (*.png)

o   [[tail_summary_visualization]]: Replot tails by scripts for advanced visualization (*.png)
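
As a rough illustration of how the file-name patterns listed above might be used to take stock of a downloaded copy of the archive, the following Python sketch walks a folder and sorts *.mat files into first-level products, segmentation results, and second-level products. The three patterns are quoted from the map above; the category labels, the function name `inventory`, and the idea of scanning one local root folder are assumptions made for this sketch and are not part of the published pipeline.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# Patterns are quoted from the archive map; the labels on the left are
# hypothetical names chosen for this sketch. Order matters: the first
# matching pattern wins.
PATTERNS = {
    "second_level_product": "*_d-v_gridsPars*.mat",
    "first_level_product": "*AllBandsMask.mat",
    "wing_segmentation": "*morph-seg.mat",
}


def inventory(root):
    """Return {label: [Path, ...]} for files under root matching a pattern."""
    found = {label: [] for label in PATTERNS}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        for label, pattern in PATTERNS.items():
            # Case-sensitive match on the file name only, not the folders.
            if fnmatchcase(path.name, pattern):
                found[label].append(path)
                break
    return found


# Example use (the folder name is a placeholder for a local download):
# for label, paths in inventory("path/to/archive").items():
#     print(label, len(paths))
```

Files that match none of the patterns (images, tables, JSON files) are simply ignored, so the counts can be compared against the number of specimens expected in each folder.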


    Blueprints and materials (Fig. 6)



    Bash scripts and shell scripts running on the cluster



    Image preprocessing to derive initial descriptive data for museum archiving



       Inspection and manual correction of specimen bounding box (Fig. 3d)


       Inspection and manual correction of mask for background removal (Fig. 9a)

[Local] commercial painting software, such as Adobe Photoshop


    Data preparation and processing for color and shape quantification


       Body-part segmentation (Fig. 9c panels at right)

manually defined fore-hindwing segmentation data




       Inspection and manual correction of primary landmarks (Fig. 9b)


       Multispectral reflectance at wing-size level (in table format; Fig. 9d)



       Dorsal-ventral side analyses (Fig. 4)



       Inspection and manual correction of secondary landmarks (Fig. 1d)



    Visualization (Fig. 1g-h & Fig. 5)


       Multispectral reflectance at wing-pattern level with wing shape summary


       Advanced visualization for wing shapes and tails (Methods)


Usage notes

The pipeline was developed mainly in MATLAB and R, but the data files (e.g., *.mat, *.json) can also be read in Python or other environments.
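
For readers working in Python, the sketch below shows one possible way to open the two file types. It relies on `json` from the standard library, on `scipy.io.loadmat` for MAT-files saved in the version 5 family of formats, and on `h5py` for MAT-files saved in the HDF5-based version 7.3 format, which `loadmat` cannot read. Which MAT-file version the archived files use is not stated here, so the code checks the text header of each file; the function names `mat_format`, `load_mat`, and `load_json` are hypothetical, and SciPy and h5py are third-party packages assumed to be installed.

```python
import json


def mat_format(path):
    """Guess the MAT-file version from the text at the start of the file."""
    with open(path, "rb") as fh:
        header = fh.read(19)
    if header == b"MATLAB 5.0 MAT-file":
        return "v5"      # also covers files saved with -v6 or -v7
    if header == b"MATLAB 7.3 MAT-file":
        return "v7.3"    # HDF5 container
    return "unknown"


def load_mat(path):
    """Open a MAT-file with a reader suited to its format."""
    if mat_format(path) == "v7.3":
        import h5py  # third-party; imported only when needed
        return h5py.File(path, "r")
    from scipy.io import loadmat  # third-party; imported only when needed
    # squeeze_me drops singleton dimensions; struct_as_record=False exposes
    # MATLAB struct fields as Python attributes.
    return loadmat(path, squeeze_me=True, struct_as_record=False)


def load_json(path):
    """Read a JSON file and return the parsed Python object."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Example use (file names are placeholders):
# groups = load_json("specimen_groups_group_barcode_list.json")
# product = load_mat("some_specimen_AllBandsMask.mat")
```

The variable names stored inside each *.mat file are documented in the Methods and on the GitHub repository, so the returned objects should be inspected against that description rather than assumed.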


Funding

National Science Foundation, Award: PHY-1411123

National Science Foundation, Award: DEB-0447242

National Science Foundation, Award: PHY-1411445

United States Air Force Office of Scientific Research, Award: FA9550-14-1-0389

United States Air Force Office of Scientific Research, Award: FA9550-16-1-0322