
Ground truth data used to train the synapse classifier used in Lillvis et al., 2022 for ExLLSM circuit reconstruction

Cite this dataset

Lillvis, Joshua (2022). Ground truth data used to train the synapse classifier used in Lillvis et al., 2022 for ExLLSM circuit reconstruction [Dataset]. Dryad.


Brain function is mediated by the physiological coordination of a vast, intricately connected network of molecular and cellular components. The physiological properties of network components can be quantified with high throughput; the ability to assess many animals per study has been key to relating physiological properties to behavior. Conversely, detailed anatomical properties (e.g., the synaptic connectivity of molecularly-defined cell types across an entire circuit) are presently quantifiable only with low throughput; thus we know very little about how network structure, and structural variation, influences behavior. For neuroanatomical reconstruction there is a methodological gulf between electron-microscopic (EM) methods, which yield dense connectomes (but at great expense and low throughput) and light-microscopic methods, which provide molecular and cell-type specificity with high throughput (but without synaptic resolution). We developed a high-throughput analysis pipeline and imaging protocol using tissue expansion and light sheet microscopy (ExLLSM) to rapidly reconstruct selected circuits across many animals with single-synapse resolution and molecular contrast. Using Drosophila to validate this approach, we demonstrate that it yields synaptic counts similar to those obtained by EM, enables synaptic connectivity to be compared across sex and experience, and can be used to correlate structural connectivity, functional connectivity, and behavior. This approach fills a critical methodological gap in studying variability in the structure and function of neural circuits across individuals within and between species.

Here, we share the data used to train the synapse classifier used in the analysis pipeline. All additional software, code, and usage examples for training and running the classifier can be found on GitHub.


Automatic synapse classification

Presynaptic sites can be identified as clusters of BRP proteins (Ehmann et al., 2017). Using 8X ExLLSM and labeling BRP with the nc82 antibody (Wagh et al., 2006) or the STaR-BRP reporter (Chen et al., 2014b), discrete clusters of fluorescent antibodies were present that, as expected (Schneider-Mizell et al., 2016), varied significantly in shape and size across the Drosophila brain (Figure 1H-K). We tested ilastik (Sommer et al., 2011), a 3D VGG-shaped neural network (Simonyan and Zisserman, 2014), and a 3D U-Net-shaped neural network (Çiçek et al., 2016) for segmenting these heterogeneous structures from non-specific antibody labels and background signals. On our data, the neural networks performed better than ilastik and similarly to each other, and the U-Net was faster than the VGG. Therefore, we elected to train a U-Net convolutional neural network to automatically classify presynaptic sites.

To generate ground truth data for training the U-Net, we made 100x100x100 and 500x500x500 pixel crops of BRP staining (as labeled using the nc82 antibody) using the Fiji N5 Viewer. We considered clusters of three or more BRP labels in close proximity that fell along a common plane to be presynaptic sites. We semi-automatically segmented these presynaptic sites from non-specific antibody labels and background signals using VVD Viewer. This semi-automatic segmentation was accomplished similarly to semi-automatic neuron segmentation: the VVD Viewer Component Analyzer tool was used to extract signal from background followed by manual inspection of each potential presynaptic site. In total, we segmented over 10,000 presynaptic sites in image crops from 25 different brains. Crops were made from the optic lobe, mushroom body, lateral horn, central complex, antennal lobe, and protocerebrum.

We used these raw image data crops and manually segmented presynaptic sites to train the U-Net for 3000 epochs until the loss, accuracy, and error rates plateaued. The entire synapse classification and assignment pipeline includes a post-U-Net processing workflow. This post-U-Net workflow includes a watershed segmentation step to segment individual synaptic sites and a size filter to remove connected components below a given size threshold. For presynaptic sites labeled by nc82 or STaR-BRP, objects smaller than 400 pixels were removed.
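The size-filter step described above can be illustrated with a minimal sketch. It is not the pipeline's own code: the function name `remove_small_objects`, the use of `scipy.ndimage.label`, and its default face connectivity are assumptions made for illustration, and the watershed step is omitted. Only the 400-pixel threshold comes from the text.

```python
import numpy as np
from scipy import ndimage


def remove_small_objects(mask, min_voxels=400):
    """Drop connected components with fewer than `min_voxels` voxels.

    `mask` is a boolean 3D array (e.g. a thresholded U-Net output).
    Connectivity is scipy's default (face-connected), which is an
    assumption; the source does not state the connectivity used.
    """
    labels, _ = ndimage.label(mask)
    sizes = np.bincount(labels.ravel())  # sizes[0] counts background voxels
    keep = sizes >= min_voxels           # objects smaller than the threshold go
    keep[0] = False                      # never keep the background label
    return keep[labels]


# Toy volume: one 8x8x8 object (512 voxels) and one 3x3x3 object (27 voxels).
toy = np.zeros((20, 20, 20), dtype=bool)
toy[1:9, 1:9, 1:9] = True
toy[12:15, 12:15, 12:15] = True
filtered = remove_small_objects(toy, min_voxels=400)
```

In this toy volume only the larger object survives the filter.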

We evaluated the results of this synapse detection pipeline (including post-U-Net watershed segmentation and 400 pixel size thresholding) by running it on data crops of BRP labeled by nc82 from the optic lobe, protocerebrum, and lateral horn of three brain samples that were not included in the training. We compared these results to the manually segmented ground truth data (2300 presynaptic sites) of these image volumes. The final synapse detection pipeline had an average precision of 94% and recall of 88% (Fig. 1L-Q).
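For reference, precision and recall follow from counts of true positives, false positives, and false negatives. The sketch below shows only that arithmetic. How detected sites were matched to ground truth sites is not described here, so the counts are plain inputs, and the numbers in the example are toy values, not results from the dataset.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from object-level counts.

    precision = TP / (TP + FP): fraction of detected sites that are real.
    recall    = TP / (TP + FN): fraction of real sites that were detected.
    """
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Toy counts for illustration only (not taken from the dataset).
precision, recall = precision_recall(
    true_positives=8, false_positives=2, false_negatives=2
)
```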

Here, we include the ground truth data used to train the model. On GitHub, we include the trained model used for classifying synaptic sites, code and instructions to train the classifier, and code and instructions to calculate the performance of the classifier. These components can be run locally or on a compute cluster, and can be run independently or as part of several common-use workflows described there.

Usage notes

Any software that can open .nrrd files (e.g., Fiji or VVDViewer) can be used to open the data files.
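For a quick check of a file's dimensions and encoding without a viewer, the plain-text NRRD header can be read directly. The sketch below is a minimal illustration using only the Python standard library. The function name `read_nrrd_header` and the toy file are hypothetical, and the sketch does not decode the image data, which is better left to Fiji, VVDViewer, or a dedicated NRRD reader.

```python
import os
import tempfile


def read_nrrd_header(path):
    """Return (magic, fields) parsed from the text header of an NRRD file."""
    fields = {}
    with open(path, "rb") as fh:
        magic = fh.readline().decode("ascii", errors="replace").strip()
        if not magic.startswith("NRRD"):
            raise ValueError("not an NRRD file: " + path)
        for raw in fh:
            line = raw.decode("ascii", errors="replace").strip()
            if not line:              # a blank line ends the header
                break
            if line.startswith("#"):  # comment line
                continue
            if ":=" in line:          # key/value pair ("key:=value")
                key, value = line.split(":=", 1)
            elif ": " in line:        # standard field ("field: value")
                key, value = line.split(": ", 1)
            else:
                continue
            fields[key.strip()] = value.strip()
    return magic, fields


# Toy file written only to exercise the parser (not part of the dataset).
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.nrrd")
    with open(toy_path, "wb") as fh:
        fh.write(b"NRRD0004\n# toy header\ntype: uint8\ndimension: 3\n"
                 b"sizes: 2 2 2\nencoding: raw\n\n" + bytes(8))
    magic, fields = read_nrrd_header(toy_path)
```

The `sizes` and `encoding` fields are usually enough to confirm that a crop has the expected shape before loading it into a viewer.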


Howard Hughes Medical Institute