Segmentation, the computation of object boundaries, is one of the most important steps in intermediate visual processing. Previous studies have reported cells across visual cortex that are modulated by segmentation features, but the functional role of these cells remains unclear. First, it is unclear whether these cells encode segmentation consistently since most studies used only a limited variety of stimulus types. Second, it is unclear whether these cells are organized into specialized modules or instead randomly scattered across the visual cortex: the former would lend credence to a functional role for putative segmentation cells. Here, we used fMRI-guided electrophysiology to systematically characterize the consistency and spatial organization of segmentation-encoding cells across the visual cortex. Using fMRI, we identified a set of patches in V2, V3, V3A, V4, and V4A that were more active for stimuli containing figures compared to ground, regardless of whether figures were defined by texture, motion, luminance, or disparity. We targeted these patches for single-unit recordings and found that cells inside segmentation patches were tuned to both figure-ground and borders more consistently across types of stimuli than cells in the visual cortex outside the patches. Remarkably, we found clusters of cells inside segmentation patches that showed the same border-ownership preference across all stimulus types. Finally, using a population decoding approach, we found that segmentation could be decoded with higher accuracy from segmentation patches than from either color-selective or control regions. Overall, our results suggest that segmentation signals are preferentially encoded in spatially discrete patches.
The dataset uploaded here contains spike rasters for all cells and stimuli analyzed in the paper, along with information about the recording locations, the animals each cell was recorded from, and the different stimuli used. In addition, it contains averaged, baseline-subtracted time courses of fMRI activations. Please read the README.md for more detailed information about each variable.
See the manuscript "Functional modules for visual scene segmentation in macaque visual cortex" for more details.
Excerpt:
All animal procedures used in this study complied with local and NIH guidelines, including the US NIH Guide for the Care and Use of Laboratory Animals. All experiments were performed with the approval of the Caltech Institutional Animal Care and Use Committee (IACUC).
Cylindrical recording chambers (Crist) were implanted using dental acrylic on the left hemisphere of monkey F, the right hemisphere of monkey A, and the right hemisphere of monkey T. Custom grids were printed and inserted into the chambers to record from targets defined by fMRI. Chamber positioning and grid design were planned using the software Planner. Guide tubes were cut and inserted through the grid to extend 2 mm beyond the dura. Single tungsten electrodes (FHC) with 1 MΩ impedance were inserted through the guide tubes and used for recording. An oil hydraulic microdrive (Narishige) was used to advance electrodes through the brain. Neural signals were recorded using an Omniplex system (Plexon). Local field potentials were low-pass filtered at 200 Hz and recorded at 1,000 Hz, while spike data were high-pass filtered at 300 Hz and recorded at 40 kHz. Only well-isolated units were considered for further analysis. During electrophysiology, stimuli were presented on an LCD screen (Acer) subtending 47 degrees of visual angle.
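As an illustration, the acquisition band split could be approximated offline as follows. This is a minimal sketch, assuming a broadband voltage trace `raw` sampled at 40 kHz and MATLAB's Signal Processing Toolbox; the actual filtering was performed in the Omniplex hardware, and the Butterworth filter type and order are assumptions:

```matlab
% Offline approximation of the acquisition band split described above.
% The real filtering was done in Plexon Omniplex hardware; the filter
% type (Butterworth) and order are illustrative assumptions.
fs = 40e3;                                        % broadband sampling rate (Hz)
[bL, aL] = butter(4, 200/(fs/2), 'low');          % LFP band: low-pass at 200 Hz
lfp = resample(filtfilt(bL, aL, raw), 1000, fs);  % store LFP at 1,000 Hz
[bS, aS] = butter(4, 300/(fs/2), 'high');         % spike band: high-pass at 300 Hz
spk = filtfilt(bS, aS, raw);                      % keep spike band at 40 kHz
```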
Monkeys were head-fixed and passively viewed the stimuli presented on a screen in the dark. In the center of the screen, a fixation spot of 0.25-degree diameter was presented, and monkeys received a juice reward for properly maintaining fixation for 3 s. Eye position was monitored using an infrared eye tracking system (ISCAN). Images were presented in random order using custom software. For the main segmentation fMRI localizer, stimuli consisted of either 8 large (10-degree diameter) or 72 small (3-degree diameter) rounded squares that formed a grid covering the entire screen (see Fig. 1 B–E for examples of the large-square version). Squares were interleaved in a checkerboard layout, i.e., filling every second position of a 4×4 grid (for large squares) or a 12×12 grid (for small squares); see Fig. 1. The stimulus set also contained a version of each stimulus in which the locations of squares and empty spaces were swapped by shifting the squares by one square width, so that every position of the visual field was occupied by figure as often as by background. The background control contained no shapes. For both the main (full-field) segmentation localizer (Fig. 1 A–E) and the segmentation retinotopy localizer (SI Appendix, Fig. S1), figure shapes were defined by luminance, texture, motion, or disparity. For the segmentation retinotopy localizer, stimulus shapes were the same as for the standard retinotopy localizer, i.e., wedges for polar angle and rings for eccentricity. For luminance, figures and background were black or white; for texture, figures and background were created from lines of two different orientations at random positions; for motion, figures and background consisted of dots at random positions moving left or right; for disparity, figures and background consisted of random dots viewed through red-cyan goggles that had near or far disparity. For all stimuli, we also showed versions with switched figure and background assignment, e.g., for luminance, we also showed the same stimuli with white figures on a black background. During fMRI experiments, stimuli were presented in a block design. Stimuli of each modality (luminance, texture, motion, and disparity) were presented in different blocks. Moreover, for each modality there was a block of stimuli that contained figures and a separate control block of stimuli that contained only background. Each block was presented for 24 s, and stimuli within each block were presented in pseudorandom order with 500 ms ON time and 0 ms OFF time.
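To make the checkerboard layout concrete, here is a minimal MATLAB sketch of which grid cells contain figure squares. The grid sizes come from the description above; the variable names and the swap convention are illustrative assumptions:

```matlab
% Checkerboard layout of the segmentation localizer: every second cell
% of an N-by-N grid holds a figure square (N = 4 for large 10-degree
% squares, N = 12 for small 3-degree squares).
N = 4;                                    % use 12 for the small-square version
[gx, gy] = meshgrid(1:N, 1:N);
swapped = false;                          % true = squares shifted by one width
isSquare = mod(gx + gy, 2) == double(swapped);
fprintf('%d squares on a %dx%d grid\n', nnz(isSquare), N, N);   % 8 or 72
```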
Stimuli presented during electrophysiology experiments were similar to the main segmentation fMRI localizer in that figures were defined by luminance, texture, motion, or disparity; however, they contained only a single (nonrounded) square whose size and orientation were determined by the receptive field and orientation tuning (see Online analysis). The square position was shifted to give rise to the four conditions described in the Results (square centered on the receptive field, top edge of the square centered on the receptive field, bottom edge of the square centered on the receptive field, and background only, without a square). During electrophysiology, stimuli were presented with 250 ms ON time and 50 ms OFF time. Emission spectra were measured using a PR-650 SpectraScan colorimeter (Photo Research), and for the color experiment colors were adjusted to be equiluminant. For disparity stimuli, random dots were slightly shifted to the left and right, respectively, depending on whether they were on the square or in the background, by a distance of 3% of the square diameter, leading to horizontal disparities on the order of 0.2 degrees.
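For concreteness, the disparity arithmetic works out as follows; the 7-degree square diameter is an illustrative assumption (actual sizes depended on the receptive field), chosen because typical sizes yield disparities near the stated 0.2 degrees:

```matlab
% Disparity arithmetic: dots on the square are offset horizontally by
% 3% of the square diameter relative to dots in the background.
squareDiam = 7;                   % example square diameter in degrees (assumed)
disparity  = 0.03 * squareDiam;   % 3% of diameter -> ~0.21 degrees
fprintf('horizontal disparity: %.2f degrees\n', disparity);
```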
For electrophysiology data, spikes were resorted offline using OfflineSorter (Plexon). Trials in which monkeys broke fixation were discarded (using a 1-degree eccentricity fixation window). Peristimulus time histograms were computed and smoothed with a Gaussian kernel (σ = 100 ms) for plotting. To determine whether a cell was visually responsive, we computed each trial's visual response (average spike count from 50 ms to 250 ms after trial onset) and baseline (average spike count from 50 ms before to 50 ms after trial onset) and performed a paired t test across trials (threshold: p < 0.05). Since mean spike counts are not expected to follow a Gaussian distribution, we also applied an Anscombe transform to the spike counts and repeated the t test, which yielded the same visual-responsiveness results for 99.9% of all cells. To compute modulation indices and determine the consistency of cells' selectivity, we used average spike counts from 50 ms to 250 ms after trial onset. For figure-versus-background modulation indices, we compared responses to stimuli where a figure was centered on the receptive field (i.e., square center) versus background stimuli. Note that for disparity, we defined the center of a far-disparity square on a near-disparity background (i.e., a hole) as background, so that only the center of a near-disparity square on a far-disparity background was labeled as figure. For edge-versus-nonedge modulation indices, we computed the preference for stimuli where a border was centered on the receptive field (i.e., top edge or bottom edge) versus stimuli where no border was in the receptive field (i.e., square center or background).
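Here is a minimal MATLAB sketch of the responsiveness test and a modulation index, assuming per-trial spike-count vectors (respCounts, baseCounts, figCounts, and bgCounts are illustrative names, not the dataset's). The (A − B)/(A + B) index form is a common convention assumed here; the excerpt does not spell out the formula:

```matlab
% Visual responsiveness: paired t test between per-trial response counts
% (50-250 ms after onset) and baseline counts (-50 to +50 ms around onset).
[~, p] = ttest(respCounts, baseCounts);   % respCounts, baseCounts: [nTrials x 1]
isResponsive = p < 0.05;

% Repeat after an Anscombe transform, which stabilizes the variance of
% Poisson-like spike counts.
anscombe = @(x) 2 * sqrt(x + 3/8);
[~, pA] = ttest(anscombe(respCounts), anscombe(baseCounts));

% Figure-versus-background modulation index; the (A-B)/(A+B) form is an
% assumption. figCounts and bgCounts hold per-trial responses to
% square-center and background stimuli, respectively.
rFig = mean(figCounts);  rBg = mean(bgCounts);
mi = (rFig - rBg) / (rFig + rBg);
```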
For decoding, we trained a support vector machine (libsvm package in Matlab (36), with a linear kernel function and otherwise default parameters) to discriminate between the four classes "Figure," "Background," "Top edge," and "Bottom edge" on a single-trial basis, using spike counts from 50 ms to 250 ms after trial onset of each recorded neuron as features. Since neurons were recorded across different sessions, we created a pseudopopulation [similar to Yamane et al. (27); see Discussion for a detailed comparison] with the requirement that for every included neuron at least 40 trials were collected for each stimulus. We varied the number of neurons included for training (X axis in Figs. 7 and 8) and constructed a feature matrix with neurons as columns and total number of trials (40 per stimulus) as rows. We performed twofold cross-validation by randomly splitting all trials into two halves, with one half used for training and the other half used for testing. For each number of neurons to include (value on the X axis in Figs. 7 and 8), we performed 100 iterations of randomly selecting neurons from the pseudopopulation and randomly selecting training and testing trials to obtain a distribution of decoding accuracies. All analysis was performed in Matlab (MathWorks).
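As an illustration, the following MATLAB sketch builds one pseudopopulation feature matrix and runs one cross-validation fold (one of the 100 iterations described above) with the libsvm interface (svmtrain/svmpredict). The data layout and variable names are assumptions, not the dataset's actual format:

```matlab
% Minimal sketch of one decoding iteration. Assumes a cell array
% counts{neuron}{class} holding per-trial spike counts (>= 40 trials per
% class per neuron); this layout is illustrative, not the dataset's.
nTrials  = 40;                       % trials per class per neuron
nClasses = 4;                        % Figure, Background, Top edge, Bottom edge
nNeurons = numel(counts);

% Feature matrix: rows = trials (40 per class), columns = neurons.
% Trials from different neurons are paired at random (pseudopopulation).
X = zeros(nTrials * nClasses, nNeurons);
y = repelem((1:nClasses)', nTrials);
for n = 1:nNeurons
    for c = 1:nClasses
        rows = (c-1)*nTrials + (1:nTrials);
        pick = randperm(numel(counts{n}{c}), nTrials);
        X(rows, n) = counts{n}{c}(pick);
    end
end

% Twofold cross-validation with a linear-kernel SVM (libsvm MEX
% interface; '-t 0' selects the linear kernel, '-q' suppresses output).
order = randperm(size(X, 1));
i1 = order(1:end/2);  i2 = order(end/2+1:end);
model = svmtrain(y(i1), X(i1, :), '-t 0 -q');
[~, acc, ~] = svmpredict(y(i2), X(i2, :), model, '-q');
fprintf('decoding accuracy: %.1f%%\n', acc(1));
```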
The dataset can be loaded into MATLAB using the command load('PNASTsao2023_Dryad.mat'). Various Python packages (e.g., scipy.io.loadmat) can also read this file format.
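For example, the following MATLAB snippet loads the file into a struct and lists the variables it contains (see the README.md for what each variable means):

```matlab
% Load the dataset into a struct and list the variables it contains.
S = load('PNASTsao2023_Dryad.mat');
disp(fieldnames(S));   % variable descriptions are in README.md
```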