Musical pitch interval comparisons in cochlear implants
Data files
Apr 09, 2024 version (269.61 KB total):
- PSE_of_all_subjects.xlsx
- README.md
- StretchedIntervals_0017_UCSF_CIuser.dat
- StretchedIntervals_0018_UCSF_CIuser.dat
- StretchedIntervals_0055_UCSF_CIuser.dat
- StretchedIntervals_0057_UCSF_CIuser.dat
- StretchedIntervals_0070_UCSF_CIuser.dat
- StretchedIntervals_1011_UCSF_CIuser.dat
- StretchedIntervals_1059_UCSF_CIuser.dat
- StretchedIntervals_2011_UCSF_CIuser.dat
- StretchedIntervals_27_UCSF_CIuser.dat
- StretchedIntervals_2980_NHPilot2.dat
- StretchedIntervals_3368_NHPilot2.dat
- StretchedIntervals_3388_vocoderCond_1_2_4.dat
- StretchedIntervals_3661_NHPilot2.dat
- StretchedIntervals_3661_vocoderCond_1_2_4.dat
- StretchedIntervals_3784_vocoderCond_1_2_4.dat
- StretchedIntervals_4148_NHPilot2.dat
- StretchedIntervals_4149_NHPilot2.dat
- StretchedIntervals_4178_NHPilot2.dat
- StretchedIntervals_4182_NHPilot2.dat
- StretchedIntervals_4298_NHPilot2.dat
- StretchedIntervals_4300_NHPilot2.dat
- StretchedIntervals_4301_NHPilot2.dat
- StretchedIntervals_4316_NHPilot2.dat
- StretchedIntervals_4317_NHPilot2.dat
- StretchedIntervals_4319_NHPilot2.dat
- StretchedIntervals_4320_NHPilot2.dat
- StretchedIntervals_4334_NHPilot2.dat
- StretchedIntervals_4334_vocoderCond_1_2_4.dat
- StretchedIntervals_4402_vocoderCond_1_2_4.dat
- StretchedIntervals_4445_vocoderCond_1_2_4.dat
- StretchedIntervals_4449_vocoderCond_1_2_4.dat
- StretchedIntervals_4450_vocoderCond_1_2_4.dat
- StretchedIntervals_4451_vocoderCond_1_2_4.dat
- StretchedIntervals_4462_vocoderCond_1_2_4.dat
- StretchedIntervals_4463_vocoderCond_1_2_4.dat
- StretchedIntervals_4464_vocoderCond_1_2_4.dat
- StretchedIntervals_4467_vocoderCond_1_2_4.dat
- StretchedIntervals_4486_vocoderCond_1_2_4.dat
- StretchedIntervals_4498_vocoderCond_1_2_4.dat
- StretchedIntervals_4504_vocoderCond_1_2_4.dat
- StretchedIntervals_54_UCSF_CIuser.dat
Abstract
Music perception remains challenging for many cochlear implant (CI) recipients, perhaps due in part to the mismatch between the characteristic frequencies at the electrode-neural interface and the frequencies allocated by the clinical programming. Individual differences in ear anatomy, electrode array length, and surgical insertion can lead to great variability in the positions of electrodes within the cochlea, but these differences are not typically accounted for by current CI programming techniques. Flat panel computed tomography (FPCT) can be used to visualize the location of the electrodes and calculate the corresponding spiral ganglion characteristic frequencies. Such FPCT-based CI frequency mapping may improve pitch perception accuracy, and thus music appreciation, as well as speech perception. The present study seeks to develop a behavioral metric of how well place-based pitch is represented across the frequency spectrum. Listeners were asked to match the pitch interval created by two sequentially played tones across different frequency ranges, to estimate the extent to which pitch is evenly distributed across the CI array. The test was piloted with pure tones in normal-hearing listeners, using both unprocessed and vocoder-processed sounds to simulate matched and mismatched frequency-to-place maps. We hypothesized that the vocoded stimuli would be more difficult to match in terms of pitch intervals than unprocessed stimuli, and that a warped map (as may occur with current clinical maps) would produce poorer matches than a veridical and even map (as may be achieved using FPCT-based frequency allocation). Preliminary results suggest that the task can reveal differences between veridical and warped maps in normal-hearing listeners under vocoded conditions. A small cohort of CI recipients performed similarly to listeners in a vocoded condition employing the same pitch map. The next steps will be to test this procedure in CI users, comparing traditional clinical maps with FPCT-based frequency allocation to determine whether the FPCT-based maps result in improved pitch-interval perception.
README: Musical pitch interval comparisons in cochlear implants
https://doi.org/10.5061/dryad.dfn2z359d
Description of the data and file structure
Excel file with summary data (PSE_of_all_subjects.xlsx); each column is described below, and a loading sketch follows the list:
- Participant ID - each subject is given a number
- Test ID - the number the subject used to access the test, which matches their raw data file
- Age - the age in years when the subject was tested
- Music Training - the number of years of formal music training each subject reports having (e.g., 1:1 lessons, school band, etc.)
- Cohort - which of the 3 cohorts each subject belonged to: NH (normal hearing), Voc (vocoder), or CI (cochlear implant)
- Condition - the listening condition
- Freq Range - either Low vs Mid or Mid vs High
- Fixed Semitone Interval - either 4 or 7
- Test Parameters - a field combining the Freq Range and Fixed Semitone Interval fields, used to indicate which subtest was completed
- PSE - point of subjective equality, measured in semitones, a continuous dependent variable
- Filename - the corresponding raw data file name
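As a convenience, here is a minimal MATLAB sketch for loading the summary file. The imported variable names are assumptions (MATLAB typically converts headers such as "Participant ID" to valid identifiers like ParticipantID):

```matlab
% Sketch: load the summary spreadsheet into a MATLAB table.
summary = readtable('PSE_of_all_subjects.xlsx');

% Example: mean PSE per cohort. Requires the Statistics and Machine
% Learning Toolbox; assumes imported column names 'Cohort' and 'PSE'.
grpstats(summary, 'Cohort', 'mean', 'DataVars', 'PSE')
```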
Raw data files (.dat): note that the column header names inside the files do not match the data in each column. The actual contents of each column are described below, followed by a loading sketch.
- Track number: There were four tracks in each trial. Each set of four rows represents one trial.
- LoHiFix: Representing whether the low interval was fixed (0) or the high interval was fixed (1)
- ST: Number of semitones of the fixed interval (either 4 or 7, depending on the subtest)
- Hz: A value representing which frequency range was being tested, either Low vs Mid (150) or Mid vs High (572)
- Oct: A constant indicating the distance, in octaves, between the pitch intervals being tested (always 1.93)
- ST: The dependent variable, in semitones, representing the point of subjective equality
- A value, in semitones, representing the variability of the subject's responses.
- Elapsed time: The duration of each trial in seconds.
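A minimal MATLAB sketch for reading one raw file, assuming the .dat files are plain delimited text (the delimiter and presence of a header row are assumptions; columns are addressed by position because the header names do not match the data):

```matlab
% Sketch: load one raw .dat file and label its columns by position.
raw = readmatrix('StretchedIntervals_27_UCSF_CIuser.dat', 'FileType', 'text');

trackNum   = raw(:, 1);  % track number; each set of four rows is one trial
loHiFix    = raw(:, 2);  % 0 = low interval fixed, 1 = high interval fixed
fixedST    = raw(:, 3);  % size of the fixed interval (4 or 7 ST)
rangeHz    = raw(:, 4);  % 150 = Low vs Mid, 572 = Mid vs High
distOct    = raw(:, 5);  % distance between intervals in octaves (1.93)
pseST      = raw(:, 6);  % point of subjective equality (ST)
varST      = raw(:, 7);  % response variability (ST)
elapsedSec = raw(:, 8);  % trial duration (s)
```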
Sharing/Access information
Data were collected at the University of California, San Francisco, and the University of Minnesota. All subjects gave informed consent under IRB-approved study protocols.
Methods
Subjects
Two primary groups were enrolled in this study: normal hearing (NH) individuals and cochlear implant (CI) recipients. The NH listeners provided baseline data against which the CI recipients could be compared. The CI recipients were included as a pilot to determine whether this approach is feasible for these listeners.
Normal Hearing (NH) Participants
Thirty-one NH individuals were recruited through the University of Minnesota. The group that assessed unprocessed stimuli comprised 15 participants (average age: 22.6 years, SD: ±1.5; gender distribution: 5 males, 10 females) with an average of 8.1 years (SD: ±3.9) of musical experience. The vocoded stimuli group included 16 participants (average age: 28.6 years, SD: ±13.8; gender distribution: 7 males, 9 females), reporting an average of 11.1 years (SD: ±11.1) of musical experience. Testing for both NH groups was completed remotely via an online MATLAB platform, requiring the use of headphones.
Cochlear Implant (CI) Recipients
Nine CI recipients (Table 1, average age: 57.4 years, SD: ±13.2; gender distribution: 6 males, 3 females) were recruited through UCSF. This group consisted of one bilateral and eight unilateral CI users, all equipped with MED-EL CIs and using their clinical everyday listening programs. Their reported musical experience averaged 11.3 years (SD: ±12.3). Similar to the NH group, the CI cohort completed the task via an online MATLAB platform. CI recipients were instructed to choose the transducer that they regularly use with success at home; this could have included sound field speakers, headphones, or streaming, with care taken to isolate the test ear.
Pitch Interval Assessment Procedure
For these experiments, we focused on pitch interval comparisons across a frequency range utilized by contemporary CI processors. This frequency range was divided into three regions, assuming a logarithmic distribution of frequencies, resulting in low (root note 150 Hz, interval range 126-505 Hz), mid (root note 572 Hz, interval range 480-1924 Hz), and high (root note 2181 Hz, interval range 1833-7314 Hz) categories.
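The three root notes are approximately equally spaced on a logarithmic frequency axis, as this one-line MATLAB check illustrates:

```matlab
% Root notes are separated by a constant ratio (~3.81) on a log-frequency axis:
roots = 150 * (2181/150).^((0:2)/2)   % returns approximately [150 572 2181] Hz
```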
NH participants completed the pitch interval assessment with pure tones across both frequency test ranges (Low vs Mid, Mid vs High), whereas CI recipients and the NH subjects tested with vocoded stimuli completed only the Mid vs High comparison, due to the vocoded conditions' limitations at low frequencies and to time constraints.
The task involved comparing two pitch intervals, presented in succession, in two different frequency regions. Participants identified the larger of the two intervals in a forced-choice paradigm (Zarate et al., 2012; McDermott et al., 2010).
A single pitch interval consisted of a 3-tone melody in a low-high-low sequence, where the first and last notes were the same (e.g., C4 - G4 - C4). The melody's root note was roved within a half-octave range. Each note was a pure tone of 300 ms, including 30-ms onset and 50-ms offset raised-cosine ramps. The notes within each 3-note sequence were separated by 150-ms gaps.
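As an illustration, here is a minimal MATLAB sketch of how one such stimulus could be synthesized from the parameters above (the sampling rate and the exact rove implementation are assumptions, not the authors' code):

```matlab
% Sketch: synthesize one low-high-low pitch-interval melody.
fs     = 44100;                            % sampling rate (assumption)
rootHz = 572 * 2^((rand - 0.5)/2);         % root roved within a half-octave range
intST  = 7;                                % interval size in semitones
freqs  = rootHz * 2.^([0 intST 0]/12);     % low-high-low note frequencies

t   = (0:round(0.300*fs)-1)/fs;            % 300-ms tones
gap = zeros(1, round(0.150*fs));           % 150-ms inter-note gaps

% Raised-cosine ramps: 30-ms onset, 50-ms offset
nOn  = round(0.030*fs);  onRamp  = 0.5*(1 - cos(pi*(0:nOn-1)/nOn));
nOff = round(0.050*fs);  offRamp = 0.5*(1 + cos(pi*(0:nOff-1)/nOff));

melody = [];
for f = freqs
    tone = sin(2*pi*f*t);
    tone(1:nOn)          = tone(1:nOn) .* onRamp;
    tone(end-nOff+1:end) = tone(end-nOff+1:end) .* offRamp;
    melody = [melody tone gap];            %#ok<AGROW>
end
sound(melody, fs)                          % play the 3-note sequence
```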
For the NH listeners presented with pure tones, the fixed interval was either 4 ST (a major 3rd in music notation) or 7 ST (a perfect 5th); for the CI users and NH listeners presented with vocoded stimuli, the fixed interval was always 7 ST. Intervals were defined using equal-temperament tuning, where 1 ST always represents a change in frequency by a factor of 2^(1/12) (e.g., 7 ST above 572 Hz is 572 × 2^(7/12) ≈ 857 Hz).
Adaptive Tracking Procedure
The assessment employed an adaptive testing approach (e.g., Jesteadt, 1980) to determine each participant's point of subjective equality (PSE) for pitch intervals across different frequency regions. One of the two intervals was fixed, and the other interval was adaptively varied, based on the listener’s previous responses. A value of 0 semitones (ST) in this procedure indicates that the adaptively varying interval was the same size as the fixed interval (either 4 or 7 ST).
Each run consisted of four randomly interleaved adaptive tracks, two of which used a 2-down 1-up procedure and two of which used a 2-up 1-down procedure, tracking the 71% and 29% points of the psychometric function, respectively (Levitt, 1971). Within each pair of tracks, one track varied the first (lower) interval and the other varied the second (higher) interval. For each track, the varying interval started ±3 ST from the fixed interval, and the step size started at 4 ST and decreased to 2 ST after the first two reversals. Four reversals were required during the initial phase and two during the measurement phase.
Once all the tracks had terminated, the PSE was defined as the average of the four tracks (as the mean of the 71% and 29% points approximates the 50% point). A total of 5 runs were completed per participant in each condition, with a prompt for participants to rest between runs.
The adaptive tracking procedure was limited to values between -7 and +10 ST. If the adaptive procedure called for a value exceeding the maximum or minimum more than six times in one track, the track was terminated and a value of -8 or +11 ST was assigned to that track.
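A schematic MATLAB sketch of a single 2-down 1-up track under the rules above; the simulated response and the bookkeeping details are stand-ins, not the authors' implementation:

```matlab
% Sketch: one 2-down 1-up adaptive track with simulated responses.
value = 3; step = 4;                 % start 3 ST above equality (tracks started +/-3 ST)
nRev = 0; nCorr = 0; lastDir = 0; nOut = 0; measured = [];

while nRev < 6 && nOut <= 6          % 4 initial + 2 measurement reversals
    resp = rand < 0.71;              % stand-in for the listener's judgment
    if resp
        nCorr = nCorr + 1;
        if nCorr == 2, dir = -1; nCorr = 0; else, dir = 0; end
    else
        dir = 1; nCorr = 0;
    end
    if dir ~= 0
        if lastDir ~= 0 && dir ~= lastDir
            nRev = nRev + 1;
            if nRev == 2, step = 2; end                % step size 4 -> 2 ST
            if nRev > 4, measured(end+1) = value; end  % measurement phase
        end
        lastDir = dir;
        value = value + dir*step;
        if value < -7 || value > 10                    % track limits
            value = min(max(value, -7), 10);
            nOut = nOut + 1;
        end
    end
end

if nOut > 6                          % terminated track: assign floor/ceiling value
    if lastDir > 0, trackEst = 11; else, trackEst = -8; end
else
    trackEst = mean(measured);       % this track's contribution to the PSE average
end
```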
Prior to data collection, participants completed a short training module that used 7-ST intervals and provided feedback, to ensure they understood the task. No feedback was given during testing.
Stimuli
Normal Hearing Cohort
Normal hearing (NH) participants were assessed using either pure-tone (unprocessed) stimuli or vocoded stimuli (to simulate aspects of cochlear implant sound perception).
To create the vocoded stimuli, a frequency warp was first applied to simulate either a full-length (28 mm) or a shorter (24 mm) electrode array placement. The two frequency warps were derived by calculating the spiral ganglion characteristic frequencies of the electrodes, based on lateral wall cochlear duct length measurements from our previously FPCT-imaged cohort (Jiam et al., 2021; Helpard et al., 2020; Li et al., 2021). The most apical electrode corresponds to 350 Hz for the 28 mm array (Figures 1A and 1B, black dots) and to 500 Hz for the 24 mm array (Figure 1C, black dots), consistent with larger cohorts reported elsewhere (Canfarotta et al., 2020, ***).
The second step to create the vocoded stimuli was to apply a frequency allocation table to simulate either default or custom CI filterbank settings, which yielded the following three conditions: (1) Vocoded 28 mm Array with Default Frequencies (“Voc Default”, black bars in Figure 1A), (2) Vocoded 28 mm Array with Middle Frequencies Matched (“Voc MidFreq Match”, gray bars, Figure 1B), and (3) Vocoded 24 mm Array with All Frequencies Matched (“Voc AllFreq Match”, gray bars, Figure 1C).
The Voc Default setting (Figure 1A, black bars) used a frequency range of 70-8500 Hz, logarithmically divided into 12 channels, and was modeled after the manufacturer’s default frequency allocation table (i.e., “LogFS”).
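For illustration, a short MATLAB sketch of a 12-channel logarithmic division of 70-8500 Hz (an approximation of a default log allocation, not the manufacturer's exact table):

```matlab
% Sketch: 12 log-spaced analysis channels spanning 70-8500 Hz.
edges   = logspace(log10(70), log10(8500), 13);   % 13 edges -> 12 channels
centers = sqrt(edges(1:end-1) .* edges(2:end));   % geometric-mean center frequencies
```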
The Voc MidFreq Match strictly matched channel center frequencies in the mid-frequency range (950-4000 Hz); the remaining frequencies were then redistributed across the most apical and basal electrodes (70-950 Hz and 4000-8500 Hz; Figure 1B, gray bars). This approach attempted to maintain audibility across the entire frequency range while preserving pitch-interval integrity where feasible.
The Voc AllFreq Match used a strictly CT-based approach (Figure 1C, gray bars), matching all channel center frequencies to the electrode contact locations where feasible (<4000 Hz). To avoid deactivating electrodes whose characteristic frequencies exceeded the bandwidth limit of the software (8500 Hz), a logarithmic redistribution was applied at the highest frequencies (>4000 Hz) to make the best use of the available electrodes.
Cochlear Implant Cohort
The CI cohort was presented with only pure-tone (unprocessed) stimuli during the assessment. All CI recipients utilized the manufacturer's default logarithmic frequency allocation table (i.e., LogFS, indicated by the black bars in Figure 1A).
Additionally, the CI and vocoded cohorts completed only the mid-to-high-frequency range comparisons (i.e., 480-7314 Hz), since the low-frequency range was not accessible in the vocoded condition modeling the shorter electrode array (whose most apical electrode corresponds to a characteristic frequency of 500 Hz).
Standardization of Sound Levels for Participant Assessment
Given the diversity of hardware used by participants conducting the assessment on their computers, inherent variability in presentation levels and frequency response was a significant challenge. To mitigate this variability and standardize sound levels across varying setups, we implemented a personalized standardization process for each participant.
This standardization involved a subjective loudness-equivalency task in which participants were presented with five tones spanning the critical frequency range of the assessment (125 Hz, 250 Hz, 500 Hz, 2000 Hz, and 7352 Hz). In a graphical user interface, each tone was adjustable via a slider allowing ±30 dB of level modification. Participants adjusted the loudness of each tone to achieve a uniformly "most comfortable" listening level across the frequency spectrum.
The standardization data from these five frequencies were then used to construct a participant-specific frequency response profile. This was achieved by linear interpolation of the log-transformed frequencies against the standardization adjustments in dB, using MATLAB's interp1 function. The interpolated values provided a tailored correction for each tone's amplitude, allowing for more consistent loudness across the test's frequency range.
Consequently, this standardization process generated a custom amplitude scaling curve for each participant, which set the nominal level of each tone to 60 dB after applying the individual standardization offsets. These adjustments were applied to all stimuli, including those processed through the vocoder, to maintain a more consistent stimulus level throughout the assessment.
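A minimal sketch of this interpolation and scaling step; the variable names and example values are illustrative (only the use of interp1 on log frequency is stated above):

```matlab
% Sketch: build a per-participant level-correction curve from the five
% loudness-matched calibration tones (example slider values shown).
calFreqs  = [125 250 500 2000 7352];       % calibration tone frequencies (Hz)
calAdjDB  = [ -4    0   2    1    6];      % example slider settings (dB)
toneFreqs = 572 * 2.^((0:7)/12);           % example test-tone frequencies (Hz)

% Linear interpolation on a log-frequency axis
corrDB = interp1(log10(calFreqs), calAdjDB, log10(toneFreqs), 'linear');

% Apply each tone's correction around the nominal 60-dB presentation level
levelDB  = 60 + corrDB;                    % corrected level per tone
ampScale = 10.^(corrDB/20);                % linear amplitude scaling per tone
```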
This standardization approach was intended to reduce the variability introduced by participants' diverse hardware configurations, ensuring that all participants experienced the stimuli at a consistent and optimal listening level throughout the experiment.