Realistic 3D avian vocal tract model demonstrates how shape affects sound filtering (Passer domesticus)


Kazemi, Alireza; Kesba, Mariam; Provini, Pauline (2023), Realistic 3D avian vocal tract model demonstrates how shape affects sound filtering (Passer domesticus), Dryad, Dataset,


Despite the complex geometry of the songbird vocal system, it has typically been modelled as a tube or with simple mathematical parameters to investigate sound filtering. Here, we developed an adjustable computational acoustic model of a sparrow’s upper vocal tract (Passer domesticus), derived from micro-CT scans. We discovered that a 20% tracheal shortening or a 20° beak gape increase caused the vocal tract harmonic resonance to shift towards higher pitch (by 11.7% or 8.8%, respectively), predominantly in the mid-range frequencies (3–6 kHz). The oropharyngeal-esophageal cavity (OEC), known for its role in sound filtering, was modelled as an adjustable 3D cylinder. For a constant OEC volume, an elongated cylinder induced a higher frequency shift than a wide cylinder (70% versus 37%). We found that OEC volume adjustments can modify the OEC first harmonic resonance at low frequencies (1.5–3 kHz) and the OEC third harmonic resonance at higher frequencies (6–8 kHz). This work demonstrates the need to consider the realistic geometry of the vocal system to accurately quantify its effect on sound filtering, and shows that sparrows can tune the entire range of produced sound frequencies to their vocal system resonances by controlling the vocal tract shape, especially through complex OEC volume adjustments.


The dimensions of the trachea, larynx, beak, tongue, and other anatomical features derive from CT scan images of a whole-body specimen of a house sparrow, preserved in 70% ethanol and stained with iodine-potassium iodide dissolved in water for four weeks (Morphosource: usnm:birds:657964 Passer domesticus – ark:/87602/m4/M115379). The images were acquired with a General Electric phoenix v|tome|x m at the Smithsonian Institution Bio-Imaging Research (SIBIR) Center (National Museum of Natural History, Smithsonian Institution), with an X, Y, and Z pixel spacing of 0.067805 mm. The contrast agent used to prepare the specimen allows the visualisation of soft tissues (tongue, cheeks, etc.) and provides accurate modelling of the entire 3D vocal system shape, in place. This specimen was used to investigate the chemical effects of staining (Early et al., 2020) and therefore corresponds to a well-preserved specimen, prepared in optimal condition for CT-scan acquisition. Bone demineralization occurs throughout the staining process (Early et al., 2020), but we chose a CT scan performed after 4 weeks of staining, which minimizes the risk of tissue degradation while providing a satisfactory 3D model quality. We segmented the CT scan using Avizo (v. 9.7, FEI Visualisation Sciences Group, Burlington, MA, USA). We extracted the upper beak, lower beak, tongue, and rest of the body as separate 3D objects. To save solver time without losing accuracy, we reduced the 3D object to the upper body only and smoothed the external body, while preserving the entire trachea and the rest of the upper vocal tract.

We imported the 3D objects into COMSOL Multiphysics® software (COMSOL, 2021) to build the initial model, which we digitally modified to test the influence of each structure on sound modulation (Figure 1A). The initial model has a 30.58 mm long trachea, a 10° beak gape, and a laryngeal opening (the glottis) of around 0.7 mm in radius.


To assess the effect of trachea shortening, we cut 7%, 11%, and 20% of the initial model trachea (Figure 1B). We modelled these extreme changes in length to test the potential effect of tracheal length adjustment; our personal observations on cadavers led us to test a maximum variation of 20%. We used the “differences” tool in COMSOL Multiphysics® software (COMSOL, 2021). We considered the end correction and added a short distance to the actual length of the beak (Bosanquet, 1878; Levine and Schwinger, 1948).
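To make the link between tube length, end correction, and resonance frequency concrete, the following minimal sketch treats the trachea alone as a uniform tube closed at one end and open at the other. This is the kind of simplified tube model the study moves away from, not the COMSOL model in this dataset. The speed of sound, the tube radius, and the closed-open boundary condition are assumptions made for illustration; the end-correction factor of 0.6133 radii is the low-frequency value for an unflanged pipe from Levine and Schwinger (1948), applied here to the open end of the tube rather than to the beak.

```python
SPEED_OF_SOUND_M_S = 343.0         # assumed: air at about 20 °C
TRACHEA_LENGTH_M = 30.58e-3        # trachea length of the initial model (from the text)
TUBE_RADIUS_M = 1.0e-3             # hypothetical tube radius, for illustration only
UNFLANGED_END_CORRECTION = 0.6133  # in tube radii (Levine and Schwinger, 1948)


def effective_length(length_m, radius_m):
    """Physical length plus the open-end correction."""
    return length_m + UNFLANGED_END_CORRECTION * radius_m


def resonances_hz(length_m, radius_m, n_modes=3, c=SPEED_OF_SOUND_M_S):
    """Resonances of a closed-open tube: f_k = (2k - 1) * c / (4 * L_eff)."""
    l_eff = effective_length(length_m, radius_m)
    return [(2 * k - 1) * c / (4.0 * l_eff) for k in range(1, n_modes + 1)]


def shift_percent(cut_fraction, length_m=TRACHEA_LENGTH_M, radius_m=TUBE_RADIUS_M):
    """Relative upward shift of the first resonance after cutting the tube."""
    base = resonances_hz(length_m, radius_m, 1)[0]
    cut = resonances_hz(length_m * (1.0 - cut_fraction), radius_m, 1)[0]
    return 100.0 * (cut / base - 1.0)


if __name__ == "__main__":
    # Same cuts as in the text: 7%, 11%, and 20% of the trachea.
    for cut_fraction in (0.07, 0.11, 0.20):
        print(f"{cut_fraction:.0%} cut: first resonance shifts by "
              f"{shift_percent(cut_fraction):.1f}%")
```

Under these assumptions, a 20% cut shifts the first resonance by just under 25%, roughly twice the 11.7% reported for the realistic geometry, which illustrates why a bare tube is a poor stand-in for the full vocal tract.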


We started with a 0.5 mm glottis diameter, corresponding to the geometry of the µCT scan. To do so, we inserted a cylinder in the original larynx and cut the cylinder surface area to decrease the diameter of the glottis. This resulted in a sharp edge at the junction of the glottis and the cylinder. We compared the results for a geometry with a smooth surface and one with a sharp edge. The difference was less than 1%, so we considered the effect negligible. With this method, we decreased the radius in 0.1 mm steps to reach a final radius of 0.1 mm (Figure 1C).


As the tongue was previously segmented, we were able to move it in COMSOL Multiphysics® software (COMSOL, 2021) using the rotation tool. We considered the tongue in three different positions: 1) the tongue touches the lower beak, 2) the tongue is located midway between the lower and upper beak, and 3) the tongue touches the upper beak. We chose these conformations to mimic the potential motions of the tongue during vocalisation, from the lower to the upper beak. We compared those three models with the initial model without a tongue (Figure 1D).

Beak opening

We imported the beak 3D models derived from the segmentation into Autodesk Maya (Student version 2020). We placed a virtual joint at the approximate location of the lower beak joint (Baumel, 1993) (Figure 1E). We chose three beak poses, representing extreme beak gape angles: 10°, 20°, and 30°.
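Opening the beak about a virtual joint amounts to rotating the lower-beak geometry around a hinge axis passing through that joint. The sketch below shows this with Rodrigues' rotation formula in plain Python. It is not the Maya workflow used for the dataset: the joint position, the hinge axis, and the beak-tip coordinates are hypothetical values chosen for illustration, and the sign of the angle that opens rather than closes the beak depends on the coordinate frame.

```python
import math


def rotate_about_axis(point, pivot, axis, angle_deg):
    """Rotate a 3D point about an axis through `pivot` (Rodrigues' formula)."""
    theta = math.radians(angle_deg)
    norm = math.sqrt(sum(a * a for a in axis))
    k = tuple(a / norm for a in axis)               # unit hinge axis
    v = tuple(p - q for p, q in zip(point, pivot))  # point relative to the joint
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_cross_v = (
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    )
    k_dot_v = sum(a * b for a, b in zip(k, v))
    rotated = tuple(
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
        for i in range(3)
    )
    return tuple(r + q for r, q in zip(rotated, pivot))


# Hypothetical geometry in mm: joint location, hinge axis, lower-beak tip.
JOINT_MM = (0.0, 0.0, 0.0)
HINGE_AXIS = (0.0, 0.0, 1.0)
LOWER_BEAK_TIP_MM = (12.0, -2.0, 0.0)
INITIAL_GAPE_DEG = 10.0  # gape of the initial model (from the text)


def lower_beak_tip_at_gape(target_gape_deg):
    """Tip position after opening from the initial gape to the target gape."""
    extra = target_gape_deg - INITIAL_GAPE_DEG
    # Negative angle lowers the tip in this hypothetical frame.
    return rotate_about_axis(LOWER_BEAK_TIP_MM, JOINT_MM, HINGE_AXIS, -extra)


if __name__ == "__main__":
    for gape in (10.0, 20.0, 30.0):
        print(gape, lower_beak_tip_at_gape(gape))
```

The same function could be applied to every vertex of a lower-beak mesh; because the rotation is rigid, distances from the joint are preserved.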


Based on the location and shape of the oropharyngeal-esophageal cavity (OEC) on the µCT-scan data, we artificially added a curved cylindrical shape of 0.4 mL to the initial model (Figure 6). To investigate the role of the 3D shape on sound modulation, we modelled the OEC in seven different volumes (1.4, 1.3, 1.2, 1.1, 1, 0.9, and 0.8 mL) (Figure 4), by adding a cylindrical shape to the initial curved cylinder and by incrementally changing its length: 8, 10, 12, 14, 16, 18, and 20 mm (Figure 6A), and radius: 2.5, 2.8, 3.1, 3.34, 3.57, 3.78, and 4 mm (Figure 6B). We estimated the volume of the OEC using previous X-ray video recordings of OEC movements during vocalisation (Fletcher et al., 2006; Riede and Suthers, 2009; Riede et al., 2006). We checked the effect of the connection between the cylinder and torus, and between the torus and the buccal cavity, in each model. The effect was less than 1% and was therefore considered negligible.
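The relation between the listed cylinder dimensions and the seven OEC volumes can be checked with the cylinder volume formula V = πr²L. The sketch below rests on two assumptions that are not stated in the text: that the length series uses a fixed radius of 4 mm, and that the radius series uses a fixed length of 20 mm. Under those assumptions, adding each cylinder to the 0.4 mL curved base reproduces the listed volumes to within about 0.01 mL.

```python
import math

BASE_CURVED_CYLINDER_ML = 0.4    # curved base shape (from the text)
LENGTHS_MM = [8, 10, 12, 14, 16, 18, 20]             # from the text
RADII_MM = [2.5, 2.8, 3.1, 3.34, 3.57, 3.78, 4.0]    # from the text
ASSUMED_FIXED_RADIUS_MM = 4.0    # assumption: radius used for the length series
ASSUMED_FIXED_LENGTH_MM = 20.0   # assumption: length used for the radius series


def cylinder_volume_ml(radius_mm, length_mm):
    """Volume of a straight cylinder in mL (1 mL = 1000 mm^3)."""
    return math.pi * radius_mm ** 2 * length_mm / 1000.0


def total_oec_volume_ml(radius_mm, length_mm):
    """Curved base plus the added straight cylinder."""
    return BASE_CURVED_CYLINDER_ML + cylinder_volume_ml(radius_mm, length_mm)


if __name__ == "__main__":
    print("Elongated series (fixed radius, varying length):")
    for length in LENGTHS_MM:
        total = total_oec_volume_ml(ASSUMED_FIXED_RADIUS_MM, length)
        print(f"  L = {length:>2} mm -> {total:.2f} mL")

    print("Wide series (fixed length, varying radius):")
    for radius in RADII_MM:
        total = total_oec_volume_ml(radius, ASSUMED_FIXED_LENGTH_MM)
        print(f"  r = {radius:.2f} mm -> {total:.2f} mL")
```

Both series span the same volumes, so any difference in resonance shift between them reflects the shape of the cavity (elongated versus wide) rather than its volume.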

Usage notes

You need COMSOL Multiphysics® (COMSOL, 2021) software to open the models.


LPI Research Fellowship (Bettencourt Schueller Foundation)