# Evolution of male genitalia in the Drosophila repleta species group (Diptera: Drosophilidae)

## Citation

Stefanini, Manuel; Gottschalk, Marco Silva; Calvo, Natalia Soledad; Soto, Ignacio Maria (2021), Evolution of male genitalia in the Drosophila repleta species group (Diptera: Drosophilidae), Dryad, Dataset, https://doi.org/10.5061/dryad.b2rbnzsd8

## Abstract

The Drosophila repleta group comprises more than one hundred species that inhabit several environments in the Neotropics and use different hosts as rearing and feeding resources. Rather homogeneous in their external morphology, they are generally distinguished by the male genitalia, their fastest evolving morphological trait, which makes them an excellent model to study patterns of genital evolution in the context of a continental adaptive radiation. Although much is known about the evolution of animal genitalia at the population level, surveys of this phenomenon at a macroevolutionary scale are scarce. Herein, in order to elucidate the macroevolutionary patterns of genital evolution through deep time and large continental scales, a suite of phylogenetic comparative methods was used in this group for the first time. Our results indicate that male genital size and some aspects of shape have evolved through speciational evolution, probably due to the microevolutionary processes involved in species mate recognition. In contrast, several features of aedeagus shape seem to have evolved stochastically and thus gradually, albeit with heterogeneous phenotypic evolutionary rates among clades. In general, the tempo of the evolution of aedeagus morphology was constant from the origin of the group until the Pliocene, when it accelerated in some clades that diversified mainly in this period. The influence of novel ecological conditions on the tempo of aedeagus evolution and the relationship between species mate recognition and speciation in the Drosophila repleta group are discussed.

## Methods

**Data collection**

Aedeagi from adult males of 51 species of the repleta group were obtained by direct dissection, from the laboratory photographic collection, or from images available in the literature. Aedeagus morphology in species of the repleta group shows low intraspecific but high interspecific variability (Barrios-Leal et al., 2017; Stefanini et al., 2018). Thus, one aedeagus per species was quantified. When dissected, male genitalia were mounted on microscope slides in DPX mountant (Sigma-Aldrich) and flattened with cover slips. Slides were photographed at 400× magnification with a camera mounted on a microscope, following Soto et al. (2007).

**Morphological quantification**

Size and shape components of morphology were analyzed separately. The original size of individuals coming from different sources was recovered by scaling the original photos to the same pixel resolution using the scales provided. Aedeagal morphology was quantified in lateral view with an open outline approach, which uses sliding semilandmarks (Bookstein, 1996) to quantify two- or three-dimensional homologous curves and surfaces (Gunz et al., 2005; Gunz and Mitteroecker, 2013; Schlager, 2017). To implement this approach, the outlines were first extracted from JPG images and transformed into closed outlines using the Momocs package version 2.0-6 (Bonhomme et al., 2014). The closed outlines retrieved were standardized to 800 (x, y) coordinates, also with the Momocs package. Then, three landmarks (i.e., discrete anatomical loci or geometric features recognizable in all specimens of the sample) were located on the outlines of the aedeagus with the def_ldk function of the Momocs package: landmarks one and three at each side of the base and landmark two at the tip. The three resulting inter-landmark segments represent the outlines of the lateral dorsal margin, the lateral ventral margin and the base of the intromittent part of the aedeagus. The base contains no relevant information because it is an artificial linear segment produced by the manipulation of the images. Therefore, the segment corresponding to the base (the segment between landmarks one and three) was deleted to obtain open outlines that carry only the biological information. Subsequently, using the curve length as an index of geometric accuracy (MacLeod, 1999), 169 semilandmarks were determined as the optimum number to reproduce the perimeter of the inter-landmark boundary curves of the samples (i.e., 89 semilandmarks corresponding to the lateral dorsal margin segment and 80 corresponding to the lateral ventral margin segment).
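
A curve-length criterion of this kind can be illustrated with a minimal sketch. It assumes a 99% length threshold and resampling at equal arc-length intervals along an open outline; the function names are hypothetical, and this is a plain Python illustration, not the Morpho or Momocs routines used for the dataset.

```python
import math


def polyline_length(points):
    """Total length of an open polyline given as a list of (x, y) tuples."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def resample_by_arc_length(points, n):
    """Return n points equally spaced by arc length along an open polyline."""
    # Cumulative arc length at each original vertex.
    cum = [0.0]
    for p, q in zip(points, points[1:]):
        cum.append(cum[-1] + math.dist(p, q))
    total = cum[-1]

    resampled = []
    seg = 0  # index of the original segment currently being interpolated
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        t = 0.0 if span == 0 else (target - cum[seg]) / span
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        resampled.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return resampled


def minimum_points(points, threshold=0.99):
    """Smallest number of points whose polyline keeps >= threshold of the length.

    The 0.99 default is an assumption taken from the criterion described in
    the text; the search simply tries n = 2, 3, ... in order.
    """
    full = polyline_length(points)
    for n in range(2, len(points) + 1):
        if polyline_length(resample_by_arc_length(points, n)) >= threshold * full:
            return n
    return len(points)
```

Applied separately to each inter-landmark segment, a search like this would yield one count per curve, analogous to the separate dorsal and ventral counts reported above.
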
The optimal number of coordinates needed to represent a given curve is the minimum number that recovers at least 99% of the length of the original curve perimeter (MacLeod, 1999). Then, the spacing of the semilandmarks was optimized by allowing them to slide along the curves. Herein, the bending energy criterion for sliding was used in preference to the Procrustes distance criterion, which can induce distortions of the original curve (Gunz & Mitteroecker, 2013). Semilandmarks are used to represent homologous curves and surfaces by sets of points, establishing a geometric homology between corresponding semilandmarks across the sample (Bookstein, 1996). This geometric correspondence is established by sliding the semilandmarks along the curves, which reduces the effect of the arbitrary initial equidistant spacing on shape variation (Gunz & Mitteroecker, 2013). One issue with the sliding semilandmarks approach is that the adjustment of the semilandmark positions takes place along tangents (i.e., linear trajectories) to the boundary outline and not along the outline curve itself, which can lead to serious technical artifacts (see MacLeod, 2013 for a comprehensive review of this morphometric technique). In the version of the sliding semilandmarks procedure implemented here, this issue has been satisfactorily solved by adding a step that reprojects the semilandmarks slid along the tangents onto the original curve (Gunz et al., 2005; Schlager, 2013; Schlager, 2017). Differences in size, rotation and position were removed by performing Generalized Procrustes Analysis (GPA; Gower, 1975) on the open outlines using only the three fixed landmarks. The size proxy of each outline extracted in the GPA (i.e., centroid size, Cs) was retained and used as an estimate of aedeagus size for subsequent analyses.
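
To make the size proxy and the superimposition step concrete, the following minimal sketch computes centroid size and aligns one two-dimensional landmark configuration to a reference. It is a simplification under stated assumptions: it performs a pairwise (ordinary) Procrustes fit rather than the iterative generalized procedure, it does not allow reflection, and all function names are hypothetical rather than part of Morpho.

```python
import math


def centroid(landmarks):
    """Mean (x, y) position of a list of (x, y) landmarks."""
    n = len(landmarks)
    return (sum(x for x, _ in landmarks) / n, sum(y for _, y in landmarks) / n)


def centroid_size(landmarks):
    """Square root of the summed squared distances of landmarks to their centroid."""
    cx, cy = centroid(landmarks)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in landmarks))


def center_and_scale(landmarks):
    """Remove position and size; return landmarks as complex numbers x + iy."""
    cx, cy = centroid(landmarks)
    cs = centroid_size(landmarks)
    return [complex((x - cx) / cs, (y - cy) / cs) for x, y in landmarks]


def align_to_reference(reference, target):
    """Rotate the centered, unit-size target to best fit the reference.

    In complex notation the least-squares rotation is conj(S) / |S|, where
    S = sum(conj(z_j) * w_j) over reference points z_j and target points w_j.
    """
    z = center_and_scale(reference)
    w = center_and_scale(target)
    s = sum(zj.conjugate() * wj for zj, wj in zip(z, w))
    rotation = s.conjugate() / abs(s)
    return [((wj * rotation).real, (wj * rotation).imag) for wj in w]
```

In this sketch the centroid size returned for each raw configuration would play the role of Cs, while the aligned coordinates carry only shape information.
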
Optimization of the number of semilandmarks, sliding of the semilandmarks along tangents to the curves, reprojection onto the curves and GPA were performed using the Morpho package version 2.8 (Schlager, 2017) in the R environment. Several functions needed to fully implement this sliding semilandmarks approach were taken from https://github.com/millacarmona/gmorphometrics/blob/main/Functions_MillaCarmona.R. The Procrustes coordinates obtained were used to perform a Principal Component Analysis (PCA) with the prcomp function of the stats package of the R environment (R Core Team, 2016). The PCA summarized and reduced the dimensionality of the shape information described by the coefficients. The resulting scores of each Principal Component (PC) can be considered reorganized and uncorrelated morphological descriptors representing different aspects of total shape variation (Zelditch et al., 2012). After applying the broken-stick method (Jackson, 1993) implemented in the R package adephylo (Jombart et al., 2010), three PCs (representing about 85% of the total shape variation) were retained for the subsequent statistical analyses of shape variation.
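
The broken-stick rule can be sketched as follows. This is a generic Python illustration of the usual formulation, in which the expected proportion of variance for component k out of p is (1/p) times the sum of 1/i for i from k to p, and components are retained until the first one that does not exceed its expectation. The function names are hypothetical and this is not the adephylo implementation.

```python
def broken_stick(p):
    """Expected variance proportions for p components under the broken-stick model."""
    return [sum(1.0 / i for i in range(k, p + 1)) / p for k in range(1, p + 1)]


def n_components_to_retain(variances):
    """Count leading components whose variance share exceeds the expectation.

    `variances` holds the component variances (eigenvalues) in decreasing
    order; counting stops at the first component that fails the comparison.
    """
    total = sum(variances)
    expected = broken_stick(len(variances))
    kept = 0
    for var, exp in zip(variances, expected):
        if var / total <= exp:
            break
        kept += 1
    return kept
```

For example, `n_components_to_retain` could be called on the squared standard deviations of a PCA; any input values used to try it out are the reader's own and are not taken from this dataset.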