
Patch size and vegetation structure drive changes to mixed-species flock diversity and composition across a gradient of fragment sizes in the Western Andes of Colombia

Cite this dataset

Jones, Harrison; Robinson, Scott (2021). Patch size and vegetation structure drive changes to mixed-species flock diversity and composition across a gradient of fragment sizes in the Western Andes of Colombia [Dataset]. Dryad.


This data set represents a series of 502 mixed-species bird flock compositions, and derived taxonomic, functional, and phylogenetic diversity indices, that were gathered along a gradient of forest fragment sizes (range = 10-173 ha) in the Colombian Western Andes. We sampled mixed-species flocks using transect surveys along 14 transects in 8 fragments and a continuous forest reference site in the same landscape and at the same elevation (~1900-2200 m.a.s.l.). We also used buffer analysis to quantify the proportion of forest cover and forest edge within 1 km of each transect, and calculated local vegetation density and complexity, as well as distance from edge, for each 100-meter transect segment (n = 70 segments). Flock composition data observed on a transect were used to calculate overall species richness and flock size as well as two indices each of functional and phylogenetic diversity; we calculated the standardized effect size (SES) of each diversity index to account for the correlation between these indices and species richness. We also provide the raw counts of each species for each flock composition. These data were used for the analyses in Jones and Robinson (2020). 


Study System and Sites

We conducted all fieldwork in subtropical humid forests located within the municipality of El Cairo, Valle del Cauca department in Colombia. The study region is part of the Serrania de los Paraguas in the Western Andes mountain range, a center of avian threatened species diversity and endemism within Colombia. The study landscape in this municipality consists of a patchwork of forest fragments embedded in a matrix of cattle pasture, regenerating scrub, and coffee farms. Within this landscape, we selected eight fragments representing a gradient in patch sizes (range 10 to 170 ha). Sites are in the same altitudinal belt (1900-2200 m.a.s.l.) and matrix type (cattle pasture) to control for effects of altitude and matrix type on flock size and composition. Within-patch disturbance is common in fragmented Andean forests in Colombia, particularly illegal selective logging, which in our landscape typically occurred as removal of select old-growth trees for lumber by landowners; logging histories varied considerably from historical to ongoing, and extensive to limited, both within and between patches. We established 500-meter transects through forest interior (n = 14 total transects) which were opportunistically placed on existing trails, at variable distances from the edge of the fragments. We further divided each transect into 100-meter segments to account for heterogeneity in vegetation structure within transects. We accounted for edge effects by measuring the distance to forest edge of each transect segment. 

We stratified forest fragments into large (≥ 100 ha), medium (~30-50 ha), and small (≤ 20 ha) size categories and selected a minimum of two replicates of each; these represent the range of fragment sizes available in our study landscape. We also included a non-fragmented reference site (Reserva Natural Comunitária Cerro El Inglés, ~750 ha) connected to over 10,000 ha of continuous forest to the north and west along the spine of the Serranía de los Paraguas.  We only selected fragments with primary or late-successional secondary forest; vegetation structure and canopy height varied substantially between patches based on intensities of selective logging and land-use histories (see above). Fragments were all separated by ≥ 100 meters to minimize among-patch movement of birds, and all transects in different fragments were at least 250 meters apart.

Transect Surveys for Mixed-species Flocks

We performed transect surveys for mixed-species flocks, adapted from Goodale et al. (2014), in forest fragments from June-August 2017 (boreal migrants absent) and January-March 2018 (boreal migrants present). Both sampling periods corresponded to a dry season in the Western Andes, which has a bimodal two-dry, two-wet seasonality pattern. For each transect, we spent two and a half sequential field days performing continuous transect surveys; we conducted surveys in small fragments, large fragments, and continuous forest sites in random order to avoid a temporal bias in sampling. Surveys were distributed across the morning (7:30-11:30) and evening (15:00-17:30) hours. Transects were walked slowly and continuously by 2-3 observers, including local birdwatchers familiar with all species (Harrison Jones present for all surveys); flocking birds were identified by both sight and sound. When we encountered a flock, we noted the time of day and transect segment in which it was observed and spent up to a maximum of 45 minutes characterizing it with 10x binoculars. At least 5 minutes were spent with each flock, following it if possible. Because detection of species in flocks was imperfect, we only included a flock observation in the analysis if we felt that at least 80% of the individuals were observed (e.g. after spending several minutes of continuous observation at the end of the survey period without observing a new species or individual); incomplete flock observations were not included in analyses. We feel that our survey methodology accurately described flock composition because birds moved and called frequently in flocks, leading to high detectability. We noted the start and end time of each survey, and the presence of incomplete flocks to calculate flock encounter rate. We also supplemented the transect surveys with data from flocks opportunistically observed on a transect while performing other fieldwork. 
Some flocks in the data set represent flock compositions recorded near but not on a transect; these compositions have no associated transect segment.  

Calculation of Landscape-level Variables

            We obtained landscape-level variables for analyses using geographic information system (GIS) analysis in ArcGIS (ArcMap 10.3.1; Esri; Redlands, CA). To quantify landscape composition and configuration, we buffered each transect (n = 14) by 1 km; buffers extended from the entire length of the transect. We then calculated measures of landscape composition and configuration using a recent land-cover/use categorization made by the Corporación Autónoma Regional del Valle del Cauca, converted to a 25-m cell-size raster. To quantify landscape composition, we calculated the percentage of the forest-cover type within each buffer using the ‘isectpolyrst’ tool in Geospatial Modelling Environment (version …). We measured landscape configuration for each transect as edge density, or length of all forest edges (in meters) divided by total buffer area (in hectares). The distance to edge was calculated in meters for each 100-meter transect segment (n = 70) as the shortest straight-line distance between the center point of the segment and the nearest edge of the fragment. 
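The edge density definition above can be illustrated with a short Python sketch. It is not the ArcGIS workflow used for the data set: it assumes a perfectly straight 500-meter transect (real transects followed existing trails), so that the 1 km buffer is a rectangle capped by two half-disks, and the 20,000 m of forest edge is a hypothetical value chosen only for illustration.

```python
import math

def stadium_buffer_area_ha(transect_length_m, buffer_radius_m):
    """Area (ha) of a buffer around a straight line: a rectangle plus two half-disks."""
    area_m2 = 2 * buffer_radius_m * transect_length_m + math.pi * buffer_radius_m ** 2
    return area_m2 / 10_000  # 1 ha = 10,000 square meters

def edge_density(forest_edge_length_m, buffer_area_ha):
    """Edge density: meters of forest edge per hectare of buffer."""
    return forest_edge_length_m / buffer_area_ha

# Assumption: straight 500 m transect buffered by 1 km
area = stadium_buffer_area_ha(500, 1000)
print(round(area, 1))  # 414.2

# Hypothetical total forest edge length of 20,000 m inside the buffer
print(round(edge_density(20_000, area), 1))  # 48.3
```

Under these assumptions each buffer covers roughly 414 ha, which gives a sense of the scale at which 1km_forest and 1km_edge were measured.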

Vegetation Measurements and Principal Component Analysis

            We measured vegetation structure in each 100-m transect segment used for flock sampling. Vegetation measurements were made from June-August 2017; based on our observations of vegetation, we assumed variation between the two sampling periods was minimal. We used the sampling methodology of James and Shugart (1970), following the modifications made by Stratford and Stouffer (2013), and further modified to be used with belt transects. Broadly, the methodology comprises two components for every 100-meter transect segment: (1) the quantification of canopy cover, ground cover, canopy height, and foliage height diversity of vegetation using point sampling every 10 meters and (2) the quantification of shrub, vine, fern, palm, and tree fern and tree density using 3 meter-wide belt sampling.

For the point sampling, we measured eight variables at ten-meter intervals, for 10 points per 100-meter segment. As a measure of foliage height diversity along the transect, we noted the presence or absence of live vegetation at five heights: <0.5 m, >0.5–3 m, >3–10 m, >10–20 m, and >20 m. Above 3 meters, we used a rangefinder to determine heights while sighting through a tube with crosshairs. Canopy and ground cover were calculated to the nearest 1/8th of the field of view by sighting through a vertical canopy densiometer (GRS Densiometer, Geographic Resource Solutions, Arcata, CA). For each segment, we averaged values for canopy cover, and ground cover, and calculated the proportion of points at which vegetation was present for each height category. For the belt transect sampling, we surveyed vegetation along the same transects and calculated densities for each 100-m transect interval. We counted all shrubs, vines, ferns, tree ferns, and palms encountered on 1.5 meters to either side. Secondly, we counted all trees (woody vegetation > 2 m in height) within 1.5 meters of the transect and measured their diameter at breast height (DBH). Trees were later categorized into six DBH size classes for analysis: 1-7 cm, 8-15 cm, 16-23 cm, 24-30 cm, 31-50 cm, and > 50 cm. We additionally recorded the largest tree’s DBH.

            To quantify foliage height diversity, we calculated the Shannon Diversity Index of the proportion of points with vegetation present in each of the five height bands for each segment (n = 70 segments). To reduce redundancy and minimize correlation between variables, we (separately) ordinated our tree DBH and understory plant density data using principal component analysis (PCA: McGarigal et al. 2000) for each 100-meter transect segment. We column (Z score) standardized data prior to ordination to account for differences in the units of measurement and used the covariance matrix to run the PCA. The principal components were interpreted using the significance of the principal component loadings. The PCA was run in R (version 3.5.1) using the princomp function in the stats package. The Shannon Index was calculated using the diversity function of the vegan package (Oksanen et al. 2019). The results of the vegetation data ordination are reported in the Supplementary Materials for Jones and Robinson (2020). 
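As a rough illustration of the foliage height diversity calculation, the Python sketch below computes a Shannon index from the proportion of points with vegetation in each of the five height bands. It is a stand-in for, not a copy of, the vegan diversity function: it assumes the five proportions are rescaled to sum to one before the index is taken and that natural logarithms are used. The example proportions are hypothetical.

```python
import math

def shannon_index(values):
    """Shannon diversity H = -sum(p_i * ln p_i), with p_i = value_i / sum(values)."""
    total = sum(values)
    if total == 0:
        return 0.0  # no vegetation recorded in any band
    h = 0.0
    for v in values:
        if v > 0:  # bands with no vegetation contribute nothing
            p = v / total
            h -= p * math.log(p)
    return h

# Hypothetical segment: proportion of the 10 points with vegetation in each band
# (<0.5 m, >0.5-3 m, >3-10 m, >10-20 m, >20 m)
proportions = [1.0, 0.9, 0.7, 0.4, 0.0]
print(shannon_index(proportions))

# Vegetation equally frequent in all five bands gives the maximum, ln(5)
print(shannon_index([0.5] * 5), math.log(5))
```

The index is highest when vegetation is equally frequent in all height bands and drops as vegetation becomes concentrated in fewer bands.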

Calculation and Standardization of Functional and Phylogenetic Diversity Metrics

            We define functional diversity as the diversity of foraging niches and behaviors present within a given flock. We calculated two measures of functional diversity: functional richness (Villéger et al. 2008) and functional dispersion (Laliberté and Legendre 2010). These are multivariate measures calculated using a distance framework from a matrix of quantitative and categorical traits of all species observed (Laliberté and Legendre 2010). We built a trait matrix for all species observed in flocks based on information from the Handbook of the Birds of the World Alive website (del Hoyo et al. 2020; also uploaded on Dryad), supplemented with family-specific references where appropriate (full reference list provided in Supplementary Materials of Jones and Robinson [2020]). We included body mass, diet (degree of insectivory, frugivory, nectivory, and granivory), foraging maneuvers, foraging substrates, foraging strata (ground, understory, midstory, sub-canopy, and canopy), and habitat preferences (forest interior, forest edge, secondary forest, open habitat) as functional traits. Categories were not mutually exclusive; species’ diet preferences were classified on a 0 to 3 scale based on reported frequency of consumption, while use of foraging maneuvers, substrates, strata and habitat types were keyed as present (1) or absent (0). Foraging maneuvers and substrates were classified according to the Remsen and Robinson (1990) typology. Functional diversity metrics were calculated from a matrix of species-by-flock abundances using the dbFD function of the FD package (Laliberté et al. 2014). Functional dispersion was weighted by the abundance of each species in each flock. We standardized functional richness by the ‘global’ functional richness to constrain it between 0 and 1.

            We also selected phylogenetic diversity as a measure of the quantity of phylogenetic differences present in each flock. This is measured as the summed branch lengths of all species present on a phylogeny of all species observed (Faith 1992). The divergence of phylogenetic lineages present in each flock was measured using the mean pairwise distance measure, which calculates the mean branch lengths between all species pairs in a given flock (Webb et al. 2002). We created a phylogeny of all species observed in flocks by sub-setting the global Jetz (2012) phylogeny. We downloaded 1000 trees from the Bird Tree website using the Hackett (2008) backbone phylogeny. We calculated a 50% majority-rule consensus tree using mean edge lengths with the consensus.edges function of the phytools package (Revell 2019); the observed metrics were then calculated using the pd and mpd functions of the picante package (Kembel et al. 2019), using the same species-by-flock abundance matrix as for the functional diversity measures.
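The mean pairwise distance measure can be sketched in a few lines of Python. This is not the picante implementation: it assumes an unweighted (presence-absence) calculation, and the four species labels and their pairwise branch-length distances are hypothetical values standing in for distances read from a real phylogeny.

```python
from itertools import combinations

def mean_pairwise_distance(flock_species, distance):
    """Mean phylogenetic distance over all species pairs present in one flock."""
    pairs = list(combinations(sorted(flock_species), 2))
    if not pairs:
        return float("nan")  # undefined for flocks with fewer than two species
    return sum(distance[frozenset(pair)] for pair in pairs) / len(pairs)

# Hypothetical pairwise distances (summed branch lengths between two tips)
distance = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}

print(mean_pairwise_distance({"A", "B", "C"}, distance))  # (2 + 6 + 6) / 3
print(mean_pairwise_distance({"A", "D"}, distance))       # 10.0
```

A flock of close relatives (A, B, C) scores lower than a flock that includes the distantly related species D, which is the sense in which the measure captures lineage divergence.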

            Because measures of functional and phylogenetic diversity are often correlated with species richness (Vellend et al. 2014; Weiher 2014), we used standardized effect sizes (SES) to control for species richness when calculating these metrics. To create a null model of functional and phylogenetic diversity we used 999 iterations of the tip swapping method (Webb et al. 2008) to randomly shuffle taxa labels across the tips of the functional trait matrix and flocking species phylogeny, respectively. We then calculated the relevant diversity metrics on each of the null communities and compared them to the observed value. We calculated SES values for each metric by subtracting the mean of the null values from the observed value and then dividing this difference by the standard deviation of the null values. We used the ses.pd and ses.mpd functions of the picante package to calculate phylogenetic diversity SES values; functional diversity SES values were calculated based on novel code developed by Dr. Ben Baiser (all R code available as an electronic supplement in Jones and Robinson [2020]).
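The SES calculation described above can be sketched in Python as follows. This is neither the picante code nor the functional diversity code referenced above: the diversity metric is a deliberately simple stand-in (the range of a single trait), the species pool and body masses are hypothetical, and the sample standard deviation is assumed for the null distribution.

```python
import random
import statistics

def standardized_effect_size(observed, null_values):
    """SES = (observed - mean of null values) / standard deviation of null values."""
    return (observed - statistics.mean(null_values)) / statistics.stdev(null_values)

def trait_range(flock, trait):
    """Toy diversity metric: spread of one trait among the species in a flock."""
    values = [trait[sp] for sp in flock]
    return max(values) - min(values)

def null_metrics(flock, trait, n_iter=999, seed=1):
    """Recompute the metric after shuffling species labels across trait values."""
    rng = random.Random(seed)
    species = list(trait)
    values = [trait[sp] for sp in species]
    results = []
    for _ in range(n_iter):
        rng.shuffle(values)  # reassign trait values to species labels at random
        shuffled = dict(zip(species, values))
        results.append(trait_range(flock, shuffled))
    return results

# Hypothetical body masses (g) for a six-species pool and one three-species flock
trait = {"sp1": 8.0, "sp2": 11.0, "sp3": 15.0, "sp4": 22.0, "sp5": 40.0, "sp6": 95.0}
flock = ["sp1", "sp2", "sp3"]

observed = trait_range(flock, trait)
ses = standardized_effect_size(observed, null_metrics(flock, trait))
print(observed, ses)
```

Because the shuffling keeps the number of species in the flock fixed, a negative SES indicates a flock that is less diverse than expected for its richness, and a positive SES one that is more diverse.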

Usage notes

Variable Descriptions 

Flock_ID: A unique identifier for each flock composition. 

Julian_Date: The Julian (numeric) date on which the flock was observed. 

Season: Indicates whether the flock was observed during the boreal summer sampling period (0; June-August 2017) or the boreal winter sampling period (1; January-March 2018). 

Site_Segment: Indicates the name of the transect on which the flock was observed followed by the 100-meter transect segment (1-5 for each transect). Where no value is given the flock was observed off of the transect but within 100 meters of the transect; these compositions were not used for analyses. Site characteristics and coordinates for each transect are described in Table 1, Jones and Robinson (2020).

Transect: The name of the transect that the mixed-species flock was observed on or near. 

Site: The number of the forest fragment in which the flock was observed. Some fragments contained multiple transects. Site numbers are ordered by increasing area; site 9 corresponds to the continuous forest reference site (RNC Cerro El Inglés). Site characteristics are listed in Table 1, Jones and Robinson (2020). 

Minute and Minute_2: Numeric minute and minute squared of the day at which the flock was first detected. Zero corresponds to midnight. 
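A minimal Python sketch of how these two variables relate to a clock time is given below. The "HH:MM" input format is a hypothetical convenience and not a column in the data set; the squared term is presumably provided so that a quadratic time-of-day effect can be modeled.

```python
def minute_of_day(hhmm):
    """Convert a 24-hour 'HH:MM' string to minutes elapsed since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

# A flock first detected at 07:30, the start of the morning survey window
minute = minute_of_day("07:30")
minute_2 = minute ** 2
print(minute, minute_2)  # 450 202500
```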

Chlorospingus: This variable indicates whether Chlorospingus canigularis, a nuclear species in this Andean flocking system, was present in the flock composition. 

Species: The species richness of the flock, i.e., the total number of species observed in the flock during the survey. 

Size: The total number of individuals of all bird species observed in the flock during the survey. 

Furnariid_Sp: The total observed species richness of ovenbirds (Furnariidae) in the flock during the survey. 

Tyrannid_Sp: The total observed species richness of tyrant flycatchers (Tyrannidae) in the flock during the survey. 

Migrant_Sp: The total observed species richness of boreal migrant species observed in the flock during the survey. 

Thraupid_Sp: The total observed species richness of tanager (Thraupidae) species observed in the flock during the survey. 

SESFRic: The standardized effect size of the functional richness index of functional diversity present in the flock. 

SESFDis: The standardized effect size of the functional dispersion index of functional diversity present in the flock. 

SESPD: The standardized effect size of the PD measure of phylogenetic diversity present in the flock. 

SESMPD: The standardized effect size of the mean pairwise distance measure of phylogenetic diversity present in the flock. 

Dist_Edge: The straight-line distance in meters from the midpoint of the transect segment in which the flock was observed to the nearest fragment edge. 

1km_forest: The percentage of forest land use in a 1 km buffer centered along the entire length of the transect. This variable was calculated at the transect (and not transect segment) level. 

1km_edge: The edge density, calculated as the distance of forest edge in meters divided by the total area in hectares, for a 1 km buffer centered along the entire length of the transect. Calculated at the transect level. 

Canopy_Cover: The average canopy cover calculated using a canopy densiometer at ten points for each transect segment. 

Understory_Density: A principal component axis measuring the density of understory shrubs, ferns, vines, tree ferns, and palms, as measured using belt transects, for each transect segment. More negative values correspond to higher densities of each category of plant. 

Tree_Size: A principal component axis measuring the density of large-diameter trees, as measured by belt transects, for each transect segment. More positive values correspond to greater densities of large-diameter trees.  

Vertical_Sturcture: A measure of vertical vegetation complexity. We calculated the proportion of ten point surveys that contained vegetation in five height bands for each transect segment. We then used these proportion data to calculate the Shannon Diversity Index of vegetation complexity for each transect segment. This variable was closely correlated with canopy height at our sites.  

Species abundances: The observed abundance of each species in the flock. Species names use the Latin binomial. 


Tinker Foundation

Animal Behavior Society