Juvenile alewives (Alosa pseudoharengus): Size, growth, and date of emigration from natal lake
Data files
Mar 05, 2026 version files 113.64 KB
- agesclean.csv (6.97 KB)
- agesraw.csv (11.78 KB)
- Figure_1_Code.R (3.05 KB)
- Figure_2_Code.R (3.74 KB)
- Figure_3_Code.R (3.42 KB)
- Figure_4_Code.R (4.56 KB)
- growthrates.csv (1.68 KB)
- lengthdata.csv (8.16 KB)
- masslengthagedata.csv (9.77 KB)
- masslengthdata.csv (10.70 KB)
- README.md (16.10 KB)
- Supp_Fig_1_Code.R (5.92 KB)
- Supp_Table_1_Code.R (5.57 KB)
- Supp_Table_2_Code.R (3.26 KB)
- Supp_Table_3_Code.R (2.79 KB)
- Supp_Table_4_Code.R (3.59 KB)
- Supp_Table_5_Code.R (4.57 KB)
- Supp_Table_6_Code.R (3.50 KB)
- Supp_Table_7_Code.R (4.50 KB)
Abstract
This dataset supports an investigation into how the growth of young alewives (Alosa pseudoharengus) is affected when they are unable to migrate from their natal lake to the sea. Fish were sampled at the outlet of Bride Lake in East Lyme, Connecticut, USA, during a non-drought year and a drought year in which migration was blocked because the exit stream dried up for a prolonged period. Data taken on individuals include standard length and dry mass, daily age from earstones (otoliths), and measurements of daily growth marks on otoliths that permit estimation of daily growth over various periods after hatch. These data provide insight into within- and among-year variability in the attributes of migrating individuals. Migrants are a non-random subset of the population, and their features (size, growth rate) influence migration performance and hence prospects for subsequent survival and growth. Variability in these features underscores the complex rules governing the decision to migrate and will help to inform management and conservation of this species of concern.
This dataset contains morphometric measurements, age estimates, and otolith-based growth calculations for juvenile alewives (Alosa pseudoharengus) collected from Bride Brook, Connecticut, during a non-drought year (2021) and a drought year (2022). The data support analyses examining how drought-induced loss of estuarine connectivity affected fish size, growth, body condition, and size-at-age. Age estimation was performed using otoliths with validation through multiple independent readers. The datasets enable reproduction of all tables and figures in Burgess et al. (manuscript in review), which documents significant negative effects of drought on juvenile alewife phenology, growth rates, and body condition.
We have submitted six datasets (agesclean.csv, agesraw.csv, lengthdata.csv, masslengthdata.csv, masslengthagedata.csv, growthrates.csv) and associated R scripts for reproducing all analyses. Dataset filenames indicate their use in specific tables and figures within the manuscript.
Description of the Data and File Structure
agesclean.csv
This dataset contains quality-controlled age estimates from multiple readers, used for age validation statistics in Supplementary Table 1. Only fish with high-quality, readable otoliths and age readings within 10 % agreement of the expert reader are included. All ages have been corrected by adding 2 days to account for when daily growth increments are not yet visible in otoliths.
Variables:
- Sample: Collection date in MM_DD_YY format (e.g., 06_09_21 = June 9, 2021)
- Expert.Count: Age estimate in days from the expert reader (primary age determination); represents the number of daily growth increments counted in the otolith plus 2-day correction for the pre-feeding period
- Expert.Count.2: Age estimate in days from the expert reader's second independent reading of the same otolith (used to assess intra-reader repeatability); includes 2-day correction; blank cells indicate the otolith was not re-read by the expert or the reading was excluded due to > 10 % disagreement with the primary expert count
- Stu.Count.1: Age estimate in days from Student Reader 1; includes 2-day correction; blank cells indicate the otolith was excluded due to > 10 % disagreement with the expert count
- Stu.Count.2: Age estimate in days from Student Reader 2; includes 2-day correction; blank cells indicate the otolith was excluded due to > 10 % disagreement with the expert count
- Mean.Age: Mean age across all available quality-controlled readings for an individual fish (days); used to back-calculate hatch dates and for subsequent age-based analyses
Missing data codes: blank cells indicate that a reader did not examine the otolith, a second reading was not performed, or a reading was excluded from the mean calculation due to > 10 % disagreement with the expert reader.
Sample size: n = 303 fish
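The filtering rule described above can be illustrated with a minimal base-R sketch. It uses made-up readings rather than the deposited data, covers only three of the reading columns, and assumes that disagreement is measured as a fraction of the expert count; the deposited scripts may define it differently. The helper name flag_disagreement is hypothetical.

```r
# Hypothetical readings for three otoliths (days, 2-day correction included).
ages <- data.frame(
  Expert.Count = c(50, 60, 80),
  Stu.Count.1  = c(52, 70, 79),
  Stu.Count.2  = c(49, 61, 95)
)

# Blank (NA) a reading that differs from the expert count by more than `tol`.
# ASSUMPTION: disagreement is expressed relative to the expert count.
flag_disagreement <- function(reading, expert, tol = 0.10) {
  ifelse(abs(reading - expert) / expert > tol, NA_real_, reading)
}

ages$Stu.Count.1 <- flag_disagreement(ages$Stu.Count.1, ages$Expert.Count)
ages$Stu.Count.2 <- flag_disagreement(ages$Stu.Count.2, ages$Expert.Count)

# Mean across the readings that remain for each fish.
ages$Mean.Age <- rowMeans(
  ages[, c("Expert.Count", "Stu.Count.1", "Stu.Count.2")],
  na.rm = TRUE
)
print(ages)
```

In this toy example the second fish loses its first student reading and the third fish loses its second, so their means are taken over two readings instead of three.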
agesraw.csv
This dataset contains all unfiltered age readings from the initial aging, including both readable and unreadable otoliths, and all student readings regardless of agreement level. This complete dataset is used in Supplementary Figure 1 to visualize age bias patterns before quality filtering. Ages in this dataset include the 2-day correction applied in agesclean.csv.
Variables:
- Site: Collection location; "Bride" indicates Bride Brook, Old Lyme, Connecticut
- Count.1: Age estimate in days from the expert reader's first reading
- Rep.Count.2: Age estimate in days from Student Reader 1; blank cells indicate the otolith was unreadable by this reader
- Rep.Count.3: Age estimate in days from Student Reader 2; blank cells indicate the otolith was unreadable by this reader
- Countable: Otolith quality classification (1 = readable/countable otolith with clear daily increments; 0 = poor quality, unreadable otolith with obscured or indistinct increments)
Missing data codes: blank cells indicate that an otolith could not be read by that reader due to poor quality (e.g., crystallized, cracked, or unclear increment structure).
Sample size: n = 493 fish (includes both readable and unreadable otoliths)
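As a rough illustration of how the raw readings might be summarized, the sketch below computes a mean reader bias and a per-fish coefficient of variation in base R. The deposited scripts use the FSA package for these statistics; this is not their implementation. The readings are invented and only the column names come from the variable list above.

```r
# Hypothetical raw readings; NA marks an otolith a reader could not read.
raw <- data.frame(
  Count.1     = c(50, 60, 80, 40),
  Rep.Count.2 = c(52, 66, 79, NA),
  Rep.Count.3 = c(48, 60, 84, 45),
  Countable   = c(1, 1, 1, 0)
)

# Keep readable otoliths only (Countable == 1).
readable <- raw[raw$Countable == 1, ]

# Mean signed difference between Student Reader 1 and the expert (bias).
bias_reader1 <- mean(readable$Rep.Count.2 - readable$Count.1, na.rm = TRUE)

# Per-fish coefficient of variation across readers: 100 * SD / mean.
reads <- as.matrix(readable[, c("Count.1", "Rep.Count.2", "Rep.Count.3")])
cv_per_fish <- 100 * apply(reads, 1, sd, na.rm = TRUE) /
  rowMeans(reads, na.rm = TRUE)
mean_cv <- mean(cv_per_fish)
```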
lengthdata.csv
This dataset contains collection dates and total length measurements for all juvenile alewives used in seasonal growth analyses (Supplementary Tables 2-3, Figure 1). These data examine temporal patterns in body size during the growing season.
Variables:
- Date: Collection date in M/D/YY format (e.g., 6/9/21 = June 9, 2021)
- Site: Collection location; "Bride" indicates Bride Brook, Old Lyme, Connecticut
- Length.TL.: Total length measured from tip of snout to tip of caudal fin (tail) in millimeters; measured to nearest 0.1 mm using digital calipers on freshly caught specimens
Missing data codes: No missing data; all fish have complete date and length information.
Sample size: n = 429 fish
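The date formats used in these files can be parsed with base R alone, as in the sketch below. It assumes that the "Julian date" used in the analyses means day of year; the deposited scripts use lubridate and may derive it differently. The example dates are the ones quoted in this README.

```r
# Collection dates in the M/D/YY format used by lengthdata.csv.
dates_chr <- c("6/9/21", "10/5/21", "10/7/22")

# %y maps two-digit years 00-68 to 20xx, so "21" becomes 2021.
dates <- as.Date(dates_chr, format = "%m/%d/%y")

# Day of year (1-366). ASSUMPTION: this is what "Julian date" refers to.
julian_day <- as.integer(format(dates, "%j"))
year <- factor(format(dates, "%Y"))

# The Sample column of agesclean.csv uses MM_DD_YY instead.
sample_date <- as.Date("06_09_21", format = "%m_%d_%y")
```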
masslengthdata.csv
This dataset contains total length and dry mass measurements for fish used in body condition analyses (Supplementary Tables 4-5, Figure 2). Body condition was assessed through mass-at-length relationships and residuals from allometric scaling equations.
Variables:
- Date: Collection date in M/D/YY format
- Site: Collection location; "Bride" indicates Bride Brook, Old Lyme, Connecticut
- Length.TL.: Total length measured from tip of snout to tip of caudal fin (tail) in millimeters; measured to nearest 0.1 mm
- Dry.Mass.Net.: Dry body mass in grams after drying at 60°C for 48+ hours; measured to nearest 0.001 g
Missing data codes: No missing data; all fish have complete measurements.
Sample size: n = 429 fish
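One common way to obtain a residual-based condition index from these two variables is sketched below. It assumes a log10 transformation and a simple linear fit. The values are invented, and the deposited scripts (Supp_Table_4_Code.R, Supp_Table_5_Code.R) may differ in the transformation and covariates used.

```r
# Hypothetical lengths (mm) and dry masses (g); not values from the dataset.
fish <- data.frame(
  Length.TL.    = c(30.1, 35.4, 41.0, 46.8, 52.3, 58.9),
  Dry.Mass.Net. = c(0.041, 0.070, 0.112, 0.171, 0.236, 0.350)
)

# Allometric scaling: mass = a * length^b, which is linear on the log-log scale.
fit <- lm(log10(Dry.Mass.Net.) ~ log10(Length.TL.), data = fish)

# Residuals as a condition index: positive means heavier than expected
# for a fish of that length.
fish$condition <- resid(fit)

# Estimated scaling exponent b.
b_hat <- unname(coef(fit)[2])
```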
masslengthagedata.csv
This dataset combines morphometric measurements with otolith-derived age estimates, enabling age-adjusted growth analyses and examination of size-at-age patterns (Supplementary Tables 6-7, Figure 3).
Variables:
- Date: Collection date in M/D/YY format
- Site: Collection location; "Bride" indicates Bride Brook, Old Lyme, Connecticut
- Age(Days): Fish age in days, estimated from otolith daily increment counts that passed quality control tests; includes 2-day correction; blank cells indicate age could not be determined due to poor otolith quality
- Length(TL): Total length measured from tip of snout to tip of caudal fin (tail) in millimeters; measured to nearest 0.1 mm
Missing data codes: blank cells in Age(Days) indicate that age estimation was not possible for that specimen due to poor otolith readability or disagreement among readers.
Sample size: n = 434 total fish; n = 249 fish with valid age estimates (excluding blank cells). Ages from fish that were not captured at Bride Lake are excluded (n = 54).
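Column names containing parentheses are altered by R's default CSV import, which can be a source of confusion when matching this description to a loaded data frame. The sketch below assumes the file header is written exactly as in the variable list above and uses a few invented rows in place of the real file.

```r
# Inline stand-in for the first rows of masslengthagedata.csv (made-up values).
csv_text <- "Date,Site,Age(Days),Length(TL)
6/9/21,Bride,,31.2
6/9/21,Bride,45,33.8
10/5/21,Bride,120,61.5"

# Default import: names pass through make.names(), so parentheses become dots.
d_default <- read.csv(text = csv_text)
names(d_default)  # "Date" "Site" "Age.Days." "Length.TL."

# check.names = FALSE keeps the header exactly as written in the file.
d_exact <- read.csv(text = csv_text, check.names = FALSE)

# Blank age cells are read as NA; count the fish with a usable age.
n_aged <- sum(!is.na(d_default$Age.Days.))
```

With check.names = FALSE, such columns must be accessed with backticks or double brackets, for example d_exact[["Age(Days)"]].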
growthrates.csv
This dataset contains otolith radius measurements used for back-calculating growth during specific life history periods using the biological intercept method (Campana and Jones 1992). Growth (in mm) was estimated for three time periods: (1) the first 30 days post-hatch, (2) the first 100 days post-hatch, and (3) days 70-100 post-hatch (a 30-day period when one cohort experienced drought conditions while the other did not). Otolith measurements were taken along a transect at -135° from the rostrum. Fish from 2021 were collected on Oct 5th and fish from 2022 were collected on Oct 7th. We chose these two groups for comparison because of their similar calendar dates of collection and the difference in connectivity between cohorts.
Variables:
- year: Year of fish collection (2021 = non-drought year; 2022 = drought year)
- radius: Total otolith radius from primordium (core) to edge in micrometers (μm); measured along the -135° transect from the rostrum on digital images
- measfirst30: Fish growth at the 30th daily increment from the core in millimeters; represents fish size at 30 days post-hatch
- measlast30: Fish growth at 30 daily increments between day 70 and day 100; represents fish growth from day 70 to 100 (Period of no drought in 2021, period of drought in 2022)
- meas100: Otolith radius at 100 daily increments from the core; represents otolith size 100 days post-hatch
- tl: Total length at capture measured from tip of snout to tip of caudal fin in millimeters
Back-calculation method: Growth increments (amount fish grew in mm during each period) were back-calculated using the biological intercept method (Campana and Jones 1992) with the following formula:
Li = Lc + (Si - Sc) × (Lc - a) / (Sc - b)
where:
- Li = fish length at age i (the age of interest)
- Lc = fish length at capture (tl)
- Si = otolith radius at age i (measfirst30, measlast30, or meas100)
- Sc = otolith radius at capture (radius)
- a = 4.5 mm (fish length at the biological intercept, the point at which fish length and otolith radius become linearly related)
- b = 9.25 μm (otolith radius at the biological intercept)
Time periods analyzed:
- First 30 days: Growth from hatch to day 30 (Li calculated using measfirst30)
- First 100 days: Growth from hatch to day 100 (Li calculated using meas100)
- Days 70-100: Growth from day 70 to day 100; based on back-calculated hatch dates, the 2021 cohort was not experiencing broken connectivity from drought during this period, whereas the 2022 cohort was.
Missing data codes: No missing data; all fish selected for otolith measurement had complete measurements.
Sample size: n = 40 fish (n = 20 from 2021, n = 20 from 2022)
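The back-calculation formula given above translates directly into a small R function. The sketch below is only a transcription of that formula with the stated values of a and b as defaults. The function name and the example fish are hypothetical, and the mapping of specific columns to Si is left to the deposited Figure_4_Code.R.

```r
# Biological intercept back-calculation, transcribed from the formula above.
# a (mm) and b (micrometers) default to the values stated in this README.
back_calc_length <- function(Si, Sc, Lc, a = 4.5, b = 9.25) {
  Lc + (Si - Sc) * (Lc - a) / (Sc - b)
}

# Hypothetical fish: 60 mm TL and a 400 um otolith radius at capture,
# with radii of 250 um and 310 um at daily increments 70 and 100.
L70  <- back_calc_length(Si = 250, Sc = 400, Lc = 60)
L100 <- back_calc_length(Si = 310, Sc = 400, Lc = 60)

# Growth over days 70-100 is the difference between the two lengths.
growth_70_100 <- L100 - L70
```

Two properties make useful sanity checks: the function returns Lc when Si equals Sc, and it returns a when Si equals b.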
Relationships Between Files and Missing Data
The six datasets are organized by analytical purpose:
- agesraw.csv (n = 493) contains the complete, unfiltered age dataset including poor-quality otoliths and all student readings regardless of agreement. Used for visualizing age bias patterns in Supplementary Figure 1 (panels showing both clean and unclean data).
- agesclean.csv (n = 303) is the quality-controlled subset of agesraw.csv, retaining only readable otoliths (Countable = 1) and age readings within 10 % agreement of the expert reader. All ages include the 2-day correction. Used for age validation statistics in Supplementary Table 1.
- lengthdata.csv (n = 429) contains basic morphometric data (date, site, length) for all fish used in seasonal length analyses (Supplementary Tables 2-3, Figure 1).
- masslengthdata.csv (n = 429) contains the same fish as lengthdata.csv but includes the additional dry mass variable for body condition analyses (Supplementary Tables 4-5, Figure 2).
- masslengthagedata.csv (n = 434, with 249 aged fish) combines morphometric data with age estimates for age-adjusted growth analyses (Supplementary Tables 6-7, Figure 3).
- growthrates.csv (n = 40) is a specialized subset containing detailed otolith measurements for back-calculation of growth rates during specific life periods (Figure 4).
Missing data conventions across all files: blank cells indicate missing or excluded data. Specific reasons for blank cells are detailed in each dataset description above.
Abbreviations used:
- TL = Total Length
- blank cells = Not Available (missing or excluded data)
- μm = micrometers
- mm = millimeters
- g = grams
Data Collection Methods
Study location: Bride Brook, Old Lyme, Connecticut, USA; a small coastal stream connecting Bride Lake to Long Island Sound. Some ages used to produce Supplementary Figure 1 and Supplementary Table 1 came from samples collected in Lake Rockview, Old Saybrook, Connecticut, USA and Rogers Lake, Old Lyme, Connecticut, USA. These samples were included in the age bias analysis and CV calculation because they were used to test reader bias and precision. They were not included in the age or growth rate analyses and are not otherwise used in this manuscript.
Study period and context:
- 2021 (non-drought reference year)
- 2022 (severe drought year): characterized by extremely low precipitation and stream discharge that prevented juvenile alewives from migrating to the estuary for 5 months.
Required R packages and their uses:
- readr (version 2.1.0 or higher): Data import
- dplyr (version 1.0.0 or higher): Data manipulation and filtering
- ggplot2 (version 3.3.0 or higher): Statistical graphics and visualization
- lubridate (version 1.8.0 or higher): Date parsing and manipulation
- janitor (version 2.1.0 or higher): Data cleaning and column name standardization
- car (version 3.0.0 or higher): Type-II ANCOVA and model diagnostics
- broom (version 0.7.0 or higher): Converting statistical models to tidy data frames
- FSA (version 0.9.0 or higher): Fisheries stock assessment functions for age bias and precision analyses
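The minimum versions listed above can be checked before running any script. The sketch below is one possible way to do so; check_packages is a hypothetical helper and is not part of the deposited scripts.

```r
# Minimum versions as listed in this README.
required <- c(readr = "2.1.0", dplyr = "1.0.0", ggplot2 = "3.3.0",
              lubridate = "1.8.0", janitor = "2.1.0", car = "3.0.0",
              broom = "0.7.0", FSA = "0.9.0")

# Hypothetical helper: reports whether each package is installed and whether
# the installed version meets the stated minimum.
check_packages <- function(min_versions) {
  pkgs <- names(min_versions)
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  ok <- installed
  for (p in pkgs[installed]) {
    ok[p] <- utils::packageVersion(p) >= min_versions[[p]]
  }
  data.frame(package = pkgs, installed = installed, meets_minimum = ok,
             row.names = NULL)
}

print(check_packages(required))
```

Any row with installed or meets_minimum equal to FALSE points to a package that should be installed or updated with install.packages().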
Script organization and workflow:
Scripts follow the naming convention [Type]_[Number]_Code.R where Type indicates the output category (Figure, Supp_Table, Supp_Fig) and Number indicates the specific table or figure in the manuscript.
Each script is self-contained and includes:
- Header documentation (purpose, author, date, inputs, outputs)
- Package loading
- Data import and preparation
- Statistical analyses
- Table/figure generation
- Session information output (R version, platform, loaded package versions)
Supplementary Table scripts:
- Supp_Table_1_Code.R: Age estimation validation and inter-reader precision metrics using FSA package
- Supp_Table_2_Code.R: Type-II ANCOVA of total length vs. Julian date × year (tests for differences in seasonal length patterns)
- Supp_Table_3_Code.R: Year-specific linear regressions of length vs. Julian date
- Supp_Table_4_Code.R: Type-II ANCOVA of log-transformed body mass vs. log-length + Julian date × year (body condition analysis)
- Supp_Table_5_Code.R: Overall mass-length scaling relationship and year-specific regressions of mass residuals vs. length
- Supp_Table_6_Code.R: Type-II ANCOVA of length vs. age + Julian date × year (age-adjusted growth analysis)
- Supp_Table_7_Code.R: Overall length-age relationship and year-specific regressions of length residuals vs. Julian date
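For readers unfamiliar with the model structure named above, the sketch below fits a length vs. Julian date × year ANCOVA to simulated data and, if the car package is available, prints Type-II tests. It is an assumption-laden illustration of the general approach, not a copy of Supp_Table_2_Code.R or Supp_Table_3_Code.R. The data and effect sizes are invented.

```r
# Simulated data: total length (mm) by day of year for two cohorts.
set.seed(1)
d <- data.frame(
  julian = rep(seq(160, 280, by = 20), times = 2),
  year   = factor(rep(c(2021, 2022), each = 7))
)
d$length <- 20 + 0.15 * d$julian - 3 * (d$year == "2022") +
  rnorm(nrow(d), sd = 1)

# ANCOVA with an interaction: does the length-date slope differ between years?
fit <- lm(length ~ julian * year, data = d)

# Type-II tests via car::Anova(), only if the car package is installed.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(fit, type = 2))
}

# Year-specific regressions of length on Julian date.
by_year <- lapply(split(d, d$year), function(x) {
  coef(lm(length ~ julian, data = x))
})
```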
Main figure scripts:
- Figure_1_Code.R: Seasonal patterns in mean total length with drought period marked (line plot with error bars)
- Figure_2_Code.R: Seasonal patterns in body condition (mass residuals) with drought period marked
- Figure_3_Code.R: Seasonal patterns in size-at-age (length residuals from length-age relationship)
- Figure_4_Code.R: Otolith-based back-calculated growth rates for three time periods (produces multi-panel violin/box plots)
Supplementary figure scripts:
- Supp_Fig_1_Code.R: Age bias plots comparing expert and student readers, showing both unfiltered (agesraw.csv) and quality-controlled (agesclean.csv) datasets
Usage instructions:
- Set working directory to the location containing data files and scripts:
setwd("path/to/data/directory")
- Verify working directory:
getwd()
- Run desired script:
source("Figure_1_Code.R")
Scripts will automatically:
- Load required packages (install them first if needed using install.packages())
- Import the appropriate data file(s)
- Perform analyses
- Print session information to console for reproducibility documentation
Troubleshooting:
If scripts fail to run:
- Ensure all required packages are installed:
install.packages(c("readr", "dplyr", "ggplot2", "lubridate", "janitor", "car", "broom", "FSA"))
- Verify data files are in the working directory:
list.files()
- Check R version (should be 4.0 or higher):
R.version.string
- Review session information output at the end of each script for package version details
Date parsing may generate warnings from lubridate - these are normal and can be ignored as the function tries multiple date formats automatically.
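The file check in the troubleshooting steps above can also be automated. The sketch below compares the working directory against the six data files named in this README; missing_data_files is a hypothetical helper and not part of the deposited scripts.

```r
# Data files named in this README.
data_files <- c("agesclean.csv", "agesraw.csv", "lengthdata.csv",
                "masslengthdata.csv", "masslengthagedata.csv",
                "growthrates.csv")

# Hypothetical helper: returns the names of expected files absent from `dir`.
missing_data_files <- function(dir = getwd(), files = data_files) {
  files[!file.exists(file.path(dir, files))]
}

absent <- missing_data_files()
if (length(absent) > 0) {
  message("Missing from working directory: ", paste(absent, collapse = ", "))
}
```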
Contact Information
For questions about this dataset or code, please contact:
Michael Burgess
Email: michaelburgess89@gmail.com
Juvenile Alewife were sampled at the outlet of Bride Lake in East Lyme, Connecticut, during a non-drought year and a prolonged drought year. We measured standard length and dry mass, determined daily age from earstones (otoliths), quantified condition as the residual from the overall relationship between mass and length, and estimated daily growth over the lifetime as well as in stanzas from 0 to 30 days, 0 to 100 days, and 70 to 100 days post hatch from the width of daily growth marks in the otolith.
