Data from: Collective movement increases initial accuracy and path efficiency in talitrid amphipod orientation
Data files
Apr 30, 2026 version files: 292.80 GB
- Averaged_GS1_Values.csv (373 B)
- Bootstrapped_Underlying_Dist_for_GS10.csv (2.78 MB)
- Bootstrapped_Underlying_Dist_for_GS5.csv (2.78 MB)
- GS1_Raw_DLTdv8a_output.zip (2.63 MB)
- GS1_Videos.zip (118.34 GB)
- GS10_Raw_DLTdv8a_output.zip (5 MB)
- GS10_Videos.zip (174.37 GB)
- GS5_Raw_DLTdv8a_output.zip (3.25 MB)
- GSall.csv (257.33 KB)
- GSalltracks.csv (68.35 MB)
- README.md (20.40 KB)
Abstract
Talitrid amphipods are an extensively studied system for navigation due to their robust ability to navigate back to the optimal burrowing zone after foraging, and could be a model system in which to study the impacts of collective behavior on short-distance navigation and orientation. We investigated whether talitrid amphipods (Megalorchestia pugettensis) differ in their orientation abilities when released individually versus in a group. When released individually, the amphipods took longer to start moving (p<0.001), traveled longer paths (p=0.003), moved faster (p=0.016), had a different initial bearing (p=0.003), and exhibited more spread in their initial bearing (p=0.009) than when released in groups. There was no difference between individuals and groups in terms of their trial time, nor in the direction or spread of their final orientation. This study introduces a tractable invertebrate species in which to study the impacts of collective movement and reveals previously unexamined differences in orientation between talitrid amphipods released individually and those released in a group, differences that have implications for experimental design in this system.
Dataset DOI: 10.5061/dryad.7m0cfxq3x
Description of the data and file structure
The code is available on Zenodo. The R Markdown document "Main Code.rmd", along with its knitted output "Main-Code.html", contains all the code needed to recreate the analyses in the main text and supplement of the associated paper. In addition to the code for the figures and tables, it also includes:
- Loading of all of the trial data created in "Code to Load and Wrangle Data.rmd" (described below), along with detailed information about each column
- Loading of the data from the non-parametric permutation test (i.e. "bootstrapped data"), which avoids the time needed to re-run the bootstrap. These files are generated in the R Markdown document "Nonparametric Bootstrap Analysis.rmd" and are deposited here as "Bootstrapped_Underlying_Dist_for_GS5.csv", "Bootstrapped_Underlying_Dist_for_GS10.csv", and "Averaged_GS1_Values.csv"
- Sourcing of the file "Circular Plots Function.R", which contains a function to create the circular plots in Figure 2A
- Sourcing of the files "Violin Plots with Bootstrap pvalue.R" and "Circular Violin Plots with Bootstrap pvalue.R", which contain functions to create the violin plots (for linear and circular variables, respectively) in Figure 3
- Sourcing of the files "Bootstrap Histogram Plot Function.R" and "Circular Bootstrap Histogram Plot Function.R", which contain functions to create the histogram plots (for linear and circular variables, respectively) in Figures S6 and S7
- Sourcing of the file "Mixed Effect Model Function.R", which contains a function to run the mixed-effects models associated with Table S2 and extract the relevant variables of interest
- Sourcing of the file "Violin Plots of Weather.R", which contains functions to create the violin plots in Figure S5
The original trial videos were digitized using the MATLAB app DLTdv8a. The raw output for each group size (both the .csv output and the .mat files) is included in the zipped folders "GS#_Raw_DLTdv8a_output.zip", where # is the group size. The original trial videos used for each .mat file are included in the zipped folders "GS#_Videos.zip". Videos are included for GS1 and GS10; the GS5 videos are not included because they exceed the data storage limit and were not used in the final publication. They are available upon request from the lead author.
The raw .csv files output by DLTdv8a were then compiled and tidied for further analysis using the code in the R Markdown document "Code to Load and Wrangle Data.rmd". This document also sources the following scripts, which contain the functions needed to tidy the DLTdv8a output and calculate the variables of interest: "Functions for GS1_Cleaned for Pub_V2.R", "Functions for GS5_Cleaned for Pub_V2.R", and "Functions for GS10_Cleaned for Pub_V2.R". This code outputs two files: GSall.csv, which contains the metadata and track summary statistics for each animal, and GSalltracks.csv, which contains the raw path information for each animal. More details about the columns of these files can be found at the beginning of the R Markdown document "Main Code.rmd".
Code/Software
All statistical analyses were performed in R (v. 4.2.3; R Core Team 2023).
Files and variables
File: GS1_Videos.zip
Description: Zip folder containing the raw experimental videos for the Group-Size 1 trials. Files are named MMDDYY-GS1-T#, where MMDDYY is the month, day, and year of the trial, the number after "GS" is the group size (the number of individuals in the trial), and the number after "T" is the trial number. Where a video was broken into parts, an additional number in parentheses gives the order in which the parts should be played.
File: GS10_Videos.zip
Description: Zip folder containing the raw experimental videos for the Group-Size 10 trials. Files are named MMDDYY-GS10-T#, where MMDDYY is the month, day, and year of the trial, the number after "GS" is the group size (the number of individuals in the trial), and the number after "T" is the trial number. Where a video was broken into parts, an additional number in parentheses gives the order in which the parts should be played.
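As an illustration of the naming scheme described above, the Python sketch below splits a video file name into its parts. The exact form of the part suffix (for example "(2)" with or without a preceding space), the file extension, and the expansion of the two-digit year to 20YY are assumptions; `parse_video_name`, `playback_order`, and `VideoName` are hypothetical helpers, not part of the deposited code.

```python
import re
from typing import List, NamedTuple, Optional


class VideoName(NamedTuple):
    month: int
    day: int
    year: int            # two-digit year expanded to 20YY (assumption)
    group_size: int      # number after "GS"
    trial: int           # number after "T"
    part: Optional[int]  # parenthetical part number, or None if the video is not split


# Assumed pattern: MMDDYY-GS<n>-T<n>, an optional "(k)" part suffix, optional extension.
_VIDEO_NAME = re.compile(
    r"^(\d{2})(\d{2})(\d{2})-GS(\d+)-T(\d+)\s*(?:\((\d+)\))?(?:\.\w+)?$"
)


def parse_video_name(name: str) -> VideoName:
    """Split a trial video file name into date, group size, trial and part."""
    m = _VIDEO_NAME.match(name.strip())
    if m is None:
        raise ValueError(f"unrecognized video name: {name!r}")
    mm, dd, yy, gs, trial, part = m.groups()
    return VideoName(int(mm), int(dd), 2000 + int(yy), int(gs), int(trial),
                     int(part) if part is not None else None)


def playback_order(names: List[str]) -> List[str]:
    """Sort the split videos of one trial into playback order (unsplit files first)."""
    return sorted(names, key=lambda n: parse_video_name(n).part or 0)
```

For example, `parse_video_name("073021-GS10-T3 (2).mp4")` returns `VideoName(month=7, day=30, year=2021, group_size=10, trial=3, part=2)`. A regular expression is used so that malformed names fail loudly instead of being silently mis-split.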
File: GS1_Raw_DLTdv8a_output.zip
Description: Zip folder containing both the .csv and .mat output from the digitization of the trial videos for Group-Size 1. For both file types, the number immediately following "GS" in the file name indicates the number of individuals in the trial, and the number immediately following "T" indicates the trial number. Other letters, and numbers separated by "-" or "_", have no particular meaning. The .mat files are generated by the MATLAB app DLTdv8a and can only be opened and read by that software. They contain the digitization information (i.e. the X and Y location of each tracked individual in each frame); more about this software and its file format can be found at https://biomech.web.unc.edu/dltdv/.
One trial has two .mat files. GS1 Trial 1 is associated with both GS1_T1_dvProject.mat, which was used to generate the .csv used in the analysis and is consistent with the rest of the files in the project, and 073021_GS1_T1_1_dvProject.mat, which was the first file digitized and covers the entire video rather than only the first few minutes. The latter file was not used in the analysis but is included for completeness.
Each .csv contains the following, where each row corresponds to one frame of the video file:
- Latency: This column contains the word "Cup" in the row where the cup was lifted, and the word "Start" in the row where the animal began moving
- Prewall_X: The X position of the animal before it hit the wall of the arena
- Prewall_Y: The Y position of the animal before it hit the wall of the arena
- Postwall_X: The X position of the animal after it hit the wall of the arena
- Postwall_Y: The Y position of the animal after it hit the wall of the arena
- Beach_X and Beach_Y: The X and Y location of the beach. The first row is the center of the arena, the second row is the point along the edge of the arena in the direction of the beach
- North_X and North_Y: These variables are not used, but indicate the direction of True North in a manner similar to Beach
- Center_X and Center_Y: The X and Y location of the center of the arena
- Date: The date of the trial (first row only)
- GS: The number of individuals in the trial, or group size (first row only)
- Trial: The trial number (first row only)
- Weather: A character string description of the weather (first row only)
- Temp: The temperature (first row only)
- TOD: The time of day that the trial began (first row only)
- Dig: The person who digitized the trial (first row only)
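Because each row corresponds to one frame, latency can be recovered by counting rows between the "Cup" and "Start" markers, and the beach direction from the first two Beach rows. The Python sketch below assumes the .csv has a header row with these column names, that data rows are consecutive frames, and a frame rate of 30 frames per second (inferred from the methods, where 3 frames correspond to 0.1 s). The angle convention (counterclockwise from +x, with y increasing upward) is also an assumption, and `read_rows`, `latency_seconds`, and `beach_angle_deg` are hypothetical helpers.

```python
import csv
import math
from typing import Dict, List

FPS = 30.0  # assumed frame rate, inferred from "3 frames (0.1 seconds)"


def read_rows(path: str) -> List[Dict[str, str]]:
    """Read a DLTdv8a-derived .csv into one dict per row (one row per frame)."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))


def latency_seconds(rows: List[Dict[str, str]], fps: float = FPS) -> float:
    """Seconds between the frame marked "Cup" and the frame marked "Start"."""
    labels = [(r.get("Latency") or "").strip() for r in rows]
    cup = labels.index("Cup")      # raises ValueError if the marker is missing
    start = labels.index("Start")
    return (start - cup) / fps


def beach_angle_deg(rows: List[Dict[str, str]]) -> float:
    """Angle in degrees (counterclockwise from +x) from the arena center to the beach point."""
    cx, cy = float(rows[0]["Beach_X"]), float(rows[0]["Beach_Y"])  # first row: arena center
    bx, by = float(rows[1]["Beach_X"]), float(rows[1]["Beach_Y"])  # second row: edge toward beach
    return math.degrees(math.atan2(by - cy, bx - cx)) % 360.0
```

Given the ±3 frame uncertainty noted in the methods, latencies computed this way carry roughly ±0.1 s of error at the assumed frame rate.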
File: GS5_Raw_DLTdv8a_output.zip
Description: Zip folder containing both the .csv and .mat output from the digitization of the trial videos for Group-Size 5. These files are identical in format to those in GS1_Raw_DLTdv8a_output.zip, except that the CSVs contain the following additional columns:
- Cup: This column has the word "Cup" in the row where the cup was lifted
- Start1 to Start5: These columns contain the word "Start" in the row where the corresponding animal began moving and the word "Stop" in the row where it hit the wall of the arena
- Prewall1_X, Prewall1_Y - Prewall5_X, Prewall5_Y: The X and Y positions for all animals before hitting the wall. The # following "Prewall" indicates the animal's numerical ID
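Because each animal in a GS5 trial has its own Start column, the per-animal start and wall-hit frames can be read column by column. The sketch below assumes the same one-row-per-frame layout and header row as the GS1 files, and that each StartK column holds at most one "Start" and one "Stop" marker; `cup_row` and `animal_events` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

Rows = List[Dict[str, str]]


def cup_row(rows: Rows) -> int:
    """Row index (frame offset) at which the cup was lifted."""
    for i, r in enumerate(rows):
        if (r.get("Cup") or "").strip() == "Cup":
            return i
    raise ValueError("no 'Cup' marker found")


def animal_events(rows: Rows, n_animals: int = 5) -> Dict[int, Tuple[Optional[int], Optional[int]]]:
    """Map animal ID -> (start row, stop row) from the Start1..StartN columns."""
    events: Dict[int, Tuple[Optional[int], Optional[int]]] = {}
    for k in range(1, n_animals + 1):
        start: Optional[int] = None
        stop: Optional[int] = None
        for i, r in enumerate(rows):
            label = (r.get(f"Start{k}") or "").strip()
            if label == "Start" and start is None:
                start = i
            elif label == "Stop" and stop is None:
                stop = i
        events[k] = (start, stop)  # None where a marker is absent
    return events
```

Latency for animal k is then the number of rows from `cup_row(rows)` to its start row, and its trial time the number of rows from start to stop, each divided by the frame rate (30 fps if 3 frames correspond to 0.1 s).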
File: GS10_Raw_DLTdv8a_output.zip
Description: Zip folder containing both the .csv and .mat output from the digitization of the trial videos for Group-Size 10. These files are identical in format to those in GS1_Raw_DLTdv8a_output.zip, except that the CSVs contain the following additional columns:
- Prewall1_X, Prewall1_Y - Prewall10_X, Prewall10_Y: The X and Y positions for all animals before hitting the wall. The # following "Prewall" indicates the animal's numerical ID
- Moves: This column indicates the time at which the animal started moving, where the row number corresponds to the individual in question (i.e. row 1 is the move time for Animal 1)
- WallHit: The time at which the animal hit the wall of the arena, where the row number corresponds to the individual in question (i.e. row 1 is the wall hit for Animal 1)
- Cup: The time the cup was lifted
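In the GS10 files the per-animal values are stored by row rather than by column. The sketch below shows one way to turn them into per-animal latency and movement duration. It assumes that Moves, WallHit, and Cup are numeric values in the same time unit (the unit is not stated here), that Cup is stored in the first row, and that blank cells mean an animal was not scored; `per_animal_times` is a hypothetical helper.

```python
from typing import Dict, List


def per_animal_times(rows: List[Dict[str, str]], n_animals: int = 10) -> Dict[int, Dict[str, float]]:
    """Return {animal: {"latency": Moves - Cup, "duration": WallHit - Moves}}."""
    cup = float(rows[0]["Cup"])  # assumption: the single cup time sits in the first row
    out: Dict[int, Dict[str, float]] = {}
    for k in range(1, min(n_animals, len(rows)) + 1):
        row = rows[k - 1]  # row k holds the values for animal k
        moves = (row.get("Moves") or "").strip()
        wall = (row.get("WallHit") or "").strip()
        if not moves or not wall:
            continue  # skip animals without both values
        out[k] = {
            "latency": float(moves) - cup,
            "duration": float(wall) - float(moves),
        }
    return out
```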
File: GSall.csv
Description: Metadata and track summary statistics for each animal, produced from the raw DLTdv8a .csv files by the code in "Code to Load and Wrangle Data.rmd" and the scripts it sources. More details about each column can be found at the beginning of "Main Code.rmd".
Variables
- GS: The group size
- Trial: The trial number (the NA value for GS 1 corresponds to Trial 14)
- Center_X: The x coordinates of the center of the pan
- Center_Y: The y coordinates of the center of the pan
- Beach_X: The x coordinates of the spot on the edge of the pan closest to the beach/wrackline
- Beach_Y: The y coordinates of the spot on the edge of the pan closest to the beach/wrackline
- AngleBeach: The angle between the center of the pan and the beach
- Date: Date of the trial in Year-Month-Day.
- Weather: A qualitative descriptor of the weather
- Temp: Temperature at the start of the trial in Fahrenheit
- hour: The hour of the day when the trial began in 24hr
- minutes: The minute within the hour when the trial began
- Dig: The name of the person who digitized the trial
- Dist_Actual: The sum of the distance traveled between each step (cm)
- Dist_StrtLine: The beeline distance between their starting and ending coordinates (cm)
- Sinuosity: Dist_StrtLine/Dist_Actual. Not an accurate measure of path wiggliness because step lengths are not standardized
- Time: The total number of seconds between when the animal started moving to when they finished the trial
- FinalPosition: The angle from the center of the arena to the point where the animal ended the trial, relative to the wrackline. Renamed to Final Orientation
- HeadingMean: The circular mean of the step headings, weighted by the distance traveled in each step, relative to the wrackline
- StartHeading: The bearing of their first step, relative to the wrackline. Renamed to InitialBearing
- Velocity: Dist_Actual / Time in cm per s
- Start_X: Their starting X location in the original coordinate system
- Start_Y: Their starting Y location in the original coordinate system
- Start_X.T: Their starting X location transformed to a new coordinate system where the center of the pan is at 0,0, and the angle between the center of the pan and the beach is = 0 (rotated such that the beach is always to the right of all figures)
- Start_Y.T: Their starting Y location transformed to a new coordinate system where the center of the pan is at 0,0, and the angle between the center of the pan and the beach is = 0 (rotated such that the beach is always to the right of all figures)
- StartLocation: The angle from the center of the arena to the point where the animal started the trial, relative to the wrackline. Not equivalent to StartHeading, which is the angle between the first and second steps. Renamed to InitialOrientation
- End_X: Their final X location in the original coordinate system
- End_Y: Their final Y location in the original coordinate system
- Latency: The time between when the trial started (the cup was lifted) and when the animal began moving in s. This time is not included in their total time
- Animal: a designator for which animal was being digitized within each trial
- Dist_Diff: Dist_Actual-Dist_StrtLine. Renamed to ExcessDist
- TOD: Time of Day at the beginning of the trial
- Smoky: A designator for the days during which there was a nearby forest fire
- Year: The year the trial was conducted
- Latency2: log(Latency), used for Figure S5
- GS.Trial: GS and Trial combined. Used in the mixed-effects models (MEMs)
- FinalPosition.Pan: The angle from the center of the arena to the point where the animal ended the trial, relative to the arena itself rather than the wrackline. Renamed to FinalOrientation_pan
- FinalPosition.L: A linearized version of the Final position. Renamed to FinalOrientation.L
- StartHeading.L: A linearized version of the Start Heading. Renamed to InitialBearing.L
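The distance-based summaries above follow directly from a sequence of step coordinates. The Python sketch below recomputes them for one track as defined in this list, assuming coordinates are already in cm and Time is supplied in seconds. Sinuosity as defined here is a straightness ratio (1 for a perfectly straight path). The distance-weighted circular mean mirrors the description of HeadingMean, but the exact weighting used in the analysis code is an assumption, and both function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def track_summary(xy: Sequence[Tuple[float, float]], time_s: float) -> Dict[str, float]:
    """Dist_Actual, Dist_StrtLine, Sinuosity, Dist_Diff and Velocity for one track."""
    pts = list(xy)
    step_lengths = [math.hypot(x1 - x0, y1 - y0)
                    for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    dist_actual = sum(step_lengths)                    # path length
    dist_strt = math.hypot(pts[-1][0] - pts[0][0],     # start-to-end beeline
                           pts[-1][1] - pts[0][1])
    return {
        "Dist_Actual": dist_actual,
        "Dist_StrtLine": dist_strt,
        "Sinuosity": dist_strt / dist_actual if dist_actual > 0 else float("nan"),
        "Dist_Diff": dist_actual - dist_strt,
        "Velocity": dist_actual / time_s,
    }


def weighted_circular_mean_deg(angles_deg: Sequence[float], weights: Sequence[float]) -> float:
    """Circular mean of angles (degrees) with per-angle weights, wrapped to 0-360."""
    s = sum(w * math.sin(math.radians(a)) for a, w in zip(angles_deg, weights))
    c = sum(w * math.cos(math.radians(a)) for a, w in zip(angles_deg, weights))
    return math.degrees(math.atan2(s, c)) % 360.0
```

For the L-shaped track (0, 0) to (3, 0) to (3, 4) completed in 2 s, this gives Dist_Actual = 7, Dist_StrtLine = 5, Sinuosity = 5/7, Dist_Diff = 2, and Velocity = 3.5 cm/s. Averaging sines and cosines rather than raw angles keeps, for example, 350° and 10° from averaging to 180°.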
File: GSalltracks.csv
Description: The raw path information for each animal, produced from the raw DLTdv8a .csv files by the code in "Code to Load and Wrangle Data.rmd" and the scripts it sources. More details about each column can be found at the beginning of "Main Code.rmd".
Variables
- X: x of each step in the original coordinate system from the moment the animal starts moving till the trial ends
- Y: y of each step in the original coordinate system from the moment the animal starts moving till the trial ends
- X.Centered: x of each step rel. to CENTER OF ARENA (in this coordinate system, center=0,0 but the beach is still AngleBeach)
- Y.Centered: y of each step rel. to CENTER OF ARENA (in this coordinate system, center=0,0 but the beach is still AngleBeach)
- Dist_Center: distance from each step to CENTER OF ARENA (calculated after centering each x,y)
- AngleCenter: angle between each step and CENTER OF ARENA (calculated after centering each x,y)
- MinPrev_X: x of each step rel. to PREVIOUS STEP (calculated on the original x,y)
- MinPrev_Y: y of each step rel. to PREVIOUS STEP (calculated on the original x,y)
- Dist_Each: distance between EACH STEP (calculated on the original x,y)
- AngleCenter_Transformed: angle between each step and CENTER OF ARENA, transformed such that 0=beach (calculated as AngleCenter - AngleBeach)
- X.Transformed: new x locations transformed (in this coordinate system, center=0,0 and the angle towards the beach is 0. Calculated as Dist_Center*cos(AngleCenter_Transformed))
- Y.Transformed: new y locations transformed (in this coordinate system, center=0,0 and the angle towards the beach is 0. Calculated as Dist_Center*sin(AngleCenter_Transformed))
- Heading: the individual's actual bearing, relative to the beach. 0 = towards the beach (calculated using X/Y.Transformed)
- Frame: frame of that particular step
- Trial: Trial number
- Animal: Animal ID
- GS: Group Size
- All other variables are imported from GSall.csv and their description and explanation can be found under that file.
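The per-step columns can be sketched as differences between consecutive positions. The Python example below computes the step components, Dist_Each, and a Heading for each step from transformed coordinates, so that 0° points toward the beach. Step lengths are unchanged by translation and rotation, so Dist_Each is the same whether computed on original or transformed coordinates, while the dx/dy components are rotated. Reporting Heading in degrees on 0 to 360, counterclockwise, is an assumption, and `step_table` is a hypothetical helper. Under this reading, the heading of the first step corresponds to StartHeading.

```python
import math
from typing import Dict, List, Sequence


def step_table(x_t: Sequence[float], y_t: Sequence[float]) -> List[Dict[str, float]]:
    """Per-step displacement, length and heading from beach-aligned coordinates."""
    if len(x_t) != len(y_t):
        raise ValueError("x and y must have the same length")
    steps = []
    for i in range(1, len(x_t)):
        dx = x_t[i] - x_t[i - 1]  # x relative to the previous step
        dy = y_t[i] - y_t[i - 1]  # y relative to the previous step
        steps.append({
            "dx": dx,
            "dy": dy,
            "Dist_Each": math.hypot(dx, dy),
            # 0 deg = toward the beach (+x after transformation), counterclockwise positive
            "Heading": math.degrees(math.atan2(dy, dx)) % 360.0,
        })
    return steps
```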
File: Bootstrapped_Underlying_Dist_for_GS5.csv
Description: Precomputed data from the non-parametric permutation test (i.e. "bootstrapped data"), provided to avoid the time needed to re-run the bootstrap. These files are generated in the R Markdown document "Nonparametric Bootstrap Analysis.rmd". This file contains the bootstrapped data for group size 5.
Variables
- Velocity: The mean of the bootstrapped version of Dist_Actual / Time in cm per s
- Sinuosity: The mean of the bootstrapped version of Dist_StrtLine/Dist_Actual.
- HeadingMean: The mean of the bootstrapped version of the circular mean of each heading, weighted by the distance traveled relative to the wrackline
- Latency: The mean of the bootstrapped version of the time between when the trial started (the cup was lifted) and when the animal began moving in s.
- rho_MeanHeading: The rho.circular of the circular mean of each heading, weighted by the distance traveled relative to the wrackline
- rho_FinalPosition: The rho.circular of the angle between where the animal ended the trial and the center of the arena relative to the wrackline.
- rho_StartHeading: The rho.circular of the bearing of their first step, relative to the wrackline
- Time: The mean of the bootstrapped version of the total number of seconds between when the animal started moving to when they finished the trial
- StartHeading: The circular mean of the bootstrapped version of the bearing of their first step, relative to the wrackline
- FinalPosition: The circular mean of the bootstrapped version of the angle between where the animal ended the trial and the center of the arena relative to the wrackline.
- StartPosition: The circular mean of the bootstrapped version of their angle between where the animal started the trial and the center of the arena relative to the wrackline
- rho_StartPosition: The rho.circular of their angle between where the animal started the trial and the center of the arena relative to the wrackline
- Dist_Actual: The mean of the bootstrapped version of the sum of the distance traveled between each step (cm)
- Dist_StrtLine: The mean of the bootstrapped version of the beeline distance between their starting and ending coordinates (cm)
- Dist_Diff: The mean of the bootstrapped version of Dist_Actual-Dist_StrtLine
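For readers working outside R, the rho_* columns can be reproduced from raw angles: rho.circular in the R circular package returns the mean resultant length, which ranges from 0 (angles spread evenly) to 1 (all angles identical). The Python sketch below computes it, together with a simple resampling loop of the kind that could produce an underlying distribution. The resampling scheme (sampling with replacement, the sample size, the number of replicates, and the seed) is an assumption, since the procedure itself is documented in "Nonparametric Bootstrap Analysis.rmd" rather than here. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def mean_resultant_length(angles_deg: Sequence[float]) -> float:
    """Mean resultant length (rho) of angles given in degrees."""
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    return math.hypot(c, s)


def circular_mean_deg(angles_deg: Sequence[float]) -> float:
    """Unweighted circular mean of angles in degrees, wrapped to 0-360."""
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


def resampled_rho(angles_deg: Sequence[float], sample_size: int,
                  n_reps: int = 1000, seed: int = 1) -> List[float]:
    """rho of repeated samples drawn with replacement (illustrative scheme only)."""
    rng = random.Random(seed)  # seeded so the distribution is reproducible
    pool = list(angles_deg)
    return [mean_resultant_length(rng.choices(pool, k=sample_size))
            for _ in range(n_reps)]
```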
File: Bootstrapped_Underlying_Dist_for_GS10.csv
Description: Precomputed data from the non-parametric permutation test (i.e. "bootstrapped data"), provided to avoid the time needed to re-run the bootstrap. These files are generated in the R Markdown document "Nonparametric Bootstrap Analysis.rmd". This file contains the bootstrapped data for group size 10.
Variables
These are the same as described for the file Bootstrapped_Underlying_Dist_for_GS5.csv
File: Averaged_GS1_Values.csv
Description: Precomputed data associated with the non-parametric permutation test (i.e. "bootstrapped data"), provided to avoid the time needed to re-run the bootstrap. These files are generated in the R Markdown document "Nonparametric Bootstrap Analysis.rmd". This file contains the averaged (non-bootstrapped) data for group size 1.
Variables
These are the same as described for the file Bootstrapped_Underlying_Dist_for_GS10.csv but instead of being the mean/rho over a bootstrapped version of the data, it is just the mean or rho over all GS 1 trials, with no bootstrapping performed.
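One common way to compare a GS1 value with the resampled distribution for GS5 or GS10 is an empirical two-sided p-value: the smaller tail proportion of resampled values at least as extreme as the observed one, doubled. The Python sketch below shows that calculation for linear variables such as Velocity or Latency. Whether the published analysis used exactly this rule is an assumption (the procedure is in "Nonparametric Bootstrap Analysis.rmd"), and angular variables such as StartHeading would need a circular distance instead. `empirical_p_two_sided` is a hypothetical name.

```python
from typing import Sequence


def empirical_p_two_sided(observed: float, null_values: Sequence[float]) -> float:
    """Two-sided empirical p-value: 2 * min(lower tail, upper tail), capped at 1."""
    n = len(null_values)
    if n == 0:
        raise ValueError("null distribution is empty")
    lower = sum(v <= observed for v in null_values) / n  # proportion at or below
    upper = sum(v >= observed for v in null_values) / n  # proportion at or above
    return min(1.0, 2.0 * min(lower, upper))
```

With null values 0 to 99 and an observed value of 2, the lower tail is 3/100 and the upper tail 98/100, so p = 0.06. With a finite number of replicates, a result of 0 should be read as p < 1/n rather than as exactly zero.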
Experiments were conducted between July and August in 2021 and 2023. Animals were collected from a 50-meter stretch of sandy beach in Seattle, WA (47.66, -122.43). All animals were collected within ~20 minutes of each other and used in experiments within 30 minutes of collection. Trials were conducted between 8:00 am and 1:00 pm, while air temperatures ranged from 14°C to 26°C. The arenas consisted of a circular, clear plastic dish (diameter 27 cm, depth 3.1 cm) held 43 cm above the ground by PVC pipes. All trials were filmed with Apexcam M80 Air action cameras. The temperature was recorded at the beginning of each trial using the My Location section of the iOS Weather app (iPhone 14, iOS version 16.5.1). Cloud cover was categorized as either “cloudy” or “sunny”, where cloudy days were defined as days on which more than 50% of the sky was covered by cloud and the sun was obscured. In addition, there was a nearby forest fire from August 14th to 17th, 2021, so trials conducted during those four days were classified as “cloudy” due to the poor air quality and low visibility.
Animals were released in experimental arenas either alone (hereafter “GS1” or “individuals”), or in groups of size five (“GS5”) or ten (“GS10”).
The arenas consisted of a circular, clear plastic dish (27 cm diameter) surrounded by white, opaque paper to obscure any landscape cues along the horizon. At the beginning of each trial, the arenas were leveled and rotated randomly relative to the beach. A camera underneath the arena filmed the animals’ movement through the bottom of the dish. The camera was always placed in the same position at the center of the arena and in the same orientation relative to the arena, so that it could be determined whether the sandhoppers were orienting to any features of the experimental setup itself. Before the animals were introduced to the arena, the direction of the shortest path to the wrack line was indicated to the camera so that the animals’ movement relative to the beach could be calculated.
The arenas were placed two meters landward of the wrack line to motivate the amphipods to move seaward. On macrotidal beaches such as the study site, talitrid amphipods rarely travel above the wrack line; this zone carries the same risks of predation and desiccation as the zone below the wrack line but few of the potential foraging benefits. The amphipods would likely only enter this area through passive displacement, such as during a storm, or through a navigational mistake. Therefore, talitrid amphipods that find themselves landward of the wrack line are strongly motivated to return to the optimal burrowing zone.
Before each trial began, the animals were allowed to acclimate to the arena for two minutes in a transparent cup (6.5 cm diameter) placed at the center of the arena. After that time had elapsed, the cup was lifted from a randomized direction, and the arena was covered with a translucent lid.
A total of 119 trials (n=35 for GS1, n=38 for GS5, n=46 for GS10) were digitized using the software DLTdv8a. The path of each animal was digitized from the lifting of the cup to the moment the animal reached the arena’s edge:
The Time at which the Trial Started: The lifting of the acclimation cup marked the start of the trial. The cup was considered “lifted” when there was a substantial shift in cup location between frames and/or it was clear that individuals were able to pass under the rim of the cup. Occasionally the exact frame at which the cup was lifted was difficult to determine, so a general error of ±3 frames (0.1 seconds) should be assumed for all calculations.
The Time at which the Animal Started Moving: The start of movement was defined as the moment when an individual began to clearly move in relation to the arena. For example, slight jostling of the arena itself, especially while placing the lid after lifting the cup, did not count as movement of an individual. If an individual was bumped by the lifting of the cup, or jumped once at the very start of the trial and then held still for more than ~8 frames, this was not deemed the start of its movement; instead, the start was taken as its next movement. Occasionally the exact frame where movement began was difficult to determine, so a general error of ±3 frames (0.1 seconds) should be assumed for all calculations.
Digitization Location: During digitization, the digitizer always clicked on the same part of the individual’s body: the center of the head when visible. If the center of the head could not be determined, the closest possible alternative location was selected. In general, human accuracy during digitization should be assumed to be on the order of 0.1 mm (the average distance found between repeated attempts to select the center of the arena).
The Time at which the Trial Ended: The trial ended when any part of an individual’s body touched, or could conceivably have touched, the edge of the arena. If an individual passed behind a column, it was counted as having reached the edge on the frame where it was first lost from view of the camera. Occasionally the exact frame where the animal reached the edge was difficult to determine, so a general error of ±3 frames (0.1 seconds) should be assumed for all calculations.
Interpolating Movement if the Animal was not Visible: Occasionally animals would be lost from view during digitization, for example, due to movement that was too fast to be captured by the camera. Whenever an animal was not visible, their movement during this period was interpolated to be constant, straight line movement between the bounding visible locations using the function na.approx from the package zoo (version 1.8.12).
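The gap-filling rule (constant, straight-line movement between the last and next visible positions) amounts to linear interpolation of X and Y separately against frame number. The original analysis used zoo::na.approx in R; the Python sketch below is an independent, comparable version for interior gaps only. It assumes one value per frame, with None marking frames where the animal was not visible, and leaves leading and trailing gaps as None rather than extrapolating. `fill_gaps_linear` is a hypothetical helper.

```python
from typing import List, Optional, Sequence


def fill_gaps_linear(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Linearly interpolate interior runs of None between the nearest known frames."""
    out: List[Optional[float]] = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for a, b in zip(known, known[1:]):
        if b - a > 1:  # one or more missing frames between two visible ones
            va, vb = values[a], values[b]
            for i in range(a + 1, b):
                t = (i - a) / (b - a)      # fraction of the gap elapsed
                out[i] = va + t * (vb - va)
    return out
```

Applying this to the X and Y columns independently makes both coordinates change linearly with frame number, so the filled path is a straight segment traversed at constant speed, which is the assumption stated above.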
Direction of the Wrack Line and Data Transformation: The direction of the wrack line was indicated to the camera by placing an index card in the arena with an arrow pointing along the shortest path to the wrack line. This direction was digitized by selecting a point on the edge of the arena along the line of the arrow. All data were then translated so that the center of the arena was at x=0, y=0, and rotated so that the direction of the wrack line was at an angle of 0° from the center of the arena.
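The translation and rotation described above can be written with the same polar recipe listed for GSalltracks.csv: measure each point's distance and angle from the arena center, subtract the beach angle, and convert back to x and y. The Python sketch below assumes that the center and the digitized beach point are in the same coordinates as the track, that y increases upward, and that angles run counterclockwise from +x. `to_beach_frame` is a hypothetical helper, not a function from the deposited scripts.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def to_beach_frame(p: Point, center: Point, beach_edge: Point) -> Point:
    """Translate so the arena center is (0, 0) and rotate so the beach lies at 0 degrees."""
    cx, cy = center
    angle_beach = math.atan2(beach_edge[1] - cy, beach_edge[0] - cx)  # AngleBeach
    xc, yc = p[0] - cx, p[1] - cy               # centered coordinates
    dist_center = math.hypot(xc, yc)            # Dist_Center
    angle = math.atan2(yc, xc) - angle_beach    # AngleCenter - AngleBeach
    return dist_center * math.cos(angle), dist_center * math.sin(angle)
```

For example, `to_beach_frame((10, 15), center=(10, 10), beach_edge=(10, 20))` returns `(5.0, 0.0)`: a point halfway from the center toward the beach lands on the positive x axis, matching the convention that the beach is always to the right in the figures.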
After an outlier analysis, animals that took longer than 18 s to complete the trial were excluded from all further analyses. Only trials conducted in clear weather were used in the permutation analysis.
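Applied to the per-animal summaries in GSall.csv, the two exclusion rules reduce to simple filters. The sketch below assumes each animal is a dict with a numeric Time in seconds and a Weather string. The label "sunny" for clear weather is taken from the cloud-cover categories in the methods, but the exact strings stored in the Weather column, and how smoke-affected days are coded there, are assumptions. Both function names are hypothetical.

```python
from typing import Dict, List

Animal = Dict[str, object]


def drop_slow_trials(animals: List[Animal], max_time_s: float = 18.0) -> List[Animal]:
    """Keep animals that completed the trial in at most max_time_s seconds."""
    return [a for a in animals if float(a["Time"]) <= max_time_s]


def clear_weather_only(animals: List[Animal], clear_label: str = "sunny") -> List[Animal]:
    """Keep animals from trials whose Weather matches clear_label (case-insensitive)."""
    return [a for a in animals
            if str(a.get("Weather", "")).strip().lower() == clear_label.lower()]
```

Since the rule excludes animals that took longer than 18 s, an animal with a Time of exactly 18 s is kept.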
