# Data from: Evidence for general size-by-habitat rules in actinopterygian fishes across nine scales of observation

## Cite this dataset

Clarke, John (2021). Data from: Evidence for general size-by-habitat rules in actinopterygian fishes across nine scales of observation [Dataset]. Dryad. https://doi.org/10.5061/dryad.tb2rbnzxs

## Abstract

Identifying environmental predictors of phenotype is fundamentally important to many ecological questions, from revealing broadscale ecological processes to predicting extinction risk. However, establishing robust environment—phenotype relationships is challenging, as powerful case studies require diverse clades which repeatedly undergo environmental transitions at multiple taxonomic scales. Actinopterygian fishes, with 32000+ species, fulfil these criteria for the fundamental habitat divisions in water. With four datasets of body size (ranging 10905–27226 species), I reveal highly consistent size-by-habitat-use patterns across nine scales of observation. Taxa in marine, marine-brackish, euryhaline and freshwater-brackish habitats possess larger mean sizes than freshwater relatives, and the largest mean sizes consistently emerge within marine-brackish and euryhaline taxa. These findings align with the predictions of seven mechanisms thought to drive larger size by promoting additional trophic levels. However, mismatches between size and trophic-level patterns highlight a role for additional mechanisms, and support for viable candidates is examined in 3439 comparisons.

## Usage notes

Note: Many of these files also appear as supplementary files on the journal website. This repository brings together all files associated with the paper in one place, alongside expanded descriptions of each file so that they are easier to navigate.

## SI Text

Supplementary methods, results, and discussion.

* SI Text Clarke 2021.pdf

## SI Figures S1-S15

All 15 SI figures with captions.

* SI Figures S1-S15 Clarke 2021.pdf

**Fig. S1:** Size distributions (log10 scale) for taxa in each habitat use across four datasets: (a) ‘FB11k dataset’; (b) ‘CoF11k dataset’; (c) ‘FB31k dataset’; (d) ‘CoF31k dataset’.

**Fig. S2:** Corresponding plot to main text Fig. 1 using FishBase 31k tree dataset.

**Fig. S3:** The percentage of groups where the phylogenetic mean size of taxa for one habitat use is larger than the other, obtained for every pairwise habitat-use comparison within all four datasets (FB11k, CoF11k, FB31k and CoF31k tree datasets).

**Fig. S4:** The percentage of groups where the observed log10 mean size of taxa for one habitat use is larger than the other, obtained for every pairwise habitat-use comparison within all four datasets (FB11k, CoF11k, FB31k and CoF31k tree datasets).

**Fig. S5:** The percentage of groups where the size variance of taxa within one habitat-use category is greater than the other, obtained for every pairwise habitat-use comparison in the CoF31k tree dataset. Three different ways of comparing size variance are assessed in panels (a), (b) and (c).

**Fig. S6:** The relationship between the magnitude of body size difference between two habitats (measured as phylogenetic effect size) and the magnitude of trophic level difference between two habitats (measured as phylogenetic effect size) for all ten pairwise habitat comparisons conducted in the study.

**Fig. S7:** The relationship between the magnitude of body size difference between two habitats (measured as phylogenetic effect size) and the magnitude of mean branch length duration difference between two habitats for all ten pairwise habitat comparisons conducted in the study.

**Fig. S8:** The relationship between the magnitude of body size difference between two habitats (measured as phylogenetic effect size) and the magnitude of log10 richness difference between two habitats for all ten pairwise habitat comparisons conducted in the study.

**Fig. S9:** Size distributions (log scale) for fossil taxa in each fossil habitat type, using data from Clarke et al. 2016 and Clarke & Friedman 2018.

**Fig. S10:** Size distributions (log10 scale) of taxa with maximum length and common length measures in each habitat-use across eight datasets. See SI text for details on how these datasets were derived and compared.

**Fig. S11:** The corresponding statistical values and clade information for Fig. S5c. For each pairwise habitat-use comparison across multiple taxonomic scales, this indicates the number of times each habitat-use possesses taxa with the largest size variance (relative to simulations) at probabilities of < 0.1 and < 0.05. Dark shades of each colour represent p < 0.05, lighter shades p = 0.1–0.05, and grey p > 0.1. Data from CoF31k tree dataset.

**Fig. S12:** The corresponding statistical values and clade information for Fig. S13b. For each pairwise habitat-use comparison across multiple taxonomic scales, this indicates the number of times each habitat use possesses taxa with the larger phylogenetic mean size at probabilities of < 0.1 and < 0.05 using PGLS ANOVA. Dark shades of each colour represent p < 0.05, lighter shades p = 0.1–0.05, and grey p > 0.1. Data from CoF31k tree dataset.

**Fig. S13:** (a) For every pairwise habitat-use comparison at the taxonomic scale of order, this indicates the percentage of orders in which raw and phylogenetic means are larger for one habitat-use than the other, and whether any of these size differences occurred with probabilities of < 0.1 or < 0.05 according to three statistical tests. (b) For each pairwise habitat-use comparison across multiple taxonomic scales, this indicates the number of times each habitat-use possesses taxa with the larger phylogenetic mean size at probabilities of < 0.1 and < 0.05 using PGLS ANOVA. Dark shades of each colour represent p < 0.05, lighter shades p = 0.1–0.05, and grey p > 0.1. For definitions of taxonomic scales, see methods.

**Fig. S14:** The corresponding statistical values and clade information for Fig. S13a. For every pairwise habitat-use comparison at the taxonomic scale of order, this indicates the number of orders in which raw and phylogenetic means are larger for one habitat-use than the other, and whether any of these size differences occurred with probabilities of < 0.1 or < 0.05 according to three statistical tests. Dark shades of each colour represent p < 0.05, lighter shades p = 0.1–0.05, and grey p > 0.1. Data from CoF31k tree dataset.

**Fig. S15:** The corresponding statistical values and clade information for Fig. 2a. The numbers of groups where the phylogenetic mean size of taxa for one habitat-use is larger than the other, obtained for every pairwise habitat-use comparison across multiple taxonomic scales. Data from CoF31k tree dataset.

## SI Tables S1-S6

All 6 SI tables with captions.

* Tables S1-S4 Clarke 2021.pdf

* Table S5 Clarke 2021.xlsx

* Table S6 Clarke 2021.xlsx

**Table S1:** List of mechanisms discussed in the main text that are proposed to explain the size-by-habitat patterns.

**Table S2:** The percentage of clades (Tax3 scale) in which each pair of metrics, from the nine metrics compared between habitats, were aligned. For example, if comparing size and richness outcomes for euryhaline vs. freshwater comparisons (top row, output in red), the percentage of alignments will equal the percentage of clades in which, relative to the total number of Tax3 clades in which the two habitat types could be compared, euryhaline taxa possessed either i) the smaller mean size and lower species richness, or ii) the larger mean size and higher species richness. Cumulatively, these two outcomes occurred in 18.2% of comparisons. I commonly refer to these as ‘percentage alignments’ of discrete outcomes.

**Table S3:** Numbers and percentages of migratory taxa within each habitat-use type across the four datasets. Illustrates relatively high percentages of migratory taxa within the euryhaline category.

**Table S4:** Numbers and percentages of migratory taxa in every order and habitat subdivision for the CoF 31k-tree dataset.

**Table S5:** A summary of support for various suites of mechanisms (defined A through E) presented in Table S6 for all comparisons in each of the four datasets analysed (FB11k, CoF11k, FB31k, CoF 31k). Text cites CoF31k summary percentages.

**Table S6:** Indication of whether various suites of mechanisms (defined A through E) can, or cannot be supported for every individual comparison performed in this study. 1 indicates support. A list is provided for each of the four datasets analysed (FB11k, CoF11k, FB31k, CoF 31k). Text cites CoF31k outcomes.
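The ‘percentage alignment’ described for Table S2 (and Appendix 1) can be illustrated with a minimal Python sketch. This is not the author's code: the function name, the input layout (one pair of signed differences per clade), and the decision to skip ties are all assumptions made for illustration, and the numbers are invented.

```python
def percentage_alignment(clade_differences):
    """Percentage of clades in which two metrics point the same way.

    `clade_differences` holds one (metric_1_diff, metric_2_diff) pair per
    clade, each computed as habitat A minus habitat B (for example,
    euryhaline minus freshwater). A clade is 'aligned' when both
    differences share a sign: smaller size with lower richness, or larger
    size with higher richness. Ties (a zero difference) are skipped here,
    which is an assumption of this sketch.
    """
    comparable = [(a, b) for a, b in clade_differences if a != 0 and b != 0]
    if not comparable:
        raise ValueError("no comparable clades")
    aligned = sum(1 for a, b in comparable if (a > 0) == (b > 0))
    return 100.0 * aligned / len(comparable)


# Invented (mean size difference, richness difference) pairs for four clades.
example = [(0.3, 12), (-0.1, -4), (0.2, -7), (-0.4, 3)]
print(percentage_alignment(example))  # 50.0
```

Here two of the four hypothetical clades are aligned (both differences positive, or both negative), giving 50%.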

## Appendices 1-17

All 17 Appendices, with full captions in a separate pdf and shortened captions below.

* Appendix Legends.pdf

* Appendix.1.xlsx

* Appendix 2 - FB 11k Size.pdf

* Appendix 3 - CoF 11k Size.pdf

* Appendix 4 - FB 31k Size.pdf

* Appendix 5 - CoF 31k Size.pdf

* Appendix 6 - FB 11k tSize.pdf

* Appendix 7 - CoF 11k tSize.pdf

* Appendix 8 - FB 31k tSize.pdf

* Appendix 9 - CoF 31k tSize.pdf

* Appendix 10 - FB 11k Troph.pdf

* Appendix 11 - CoF 11k Troph.pdf

* Appendix 12 - FB 31k Troph.pdf

* Appendix 13 - CoF 31k Troph.pdf

* Appendix 14 - FB 11k Var.pdf

* Appendix 15 - CoF 11k Var.pdf

* Appendix 16 - FB 31k Var.pdf

* Appendix 17 - CoF 31k Var.pdf

**Appendix 1:** The percentage of clades in which each pair of metrics, from the nine metrics compared between habitats, were aligned. For example, an order where euryhaline taxa possessed the smaller mean size and lower species richness compared to freshwater relatives represents an alignment, as does an order where euryhaline taxa possessed the larger mean size and higher species richness. I commonly refer to these as ‘percentage alignments’ of discrete outcomes. Clades whose metrics are aligned for a given habitat comparison fall within the white quadrants of Figure 4, while mismatched outcomes fall within grey quadrants.

**Appendices 2 to 17** display all individual comparisons of size, trophic level, and size variance performed in the study. Grid cells in these plots contain statistical details, so please increase magnification on the pdfs to view these details.

These appendices provide a full record of these results, so the reader can find an outcome for their clade of interest, with the data source, phylogeny, and analytical method they prefer.

__Size results (largest possible dataset)__

Comparisons of log10 body size using all taxa for which size data are available. Across the four datasets, the analyses represent a combined total of 5232 pairs of group + habitat comparisons, each of which was compared with five methods: 1. Observed log10 means; 2. Phylogenetic log10 means; 3. Wilcoxon test outcomes; 4. Simulation ANOVA test outcomes; 5. PGLS ANOVA test outcomes.

**Appendix 2:** All analyses pertaining to comparisons of taxon **size** between habitat-use types for the **FishBase 11k-tree dataset.**

**Appendix 3:** All analyses pertaining to comparisons of taxon **size** between habitat-use types for the **Catalogue of Fishes 11k-tree dataset.**

**Appendix 4:** All analyses pertaining to comparisons of taxon **size** between habitat-use types for the **FishBase 31k-tree dataset.**

**Appendix 5:** All analyses pertaining to comparisons of taxon **size** between habitat-use types for the **Catalogue of Fishes 31k-tree dataset.**
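As a rough illustration of the first of the five methods listed above (observed log10 means), the sketch below averages log10 size within each habitat-use category and reports which category has the larger mean. It is an assumption-laden example, not the analysis code used in the paper: the function name, the input layout, and the sizes are invented, and the phylogenetic and test-based methods (2 to 5) are not attempted.

```python
import math
from collections import defaultdict


def observed_log10_means(records):
    """Return the mean of log10(size) for each habitat-use category.

    `records` is an iterable of (habitat, size) pairs with size > 0.
    The layout is hypothetical; it does not mirror the dataset columns.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for habitat, size in records:
        sums[habitat] += math.log10(size)
        counts[habitat] += 1
    return {habitat: sums[habitat] / counts[habitat] for habitat in sums}


# Invented sizes for one group compared across two habitat-use types.
records = [
    ("marine", 100.0),
    ("marine", 1000.0),
    ("freshwater", 10.0),
    ("freshwater", 100.0),
]
means = observed_log10_means(records)
larger = max(means, key=means.get)
print(means)
print("larger observed log10 mean:", larger)
```

If the size values in a file are already log10-transformed, the `math.log10` call would be dropped and the values averaged directly.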

__Size results (reduced and retained size + trophic level datasets; see Methods and SI Text Methods)__

Comparisons of log10 body size using all taxa in the reduced and retained size + trophic level datasets. Across the four datasets, the analyses represent a combined total of 3439 pairs of group + habitat comparisons, each of which was compared with five methods: 1. Observed log10 means; 2. Phylogenetic log10 means; 3. Wilcoxon test outcomes; 4. Simulation ANOVA test outcomes; 5. PGLS ANOVA test outcomes.

**Appendix 6:** All analyses pertaining to comparisons of taxon **size** (in the reduced and retained size datasets, see Methods) between habitat-use types for the **FishBase 11k-tree dataset.**

**Appendix 7:** All analyses pertaining to comparisons of taxon **size** (in the reduced and retained size datasets, see Methods) between habitat-use types for the **Catalogue of Fishes 11k-tree dataset.**

**Appendix 8:** All analyses pertaining to comparisons of taxon **size** (in the reduced and retained size datasets, see Methods) between habitat-use types for the **FishBase 31k-tree dataset.**

**Appendix 9:** All analyses pertaining to comparisons of taxon **size** (in the reduced and retained size datasets, see Methods) between habitat-use types for the **Catalogue of Fishes 31k-tree dataset.**

__Trophic level results (reduced and retained size + trophic level datasets; see Methods and SI Text Methods)__

Comparisons of log10 trophic level using all taxa in the reduced and retained size + trophic level datasets. Across the four datasets, the analyses represent a combined total of 3439 pairs of group + habitat comparisons, each of which was compared with five methods: 1. Observed log10 means; 2. Phylogenetic log10 means; 3. Wilcoxon test outcomes; 4. Simulation ANOVA test outcomes; 5. PGLS ANOVA test outcomes.

**Appendix 10:** All analyses pertaining to comparisons of taxon **trophic level** between habitat-use types for the **FishBase 11k-tree dataset.**

**Appendix 11:** All analyses pertaining to comparisons of taxon **trophic level** between habitat-use types for the **Catalogue of Fishes 11k-tree dataset.**

**Appendix 12:** All analyses pertaining to comparisons of taxon **trophic level** between habitat-use types for the **FishBase 31k-tree dataset.**

**Appendix 13:** All analyses pertaining to comparisons of taxon **trophic level** between habitat-use types for the **Catalogue of Fishes 31k-tree dataset.**

__Size variance results (largest possible dataset)__

Comparisons of log10 body size variance using all taxa for which size data are available. Across the four datasets, the analyses represent a combined total of 5232 pairs of group + habitat comparisons, each of which was compared with four methods: 1. Observed log10 variance; 2. Expected log10 variance from simulations; 3. Observed variance vs. simulated variance; 4. P values derived from observed variance vs. simulated variance.

**Appendix 14:** All analyses pertaining to comparisons of **size variance** between habitat-use types for the **FishBase 11k-tree dataset.**

**Appendix 15:** All analyses pertaining to comparisons of **size variance** between habitat-use types for the **Catalogue of Fishes 11k-tree dataset.**

**Appendix 16:** All analyses pertaining to comparisons of **size variance** between habitat-use types for the **FishBase 31k-tree dataset.**

**Appendix 17:** All analyses pertaining to comparisons of **size variance** between habitat-use types for the **Catalogue of Fishes 31k-tree dataset.**
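The fourth variance method above (P values derived from observed vs. simulated variance) follows the general logic of a simulation-based test. The sketch below shows one common, generic way to turn a set of simulated variances into a one-sided empirical P value. It is only an assumption about the general approach: the simulation procedure used in the study is described in the paper's Methods and SI Text and is not reproduced here, the `+1` correction is a conventional choice rather than something stated in this README, and all numbers are invented.

```python
import statistics


def empirical_p_value(observed, simulated):
    """One-sided empirical P value for 'observed is unusually large'.

    Counts the share of simulated values at least as large as the
    observed value. The +1 in numerator and denominator keeps the
    estimate above zero; this convention is an assumption of the sketch.
    """
    if not simulated:
        raise ValueError("need at least one simulated value")
    exceed = sum(1 for value in simulated if value >= observed)
    return (exceed + 1) / (len(simulated) + 1)


# Invented log10 sizes for taxa in one habitat-use category.
observed_variance = statistics.variance([1.0, 2.0, 3.0, 4.0])

# Invented variances standing in for the output of nine simulations.
simulated_variances = [0.5, 1.0, 2.0, 0.8, 1.2, 3.0, 0.9, 1.1, 0.7]

print(empirical_p_value(observed_variance, simulated_variances))  # 0.3
```

Two of the nine placeholder variances exceed the observed value (about 1.67), so the estimate is (2 + 1) / (9 + 1) = 0.3.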

## Analysis files (Analysis files.zip)

Input datasets and phylogenies for analyses.

__Datasets:__

**Information regarding the species, size data, and trophic level data used for any specific habitat comparison for any group of taxa compared at any scale of observation can be found in the datasets below.**

**For size analyses drawing data from the largest possible size dataset (i.e. all those conducted in Appendices 2-5 and 14-17)** at scales of observation other than evolutionary hotspots (Fam, Ord, Tax 3, Tax 4, Tax 5, Tax 6, Full dataset), the following datasets provide all the necessary information, depending on your choice of dataset (CoF or FishBase) and tree (11k molecular tree or 31k supertree):

* Rab18tax.Order log10.TL 30K CoF.dataset_SI.data.csv

* Rab18tax.Order log10.TL 30K fb.dataset_SI.data.csv

* Rab18tax.Order log10.TL 12Kspec CoF.dataset_SI.data.csv

* Rab18tax.Order log10.TL 12Kspec fb.dataset_SI.data.csv

For comparisons of the groups and habitats within hotspot analyses, the analogous information is provided in:

* Hotspots.only log10.TL 30K CoF.dataset_SI.data.csv

* Hotspots.only log10.TL 30K fb.dataset_SI.data.csv

* Hotspots.only log10.TL 12Kspec CoF.dataset_SI.data.csv

* Hotspots.only log10.TL 12Kspec fb.dataset_SI.data.csv

* Clarke.pot.grps log10.TL 30K CoF.dataset_SI.data.csv

* Clarke.pot.grps log10.TL 30K fb.dataset_SI.data.csv

* Clarke.pot.grps log10.TL 12Kspec CoF.dataset_SI.data.csv

* Clarke.pot.grps log10.TL 12Kspec fb.dataset_SI.data.csv
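The twelve file names listed above for the largest possible size dataset follow a regular pattern, so the right file can be selected programmatically. The Python sketch below is a convenience example only. The function name is hypothetical, and the mapping of `12Kspec` to the 11k molecular tree and `30K` to the 31k supertree is inferred from the file names rather than stated explicitly in this README. It does not cover the reduced and retained size + trophic level files.

```python
from itertools import product

# Prefixes as they appear in the file names listed above.
GROUP_SETS = ("Rab18tax.Order", "Hotspots.only", "Clarke.pot.grps")

# Assumed mapping from tree choice to the tag used in the file names.
TREE_TAGS = {"11k": "12Kspec", "31k": "30K"}

# Mapping from data source to the tag used in the file names.
SOURCE_TAGS = {"CoF": "CoF", "FishBase": "fb"}


def dataset_filename(group_set, source, tree):
    """Build a file name for the largest possible size dataset."""
    if group_set not in GROUP_SETS:
        raise ValueError(f"unknown group set: {group_set!r}")
    tree_tag = TREE_TAGS[tree]
    source_tag = SOURCE_TAGS[source]
    return f"{group_set} log10.TL {tree_tag} {source_tag}.dataset_SI.data.csv"


print(dataset_filename("Rab18tax.Order", "CoF", "31k"))
# Rab18tax.Order log10.TL 30K CoF.dataset_SI.data.csv

# Enumerate all combinations covered by this sketch.
all_names = [
    dataset_filename(g, s, t)
    for g, s, t in product(GROUP_SETS, SOURCE_TAGS, TREE_TAGS)
]
print(len(all_names))  # 12
```

The resulting string can be passed to any CSV reader once the contents of `Analysis files.zip` have been extracted.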

**For any analyses drawing data from the reduced and retained size + trophic level datasets (e.g. Appendices 6-13, Figures 2b, 3 and 4 in main text)** at scales of observation other than evolutionary hotspots (Fam, Ord, Tax 3, Tax 4, Tax 5, Tax 6, Full dataset), the following datasets provide all the necessary information, depending on your choice of dataset (CoF or FishBase) and tree (11k molecular tree or 31k supertree):

* Rab18tax.Order log10.TL.ShrdW.troph 30K CoF.dataset_SI.data.csv

* Rab18tax.Order log10.TL.ShrdW.troph 30K fb.dataset_SI.data.csv

* Rab18tax.Order log10.TL.ShrdW.troph 12Kspec CoF.dataset_SI.data.csv

* Rab18tax.Order log10.TL.ShrdW.troph 12Kspec fb.dataset_SI.data.csv

For comparisons of the groups and habitats within hotspot analyses, the analogous information is provided in:

* Hotspots.only log10.TL.ShrdW.troph 30K CoF.dataset_SI.data.csv

* Hotspots.only log10.TL.ShrdW.troph 30K fb.dataset_SI.data.csv

* Hotspots.only log10.TL 12Kspec CoF.dataset_SI.data.csv

* Hotspots.only log10.TL 12Kspec fb.dataset_SI.data.csv

* Clarke.pot.grps log10.TL.ShrdW.troph 30K CoF.dataset_SI.data.csv

* Clarke.pot.grps log10.TL.ShrdW.troph 30K fb.dataset_SI.data.csv

* Clarke.pot.grps log10.TL 12Kspec CoF.dataset_SI.data.csv

* Clarke.pot.grps log10.TL 12Kspec fb.dataset_SI.data.csv

__Phylogenies:__

* actinopt_12k_treePL.tre - The single tree derived from molecular data, provided in Rabosky et al. 2018.

* actinopt_full.trees.tre - The 100 supertrees provided in Rabosky et al. 2018.

## Access

The paper is available on request from the author.

## Funding

Narodowa Agencja Wymiany Akademickiej, Award: PPN/ULM/2019/1/00248/U/00001

Eesti Teadusagentuur, Award: PRG741