# R scripts and data files associated with Mattingly et al.'s analysis of language use in the invasion biology literature.

### `Extract_word_frequencies.Rmd` - script that parses text files to calculate the frequency of a given word. As an example, it loops through `Text files` for the frequency of "invasive"

# Available on request: .txt files of the 202 papers included in the study (needed to run `Extract_word_frequencies.Rmd`; alternatively, you can download your own article .txt files from online databases)

### `GallardoEffectSizes.csv` - Gallardo et al. (2016) data on 150 papers, including effect sizes and additional variables collected for Mattingly et al.

### `newdata.csv` - update of Gallardo et al. by Mattingly et al., with extracted effect sizes and other variables for 52 papers

### `Effect_size_aggregation.Rmd` - script for aggregating multiple effect sizes for the same focal species from a single paper

### `mattinglyetal_data.csv` - merged data for all 202 papers, with aggregated effect sizes

### `mattinglyetal_data_permanova.csv` - abbreviated version of the data file with only one row per paper, to be used in PERMANOVA calculations

### `Word_frequency_ordination.Rmd` - script to perform ordination of multiple word frequency values across papers, with an example of PERMANOVA

### `Random_forest_and_regression tree.Rmd` - script for calculating random forests that predict word use from predictors (including effect size and contextual factors) and, from that random forest, a regression tree that shows the direction/splits of predictors

### `Resampled_random_forests.Rmd` - script for iteratively calculating random forests, resampled to even out an imbalanced factor, to give confidence intervals around importance values

# Available on request: .csv file outputs from our regular and resampled random forest runs
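
For readers without the article .txt files, the word-frequency step described for `Extract_word_frequencies.Rmd` can be illustrated with a minimal sketch. It is written in Python as a self-contained illustration and is not the repository's R code. The function names, the tokenisation rule, and the choice to report a proportion alongside the raw count are all assumptions made here, not details taken from the script.

```python
import re
from pathlib import Path


def word_frequency(text: str, word: str) -> dict:
    """Count whole-word, case-insensitive matches of `word` in `text`."""
    # Assumption: a token is a run of letters, optionally joined by
    # apostrophes or hyphens, so "non-invasive" is one token and is NOT
    # counted as a match for "invasive".
    tokens = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    target = word.lower()
    count = sum(1 for token in tokens if token == target)
    total = len(tokens)
    return {
        "count": count,
        "total_words": total,
        # Relative frequency; defined as 0.0 for an empty document.
        "proportion": count / total if total else 0.0,
    }


def frequencies_for_folder(folder: str, word: str) -> dict:
    """Apply word_frequency to every .txt file in `folder`.

    Hypothetical helper: the folder layout (one .txt file per paper)
    is assumed for illustration.
    """
    results = {}
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        results[path.name] = word_frequency(text, word)
    return results


example = word_frequency(
    "Invasive plants outcompete natives. The invasive species spread.",
    "invasive",
)
print(example["count"], example["total_words"])
```

Calling `frequencies_for_folder("Text files", "invasive")` would return one entry per .txt file in that folder. Whether hyphenated forms such as "non-invasive" should count toward the total is a choice that affects the results, so it is worth checking against the actual script before comparing numbers.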