Data from: Comparison of pooled semen insemination and single colony insemination as sustainable honeybee breeding strategies
Data files
Jan 22, 2024 version files: 339.83 KB
- BeeSim_and_data.zip
- README.md
Abstract
Instrumental insemination of honeybees allows for two opposing breeding strategies. In single colony insemination (SCI), all drones used to inseminate a queen are taken from one colony. In pooled semen insemination (PSI), the sperm of many genetically diverse drones is mixed, and queens are inseminated from the resulting pool. While SCI allows for maximum pedigree control, proponents of PSI claim that it reduces inbreeding and maintains genetic variance. Using stochastic simulation studies, we compared genetic progress and inbreeding rates in small honeybee populations under SCI and PSI. Four selection criteria were covered: estimated breeding values (EBV), phenotypes, true breeding values (TBV), and random selection. Under EBV-based truncation selection, SCI yielded 9.0% to 44.4% higher genetic gain than PSI but had vastly increased inbreeding rates. Under phenotypic or TBV selection, the gap between SCI and PSI in terms of genetic progress narrowed. Throughout, PSI yielded lower inbreeding rates than SCI, but the differences were only substantial under EBV truncation selection. Consequently, PSI did not appear to be a viable breeding strategy because it is incompatible with modern methods of genetic evaluation. Instead, SCI should be preferred, but rather than strict truncation selection, strategies to avoid inbreeding need to be implemented.
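The following toy sketch in C makes the contrast between the two strategies concrete. It records which drone-producing queen (DPQ) colony each drone in one queen's insemination dose comes from. The sketch is a hypothetical illustration and does not reproduce BeeSim's mating code. The function names are invented, and PSI is simplified to drawing each drone's colony independently and uniformly from all DPQ colonies.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model (not BeeSim's code): origin[i] is the DPQ colony
 * (numbered 0 .. n_dpq-1) that drone i of a queen's dose comes from. */

/* Single colony insemination (SCI): one DPQ colony is drawn at random
 * and supplies every drone used for the queen. */
void assign_sci(int origin[], int n_drones, int n_dpq)
{
    assert(n_drones > 0 && n_dpq > 0);
    int colony = rand() % n_dpq;
    for (int i = 0; i < n_drones; i++)
        origin[i] = colony;
}

/* Pooled semen insemination (PSI): sperm from drones of all DPQ colonies
 * is mixed, modelled here by drawing each drone's colony independently. */
void assign_psi(int origin[], int n_drones, int n_dpq)
{
    assert(n_drones > 0 && n_dpq > 0);
    for (int i = 0; i < n_drones; i++)
        origin[i] = rand() % n_dpq;
}

/* Count the distinct paternal colonies represented in origin[]. */
int distinct_colonies(const int origin[], int n_drones)
{
    int count = 0;
    for (int i = 0; i < n_drones; i++) {
        int seen_before = 0;
        for (int j = 0; j < i; j++) {
            if (origin[j] == origin[i]) {
                seen_before = 1;
                break;
            }
        }
        if (!seen_before)
            count++;
    }
    return count;
}
```

Under `assign_sci`, `distinct_colonies` always returns 1. Under `assign_psi`, a dose usually spans several colonies. Because of this, SCI keeps the paternal origin of a queen's offspring traceable to one colony, while PSI diversifies it.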
README: Comparison of pooled semen insemination and single colony insemination as sustainable honeybee breeding strategies
https://doi.org/10.5061/dryad.stqjq2c8t
The dataset comprises the source code of the stochastic simulation program BeeSim that was used in the article. The program is accompanied by an R script that produces the necessary parameter files and by a results file containing the simulation results presented in the article.
Description of the data and file structure
The zipped folder BeeSim_and_data.zip contains four elements.
- The folder src with the source code for the version of the program BeeSim that was used in the article.
- A makefile to compile the code
- An R-script named par_files.r producing the parameter files that were used for the simulations of the article
- A csv-file named results.csv containing the simulation results that were used to write the article.
Description of the source code
The source code of the program is distributed over nine files that are contained in the folder src. It is written in the programming language C.
- src/CBeeSim_main_mt.c contains the main function. The functions specified in this file orchestrate the broad sequence of events: reading of option settings, reading of parameter files, simulation of the breeding setup, and freeing of the allocated memory.
- src/CBeeSim_mt.c specifies the functions that are used for the actual simulation of individual breeding years. For the specified number of years, new queens are born, get mated, undergo a performance test, and are selected to reproduce.
- src/CBeeSim_mt.h is the common header file for all functions defined in files of the folder src.
- src/gauss.c is a small file containing a few general functions that help generate random variables of the right type. For example, it contains a function that returns a random value following a Gaussian distribution. Hence the name.
- src/initializations_mt.c contains functions that initialize important large abstract objects whose specifications are not known at compile time. For example, a relationship matrix of the appropriate size has to be created.
- src/lazy.c contains small helper functions that reduce writing effort. For example, the function xmalloc allocates memory with malloc and aborts the program if the allocation fails.
- src/nullstructs_mt.c defines the null setting for each of the smaller structs used in the program.
- src/read_from_parameter_file.c specifies the functions that are used to read the simulation parameters from the parameter files prior to the actual simulation of the breeding process.
- src/write_to_files.c defines the functions that are used to write the output files.
Compilation
The makefile is used to compile the source code. Running the command "make" on the command line creates a folder bin containing the executable file bin/BeeSim. This has only been tested under Ubuntu.
Use of the program
Generally, the program is called with a command of the form
bin/BeeSim [OPTIONS] [PARAMETER FILE]
The [OPTIONS] are declared by a minus sign (-) followed by a letter. Relevant examples of such options are:
- -a invokes printing of all relevant simulation results to files
- -F triggers the use of a Finite Locus Model for the honeybees' genetics (instead of the default Infinitesimal Model)
- -p triggers phenotypic selection instead of the default selection for BLUP breeding values.
If the program is called with option -h, a full list of possible options is shown. The simulations of the article used the options -a and -F.
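The sketch below shows one way to read the documented options with POSIX getopt. The README does not say how BeeSim parses its command line, so this approach is an assumption. The names struct options and parse_options are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/* Hypothetical option record; not taken from BeeSim's source. */
struct options {
    int print_all;       /* -a: write all relevant results to files */
    int finite_locus;    /* -F: finite locus instead of infinitesimal model */
    int phenotypic;      /* -p: phenotypic instead of BLUP selection */
    int help;            /* -h: show the list of possible options */
    const char *parfile; /* first non-option argument: the parameter file */
};

/* Parse argv in the form bin/BeeSim [OPTIONS] [PARAMETER FILE].
 * Returns 0 on success, -1 on an unknown option or a missing file. */
int parse_options(int argc, char *argv[], struct options *o)
{
    int c;
    assert(argc >= 1 && argv != NULL && o != NULL);
    *o = (struct options){0};
    optind = 1; /* restart scanning so the function can be called again */
    while ((c = getopt(argc, argv, "aFph")) != -1) {
        switch (c) {
        case 'a': o->print_all = 1; break;
        case 'F': o->finite_locus = 1; break;
        case 'p': o->phenotypic = 1; break;
        case 'h': o->help = 1; break;
        default: return -1; /* getopt has already printed a diagnostic */
        }
    }
    if (o->help)
        return 0; /* caller prints the option list; no file required */
    if (optind >= argc)
        return -1;
    o->parfile = argv[optind];
    return 0;
}
```

Called with the arguments of `bin/BeeSim -a -F params.txt`, parse_options sets print_all and finite_locus and stores params.txt as the parameter file (params.txt is a placeholder name). Options are expected before the file, matching the documented command form.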
The parameter file relays numerical parameters to the simulations. For example, the heading YEARS_TO_BE_SIMULATED specifies over how many years the simulated breeding experiment should extend, and REPETITIONS specifies how often it is to be repeated. Some parameter settings are delegated to more specialized parameter files. For example, the heading MATING_CONTROL expects a path to another file, which then specifies the parameters relating to mating control. One such parameter is PERCENTAGE_OF_CONTROLLED_MATING_BREEDING_PER_YEAR, which specifies the percentage of breeding queens that mate in a controlled fashion each year.
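The README names the parameter file headings but does not document their exact layout. The following sketch therefore rests on an explicit ASSUMPTION: each heading stands on its own line, and its value is on the next line. The sketch shows how a value could be looked up by heading. The functions value_line and value_as_long are hypothetical and are not BeeSim's reader in src/read_from_parameter_file.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ASSUMPTION (hypothetical layout, not confirmed for BeeSim): a heading
 * stands on its own line and its value on the next line, e.g.
 *     YEARS_TO_BE_SIMULATED
 *     70
 * Returns a pointer to the value line inside `text` and stores its length
 * in *len, or returns NULL if the heading is missing or has no next line. */
const char *value_line(const char *text, const char *heading, size_t *len)
{
    assert(text != NULL && heading != NULL && len != NULL);
    size_t hlen = strlen(heading);
    const char *line = text;
    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t line_len = eol ? (size_t)(eol - line) : strlen(line);
        if (line_len == hlen && strncmp(line, heading, hlen) == 0) {
            if (eol == NULL)
                return NULL; /* heading is the last line: no value */
            const char *value = eol + 1;
            const char *vend = strchr(value, '\n');
            *len = vend ? (size_t)(vend - value) : strlen(value);
            return value;
        }
        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return NULL;
}

/* Numeric headings such as REPETITIONS: returns 1 and sets *out on
 * success. The parsed number must lie entirely on the value line. */
int value_as_long(const char *text, const char *heading, long *out)
{
    size_t len;
    char *end;
    const char *v = value_line(text, heading, &len);
    if (v == NULL)
        return 0;
    long x = strtol(v, &end, 10);
    if (end == v || (size_t)(end - v) > len)
        return 0;
    *out = x;
    return 1;
}
```

For path-valued headings such as MATING_CONTROL, value_line returns the unparsed line, which the caller would copy and open as the nested parameter file.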
The file par_files.r contains the R code that was used to create the parameter files of the article. Calling
Rscript --vanilla par_files.r
creates 128 folders containing the parameter files for the different breeding schemes simulated in the article. Custom parameter sets can be created by taking one of these 128 sets and adapting it as needed. The headings used in the parameter files are generally self-explanatory.
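The number 128 agrees with a full factorial design over the six setting columns of results.csv (columns 2 to 7, described under "Results of the study"). Five of these columns have two levels each, and the selection column has four, giving 2^5 × 4 = 128. The sketch below assumes that each folder holds one such combination. It enumerates the combinations by decoding an index as a mixed-radix number. The output format is illustrative and does not reproduce the folder naming of par_files.r.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Levels of the setting columns 2 to 7 of results.csv. */
static const int fullsibs[] = {5, 10};
static const int dpq_in_population[] = {40, 80};
static const int no_of_iis[] = {1, 8};
static const char *const insemination[] = {"indiv", "mixed"};
static const char *const correlation[] = {"weak", "strong"};
static const char *const selection[] = {"blup", "geno", "pheno", "rand"};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Walk the full factorial design: index k is decoded as a mixed-radix
 * number with one digit per setting column. Each combination is written
 * as a CSV line to `out` (skipped if out is NULL); returns the number of
 * combinations. */
int enumerate_schemes(FILE *out)
{
    const size_t total = COUNT(fullsibs) * COUNT(dpq_in_population)
                       * COUNT(no_of_iis) * COUNT(insemination)
                       * COUNT(correlation) * COUNT(selection);
    assert(total <= INT_MAX);
    for (size_t k = 0; k < total; k++) {
        size_t r = k;
        size_t s = r % COUNT(selection);         r /= COUNT(selection);
        size_t c = r % COUNT(correlation);       r /= COUNT(correlation);
        size_t m = r % COUNT(insemination);      r /= COUNT(insemination);
        size_t i = r % COUNT(no_of_iis);         r /= COUNT(no_of_iis);
        size_t d = r % COUNT(dpq_in_population); r /= COUNT(dpq_in_population);
        size_t f = r; /* remaining digit indexes fullsibs */
        if (out != NULL)
            fprintf(out, "%d,%d,%d,%s,%s,%s\n", fullsibs[f],
                    dpq_in_population[d], no_of_iis[i], insemination[m],
                    correlation[c], selection[s]);
    }
    return (int)total;
}
```

Calling enumerate_schemes(stdout) prints one line per breeding scheme, starting with 5,40,1,indiv,weak,blup.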
An earlier version of BeeSim has previously been published on Dryad (DOI: 10.5061/dryad.1nh544n) along with extensive documentation on how to use the program. There, all individual headings are also explained in detail.
To run the code, the user must also install the third-party programs BLUPF90 and PInCo and specify their locations in the parameter files under the headings BLUPF90 and INVERSE_RELATIONSHIPS, respectively.
Results of the study
The file results.csv contains the results of the simulations used for the article. It consists of a table with 11 columns. The first seven columns specify simulation settings while columns 8 to 11 contain simulation results. All simulation results are averaged over the 100 repetitions per setting.
- The column "birth.year" refers to the age cohort. As 70 years were simulated, it ranges from 0 to 69. So, results in a row with birth.year value t refer to queens that were born in year t.
- The column "fullsibs" takes on the values 5 and 10. Results in a row with fullsibs value n refer to breeding schemes where selected queens produced n offspring queens each.
- The column "dpq.in.population" takes on the values 40 and 80. Results in a row with dpq.in.population value n refer to breeding schemes where there were n drone producing queens (DPQ) selected per year.
- The column "no.of.iis" takes on the values 1 and 8. Results in a row with no.of.iis value n refer to breeding schemes where there were n instrumental insemination stations per year.
- The column "insemination" takes on the values "indiv" and "mixed". Here, "indiv" refers to single colony insemination and "mixed" refers to pooled semen insemination.
- The column "correlation" takes on the values "weak" and "strong". It specifies how strongly the (maternal) queen effects and the (direct) worker effects of the selection trait are correlated. Here, "weak" corresponds to the value rQW = -0.18 and "strong" corresponds to the value rQW = -0.53.
- The column "selection" indicates the selection criterion. It takes on the values "blup", "geno", "pheno", and "rand". "blup" indicates selection for best linear unbiased prediction (BLUP) estimated breeding values, "geno" indicates selection for true breeding values (i.e. for the genotype), "pheno" indicates selection for phenotypes and "rand" indicates random selection.
- The column "mean.breeding.value" indicates the average breeding value of queens born in the specified year in the respective breeding program defined by columns 2 to 7.
- The column "mean.inbreeding" indicates the average inbreeding coefficient of queens born in the specified year in the respective breeding program defined by columns 2 to 7.
- The column "genetic.variance.ic" indicates the variance of the true breeding values for the inheritance criterion (IC, the sum of a queen's direct and maternal TBV) among queens born in the specified year in the respective breeding program defined by columns 2 to 7.
- The column "paternal.generation.interval" takes on values between 2 and 3. It indicates how old, on average, the DPQ were when their drones were used for inseminations. The values are averages over all queens born in the specified year in the respective breeding program defined by columns 2 to 7. Queens born in years 0 and 1 form the base population and are simulated without parents. Thus, no paternal.generation.interval could be calculated for them, and they received the value "NA".
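The abstract compares inbreeding rates, but the README does not state how these rates were derived from mean.inbreeding. The following sketch assumes one common definition: the annual rate dF[t] = (F[t] - F[t-1]) / (1 - F[t-1]) between consecutive birth cohorts of one breeding scheme. The article may use a different estimator, for example per generation or by regression. The sketch also converts R's "NA" marker to NaN. All function names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Convert one numeric results.csv field to double; the missing-value
 * marker "NA" (paternal.generation.interval in birth years 0 and 1)
 * becomes NaN. */
double field_to_double(const char *field)
{
    assert(field != NULL);
    if (strcmp(field, "NA") == 0)
        return NAN;
    return strtod(field, NULL);
}

/* Annual rate of inbreeding between consecutive birth cohorts:
 * dF[t] = (F[t] - F[t-1]) / (1 - F[t-1]) for t = 1 .. n-1.
 * F holds mean.inbreeding for birth years 0 .. n-1 of ONE breeding scheme
 * (one fixed combination of columns 2 to 7). */
void inbreeding_rates(const double F[], double dF[], size_t n)
{
    for (size_t t = 1; t < n; t++) {
        assert(F[t - 1] < 1.0);
        dF[t] = (F[t] - F[t - 1]) / (1.0 - F[t - 1]);
    }
    if (n > 0)
        dF[0] = NAN; /* no previous cohort */
}

/* Arithmetic mean of dF over birth years first .. last (inclusive). */
double mean_rate(const double dF[], size_t first, size_t last)
{
    double sum = 0.0;
    assert(first >= 1 && first <= last);
    for (size_t t = first; t <= last; t++)
        sum += dF[t];
    return sum / (double)(last - first + 1);
}
```

Rows must first be filtered to a single breeding scheme and sorted by birth.year. Mixing schemes would make consecutive entries of F unrelated.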