Data from: Dynamics of copy number variation in host races of the pea aphid

Duvaux L, Geissmann Q, Gharbi K, Zhou J, Ferrari J, Smadja CM, Butlin RK

Date Published: September 24, 2014

DOI: http://dx.doi.org/10.5061/dryad.jf29v

 

Files in this package

Content in the Dryad Digital Repository is offered "as is." By downloading files, you agree to the Dryad Terms of Service. To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data (CC0, Open Data).

Title 01 - Output files from Picard "CalculateHsMetrics"
Downloaded 74 times
Description Raw coverage estimates per subtarget (aka "bait") per individual, as estimated by the Picard tool "CalculateHsMetrics". For each individual, two files are available. 1) A "*.HsMetrics" file containing the global HsMetrics information (see http://picard.sourceforge.net/picard-metric-definitions.shtml#HsMetrics). 2) A "*.HsMetrics.Targets" file containing the coverage information per subtarget.
Download RawCoverageCount_PicardCalculateHsMetrics.rar (27.58 Mb)
Title 02 - R matrix of normalized read count data
Downloaded 8 times
Description Matrix of normalized read count data in R format. A few formatting steps have been performed here (see main MBE paper and script "./01_GetData/script.R" in the pipeline). 1) Normalisation of the read counts. 2) Removal of targets with more than 5% of reads with PHRED score<10. Note that these are the data BEFORE the square root and polynomial transformations.
Download NormalizedCoverageData.Rdata (9.641 Mb)
Title 03 - Square-root and polynomially transformed read count data
Downloaded 4 times
Description R list resulting from the script "./02_PreProcessing/script.R" in the pipeline. This list contains several sets of data and information, including the matrix of square-root and polynomially transformed read count data (i.e. the data used in the estimation of CNV).
Download 02_PreProcessing.Rdata (37.71 Mb)
Title 04 - Matrix of raw alpha values
Downloaded 11 times
Description Matrix of raw alpha values (i.e. CNV estimates) obtained from the function "findOptimalSegmentations" (package "optimalCaptureSegmentation"). The values are not rounded here.
Download RawAlphaMatrix_OptimalSegmentation.R (409.2 Kb)
Title 05 - Data for and results of GLMM1
Downloaded 9 times
Description R file containing the data frame and GLMM results of the first GLMM of the paper. Response variable: CNV presence/absence - a gene was considered to show CNV in a race if at least one of its subtargets presented a CN variant (i.e. CN≠1X) in at least one individual of this race.
Download 14_Binom_Polym_Div-Rel.Rdata (531.3 Kb)
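The GLMM1 response variable described for file 05 (a gene shows CNV in a race if at least one subtarget has CN≠1X in at least one individual) can be sketched in Python. This is only an illustration under assumptions: the dictionary layout, the function name `gene_shows_cnv` and the example values are hypothetical and are not taken from the deposited R files, and copy numbers are assumed to be already rounded, with 1.0 standing for the 1X baseline.

```python
def gene_shows_cnv(cn_by_individual, baseline=1.0):
    """Return True if any subtarget deviates from the baseline in any individual.

    cn_by_individual is a hypothetical layout: individual ID -> list of
    rounded copy numbers, one per subtarget of the gene.
    """
    return any(
        cn != baseline
        for subtarget_cns in cn_by_individual.values()
        for cn in subtarget_cns
    )


# Made-up values for one gene in two races (not real data).
race_a = {"ind1": [1.0, 1.0, 1.0], "ind2": [1.0, 1.5, 1.0]}
race_b = {"ind3": [1.0, 1.0, 1.0], "ind4": [1.0, 1.0, 1.0]}

print(gene_shows_cnv(race_a))  # True: ind2 has one subtarget at 1.5X
print(gene_shows_cnv(race_b))  # False: every subtarget is at 1X
```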
Title 06 - Data for and results of GLMM2
Downloaded 4 times
Description R file containing the data frame and GLMM results of the second GLMM of the paper. Response variable: complete duplication or deletion (CDD) vs partial duplication/deletion - a gene was considered CDD if all of its subtargets showed CN variants in at least one individual in the race.
Download 15_Binom_Dup-CpDup_Div-Rel.Rdata (120.8 Kb)
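The distinction used as the GLMM2 response variable for file 06 (complete versus partial duplication/deletion) can be illustrated with a short Python sketch. It rests on one reading of the description, namely that a gene is CDD when a single individual carries CN variants at all of its subtargets; the description could also be read as requiring each subtarget to be variant in some individual, so this is an assumption. The function name `classify_gene`, the data layout and the example values are hypothetical.

```python
def classify_gene(cn_by_individual, baseline=1.0):
    """Classify a gene in one race as 'CDD', 'partial' or 'none'.

    Assumed reading: CDD means at least one individual has every
    subtarget different from the baseline copy number.
    cn_by_individual is a hypothetical layout: individual ID -> list of
    rounded copy numbers, one per subtarget.
    """
    has_variant = False
    for subtarget_cns in cn_by_individual.values():
        flags = [cn != baseline for cn in subtarget_cns]
        if flags and all(flags):
            return "CDD"
        if any(flags):
            has_variant = True
    return "partial" if has_variant else "none"


# Made-up values (not real data).
complete = {"ind1": [1.5, 1.5, 2.0], "ind2": [1.0, 1.0, 1.0]}
partial = {"ind1": [1.0, 1.5, 1.0], "ind2": [1.0, 1.0, 1.0]}

print(classify_gene(complete))  # CDD
print(classify_gene(partial))   # partial
```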
Title 07 - Data for and results of GLMM3
Downloaded 6 times
Description R file containing the data frame and GLMM results of the third GLMM of the paper. Response variable: CNV frequency - the proportion of individuals in a race with CN≠1X.
Download 16_Binom_fqcy_Div-Rel.Rdata (125.7 Kb)
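The CNV frequency used as the GLMM3 response variable for file 07 (the proportion of individuals in a race with CN≠1X) amounts to a simple proportion. A minimal Python sketch follows, assuming one rounded copy number per individual at the locus of interest; the function name `cnv_frequency` and the example values are hypothetical and do not come from the deposited data.

```python
def cnv_frequency(cn_per_individual, baseline=1.0):
    """Proportion of individuals whose copy number differs from the baseline.

    cn_per_individual is a hypothetical list with one rounded copy
    number per individual of a race.
    """
    if not cn_per_individual:
        raise ValueError("at least one individual is required")
    n_variant = sum(1 for cn in cn_per_individual if cn != baseline)
    return n_variant / len(cn_per_individual)


# Made-up values for four individuals of one race (not real data).
race_cn = [1.0, 1.5, 1.0, 0.5]
print(cnv_frequency(race_cn))  # 0.5: two of four individuals differ from 1X
```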
Title 09 - NJ tree of CNV per capture pools and sequencing lanes
Downloaded 60 times
Description NJ tree based on CNV data (data used: square-root and polynomially transformed read counts, alpha rounded to the closest half unit). Capture pools and sequencing lanes are shown. The distance matrix was computed using the standard Euclidean distance (as opposed to the distance matrix obtained from the Random Forest analysis).
Download Res_CNV-NJ_PoolsAndLanes.pdf (79.81 Kb)
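The two numerical steps mentioned for the NJ tree (rounding alpha to the closest half unit, then computing a Euclidean distance between individuals) can be sketched in Python. This is an illustration only: the function names `round_to_half` and `euclidean_distance` and the example alpha values are hypothetical, and the sketch does not reproduce the pipeline's R code or the tree construction itself.

```python
import math


def round_to_half(alpha):
    """Round a raw alpha value to the closest half unit (e.g. 1.4 -> 1.5)."""
    return round(alpha * 2) / 2


def euclidean_distance(profile_a, profile_b):
    """Standard Euclidean distance between two equal-length CNV profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same subtargets")
    return math.dist(profile_a, profile_b)


# Made-up raw alpha values for two individuals over three subtargets.
raw_a = [1.1, 1.4, 0.9]
raw_b = [0.8, 0.6, 2.1]

rounded_a = [round_to_half(x) for x in raw_a]  # [1.0, 1.5, 1.0]
rounded_b = [round_to_half(x) for x in raw_b]  # [1.0, 0.5, 2.0]

print(euclidean_distance(rounded_a, rounded_b))  # sqrt(2), about 1.414
```

Python's `round` resolves exact ties to the nearest even integer, so a raw value of exactly 1.25 becomes 1.0 here; how ties were handled in the original analysis is not stated in this description.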
Title 08 - RandomForest results and gene family importance
Downloaded 3 times
Description R file containing all the information concerning the RF analysis and the estimation of gene family importance in discriminating host races. Although this information has been released for transparency, beware that the R file contains many objects and is quite difficult to explore. Please refer to the script "./08_GeneFamImptceTest/script.R" or contact the main authors for more information.
Download 08_GeneFamImptceTest.Rdata (16.14 Mb)
Title 10 - ExtendedFigure1_CNValongCHR_AllTargetsIncluded
Downloaded 58 times
Description Extended Figure 1. CNV along chromosomes for all targeted loci. Each line represents an individual (one colour per race, plus the Medicago standard in purple). For clarity, the values of alpha represented here are those before rounding to the closest half unit (the red box represents the area in which alpha values were rounded to one). Vertical light grey shaded areas represent targets as originally designed, whereas the dark grey and gold boxes at the bottom represent subtargets excluded from and retained for the final analyses, respectively. Retained subtargets are linked by full lines when they belong to the same target and by dotted lines when they do not. Gene names (and scaffold numbers) are indicated above each plot. Control genes are shown with their alias names. The “P” in gene names stands for pseudogene. The “*” symbol indicates genes that are only partially represented because upstream or downstream targets were filtered out during cleaning steps.
Download Link to FigS2_CNV_along_CHR.pdf (9.617 Mb)

When using this data, please cite the original publication:

Duvaux L, Geissmann Q, Gharbi K, Zhou J-J, Ferrari J, Smadja CM, Butlin RK (2015) Dynamics of copy number variation in host races of the pea aphid. Molecular Biology and Evolution 32(1): 63-80. http://dx.doi.org/10.1093/molbev/msu266

Additionally, please cite the Dryad data package:

Duvaux L, Geissmann Q, Gharbi K, Zhou J, Ferrari J, Smadja CM, Butlin RK (2014) Data from: Dynamics of copy number variation in host races of the pea aphid. Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.jf29v