A novel approach for pollen identification and quantification using hybrid capture-based DNA metabarcoding
Data files
May 07, 2025 version files 8.02 GB
-
Poll_713_M7a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
115.90 MB
-
Poll_713_M7a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
117.70 MB
-
Poll_714_M7b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
96.25 MB
-
Poll_714_M7b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
96.98 MB
-
Poll_715_M7c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
24.75 MB
-
Poll_715_M7c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
24.23 MB
-
Poll_716_M8a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
287.95 KB
-
Poll_716_M8a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
289.90 KB
-
Poll_717_M8b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
151.05 MB
-
Poll_717_M8b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
147.68 MB
-
Poll_718_M8c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
52.40 MB
-
Poll_718_M8c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
50.87 MB
-
Poll_719_M9a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
135.29 MB
-
Poll_719_M9a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
133.23 MB
-
Poll_720_M9b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
64.76 MB
-
Poll_720_M9b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
65.72 MB
-
Poll_721_M9c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
60.76 MB
-
Poll_721_M9c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
60.67 MB
-
Poll_722_M10a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
36.31 MB
-
Poll_722_M10a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
37.60 MB
-
Poll_723_M10b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
20.13 MB
-
Poll_723_M10b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
20.96 MB
-
Poll_724_M10c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
51 MB
-
Poll_724_M10c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
51.50 MB
-
Poll_725_M11a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
167.76 MB
-
Poll_725_M11a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
164.30 MB
-
Poll_726_M11b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
112.11 MB
-
Poll_726_M11b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
110.13 MB
-
Poll_727_M11c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
99.41 MB
-
Poll_727_M11c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
101.24 MB
-
Poll_728_M12a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
50.70 MB
-
Poll_728_M12a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
52.61 MB
-
Poll_729_M12b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
42.43 MB
-
Poll_729_M12b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
43.28 MB
-
Poll_730_M12c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
29.92 MB
-
Poll_730_M12c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
30.76 MB
-
Poll_731_M13a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
286.39 MB
-
Poll_731_M13a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
283.25 MB
-
Poll_732_M13b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
338.31 MB
-
Poll_732_M13b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
333.80 MB
-
Poll_733_M13c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
328.66 MB
-
Poll_733_M13c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
330.72 MB
-
Poll_734_M14a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
295.06 MB
-
Poll_734_M14a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
289.52 MB
-
Poll_735_M14b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
47.84 MB
-
Poll_735_M14b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
49.27 MB
-
Poll_736_M14c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
474.45 MB
-
Poll_736_M14c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
457.63 MB
-
Poll_737_M15a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
33.19 MB
-
Poll_737_M15a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
34.19 MB
-
Poll_738_M15b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
11.59 MB
-
Poll_738_M15b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
11.46 MB
-
Poll_739_M15c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
2.87 MB
-
Poll_739_M15c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
2.92 MB
-
Poll_740_M16a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
23.41 MB
-
Poll_740_M16a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
23.42 MB
-
Poll_741_M16b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
1.56 MB
-
Poll_741_M16b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
1.57 MB
-
Poll_742_M16c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
4.31 MB
-
Poll_742_M16c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
4.27 MB
-
Poll_743_M17a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
2.73 MB
-
Poll_743_M17a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
2.79 MB
-
Poll_744_M17b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
3.47 MB
-
Poll_744_M17b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
3.52 MB
-
Poll_745_M17c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
5.74 MB
-
Poll_745_M17c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
5.77 MB
-
Poll_746_M18a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
32.81 MB
-
Poll_746_M18a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
32.05 MB
-
Poll_747_M18b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
40.72 MB
-
Poll_747_M18b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
39.88 MB
-
Poll_748_M18c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
57.72 MB
-
Poll_748_M18c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
58.72 MB
-
Poll_749_M19a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
280.85 MB
-
Poll_749_M19a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
279.26 MB
-
Poll_750_M19b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
48.91 MB
-
Poll_750_M19b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
49.76 MB
-
Poll_751_M19c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
102.24 MB
-
Poll_751_M19c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
103.70 MB
-
Poll_752_M20a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
88.99 MB
-
Poll_752_M20a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
87.11 MB
-
Poll_753_M20b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
29.05 MB
-
Poll_753_M20b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
29.60 MB
-
Poll_754_M20c_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
171.81 MB
-
Poll_754_M20c_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
172.46 MB
-
Poll_760_EBC2_Blank_R1_001.fastq.gz
447.30 KB
-
Poll_760_EBC2_Blank_R2_001.fastq.gz
464.26 KB
-
README.md
11.15 KB
Abstract
Efforts to explore optimal molecular methods for identifying plant mixtures, particularly pollen, are increasing. Pollen identification (ID) and quantification are important in many fields, including pollination ecology and agricultural sciences, but quantifying mixture proportions remains challenging. Traditional pollen ID using microscopy is time-consuming, requires expertise, and has limited accuracy and throughput. Molecular barcoding approaches offer improved accuracy and throughput. The common approach, amplicon sequencing, employs PCR amplification to isolate DNA barcodes, but introduces significant bias, impairing downstream quantification. We apply a novel molecular hybridisation capture approach to artificial pollen mixtures to improve upon current taxon ID and quantification methods. The method randomly fragments DNA and uses RNA baits to capture DNA barcodes, which allows for PCR duplicate removal, reducing downstream quantification bias. Metabarcoding was tested using two reference libraries constructed from publicly available sequences: the matK plastid barcode and RefSeq complete chloroplast references. Single barcode-based taxon ID did not consistently resolve to species or genus level. The RefSeq chloroplast database performed better qualitatively but had limited taxon coverage (relative to the species used here) and introduced ID issues. At family level, both databases yielded comparable qualitative results, but the RefSeq database performed better quantitatively. A restricted matK database containing only mixture species yielded sequence proportions highly correlated with input pollen proportions, demonstrating the usefulness of hybridisation capture for metabarcoding and quantifying pollen mixtures. The choice of reference database remains one of the most important factors affecting qualitative and quantitative accuracy.
Dataset DOI: 10.5061/dryad.73n5tb37z
Description of the data and file structure
This dataset contains raw Illumina sequencing reads (FASTQ format) derived from artificial pollen mixtures composed of three plant species: Eucalyptus baxteri, Echinacea arctotheca, and Prunus dulcis. The purpose of the experiment was to evaluate the accuracy of DNA-based taxonomic identification and quantification using targeted chloroplast hybrid capture followed by high-throughput sequencing.
The pollen was combined in varying proportions to create 14 artificial mixes, each with three replicates, resulting in 42 DNA libraries plus negative controls. Each sample was assigned a unique internal barcode and dual index combination for multiplex sequencing.
Libraries were prepared using the NEBNext Ultra II kit with custom stubby Y-adapters and enriched using the OZBaits_CP V1.0 chloroplast bait set targeting 19 conserved chloroplast genes. The final library pool was sequenced on an Illumina HiSeq X Ten platform (2 × 150 bp reads).
Files and variables
File description:
Each sample is represented by two compressed FASTQ files: a forward read file (R1) and a reverse read file (R2), containing raw Illumina sequencing data. These files have already been demultiplexed based on the internal in-line barcodes used during library preparation.
The filenames follow a naming convention that includes:
- the mix ID (e.g., `M7`, `M14`), corresponding to a specific artificial pollen mixture, and
- the replicate ID (`a`, `b`, or `c`), indicating one of the three replicates per mix.
For example:
`Poll_713_M7a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz` and `Poll_713_M7a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz` represent the forward and reverse reads for replicate a of Mix M7.
Blank (negative control) samples follow a similar naming convention (e.g., `Poll_760_EBC2_Blank_R1_001.fastq.gz`).
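The naming convention can be turned into a small parser when building a sample sheet. The sketch below is illustrative only: the regular expression is inferred from the file names listed in this dataset, and the names `FILENAME_RE`, `SampleFile`, and `parse_filename` are hypothetical helpers, not part of the original workflow.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the listed file names (an assumption), e.g.
#   Poll_713_M7a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
#   Poll_760_EBC2_Blank_R1_001.fastq.gz
FILENAME_RE = re.compile(
    r"^Poll_(?P<number>\d+)_"
    r"(?:M(?P<mix>\d+)(?P<rep>[abc])|(?P<control>EBC\d+))"
    r"_.*_R(?P<read>[12])_001\.fastq\.gz$"
)


class SampleFile(NamedTuple):
    sample_number: int          # e.g. 713
    mix_id: Optional[str]       # e.g. "M7"; None for blanks
    replicate: Optional[str]    # "a", "b" or "c"; None for blanks
    control_id: Optional[str]   # e.g. "EBC2"; None for pollen mixes
    read: str                   # "R1" (forward) or "R2" (reverse)


def parse_filename(name: str) -> SampleFile:
    """Split one FASTQ file name into its sample fields."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"Unrecognised file name: {name}")
    mix_id = f"M{match['mix']}" if match["mix"] else None
    return SampleFile(
        sample_number=int(match["number"]),
        mix_id=mix_id,
        replicate=match["rep"],
        control_id=match["control"],
        read=f"R{match['read']}",
    )
```

Grouping the parsed records by `sample_number` is one way to confirm that every sample has both an R1 and an R2 file before analysis.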
Files:
File: Poll_716_M8a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_716_M8a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_715_M7c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_715_M7c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_713_M7a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_714_M7b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_713_M7a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_760_EBC2_Blank_R1_001.fastq.gz
File: Poll_760_EBC2_Blank_R2_001.fastq.gz
File: Poll_718_M8c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_714_M7b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_718_M8c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_722_M10a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_720_M9b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_723_M10b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_723_M10b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_721_M9c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_720_M9b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_721_M9c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_722_M10a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_724_M10c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_717_M8b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_730_M12c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_724_M10c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_719_M9a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_719_M9a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_717_M8b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_729_M12b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_728_M12a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_730_M12c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_729_M12b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_728_M12a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_738_M15b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_738_M15b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_739_M15c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_727_M11c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_726_M11b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_727_M11c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_739_M15c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_726_M11b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_737_M15a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_737_M15a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_741_M16b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_742_M16c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_741_M16b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_743_M17a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_742_M16c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_735_M14b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_743_M17a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_744_M17b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_735_M14b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_744_M17b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_745_M17c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_745_M17c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_740_M16a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_740_M16a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_725_M11a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_725_M11a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_746_M18a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_746_M18a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_747_M18b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_747_M18b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_750_M19b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_750_M19b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_748_M18c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_748_M18c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_753_M20b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_753_M20b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_752_M20a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_751_M19c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_752_M20a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_751_M19c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_731_M13a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_731_M13a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_734_M14a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_732_M13b_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_734_M14a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_732_M13b_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_754_M20c_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_733_M13c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_754_M20c_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_733_M13c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_749_M19a_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
File: Poll_749_M19a_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_736_M14c_Mix_E_baxteri_E_arctotheca_P_dulces_R2_001.fastq.gz
File: Poll_736_M14c_Mix_E_baxteri_E_arctotheca_P_dulces_R1_001.fastq.gz
Corresponding sample names in publication
Note that in the publication, the mix IDs were renumbered from M7-M20 to M1-M14.
| Sample name (file) | Corresponding sample name (publication) |
|---|---|
| M7 | M1 |
| M8 | M2 |
| M9 | M3 |
| M10 | M4 |
| M11 | M5 |
| M12 | M6 |
| M13 | M7 |
| M14 | M8 |
| M15 | M9 |
| M16 | M10 |
| M17 | M11 |
| M18 | M12 |
| M19 | M13 |
| M20 | M14 |
| EBC2 | Blank |
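Because the table is a fixed offset of six between file and publication mix numbers, the conversion can be scripted. This is a minimal sketch; the function name `publication_name` is a hypothetical helper introduced here for illustration.

```python
def publication_name(file_sample_id: str) -> str:
    """Map a sample name used in the file names to the name used in the publication.

    Follows the table above: M7..M20 become M1..M14, and EBC2 becomes "Blank".
    """
    if file_sample_id == "EBC2":
        return "Blank"
    if not (file_sample_id.startswith("M") and file_sample_id[1:].isdigit()):
        raise ValueError(f"Unknown sample name: {file_sample_id}")
    number = int(file_sample_id[1:])
    if not 7 <= number <= 20:
        raise ValueError(f"Mix number out of range (7-20): {file_sample_id}")
    return f"M{number - 6}"
```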
Code/software
How to Open and Analyze the Data
The dataset consists of demultiplexed Illumina sequencing reads in compressed FASTQ format (.fastq.gz). These files can be opened in any text editor that supports gzip compression, but they are typically analyzed using command-line tools or within bioinformatics workflows.
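For a quick look at a file without installing dedicated tools, the reads can also be streamed with the Python standard library. The sketch below assumes the conventional four-line FASTQ record layout (header, sequence, `+` separator, quality string); `read_fastq` and `count_reads` are hypothetical helper names.

```python
import gzip
from typing import Iterator, Tuple


def read_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a gzip-compressed FASTQ file.

    Assumes four lines per record, which is the usual Illumina output layout.
    """
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of file
            sequence = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line, ignored
            quality = handle.readline().rstrip("\n")
            yield header[1:], sequence, quality  # drop the leading '@'


def count_reads(path: str) -> int:
    """Count the records in a gzip-compressed FASTQ file."""
    return sum(1 for _ in read_fastq(path))
```

For example, `count_reads("Poll_760_EBC2_Blank_R1_001.fastq.gz")` would report how many reads the negative control contains once the file has been downloaded.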
A range of free and open-source software is available to view, quality-check, and analyze these files. Below is an overview of the tools used in this study, organized by processing step.
1. Demultiplexing (already completed)
Software:
- `bcl2fastq` (Illumina): Used to demultiplex reads based on Illumina index sequences.
- `Sabre`: Used to demultiplex based on internal 8 bp barcodes.
Note: Sabre was run allowing one mismatch, as all internal barcodes differed by at least two base pairs.
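The pairwise distance between barcodes mentioned in the note can be checked with a few lines of code. The barcode sequences are not included in this README, so the values in the usage comment are placeholders, and `hamming` and `min_pairwise_distance` are hypothetical helper names.

```python
from itertools import combinations
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("Barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes: Sequence[str]) -> int:
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


# Placeholder 8 bp barcodes, NOT the ones used in this study:
# min_pairwise_distance(["ACGTACGT", "ACGTTGCA", "TGCAACGT"])
```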
2. Quality Filtering and Trimming
Software:
- `Clumpify` (BBTools suite): Used to remove PCR duplicates by grouping identical reads.
- `AdapterRemoval`: Used for trimming adapter sequences, filtering short/low-quality reads, and removing trailing A-tails. Reads shorter than 30 bp or with Phred scores <20 were removed.
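To make the length and quality thresholds concrete, the sketch below applies them to a single read. It is not AdapterRemoval's algorithm: it assumes Phred+33 quality encoding and interprets the quality threshold as trimming low-quality bases from the 3' end before the length check, which is only one possible reading of the description. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


def trim_and_filter(
    sequence: str,
    quality: str,
    min_quality: int = 20,
    min_length: int = 30,
) -> Optional[Tuple[str, str]]:
    """Trim low-quality bases from the 3' end, then drop reads that are too short.

    Returns the trimmed (sequence, quality) pair, or None if the read is
    shorter than min_length after trimming.
    """
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1  # remove one trailing base below the quality threshold
    if end < min_length:
        return None
    return sequence[:end], quality[:end]
```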
3. Quality Assessment
Software:
- `FastQC`: For checking per-base quality scores, GC content, and other quality metrics.
- `MultiQC`: For summarizing FastQC reports across all samples.
- `seqtk`, `zcat`, `less`: Command-line tools for quick inspection or subsampling of reads.
4. Taxonomic Classification and Quantification
Software:
- `Kraken2`: Used for assigning taxonomic labels to reads via a k-mer-based approach.
- `Bracken`: Used for estimating taxon abundance from Kraken2 output.
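Once per-taxon read counts are available, sequence proportions can be compared with the input pollen proportions. The sketch below is a hedged illustration: it uses a Pearson correlation, which is an assumption because this README does not state which statistic was used, and the helper names and any numbers passed to them are hypothetical.

```python
import math
from typing import Dict, Sequence


def proportions(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-taxon read counts into proportions that sum to 1."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("No reads assigned to any taxon")
    return {taxon: count / total for taxon, count in read_counts.items()}


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("Need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("Correlation is undefined for constant input")
    return covariance / (spread_x * spread_y)
```

In practice, the counts would come from the Bracken abundance tables, and the input proportions from the mixture design described in the publication.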
