File Pansu&al_BioLett_gh_unfiltered_dataset.fasta

This file contains the unique sequences amplified with the g (5'-GGGCAATCCTGAGCCAA-3') and h (5'-CCATTGAGTCTCTGCACCTATC-3') primers (Taberlet et al. 2007). The sequences were produced with Illumina technology (HiSeq 2500 platform, 2x100 bp paired-end).

First filtering steps were performed using the OBITOOLS software (http://metabarcoding.org/obitools):
(i) Direct and reverse reads corresponding to the same sequence were aligned and merged with the IlluminaPairedEnd program. Only merged sequences with a high alignment quality score (>=40) were retained.
(ii) Each merged sequence was assigned to its original sample with the ngsfilter program, using the tags previously added to the primers. Only sequences containing both primers (with a maximum of 3 mismatches per primer) and exact tag sequences were selected.
(iii) Strictly identical sequences were merged together.

Data collection: Johan Pansu, Richard Winkworth, Ludovic Gielly, Francoise Hennion and Philippe Choler
Richard Winkworth, Francoise Hennion and Philippe Choler collected the soil samples.
Ludovic Gielly created the plant reference database.
Johan Pansu performed the DNA extraction, the DNA amplification and the sequence filtering.
Contact author: Johan Pansu (johan.pansu@gmail.com)

The header of each sequence contains:
(i) the ID of the sequence
(ii) the total number of occurrences of the sequence in the dataset (count)
(iii) the number of occurrences in each PCR (merged_sample)

Code for identifying the PCR:
(i) Characters 1-6 correspond to the sampled plot
(ii) Character 7 corresponds to the sampling replicate (A or B)
(iii) Character 9 corresponds to the DNA extraction (a or b)
(iv) Character 11 corresponds to the PCR replicate (1 or 2)
e.g., KGAU05B_a_2: second PCR replicate (2) from the first DNA extraction (a) from the second sampling replicate (B) from the plot KGAU05
(v) TEMEXTx: extraction control x
(vi) TEMPCRx or TEMVIDx: PCR control x
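
The PCR code described above can be decoded programmatically. The following Python sketch is only an illustration and is not part of the original processing pipeline. The names PcrCode and parse_pcr_code are hypothetical. It assumes that the fields are separated by underscores, as in the example KGAU05B_a_2, that plot names are alphanumeric, and that whatever follows a control prefix (TEMEXT, TEMPCR or TEMVID) is the control identifier x.

```python
import re
from typing import NamedTuple, Optional


class PcrCode(NamedTuple):
    """Decoded PCR identifier (hypothetical helper type, not from the dataset)."""
    kind: str                                  # "sample", "extraction_control" or "pcr_control"
    plot: Optional[str] = None                 # characters 1-6
    sampling_replicate: Optional[str] = None   # character 7: A or B
    extraction: Optional[str] = None           # character 9: a or b
    pcr_replicate: Optional[int] = None        # character 11: 1 or 2
    control_id: Optional[str] = None           # the "x" of TEMEXTx, TEMPCRx, TEMVIDx


# Control prefixes listed in the README and the kind of control they denote.
_CONTROL_KINDS = {
    "TEMEXT": "extraction_control",
    "TEMPCR": "pcr_control",
    "TEMVID": "pcr_control",
}

# Assumption: alphanumeric plot name, underscores at positions 8 and 10.
_SAMPLE_RE = re.compile(r"([A-Za-z0-9]{6})([AB])_([ab])_([12])")


def parse_pcr_code(code: str) -> PcrCode:
    """Split a PCR code such as 'KGAU05B_a_2' into its documented fields."""
    prefix = code[:6]
    if prefix in _CONTROL_KINDS:
        # Everything after the prefix is kept verbatim as the control identifier.
        return PcrCode(kind=_CONTROL_KINDS[prefix], control_id=code[6:])
    match = _SAMPLE_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"unrecognised PCR code: {code!r}")
    plot, sampling, extraction, pcr = match.groups()
    return PcrCode("sample", plot, sampling, extraction, int(pcr))


if __name__ == "__main__":
    print(parse_pcr_code("KGAU05B_a_2"))
```

Controls are tested before the sample pattern, so a control code is never mistaken for a plot name. Codes that match neither form raise a ValueError instead of being silently assigned to a plot.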