HIV epidemic and needs beyond fast-track cities: a transmission networks analysis to study the dynamic of HIV clusters in a French region near Paris
Data files
Feb 24, 2026 version, files: 1.83 MB
- PHYLOVIH_Metadata.csv (107.63 KB)
- PHYLOVIH.fasta (1.72 MB)
- README.md (2.64 KB)
Abstract
The Centre-Loire Valley region is a low-density demographic and medical region near the fast-track city of Paris, with a persistently elevated rate of positive HIV tests. We investigated the role and characteristics of transmission clusters in the HIV dynamics of this region to inform an appropriate local response. HIV pol gene sequences collected in Centre-Loire Valley over a decade (2010-2020) were included in a phylogenetic analysis combined with epidemiological data. Putative transmission clusters were inferred using HIV-TRACE methods. Risk factors for being part of a cluster were studied using multivariate logistic regression models. Of the 1305 participants, 579 (44%) were born outside France, mainly in Sub-Saharan Africa, 494 (38%) were women, 433 (33%) were men who have sex with men (MSM), and 694 (53%) were heterosexual. Migrants had a lower CD4 cell count at diagnosis than those born in France (296 vs 443, p < 0.01), likely due to delayed diagnosis. A total of 86 clusters were identified (clustering rate of 21%), including 33 of size ≥ 3 involving 170 participants (3-16 per cluster). MSM (OR 2.16, p < 0.01) and higher viral load (OR 1.21, p < 0.01) were risk factors for clustering. Individuals born abroad were at lower risk than those born in France (OR 0.03, p < 0.01). Among large clusters, persistent virological control was achieved in a median of 75% of participants vs 83% outside clusters (p < 0.01). Molecular epidemiology showed that MSM, but not heterosexual migrants, were part of local transmission networks, suggesting distinct epidemic features and needs.
This dataset includes HIV pol sequences isolated from 1305 adults who were followed at one of the hospitals in the Centre-Val de Loire region (France) between 2010 and 2020.
We have submitted our raw data as a FASTA file (PHYLOVIH.fasta), together with specimen information (PHYLOVIH_Metadata.csv).
Description of the data and file structure
PHYLOVIH.fasta
This FASTA file includes 1305 partial pol gene sequences. Each sequence is labeled with the participant's unique identifier.
For example: >VIH-000001 refers to participant 1 of the study.
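As an illustration of the layout described above, here is a minimal sketch of a FASTA reader that uses only the Python standard library. The function name `read_fasta` and the short sequences in the example are made up for illustration; they are not taken from the dataset.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a dict: identifier -> sequence."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: keep the first token after ">" as the identifier.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            # Sequence line: may be wrapped over several lines.
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


# Made-up two-record example (not real data from PHYLOVIH.fasta).
example = [
    ">VIH-000001",
    "CCTCAGATCA",
    "CTCTTTGGCA",
    ">VIH-000002",
    "CCTCARATCA",
]
records = read_fasta(example)
```

To apply this to the real file, one could pass an open file handle, for example `read_fasta(open("PHYLOVIH.fasta"))`. If every header is unique, the resulting dictionary should hold 1305 entries according to the description above.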
PHYLOVIH_Metadata.csv
- Sequence name : unique participant identifier (same identifier used in PHYLOVIH.fasta)
- Sample date : date of the blood sample (month-year)
- Country : where the sample was taken (all samples from the study)
- Subtype : HIV subtype determined by uploading sequences individually into the REGA HIV-1 Automated Subtyping Tool version 3.47 (https://www.genomedetective.com/app/typingtool/hiv) and HIV Blast (https://www.hiv.lanl.gov/)
- Host : all samples were collected from human participants
- Sequencing technology : all sequences were obtained using Sanger sequencing
- Length : total number of nucleotides in each sequence
- GenomeStart : sequence start relative to the reference sequence for HIV-1 (HXB2)
- GenomeEnd : sequence end relative to the reference sequence for HIV-1 (HXB2)
- Stop Codons : total number of stop codons in each sequence (corresponding to codons TAA, TAG and TGA)
- Hypermutation : detection of mutations fitting a specific pattern of hypermutation by the HYPERMUT 3 tool (https://www.hiv.lanl.gov/content/sequence/HYPERMUT/hypermutv3.html)
- Non-ACGT : frequency of ambiguous nucleotides in each sequence, as defined by the IUPAC nucleotide ambiguity codes (Cornish-Bowden (1985) IUPAC-IUB symbols for nucleotide nomenclature. Nucl. Acids Res. 13: 3021-3030)
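Several of the fields above (Length, Stop Codons, Non-ACGT) can in principle be recomputed from a sequence. The sketch below shows one possible way to do so; it is not the procedure used to produce the metadata. The function name `sequence_features`, the `frame` parameter, and the example sequence are hypothetical, and the reading frame actually used for the Stop Codons column is not stated in this README, so frame 0 is an assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def sequence_features(seq, frame=0):
    """Recompute simple per-sequence features from a nucleotide string.

    Assumptions (not confirmed by the README): stop codons are counted
    in a single reading frame starting at offset `frame`, and every
    character other than A, C, G or T counts as non-ACGT.
    """
    seq = seq.upper()
    non_acgt = sum(1 for base in seq if base not in "ACGT")
    # Walk the sequence codon by codon in the chosen frame.
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    stops = sum(1 for codon in codons if codon in STOP_CODONS)
    return {
        "Length": len(seq),
        "Stop Codons": stops,
        "Non-ACGT": non_acgt,
        "Non-ACGT fraction": non_acgt / len(seq) if seq else 0.0,
    }


# Made-up 12-nucleotide example: two ambiguity codes (R, N), one TAA codon.
features = sequence_features("ATGRCCTAAGGN")
```

Because the metadata describes Non-ACGT as a frequency without giving units, the sketch returns both a raw count and a fraction; either may need adjusting to match the values in PHYLOVIH_Metadata.csv.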
Human subjects data
All subjects of this study gave their informed and written consent for the reuse of their anonymized data for research purposes. The metadata contain only sequence features and no identifying information (date of birth, home address or identification number). The clinical data of participants involved in clusters were not disclosed, in order to minimise the risk of re-identification and stigmatisation of participants when communicating molecular clusters.
Drug resistance genotyping was performed in the two specialized virology laboratories of the CLV region (903 sequences from Tours and 402 from Orléans) by sequencing the pol gene encoding HIV-1 reverse transcriptase (RT) and protease from patients' plasma samples, using Sanger sequencing as previously described (Chaix M-L, Descamps D, Harzic M, et al. Stable prevalence of genotypic drug resistance mutations but increase in non-B virus among patients with primary HIV-1 infection in France. AIDS 2003; 17:2635–2643).
