Abstract:
Human papillomavirus 16 (HPV16) drives precursor cervical lesions that can progress to cervical
cancer (CC). Variation within the HPV16 genome has been associated with CC risk. Here, we
developed an affordable and portable amplicon-based long-read whole genome sequencing (WGS)
approach using Oxford Nanopore Technologies (ONT) to investigate HPV16 genetic diversity among
women in sub-Saharan African countries. Applied to the CaSki cell line (control) and to clinical samples
(n = 12), our method generated complete HPV16 genomes at high depth (median per-sample read
coverage of 5,899–15,279×). Benchmarking against our HPV16 controls showed high accuracy for two
variant-calling pipelines (Clair3 and PEPPER-Margin-DeepVariant). Phylogenetic analysis identified all four
previously defined HPV16 lineages (A–D) and their high-risk sublineages. Lineage assignments were
strongly concordant across de novo assembly, reference-based phylogenetics, and unsupervised
clustering. Our pipeline captured genome-wide variation, including putative
lineage-informative SNPs. This method provides a robust amplicon-based WGS and analysis pipeline
for HPV16 that is well-suited for integration into surveillance, diagnostics, and epidemiological
efforts in low-resource settings.