Abstract:
Molecular epidemiologic studies of malaria parasites and other pathogens commonly
employ amplicon deep sequencing (AmpSeq) of marker genes derived from dried blood
spots (DBS) to answer public health questions related to topics such as transmission and
drug resistance. As these methods are increasingly employed to inform direct public health
action, it is important to rigorously evaluate the risk of false positive and false negative
haplotypes derived from clinically relevant sample types. We performed a control experiment
evaluating haplotype recovery from AmpSeq of 5 marker genes (ama1, csp, msp7,
sera2, and trap) from DBS containing mixtures of DNA from 1 to 10 known P. falciparum
reference strains across 3 parasite densities in triplicate (n = 270 samples). While false
positive haplotypes were present across all parasite densities and mixtures, we optimized
censoring criteria to remove 83% (148/179) of false positives while removing only 8% (67/859)
of true positives. Post-censoring, the median pairwise Jaccard distance between replicates
was 0.83. We failed to recover 35% (477/1365) of haplotypes expected to be present
in the sample. Haplotypes were more likely to be missed in low-density samples with
<1.5 genomes/μL (OR: 3.88, CI: 1.82–8.27, vs. high-density samples with ≥75 genomes/μL)
and in samples with lower read depth (OR per 10,000 reads: 0.61, CI: 0.54–0.69).
Furthermore, minority haplotypes within a sample were more likely to be missed than dominant
haplotypes (OR per 0.01 increase in proportion: 0.96, CI: 0.96–0.97). Finally, in clinical
samples the percent concordance across markers for multiplicity of infection ranged
from 40% to 80%. Taken together, our observations indicate that, with sufficient read depth,
the majority of haplotypes can be successfully recovered from DBS while limiting the false
positive rate.
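
For readers unfamiliar with the replicate-agreement metric mentioned above, the following is a minimal sketch of how a median pairwise Jaccard distance between replicate haplotype sets could be computed. It is not the authors' pipeline: the function names and the haplotype labels (`hapA`, `hapB`, ...) are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from statistics import median


def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two collections of haplotype labels."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty replicates are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def median_pairwise_distance(replicates):
    """Median Jaccard distance over all unordered pairs of replicates."""
    return median(
        jaccard_distance(x, y) for x, y in combinations(replicates, 2)
    )


# Hypothetical haplotype calls for one sample sequenced in triplicate.
replicates = [
    {"hapA", "hapB", "hapC"},
    {"hapA", "hapB"},
    {"hapA", "hapD"},
]

print(median_pairwise_distance(replicates))
```

A distance of 0 means two replicates recovered exactly the same haplotypes, and a distance of 1 means they shared none.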