Method Article

Comprehensive Evaluation of Genotype Imputation Tools for Ultra-low-depth Whole-Genome Sequencing Data

DOI: 10.3791/68879

December 12th, 2025

Summary

Three imputation tools -- STITCH, QUILT2, and GLIMPSE2 -- were benchmarked across varying sequencing depths and sample sizes, using CKB and EAS reference panels. The results provide a practical framework for selecting appropriate imputation strategies in ultra-low-depth sequencing data, facilitating large-scale population genomic and complex trait studies.

Abstract

Ultra-low-depth sequencing (ULDS) is a cost-effective strategy for large-scale genomic studies, but its utility hinges on accurate genotype imputation. This study evaluates three imputation tools -- STITCH, QUILT2, and GLIMPSE2 -- across varying sequencing depths and sample sizes, using the China Kadoorie Biobank (CKB) and the 1000 Genomes Project (1KGP) East Asian (EAS) reference panels. Three critical performance divergences are demonstrated. (1) Sample size sensitivity: STITCH's accuracy improved markedly with larger samples, whereas QUILT2 and GLIMPSE2 showed minimal dependence on sample size. (2) Reference panel optimization: the population-specific CKB panel significantly enhanced accuracy for QUILT2 and GLIMPSE2 but had a negligible impact on STITCH, which relies on internal haplotype inference. (3) Depth thresholds: all tools achieved robust accuracy at moderate sequencing depths (≥ 0.5x), but STITCH underperformed drastically at ultra-low depths (≤ 0.1x). GLIMPSE2 with CKB delivered the highest overall accuracy, while QUILT2 balanced precision and computational efficiency. For non-invasive prenatal testing (NIPT) data, GLIMPSE2+CKB maintained sufficient accuracy for downstream analyses. A decision framework is proposed that prioritizes population-matched panels and depth-adapted tools, offering actionable guidelines for optimizing ULDS-WGS in diverse research settings. These insights bridge methodological advancements with practical implementation, enabling cost-effective scaling of genomic studies without compromising data quality.

Introduction

Ultra-low-depth sequencing (ULDS), defined as sequencing coverage below 1x, has gained traction due to its low cost, broad genome coverage, and compatibility with diverse sample types. It has already shown clinical value in applications such as non-invasive prenatal testing (NIPT)¹, cancer monitoring², and chromosomal copy number variation (CNV) detection³,⁴. Beyond clinical diagnostics, the decreasing cost of sequencing and rapid advances in bioinformatics have enabled ULDS to play a growing role in population genomics and complex trait research. By combining UL....

Protocol

All participants provided written informed consent prior to participation. The study involving high-depth WGS data was reviewed and approved by the BGI Institutional Review Board (BGI-IRB 23058-T2), and approval for the collection of human genetic resources was obtained from the Human Genetic Resources Administration of China ([2023] CJ0262). The study involving ULDS data from NIPT was approved by the Institutional Review Board of Wuhan Children's Hospital (2021R062) and the BGI Institutional Review Board (BGI-IRB 21088), with additional approval from the Human Genetic Resources Administration of China ([2021] CJ2002).

NOTE: This study ....

Results

Impact of sample size on imputation accuracy
Increasing the sample size from N = 200 to N = 500 improved the imputation accuracy of STITCH, particularly under low-coverage conditions. For example, with the CKB reference panel at 1x coverage, STITCH achieved an R² of 0.916 (N = 500) compared to 0.882 (N = 200), an absolute gain of 0.034, or 3.4 percentage points (Figure 3; Supplementary File 2). Similarly, at 0.5x coverage, its accuracy rose from .......
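
The accuracy figures quoted above can be checked with a short sketch. It assumes that R² is the squared Pearson correlation between imputed dosages and the high-depth truth genotypes, which is a common convention; the exact definition used in the protocol is not visible in this excerpt. The function names `dosage_r2` and `accuracy_gain` are illustrative and do not come from the authors' code.

```python
def dosage_r2(truth, imputed):
    """Squared Pearson correlation between truth genotypes (0/1/2)
    and imputed dosages (assumed definition of R^2, see lead-in)."""
    if len(truth) != len(imputed) or len(truth) < 2:
        raise ValueError("need two equal-length vectors with at least 2 entries")
    n = len(truth)
    mean_t = sum(truth) / n
    mean_d = sum(imputed) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(truth, imputed))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_d = sum((d - mean_d) ** 2 for d in imputed)
    if var_t == 0 or var_d == 0:
        # Monomorphic site or constant dosage: correlation is undefined.
        return float("nan")
    return (cov * cov) / (var_t * var_d)


def accuracy_gain(r2_small, r2_large):
    """Return (absolute, relative) gain when moving from the smaller
    to the larger sample size."""
    absolute = r2_large - r2_small
    relative = absolute / r2_small
    return absolute, relative


# Values reported in the text for STITCH with the CKB panel at 1x coverage.
abs_gain, rel_gain = accuracy_gain(0.882, 0.916)
print(f"absolute gain: {abs_gain:.3f}")   # 0.034
print(f"relative gain: {rel_gain:.1%}")   # 3.9%
```

The two outputs differ because 0.034 is a difference in R² units, while the relative improvement over the N = 200 baseline is slightly larger.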

Discussion

This study systematically assessed the performance of three widely used genotype imputation tools for ULDS, with high-depth WGS serving as the gold standard. A key methodological strength lies in the adoption of a unified preprocessing pipeline -- encompassing alignment, quality control, and base quality score recalibration -- that minimizes batch effects and ensures comparability across tools and conditions. By downsampling deeply sequenced samples, ultra-low-depth data were simulated under controlled settings, thereby .......
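
As a rough illustration of the downsampling step, the sketch below computes the fraction of reads to retain when simulating a target depth from the 30× high-depth samples listed in the Materials table. It assumes that depth scales linearly with the number of reads kept. It also assumes that `seqtk sample` is the subsampling tool, since Seqtk appears in the Materials table; the protocol's actual commands are not visible in this excerpt. The helper names and the input file name are hypothetical.

```python
def keep_fraction(source_depth, target_depth):
    """Fraction of reads to retain so that expected depth drops from
    source_depth to target_depth (assumes depth is proportional to read count)."""
    if source_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    if target_depth > source_depth:
        raise ValueError("cannot downsample to a depth above the source depth")
    return target_depth / source_depth


def seqtk_sample_cmd(fastq, fraction, seed=100):
    """Build an argument list for `seqtk sample`. Using the same seed for
    both mates of a paired-end library keeps read pairs together."""
    return ["seqtk", "sample", f"-s{seed}", fastq, f"{fraction:.6f}"]


SOURCE_DEPTH = 30.0  # high-depth WGS depth from the Materials table

# Target depths mentioned in the text; the full depth grid is an assumption.
for depth in (0.1, 0.5, 1.0):
    frac = keep_fraction(SOURCE_DEPTH, depth)
    # "sample_R1.fq.gz" is a hypothetical file name.
    cmd = seqtk_sample_cmd("sample_R1.fq.gz", frac)
    print(f"{depth}x -> keep {frac:.4%} of reads: {' '.join(cmd)}")
```

The command is only printed here, not executed; in practice the output would be redirected to a new FASTQ file and the realized depth verified after alignment.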

Disclosures

The authors declare no competing interests.

Acknowledgements

This study was supported by the Shenzhen Medical Research Fund (B2404004), the National Key Research and Development Program of China (2023YFC2605400, 2022YFC2502402), the Shenzhen Science and Technology Program (SYSPG20241211173852024), the Open Research Project in the State Key Laboratory of Vascular Homeostasis and Remodeling (Peking University) (2025-SKLVHR-013), and the Key-Area Research and Development Program of Guangdong Province (2023B0303040001).

....

Materials

List of materials used in this article
| Name | Company | Catalog Number | Comments |
| --- | --- | --- | --- |
| Data | | | |
| 10,000 NIPT low-depth samples | This paper | | Ultra-low-depth whole genome sequencing data used for imputation benchmark. |
| 500 high-depth WGS samples | This paper | | 30× high-depth WGS used as gold standard/truth set. |
| Reference Panel | | | |
| 1KGP-EAS reference panel | 1000 Genomes Project (East Asia) | | Subset of 1KGP for East Asian ancestry-specific imputation. |
| CKB reference panel | China Kadoorie Biobank | | Custom population-specific panel for genotype imputation. |
| Software and algorithms | | | |
| BCFtools v1.11 | GitHub (samtools/bcftools) | | Used for merging and sorting chromosome-level results, and filtering variants. |
| BQSR of the GATK 4.0.4.0 toolset | Broad Institute | | Used for base quality score recalibration (BQSR). |
| BWA-MEM 0.7.16a-r1181 | Heng Li / GitHub | | For aligning raw reads to GRCh38. |
| DPGT (Distributed Population Genetics Tool) | BGI | | A distributed population genetics analysis tool which enabled joint calling on millions of WGS samples. Available at [GitHub - BGI-flexlab/DPGT](https://github.com/BGI-flexlab/DPGT) |
| fastp 0.23.4 | Open-source (Chen et al., 2018) | | For quality control and adapter trimming. |
| GLIMPSE2 | University of Oxford | | Fast genotype phasing and imputation for low-coverage WGS |
| Original Code for analyses | This paper | | Supplementary File 1 Original Code for analyses |
| Picard toolkit | Broad Institute | | Used for marking duplicates and file format conversion. |
| Plink 2.0 | C. Chang, S. Purcell / Broad Institute | | For genotype format conversion and association analysis. |
| Python 3.8 | Python Software Foundation | | Used for scripting, automation, and data analysis. |
| QUILT2 | Oxford Big Data Institute | | HMM-based imputation using external reference panels |
| R 4.1.3 | The R Foundation | | Used for running STITCH, QUILT2 and plotting/statistics. |
| SAMtools v1.3 | GitHub (samtools/samtools) | | For manipulating SAM/BAM files. |
| Seqtk 1.5 | GitHub (lh3/seqtk) | | Toolkit for processing sequences in FASTA/Q formats. Available at [GitHub - lh3/seqtk](https://github.com/lh3/seqtk) |
| SOAPnuke | BGI | | For NGS data quality control and filtering. |
| STITCH v1.6.6 | University of Oxford | | Imputation tool optimized for ultra-low coverage sequencing |
| tabix | GitHub (samtools/tabix) | | Used for indexing and querying bgzipped VCF files. |
| Other Materials | | | |
| GATK bundle files | GATK | | Available at [https://github.com/gatk-workflows/gatk4-data-processing/blob/master/processing-for-variant-discovery-gatk4.hg38.wgs.inputs.json] |
| Genetic Map for 1000G (GRCh38) | Oxford / 1000 Genomes Project | | Required for phasing/imputation tools |
| GRCh38 | Genome Reference Consortium | | Used for read alignment and variant calling |

References

  1. Zhang, H., et al. Non-invasive prenatal testing for trisomies 21, 18 and 13: clinical experience from 146,958 pregnancies. Ultrasound Obstet Gynecol. 45 (5), 530-538 (2015).
  2. Heitzer, E., et al.

Tags

Genotype Imputation, Ultra Low Depth Sequencing, Whole Genome Sequencing, Imputation Tools, Reference Panel Optimization, Population Specific Panels, Sample Size Sensitivity, Non Invasive Prenatal Testing, Sequencing Depth Thresholds, Computational Efficiency
