| Resource | Source | Identifier | Notes |
| --- | --- | --- | --- |
| Data | | | |
| 10,000 NIPT low-depth samples | This paper | | Ultra-low-depth whole-genome sequencing (WGS) data from non-invasive prenatal testing (NIPT), used for the imputation benchmark. |
| 500 high-depth WGS samples | This paper | | 30× high-depth WGS data used as the gold-standard truth set. |
| Reference Panel | | | |
| 1KGP-EAS reference panel | 1000 Genomes Project (East Asia) | | East Asian subset of 1KGP, used for ancestry-specific imputation. |
| CKB reference panel | China Kadoorie Biobank | | Custom population-specific panel for genotype imputation. |
| Software and algorithms | | | |
| BCFtools v1.11 | GitHub (samtools/bcftools) | | Used for merging and sorting chromosome-level results and for filtering variants. |
| GATK v4.0.4.0 (BQSR) | Broad Institute | | Used for base quality score recalibration (BQSR). |
| BWA-MEM v0.7.16a-r1181 | Heng Li / GitHub (lh3/bwa) | | Used for aligning raw reads to GRCh38. |
| DPGT (Distributed Population Genetics Tool) | BGI | | Distributed population genetics analysis tool that enables joint variant calling on millions of WGS samples. Available at [GitHub - BGI-flexlab/DPGT](https://github.com/BGI-flexlab/DPGT) |
| fastp v0.23.4 | Open-source (Chen et al., 2018) | | Used for quality control and adapter trimming. |
| GLIMPSE2 | University of Oxford | | Fast genotype phasing and imputation for low-coverage WGS. |
| Original code for analyses | This paper | | Available as Supplementary File 1. |
| Picard toolkit | Broad Institute | | Used for marking duplicates and file format conversion. |
| PLINK 2.0 | C. Chang, S. Purcell / Broad Institute | | Used for genotype format conversion and association analysis. |
| Python 3.8 | Python Software Foundation | | Used for scripting, automation, and data analysis. |
| QUILT2 | Oxford Big Data Institute | | HMM-based imputation using external reference panels. |
| R 4.1.3 | The R Foundation | | Used for running STITCH and QUILT2, and for plotting and statistics. |
| SAMtools v1.3 | GitHub (samtools/samtools) | | For manipulating SAM/BAM files. |
| Seqtk v1.5 | GitHub (lh3/seqtk) | | Toolkit for processing sequences in FASTA/Q formats. Available at [GitHub - lh3/seqtk](https://github.com/lh3/seqtk) |
| SOAPnuke | BGI | | For NGS data quality control and filtering. |
| STITCH v1.6.6 | University of Oxford | | Imputation tool optimized for ultra-low-coverage sequencing. |
| tabix | GitHub (samtools/tabix) | | Used for indexing and querying bgzipped VCF files. |
| Other Materials | | | |
| GATK bundle files | GATK | | Available at [gatk4-data-processing hg38 WGS inputs](https://github.com/gatk-workflows/gatk4-data-processing/blob/master/processing-for-variant-discovery-gatk4.hg38.wgs.inputs.json) |
| Genetic Map for 1000G (GRCh38) | Oxford / 1000 Genomes Project | | Required for phasing/imputation tools. |
| GRCh38 | Genome Reference Consortium | | Used for read alignment and variant calling. |
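The benchmark design above, where genotypes imputed from the low-depth NIPT samples are scored against the 30× WGS truth set, can be illustrated with a minimal Python sketch. It assumes genotypes are encoded as alternate-allele counts (0, 1, 2) and imputed values as dosages in [0, 2]. The two metrics shown (genotype concordance and dosage r²) are common choices assumed here for illustration and are not necessarily the ones used in this paper. The function names are hypothetical.

```python
import math
from typing import Sequence


def genotype_concordance(truth: Sequence[int], imputed: Sequence[float]) -> float:
    """Fraction of samples whose rounded imputed dosage equals the truth genotype.

    Hypothetical helper: dosages are rounded half up to the nearest allele count.
    """
    if len(truth) != len(imputed) or not truth:
        raise ValueError("truth and imputed must be non-empty and the same length")
    matches = sum(1 for t, d in zip(truth, imputed) if math.floor(d + 0.5) == t)
    return matches / len(truth)


def dosage_r2(truth: Sequence[int], imputed: Sequence[float]) -> float:
    """Squared Pearson correlation between truth genotypes and imputed dosages at one site.

    Hypothetical helper: returns NaN when either vector has zero variance
    (e.g. a site that is monomorphic in the truth set), where r² is undefined.
    """
    if len(truth) != len(imputed) or not truth:
        raise ValueError("truth and imputed must be non-empty and the same length")
    n = len(truth)
    mean_t = sum(truth) / n
    mean_d = sum(imputed) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(truth, imputed))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_d = sum((d - mean_d) ** 2 for d in imputed)
    if var_t == 0 or var_d == 0:
        return float("nan")
    return cov * cov / (var_t * var_d)


if __name__ == "__main__":
    # Toy values for one site across five samples (not data from this paper).
    truth_gt = [0, 1, 2, 1, 0]
    imputed_ds = [0.1, 0.9, 1.8, 1.4, 0.0]
    print(genotype_concordance(truth_gt, imputed_ds))
    print(dosage_r2(truth_gt, imputed_ds))
```

Concordance on hard calls is easy to read but is dominated by common homozygous-reference genotypes. Dosage r² keeps the imputation uncertainty and is usually reported per site or aggregated within allele-frequency bins.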