Method Article

Bioinformatics Approach to Cancer Prediction using Quantum Clustering Algorithm for Behavioral Similarity in Gene Expression

DOI: 10.3791/68890

January 9th, 2026

Summary

This protocol aims to cluster gene expression data for cancer classification using a Hybrid Quantum K-Means algorithm that automatically detects the optimal number of clusters and separates them efficiently, advancing bioinformatics applications on Noisy Intermediate-Scale Quantum (NISQ) devices.

Abstract

This study introduces a Hybrid Quantum K-Means Clustering Algorithm with automatic cluster detection for classifying cancerous and non-cancerous gene expression data. The method employs Quantum Multi-Feature Mapping for state encoding, Swap Test-based quantum distance estimation, and Quantum Gradient-Based Optimization to dynamically identify the optimal number of clusters by minimizing intra-cluster variance. Initial centroids are selected through a probability-proportional distance strategy, improving stability and accuracy. Applied to breast cancer datasets, the approach surpasses the existing quantum K-Means algorithm, achieving a Silhouette Score of 0.641 (compared to 0.601), a Calinski-Harabasz Index of 766.57 (compared to 617.65), and a Davies-Bouldin Index of 0.659 (compared to 0.704). These results indicate superior cluster compactness and separation. Although the proposed algorithm has a slightly higher time complexity, O(N × K_max × M_obs), due to iterative optimization, it significantly outperforms predefined-K quantum K-Means in clustering accuracy, error reduction, and practical feasibility. Its efficiency in handling high-dimensional data and its resilience to quantum noise highlight its potential for real-world bioinformatics applications, particularly in cancer classification using gene expression profiles.
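The probability-proportional distance strategy for choosing initial centroids can be illustrated with a short classical sketch. This is not the authors' implementation: it assumes the strategy resembles k-means++ seeding, where each new centroid is drawn with probability proportional to its squared Euclidean distance from the nearest centroid already chosen. The function names `squared_distance` and `seed_centroids` are hypothetical, and the distances here are computed classically rather than with a swap test.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seed_centroids(points, k, rng=None):
    """Pick k initial centroids (assumed k-means++-style weighting).

    The first centroid is drawn uniformly; each later one is drawn with
    probability proportional to its squared distance from the nearest
    centroid chosen so far.
    """
    rng = rng or random.Random()
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Distance of every point to its closest already-chosen centroid.
        weights = [
            min(squared_distance(p, c) for c in centroids) for p in points
        ]
        if sum(weights) == 0:
            # Every point coincides with a centroid: fall back to uniform.
            centroids.append(rng.choice(points))
            continue
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    print(seed_centroids(data, 2, random.Random(0)))
```

Points far from every existing centroid receive large weights, so the seeds tend to spread across the data, which is the usual reason such seeding is more stable than purely uniform initialization.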

Introduction

In biomedical engineering, bioinformatics, statistics, social sciences, and economics, clustering is a fundamental technique for organizing data into meaningful homogeneous groups. For example, topological data analysis (TDA) has been applied to cancer gene expression datasets to reveal structural patterns in high-dimensional spaces1. Clustering organizes data such that objects with high similarity are placed within the same cluster, while dissimilar objects are assigned to different clusters. This falls under unsupervised learning and does not require labeled training data.

Over the past several decades, numerous cl....

Protocol

1. Quantum Feature Mapping

Encoding classical data points into quantum states is achieved by mapping them into a quantum Hilbert space, which can be efficiently accessed and manipulated by a quantum computer16,17,19. This process employs a nonlinear quantum feature map that embeds classical data into the Hilbert space (Figure 1). A fixed quantum circuit feature map transforms the input data points into quantum states17, while variational circuits enable machine learning tasks by adapting t....
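As a rough illustration of how a feature map and a swap test fit together, the sketch below simulates both classically with the Python standard library. It assumes a simple angle encoding (one qubit per feature, RY(x)|0> = [cos(x/2), sin(x/2)]), which stands in for the article's Quantum Multi-Feature Mapping; the actual mapping is not visible in this excerpt. The function names are hypothetical. The only quantum fact relied on is the ideal swap-test statistic, P(ancilla = 0) = 1/2 + 1/2 |<a|b>|^2.

```python
import math


def angle_encode(features):
    """Amplitudes of the product state with one RY(x)|0> qubit per feature.

    Assumed encoding for illustration only.
    """
    state = [1.0]
    for x in features:
        qubit = (math.cos(x / 2), math.sin(x / 2))
        # Kronecker product of the running state with the new qubit.
        state = [amp * q for amp in state for q in qubit]
    return state


def overlap(state_a, state_b):
    """Inner product <a|b> of two real-amplitude state vectors."""
    return sum(a * b for a, b in zip(state_a, state_b))


def swap_test_p0(state_a, state_b):
    """Ideal (noise-free) probability of measuring the ancilla in |0>."""
    return 0.5 + 0.5 * overlap(state_a, state_b) ** 2


def swap_test_distance(state_a, state_b):
    """One common distance derived from the overlap: sqrt(2 - 2|<a|b>|)."""
    fidelity = 2.0 * swap_test_p0(state_a, state_b) - 1.0
    return math.sqrt(max(0.0, 2.0 - 2.0 * math.sqrt(max(0.0, fidelity))))


if __name__ == "__main__":
    a = angle_encode([0.2, 1.0])
    b = angle_encode([0.3, 2.5])
    print(swap_test_p0(a, b), swap_test_distance(a, b))
```

On hardware or a noisy simulator, P(ancilla = 0) would be estimated from repeated measurements rather than computed exactly, so the recovered distance carries sampling error that shrinks with the number of shots.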

Results

Cluster quality depends on several factors, such as the separation distance between clusters, the within-cluster distance, and the variance ratio criterion. Clustering performance was therefore evaluated using three standard indices: the Silhouette Score, the Calinski-Harabasz Index (CH Index), and the Davies-Bouldin Index (DB Index). The Silhouette Score measures separation between clusters as
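The article's own formula is not visible in this excerpt, so the following is only a minimal sketch of the standard silhouette definition. For a sample i, a_i is the mean distance to the other members of its cluster, b_i is the smallest mean distance to the members of any other cluster, and s_i = (b_i - a_i) / max(a_i, b_i). The function name `mean_silhouette` is hypothetical, and Euclidean distance is assumed.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all samples (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a_i: mean distance to the other members of the same cluster.
        a_i = sum(
            math.dist(points[i], points[j]) for j in own if j != i
        ) / (len(own) - 1)
        # b_i: smallest mean distance to the members of another cluster.
        b_i = min(
            sum(math.dist(points[i], points[j]) for j in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a_i, b_i)
        scores.append(0.0 if denom == 0 else (b_i - a_i) / denom)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(mean_silhouette(pts, [0, 0, 1, 1]))
```

Values near 1 indicate compact, well-separated clusters, and negative values suggest misassigned samples. Higher Silhouette and Calinski-Harabasz values and lower Davies-Bouldin values indicate better clustering. In practice, scikit-learn (listed under Materials) provides `silhouette_score`, `calinski_harabasz_score`, and `davies_bouldin_score` in `sklearn.metrics`.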

Discussion

This study proposes a novel Hybrid Quantum K-Means Clustering Algorithm with Optimal Cluster Detection, specifically designed to classify cancerous and non-cancerous samples using high-dimensional gene expression data. The approach integrates Quantum Multi-Feature Mapping, Swap Test-based quantum distance estimation, and Quantum Gradient-Based Optimization to dynamically determine the optimal number of clusters. Unlike traditional K-Means algorithms, which require a predefined number of cluster.......

Disclosures

The authors have no conflict of interest.

Acknowledgements

The authors acknowledge the use of open-access gene expression datasets and quantum simulators that made the practical validation of this work feasible.

Materials

List of materials used in this article
Name | Company | Catalog Number | Comments
--- | --- | --- | ---
Apple MacBook Pro (M1 chip) | Apple Inc. | - | 8-core CPU / 8-core GPU, 16 GB unified memory; used for local simulation
Breast Cancer Gene Expression Dataset | Kaggle | - | Dataset with 569 samples, 32 features (reduced via PCA in study)
macOS Monterey (Operating System) | Apple Inc. | 12.6.9 | Runtime environment used on local machine
math (Python standard library) | Python Software Foundation | built-in | Basic mathematical functions
Matplotlib | Matplotlib community | 3.8.4 | Plotting and visualization
NoiseModel, QuantumError, ReadoutError (Qiskit Aer) | IBM / Qiskit project | part of Aer 0.13.3 | Used to simulate realistic quantum noise
NumPy | NumPy developers | 1.26.4 | Numerical operations and array manipulation
pandas | pandas development team | 2.2.2 | Data handling, I/O, tabular operations
Python | Python Software Foundation | 3.10.12 | Programming language, used in Jupyter / IPython environment
Qiskit Aer | IBM / Qiskit project | 0.13.3 | Simulator backend, with noise modeling and execution
Qiskit IBM Runtime (Session, SamplerV2) | IBM / Qiskit project | 0.41.1 | Execution framework for circuits in simulator
Qiskit Terra | IBM / Qiskit project | 0.45.0 | Quantum framework for circuit construction and transpilation
scikit-learn | scikit-learn developers | 1.4.2 | PCA, clustering metrics, data preprocessing

References

  1. Mehta, V., Agarwal, M., Kaliyar, R. K. A comprehensive and analytical review of text clustering techniques. Int. J. Data Sci. Anal. 18 (3), 239-258 (2024).
  2. Bezdek, J. C. Pattern recognition with fuzzy objective function algorithms. Springer Science & Business Media (2013).

Tags

Quantum Clustering, Gene Expression, Cancer Prediction, Hybrid Quantum K Means, Cluster Detection, Quantum Feature Mapping, Swap Test, Quantum Optimization, Breast Cancer Data, Cluster Compactness
