Method Article

A Novel k-Nearest Neighbor Method with Distribution Discrepancy and Differential Feature Importance for Rolling Bearing Fault Diagnosis

DOI:

10.3791/70568

June 5th, 2026

In This Article

Summary

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

This study proposes a k-nearest neighbor method that integrates distribution discrepancy and differential feature importance for accurate rolling bearing fault diagnosis.

Abstract

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

Rolling bearings are among the most vulnerable components in various types of rotary machinery, and accurate fault detection and localization are essential. When a rolling bearing fails, the signal is non-stationary, and the energy distribution of the vibration signal varies depending on the fault location. In traditional k-nearest neighbor (KNN) fault diagnosis algorithms, Euclidean distance is primarily used to measure the distance between sample points, which is not effective at capturing similarity across different spatial distributions. Moreover, these algorithms assume equal feature importance, which does not reflect the actual characteristics of fault vibration signals. This study proposes a KNN-based rolling bearing fault diagnosis method that incorporates distribution discrepancy and differential feature importance. First, vibration signals are decomposed using three-level wavelet packet decomposition, and the energy of each node at the third level is used as the fault feature. Then, the mean impact value (MIV) algorithm is used to determine the relative importance of each feature, and the Earth mover’s distance (EMD) is applied to measure differences between spatial distributions. By integrating Euclidean distance with MIV and EMD and applying the KNN majority voting rule, fault diagnosis is performed. The experimental results indicate that this method achieves a diagnostic accuracy of 99.43%, representing a 5.97% improvement compared to traditional KNN methods. The proposed method demonstrates accurate and effective fault diagnosis performance on the rolling bearing datasets used in this study.

Introduction

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

With the progress of technology, rotary machinery is increasingly developing toward integration, large-scale systems, high speed, and intelligent operation1. Among various types of rotary electrical machinery, rolling bearing components are the most vulnerable and commonly damaged parts. They offer advantages such as easy start-up, low friction, simple lubrication, and convenient replacement, and are widely used in precision instruments, aerospace, automobiles, machine tools, robots, and other fields. According to relevant statistical data on rotating machinery failures, abnormal vibration faults account for 70%, and 30% of these vibration faults are closely related to rolling bearing abnormalities2. Therefore, accurate diagnosis of rolling bearing faults is a highly important and widely studied field.

The mainstream methods for diagnosing rolling bearing faults include general methods and machine learning methods. General diagnostic methods analyze and decompose signals in the time and frequency domains. Time-domain analysis methods describe the nature and characteristics of signals by observing waveform patterns, statistical features, and temporal relationships, thereby enabling fault diagnosis. Common indicators include mean value, root mean square, correlation coefficient, margin, effective value, and impulse factor3. However, these methods are susceptible to external noise, which reduces accuracy. In complex systems, they may not fully characterize system behavior and often need to be combined with other analytical approaches. Chen et al. preprocessed vibration time-domain signals to extract different dimensionless features and then built a training model using a decision-tree-based random forest algorithm4. The effectiveness of this method was validated using rolling bearing competition data and simulated marine bearing fault data.

Frequency-domain analysis methods transform signals into the frequency domain, enabling a better understanding of frequency components, spectral characteristics, and frequency distribution. These methods include the Fourier transform, spectral analysis, and power spectral density. Li et al. analyzed the frequency distribution of vibration signals in the generated envelope spectrum to diagnose bearing faults5. Wang et al. identified characteristic frequencies of vibration signals using different frequency-domain analysis methods and compared them with the equipment’s inherent characteristic frequencies to achieve mechanical fault identification in spindle systems6.

With the increasing complexity of equipment, the demand for signal analysis has become more diverse. Since nonlinear and non-stationary signals contain time-evolving frequency components, conventional analyses based on linearity and stationarity assumptions fail to fully reveal their transient behaviors and temporal correlations. In contrast, time-frequency analysis provides a joint representation of signal energy distribution across both time and frequency dimensions, enabling a more comprehensive interpretation. These methods allow observation of how signals change over time and how frequency components vary across different intervals, helping to capture dynamic signal characteristics7,8. Several prominent techniques, such as discrete wavelet transform, ensemble empirical mode decomposition, and variational mode decomposition, are widely used for time-frequency analysis9. Combining continuous wavelet transform with a transfer learning-enhanced residual neural network, Diao et al. proposed a hybrid diagnostic framework10.

Traditional diagnostic approaches, which are largely manual and experience-driven, are prone to subjective bias and operator-dependent inconsistencies, leading to uncertain, non-uniform diagnoses. Even after signal processing, the extracted multidomain features often require further optimization to achieve accurate fault diagnosis. Machine learning methods, by contrast, classify bearing faults using mathematical models and automatically identify patterns in feature datasets, reducing reliance on human judgment. Consequently, many researchers have combined signal processing with machine learning to diagnose and classify bearing fault types. Commonly used methods include ensemble models such as random forests, kernel-based methods such as support vector machines, and single-layer feedforward networks such as extreme learning machines11.

Notable efforts have been devoted to advancing machine learning techniques for fault diagnosis. Recently, more advanced methods based on graph neural networks have been developed. Zhang et al. proposed a multiscale channel attention-driven graph dynamic fusion learning method for robust fault diagnosis under noisy signals12. You et al. developed a channel-adaptive generative reconstruction and fusion framework for few-shot fault diagnosis13. While these methods achieve state-of-the-art accuracy, they require substantial computational resources and large labeled datasets. For rolling bearing fault diagnosis, Guo et al. proposed a data-level fusion method with adaptive weighting14. This method processes multisource vibration signals using the k-nearest neighbor (KNN) algorithm to determine optimal weighting schemes. Fault-induced non-stationarity in vibration signals alters the spectral energy distribution, and these per-band energy variations serve as discriminative features for different fault states. However, traditional KNN algorithms rely on Euclidean distance for similarity measurement, which is insufficient for data with complex or diverse distributions. Additionally, they assume equal feature importance, which does not reflect the actual characteristics of fault vibration features15,16,17. Traditional distribution similarity measures are also heavily influenced by distribution overlap, limiting their ability to capture true discrepancies18,19,20,21,22. In contrast, Earth mover’s distance (EMD) measures the minimum cost required to transform one distribution into another, effectively capturing distribution differences regardless of overlap or positional displacement. This property makes EMD particularly suitable for this study, as it enables robust similarity measurement under varying operating conditions where distribution shifts are common.

To address these limitations, this study develops a KNN-based fault diagnosis method for rolling bearings that incorporates both distribution differences and feature importance. The proposed method consists of four main steps. First, a three-level wavelet packet decomposition is applied to the vibration signals, and the energy values of all nodes at the third level are calculated to construct the fault feature set. Second, the mean impact value (MIV) algorithm is used to quantify the relative importance of each feature. Third, EMD is introduced to measure distribution differences between feature vectors, capturing underlying structural discrepancies. Finally, the conventional Euclidean distance in the KNN algorithm is enhanced by integrating MIV-based feature weights and EMD-based distribution metrics. This improved similarity measure, combined with majority voting, is used to classify faults and improve diagnostic accuracy.

Protocol

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

This study did not involve human participants or animal subjects; therefore, ethical approval and informed consent were not required. The proposed method was implemented using MATLAB R2014b. The following toolboxes were used: Wavelet Toolbox for wavelet packet analysis (wpdec and wpcoef), Neural Network Toolbox for BP neural network implementation (feedforwardnet and train), Optimization Toolbox for solving the linear programming problem in EMD computation (linprog), and Statistics and Machine Learning Toolbox for KNN classification (fitcknn and predict). The EMD was implemented by solving the transportation problem using linear programming. All experiments were conducted on a Windows 10 PC with an Intel Core i7-10700 CPU (2.90 GHz) and 16 GB RAM. The proposed method was validated using the CWRU Bearing Dataset (https://engineering.case.edu/bearingdatacenter/download-data-file) available publicly. Faults were introduced into SKF6205 drive-end bearings using electro-discharge machining (EDM) at three diameters (0.007″, 0.014″, and 0.021″) and at three locations (inner race, outer race, and ball). Vibration signals were collected at a sampling frequency of 12 kHz under four motor loads (0–3 hp), corresponding to speeds of 1797–1730 rpm. Each signal sample consisted of 2048 data points, obtained using a sliding window segmentation approach with a step size of 598 points (70.8% overlap).

The proposed KNN-based rolling bearing fault diagnosis framework (Figure 1) consists of seven sequential stages, where the output of each stage serves as the input to the next. In Stage 1, energy features are extracted from rolling bearing vibration signals using three-level wavelet packet decomposition with the Daubechies 3 (db3) wavelet basis. The normalized energy values of the eight sub-bands at the third decomposition level are compiled into the feature set Q = (q1, q2, ..., qm), where m = 8 is the feature dimension. The normalization is performed using sum normalization (relative energy normalization) as defined in Equation 66.

Vibration signal analysis diagram using MIV-BP neural networks, KNN-EMD, majority voting.
Figure 1: Implementation flowchart of the proposed algorithm. Flowchart illustrating the workflow of the proposed KNN–MIV–EMD method, including vibration signal input, wavelet packet energy feature extraction, MIV-based feature weighting, EMD-based similarity calculation, and final classification using majority voting. Please click here to view a larger version of this figure.

In Stage 2, the MIV of each feature is calculated using a BP neural network (single hidden layer with 10 neurons, maximum 2000 iterations, target error 1.0 × 10−5, perturbation step size δ = ±10% of the mean value of each feature), as defined in Equations 10–1323. The network employs the hyperbolic tangent sigmoid function (tansig) as the activation function θs in the hidden layer and a linear function (purelin) in the output layer. The network is trained using the Levenberg–Marquardt algorithm (trainlm). The MIV value is then assigned as the relative importance weight of that feature.

In Stage 3, raw vibration signals are segmented into 200 samples using a sliding window of 2048 data points with a step size of 598 points (70.8% overlap). The window is moved sequentially from the beginning of the signal. The dataset (Q) is partitioned into a training set (Qtrain) and a testing set (Qtest) using a 52/48 split ratio. For each fault condition, 104 samples are randomly selected as the training set, and the remaining 96 samples are used as the testing set. The partitioning is repeated 10 times using different random seeds, and the average performance metrics are reported to assess statistical robustness. This approach ensures that the results are not dependent on a specific random partition. Because the split is performed after segmentation and at the sample level, there is no data-point overlap between training and testing sets.

In Stage 4, the optimal number of nearest neighbors K is selected using five-fold cross-validation on the training set. The folds are generated randomly using a fixed random seed and are stratified by class to preserve class distribution. Candidate K values are searched within the range K range set notation equation, K ∈ [1,10], mathematical inequality concept., specifically evaluating K = 1, 3, 5, 7, and 9. The value of K that achieves the highest average classification accuracy across the five folds is selected as the optimal value. In this study, the optimal K is determined to be 3.

In Stage 5, the similarity measure is enhanced by integrating feature importance weights (from MIV) and distribution differences (from EMD). Each sample is represented as an 8-dimensional normalized energy feature vector obtained using sum normalization (as defined in Equations 4–76). The conventional KNN classifier uses Euclidean distance to measure similarity between samples; however, this approach is extended to better capture differences in feature distributions. EMD measures the distance between feature distributions and is particularly suitable for analyzing the energy distribution of bearing vibration signals obtained through wavelet packet decomposition. No additional normalization is applied prior to EMD computation. The ground distance used in EMD is the Euclidean distance between feature components. When measuring distances between multiple distributions, EMD is not affected by the positional differences of the distributions, enabling effective comparison of extracted energy features and improved classification when combined with the KNN decision rule.

Equations 1–3 are novel formulations proposed in this study. Equations 4–7 and 8–26 represent standard formulations in their respective fields. For a test sample (I) and a training sample (Qtrain), the weighted Euclidean distance incorporating MIV is defined as Equation 1, where m = 8 is the feature dimension, wi is the normalized MIV weight for the i-th feature, Qtest, i and Qtrain, i are the i-th feature values of the test and training samples, respectively.

dMIV formula, symbolic representation, equation for test-train data evaluation, educational use. (1)

The EMD-based distribution distance is defined in Equation 2, where the feature distributions of the test (Htest) and training (Htrain) samples are used. EMD measures the minimum cost required to transform one distribution into another.

Mathematical equation for Earth Mover's Distance, EMD concept, formula representation. (2)

The final enhanced distance combining both components is defined as Equation 3, where λ is a balancing parameter that controls the contribution of the EMD-based distribution distance. In this study, λ is set to 0.5 based on empirical tuning to achieve optimal classification performance. The value λ = 0.5 was determined by grid search on the validation set over the range [0, 1] with a step size of 0.1, and the value that achieved the highest classification accuracy was selected as optimal. The optimal λ may be dataset-specific; for other datasets, we recommend re-tuning λ using cross-validation on the training data.

Equation describing statistical method for data analysis optimization, showing d_final calculation. (3)

In Stage 6, all training samples are sorted based on their enhanced distance to the test sample, which incorporates MIV-based feature importance weights and EMD-based distribution metrics. The top K = 3 samples are then selected as the nearest neighbors.

In Stage 7, the majority voting rule is applied among the K = 3 nearest neighbors to determine the final class label for each test sample.

Wavelet Packet Analysis and Energy Extraction
Wavelet packet decomposition is based on wavelet transform but is more refined than conventional wavelet decomposition. A distinctive feature of wavelet packet decomposition is its ability to perform a more balanced and complete time–frequency analysis by decomposing both low- and high-frequency components, unlike conventional wavelet decomposition, which refines only the low-frequency part5. In contrast to the fixed resolution characteristic of wavelet decomposition, this approach enables a more balanced representation, alleviating the typical compromise between time and frequency localization across the signal bandwidth.

In the multiresolution process, wavelet packet decomposition is regarded as the stepwise orthogonal decomposition of a function space6. The formula for wavelet packet decomposition is given in Equation 46:

Wavelet transform equations, Σ-formula, mathematical concepts, diagram for educational use. (4)

In this formulation, the variables (Equation d<sub>l</sub><sup>j,2n</sup>, mathematical symbol for sequence indexing, educational use., Mathematical equation, dj,2n+1, illustrates variable transformation., and Equation for wavelet transform coefficient, showing subscripts and superscripts in symbolic form.) correspond to the coefficients obtained from the wavelet packet decomposition, while the symbols (hk-2l [low-pass] and gk-2l [high-pass]) represent the filter coefficients central to the decomposition process.

In this study, the db3 wavelet is selected as the wavelet basis function due to its compact support and orthogonality, which are well suited for extracting transient features from vibration signals. A three-level wavelet packet decomposition is performed on the original vibration signals, resulting in 23 =8 sub-bands at the third decomposition level.

Compared to the standard wavelet transform, the wavelet packet transform enables more granular signal decomposition. By decomposing the original signal to a specified scale, it isolates the frequency bands of interest and extracts their energy distribution as effective features. The wavelet packet transform decomposes a signal into sub-bands whose energy distribution characterizes the original signal’s frequency content, and this derived energy feature vector serves as a robust basis for signal classification.

The feature vector of a signal is defined as the normalized energy distribution across the 2j frequency bands obtained from its j-layer wavelet packet decomposition, where the total signal energy is partitioned into these orthogonal sub-bands. The energy contained in the k-th frequency band of the j-th decomposition layer is represented by Equation 56, and the normalized energy feature is obtained as shown in Equation 66.

Mathematical concept, equation Ej,k=Σ|xj,k(n)|², analysis of signal energy distribution. (5)

Equation showing probability calculation formula \(pj,k\) for statistical analysis in research. (6)

Following this procedure, for each vibration signal sample, an energy feature vector is constructed as defined in Equation 76:

Equation of force vector components; sequence of parameters p3,1 to p3,8 in analysis setup. (7)

KNN–MIV–EMD Implementation Diagnosis Procedure Execution
In the supervised KNN algorithm, the classification of a new instance is determined by the plurality class among its K most similar training samples, as measured by a predefined distance metric. The classification result is therefore dependent on the selection of K and the nature of the similarity calculation. This simple yet effective principle underpins its extensive application in diverse classification domains.

The workflow of the KNN algorithm is outlined as follows. First, the k-nearest neighbors are identified from the training samples by computing the Euclidean distances between the test sample and each training instance, as defined in Equation 824.

Distance formula, \( d_{ij}=\|x(i)-x(j)\|_2 \), mathematical equation, vector space analysis. (8)

In this notation, the variables x(i) and x(j) correspond to a training sample and a test sample, respectively.

Next, the class probability distribution for the test sample is estimated based on its k-nearest neighbors. Here, k represents the number of nearest neighbors, and the number of these k neighbors that belong to a specific class a (a = 1,2,...,c) is used to compute the probability P(a) that the test sample belongs to that class a, as defined in Equation 924. where c denotes the total number of classes in the dataset.

Static equilibrium; formula P(a)=ka/k; equation; physics concept; balance analysis. (9)

Finally, fault diagnosis is performed by identifying the k-nearest neighbors using Equation 8, tallying the counts according to Equation 9, sorting these counts in descending order, and assigning the class with the highest count as the fault class of the test sample.

The MIV is based on the BP neural network structure and is used to reflect the importance weight of each variable with respect to the output. The BP neural network is a feedforward network with a typical three-layer topology, including input, hidden, and output layers. It propagates errors backward and iteratively adjusts the weights of neurons to achieve self-learning.

Let X denote the sample dataset with L groups, as defined in Equation 1023 and Equation 1123:

Let X be the sample dataset with L groups:

Static equilibrium formula ΣFx=0, diagram illustrating linear equation X=[x(1),x(2),...,x(L)]. (10)

State vector formula, x(k)=[x1(k),x2(k),...,xn(k)]^T, used in control systems analysis. (11)

Here, x(k) denotes the sampled data at time k, xi is the i-th component of x(k), with k = 1, 2, …, L and i = 1, 2, …, n.

The working principle of the BP neural network is as follows. The input sample x(k) is first weighted by the connection weights ωT and propagated to the hidden layer to generate the input data si of the hidden layer, as defined in Equation 1223, where the activation function (θ) governs the transformation.

Neural network function: s<sub>j</sub>=θ[ω<sub>j</sub><sup>T</sup>x(k)+b<sub>j</sub>] equation, computation model. (12)

The output of the network is then obtained as defined in Equation 1323:

Neural network regression formula, ŷ(k)=∑βjθ[ωTx(k)+bj]; equation for predictive modeling. (13)

where ωj = [ω1j, ω2j,...,ωnj] denotes the input weight vector, β denotes the output weight vector, and j = 1, 2, …, n. In this study, the single hidden layer is set to 10 neurons, the maximum number of iterations is set to 2000, and the minimum expected target error is set to 1.0 × 10−5.

When a small perturbation (Δωij) is applied to the weights between the input and hidden layers, it is propagated to the hidden layer output (Sj), resulting in a variation that ultimately leads to a change (ΔSj) in the network output. The corresponding weights (ωij and ωjk) are updated through BP, and the loss function is defined in Equation 1423.

Static equilibrium equations, Σe², error estimation, scientific formula. (14)

To augment the fault feature set Time series data sequence equation, X=[x(1), x(2),..., x(L)], showing indexed numerical values., small positive and negative perturbations are applied independently to each feature variable in the sample data, as defined in Equation 1523 and Equation 1623.

Multivariate time series equation, formula for sequential data analysis. (15)

Static equilibrium equations, ΣFx=0, vector formula, educational diagram, mathematical analysis. (16)

In this formulation, L and n represent the number of fault feature factors and sample groups, respectively. In this study, the perturbation step size is set to δ = ±10% of the mean value of each feature, which is a commonly used setting in MIV-based feature importance analysis. Accordingly, the neural network fitting outputs are obtained as defined in Equation 1723 and Equation 1823.

Mathematical equation for signal processing, featuring summation notation and index variables. (17)

Neural network node activation formula, Zj=f(Σωjx(i)+aj), used in computational models. (18)

If Multivariable function formula, statistical model representation, equation analysis., as shown in Equation 1923, the corresponding outputs represent the results of the perturbed sample sets.

Multivariate calculus formula; IVi = Ŷi,+ - Ŷi,-; function evaluation; symbolic representation.    (19)

Here, static equilibrium diagram; ΣF=0; illustrating forces balance in physics experiment setup and Mathematical symbol Ŷi used in statistical model prediction equation diagram., respectively represent output results of sample sets Static equilibrium equation, X subscript delta t, formula symbol. and Static equilibrium equation ΣFx=0, diagram showing balanced forces, mechanical analysis.. The impact degree of each fault feature variable on the fault type is expressed as defined in Equation 2023.

Equation vector notation for input variables IV in mathematical analysis. (20)

By averaging the impact values over the number of observations, the mean impact value of each fault feature on the final output fault type is calculated as defined in Equation 2123.

Mathematical equation showing mean IV calculation using summation: \( MIV = \frac{1}{L} \sum_{i=1}^{L} IV_i \) (21)

The EMD is a measure of similarity between two distributions. Let Set theory equation, G with elements and weights pairs, mathematical analysis formula.  denote the source distribution and Mathematical set notation, formula illustration, equation for educational research analysis. denote the target distribution, where gi and hj are the positions (or feature vectors) of the i-th and j-th clusters in the source and target distributions, respectively. ωgi is the probability mass (weight) at position gj, satisfying Mathematical equation Σm i=1 ωgi=1; static equilibrium principle, balance condition.. ωhj is the probability mass (weight) at position hj, satisfying Equation of summation, symbol Σ, for weight normalization in mathematical analysis.. m and n are the numbers of clusters in the source and target distributions, respectively.

The EMD between G and H is defined as the minimum cost required to transform the source distribution into the target distribution, as given in Equation 227:

Earth Mover's Distance formula; mathematical equation analysis; optimization method. (22)

Here, the optimal flow (fij) is subject to the constraints defined in Equations 23–267:

Linear programming inequality constraints formula, \( f_{ij} \geq 0, i=1,2,...,m; j=1,2,...,n \). (23)

Optimization constraint formula, Σfij≤ωgi, linear programming equation in mathematics. (24)

Linear inequality equation Σfij ≤ ωhj; j=1,2,...n; in algorithmic optimization context. (25)

Mathematical optimization equation Σ_i=1^m Σ_j=1^n f_ij formula in algorithm analysis. (26)

Here, fij is the flow (amount of mass transported) from the i-th cluster of the source distribution to the j-th cluster of the target distribution. Its dimension is m × n. dij is the ground distance between gi and hj, typically defined as the Euclidean distance: Mathematical formula, calculating Euclidean distance \( d_{ij} = \| g_i - h_j \|_2 \).. Its dimension is also m × n. The first constraint ensures non-negative flows, the second and third constraints ensure that the total flow from each source cluster and into each target cluster does not exceed the available mass, and the fourth constraint ensures that the total flow equals the total mass, which is 1 for normalized distributions.

In practice, EMD is computed by solving a transportation problem using linear programming methods (e.g., the simplex algorithm) to determine the optimal flow (fij) that minimizes the total transportation cost. The resulting EMD value represents the minimum cost required to transform one distribution into another and serves as a robust similarity metric for comparing feature distributions in the proposed fault diagnosis method.

Results

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

The experimental environment and platform are shown in Figure 2. From left to right, the platform consists of a fan-bearing assembly, an induction motor, and a drive unit. The central portion connects these components to a torque transducer/encoder via a coupling, and the rightmost section serves as the dynamometer. The control electronics are not depicted. EDM technology was used to simulate pitting faults on bearings, ranging from weak to severe conditions. Vibration data were collected using sensors positioned near the bearings at the motor drive end and the fan end. The dataset includes four conditions: normal state, inner race fault, rolling element fault, and outer race fault. For each condition, 200 samples were obtained. The dataset for each fault type was divided into 104 samples for training (parameter estimation) and 96 samples for testing model accuracy.

Induction motor test setup and diagnostic diagram showing motor, coupling, and dynamometer.
Figure 2: Experimental environment and platform. Experimental setup used for vibration signal acquisition, including the induction motor, coupling, torque transducer/encoder, dynamometer, and bearing assemblies at the drive and fan ends. Please click here to view a larger version of this figure.

Parameter Selection Based on Energy Distribution Analysis
Due to the low discriminability of raw time-domain vibration signals, wavelet packet energy distributions were extracted across the four operating conditions to provide a more separable feature representation. The resulting distributions are shown in Figure 3 and were used to determine the optimal sub-band for fault feature extraction. As shown in Figure 3, the energy distribution of each sub-band differs significantly across operating states. Under normal conditions, energy is primarily concentrated in the first three sub-bands, which differs from fault conditions. For fault states, energy is mainly concentrated in the third sub-band, although some differences remain among fault types. In the first sub-band, the normal state can be clearly distinguished from fault states; however, the differences among the three fault states are minimal. In the second sub-band, the energy distribution of the normal state is similar to that of the inner race fault, making differentiation difficult. Similarly, the rolling element fault and outer race fault show comparable distributions. In contrast, the third sub-band provides clear separation between the normal and fault states, as well as improved differentiation among the fault types. Based on this analysis, the third sub-band was selected as the fault feature for subsequent classification. This selection is a methodological design choice based on the discriminability analysis shown in Figure 3.

Energy spectra bar chart; bearing states; sub-band energy distribution; fault analysis results.
Figure 3: Energy spectra of four bearing states. Energy distribution of wavelet packet sub-bands for four bearing conditions (normal, inner race fault, rolling element fault, and outer race fault). The third sub-band shows the highest discriminability among fault states. Please click here to view a larger version of this figure.

Parameter Selection for Wavelet Basis Function
Daubechies and Haar wavelets are commonly used for digital signal processing. The selection criteria for the wavelet basis in this study are: (1) support length between 5 and 10 for practical engineering applications, and (2) compact support and orthogonality. The Haar wavelet has a support length of 1, which limits its applicability. Among the Daubechies (db) series, the db3 wavelet satisfies these criteria (support length of 5) and was therefore selected. This choice is a methodological decision based on established criteria rather than an experimental result. The energy values of each node in the third layer are combined to form a one-dimensional energy distribution. Traditional KNN assumes equal importance for all features, which does not reflect the characteristics of fault vibration signals. Therefore, the MIV algorithm is applied to evaluate the contribution of each feature, and feature weights are updated accordingly. Due to random initialization in the BP neural network, the computed IV values vary across training runs. Therefore, multiple calculations are performed, and the final MIV values are obtained by averaging. The resulting MIV values for the eight features are 0.44, 0.60, 0.66, 0.51, 0.46, 0.31, 0.83, and 0.46. The dataset was evaluated using different K values (K = 1, 3, 5, 7, and 9) across three methods: traditional KNN, KNN–EMD, and the proposed KNN–MIV–EMD. The classification results are presented in Tables 1–3. Across all methods, K = 3 yields the best classification performance. For traditional KNN, the highest accuracy at K = 3 is 93.46%; for KNN–EMD, it is 95.23%; and for the proposed KNN–MIV–EMD, it is 99.43%. Accuracy increases from K = 1 to K = 3 and then decreases as K increases further, indicating that moderate K values balance local relevance and generalization, while larger K values introduce noise. The proposed KNN–MIV–EMD method achieves the highest overall accuracy. At K = 3, it outperforms traditional KNN by 5.97 percentage points and KNN–EMD by 4.20 percentage points. Even at K = 1, it achieves 97.36%, which exceeds the optimal accuracies of the comparison methods. These results demonstrate that integrating feature importance (MIV) and distribution similarity (EMD) enhances diagnostic performance.

K13579
Accuracy91.53%93.46%88.71%88.71%88.70%

Table 1: Classification accuracy of the traditional k-nearest neighbor (KNN) algorithm under different K values. Classification accuracy (%) of the traditional KNN algorithm evaluated at K = 1, 3, 5, 7, and 9.

K13579
Accuracy92.31%95.23%90.36%88.31%85.76%

Table 2: Classification accuracy of the KNN–EMD algorithm under different K values. Classification accuracy (%) of the KNN–EMD algorithm evaluated at K = 1, 3, 5, 7, and 9.

K13579
Accuracy97.36%99.43%95.25%93.45%90.56%

Table 3: Classification accuracy of the proposed KNN–MIV–EMD algorithm under different K values. Classification accuracy (%) of the proposed KNN–MIV–EMD algorithm evaluated at K = 1, 3, 5, 7, and 9.

From a methodological perspective, the proposed model integrates three key components: wavelet packet energy decomposition for feature extraction, MIV for feature weighting, and EMD for distribution-based similarity measurement. This combination improves feature discriminability and similarity evaluation, resulting in an overall accuracy close to 99.5% for multistate bearing classification. To further illustrate the relationships among the three methods, Figure 4 presents the classification accuracy curves across different K values. As shown in Figure 4, all methods exhibit a consistent trend: accuracy increases with K, reaches an optimum, and then decreases. The proposed KNN–MIV–EMD method consistently outperforms the other methods across nearly all K values, with the most significant advantage observed at K = 3, where it achieves 99.43% accuracy. This demonstrates the effectiveness of combining feature importance weighting with distribution-based similarity measurement.

Classification accuracy curve graph; KNN, KNN-EMD, KNN-MIV-EMD analysis; classification method.
Figure 4: Classification accuracy curves of three algorithms. Comparison of classification accuracy for KNN, KNN–EMD, and KNN–MIV–EMD methods across different values of K. The proposed method achieves the highest accuracy, with optimal performance at K = 3. Please click here to view a larger version of this figure.

Comprehensive Efficacy Evaluation
Repeated runs (mean ± standard deviation):
To account for randomness in dataset partitioning (10 random splits) and MIV computation, the mean accuracy and standard deviation (SD) are reported for each method at K = 3 (Table 4). The low standard deviations indicate stable and reproducible results. The proposed KNN–MIV–EMD method achieves both the highest mean accuracy and the lowest variability, demonstrating robustness to random partitioning and MIV computation.

MethodMean Accuracy (%)Standard Deviation (%)Min–Max (%)
Traditional KNN93.460.5292.8–94.1
KNN–EMD95.230.4894.6–95.8
KNN–MIV–EMD (Proposed)99.430.3198.70–99.85

Table 4: Classification accuracy (mean ± standard deviation) over 10 repeated runs at K = 3. Mean accuracy, standard deviation (SD), and minimum–maximum accuracy (%) of the evaluated methods based on 10 independent runs.

Confusion matrix:
Table 5 presents the confusion matrix for the proposed KNN–MIV–EMD method at K = 3. To provide a conservative estimate, results from the run with the lowest accuracy (98.70%) are shown. In this representative run, the overall accuracy is 98.70% (379/384). The class-wise F1-scores are: Normal 100.00%, Inner Race 98.96%, Rolling Element 97.92%, and Outer Race 97.92%, with a macro-average F1-score of 98.70%.

Actual\PredictedNormalInner RaceRolling ElementOuter Race
Normal96000
Inner Race09510
Rolling Element00942
Outer Race01194

Table 5: Confusion matrix of the proposed KNN–MIV–EMD method at K = 3. Confusion matrix for the proposed method based on a representative run with the lowest accuracy (98.70%), showing classification results across four bearing conditions.

DATA AVAILABILITY STATEMENT:
The raw vibration data used in this study are obtained from the CWRU Bearing Dataset, which is publicly available at the official repository: https://engineering.case.edu/bearingdatacenter/download-data-file. All data are publicly accessible. The authors confirm that no permission or license is required for academic use of this dataset.

Discussion

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

The present study proposes a KNN-based fault diagnosis method for rolling bearings that integrates distribution differences through EMD and feature importance through MIV. Experimental results on a rolling bearing dataset containing four health states (normal, inner race fault, rolling element fault, and outer race fault) demonstrate the effectiveness of the proposed approach. The proposed KNN–MIV–EMD method achieves the highest classification accuracy of 99.43% at K = 3, outperforming traditional KNN (93.46%) and KNN–EMD (95.23%). Across all three methods, K = 3 consistently yields the best performance, confirming that a moderate number of nearest neighbors balances local sensitivity and noise robustness for this fault diagnosis task. These results indicate that improving similarity measurement and feature weighting within the KNN framework can significantly enhance diagnostic accuracy. In this context, the proposed method contributes to the advancement of rolling bearing fault diagnosis by addressing key limitations of traditional KNN approaches, particularly the assumptions of equal feature importance and reliance on Euclidean distance for similarity measurement15,16,17,18,19,20,21,22.

Three critical steps in the proposed workflow contribute significantly to its diagnostic performance. First, wavelet packet decomposition with the db3 wavelet basis enables effective extraction of energy distribution features from non-stationary vibration signals. Unlike raw time-domain waveforms, which exhibit low discriminability across bearing health states, the energy distribution across frequency sub-bands provides a more separable feature representation. The selection of the third sub-band, which showed the largest energy distribution differences among the four bearing states, was based on discriminability analysis rather than arbitrary choice, ensuring that the selected feature captures the most relevant fault information9. Second, the MIV algorithm quantifies the relative importance of each feature, allowing the KNN classifier to adaptively weight features according to their contribution to fault discrimination. This directly addresses a fundamental limitation of traditional KNN, which treats all features equally and can be misled by irrelevant or noisy features11. Third, the EMD-based distribution similarity measurement replaces the conventional Euclidean distance to capture underlying structural differences between feature distributions regardless of overlap or positional displacement. This property is particularly valuable when distribution shifts occur under varying operating conditions, where traditional distance metrics may fail to reflect true similarity18,19,20,21,22. The synergistic integration of these three components—wavelet packet energy features, MIV-based feature weighting, and EMD-based distribution metrics—is therefore the key to the superior performance observed in this study, as it simultaneously improves feature representation and similarity evaluation.

Beyond the comparison with traditional KNN and KNN–EMD presented in the Results section, it is important to position the proposed method relative to other commonly used rolling bearing fault diagnosis approaches. Support vector machines with radial basis function kernels have been widely used for bearing fault classification and can achieve good performance with limited samples; however, they require careful kernel selection and hyperparameter tuning and do not naturally provide interpretable similarity measures4,11. Random forests offer robustness to feature scaling and provide feature importance estimates, but they may struggle with highly imbalanced datasets and do not explicitly incorporate distribution similarity information. Deep learning methods, such as convolutional neural networks and autoencoders, have achieved state-of-the-art performance on large-scale bearing datasets by automatically learning hierarchical features, but they require substantial labeled data, significant computational resources, and are often treated as black-box models10. In contrast, the proposed KNN–MIV–EMD method occupies a favorable middle ground: it retains interpretability, allows flexible design of similarity metrics, and does not require extensive parameter tuning. A direct three-way comparison among KNN-based methods further illustrates these advantages. Traditional KNN uses Euclidean distance with equal feature weights, suffering from two main limitations: inability to differentiate feature importance and sensitivity to distribution shifts. At K = 3, it achieves 93.46% accuracy. KNN–EMD replaces Euclidean distance with EMD, capturing distribution differences but still treating all features equally, achieving 95.23% accuracy, representing a modest improvement. The proposed KNN–MIV–EMD method integrates both MIV-based feature weighting and EMD-based distribution similarity, addressing both limitations simultaneously and achieving 99.43% accuracy at K = 3. The combination of MIV and EMD yields a substantially larger improvement than EMD alone, confirming that these two components address complementary aspects of the classification problem and jointly contribute to improved diagnostic performance.

Several limitations of this study should be acknowledged. First, dataset constraints: the proposed method was validated on a single dataset collected under controlled laboratory conditions, with 800 samples across four fault types. This relatively small and clean dataset does not fully represent the complexity of real industrial environments, where data may be noisy, imbalanced, or partially labeled. Second, parameter sensitivity: the method’s performance depends on several key parameters, including the wavelet basis function (db3), the selected sub-band (third sub-band), the number of nearest neighbors (K = 3), and the balancing parameter λ (set to 0.5). While these values were selected based on discriminability analysis and cross-validation, the optimal settings may vary with different datasets or operating conditions. Third, dependence on feature extraction choices: the method relies on wavelet packet energy features, and alternative feature extraction techniques (e.g., statistical features or entropy-based features) may yield different performance outcomes. Fourth, computational cost: as noted, the EMD calculation involves solving a linear programming problem, which is more computationally intensive than Euclidean distance and may limit its applicability in real-time or resource-constrained environments. Fifth, distribution assumption: the method assumes that training and testing data share similar distribution characteristics. In scenarios with continuous domain shifts, such as gradually changing load or speed, performance may degrade without incorporating domain adaptation techniques. Sixth, validation scope: further validation on additional public datasets (e.g., CWRU, Paderborn, and XJTU-SY) and under varying operating conditions (different loads, speeds, and noise levels) is necessary to fully assess the generalizability of the proposed approach.

From a practical perspective, the proposed method provides several avenues for adaptation and troubleshooting in real-world applications. If classification performance declines under different K values, the optimal K should be re-determined using cross-validation on the training set, as it may vary depending on dataset size and complexity. If performance is unsatisfactory with the selected wavelet settings (db3 and the third sub-band), alternative wavelet bases (e.g., db4, db5, and sym4) or different decomposition levels should be explored, with sub-band selection guided by discriminability analysis similar to that presented in Figure 3. When applying the method to datasets with different feature dimensions, the MIV weights should be recalculated, and the EMD formulation should be adjusted accordingly. If computational cost is a concern, approximate EMD methods, such as the Sinkhorn algorithm or wavelet-based approximations, may be used to reduce complexity at the expense of slight accuracy loss. If training and testing distributions differ significantly, domain adaptation techniques such as transfer learning or domain-adversarial training may be integrated into the framework. For highly imbalanced datasets, resampling strategies, including oversampling of minority classes or undersampling of majority classes, should be considered prior to applying the method. These practical considerations provide a roadmap for extending the proposed method to diverse application scenarios and addressing potential performance issues.

Several directions for future research are worth pursuing. First, broader validation on additional benchmark datasets (such as CWRU, Paderborn, and XJTU-SY) and under real industrial conditions with varying loads, speeds, and noise levels is necessary to evaluate generalizability. Second, adaptive parameter selection mechanisms could be developed to automatically determine the optimal wavelet basis, sub-band, K value, and balancing parameter λ, thereby reducing manual tuning. Third, alternative distribution similarity measures, such as the Sinkhorn distance, Wasserstein distance, or maximum mean discrepancy, could be explored to identify more efficient or robust metrics. Fourth, integration with deep learning approaches, such as autoencoders for feature extraction or Siamese networks for metric learning, may further enhance performance, particularly on large-scale datasets. Fifth, domain adaptation techniques could be incorporated to improve robustness when training and testing data originate from different distributions. Finally, real-time implementation on embedded systems or edge devices should be investigated to facilitate deployment in industrial condition monitoring systems, with particular attention to computational efficiency and latency constraints. Overall, the results of this study demonstrate that the proposed KNN–MIV–EMD method achieves high classification accuracy (99.43%) on the tested dataset, indicating its strong potential for rolling bearing fault diagnosis. However, these results were obtained under controlled experimental conditions with a relatively small dataset. Real industrial environments introduce additional challenges, including sensor noise, missing data, varying operating conditions, and class imbalance. Therefore, while the method shows clear promise, further validation on real industrial data and integration with robust preprocessing and domain adaptation techniques are necessary before practical deployment can be considered. The proposed method should thus be regarded as a robust and interpretable framework for further development rather than a fully production-ready solution at this stage.

Disclosures

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

The authors declare no conflicts of interest.

Acknowledgements

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

This work was financially supported by the Key Scientific Research Projects of Colleges and Universities in Henan Province (25A580011) and the Scientific and Technological Research Project in Henan Province (262102210057).

Materials

List of materials used in this article
NameCompanyCatalog NumberComments
Acquisition System
Data Recorder (16-channel)Anti-aliasing filter, 24-bit ADC; dynamic range >90 dB; Case Western Reserve University, Cleveland, OH, USA
Fault Type and Size (inch): All fault types
Load (HP): All
Sampling Frequency: 12 kHz or 48 kHz
Purpose in Study: High-quality data acquisition
File Naming Convention: N/A
Data Subset
Data Selection for This StudyDrive-end data, 12 kHz sampling, 4 load conditions; CWRU Bearing Data Center (https://engineering.case.edu/bearingdatacenter/download-data-file)
Fault Type and Size (inch): 7 conditions × 4 loads = 28 subsets
Load (HP): All
Sampling Frequency: 12 kHz
Purpose in Study: Model training and testing
File Naming Convention: Custom selection
Label Information
Fault Class LabelsOne-hot encoding format
Fault Type and Size (inch): [1,0,0,0,0,0,0] for Normal …
Load (HP): All
Sampling Frequency: N/A
Purpose in Study: Supervised learning labels
File Naming Convention: Label_vector.mat
Sensor
Accelerometer (Drive End)ICP accelerometer; position: 12 o’clock; sensitivity ~500 mV/g; Drive end bearing housing
Fault Type and Size (inch): All fault types
Load (HP): All
Sampling Frequency: 12 kHz or 48 kHz
Purpose in Study: Primary vibration signal acquisition
File Naming Convention: DE_time_series
Accelerometer (Fan End)ICP accelerometer; sensitivity ~500 mV/g; Fan end bearing housing
Fault Type and Size (inch): All fault types
Load (HP): All
Sampling Frequency: 12 kHz or 48 kHz
Purpose in Study: Auxiliary/comparison signal
File Naming Convention: FE_time_series
Test Bearing
Ball Bearing (Healthy)Deep groove ball bearing (6205 type); SKF (commonly used in CWRU setup)
Fault Type and Size (inch): Normal
Load (HP): 0, 1, 2, 3
Sampling Frequency: 12 kHz or 48 kHz
Purpose in Study: Baseline condition
File Naming Convention: Normal_0.mat
Ball Bearing (Inner Race Fault)Single-point fault via EDM
Fault Type and Size (inch): Inner Race (IR) @ 0.007", 0.014", 0.021", 0.028"
Load (HP): 0, 1, 2, 3
Sampling Frequency: 12 kHz or 48 kHz
Purpose in Study: Inner race fault validation
File Naming Convention: IR007_1.mat
Ball Bearing (Outer Race Fault)Single-point fault via EDM (6 o’clock position)
Fault Type and Size (inch): Outer Race (OR) @ 0.007", 0.014", 0.021", 0.028"
Load (HP): 0, 1, 2, 3
Sampling Frequency: 12 kHz or 48 kHz
Purpose in Study: Outer race fault validation
File Naming Convention: OR021_2.mat
Ball Bearing (Ball Fault)Single-point fault via EDM
Fault Type and Size (inch): Ball (B) @ 0.007", 0.014", 0.021", 0.028"
Load (HP): 0, 1, 2, 3
Sampling Frequency: 12 kHz or 48 kHz
Purpose in Study: Ball element fault validation
File Naming Convention: B014_3.mat
Test Rig
Machinery Fault SimulatorMotor-driven system with adjustable load; accelerometer mounted on bearing housing; Case Western Reserve University, Cleveland, OH, USA
Fault Type and Size (inch): N/A
Load (HP): 0, 1, 2, 3
Sampling Frequency: 12 kHz or 48 kHz
Purpose in Study: Source of fault dataset
File Naming Convention: N/A

Reprints and Permissions

Request permission to reuse the text or figures of this JoVE article

Request Permission

Tags

Rolling Bearing FaultFault DiagnosisK Nearest NeighborDistribution DiscrepancyFeature ImportanceVibration SignalWavelet Packet DecompositionEarth Mover s DistanceMean Impact ValueEuclidean Distance

Related Articles