Research Article
Here, we present a hybrid GARCH-BiLSTM-KAN model for forecasting crude oil prices. The model integrates volatility estimation, bidirectional temporal learning, and nonlinear refinement to enhance prediction accuracy, offering a robust tool for energy market participants.
Crude oil prices, as a cornerstone of global energy markets, exhibit intricate dynamics, including volatility clustering, asymmetric temporal dependencies, and nonlinear responses to geopolitical, economic, and supply-demand shocks, that pose formidable challenges to accurate forecasting. Existing models often struggle to simultaneously capture these multifaceted characteristics, limiting their predictive robustness. To address this, this study proposes a novel hybrid framework that synergistically integrates three complementary components: (1) the Generalized Autoregressive Conditional Heteroskedasticity model to quantify time-varying volatility and address clustering effects; (2) the Bidirectional Long Short-Term Memory network to model bidirectional temporal relationships, capturing both historical and future contextual influences on price movements; and (3) the Kolmogorov-Arnold Network to refine nonlinear patterns through univariate basis functions, enhancing the mapping of complex high-dimensional dependencies beyond the capabilities of traditional neural networks. Empirical validation is conducted using 39 years of daily West Texas Intermediate crude oil prices (1986-2025), a dataset encompassing critical events such as the 2008 financial crisis, 2020 COVID-19 pandemic, and 2022 geopolitical tensions, ensuring robustness across diverse market conditions. The proposed model is rigorously compared against benchmark models, including traditional volatility models, standalone deep learning architectures, and other hybrid models. Results demonstrate that the proposed hybrid achieves superior performance with the lowest root mean squared error, mean absolute error, and the highest coefficient of determination. Statistical tests confirm the significance of its outperformance, highlighting the synergistic value of integrating volatility modeling, bidirectional sequence learning, and advanced nonlinear refinement.
This research advances energy economics by providing a robust forecasting tool, with implications for policymakers in strategic energy planning, energy firms in risk hedging, and financial institutions in derivative pricing and portfolio optimization.
Energy commodities, particularly crude oil, serve as fundamental drivers of global economic systems, influencing production costs, inflation rates, and international trade balances. The complex pricing dynamics of crude oil reflect intricate interactions among geological constraints, technological innovations in extraction, geopolitical tensions, and macroeconomic policies. Since the oil crises of the 1970s, understanding and forecasting oil price movements has emerged as a critical research domain intersecting energy economics, financial engineering, and computational intelligence1,2. The inherent volatility of oil markets, compounded by periodic supply disruptions and demand shocks, creates substantial challenges for diverse stakeholders, including national governments, energy corporations, and financial institutions.
Crude oil, as a pivotal strategic resource, plays an indispensable role in global energy security, economic stability, and industrial development3. Its price fluctuations exert far-reaching impacts on key macroeconomic indicators, including inflation, employment, and trade balances4, while simultaneously influencing micro-level decision-making across industries ranging from transportation to manufacturing5. However, crude oil prices exhibit complex dynamic characteristics including volatility clustering (periods of high volatility followed by high volatility), nonlinear temporal dependencies, and heightened sensitivity to geopolitical events, economic policies, and supply-demand shocks6,7. These multifaceted characteristics pose significant challenges to accurate forecasting, establishing it as a long-standing focus in energy economics and financial research.
Accurate crude oil price forecasting is essential for multiple stakeholders with distinct operational requirements. For governments and policymakers, it informs critical decisions regarding energy policy formulation, strategic reserves management, and inflation control mechanisms8. For energy companies, it supports strategic investment decisions, risk hedging strategies, and production planning optimization5. For financial market participants, it guides the precise pricing of oil-related derivatives and enhances portfolio optimization techniques9. Conversely, inaccurate forecasts may precipitate market distortions, encourage excessive speculation, or lead to suboptimal policy responses8. Consequently, developing robust forecasting models remains an urgent research priority with significant practical implications.
Crude oil prices are influenced by a complex interplay of multidimensional factors, including: (1) supply-demand fundamentals such as OPEC production quotas, shale oil extraction costs, and global energy transition trends10; (2) financialization effects through speculative activities in commodity markets that amplify price volatility11; and (3) geopolitical and macroeconomic shocks including conflicts in oil-producing regions, economic recessions, and abrupt policy shifts12. These diverse factors generate price dynamics that systematically violate the fundamental assumptions of traditional linear models, particularly homoskedasticity and stationarity13,14. Volatility clustering, for instance, implies that past price fluctuations contain valuable information about future volatility, a phenomenon first formalized by Engle (1982)15 through the Autoregressive Conditional Heteroskedasticity (ARCH) model. Additionally, nonlinear relationships among fundamental drivers and bidirectional temporal dependencies further complicate the forecasting process6, with extreme events such as the 2020 negative WTI prices demonstrating severe nonlinear anomalies that challenge conventional modeling approaches16.
Despite substantial progress in the field, three critical research gaps have yet to be adequately addressed in the existing literature. First, there remains insufficient integration between volatility modeling and sequence learning. Current hybrid approaches often treat volatility estimation and temporal learning as separate components, failing to encode time-varying risk measures as structured inputs into sequential networks. This disjuncture limits the ability of models to adapt temporal representations to dynamically changing market risk conditions. Second, bidirectional dependencies in temporal modeling are largely overlooked. Most LSTM-based hybrid models rely on unidirectional architectures, which cannot capture how future expectations and forward-looking information retroactively influence current price formation, a crucial mechanism in expectation-driven commodity markets. Finally, deep architectures exhibit limited nonlinear refinement capabilities. Conventional activation functions in neural networks struggle to approximate discontinuous market behaviors or account for extreme anomalies, such as the 2020 negative WTI prices triggered by storage capacity constraints17,18.
This study makes three key contributions: methodologically, it introduces a novel hybrid framework that synergistically integrates GARCH-based volatility modeling, bidirectional temporal learning via BiLSTM, and advanced nonlinear refinement with Kolmogorov-Arnold Networks (KAN) to capture the complex characteristics of crude oil prices; empirically, it constitutes a significant advancement by demonstrating, through rigorous evaluation on 39 years of daily WTI prices spanning diverse market regimes, superior forecasting performance against a wide range of benchmarks, with the robustness of these improvements confirmed by statistical significance tests; and practically, it offers substantial utility by providing a robust tool for energy market participants, with direct applications in risk management for energy firms, strategic planning for policymakers, and derivative pricing for financial institutions.
This study is guided by three central research questions. It first investigates how volatility modeling can be effectively integrated with deep learning architectures to enhance the accuracy of crude oil price forecasts. Furthermore, it examines the extent to which bidirectional temporal learning captures asymmetric dependencies in oil price dynamics compared to conventional unidirectional approaches. Finally, the study assesses whether the Kolmogorov-Arnold Network (KAN) provides superior nonlinear refinement over traditional activation functions within hybrid forecasting models.
The evolution of crude oil price forecasting methodologies reflects broader trends in time series analysis and financial econometrics. Early approaches predominantly relied on linear statistical models, with Autoregressive Integrated Moving Average (ARIMA) models and their variants forming the foundational framework19,20. While effective for capturing linear temporal dependencies, these models proved inadequate for addressing the heteroskedasticity and volatility clustering characteristic of financial time series.
To model time-varying volatility, Engle (1982)15 introduced the Autoregressive Conditional Heteroskedasticity (ARCH) model, where conditional variance is modeled as a function of past squared residuals. Bollerslev (1986)21 extended this through the Generalized ARCH (GARCH) model, allowing conditional variance to depend on both past squared residuals and past conditional variances. The GARCH(1,1) variant, with its parsimonious specification, has become a workhorse for volatility modeling in finance22. Further developments included exponential GARCH (EGARCH) to capture leverage effects23 and fractionally integrated GARCH for long memory processes24. However, these models struggle to capture complex nonlinear relationships and long-range dependencies25.
With advances in computational power and algorithmic sophistication, machine learning models emerged as powerful alternatives for capturing nonlinear patterns in financial data26,27. Support Vector Regression (SVR)28 and Random Forests demonstrated improved performance over linear models for certain forecasting horizons29,30,31.
The interplay between energy markets, environmental policy, and forecasting methodologies constitutes a critical area of research. The real-world impact of energy dynamics is profound, as evidenced by studies quantifying the household welfare loss stemming from energy price crises, highlighting the socio-economic urgency of accurate energy market analysis31. Within the policy realm, the dynamic relationship between carbon trading systems and financial markets, such as the low-carbon stock market, further reveals the intricate connectivity between regulatory mechanisms and economic performance32. To enhance the operational efficiency and integration of renewable energy, advanced forecasting techniques are paramount. Recent advancements include adaptive spatiotemporal graph models for predicting wind power generation at the farm-cluster level, which effectively capture complex spatial and temporal dependencies33. Concurrently, the drive for model transparency in this field has led to the application of interpretable AI techniques, such as the LIME algorithm, to demystify 'black-box' wind power forecasts and build trust in their outputs34. Collectively, these studies underscore a multidisciplinary approach that combines economic analysis, market linkage research, and cutting-edge, explainable forecasting models.
Deep learning models, particularly Recurrent Neural Networks (RNNs) and their variants, have shown remarkable success in sequential data modeling. Long Short-Term Memory (LSTM) networks, with their gated mechanisms addressing the vanishing gradient problem, have demonstrated exceptional performance in oil price forecasting35,36. Recent innovations include multi-headed variational neighbour search-tuned RNNs for gasoline and crude oil prediction, decomposition-aided LSTM frameworks with SHAPley value explanation for bitcoin forecasting, and optimization-enhanced LSTM variants tuned by improved seagull optimization, salp swarm algorithms with disputation operators, and enhanced Harris hawks optimization for crude oil price forecasting.
Bidirectional LSTM (BiLSTM) architectures further enhance modeling capability by processing sequences in both temporal directions, capturing how both historical patterns and future expectations influence current price formation37. This bidirectional modeling proves particularly valuable in commodity markets where forward-looking information significantly impacts spot prices.
Recognizing the complementary strengths of statistical and machine learning approaches, researchers have developed hybrid models that integrate multiple methodologies38. GARCH-LSTM combinations represent one prominent strand, where GARCH-derived volatility estimates enrich the feature space for LSTM temporal learning39. Similarly, CNN-LSTM models use convolutional neural networks to extract local features from time series before processing with LSTM40. These hybrids typically outperform standalone models but often retain limitations in capturing bidirectional dependencies and refining complex nonlinearities.
Recent studies have explored various sophisticated hybrid approaches. The application of hybrid forecasting models to capture the complex linear and nonlinear characteristics of crude oil prices has become a prominent research direction. For instance, one study demonstrated the efficacy of combining predictions from multiple individual time series models to construct a hybrid framework for forecasting day-ahead Brent crude oil prices41. Advancing this approach, another study proposed a novel hybrid technique that integrates the linear ARIMA model with the nonlinear Long Short-Term Memory (LSTM) network, aiming to simultaneously capture intrinsic price patterns and dynamic nonlinear fluctuations42. These works collectively underscore the superior performance of hybrid strategies and establish a solid foundation for subsequent model development.
The recent introduction of Kolmogorov-Arnold Networks (KAN) by Liu et al. (2024)43 represents a paradigm shift in neural network design, replacing fixed activation functions with learnable spline-based basis functions. Drawing on the Kolmogorov-Arnold representation theorem, KANs decompose high-dimensional functions into combinations of univariate functions, offering superior accuracy and interpretability compared to traditional Multi-Layer Perceptrons (MLPs).
While KANs have shown promising results in scientific computing and physics-informed machine learning, their application in financial time series forecasting remains nascent. Preliminary applications demonstrate KAN's potential in capturing complex nonlinear patterns that elude conventional architectures. Compared to recent hybrid modeling approaches, KAN-based frameworks offer distinct advantages in interpretability through visualizable basis functions and adaptive refinement of nonlinear mappings.
The integration of KAN specifically addresses the limitation of traditional activation functions (e.g., ReLU, tanh) in approximating discontinuous jumps and extreme market anomalies, such as the 2020 negative WTI prices. By employing cubic B-spline basis functions, KAN enables fine-grained refinement of temporal features, capturing residual nonlinearities that significantly impact forecasting accuracy during market turbulence.
Our review identifies three critical gaps that motivate the current study. First, existing hybrids insufficiently integrate volatility modeling with sequence learning, treating them as separate rather than complementary components. Second, bidirectional temporal dependencies remain underexplored despite their importance in expectation-driven commodity markets. Third, nonlinear refinement capabilities are limited by traditional activation functions, particularly during extreme market events44. The GARCH-BiLSTM-KAN framework proposed in this study systematically addresses these gaps through synergistic integration of complementary modeling paradigms.
This section elaborates on the methodological framework of the proposed GARCH-BiLSTM-KAN hybrid model, which integrates the strengths of GARCH for volatility modeling, BiLSTM for bidirectional temporal dependency capture, and KAN for nonlinear refinement. The architecture is designed to address the multifaceted characteristics of financial time series, including volatility clustering, temporal asymmetry, and complex nonlinear patterns. Figure 1 shows the overall framework flowchart of the GARCH-BiLSTM-KAN hybrid model.
GARCH for volatility estimation
The Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model, proposed by Bollerslev (1986)21, serves as the foundational component for volatility estimation in this hybrid framework. Financial time series, such as stock returns, exhibit volatility clustering (periods of high volatility tend to be followed by high volatility, and periods of low volatility by low volatility), which violates the homoskedasticity assumption of traditional linear models. GARCH models capture this phenomenon by modeling the conditional variance as a function of past squared residuals and past conditional variances.
Mathematical formalization
For a given time series of returns $r_t$, the GARCH(p, q) specification consists of two equations: the mean equation and the variance equation. The mean equation is defined as:
$$r_t = \mu + \varepsilon_t$$
where $\mu$ is the constant mean return, and $\varepsilon_t$ is the error term at time t, which follows a conditional normal distribution $\varepsilon_t \mid F_{t-1} \sim N(0, \sigma_t^2)$, with $F_{t-1}$ representing the information set up to time t - 1.
The conditional variance $\sigma_t^2$, which measures the volatility, is specified in the GARCH(p, q) variance equation as:
$$\sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2$$
where $\omega > 0$ is the constant term, $\alpha_i \geq 0$ are the ARCH coefficients capturing the impact of past squared residuals (news about volatility from the recent past), and $\beta_j \geq 0$ are the GARCH coefficients representing the persistence of volatility. To ensure the positivity and stationarity of the conditional variance, the parameters must satisfy $\omega > 0$, $\alpha_i \geq 0$, $\beta_j \geq 0$, and $\sum_{i=1}^{q} \alpha_i + \sum_{j=1}^{p} \beta_j < 1$.
In this hybrid model, the GARCH(1,1) variant was employed, which is parsimonious and widely documented to effectively capture volatility dynamics in financial data21,22. The estimated conditional volatility $\sigma_t$ from GARCH(1,1) serves as a critical input to the subsequent BiLSTM and KAN components, providing a structured measure of historical volatility that complements the raw return series.
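As an illustration of the GARCH(1,1) variance recursion, the following minimal Python sketch computes the conditional volatility path for given parameters. It is not the authors' implementation: the function name `garch11_volatility`, the parameter values, and the choice to start the recursion at the unconditional variance are assumptions made for this example, and parameter estimation (typically by maximum likelihood) is not shown.

```python
import math

def garch11_volatility(returns, mu, omega, alpha, beta):
    """Return the conditional volatility sigma_t for each return under GARCH(1,1)."""
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("parameters violate positivity or stationarity constraints")
    # Assumption: initialise at the unconditional variance omega / (1 - alpha - beta).
    variance = omega / (1.0 - alpha - beta)
    sigmas = []
    for r in returns:
        sigmas.append(math.sqrt(variance))   # sigma_t uses information up to t-1 only
        eps = r - mu                         # residual from the mean equation
        variance = omega + alpha * eps ** 2 + beta * variance  # variance for t+1
    return sigmas

# Illustrative (hypothetical) returns and parameter values.
returns = [0.01, -0.03, 0.002, 0.05, -0.01]
sigmas = garch11_volatility(returns, mu=0.0, omega=1e-5, alpha=0.1, beta=0.85)
```

Each volatility value is appended before the current return is used, so the series can be paired with the returns as a model input without looking ahead.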
BiLSTM for bidirectional temporal learning
While GARCH models excel at volatility estimation, they are limited in capturing complex temporal dependencies, especially those involving long-range interactions and bidirectional relationships. To address this, the framework integrates a Bidirectional Long Short-Term Memory (BiLSTM) network, an extension of the LSTM architecture45 that models sequential data by preserving information from both past and future time steps.
Architecture specification
LSTM networks overcome the vanishing gradient problem of traditional Recurrent Neural Networks (RNNs) through a gated cell structure, enabling them to learn long-term dependencies. Each LSTM cell contains three key gates: the forget gate, input gate, and output gate, which regulate the flow of information into and out of the cell state Ct.
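The mathematical formulation of each gate is defined as:
Forget Gate: Determines which information to discard from the cell state:
$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$$
Input Gate: Controls the update of the cell state with new information:
$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)$$
Cell State Update: Combines the forget gate output and input gate output:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
Output Gate: Determines the hidden state based on the cell state:
$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t \odot \tanh(C_t)$$
where $\sigma$ is the sigmoid activation function, $\odot$ denotes element-wise multiplication, $x_t$ is the input at time t, $h_{t-1}$ is the hidden state from the previous time step, and $W$ and $b$ are weight matrices and bias vectors, respectively.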
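The gate equations can be traced with a small, dependency-free sketch of a single LSTM cell update. The helper names (`sigmoid`, `affine`, `lstm_step`) and the all-zero weights are assumptions chosen so the arithmetic is easy to follow; a trained network would use learned weights and a tensor library.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, b, v):
    """Compute W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM update; params maps a gate name to (W, b) acting on [h_prev, x_t]."""
    hx = h_prev + x_t                                     # concatenation [h_{t-1}, x_t]
    f = [sigmoid(v) for v in affine(*params["f"], hx)]    # forget gate
    i = [sigmoid(v) for v in affine(*params["i"], hx)]    # input gate
    g = [math.tanh(v) for v in affine(*params["c"], hx)]  # candidate cell state
    o = [sigmoid(v) for v in affine(*params["o"], hx)]    # output gate
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t

# Toy sizes: 2 hidden units, 2 inputs (for example a return and a volatility value).
hidden, inputs = 2, 2
zero = ([[0.0] * (hidden + inputs) for _ in range(hidden)], [0.0] * hidden)
params = {name: zero for name in ("f", "i", "c", "o")}
h_t, c_t = lstm_step([0.01, 0.02], [0.0, 0.0], [1.0, -1.0], params)
```

With zero weights every gate equals 0.5 and the candidate state is 0, so the cell state is simply halved, which makes the role of the forget gate visible.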
Bidirectional processing:
The Bidirectional Long Short-Term Memory Network (BiLSTM) architecture consists of two parallel LSTM networks: a forward LSTM that processes the sequence from past to future (t = 1 to t = T) and a backward LSTM that processes it from future to past (t = T to t = 1). The hidden states from both directions are concatenated at each time step to form the final output, capturing both historical and future contextual information:
$$\overrightarrow{h}_t = \mathrm{LSTM}_{\mathrm{fwd}}(x_t, \overrightarrow{h}_{t-1})$$
$$\overleftarrow{h}_t = \mathrm{LSTM}_{\mathrm{bwd}}(x_t, \overleftarrow{h}_{t+1})$$
$$h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t]$$
In this model, the BiLSTM is configured with 32 hidden units in each direction (64 after concatenation) and 2 layers, employing a dropout rate of 0.2 between layers to prevent overfitting. The network processes sequences of 19 time steps derived from the 20-day lookback window and is trained using the Adam optimizer with a learning rate of 0.01. This allows the BiLSTM to learn temporal patterns in both returns and volatility, with the bidirectional design enabling it to capture asymmetric dependencies. The output of the BiLSTM, $h_t$, is a high-dimensional representation of bidirectional temporal features, which is fed into the KAN layer for further refinement. Figure 2 shows the schematic diagram of the hidden state fusion process in BiLSTM.
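The forward and backward passes and their per-step concatenation can be sketched independently of the cell type. In the example below, `bidirectional` and the toy `running_sum` step are hypothetical names; the running sum merely stands in for a trained LSTM cell so that the alignment of the two directions is easy to inspect.

```python
def bidirectional(sequence, step, h0):
    """Run `step` over the sequence in both directions and concatenate per time step."""
    forward, h = [], h0
    for x in sequence:                # t = 1 .. T
        h = step(x, h)
        forward.append(h)
    backward, h = [], h0
    for x in reversed(sequence):      # t = T .. 1
        h = step(x, h)
        backward.append(h)
    backward.reverse()                # re-align so backward[t] pairs with forward[t]
    return [f + b for f, b in zip(forward, backward)]

def running_sum(x, h):
    """Toy recurrent step standing in for an LSTM cell (assumption for illustration)."""
    return [h[0] + x]

states = bidirectional([1.0, 2.0, 3.0], running_sum, [0.0])
```

When the input is a fixed lookback window, the backward pass sees only later steps inside that same window, not observations beyond the window's end.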
KAN for nonlinear refinement
Despite the strength of BiLSTM in modeling temporal dynamics, financial time series often exhibit highly nonlinear relationships that are challenging to capture with standard neural network architectures. To address this, the framework incorporates a Kolmogorov-Arnold Network (KAN)43, a novel neural network paradigm that leverages the Kolmogorov-Arnold representation theorem to model complex nonlinear functions through a combination of univariate functions and linear operations.
Mathematical foundation
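KANs differ from traditional feedforward neural networks by replacing the linear transformations followed by activation functions in each layer with a set of univariate basis functions applied to individual input dimensions, followed by a linear combination. The Kolmogorov-Arnold representation theorem states that any multivariate continuous function on a bounded domain can be represented as a finite composition of continuous univariate functions and addition:
$$f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right)$$
where $\Phi_q$ and $\phi_{q,p}$ are continuous univariate functions.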
Implementation specification:
In this specific implementation, a KAN layer transforms an input vector $z = [z_1, z_2, \dots, z_d]$ into an output vector $y = [y_1, y_2, \dots, y_m]$. The transformation for each output neuron k is defined by:
$$y_k = \sum_{i=1}^{d} \Phi_{k,i}(z_i), \quad k = 1, \dots, m$$
where $\Phi_{k,i}: \mathbb{R} \to \mathbb{R}$ are the learnable univariate basis functions. This study implements these functions $\Phi(x)$ as cubic B-splines for their smoothness and expressive power, defined as:
$$\Phi(x) = \sum_{j=1}^{G} c_j B_j(x)$$
where $B_j$ are the third-order B-spline basis functions, $c_j$ are the trainable spline coefficients, and G is the number of basis functions, determined by the grid size and spline order.
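A KAN layer of this form can be sketched in plain Python by evaluating the B-spline bases with the Cox-de Boor recursion and summing the per-input spline functions. This is a sketch under stated assumptions rather than the study's code: the names `bspline_basis` and `kan_layer`, the uniform knot vector on [-1, 1] extended by three knots on each side, and the constant coefficients are all illustrative choices.

```python
def bspline_basis(x, knots, degree=3):
    """Evaluate all B-spline basis functions of the given degree at x (Cox-de Boor)."""
    # Degree-0 bases are indicators of the half-open knot intervals.
    basis = [1.0 if knots[j] <= x < knots[j + 1] else 0.0 for j in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        nxt = []
        for j in range(len(knots) - d - 1):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            left = (x - knots[j]) / left_den * basis[j] if left_den > 0 else 0.0
            right = (knots[j + d + 1] - x) / right_den * basis[j + 1] if right_den > 0 else 0.0
            nxt.append(left + right)
        basis = nxt
    return basis

def kan_layer(z, coeffs, knots, degree=3):
    """y_k = sum_i Phi_{k,i}(z_i), where Phi_{k,i}(x) = sum_j coeffs[k][i][j] * B_j(x)."""
    bases = [bspline_basis(z_i, knots, degree) for z_i in z]
    return [
        sum(sum(c * b for c, b in zip(coeffs[k][i], bases[i])) for i in range(len(z)))
        for k in range(len(coeffs))
    ]

# Assumed grid: 5 uniform intervals on [-1, 1], extended by `degree` knots per side.
grid, degree = 5, 3
h = 2.0 / grid
knots = [-1.0 + (k - degree) * h for k in range(grid + 2 * degree + 1)]
n_basis = len(knots) - degree - 1                     # G = grid + degree

z = [-0.5, 0.0, 0.7]                                  # toy 3-dimensional input
coeffs = [[[1.0] * n_basis for _ in z]]               # one output neuron, constant coefficients
y = kan_layer(z, coeffs, knots)
```

With all coefficients set to 1, each spline function reduces to the sum of the bases, which equals 1 inside the grid, so the single output is the number of inputs. Training would adjust the coefficients `c_j` to shape each univariate function.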
Within the forecasting framework, the KAN layer receives the final hidden state $h_t$ from the preceding BiLSTM layer as its input. The configured KAN has a width structure of [64, 1], meaning it accepts the 64-dimensional BiLSTM output and produces a single scalar value as the final forecast. The spline functions $\Phi_{k,i}$ are cubic B-splines defined on a grid of 5 equidistant knots, chosen for their flexibility in approximating smooth nonlinear functions and for their interpretability43. To mitigate overfitting, an L1 regularization term with a coefficient of λ = 0.001 is applied to the spline coefficients during training. Furthermore, the spline grid is updated every 100 optimization steps to adaptively refine the function approximations based on the evolving loss landscape.
The output of the KAN, denoted by $\hat{y}_t$, represents the refined prediction after accounting for both temporal dependencies and nonlinear patterns. Figure 3 shows the schematic diagram of the KAN (Kolmogorov-Arnold Network) model structure.
Hybrid model integration
The GARCH-BiLSTM-KAN hybrid model is integrated in a sequential manner, where each component's output serves as input to the next, culminating in a final prediction. Table 1 summarizes the critical parameters of each component needed for model replication: the GARCH order, BiLSTM architecture details, KAN configuration, and training hyperparameters. All parameter values are consistent across the mathematical formulation, the textual description, and the code implementation, which ensures reproducibility and alignment between the theoretical formulation and the practical implementation.
GARCH preprocessing: The raw return series $r_t$ is first input to the GARCH(1,1) model to estimate the conditional volatility $\sigma_t$. This step transforms the univariate return series into a bivariate sequence $(r_t, \sigma_t)$, enriching the input with volatility information.
BiLSTM feature extraction: The bivariate sequence is fed into the BiLSTM network, which processes it in both forward and backward directions to generate bidirectional temporal features $h_t$. The network uses a lookback window of 20 to construct input sequences, with 32 hidden units per direction and 2 layers, trained via the Adam optimizer.
KAN refinement: The BiLSTM features $h_t$ are input to the KAN, which applies cubic B-spline basis functions (with 5 knots) to each feature dimension, followed by a linear combination to produce the final prediction $\hat{y}_t$. The KAN has a width structure of [64,1] and uses the same Adam optimizer.
The sequential integration is theoretically motivated by the 'hierarchical feature refinement' principle: GARCH first decomposes raw returns into predictable volatility patterns, reducing noise for subsequent layers. BiLSTM then encodes bidirectional dependencies: forward LSTMs capture supply-driven trends, while backward LSTMs model demand expectations. Finally, KAN refines these temporal features using cubic B-splines, which outperform ReLU in approximating non-smooth functions. This pipeline ensures that volatility, temporal asymmetry, and nonlinearity are modeled as interdependent rather than isolated phenomena.
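Training and loss function: The entire model is trained end-to-end for 100 epochs using a mean squared error (MSE) loss function, defined as:
$$\mathrm{MSE} = \frac{1}{N} \sum_{t=1}^{N} (y_t - \hat{y}_t)^2$$
where $y_t$ is the true value, $\hat{y}_t$ is the model's prediction, and $N$ is the number of samples. Three evaluation metrics are used to assess performance:
Root Mean Squared Error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} (y_t - \hat{y}_t)^2}$$
Mean Absolute Error (MAE):
$$\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} |y_t - \hat{y}_t|$$
Coefficient of Determination (R2):
$$R^2 = 1 - \frac{\sum_{t=1}^{N} (y_t - \hat{y}_t)^2}{\sum_{t=1}^{N} (y_t - \bar{y})^2}$$
where $\bar{y}$ is the mean of the true values.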
The model is trained on 80% of the data (7877 samples) and tested on the remaining 20% (1969 samples), with price data normalized using MinMaxScaler (feature range = (-1, 1)) to facilitate training.
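The MSE loss and the three evaluation metrics (RMSE, MAE, and the coefficient of determination) can be computed as in the following minimal sketch. The function name `regression_metrics` and the toy values are assumptions for illustration only.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R^2 for paired true and predicted values."""
    n = len(y_true)
    errors = [yt - yp for yt, yp in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    mean_y = sum(y_true) / n
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)    # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }

# Toy example: three exact predictions and one that is off by 2.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```

Note that RMSE and MAE are expressed in the units of whatever scale the predictions are in, so values computed on normalized data are not directly comparable with values computed on raw prices.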
The integration of GARCH, BiLSTM, and KAN is motivated by their complementary strengths: GARCH provides a statistically grounded measure of volatility, BiLSTM captures bidirectional temporal dependencies in both returns and volatility, and KAN refines these features by modeling residual nonlinearities. This hierarchical approach ensures that the model leverages both parametric (GARCH) and nonparametric (BiLSTM, KAN) techniques, making it robust to the diverse characteristics of financial time series.
Experimental procedure overview
The experimental procedure was meticulously designed as a systematic pipeline to ensure the reproducibility and robustness of the findings. The process commenced with data acquisition, where daily West Texas Intermediate (WTI) crude oil price data spanning from 1986 to 2025 was sourced from the U.S. Energy Information Administration (EIA) database. Subsequently, the raw price data underwent a comprehensive preprocessing stage. To achieve stationarity, the raw prices were transformed into logarithmic returns. These return series were then normalized to the range of [-1, 1] using Min-Max scaling to facilitate stable and efficient model training.
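The two preprocessing transformations can be sketched as follows. The helper names `log_returns` and `fit_minmax` are hypothetical, and the sketch assumes strictly positive prices; how the negative WTI price recorded in April 2020 is treated is not stated here, so the example simply rejects non-positive inputs.

```python
import math

def log_returns(prices):
    """Logarithmic returns ln(P_t / P_{t-1}); defined only for positive prices."""
    if any(p <= 0 for p in prices):
        raise ValueError("log returns require strictly positive prices")
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

def fit_minmax(values, low=-1.0, high=1.0):
    """Return a function that maps [min(values), max(values)] linearly onto [low, high]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        raise ValueError("cannot scale a constant series")
    def scale(v):
        return low + (v - vmin) * (high - low) / (vmax - vmin)
    return scale

prices = [100.0, 110.0, 99.0, 99.0]       # toy prices, not WTI data
returns = log_returns(prices)
scale = fit_minmax(returns)               # fit the range on the data passed in
scaled = [scale(r) for r in returns]
```

Returning the scaling function makes it possible to fit the range on one subset and apply it to another; whether the range is fitted on the training portion only is a design choice the text does not specify.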
Following preprocessing, the dataset was split into training and testing sets following a temporal order, with no shuffling, to preserve the chronological structure of the time series. To prevent any data leakage, all volatility modeling was strictly confined to the training data.
Specifically, for the initial model training and evaluation with the static 80-20 split, the GARCH(1,1) model was fitted exclusively on the training set (data from 1986 to 2016). The estimated parameters from this training period were then used to generate the conditional volatility series for both the training and test sets. This ensures that the volatility inputs for the BiLSTM and KAN components during testing are based solely on information available up to the time of prediction, without incorporating any future data from the test set.
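The leakage-free setup described above can be illustrated with a short sketch: the series is split chronologically, parameters are held fixed, and the volatility at each step is computed from earlier returns only. The names `chronological_split` and `filter_volatility` and the parameter values are placeholders; in practice the parameters would be estimated on the training portion.

```python
import math

def chronological_split(samples, train_fraction=0.8):
    """Split without shuffling so that every test sample follows every training sample."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]

def filter_volatility(returns, omega, alpha, beta, mu=0.0):
    """Apply a GARCH(1,1) recursion with fixed parameters over a return series."""
    variance = omega / (1.0 - alpha - beta)   # assumed starting value
    out = []
    for r in returns:
        out.append(math.sqrt(variance))       # sigma_t depends on returns before t only
        variance = omega + alpha * (r - mu) ** 2 + beta * variance
    return out

returns = [0.01, -0.02, 0.015, -0.005, 0.03, -0.01, 0.0, 0.02, -0.04, 0.01]
train, test = chronological_split(returns)
params = dict(omega=1e-5, alpha=0.1, beta=0.85)   # placeholder for train-only estimates
sigma_all = filter_volatility(returns, **params)
sigma_train, sigma_test = sigma_all[:len(train)], sigma_all[len(train):]
```

A simple check of the no-look-ahead property is that changing the final return leaves every volatility value unchanged, because each value is produced before the return of the same step is consumed.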
Furthermore, for the rolling window validation introduced in the results section, this principle was rigorously upheld. For each rolling window, the GARCH(1,1) model was re-estimated from scratch using only the data within that specific training window. The resulting volatility estimates were then used as inputs for training the subsequent BiLSTM-KAN network and for forecasting the corresponding test window. This recursive re-estimation mimics a real-world forecasting scenario and guarantees that no future information is leaked at any step of the evaluation process.
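The index bookkeeping for such a rolling scheme can be sketched as below. The generator name `rolling_windows` and the window sizes are hypothetical, since the actual window lengths are not given in this passage; within each fold, the volatility model and the network would be re-fitted on the training indices only.

```python
def rolling_windows(n_samples, train_size, test_size, step=None):
    """Yield (train_range, test_range) index pairs for rolling-window validation."""
    step = step or test_size
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step

# Toy sizes for illustration only.
folds = list(rolling_windows(n_samples=10, train_size=4, test_size=2))
```

Every test range begins immediately after its training range ends, so no fold can use observations from its own future.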
For model input, a feature engineering step was conducted. This involved aligning the derived volatility series with the price returns to form bivariate input sequences, which were structured using a 20-day lookback window to capture temporal dependencies. During the model training phase, all neural network architectures were trained for 100 epochs. The optimization was performed using the Adam optimizer with a learning rate of 0.01, and the Mean Squared Error (MSE) served as the loss function to guide the learning process.
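The construction of bivariate lookback sequences can be sketched as follows. The function name `make_windows` is hypothetical, and pairing each window with the next return as the target is an assumption made for this example; the exact target variable and the way 19-step sequences are derived from the 20-day window are not reproduced here.

```python
def make_windows(returns, volatility, lookback=20):
    """Pair each `lookback`-long window of (return, volatility) rows with the next return."""
    rows = list(zip(returns, volatility))
    inputs, targets = [], []
    for end in range(lookback, len(rows)):
        inputs.append(rows[end - lookback:end])   # steps end-lookback .. end-1
        targets.append(returns[end])              # assumed target: the following return
    return inputs, targets

# Toy series of 25 steps, giving 25 - 20 = 5 samples.
toy_returns = [0.001 * k for k in range(25)]
toy_vol = [0.01 + 0.0001 * k for k in range(25)]
inputs, targets = make_windows(toy_returns, toy_vol)
```

Under this construction a series of n observations yields n - 20 samples. For the 9,866 observations in the dataset that is 9,846, which equals the sum of the reported training (7,877) and test (1,969) sample counts.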
The evaluation of the models was carried out on a held-out test set, comprising 20% of the total data. Performance was quantified using a suite of metrics, including Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R²). To further substantiate the findings, a series of robustness tests were implemented, including ablation studies to assess component importance, rolling window validation to test temporal.
CODE AND DATA AVAILABILITY
The daily WTI crude oil price data used in this study are publicly available from the U.S. Energy Information Administration (EIA) database (https://www.eia.gov/dnav/pet/hist/RWTCd.htm) and have been validated using historical records from the Federal Reserve Economic Data (FRED) database (https://fred.stlouisfed.org/series/DCOILWTICO). All raw data, processed datasets, and the complete analysis code that support the findings of this study have been deposited in the Zenodo repository and are publicly available at DOI: 10.5281/zenodo.17614060. The provided source code encompasses the full implementation of the proposed GARCH-BiLSTM-KAN hybrid model, all benchmark models used for comparison, and the scripts necessary to replicate the entire experimental pipeline, including data preprocessing, model training, evaluation, and the generation of all figures and tables presented in this paper. To guarantee the reproducibility of the results, the repository includes detailed documentation for environment setup. Furthermore, this study employed a fixed random seed (42) across all experiments and utilized a consistent set of hyperparameters for model training. All neural networks were optimized using the Adam optimizer with a learning rate of 0.01 and a batch size of 32, trained for 100 epochs. This rigorous standardization ensures that the reported outcomes can be reliably replicated.
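A fixed seed is typically applied once at the start of each run. The sketch below is an assumption about how this could look rather than the contents of the deposited code: `set_seed` is a hypothetical helper, NumPy is seeded only if it is installed, and the deep learning framework's own seeding call (not named in this section) would need to be added.

```python
import random

def set_seed(seed=42):
    """Seed the random number generators available in this environment."""
    random.seed(seed)
    try:
        import numpy as np  # optional dependency; seeded only if installed
        np.random.seed(seed)
    except ImportError:
        pass
    # The framework used for the neural networks would be seeded here as well.

# Two runs with the same seed draw identical random numbers
set_seed(42)
first = [random.random() for _ in range(3)]
set_seed(42)
second = [random.random() for _ in range(3)]
```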
Data Description:
The analysis in this study relies on daily price data of West Texas Intermediate (WTI) crude oil, a benchmark for global oil markets due to its liquidity and widespread use in pricing agreements. The dataset spans a 39-year period from January 2, 1986, to March 10, 2025, encompassing 9,866 daily observations. This time frame is carefully chosen to capture diverse market conditions, including periods of economic expansion, recession, geopolitical crises, and energy policy shifts, which are crucial for testing the robustness of the proposed forecasting model.
The raw WTI crude oil price data were retrieved from the U.S. Energy Information Administration (EIA) database (https://www.eia.gov/dnav/pet/hist/RWTCd.htm), a trusted source for energy market data. Additional validation was performed using historical records from the Federal Reserve Economic Data (FRED) database (https://fred.stlouisfed.org/series/DCOILWTICO) to ensure consistency (Federal Reserve Bank of St. Louis).
To address the non-stationarity of crude oil prices, a common feature in financial time series (Box, 1970), we computed daily returns as the first difference of the natural logarithm of prices: \(r_t = \ln(P_t) - \ln(P_{t-1})\), where \(P_t\) denotes the price at time \(t\). The Augmented Dickey-Fuller (ADF) test confirms that the raw price series is non-stationary (p-value > 0.05), while the derived return series is stationary (p-value < 0.01), thus validating this preprocessing step. This transformation stabilizes the mean and variance of the series, facilitating model training [46]. Additionally, all price data were normalized using the Min-Max Scaler with a feature range of (-1, 1) to standardize input values, which enhances the convergence speed of neural network components (BiLSTM and KAN) during training [47].
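The two preprocessing steps can be sketched as follows. The helper names `log_returns` and `fit_min_max` and the toy prices are hypothetical. Two assumptions are made explicit. First, the scaler bounds are taken from whatever values are passed in (for example, the training segment), which this section does not specify. Second, the logarithm is undefined for non-positive prices, so the sketch raises an error for them; how the negative April 2020 observation was treated is not described here.

```python
from math import log

def log_returns(prices):
    """r_t = ln(P_t) - ln(P_{t-1}); requires strictly positive prices."""
    if any(p <= 0 for p in prices):
        raise ValueError("log returns are undefined for non-positive prices")
    return [log(prices[t]) - log(prices[t - 1]) for t in range(1, len(prices))]

def fit_min_max(values, low=-1.0, high=1.0):
    """Return a function that maps [min(values), max(values)] onto [low, high]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        raise ValueError("cannot scale a constant series")
    scale = (high - low) / (vmax - vmin)
    return lambda x: low + (x - vmin) * scale

# Toy prices (illustrative only, not WTI data)
prices = [20.0, 21.0, 19.5, 22.0, 23.5]
r = log_returns(prices)
scaler = fit_min_max(prices)
scaled = [scaler(p) for p in prices]
```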
Table 2 summarizes the key descriptive statistics of the WTI crude oil prices over the sample period. The mean price is $47.73 per barrel, with a standard deviation of $29.64, indicating significant price volatility, a characteristic consistent with previous studies on energy commodities [6]. The price ranges from a minimum of -$36.98 (a historical anomaly during the 2020 COVID-19 crisis, when storage constraints led to negative prices [16]) to a maximum of $145.31 (observed during the 2008 global financial crisis, driven by supply concerns [4]). The median price is $40.69, lower than the mean, suggesting a right-skewed distribution, which is typical of commodity prices influenced by supply shocks. The interquartile range (IQR = 75th percentile - 25th percentile = $71.47 - $20.22 = $51.25) further highlights the substantial price variability over the period.
Figure 4 presents the trend of WTI crude oil prices from 1986 to 2025, with several notable periods observable: the late 1980s to early 1990s, when prices remained relatively stable, averaging around $20-$30 per barrel, reflecting balanced supply-demand dynamics [48]; the 2000s, marked by a sharp upward trend that peaked at $145.31 in 2008, driven by rapid economic growth in emerging markets and geopolitical tensions in the Middle East [49]; the 2014-2016 crash, during which prices plummeted from over $100 to below $30 due to a supply glut from U.S. shale oil production and OPEC's decision to maintain output [1]; the 2020 COVID-19 pandemic, which led to an unprecedented drop to -$36.98 in April 2020 as lockdowns caused a collapse in demand [50]; and the post-2020 recovery, characterized by a gradual rebound influenced by economic reopening, supply chain disruptions, and geopolitical conflicts. Figure 5 illustrates the first difference of WTI prices, representing daily price changes. The plot confirms volatility clustering, with periods of high volatility (e.g., 2008, 2020) interspersed with relatively calm periods, validating the need for volatility modeling components like GARCH in the hybrid framework [21].
To evaluate the out-of-sample forecasting performance of the GARCH-BiLSTM-KAN model, the dataset was split into training (80%) and testing (20%) sets. The training set includes 7,877 observations (1986-2016), and the testing set includes 1,969 observations (2017-2025). This split ensures that the model is trained on historical data and tested on a more recent period, capturing evolving market dynamics to assess real-world applicability [20].
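The reported split sizes can be checked against the sample size with a little arithmetic. The sketch below rests on an interpretation, not a statement from the source: the two sets sum to 9,846, which equals the 9,866 observations minus the 20-day lookback, suggesting that the split was applied to the windowed sequences. The helper `chronological_split` and its rounding rule are assumptions that happen to reproduce the reported counts.

```python
n_obs = 9866                   # daily WTI observations reported in the text
lookback = 20                  # lookback window reported in the text
n_train, n_test = 7877, 1969   # reported split sizes

# Sequences that have a complete 20-day history
n_sequences = n_obs - lookback
assert n_train + n_test == n_sequences

def chronological_split(items, train_fraction=0.8):
    """Split in time order: earliest items for training, latest for testing."""
    cut = round(len(items) * train_fraction)  # rounding rule is an assumption
    return items[:cut], items[cut:]

train, test = chronological_split(list(range(n_sequences)))
print(len(train), len(test))  # prints: 7877 1969
```

Keeping the split chronological, with no shuffling, means every test sequence lies strictly after every training sequence.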
GARCH-BiLSTM-KAN
To systematically assess the predictive efficacy of the proposed GARCH-BiLSTM-KAN framework, a rigorous comparative analysis was conducted against a spectrum of benchmark models, encompassing traditional volatility models (GARCH, EGARCH), standalone deep learning architectures (LSTM), and hybrid configurations (LSTM-KAN, GARCH-LSTM, CNN-LSTM, CNN-LSTM-KAN). The evaluation was predicated on three core metrics: root mean squared error (RMSE), mean absolute error (MAE), and coefficient of determination (R²), with all computations performed on the out-of-sample test set. The comprehensive performance metrics are tabulated in Table 3.
As delineated in Table 3, the GARCH-BiLSTM-KAN model exhibited superior predictive performance across all metrics, registering the lowest RMSE (2.4649) and MAE (1.5044), coupled with the highest R² value (0.9813). This indicates that the proposed model accounts for approximately 98.13% of the variance in the observed WTI crude oil prices, signifying a robust fit to the underlying data-generating process. Among hybrid alternatives, the CNN-LSTM-KAN model, which integrates convolutional feature extraction with sequential learning and nonlinear refinement, demonstrated the closest performance, albeit with marginally higher error metrics (RMSE = 2.6047, MAE = 1.6464) and a lower explanatory power (R² = 0.9792). The CNN-LSTM architecture, despite its proficiency in capturing local spatiotemporal patterns, yielded inferior results (RMSE = 2.7639, MAE = 1.7693, R² = 0.9765), underscoring the incremental value of incorporating KAN for nonlinear feature refinement.
The remarkably poor performance of traditional volatility models (GARCH and EGARCH) warrants further explanation. As shown in Table 3, both models produce implausibly high error metrics (RMSE = 876.94 for GARCH, RMSE = 10,416.88 for EGARCH) and strongly negative R² values. This outcome is theoretically expected and consistent with the fundamental limitations of these models for direct price forecasting. GARCH-family models are specifically designed for volatility estimation rather than price level prediction. When applied to forecast non-stationary price levels directly, they fail to capture the underlying trend components and complex nonlinear dynamics that characterize crude oil markets. Their performance degradation underscores the necessity of the proposed hybrid framework, which leverages GARCH for its intended purpose (volatility feature extraction) while delegating price prediction to more sophisticated sequence learning components.
Standalone deep learning models displayed comparatively diminished performance: the LSTM network achieved an RMSE of 2.8802, MAE of 1.8445, and R² of 0.9745, while the LSTM-KAN hybrid exhibited a slight degradation in accuracy (RMSE = 2.9223, MAE = 1.8780, R² = 0.9738). This suggests that the efficacy of KAN's nonlinear mapping is contingent upon prior integration of volatility modeling and bidirectional temporal learning components. The GARCH-LSTM model, which combines volatility estimation with unidirectional sequence processing, demonstrated a higher error profile (RMSE = 3.0753, MAE = 1.9996) and lower R² (0.9710) relative to the proposed framework, highlighting the critical role of bidirectional temporal dependency modeling in capturing asymmetric price dynamics.
Notably, traditional volatility models (GARCH and EGARCH) proved highly ineffective, with GARCH yielding an RMSE of 876.9448, MAE of 724.0509, and a strongly negative R² (-2360.2337), while EGARCH produced implausible results (RMSE = 10416.8764, MAE = 3826.0185, R² = -333171.7174). These findings confirm the inadequacy of pure volatility modeling approaches in capturing the complex nonlinear and temporal characteristics inherent in crude oil price dynamics.
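The size of the negative R² values follows from the definition of the metric. Since R² = 1 - SS_res / SS_tot, and both sums run over the same test set, R² = 1 - RMSE² / Var(y), where Var(y) is the population variance of the test-set targets. The sketch below is an arithmetic check added for illustration, using only the values reported in Table 3; the helper name `implied_variance` is hypothetical. It inverts the identity to recover the target variance implied by each model's metrics.

```python
# (RMSE, R²) pairs as reported in Table 3
reported = {
    "GARCH-BiLSTM-KAN": (2.4649, 0.9813),
    "GARCH": (876.9448, -2360.2337),
    "EGARCH": (10416.8764, -333171.7174),
}

def implied_variance(rmse, r2):
    """Test-set target variance implied by R² = 1 - RMSE² / Var(y)."""
    return rmse ** 2 / (1.0 - r2)

implied = {name: implied_variance(*vals) for name, vals in reported.items()}
for name, var in implied.items():
    print(f"{name}: implied variance {var:.1f}, std {var ** 0.5:.2f}")
```

All three rows imply a test-set variance of roughly 325 (a standard deviation near $18), so the reported metrics are mutually consistent. The GARCH and EGARCH mean squared errors are thousands of times larger than that variance, which is what drives R² so far below zero.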
Because the GARCH and EGARCH errors are orders of magnitude larger than those of the other models (e.g., an RMSE of 876.9448 for GARCH and 10416.8764 for EGARCH, with strongly negative R² values for both), plotting them on the same charts would mask the prediction details and performance differences among the remaining models (such as GARCH-BiLSTM-KAN and CNN-LSTM-KAN).
Therefore, Figure 6 is split into two parts: one part includes the GARCH and EGARCH models to fully present the comparison of all models; the other part excludes these two models, making it easier to observe the prediction trends, error distributions, and performance differences of other models, ensuring an intuitive analysis of the relative performance of each effective model. Figure 6 presents a localized comparison of test set predictions, clearly illustrating the close alignment between GARCH-BiLSTM-KAN forecasts and actual prices, in contrast to the substantial deviations exhibited by alternative models, particularly GARCH and EGARCH. This pattern is reinforced in Figure 7, which focuses on a sub-period of the test set, where the proposed model effectively captures both short-term fluctuations and medium-term trends, outperforming CNN-LSTM-KAN and LSTM-based architectures that exhibit noticeable lag effects and overshooting behavior.
Training dynamics, visualized in Figure 8, provide additional insights into model stability and convergence properties. The GARCH-BiLSTM-KAN framework exhibits rapid loss reduction during initial epochs, stabilizing around 0.025 MSE after 60 epochs, which indicates efficient learning and robust convergence. In contrast, GARCH-LSTM and LSTM models display slower convergence with higher terminal loss values (approximately 0.075 and 0.100, respectively), while LSTM-KAN fails to achieve comparable loss minimization despite integrating nonlinear refinement. This suggests that the sequential integration of GARCH, BiLSTM, and KAN not only enhances predictive accuracy but also improves training efficiency, enabling more stable learning of complex patterns.
The exceptional performance of the GARCH-BiLSTM-KAN model, as quantified by its best-in-class metrics (RMSE = 2.4649, MAE = 1.5044, R² = 0.9813), is supported by the granular visualization in Figure 9. This figure, focusing on the final 100 test samples, shows that the model's predictions accurately track fine-grained price movements. In contrast, even the best-performing benchmark, CNN-LSTM-KAN, fails to capture certain abrupt changes. This visual evidence underscores the model's superior capability in refining nonlinear features and modeling bidirectional temporal relationships.
Collectively, these empirical findings provide compelling evidence for the superiority of the GARCH-BiLSTM-KAN model in crude oil price forecasting, with consistent outperformance across metrics and visual assessments. This validates the effectiveness of synergistically integrating volatility modeling, bidirectional temporal learning, and advanced nonlinear refinement to address the multifaceted characteristics of crude oil price dynamics.
Robustness analysis
To validate the robustness of the proposed GARCH-BiLSTM-KAN model and ensure that its performance is not contingent on a specific architectural configuration, a series of stress tests were conducted with alternative model designs. These tests aimed to assess the sensitivity of the model to its core components, including the type of recurrent neural network and the structural capacity of the Kolmogorov-Arnold Network.
Specifically, we evaluated the following model variants:
GARCH-BiGRU-KAN: This variant replaces the Bidirectional LSTM layer with a Bidirectional Gated Recurrent Unit (BiGRU) layer. The BiGRU is a simpler recurrent architecture with fewer parameters, and this substitution tests whether the performance is tied to the specific gating mechanisms of the LSTM.
GARCH-BiLSTM-KAN-Narrow: This configuration employs a KAN layer with reduced capacity (specifically, a width of [16, 1] with only 3 B-spline knots) to examine the impact of under-parameterization on the model's ability to capture nonlinearities.
GARCH-BiLSTM-KAN-Wide: This configuration uses a higher-capacity KAN layer (a width of [64, 1] with 7 B-spline knots) to investigate potential over-parameterization and its effect on generalization.
The performance metrics of these variants, alongside the original proposed model, are summarized in Table 4. The original GARCH-BiLSTM-KAN model maintains its superior performance, achieving the lowest RMSE and MAE and the highest R² among all tested configurations. The GARCH-BiGRU-KAN variant demonstrates highly competitive results, with only a marginal performance degradation. This indicates that the core strength of the hybrid framework lies in its bidirectional temporal learning capability, while the specific type of recurrent unit (LSTM vs. GRU) has a relatively minor impact.
In contrast, alterations to the KAN structure yielded more pronounced effects. The narrow KAN configuration (GARCH-BiLSTM-KAN-Narrow) exhibited a significant drop in performance across all metrics, underscoring that insufficient network capacity severely limits the model's ability to refine complex nonlinear patterns. Conversely, the wide KAN configuration (GARCH-BiLSTM-KAN-Wide) also underperformed relative to the original model, suggesting that over-parameterization can lead to suboptimal generalization, likely due to an increased risk of overfitting to noise in the training data.
Collectively, these robustness checks affirm that the selected architecture for the GARCH-BiLSTM-KAN model represents a robust and near-optimal configuration. The results demonstrate the model's resilience to certain architectural changes (e.g., BiGRU substitution) while highlighting the importance of carefully calibrated network capacity, as exemplified by the KAN component.
Temporal robustness with rolling window validation
To further mitigate concerns regarding potential overfitting to a single static train-test split and to rigorously evaluate the model's performance stability across different temporal periods, a rolling window forecasting experiment was conducted. This method provides a more realistic and robust assessment of the model's predictive capability in a pseudo-real-time setting.
The rolling window validation scheme was implemented as follows: the initial training window consisted of the first 70% of the data (approximately 6,906 samples), with the subsequent 10% (approximately 987 samples) used as the testing window for a one-step-ahead forecast. The window was then rolled forward by a fixed step size, and the model was retrained and re-evaluated. This process was repeated over 200 windows, ensuring the model was tested across diverse market regimes contained within the full dataset.
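The window layout can be sketched as an index generator. This is a minimal sketch under stated assumptions: the text fixes the 70% and 10% window lengths and the count of 200 windows but not the step size, so the sketch spaces the window starts evenly across the remaining data. It also assumes a fixed-length sliding training window. The function name `rolling_windows` is hypothetical.

```python
def rolling_windows(n_samples, train_frac=0.70, test_frac=0.10, n_windows=200):
    """Yield (train_start, train_end, test_end) index triples, end-exclusive."""
    train_len = int(n_samples * train_frac)
    test_len = round(n_samples * test_frac)
    last_start = n_samples - train_len - test_len
    # Assumed step size: evenly spaced starts (not stated in the text)
    step = max(1, last_start // (n_windows - 1))
    for k in range(n_windows):
        start = k * step
        if start > last_start:
            break
        train_end = start + train_len
        yield start, train_end, train_end + test_len

windows = list(rolling_windows(9866))
print(len(windows), windows[0])  # prints: 200 (0, 6906, 7893)
```

Within each triple, the GARCH parameters and the network would be fitted on indices from `train_start` up to `train_end`, and forecasts evaluated on the indices from `train_end` up to `test_end`.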
Figure 10 presents the evolution of the Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) across all 200 rolling windows. The results demonstrate the remarkable temporal stability of the proposed GARCH-BiLSTM-KAN model. Both RMSE and MAE fluctuate within a narrow and stable band throughout the validation process. There is no observable upward trend or structural break in the error metrics, indicating that the model does not suffer from performance degradation or overfitting to a specific market condition.
Notably, the spikes in error (e.g., around windows 50, 100, and 150) correspond to periods of extreme market volatility, such as the 2008 financial crisis and the 2020 COVID-19 pandemic, which are inherently more difficult to forecast. Crucially, the model rapidly recovers to its previous error levels after these volatile events, demonstrating its resilience and adaptability. This consistent performance across two decades of data, encompassing both calm and turbulent markets, provides strong evidence for the generalizability and temporal robustness of the GARCH-BiLSTM-KAN framework.
Ablation Study
To rigorously evaluate the individual contribution of each component within the proposed GARCH-BiLSTM-KAN framework, a comprehensive ablation study was conducted. The performance of the complete model was compared against several strategically designed variants: a standalone GARCH(1,1) model for direct price prediction (GARCH Only), a Bidirectional LSTM network utilizing only raw price returns (BiLSTM Only), a hybrid combining BiLSTM with KAN for nonlinear refinement (BiLSTM-KAN), and an integration of GARCH volatility features with BiLSTM but excluding KAN (GARCH-BiLSTM). The results, summarized in Table 5, which presents the performance metrics of all ablated model variants, provide compelling evidence for the synergistic design of the full model.
The standalone GARCH model exhibited the poorest performance with an RMSE of 43.62 and a strongly negative R² value of -4.84, confirming its inadequacy for direct price level forecasting and underscoring the necessity of integrating it with sequence learning components. The BiLSTM-only architecture demonstrated a substantially stronger capability with an R² of 0.96, yet its RMSE of 3.61 indicated significant room for improvement. The incorporation of KAN for nonlinear refinement within the BiLSTM-KAN variant yielded an appreciable enhancement, reducing the RMSE to 3.35 and increasing the R² to 0.97, which highlights KAN's inherent effectiveness in mapping complex relationships. Interestingly, the GARCH-BiLSTM model, which combines volatility features with bidirectional learning but lacks the KAN component, underperformed the BiLSTM-only model with an RMSE of 3.83. This result suggests that the raw integration of volatility information, without a sophisticated mechanism for nonlinear refinement, can introduce noise that hampers predictive accuracy. Ultimately, the complete GARCH-BiLSTM-KAN framework achieved superior performance across all metrics, registering the lowest RMSE (2.46) and MAE (1.50), and the highest R² (0.98). The substantial performance gap between GARCH-BiLSTM and the full model, evidenced by a 35.6% reduction in RMSE, demonstrates that the KAN layer is not merely additive but plays a crucial, synergistic role in effectively reconciling and refining the interplay between historical volatility patterns and bidirectional temporal dependencies.
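The relative RMSE reductions quoted above follow directly from the four-decimal values in Table 5. The first line reproduces the 35.6% figure; the second, added here for comparison, gives the corresponding reduction when KAN is added to the BiLSTM-only model.

```latex
\[
\frac{\mathrm{RMSE}_{\text{GARCH-BiLSTM}} - \mathrm{RMSE}_{\text{GARCH-BiLSTM-KAN}}}{\mathrm{RMSE}_{\text{GARCH-BiLSTM}}}
= \frac{3.8299 - 2.4649}{3.8299} \approx 0.356
\]
\[
\frac{\mathrm{RMSE}_{\text{BiLSTM}} - \mathrm{RMSE}_{\text{BiLSTM-KAN}}}{\mathrm{RMSE}_{\text{BiLSTM}}}
= \frac{3.6079 - 3.3489}{3.6079} \approx 0.072
\]
```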
Validation on an Independent dataset
To further validate the generalizability and robustness of the proposed GARCH-BiLSTM-KAN model, we conducted an out-of-sample test using an independent dataset: daily Brent crude oil prices. Brent crude serves as a major global benchmark alongside WTI, and its inclusion allows us to assess whether the model's performance extends beyond the original dataset and market context. The Brent price data, sourced from the U.S. Energy Information Administration (EIA), spans from May 20, 1987, to March 10, 2025, comprising 9,658 daily observations. The same preprocessing steps were applied as in the WTI analysis, including logarithmic differencing to obtain stationary returns and Min-Max scaling to the range [-1, 1]. The dataset was split chronologically, with 80% used for training and 20% reserved for testing.
The proposed GARCH-BiLSTM-KAN model was trained and evaluated on the Brent dataset under identical hyperparameter and architectural settings as those used for WTI. Its performance was compared against the same set of benchmark models, including GARCH-BiGRU-KAN, CNN-LSTM-KAN, CNN-LSTM, GARCH-LSTM, LSTM, and LSTM-KAN. The results, summarized in Table 6, reaffirm the superior forecasting capability of the proposed hybrid framework.
As shown in Table 6, the GARCH-BiLSTM-KAN model again achieved the lowest RMSE (2.0053) and MAE (1.4122), along with the highest R² value (0.9882), indicating that it explains approximately 98.82% of the variance in Brent crude oil prices. The GARCH-BiGRU-KAN variant followed closely, with an RMSE of 2.2921 and R² of 0.9846, demonstrating that bidirectional recurrent architectures consistently outperform unidirectional or non-hybrid models. The CNN-LSTM-KAN hybrid also performed competitively, though it was slightly less accurate than the proposed model. In contrast, standalone LSTM and LSTM-KAN models exhibited higher prediction errors, reinforcing the necessity of integrating volatility modeling and bidirectional learning.
These results confirm that the GARCH-BiLSTM-KAN framework is not overfitted to the WTI market but generalizes effectively to other major crude oil benchmarks. The consistent outperformance across both WTI and Brent datasets underscores the model's robustness and adaptability to different market dynamics and data-generating processes. This external validation enhances the credibility of the proposed approach and supports its applicability in diverse energy forecasting contexts. Supplementary File 1 is the raw data file ("Data.csv"), and Supplementary File 2 is the source code file ("GBK.py").

Figure 1: Overall framework flowchart of the GARCH-BiLSTM-KAN hybrid model. The schematic illustrates the end-to-end forecasting pipeline, which sequentially integrates the GARCH module for conditional volatility estimation, the BiLSTM network for bidirectional temporal feature extraction, and the KAN layer for nonlinear refinement of the final prediction.

Figure 2: Schematic diagram of hidden state fusion process in bidirectional long short-term memory network (BiLSTM). The architecture demonstrates the concatenation of hidden states from the forward and backward LSTM passes at each timestep, enabling the model to capture contextual dependencies from both past and future information within the input sequence.

Figure 3: Schematic diagram of the Kolmogorov-Arnold network (KAN) model structure. The diagram depicts the KAN layer, which replaces fixed activation functions with learnable univariate spline-based functions (e.g., cubic B-splines) applied to each input dimension, followed by a linear combination, allowing for fine-grained approximation of complex nonlinear mappings.

Figure 4: Trend chart of WTI crude oil prices from 1986 to 2025. The price series exhibits significant volatility and structural breaks, capturing major historical events including the 2008 global financial crisis, the 2014-2016 supply glut, the unprecedented negative pricing event in April 2020, and the post-pandemic recovery phase.

Figure 5: Trend chart of the first difference of WTI crude oil prices. The plot of daily returns clearly illustrates the characteristic volatility clustering phenomenon, in which periods of high volatility tend to persist, validating the incorporation of a GARCH component for modeling time-varying variance.

Figure 6: Local comparison of prediction results of various models on the test set. Predictions from the proposed GARCH-BiLSTM-KAN model closely align with the actual price trajectory, whereas traditional volatility models (GARCH, EGARCH) show substantial deviations, highlighting their inadequacy for direct price level forecasting.

Figure 7: Test set prediction comparison (partial time range). This detailed view contrasts the forecasting performance of different models during a specific period, demonstrating the superior ability of the GARCH-BiLSTM-KAN model to capture both short-term fluctuations and medium-term trends compared to benchmark hybrids and standalone models.

Figure 8: Training loss comparison across models (MSE vs. Epochs). The proposed GARCH-BiLSTM-KAN model achieves faster convergence and a lower stable loss value, indicating more efficient and stable training compared to other model configurations such as GARCH-LSTM and LSTM-KAN.

Figure 9: Test set prediction (last 100 samples). The GARCH-BiLSTM-KAN model accurately tracks fine-grained price movements, including abrupt changes, outperforming the closest benchmark (CNN-LSTM-KAN) and demonstrating its enhanced capability for refining nonlinear temporal features.

Figure 10: Model performance stability across rolling windows. The stability of the Root Mean Squared Error (RMSE, left) and Mean Absolute Error (MAE, right) across all windows, including periods of extreme market volatility, confirms the model's consistent out-of-sample performance and generalizability over time.
| Component | Parameter | Value | Description |
| GARCH | Order | (1,1) | ARCH and GARCH terms |
| GARCH | Distribution | Normal | Error distribution |
| BiLSTM | Hidden units | 32×2 | Bidirectional architecture |
| BiLSTM | Layers | 2 | Stacked LSTM layers |
| BiLSTM | Lookback window | 20 | Input sequence length |
| KAN | Width | [64, 1] | Network structure |
| KAN | Basis functions | Cubic B-spline | Activation type |
| KAN | Knots | 5 | Spline complexity |
| Training | Optimizer | Adam | Optimization algorithm |
| Training | Learning rate | 0.01 | Step size |
| Training | Batch size | 32 | Mini-batch size |
| Training | Epochs | 100 | Training iterations |
Table 1: Key parameters of the GARCH-BiLSTM-KAN hybrid model. This table summarizes the critical hyperparameters and architectural specifications for each component of the proposed GARCH-BiLSTM-KAN hybrid model. The GARCH(1,1) model is configured with a normal error distribution for volatility estimation. The BiLSTM network employs a bidirectional architecture with 32 hidden units per direction (resulting in a 64-dimensional concatenated hidden state), 2 stacked layers, a dropout rate of 0.2 for regularization, and processes input sequences with a lookback window of 20 timesteps. The Kolmogorov-Arnold Network (KAN) is defined with a layer width of [64, 1], utilizing cubic B-spline basis functions with 5 knots for flexible nonlinear activation. The model is trained end-to-end for 100 epochs using the Adam optimizer with a learning rate of 0.01 and a batch size of 32. All parameter values are consistent with the mathematical formulation and code implementation to ensure full reproducibility.
| Statistic | Count | Mean | Std | Min | 25% | 50% | 75% | Max |
| Value | 9,866 | 47.73 | 29.64 | -36.98 | 20.22 | 40.69 | 71.47 | 145.31 |
Table 2: Descriptive statistics of WTI crude oil prices (1986-2025). This table presents the descriptive statistics for the entire sample of daily WTI crude oil prices, encompassing 9,866 observations from January 1986 to March 2025. Key statistics include the mean price ($47.73/barrel), standard deviation ($29.64), minimum value (-$36.98), maximum value ($145.31), and quartiles (25th: $20.22, 50th: $40.69, 75th: $71.47). The substantial standard deviation and the wide range between the minimum (occurring during the April 2020 COVID-19 demand collapse) and maximum (during the 2008 financial crisis) underscore the significant volatility and extreme price movements inherent in the crude oil market over the nearly four-decade period.
| Model | RMSE | MAE | R² |
| GARCH-BiLSTM-KAN | 2.4649 | 1.5044 | 0.9813 |
| CNN-LSTM-KAN | 2.6047 | 1.6464 | 0.9792 |
| CNN-LSTM | 2.7639 | 1.7693 | 0.9765 |
| LSTM | 2.8802 | 1.8445 | 0.9745 |
| LSTM-KAN | 2.9223 | 1.8780 | 0.9738 |
| GARCH-LSTM | 3.0753 | 1.9996 | 0.9710 |
| GARCH | 876.9448 | 724.0509 | -2360.2337 |
| EGARCH | 10416.8764 | 3826.0185 | -333171.7174 |
Table 3: Performance metrics of various models on the test set. This table provides a comparative evaluation of forecasting performance between the proposed GARCH-BiLSTM-KAN model and a suite of benchmark models on the held-out test set. Performance is quantified using Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R²). The proposed model achieves superior performance with the lowest RMSE (2.4649) and MAE (1.5044), and the highest R² (0.9813). Traditional GARCH-family models (GARCH, EGARCH) demonstrate catastrophic failure for direct price level forecasting, evidenced by implausibly high error metrics and strongly negative R² values, confirming their design limitation to volatility modeling. Other hybrid and deep learning benchmarks (CNN-LSTM-KAN, LSTM, GARCH-LSTM) are consistently outperformed, highlighting the synergistic contribution of the integrated components in the full model.
| Model | RMSE | MAE | R² |
| GARCH-BiLSTM-KAN | 2.4649 | 1.5044 | 0.9813 |
| GARCH-BiGRU-KAN | 2.4762 | 1.5459 | 0.9812 |
| GARCH-BiLSTM-KAN-Narrow | 2.8412 | 1.8029 | 0.9752 |
| GARCH-BiLSTM-KAN-Wide | 2.5746 | 1.6237 | 0.9787 |
Table 4: Robustness check results with alternative model configurations. This table presents the results of robustness checks conducted by altering key architectural components of the proposed framework. Variants include replacing the BiLSTM with a Bidirectional GRU (GARCH-BiGRU-KAN), and modifying the KAN capacity to be narrower ([16,1] width, 3 knots) or wider ([64,1] width, 7 knots). The original GARCH-BiLSTM-KAN configuration maintains the best performance, confirming its near-optimal design. The competitive performance of the BiGRU variant suggests the core strength lies in bidirectional learning, while the performance degradation of both narrow and wide KAN variants underscores the importance of appropriately calibrated network capacity for nonlinear refinement to avoid underfitting or overfitting.
| Model | RMSE | MAE | R² |
| GARCH Only | 43.6224 | 41.0928 | -4.8427 |
| BiLSTM Only | 3.6079 | 2.6770 | 0.9600 |
| BiLSTM-KAN | 3.3489 | 2.3605 | 0.9656 |
| GARCH-BiLSTM | 3.8299 | 2.9156 | 0.9550 |
| GARCH-BiLSTM-KAN | 2.4649 | 1.5044 | 0.9813 |
Table 5: Results of the ablation study. This table details the findings from the ablation study, which systematically evaluates the contribution of each component by comparing the full model against strategically ablated variants. These include a standalone GARCH(1,1) model, a BiLSTM-only model, a BiLSTM-KAN model (without GARCH), and a GARCH-BiLSTM model (without KAN). The standalone GARCH model fails completely for price prediction. The BiLSTM-only model establishes a strong baseline, which is improved by adding KAN (BiLSTM-KAN). Notably, simply adding GARCH features to BiLSTM (GARCH-BiLSTM) without KAN degrades performance, indicating that the volatility input requires sophisticated nonlinear refinement. The full GARCH-BiLSTM-KAN model achieves the best results, demonstrating that the synergistic integration of all three components is crucial for its superior forecasting accuracy.
| Model | RMSE | MAE | R² |
| GARCH-BiLSTM-KAN | 2.0053 | 1.4122 | 0.9882 |
| GARCH-BiGRU-KAN | 2.2921 | 1.5816 | 0.9846 |
| CNN-LSTM-KAN | 2.4253 | 1.7414 | 0.9827 |
| CNN-LSTM | 2.6147 | 1.9298 | 0.9799 |
| GARCH-LSTM | 2.7417 | 1.9089 | 0.9780 |
| LSTM | 2.7820 | 1.9373 | 0.9773 |
| LSTM-KAN | 2.8235 | 2.0167 | 0.9766 |
Table 6: Performance comparison of different models on the independent Brent crude oil price dataset. This table presents the forecasting performance of the proposed GARCH-BiLSTM-KAN model and benchmark models evaluated on the independent Brent crude oil price dataset (May 20, 1987-March 10, 2025). Performance is measured by Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R²). The proposed GARCH-BiLSTM-KAN model achieves the lowest RMSE (2.0053) and MAE (1.4122), and the highest R² (0.9882), demonstrating its superior generalizability and robustness on a major alternative benchmark.
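The three metrics named in the table captions can be sketched in plain Python as follows. This is a minimal illustration rather than the evaluation code from Supplementary File 2; the function names and the toy price series are assumptions. It also shows why R² can become strongly negative, as in the GARCH Only row of Table 5: a forecast that sits far from the actual price level has a larger squared error than simply predicting the test-set mean.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical prices, for illustration only (not data from the study).
actual = [60.0, 62.0, 64.0, 66.0, 68.0]
close_forecast = [61.0, 62.0, 63.0, 66.0, 69.0]   # tracks the trend
flat_forecast = [20.0] * 5                         # stuck far from the price level

for name, pred in [("close", close_forecast), ("flat", flat_forecast)]:
    print(name, round(rmse(actual, pred), 4), round(mae(actual, pred), 4),
          round(r2(actual, pred), 4))
```

Because R² compares the model against a constant-mean baseline, it is bounded above by 1 but has no lower bound, so a level-blind forecast can score far below zero.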
Supplementary File 1: The raw data file ("Data.csv"). Please click here to download this file.
Supplementary File 2: The source code file ("GBK.py"). Please click here to download this file.
The empirical findings presented above underscore the superior performance of the GARCH-BiLSTM-KAN hybrid model in forecasting WTI crude oil prices, outperforming both traditional volatility models and alternative hybrid architectures across key metrics. This section discusses the theoretical and practical implications of these results, contextualizes them within existing literature, and identifies potential avenues for further inquiry.
The outperformance of the GARCH-BiLSTM-KAN model stems from the synergistic integration of its three components, each addressing distinct characteristics of crude oil price dynamics. The GARCH(1,1) module effectively captures volatility clustering, a defining feature of energy markets22, by incorporating historical volatility estimates into the input sequence. This structured volatility information enriches the feature space for subsequent layers, addressing a critical limitation of standalone deep learning models that treat input data as homoscedastic51. The BiLSTM layer builds on this foundation by modeling bidirectional temporal dependencies, allowing the model to capture asymmetric relationships between past and future price movements. This is an advantage over unidirectional LSTM or GARCH-LSTM configurations, which struggle with complex sequential asymmetries30. Finally, the KAN module refines these temporal features through its use of cubic B-spline basis functions, efficiently modeling high-dimensional nonlinearities that traditional activation functions may miss44. This hierarchical refinement, from volatility estimation to bidirectional sequence learning to nonlinear feature tuning, explains the model's ability to account for 98.10% of price variance in the test set.
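To make the role of the cubic B-spline basis functions more concrete, the sketch below evaluates a single KAN-style univariate function as a weighted sum of B-splines, using the Cox-de Boor recursion. It is an illustration under stated assumptions, not the implementation in Supplementary File 2: the knot grid, the coefficient values, and the names `bspline_basis` and `edge_function` are all hypothetical, and in a trained KAN the coefficients would be learned.

```python
def bspline_basis(i, k, t, x):
    """Cox-de Boor recursion: i-th B-spline of degree k on knot vector t, evaluated at x."""
    if k == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = right = 0.0
    if t[i + k] != t[i]:
        left = (x - t[i]) / (t[i + k] - t[i]) * bspline_basis(i, k - 1, t, x)
    if t[i + k + 1] != t[i + 1]:
        right = ((t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1])
                 * bspline_basis(i + 1, k - 1, t, x))
    return left + right

def edge_function(x, coeffs, knots, degree=3):
    """Univariate function phi(x) = sum_i c_i * B_i(x), the building block of a KAN edge."""
    n_basis = len(knots) - degree - 1
    if len(coeffs) != n_basis:
        raise ValueError(f"expected {n_basis} coefficients, got {len(coeffs)}")
    return sum(c * bspline_basis(i, degree, knots, x) for i, c in enumerate(coeffs))

# Hypothetical uniform grid: 9 knots from -2.0 to 2.0 give 5 cubic basis functions.
knots = [-2.0 + 0.5 * j for j in range(9)]
coeffs = [0.2, -0.4, 1.0, 0.3, -0.1]   # illustrative values, not fitted weights

print(edge_function(0.1, coeffs, knots))
```

Changing the number of knots changes how many basis functions, and hence how many learnable coefficients, each univariate function has, which is the capacity dimension varied in the narrow and wide KAN variants of Table 4.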
These results contribute to the ongoing debate on the relative merits of statistical versus machine learning (ML) approaches in time series forecasting. Traditional volatility models like GARCH and EGARCH performed poorly in this study, with implausibly high error metrics and negative R² values, confirming their inadequacy in capturing the nonlinear and multifaceted dynamics of crude oil prices26. The severe underperformance of the GARCH and EGARCH models (R² of -2360.23 and -333171.72, respectively) aligns with their theoretical design constraints. These models operate under the assumption that price returns follow a stationary process with time-varying conditional variance. However, when tasked with forecasting raw price levels, which exhibit strong non-stationarity, long-term trends, and structural breaks, they systematically fail. This failure manifests as extreme forecast errors because the models cannot disentangle persistent price trends from transient volatility shocks. Our implementation adhered to standard econometric practice: GARCH(1,1) with normal error distribution and EGARCH(1,1) to capture leverage effects, yet both proved fundamentally inadequate for the price forecasting objective. This empirical evidence strongly validates the core thesis: volatility modeling alone is insufficient and must be integrated with architectures capable of learning complex temporal patterns.
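The point that GARCH describes the variance of returns rather than the level of prices can be seen from the GARCH(1,1) recursion itself, sigma²_t = omega + alpha * r²_{t-1} + beta * sigma²_{t-1}. The sketch below applies that recursion with fixed parameters to show how a volatility series of the kind used as an input feature might be produced. The parameter values, the function name, and the toy prices are assumptions for illustration; in practice the parameters are estimated by maximum likelihood, and nothing here reproduces the fitted model from the study.

```python
import math

def garch11_volatility(returns, omega=0.05, alpha=0.10, beta=0.85):
    """Conditional volatility from a GARCH(1,1) recursion with fixed, illustrative parameters."""
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for a finite unconditional variance")
    # Initialise at the unconditional variance implied by the parameters.
    var = omega / (1.0 - alpha - beta)
    vols = []
    for r in returns:
        vols.append(math.sqrt(var))                  # estimate formed before observing r
        var = omega + alpha * r * r + beta * var     # update for the next time step
    return vols

# Hypothetical prices; the recursion works on returns, not on these levels.
prices = [70.0, 70.5, 69.8, 75.0, 71.0, 71.2]
returns = [100.0 * math.log(b / a) for a, b in zip(prices, prices[1:])]
vols = garch11_volatility(returns)

# Pair each return with its volatility estimate, e.g. as two input features per time step.
features = list(zip(returns, vols))
print(features)
```

The output carries no information about where the price level is heading, only about how dispersed the next return is expected to be, which is consistent with the failure of the standalone volatility models on the price-level task.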
Standalone deep learning models, while superior to GARCH, exhibited limitations: LSTM and LSTM-KAN failed to match the hybrid model's accuracy, highlighting the need to explicitly model volatility rather than relying solely on sequential pattern recognition. Similarly, the GARCH-LSTM model, which combines volatility estimation with unidirectional learning, underperformed GARCH-BiLSTM-KAN, emphasizing the value of bidirectional temporal learning in capturing asymmetric price responses to shocks. These findings align with recent studies advocating for hybrid frameworks that merge statistical rigor with ML flexibility40,41, but extend this literature by demonstrating the incremental value of bidirectional learning and advanced nonlinear refinement via KAN.
Practically, the GARCH-BiLSTM-KAN model offers a robust tool for stakeholders in energy markets. For policymakers, its high accuracy enhances the reliability of energy policy simulations, strategic reserve planning, and inflation forecasts. Energy companies can leverage its volatility-aware predictions to improve risk hedging strategies and production planning, particularly during periods of market turbulence (e.g., the 2020 COVID-19 crisis or 2022 geopolitical tensions, as observed in Figure 9). Financial market participants, including traders and portfolio managers, may benefit from more precise pricing of oil derivatives and optimized asset allocation, reducing exposure to forecast errors9. Notably, the model's performance remains strong across diverse market conditions, from the 2008 financial crisis to the post-2020 recovery, suggesting generalizability to both stable and volatile periods.
The model demonstrates notable robustness during periods of extreme market stress, such as the 2008 financial crisis and the 2020 COVID-19 pandemic, which are inherently challenging for forecasting models. This resilience can be attributed to the synergistic design of its components. The GARCH module explicitly accounts for the volatility clustering that intensifies during such crises, providing a structured measure of risk. Concurrently, the BiLSTM's bidirectional processing allows it to capture not only the persistent downward pressure from panic selling (a backward-looking signal) but also the emergent market expectations of recovery or further collapse (a forward-looking signal), which are critical during turning points. Finally, the KAN layer refines these complex, often nonlinear interactions between extreme volatility and price trends that standard activation functions might oversimplify. While traditional models like GARCH and standalone LSTMs falter due to their inability to holistically model these intertwined dynamics, the integrated framework navigates these conditions by design, as evidenced by its stable performance in the rolling window validation (Figure 10) that spans these turbulent episodes.
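For readers unfamiliar with rolling window validation, the sketch below generates the train and test index ranges for a sequence of windows that slide forward through time. It is a generic illustration: the window lengths, the step size, and the function name are hypothetical, since the configuration behind Figure 10 is not restated in this section.

```python
def rolling_windows(n_obs, train_size, test_size, step):
    """Yield (train_range, test_range) index pairs that slide forward through time."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step

# Hypothetical sizes chosen only to keep the printout short.
splits = list(rolling_windows(n_obs=20, train_size=10, test_size=4, step=3))
for train, test in splits:
    print(f"train {train.start}-{train.stop - 1}  test {test.start}-{test.stop - 1}")
```

Each test block lies strictly after its training block, so every window is evaluated on observations the model has not seen, and windows that overlap a crisis period test the model under those conditions specifically.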
Beyond statistical metrics, the performance improvement of the GARCH-BiLSTM-KAN model carries substantial practical and economic implications. The reduction in RMSE from 3.61 (BiLSTM-only) to 2.46 (the model described here) represents a decrease in forecast error of approximately 32%. For stakeholders in the multi-trillion-dollar global oil market, this enhancement can translate into tangible economic value. For instance, if RMSE is read as the typical per-barrel price error, an energy firm hedging a million-barrel position could see that error fall by approximately 1.15 USD per barrel, or roughly 1.15 million USD across the position (calculated based on the RMSE improvement). This can translate into lower hedging costs and reduced exposure to adverse price movements5,8. For policymakers, this improved precision supports more effective strategic petroleum reserve management, potentially saving billions in public funds by optimizing the timing of purchases and releases. For financial institutions engaged in derivative pricing and portfolio allocation, even marginal forecast improvements can compound into substantial gains and risk reduction across large portfolios9. Therefore, the proposed model is not merely a statistical advancement but a tool with direct utility for financial decision-making and economic policy.
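The relative RMSE improvement quoted above can be checked directly from the Table 5 values. The short calculation below does this for each ablated variant; the RMSE figures are copied from Table 5, while the variable names are illustrative.

```python
# RMSE values from Table 5 (ablation study).
ablated_rmse = {
    "BiLSTM Only": 3.6079,
    "BiLSTM-KAN": 3.3489,
    "GARCH-BiLSTM": 3.8299,
}
full_model_rmse = 2.4649   # GARCH-BiLSTM-KAN

# Percentage reduction in RMSE achieved by the full model relative to each variant.
reductions = {
    name: 100.0 * (value - full_model_rmse) / value
    for name, value in ablated_rmse.items()
}

for name, pct in reductions.items():
    print(f"{name}: RMSE reduced by {pct:.1f}%")
```

Relative to the BiLSTM-only baseline the reduction comes to a little under 32%, matching the rounded figure in the text.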
The model's enhanced accuracy has direct policy relevance. For instance, in a scenario analysis of strategic petroleum reserve (SPR) management, the model's reliable forecasts during the 2022 price surge (driven by geopolitical tensions) could have provided a stronger quantitative basis for the timing and volume of SPR releases, thereby improving the effectiveness of such interventions in stabilizing markets and controlling inflation.
Despite its strengths, this study has limitations that warrant consideration. First, the analysis centers on WTI crude oil, with Brent (Table 6) as the only independent check; future research should validate the model's performance on additional benchmarks or related commodities to assess its broader applicability. Second, while KAN improves nonlinear modeling, its interpretability remains limited compared to parametric models. Exploring explainable AI techniques to unpack the KAN layer's contributions (for example, identifying which basis functions drive specific price predictions) could enhance trust among practitioners. Third, the model's input features are restricted to historical prices and volatility. Incorporating external variables such as macroeconomic indicators, geopolitical risk indices, or OPEC production data may further improve predictive power, as these factors are known to influence oil prices10,21.
In summary, the GARCH-BiLSTM-KAN model advances crude oil price forecasting by harmonizing volatility modeling, bidirectional temporal learning, and advanced nonlinear refinement. Its empirical success validates the utility of hybrid frameworks in addressing the complexities of energy markets, offering both theoretical insights and practical benefits for decision-making. Future work should focus on expanding the model's scope, enhancing interpretability, and integrating additional predictive features to solidify its role as a leading forecasting tool in energy economics.
This study proposes a novel hybrid forecasting framework, the GARCH-BiLSTM-KAN model, which synergistically integrates parametric volatility modeling, bidirectional temporal learning, and nonparametric nonlinear refinement. Scientifically, it contributes to the field by demonstrating a viable pathway to harmonize econometric theory with state-of-the-art machine learning architectures, providing empirical evidence that such a hybrid paradigm can effectively address the "elephant in the room" in financial forecasting: the coexistence of volatility clustering, long-range dependencies, and severe nonlinearities. The work presented here moves beyond simple model stacking to offer a principled, hierarchical feature refinement pipeline, thereby enriching the methodological toolkit for complex time series analysis.
Despite its compelling performance, this study is subject to several limitations that warrant acknowledgment. The model's input feature space is currently restricted to historical prices and their derived volatility, whereas alternative or complementary approaches could incorporate a broader set of fundamental and technical predictors, such as OECD inventory levels, OPEC+ production decisions, and geopolitical risk indices, to more fully capture the drivers of oil price dynamics. Furthermore, while the KAN component enhances nonlinear modeling, its interpretability in high-dimensional settings remains less straightforward than that of linear models or traditional MLPs, which could present a challenge for immediate practitioner adoption. Finally, the empirical validation, conducted on a single major benchmark (WTI), may limit the immediate generalizability of the findings to other commodities or markets.
Notwithstanding these limitations, the proposed model holds significant potential for practical application. It offers policymakers a more robust tool for energy security planning and macroeconomic forecasting, assists energy firms in enhancing risk management and hedging strategies during periods of extreme volatility, and provides financial institutions with improved accuracy for derivative pricing and portfolio allocation decisions involving energy assets. Building upon this work, we envision several promising future research directions. These include integrating the model with a wider array of external macro-financial and geopolitical variables, employing eXplainable AI (XAI) techniques such as SHAP (SHapley Additive exPlanations) to demystify the KAN layer's predictions by quantifying feature importance and interaction effects, validating the framework's generalizability across other critical energy commodities and financial assets, and developing mechanisms for real-time adaptation. Future work will also focus on validating the model's performance on other key benchmarks, such as Brent crude oil, and related energy commodities to further assess its generalizability.
This study addresses the long-standing challenge of crude oil price forecasting by proposing a novel hybrid model, GARCH-BiLSTM-KAN, which synergistically integrates the strengths of volatility modeling, bidirectional temporal learning, and advanced nonlinear refinement. By systematically combining GARCH(1,1) for capturing time-varying volatility, BiLSTM for modeling bidirectional temporal dependencies, and KAN for refining complex nonlinear relationships, the proposed framework effectively navigates the multifaceted dynamics of crude oil prices (including volatility clustering, asymmetric temporal interactions, and high-dimensional nonlinearities) that have historically stymied single-model approaches.
Empirical results using 39 years of daily WTI crude oil prices (1986-2025) demonstrate the superiority of GARCH-BiLSTM-KAN over a range of benchmark models. With the lowest RMSE (2.49), MAE (1.52), and highest R² (0.98), the model outperforms traditional volatility models (GARCH, EGARCH), standalone deep learning architectures (LSTM), and alternative hybrids (GARCH-LSTM, CNN-LSTM-KAN). This consistent outperformance validates the value of its integrated design: GARCH enriches inputs with structured volatility information, BiLSTM captures asymmetric past-future relationships often missed by unidirectional models, and KAN refines residual nonlinearities beyond the capacity of conventional neural network activations.
The findings carry significant implications for both academia and practice. Theoretically, this research advances the field of energy economics by demonstrating how hybrid frameworks can harmonize parametric and nonparametric methods to address the complexities of financial time series, offering a new paradigm for modeling assets with volatile, nonlinear dynamics. Practically, the model provides a robust tool for governments, energy firms, and financial institutions: policymakers can leverage its accuracy for more informed energy policy and strategic reserve management; energy companies can enhance risk hedging and production planning, particularly during turbulent periods like the 2020 COVID-19 crisis or 2022 geopolitical tensions; and investors can improve derivative pricing and portfolio optimization.
Despite these contributions, limitations remain. The focus on WTI crude oil calls for validation across other benchmarks or energy commodities to confirm generalizability. Additionally, the model's reliance on historical price and volatility data leaves room to incorporate external factors (such as macroeconomic indicators, geopolitical risk indices, or OPEC production quotas) that shape oil markets. Future work could also explore explainable AI techniques to unpack KAN's nonlinear mappings, enhancing transparency for practical adoption.
In conclusion, the GARCH-BiLSTM-KAN model represents a significant step forward in crude oil price forecasting, blending statistical rigor with cutting-edge machine learning to tackle the inherent complexities of energy markets. Its success underscores the potential of hybrid approaches in financial time series analysis, offering both theoretical insights and actionable tools to navigate the uncertainties of global oil markets.
The authors have nothing to disclose.
The authors thank all colleagues for their support and helpful comments.
| Computing Workstation | Standard research computing environment | Configuration: NVIDIA GPU, 32GB RAM, multi-core CPU | Used for all model training, validation, and experimental procedures. |
| Daily Brent Crude Oil Spot Price | U.S. Energy Information Administration (EIA) | Zenodo Repository: DOI 10.5281/zenodo.17614060 | Served as an independent dataset for validating model generalizability. |
| Daily West Texas Intermediate (WTI) Crude Oil Spot Price | U.S. Energy Information Administration (EIA) | Zenodo Repository: DOI 10.5281/zenodo.17614060 | Used as the primary dataset for model development and evaluation. |
| GARCH-BiLSTM-KAN Model Implementation | This study | Zenodo Repository: DOI 10.5281/zenodo.17614060 | Complete source code for the proposed hybrid model and all benchmark models. |
| Key Python Libraries (NumPy, Pandas, Scikit-learn, Matplotlib) | Open-source community | Versions as specified in repository requirements.txt | Used for data processing, statistical analysis, and visualization. |
| Python Programming Language | Python Software Foundation | Version 3.9; https://www.python.org/ | Main programming language for implementing all models and analyses. |
| PyTorch Library | PyTorch Foundation | Version 2.0; https://pytorch.org/ | Primary deep learning framework for implementing BiLSTM and KAN components. |