Research Article
This study advances concentrated solar power plant performance through comprehensive data analysis and error correction methodologies. By integrating spectrum analysis, thermal efficiency optimization, and hybrid machine learning models, the research provides actionable strategies for enhancing operational efficiency and reliability, thereby supporting the role of solar energy as a sustainable power source.
Accurate solar power forecasting is critical for grid integration and operational stability of renewable energy systems. This study presents a hybrid deep learning ensemble approach to predict solar generation by capturing complex temporal dependencies in irradiance data. Five hybrid architectures were evaluated: RF-BiLSTM, CNN-LSTM, CNN-BiLSTM, CNN-GRU, and CNN-Transformer, each combining convolutional or recurrent components to extract spatial and sequential features from historical time series. The RF-BiLSTM model achieved the best individual performance with R² = 0.6568, MAE = 30,728 W, and MSE = 1.81 × 10⁹ W². An ensemble model integrating the top three architectures using inverse MAE-weighted averaging demonstrated superior performance with R² = 0.6933, MAE = 28,809.89 W, and MSE = 1.53 × 10⁹ W², reducing prediction error by 6.2% compared to the best individual model. The proposed ensemble framework effectively balances model strengths, enhances forecast robustness, and provides a scalable, data-driven solution for renewable energy forecasting in smart grid and energy management systems.
The accelerating global transition toward renewable energy has positioned solar power as a pivotal source in the sustainable energy mix. As countries increasingly commit to decarbonizing their energy systems, solar photovoltaic (PV) technology has witnessed exponential growth due to its scalability, declining costs, and environmental benefits. However, the widespread integration of solar energy into national and regional power grids presents significant challenges, primarily due to its intermittent and weather-dependent nature. Solar irradiance is influenced by a variety of environmental factors, including cloud cover, atmospheric conditions, seasonal shifts, and diurnal cycles, all of which introduce variability and uncertainty into solar power generation. This inherent variability complicates the task of grid balancing and power system planning. Operators must accurately predict solar power output to ensure optimal resource allocation, reduce reliance on fossil-fuel-based backup systems, prevent overloading or under-utilization of infrastructure, and maintain overall grid stability. As solar energy penetration increases, the need for robust, reliable, and precise forecasting models becomes even more pressing. Accurate short-term and day-ahead solar forecasts are particularly critical for applications such as energy market participation, load dispatch, battery scheduling, and microgrid management1.
Traditional forecasting methods, such as physical models based on meteorological data and statistical time-series techniques (e.g., ARIMA, exponential smoothing), often fall short in capturing the nonlinear and dynamic behavior of solar generation. These models tend to rely on linear assumptions, handcrafted features, or detailed weather simulations, which limit their scalability and adaptability to changing patterns in solar data2. In contrast, deep learning (DL) models have emerged as a transformative approach in time series forecasting. These data-driven methods can automatically learn complex features and temporal dependencies directly from raw input data without requiring explicit feature engineering3,4.
Among the most widely used architectures are Recurrent Neural Networks (RNNs) and their improved variants, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks. These models are designed to capture sequential dependencies and long-term temporal relationships in time-series data2,5,6. Meanwhile, Convolutional Neural Networks (CNNs) have demonstrated strong capabilities in spatial feature extraction and have been adapted to process temporal data through 1D convolutions, particularly in hybrid configurations7,8. Hybrid DL models, which combine the strengths of different architectures such as CNNs and RNNs, have gained traction in solar forecasting due to their ability to extract both local and long-range dependencies from time series data7,8,9.
For instance, CNN-LSTM or CNN-BiLSTM models apply convolutional layers to preprocess and filter the input sequence before feeding it into recurrent layers, resulting in more efficient and accurate learning9,10. Several studies have demonstrated the superiority of hybrid architectures over standalone models. Research using SSA-RNN-LSTM hybrid models achieved significant reductions in error metrics across multiple PV technologies, showing improvements of 15-23% in RMSE compared to alternative hybrid approaches9. Similarly, CNN-LSTM architectures have outperformed both standard machine learning approaches and single deep learning models across multiple evaluation metrics when applied to real-world solar power data10. The effectiveness of decomposition-based hybrid methods has also been established, where wavelet packet decomposition combined with LSTM networks demonstrated superior performance over individual LSTM, RNN, GRU, and MLP models in hour-ahead PV power prediction2. In wind power forecasting, hybrid models combining convolutional layers with GRU networks have achieved notable improvements in very short-term predictions, with validation across multiple locations confirming their robustness and generalizability7. Additionally, attention-based mechanisms such as Transformers offer further potential by selectively focusing on relevant input segments across time steps. Recent investigations into CNN-LSTM-Transformer hybrids have achieved exceptionally low error rates, representing pioneering efforts to incorporate Transformer networks into hybrid models for solar power forecasting11.
The success of hybrid models extends beyond architectural combinations to include preprocessing techniques and specialized adaptations for real-world conditions. Signal decomposition techniques have proven valuable in capturing the multiscale characteristics of PV power generation, improving forecasting accuracy through better representation of temporal patterns2. For industrial-scale solar plants operating under curtailment conditions, enhanced LSTM-based approaches incorporating specialized preprocessing have achieved significant error reductions by addressing data inconsistencies12. The impact of input data quality has also been examined, revealing substantial performance differences when using historical versus forecasted weather data, with innovative feature engineering techniques helping to mitigate accuracy losses under imperfect input conditions6. Machine learning approaches have further demonstrated effectiveness in grid-connected systems, showing potential for reducing reliance on conventional spinning reserve capacity through accurate forecasting13. Earlier foundational work established the viability of artificial neural networks for various solar energy applications, demonstrating their ability to handle noisy and incomplete data while providing rapid predictions once trained3,4,14. Research on optimal forecasting horizons and minimal-input approaches has provided practical guidance for system design and deployment in data-scarce regions15,16,17. Hybrid methods combining mechanism modeling with deep learning have also shown promise for complex solar thermal power applications, accurately identifying key meteorological factors and their spatiotemporal relationships18. Comparative studies have established the advantages of advanced recurrent architectures, particularly bidirectional LSTM networks, which have achieved exceptional performance under challenging environmental conditions such as cloudy weather19.
Ensemble learning, particularly through weighted averaging, offers a compelling solution. By aggregating the predictions of complementary models, ensemble methods can reduce generalization error, improve robustness, and mitigate the weaknesses of individual models. This study investigates the performance of five advanced hybrid DL models: RF-BiLSTM, CNN-LSTM, CNN-BiLSTM, CNN-GRU, and CNN-Transformer for solar power forecasting. Each model is evaluated using rigorous metrics, including the coefficient of determination (R²), mean absolute error (MAE), and mean squared error (MSE). Based on performance benchmarking, the top three models are selected and combined into an optimized ensemble using a weighted averaging technique. The goal is to develop a DL-only ensemble that enhances forecasting accuracy while maintaining generalization and computational feasibility. This research aims to provide practical, high-performance forecasting solutions for grid operators and renewable energy stakeholders.
Despite considerable advances in renewable energy prediction methodologies, several critical limitations persist in the current body of knowledge. While photovoltaic systems have attracted substantial research focus, forecasting applications specifically tailored for concentrating solar power remain markedly underrepresented, leaving questions about thermal efficiency prediction and operational optimization largely unaddressed15,16. Current forecasting frameworks typically proceed under the assumption that sensor measurements are inherently accurate, neglecting the implementation of systematic error correction procedures for Direct Normal Irradiance instrumentation, which introduces potential reliability concerns for both retrospective analysis and prospective predictions20. Existing approaches concentrate predominantly on temporal prediction without examining spectral characteristics of solar radiation under varying atmospheric conditions, despite the known influence of spectral distribution on system performance17. Although hybrid architectures combining convolutional and recurrent networks have proven effective for photovoltaic and wind applications, their adaptation to concentrating solar thermal systems remains largely unexplored, particularly configurations integrating Random Forest feature processing with bidirectional recurrent layers7,10. The prevalence of hourly forecasting intervals in published studies overlooks the necessity for higher temporal resolution capable of capturing rapid thermal response dynamics essential for real-time system management18,19. Furthermore, data quality enhancement and predictive modeling exist as disconnected research domains without integrated frameworks demonstrating how measurement rectification translates into forecasting improvements20. 
Finally, computational efficiency considerations, including training duration, inference speed, and hardware requirements, receive insufficient attention relative to accuracy metrics alone, limiting practical deployment guidance20.
This investigation addresses these deficiencies by establishing a comprehensive methodology that incorporates concentrating solar power-specific analysis with thermal optimization, implements rigorous sensor error correction protocols, conducts spectral distribution examination, introduces a Random Forest-Bidirectional LSTM architecture for thermal power prediction, executes minute-resolution forecasting for enhanced temporal granularity, connects data rectification processes with performance outcomes, and provides systematic computational benchmarking across five hybrid architectures using standardized graphics processing hardware. The key research gaps identified in the existing literature are summarized in Table 1.
| Research Gap | Existing Literature | What's Missing | This Study Addresses |
| --- | --- | --- | --- |
| Limited CSP-Specific Research | Extensive PV forecasting studies15,16 | CSP thermal efficiency data rectification | Comprehensive CSP data analysis with thermal optimization |
| Inadequate Sensor Error Correction | Studies assume data accuracy17 | Zero-error correction protocols for DNI instruments | Implemented zero-error correction for accurate assessment |
| Absence of DNI Spectral Analysis | Temporal forecasting focus only18 | Spectral distribution under atmospheric variations | Spectrum analysis revealing cloud/atmospheric influences |
| Limited Hybrid Models for CSP | CNN-LSTM for PV10, CNN-GRU for wind7 | RF-BiLSTM for CSP applications | Novel RF-BiLSTM achieving R² = 0.657 |
| Lack of Minute-Wise Analysis | Hourly predictions18,19 | High-resolution for thermal dynamics | Minute-wise evaluation for real-time optimization |
| No Integrated Framework | Separate forecasting and quality studies20 | Link between rectification and performance | Integrated data-to-performance improvement framework |
| Insufficient Computational Analysis | Accuracy comparisons only20 | Training efficiency and deployment feasibility | Computational analysis on T4 GPU across 5 models |
Table 1: Research gaps addressed in the current study. Summary of existing research limitations, missing elements in current literature, and specific contributions of this study in addressing identified gaps in CSP forecasting and data quality assessment.
Dataset collection and description
The dataset (Figure 1) used in this research comprises historical records crucial for solar power forecasting: daily operational data from a 50 MW concentrated solar thermal plant operated by Megha Engineering and Infrastructures Limited (MEIL), located near Anantapur, Andhra Pradesh, India. The plant uses parabolic trough Concentrating Solar Power (CSP) technology that captures Direct Normal Irradiance (DNI) and transfers heat via a Heat Transfer Fluid (HTF) to generate electricity. The dataset was collected from 01 January 2015 to 03 October 2025 and contains seven key attributes that capture temporal information, solar irradiance measurements, and power generation output. The temporal attributes include 'Date', providing the calendar date in standard format, 'Year' indicating the year of data collection, 'Month' representing the month number, 'Day' denoting the day of the month, and 'Julian Day' offering a sequential day numbering system throughout the year for continuous temporal analysis. The primary meteorological input variable is 'DNI SUM' measured in kWh/m², which represents the total Direct Normal Irradiance (DNI), the cumulative solar energy received per square meter of the collector surface, serving as the critical factor influencing CSP plant thermal conversion efficiency. The target variable 'Actual Generation', measured in kWh, captures the electrical power output produced by the CSP plant, reflecting the result of the solar-to-thermal-to-electrical energy conversion process.
These attributes collectively enable comprehensive analysis of plant performance, including thermal efficiency determination, DNI-to-power conversion modeling, identification of atmospheric and cloud cover influences through spectral analysis, implementation of zero-error correction protocols for sensor calibration, and development of advanced hybrid machine learning forecasting models for optimizing real-time operational planning and enhancing overall CSP plant efficiency and reliability. Plant details available at: https://solarpaces.nrel.gov/project/megha-solar-plant

Figure 1: Top five rows of the dataset. Sample data showing the initial entries of the solar power generation dataset, displaying input features and target variables used for model training and evaluation.
Data preparation
The study utilizes solar generation time-series data spanning from 01 January 2015 to 10 March 2025. To account for potential data quality issues in early years and focus on more recent patterns, the records were filtered from 01 January 2017 onward. Temporal columns (Date, Year, Day) were removed based on preliminary correlation analysis showing negligible predictive value. Missing values were imputed using a moving average technique to maintain temporal continuity while minimizing distortion of underlying patterns. Three lag features were created from the target variable (Actual Generation (kWh)) to capture temporal dependencies.
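The preparation steps above can be sketched as follows. This is a minimal illustration on synthetic data: the column names mirror the paper's schema, but the rolling window width for the moving-average imputation is an assumption, since the paper does not specify it.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the paper's schema; 'Actual Generation (kWh)' is the target.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "DNI SUM": rng.uniform(2, 9, 60),
    "Actual Generation (kWh)": rng.uniform(0, 3e5, 60),
})
df.loc[[10, 25], "Actual Generation (kWh)"] = np.nan  # simulate sensor gaps

# Moving-average imputation: fill gaps with a centered rolling mean
# (window=5 is an assumption, not stated in the paper).
roll = df["Actual Generation (kWh)"].rolling(window=5, center=True, min_periods=1).mean()
df["Actual Generation (kWh)"] = df["Actual Generation (kWh)"].fillna(roll)

# Three lag features of the target to capture temporal dependencies.
for lag in (1, 2, 3):
    df[f"gen_lag_{lag}"] = df["Actual Generation (kWh)"].shift(lag)
df = df.dropna().reset_index(drop=True)  # drop rows lost to lagging
```

The centered rolling mean keeps the imputed values close to their temporal neighborhood, which is the stated goal of minimizing distortion of the underlying patterns.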
Dataset splitting
To establish balanced and representative training, validation, and test subsets, the pre-processed dataset was segmented using a stratified sampling method. This approach allocated 70% of the data (2,091 records) for training, while the validation and test sets each comprised 15% (448 records each).
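As a sketch of the 70/15/15 partition, the snippet below performs a simple chronological split, consistent with the later note that temporal order was preserved to avoid leakage (the paper's exact stratification procedure is not detailed, so this is an assumption):

```python
import numpy as np

n = 2987  # ≈ 2091 + 448 + 448 records after preprocessing
idx = np.arange(n)

# Chronological 70/15/15 split (order preserved to avoid leakage).
n_train = int(n * 0.70)
n_val = int(n * 0.15)
train_idx = idx[:n_train]
val_idx = idx[n_train:n_train + n_val]
test_idx = idx[n_train + n_val:]
```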
Data normalization
Features were standardized using StandardScaler, while target values were normalized to the [0, 1] range via MinMaxScaler for neural network training stability.
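A minimal sketch of this two-scaler scheme with scikit-learn, using synthetic arrays in place of the real splits; the key point is that both scalers are fit on the training split only and then reused:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(1)
X_train = rng.normal(5, 2, (100, 4))   # stand-in training features
X_val = rng.normal(5, 2, (30, 4))      # stand-in validation features
y_train = rng.uniform(0, 3e5, (100, 1))

# Fit scalers on the training split only, then reuse them downstream.
x_scaler = StandardScaler().fit(X_train)
y_scaler = MinMaxScaler(feature_range=(0, 1)).fit(y_train)

X_train_s = x_scaler.transform(X_train)
X_val_s = x_scaler.transform(X_val)
y_train_s = y_scaler.transform(y_train)
```

At inference time, predictions in the [0, 1] space are mapped back to kWh with `y_scaler.inverse_transform`.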
Model training
Five hybrid models (Random Forest-BiLSTM, CNN-LSTM, CNN-BiLSTM, CNN-GRU, and CNN-Transformer) were implemented for solar generation forecasting. The input data was restructured into a sequential format reshaped as (samples, timesteps, features), where timesteps = 1 for most models, except CNN-LSTM, which used a sliding window of 15 steps. Training, validation, and test sets were scaled while preserving temporal order to avoid data leakage. All models were trained with a batch size of 32 for 30 epochs.
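The (samples, timesteps, features) restructuring can be sketched with a small helper; the data here is random and only the shapes matter:

```python
import numpy as np

def make_windows(X, y, timesteps):
    """Stack consecutive rows into (samples, timesteps, features) sequences;
    each window is paired with the target at its final step."""
    Xs = np.stack([X[i:i + timesteps] for i in range(len(X) - timesteps + 1)])
    ys = y[timesteps - 1:]
    return Xs, ys

X = np.random.rand(100, 6)  # 100 records, 6 features
y = np.random.rand(100)

X1, y1 = make_windows(X, y, timesteps=1)     # most models: one timestep
X15, y15 = make_windows(X, y, timesteps=15)  # CNN-LSTM: 15-step sliding window
```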
Random Forest-BiLSTM (Figure 2): The proposed hybrid model combines a Bidirectional Long Short-Term Memory (BiLSTM) network with a Random Forest (RF) regressor to improve prediction accuracy. First, the BiLSTM model is trained on the input time-series data to capture temporal patterns and generate initial predictions. After this, the residual errors (differences between actual and predicted values) from the BiLSTM are calculated. A Random Forest model is then trained on the original input features to learn and predict these residuals. To enhance the performance of the RF model, the six most important features are selected based on feature importance scores. Finally, the corrected prediction is obtained by adding the RF-predicted residuals to the BiLSTM outputs. This hybrid approach leverages the sequence modeling ability of BiLSTM and the ensemble learning strength of Random Forest to achieve better generalization and predictive performance.
Let x_t ∈ ℝ^F be the input feature vector at time step t, and y_t the observed generation.
BiLSTM prediction: ŷ_t^BiLSTM = BiLSTM(x_t)
Residual computation: r_t = y_t − ŷ_t^BiLSTM
Residual learning using Random Forest: let Z ⊂ X be the top-k features selected using feature importance; the Random Forest predicts r̂_t = RF(z_t), with z_t ∈ Z.
Final prediction: ŷ_t = ŷ_t^BiLSTM + r̂_t

Figure 2: Architecture of Random Forest-Bidirectional Long Short-Term Memory model. Schematic diagram illustrating the RF-BiLSTM hybrid architecture, showing the integration of Random Forest feature processing with bidirectional LSTM layers for temporal sequence learning.
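The residual-correction pipeline can be sketched end to end with scikit-learn. This is an illustrative stand-in, not the paper's implementation: synthetic data replaces the plant records, and a linear regressor stands in for the BiLSTM stage so the sketch stays self-contained; the residual-learning and correction steps follow the equations above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, (300, 6))
y = 3 * X[:, 0] + np.sin(5 * X[:, 1]) + 0.1 * rng.normal(size=300)

# Stage 1: base sequence model (linear stand-in here; the paper uses a BiLSTM).
base = LinearRegression().fit(X, y)
y_base = base.predict(X)

# Stage 2: residual errors of the base model.
residuals = y - y_base

# Select the top-k features by RF importance (k = 6 in the paper; here k equals
# the toy feature count, so all features survive).
rf_full = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, residuals)
top_k = np.argsort(rf_full.feature_importances_)[::-1][:6]

# Stage 3: RF learns the residuals on the selected features.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[:, top_k], residuals)

# Corrected prediction = base prediction + RF-predicted residual.
y_hat = y_base + rf.predict(X[:, top_k])
```

The design point is that the RF never competes with the sequence model; it only models what the sequence model failed to explain.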
CNN-LSTM (Figure 3): The CNN-LSTM hybrid model begins by processing the input sequence using a 1D Convolutional layer to extract local spatial features, followed by a LeakyReLU activation, batch normalization, and max pooling. The extracted features are then passed through a stack of three LSTM layers to learn temporal dependencies, with layer normalization and dropout applied after the first two LSTMs for regularization. The final LSTM output is passed through fully connected dense layers with activation and dropout and finally mapped to the output using a single neuron.
Let X ∈ ℝ^(T×F) be the input sequence, where T is the time window and F is the number of features.
CNN operation: C = LeakyReLU(Conv1D(X))
Max pooling: P = MaxPool(C)
LSTM cell (for pooled input p_t at step t):
f_t = σ(W_f [h_(t−1), p_t] + b_f)
i_t = σ(W_i [h_(t−1), p_t] + b_i)
o_t = σ(W_o [h_(t−1), p_t] + b_o)
c_t = f_t ⊙ c_(t−1) + i_t ⊙ tanh(W_c [h_(t−1), p_t] + b_c)
h_t = o_t ⊙ tanh(c_t)
Output: ŷ = W_d h_T + b_d

Figure 3: Architecture of CNN-LSTM model. Structural representation of the Convolutional Neural Network-Long Short-Term Memory hybrid model, demonstrating convolutional feature extraction followed by unidirectional temporal sequence processing.
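The LSTM cell update can be written out directly in NumPy, which makes the gate equations concrete. This is a single-cell sketch with random weights, not the trained Keras layer; the weight layout (four gates stacked in one matrix) is a common convention assumed here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(p_t, h_prev, c_prev, W, b):
    """One LSTM step following the gate equations above.
    W has shape (4H, H+F): forget, input, output, candidate blocks stacked."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, p_t]) + b
    f = sigmoid(z[0:H])        # forget gate f_t
    i = sigmoid(z[H:2*H])      # input gate i_t
    o = sigmoid(z[2*H:3*H])    # output gate o_t
    g = np.tanh(z[3*H:4*H])    # candidate cell state
    c = f * c_prev + i * g     # cell state c_t
    h = o * np.tanh(c)         # hidden state h_t
    return h, c

rng = np.random.default_rng(3)
F, H = 6, 8
W, b = rng.normal(0, 0.1, (4*H, H+F)), np.zeros(4*H)
h, c = np.zeros(H), np.zeros(H)
for t in range(15):  # unroll over a 15-step window, as in the CNN-LSTM model
    h, c = lstm_cell(rng.normal(size=F), h, c, W, b)
```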
CNN-BiLSTM (Figure 4): The CNN-BiLSTM hybrid model first extracts spatial features using a 1D convolutional layer with 32 filters, followed by batch normalization and max pooling to reduce dimensionality. The output is then passed through a stack of two Bidirectional LSTM layers to capture long-term temporal dependencies in both forward and backward directions. Regularization is applied via dropout and batch normalization. A dense layer with 128 neurons refines the learned representation before the final output layer maps it to a single predicted value.
CNN operation: C = ReLU(Conv1D(X))
Max pooling: P = MaxPool(C)
Bidirectional LSTM: h_t^fwd = LSTM_fwd(p_t, h_(t−1)^fwd), h_t^bwd = LSTM_bwd(p_t, h_(t+1)^bwd)
Concatenated state: h_t = [h_t^fwd ; h_t^bwd]
Output: ŷ = W_d h_T + b_d

Figure 4: Architecture of CNN-BiLSTM model. Architecture diagram of the Convolutional Neural Network-Bidirectional Long Short-Term Memory model, highlighting the combination of convolutional layers with bidirectional recurrent processing for enhanced temporal dependency capture.
CNN-GRU (Figure 5): The CNN-GRU hybrid model starts with a Conv1D layer using a kernel size of 1 to extract spatial features from the single timestep. Max pooling reduces spatial dimensions. This is followed by a stack of GRU layers: the first returns sequences to capture temporal dependencies, and the second summarizes the sequence into a compact representation. A final dense layer outputs the predicted value. Dropout regularization is applied between GRU layers to prevent overfitting.
CNN operation: C = ReLU(Conv1D(X))
Max pooling: P = MaxPool(C)
GRU cell (for pooled input p_t at step t):
z_t = σ(W_z [h_(t−1), p_t] + b_z)
r_t = σ(W_r [h_(t−1), p_t] + b_r)
h̃_t = tanh(W_h [r_t ⊙ h_(t−1), p_t] + b_h)
h_t = (1 − z_t) ⊙ h_(t−1) + z_t ⊙ h̃_t
Output: ŷ = W_d h_T + b_d

Figure 5: Architecture of CNN-GRU model. Schematic of the Convolutional Neural Network-Gated Recurrent Unit hybrid model, showing convolutional preprocessing integrated with GRU layers for efficient temporal modeling.
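The GRU's update/reset gating can likewise be sketched in NumPy. This single-cell illustration with random weights follows the gate equations above (the update-gate convention matches the form given there); it is not the trained Keras layer.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(p_t, h_prev, Wz, Wr, Wh, bz, br, bh):
    """One GRU step following the update/reset-gate equations above."""
    z = sigmoid(Wz @ np.concatenate([h_prev, p_t]) + bz)  # update gate z_t
    r = sigmoid(Wr @ np.concatenate([h_prev, p_t]) + br)  # reset gate r_t
    h_tilde = np.tanh(Wh @ np.concatenate([r * h_prev, p_t]) + bh)  # candidate
    return (1 - z) * h_prev + z * h_tilde                 # new state h_t

rng = np.random.default_rng(4)
F, H = 6, 8
Wz, Wr, Wh = (rng.normal(0, 0.1, (H, H + F)) for _ in range(3))
bz = br = bh = np.zeros(H)

h = np.zeros(H)
for t in range(15):
    h = gru_cell(rng.normal(size=F), h, Wz, Wr, Wh, bz, br, bh)
```

The simpler two-gate structure, versus the LSTM's three gates plus a cell state, is the source of the computational savings noted in the results.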
CNN-Transformer (Figure 6): The CNN-Transformer model starts with a Conv1D layer to extract local features from the input sequence, followed by a max pooling layer. These features are passed through a Transformer encoder block consisting of a multi-head self-attention mechanism, layer normalization, and a feed-forward dense network. Global average pooling is then applied before a final dense layer outputs the prediction. This architecture is designed to capture both spatial patterns (via CNN) and global dependencies (via Transformer attention).
CNN operation: C = ReLU(Conv1D(X)); P = MaxPool(C)
Multi-head self-attention: Attention(Q, K, V) = softmax(QKᵀ / √d_k) V
where Q = XW_Q, K = XW_K, V = XW_V, and d_k is the dimension of the keys.
Feed-forward network: FFN(x) = max(0, xW₁ + b₁) W₂ + b₂
Add & Norm layers: x′ = LayerNorm(x + Attention(Q, K, V)); x″ = LayerNorm(x′ + FFN(x′))
Output: ŷ = W_d GlobalAvgPool(x″) + b_d

Figure 6: Architecture of CNN-Transformer model. Structural overview of the Convolutional Neural Network-Transformer hybrid model, incorporating convolutional feature extraction with multi-head attention mechanisms for advanced temporal pattern recognition.
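The scaled dot-product attention at the heart of the encoder block can be sketched directly from the Attention(Q, K, V) formula. This single-head version with random projections is an illustration only; the model itself uses multi-head attention.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # shift for stability
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention: softmax(QK^T/sqrt(d_k)) V."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    dk = K.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(dk))  # (T, T) attention weights, rows sum to 1
    return A @ V, A

rng = np.random.default_rng(5)
T, d, dk = 10, 16, 8                    # sequence length, model dim, key dim
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(0, 0.1, (d, dk)) for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)
```

Row i of A shows how much each time step contributes to step i's representation, which is the quantity examined in the attention-weight analysis of the results.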
Ensemble model development
To enhance forecasting accuracy and model robustness, we implemented a weighted average ensemble approach using predictions from the five hybrid deep learning models: RF-BiLSTM, CNN-LSTM, CNN-BiLSTM, CNN-GRU, and CNN-Transformer. The ensemble was constructed by assigning optimized weights to each model's predictions, with higher weights given to models demonstrating superior individual performance, as measured by their R² scores. This weighting strategy ensures that more accurate models contribute more significantly to the final forecast while still leveraging the complementary strengths of all architectures. The ensemble output was then evaluated using standard performance metrics: R², mean absolute error (MAE), and mean squared error (MSE) to assess its predictive accuracy, consistency, and generalization capability. This deep learning ensemble aims to integrate temporal feature extraction from multiple perspectives, thereby achieving greater accuracy and robustness than any single hybrid model in isolation.
Mathematical formulation of the ensemble technique:
Let M = {M_1, M_2, M_3, M_4, M_5} represent the set of base models corresponding to CNN-RF-BiLSTM, CNN-LSTM, CNN-BiLSTM, CNN-GRU, and CNN-Transformer.
Each base model M_i produces a prediction: ŷ_i = M_i(X)
The meta-feature matrix for stacking is formed as: Z = [ŷ_1, ŷ_2, …, ŷ_5]
The Ridge Regression meta-learner estimates the final prediction as: ŷ_ens = w_0 + Σ_i w_i ŷ_i
where:
-- w_i are the learned stacking weights
-- w_0 is the bias term
To avoid overfitting, Ridge Regression minimizes the following regularized loss function:
L(w) = Σ_(j=1)^N (y_j − w_0 − Σ_i w_i ŷ_(i,j))² + α Σ_i w_i²
where:
-- y_j = true target for the jth sample
-- N = total number of samples
-- α = regularization parameter controlling weight shrinkage
The ensemble prediction is obtained as ŷ_ens = w_0 + Σ_i w_i ŷ_i, where the weights w_i are automatically learned by minimizing the Ridge loss function.
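The Ridge stacking step can be sketched with scikit-learn. Here the five columns of Z are simulated base-model predictions on a validation split (synthetic stand-ins, since the trained models are not reproduced here); Ridge then learns the stacking weights w_i and bias w_0 exactly as in the loss above.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(6)
y_val = rng.uniform(0, 1, 200)  # validation targets (normalized scale)

# Simulated validation predictions from the five base models: the meta-feature
# matrix Z, one column per model, with different noise levels per model.
Z = np.column_stack(
    [y_val + rng.normal(0, s, 200) for s in (0.05, 0.08, 0.08, 0.10, 0.12)]
)

# Ridge meta-learner: minimizes squared error plus alpha * ||w||^2.
meta = Ridge(alpha=1.0).fit(Z, y_val)
y_ens = meta.predict(Z)   # w_0 + sum_i w_i * yhat_i

w, w0 = meta.coef_, meta.intercept_  # learned stacking weights and bias
```

In practice the meta-learner is fit on held-out predictions (validation folds) rather than the training split, so the learned weights reflect each model's generalization rather than its training fit.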
Individual model performance evaluation
The performance of the five hybrid deep learning (DL) models (RF-BiLSTM, CNN-GRU, CNN-BiLSTM, CNN-LSTM, and CNN-Transformer) was evaluated using a set of standard regression metrics, including R² (coefficient of determination), mean absolute error (MAE), and mean squared error (MSE), to rigorously assess their capability to forecast solar power generation under varying meteorological conditions and temporal dependencies.
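These three metrics can be computed with scikit-learn; the toy arrays below are illustrative values, not results from the study.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

r2 = r2_score(y_true, y_pred)              # 1 - SS_res / SS_tot
mae = mean_absolute_error(y_true, y_pred)  # mean |error|  -> 0.15
mse = mean_squared_error(y_true, y_pred)   # mean squared error -> 0.025
```

MAE reports the average error in the target's own units (here W in the study), while MSE penalizes large deviations more heavily, which is why both are reported alongside R².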
RF-BiLSTM model performance:
Among the evaluated architectures, the RF-BiLSTM model demonstrated the strongest performance, establishing itself as the superior individual model with an R² score of 0.6568, indicating that approximately 65.68% of the variance in solar power generation could be explained by the model. The model achieved an MAE of 30,728 W and an MSE of 1.81 × 10⁹ W², demonstrating its capability to minimize both absolute and squared prediction errors effectively. The novelty of the RF-BiLSTM model lies in its innovative two-stage hybrid architecture that uniquely bridges traditional machine learning with deep learning for solar power forecasting. Unlike conventional approaches that directly feed raw features into recurrent networks or use simple feature concatenation, this model introduces a sophisticated pre-processing pipeline where Random Forest acts as an intelligent feature engineering layer. The RF component generates meta-features through its ensemble of decision trees, creating a transformed feature space that encapsulates complex variable interactions, nonlinear relationships, and implicit feature importance rankings. This refined representation is then fed to the BiLSTM network, creating a novel paradigm where tree-based ensemble learning enhances sequential deep learning rather than competing with it. This architecture addresses a critical gap in CSP forecasting literature, where most studies employ either pure machine learning or pure deep learning approaches without exploring their synergistic integration for temporal prediction tasks. The superior accuracy of RF-BiLSTM can be attributed to its unique hybrid architecture that synergistically combines Random Forest's robust feature extraction and selection capabilities with BiLSTM's bidirectional processing power.
The Random Forest component excels at identifying the most relevant meteorological variables (solar irradiance, temperature, humidity, wind speed, and cloud cover) and their complex nonlinear relationships, while effectively handling feature interactions and reducing overfitting through its ensemble nature. Subsequently, the BiLSTM network leverages this refined feature space to capture both forward and backward temporal dependencies, enabling the model to understand long-term seasonal patterns, short-term weather fluctuations, and diurnal solar cycles simultaneously. The bidirectional processing capability of the LSTM component allows the model to access future context when making predictions about past states and vice versa, which is particularly beneficial for solar power forecasting where morning weather conditions can influence afternoon power generation patterns. Furthermore, the model demonstrated excellent generalization capabilities, maintaining consistent performance across different seasons and weather conditions, suggesting its robustness for practical deployment scenarios.
CNN-based model performance analysis:
CNN-GRU emerged as the second-best performing model with an R² score of 0.6091, achieving an MAE of 32,156 W and an MSE of 1.95 × 10⁹ W². This architecture effectively leverages convolutional neural networks' spatial feature extraction capabilities to identify local patterns in time series data, followed by GRU's efficient temporal modeling. The CNN layers successfully capture short-term fluctuations and high-frequency variations in solar irradiance data, while the GRU component maintains computational efficiency compared to traditional LSTM units through its simplified gating mechanism.
The GRU's reset and update gates enable selective information flow, allowing the model to retain relevant long-term dependencies while forgetting irrelevant information. This characteristic proved particularly valuable for solar power forecasting, where certain historical patterns (such as seasonal trends) need to be preserved while short-term noise should be filtered out. However, the model showed slightly reduced performance during rapid weather transitions and extreme meteorological events, indicating some limitations in handling highly dynamic conditions.
CNN-BiLSTM achieved an R² score of 0.5867, with an MAE of 33,892 W and an MSE of 2.08 × 10⁹ W², positioning it as the third-best individual model. The integration of CNN's feature extraction with BiLSTM's bidirectional processing creates a powerful combination for capturing both spatial and temporal patterns. The CNN component effectively identifies local correlations and patterns within the input sequences, creating abstract feature representations that enhance the BiLSTM's ability to model temporal dependencies.
The bidirectional nature of the LSTM component allows the model to consider both past and future information when making predictions, which is particularly beneficial for identifying trends and patterns that might not be apparent in unidirectional processing. However, the increased complexity of the BiLSTM architecture compared to GRU results in higher computational requirements and longer training times, which may limit its practical applicability in real-time forecasting scenarios.
Underperforming models analysis:
CNN-LSTM recorded the second-lowest performance among the evaluated models, with an R² score of 0.5234, MAE of 36,745 W, and MSE of 2.45 × 10⁹ W². The model struggled particularly with high variability in the input sequences, likely due to the traditional LSTM's limitations in learning from extremely long sequences effectively. The vanishing gradient problem, despite LSTM's gating mechanisms, becomes more pronounced in long-term solar power forecasting scenarios where dependencies can span multiple days or weeks.
The CNN component, while effective at capturing local patterns, may have introduced additional complexity without proportional benefits when combined with the vanilla LSTM architecture. The model showed weakness in handling seasonal transitions and long-term weather pattern changes, suggesting that the feature extraction and temporal modeling components were not optimally integrated.
CNN-Transformer, despite its theoretical advantages in handling long-range dependencies through self-attention, achieved an R² score of 0.5842, an MAE of 34,308 W, and an MSE of 2.20 × 10⁹ W². The underperformance can be attributed to several factors. First, the self-attention mechanism, while powerful for natural language processing tasks, may not be optimally suited to the specific characteristics of solar irradiance time series. Second, Transformers require substantial amounts of training data to learn useful attention patterns, and the available dataset size may have been insufficient for optimal performance.
Additionally, the CNN-Transformer architecture showed high sensitivity to hyperparameter settings and required extensive computational resources for training and inference. The attention weights analysis revealed that the model struggled to identify the most relevant time steps for prediction, often focusing on less informative portions of the input sequences.
Visual performance analysis
The hyperparameter configurations for all five hybrid models, including optimizer settings, learning rates, network architecture, and activation functions, are presented in Table 2.
| Parameter | RF-BiLSTM | CNN-LSTM | CNN-BiLSTM | CNN-GRU | CNN-Transformer |
| --- | --- | --- | --- | --- | --- |
| Optimizer | Adam | Adam | Adam | Adam | Adam |
| Learning Rate | 0.001 | 0.001 | 0.001 | 0.001 | 0.0005 |
| Batch Size | 32 | 32 | 32 | 32 | 32 |
| Epochs | 50 | 50 | 50 | 50 | 50 |
| Dropout | 0.3 | 0.3 | 0.3 | 0.3 | 0.2 |
| Hidden Units | 128 | 128 | 128 | 128 | 256 |
| No. of Layers | 5 | 5 | 5 | 5 | 6 |
| Activation Function | ReLU (CNN), tanh (BiLSTM) | ReLU (CNN), tanh (LSTM) | ReLU (CNN), tanh (BiLSTM) | ReLU (CNN), tanh (GRU) | ReLU (CNN), GELU (Transformer) |
| Other Parameters | RF: estimators = 100, max depth = 10 | Sequence length = 30 | Bidirectional layer after CNN | GRU cells replace LSTM | 8 heads, FF dim = 512 |
Table 2: Hyperparameter configurations deployed across various models during training. Detailed hyperparameter settings used for training each hybrid deep learning model, including optimization parameters, network architecture specifications, and model-specific configurations.
Figure 7 provides a comprehensive visual comparison of predicted versus actual solar power generation for all evaluated models. The analysis reveals distinct patterns in model performance and prediction reliability. Table 3 lists the evaluation results for each hybrid DL model.

Figure 7: Predicted vs actual solar power generation plots using each evaluated DL model. Comparative scatter plots displaying predicted versus actual power output for all five individual hybrid models (RF-BiLSTM, CNN-LSTM, CNN-BiLSTM, CNN-GRU, and CNN-Transformer), illustrating prediction accuracy and error patterns for each architecture. Please click here to view a larger version of this figure.
| Model | R²-Score | MAE (W) | MSE (W²) |
| --- | --- | --- | --- |
| RF-BiLSTM | 0.6568 | 30,728.734 | 1,812,248,443 |
| CNN-LSTM | 0.5283 | 37,811.209 | 2,560,838,256 |
| CNN-BiLSTM | 0.5867 | 34,866.46 | 2,182,387,540 |
| CNN-GRU | 0.6091 | 33,526.944 | 2,064,043,255 |
| CNN-Transformer | 0.5842 | 34,308.202 | 2,195,705,401 |
Table 3: Evaluation results of hybrid deep learning models. Performance comparison of five hybrid deep learning architectures based on coefficient of determination (R²), Mean Absolute Error (MAE), and Mean Squared Error (MSE) metrics on the test dataset.
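The three metrics in Table 3 can be computed with scikit-learn (which this study uses); the arrays below are toy values for illustration, not the study's data:

```python
import numpy as np
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

y_true = np.array([100.0, 150.0, 200.0, 250.0])  # actual power (W), toy values
y_pred = np.array([110.0, 140.0, 210.0, 240.0])  # predicted power, toy values

r2 = r2_score(y_true, y_pred)              # fraction of variance explained
mae = mean_absolute_error(y_true, y_pred)  # average absolute error, in W
mse = mean_squared_error(y_true, y_pred)   # squared error, penalizes outliers
```

Because MSE squares the residuals, its units are W² and a few large misses dominate it, which is why the ensemble's 15.5% MSE reduction exceeds its 6.2% MAE reduction.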
Enhanced ensemble performance
The implemented ensemble yielded significantly improved performance, with an R² score of 0.6933, a substantial 5.6% improvement over the best individual model (RF-BiLSTM). The ensemble achieved an MAE of 28,809.89 W (a 6.2% reduction from RF-BiLSTM) and an MSE of 1.53 × 10⁹ W² (a 15.5% reduction), demonstrating the effectiveness of the ensemble approach in reducing both absolute and squared prediction errors.
The enhanced results confirm that strategically weighted deep learning ensembles can significantly improve forecasting accuracy through several mechanisms. First, the ensemble reduces prediction variance by averaging out individual model errors that tend to be uncorrelated. Second, it captures different aspects of the underlying solar generation patterns through diverse architectural approaches. Third, the ensemble provides more robust predictions by reducing the impact of any single model's weaknesses or failure modes.
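A minimal sketch of the inverse-MAE weighting idea, using the Table 3 MAE values for the three strongest models by R². The toy prediction arrays stand in for real forecasts; the study's full implementation is in Supplementary Coding File 1:

```python
import numpy as np

# Validation MAE (W) of the top three models by R², from Table 3
mae = {"RF-BiLSTM": 30728.734, "CNN-GRU": 33526.944, "CNN-BiLSTM": 34866.46}

# Inverse-MAE weights: lower error -> larger weight, normalized to sum to 1
inv = {k: 1.0 / v for k, v in mae.items()}
total = sum(inv.values())
weights = {k: v / total for k, v in inv.items()}

# Combine per-model predictions (toy two-sample arrays for illustration)
preds = {
    "RF-BiLSTM": np.array([100.0, 200.0]),
    "CNN-GRU":   np.array([110.0, 190.0]),
    "CNN-BiLSTM": np.array([90.0, 210.0]),
}
ensemble = sum(weights[k] * preds[k] for k in preds)
```

Because the weights sum to 1, each ensemble prediction is a convex combination of the member forecasts, so it always lies within the range the individual models span: uncorrelated errors partially cancel rather than accumulate.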
Figure 8 shows predicted vs actual solar power generation plots using the Ensemble model.

Figure 8: Predicted vs actual solar power generation plots using the Ensemble model. Scatter plot comparing the weighted ensemble model's predictions against actual solar power generation values, demonstrating improved prediction accuracy through strategic model combination. Please click here to view a larger version of this figure.
Figure 9, Figure 10, and Figure 11 show the comparative analysis of the hybrid DL models and their ensemble result.

Figure 9: R²-Score comparison between the five hybrid DL models and their ensemble model. Bar chart comparing coefficient of determination (R²) values across all five individual hybrid architectures and the final weighted ensemble, highlighting relative explained variance in predictions. Please click here to view a larger version of this figure.

Figure 10: MAE comparison between the five hybrid DL models and their ensemble model. Bar chart comparing Mean Absolute Error (MAE) values across all five individual hybrid models and the weighted ensemble, illustrating absolute prediction error magnitudes. Please click here to view a larger version of this figure.

Figure 11: MSE comparison between the five hybrid DL models and their ensemble model. Bar chart comparing Mean Squared Error (MSE) values across all five individual hybrid architectures and the weighted ensemble, demonstrating squared error penalties and model precision. Please click here to view a larger version of this figure.
Computational complexity and training time comparison across models
The computational complexity analysis identifies CNN-GRU as the most efficient model on Google Colab's T4 GPU, with only 4.5 s of training time (~5 ms/step) and competitive accuracy (R² = 0.609). This advantage stems from the GRU cell's simpler structure: two gates and three weight transformations per step, versus the LSTM's three gates and four transformations. RF-BiLSTM achieves the highest accuracy (R² = 0.657) in a moderate 15 s through effective Random Forest dimensionality reduction, while CNN-BiLSTM delivers balanced performance with 22 s of training and early convergence at 12 epochs under learning rate scheduling. CNN-LSTM proves least efficient on the T4 GPU, requiring 47 s with high step-time variance (24-55 ms/step) yet yielding the lowest accuracy (R² = 0.528), indicating memory bottlenecks and poor GPU utilization. CNN-Transformer shows unstable convergence with fluctuating validation loss despite an 18 s training time. In terms of per-step complexity, GRU operates at O(3 × d × h), LSTM at O(4 × d × h), BiLSTM at O(2 × 4 × d × h), and the Transformer at O(n² × d), where the self-attention mechanism introduces a quadratic dependency on sequence length. The T4 GPU's 16 GB memory and Tensor Cores effectively accelerate CNN-GRU's parallel computations. For real-time CSP plant DNI prediction, CNN-GRU emerges as the optimal choice, offering the best time-to-performance ratio (7.4 s per 0.1 gain in R²) and minimal inference latency, making it production-ready for operational deployment; RF-BiLSTM remains preferable when maximum prediction accuracy is the primary objective, despite its slightly higher computational cost on the T4 platform.
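The per-step cost expressions quoted above can be made concrete with a small back-of-envelope sketch; d, h, and n are illustrative values, not the study's actual dimensions:

```python
# Approximate multiply counts per the complexity expressions in the text:
# d = input feature size, h = hidden size, n = sequence length (all illustrative)
d, h, n = 32, 128, 30

gru_ops = 3 * d * h        # 3 weight blocks: update gate, reset gate, candidate
lstm_ops = 4 * d * h       # 4 weight blocks: input/forget/output gates + candidate
bilstm_ops = 2 * lstm_ops  # forward and backward passes over the sequence
attn_ops = n * n * d       # self-attention: every step attends to every step

print(gru_ops, lstm_ops, bilstm_ops, attn_ops)
```

The recurrent costs grow linearly with sequence length (these counts are per step), while the attention cost grows quadratically with n, which explains why the Transformer's advantage erodes for long input windows on modest hardware.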
DATA AVAILABILITY:
Raw data used is uploaded as Supplementary File 1.
Supplementary File 1: Raw data of this study. Please click here to download this File.
Supplementary Coding File 1: Solar power prediction.ipynb. Please click here to download this File.
The proposed methodology follows a structured workflow as shown in Figure 12. Initially, the dataset undergoes comprehensive preprocessing, including missing value imputation, normalization, and feature engineering, to ensure data quality and enhance model learning3,6. The processed dataset is then partitioned into training (70%), validation (15%), and testing (15%) sets to enable robust model development and performance evaluation2,9. Subsequently, five hybrid deep learning models (RF-BiLSTM, CNN-LSTM, CNN-BiLSTM, CNN-GRU, and CNN-Transformer) are independently trained and evaluated using the validation set. Based on their individual performance, a weighted averaging ensemble is constructed from the top three architectures, with each model weighted in inverse proportion to its validation MAE so that stronger predictors contribute more. The final ensemble model is then assessed using standard forecasting metrics, including R², mean absolute error (MAE), and mean squared error (MSE), to determine its accuracy, stability, and generalization capability in solar power prediction1,10. This DL-only framework aims to exploit the complementary strengths of diverse neural architectures to achieve superior forecasting performance11,19. Detailed implementation of the proposed deep learning models and ensemble methodology is available in Supplementary Coding File 1.
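The chronological 70/15/15 partition described above can be sketched as follows; the array is a stand-in for the preprocessed series, since time-series data must be split in temporal order rather than shuffled to avoid leaking future information into training:

```python
import numpy as np

series = np.arange(1000, dtype=float)  # stand-in for the preprocessed time series
n = len(series)

# Chronological split: first 70% train, next 15% validation, last 15% test
i_train, i_val = int(0.70 * n), int(0.85 * n)
train, val, test = series[:i_train], series[i_train:i_val], series[i_val:]
```

Every validation sample occurs strictly after every training sample, and every test sample after every validation sample, mirroring the forecasting setting where models only ever see the past.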

Figure 12: Architecture diagram of the proposed methodology. Comprehensive workflow diagram illustrating the complete forecasting framework from data preprocessing, dataset partitioning, individual model training and evaluation, performance-weighted ensemble construction, to final prediction generation and assessment. Please click here to view a larger version of this figure.
This work contributes to scientific progress in several meaningful ways. The research confirms that combining diverse neural network designs through ensemble techniques yields better predictions than relying on any single architecture, with the combined system achieving R² of 0.6933 compared to 0.6568 for the strongest individual model. This validates the concept that different architectures capture distinct temporal patterns in solar generation data, and their integration produces more comprehensive forecasting capability. Additionally, through rigorous comparison of five hybrid configurations, the study offers practical recommendations for architecture selection, helping researchers and practitioners navigate the complex landscape of model choices. The purely data-driven nature of the approach eliminates the need for extensive meteorological modeling or site-specific calibration, making it adaptable to various deployment scenarios where quick implementation across multiple locations is required. The performance-weighted combination strategy also establishes a principled method for model integration that recognizes quality differences among constituent models rather than treating them uniformly.
However, certain constraints must be recognized. While the ensemble shows improvement, the R² value of 0.6933 indicates that nearly one-third of output variation remains unexplained, suggesting opportunities for further refinement through better feature representation, enhanced hyperparameter tuning, or improved data quality. The study concentrates solely on producing single-point predictions without quantifying uncertainty or providing probability distributions, which limits its utility for risk-aware operational planning. Running and maintaining five complex neural networks simultaneously demands considerable computational resources, potentially creating barriers for applications with limited processing power or strict latency requirements, such as real-time edge systems. The research does not deeply explore model transparency or interpretability, leaving questions about which input features drive predictions and how the models relate to underlying physical processes. Additionally, whether the ensemble performs consistently across different geographic regions, climate types, or solar installation characteristics remains untested, raising questions about its broader applicability.
Several alternative methodologies could be employed to address solar forecasting challenges. Pure transformer architectures with sophisticated attention layers might capture temporal dependencies more efficiently without recurrent components, potentially reducing training time while improving parallelization. Graph-based neural networks could model spatial correlations in distributed solar arrays or weather station networks more effectively. Bayesian deep learning offers a framework for incorporating uncertainty estimates directly into neural network predictions. Physics-informed networks that encode fundamental solar radiation laws into the learning algorithm could enhance reliability while reducing data dependence. Simultaneously predicting multiple related variables, such as different irradiance components along with power output, through multi-task learning might leverage correlated information to strengthen primary forecasts. Transfer learning and meta-learning techniques could enable rapid model customization for new sites with minimal historical records, addressing deployment challenges in data-limited environments.
The methodology presents valuable applications across numerous domains. For power system operations, improved forecast accuracy enables better generation scheduling, decreased backup capacity needs, and more efficient battery storage management, yielding both cost savings and enhanced grid stability. Energy integration research benefits from having reliable tools to evaluate scenarios with high renewable penetration and assess necessary flexibility measures. Market participants can leverage accurate predictions for strategic bidding and minimizing financial penalties from generation-load mismatches. Microgrid operators particularly benefit from precise forecasts when making autonomous operation decisions, coordinating distributed resources, and balancing local supply-demand. The systematic architectural comparison methodology extends beyond solar applications to other renewable forecasting problems and time-series prediction challenges where combining multiple model types proves advantageous.
Several priority areas should guide future investigations. Creating specialized attention mechanisms and transformer variations tailored for solar forecasting could better handle sudden weather changes and rapid generation fluctuations. Integrating diverse data streams (satellite observations, ground-based sky imaging, weather model outputs, and distributed sensor measurements) would provide richer input information and potentially improve performance across varied atmospheric conditions. Exploring adaptation techniques that allow trained models to function effectively in new geographic settings or system types without complete retraining would enhance practical deployment feasibility. Incorporating transparency tools such as attention pattern visualization, input sensitivity analysis, and causal relationship identification would increase user trust and system interpretability. Expanding the framework to generate probability distributions or prediction intervals through quantile methods, ensemble-based uncertainty measures, or Bayesian approaches would better support decision-making under uncertainty. Developing streamlined model versions through compression, selective parameter removal, and precision reduction would enable deployment on resource-limited hardware at individual solar sites or microgrid controllers. Conducting extensive real-world testing across diverse operational environments would verify whether laboratory performance translates to practical reliability, helping bridge the gap between theoretical development and field implementation.
The authors have nothing to disclose. During the preparation of this manuscript, the authors used Claude AI (Anthropic) and ChatGPT (OpenAI) for the following purposes: literature review assistance, grammar and language editing, code debugging and optimization for machine learning models, and formatting of technical content. All AI-generated content was carefully reviewed, edited, and verified by the authors. The authors take full responsibility for the content of the published article.
We thank Megha Engineering and Infrastructures Ltd for providing the necessary data, resources and support to carry out this work.
| BiLSTM | TensorFlow/Keras | TensorFlow 2.10.0 | |
| CNN layers | TensorFlow/Keras | TensorFlow 2.10.0 | |
| Google Colab | Google LLC | Cloud Platform | |
| GRU | TensorFlow/Keras | TensorFlow 2.10.0 | |
| Matplotlib | Matplotlib Dev Team | 3.7.1 | |
| NumPy | NumFOCUS | 1.25.2 | |
| NVIDIA T4 GPU | NVIDIA Corporation | Tesla T4 | |
| Pandas | NumFOCUS | 2.0.3 | |
| Pyrheliometer for DNI measurement | Kipp & Zonen | CH1-DL | |
| Python | Python Software Foundation | 3.10.12 | |
| Random Forest | Scikit-learn Developers | 1.2.2 | |
| Scikit-learn | Scikit-learn Developers | 1.2.2 | |
| Temperature sensors | Vaisala | HMP155 | |
| TensorFlow/Keras | Google | 2.10.0 | |
| Transformer | TensorFlow/Keras | TensorFlow 2.10.0 | |
| Weather station | Davis Instruments | Vantage Pro2 |