Research Article
Erratum Notice
Important: There has been an erratum issued for this article.
This study proposes an Attention-Gated Liquid Neural Network for water quality anomaly detection that achieves superior accuracy and interpretability through continuous-time modeling and attention-based gating, thereby enhancing the reliability of environmental monitoring and supporting sustainable water management.
With the rapid expansion of water monitoring networks, time-series anomaly detection has become increasingly crucial for safeguarding aquatic environments and ensuring their sustainable management. However, conventional models often struggle with irregular sampling intervals, multivariate correlations, and interpretability in practical applications. To address these challenges, this study proposes an Attention-Gated Liquid Neural Network (AG-LNN) that integrates the dynamic modeling capability of the Liquid Neural Network (LNN) with attention-based gating mechanisms. The model introduces an input-attention gate to emphasize anomaly-relevant variables such as Dissolved Oxygen (DO) and the Permanganate Index (CODMn), and a time-constant gate that adaptively adjusts the model's temporal memory. Using data from the China National Environmental Monitoring Center (CNEMC) collected between 2019 and 2024 across 13 provinces, AG-LNN demonstrated superior performance over baseline models, including Long Short-Term Memory (LSTM), Temporal Convolutional Network (TCN), Transformer, and Graph Neural Network (GNN) architectures. It achieved a Precision-Recall Area Under Curve (PR-AUC) of 0.95 and an F1-score of 0.90, while maintaining stability under cross-region and temporal evaluations. A compact version, AG-LNN-light, reduces parameters by 62% with minimal accuracy loss, enabling efficient edge deployment. The results confirmed that attention-gated continuous-time modeling provides a robust and interpretable approach for large-scale water quality anomaly detection.
Water quality monitoring is essential for protecting public health, sustaining aquatic ecosystems, and ensuring compliance with environmental regulations. Traditional water quality models, such as the Soil and Water Assessment Tool (SWAT), the mechanistic QUAL model, and statistical forecasting approaches, often struggle to capture the highly nonlinear, multivariate, and temporally irregular nature of water systems influenced simultaneously by chemical, biological, hydrological, and meteorological factors1,2,3,4,5,6. Therefore, improving anomaly detection in water-quality data remains a pressing challenge.
In practical terms, the proposed framework is designed for multi-parameter water quality sensing networks that measure variables such as potential of hydrogen (pH), dissolved oxygen (DO), total nitrogen (TN), total phosphorus (TP), permanganate index (CODMn), turbidity, and conductivity, with typical sampling intervals ranging from 15 to 60 min. The model tolerates irregular sampling and missing observations, making it suitable for real-world deployment in surface water and reservoir monitoring systems.
Application of machine learning in water quality anomaly detection
In recent years, machine learning (ML) and deep learning (DL) methods have gained traction for water quality applications. For example, Wang et al.7 introduced a Long Short-Term Memory Autoencoder with Attention (LSTMA-AE) combined with mechanistic constraints to improve accuracy and reduce false alarms in anomaly detection of water injection pump operations, achieving significantly better performance than the interpolation, random forest, or LSTM-AE methods.
Similarly, Zhao et al.8 proposed a Gated Recurrent Unit with Physics-Informed Neural Network (GRU-PINN) model that incorporates physical constraints into the loss function, significantly boosting interpretability and the F1-score in water quality anomaly detection tasks. ElShafeiy et al.9 developed a Multivariate Convolutional Network-LSTM (MCN-LSTM) architecture, combining Multivariate Convolutional Networks with LSTM for real-time anomaly detection in water quality sensor data, demonstrating enhanced detection capabilities in field environments. Another advancement is the Gated-Liquid Neural Network (Gated-LNN) model10, which fuses gating mechanisms into Liquid Neural Networks to accurately predict the Water Quality Index (WQI) and classify water quality, achieving R2 ≈ 0.9995 and classification accuracy of 99.74% on Indian datasets.
Time series anomaly detection
Beyond DL-specific applications, time-series anomaly detection has developed rapidly across many domains and has become a key technique for improving water quality monitoring and early warning. Traditional statistical methods often fail to capture the nonlinear and dynamic characteristics of water quality data, prompting the application of deep learning models. For instance, Zhang et al.11 combined Empirical Mode Decomposition (EMD) with LSTM to better handle non-stationary signals, significantly improving the prediction accuracy of Chemical Oxygen Demand (COD), Biochemical Oxygen Demand (BOD₅), Total Phosphorus (TP), Total Nitrogen (TN), and Ammonia Nitrogen (NH₃-N). Building on this, Wang et al.12 proposed an LSTM-based fluctuation analysis method using Approximate Entropy, which enhances real-time anomaly detection performance.
Recent advances have emphasized attention mechanisms and hybrid frameworks. Arepalli et al.13 introduced a lightweight spatially shared attention LSTM for hypoxia detection in aquaculture, achieving 99.8% accuracy, while Zhang et al.14 showed that spatial and temporal attention significantly improved CNN-LSTM prediction of Dissolved Oxygen (DO) and NH₃-N. Similarly, Long et al.15 integrated Complete Ensemble Empirical Mode Decomposition with Adaptive Noise-Variational Mode Decomposition (CEEMDAN-VMD) with attention-enhanced LSTM, achieving Nash-Sutcliffe Efficiency (NSE) values up to 0.99 across multiple indicators. At the application level, Xie et al.16 demonstrated a mobile LSTM-sequence-to-sequence (Seq2Seq) system for operational, real-time water quality prediction across river basins.
Overall, these studies highlight a clear trend in LSTM-based baselines, attention-augmented hybrid models, and system-level applications. Although accuracy has improved substantially, challenges remain in interpretability, robustness under noisy conditions, and generalization across diverse monitoring networks.
Application of graph neural networks and spatiotemporal models in environmental monitoring
Graph neural networks (GNNs) and spatiotemporal deep learning models have recently demonstrated significant potential in environmental monitoring by capturing both spatial correlations among monitoring sites and temporal dependencies in water quality dynamics. For example, Wu et al.17 proposed a pre-training enhanced Spatio-Temporal Graph Neural Network (PT-STGNN) for wastewater treatment plants, integrating transformer-based pre-training and graph structure learning to improve long-term prediction of COD, NH₃-N, TP, TN, pH, and flow rate. Similarly, Wan et al.18 developed a Spatio-Temporal Feature GNN (STF-GNN) that combined graph convolution, GRU, and attention mechanisms, significantly improving Dissolved Oxygen (DO) and TN predictions while demonstrating robust cross-basin generalization.
Hybrid frameworks have emerged beyond purely data-driven designs to enhance generalization. Mu et al.19 introduced the spatiotemporal graph physics-informed neural network (ST-GPINN), which embeds hydraulic principles into Graph Neural Network (GNN)-based models for water distribution systems. By coupling graph representations with physics-informed constraints, ST-GPINN achieves state-of-the-art accuracy while scaling effectively from small to large networks. Together, these studies highlight the growing role of spatio-temporal GNNs in environmental monitoring, advancing predictive accuracy, interpretability, and robustness, and laying the foundation for next-generation intelligent water management systems.
Advantages and potential of liquid neural networks
Liquid Neural Networks (LNNs), also known as Liquid Time-Constant networks, uniquely model continuous-time dynamics using input-dependent, learnable time constants. Hasani et al.20 introduced LNNs and demonstrated their expressive power, stability, and efficiency in time-series tasks. Their brain-inspired adaptability makes them promising candidates for modeling dynamic environmental phenomena such as water quality, yet their integration with attention mechanisms remains underexplored.
Attention mechanisms, especially attention gates, are powerful tools for focusing on relevant features and enabling interpretability. Although developed for medical imaging, Attention U-Net laid a foundation for selective feature refinement via learnable attention gates. Attention has also been applied in sensor-based anomaly detection frameworks, such as LSTMA-AE7 and GRU-PINN8, enhancing sensitivity to critical patterns and improving model transparency.
Taken together, these observations indicate the need for a hybrid architecture that synergizes continuous-time adaptability with interpretable attention mechanisms. Hence, we propose an Attention-Gated Liquid Neural Network (AG-LNN) that integrates the dynamic modeling capabilities of Liquid Neural Networks (LNNs) with attention gates that focus on salient inputs or temporal segments. This architecture aims to enhance robustness to sensor noise and irregular sampling, improve sensitivity to short-duration anomalies, and offer interpretability through attention heatmaps and adaptive time-constant trajectories. The AG-LNN was evaluated on real-world multivariate water quality datasets and benchmarked against LSTM, Transformer, pure LNN, and traditional ML baselines. Both quantitative (Precision, Recall, F1, AUC) and qualitative (attention visualizations, time-constant dynamics) analyses demonstrated the effectiveness and transparency of the proposed model.
NOTE: The overall workflow of this study is shown in Figure 1.
1. Data acquisition
2. Variable selection and labeling
Compute the seasonal quantile thresholds for each variable, as shown in Equations (1) and (2):
Q_low,i,s = 1st percentile of x_i within season s (1)
Q_high,i,s = 99th percentile of x_i within season s (2)
where Q_low,i,s and Q_high,i,s are the lower and upper quantile thresholds (1st and 99th percentiles) for variable i in season s. Flag a data point as anomalous if it exceeds these bounds, as shown in Equation (3):
y_i,t = 1 if x_i,t < Q_low,i,s or x_i,t > Q_high,i,s; otherwise y_i,t = 0 (3)
(4)
(5)
(6)
(7)
3. Data quality control and preprocessing
(8)
(9)
(10)
(11)
Initialize the filter state using the first available non-missing observation and set the initial covariance P0|0. Remove the entire segment if the missing interval exceeds 24 h (96 records) to prevent error accumulation from long-term extrapolation.
(12)
(13)
4. Attention-Gated Liquid Neural Network (AG-LNN)
NOTE: The overall architecture of the proposed Attention-Gated Liquid Neural Network (AG-LNN) is illustrated in Figure 2. The model integrates multisource water quality inputs, an attention-based gating mechanism, an LNN with adaptive time constants, and a readout-decoding stage with interpretability outputs. This design allows the network to dynamically capture complex temporal dependencies while remaining robust to noise and non-uniform sampling.
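The gated liquid update described in this NOTE can be sketched numerically. The following minimal NumPy cell is illustrative only: the weight names (W_a, W_x, W_h), the softmax attention, the exponential time-constant gate, and the explicit Euler ODE step are assumptions, not the authors' exact formulation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class AGLNNCell:
    """One attention-gated liquid step (illustrative sketch; parameter names
    W_a, W_x, W_h and the exponential gate are assumptions)."""
    def __init__(self, n_in, n_hidden, beta=0.3, lam0=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W_a = rng.normal(0.0, 0.1, n_in)             # per-channel attention scores
        self.W_x = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.W_h = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        self.beta, self.lam0 = beta, lam0

    def step(self, h, x, dt):
        alpha = softmax(self.W_a * x)                     # input-attention gate
        # time-constant gate: higher mean attention -> smaller lambda -> faster response
        lam = np.clip(self.lam0 * np.exp(-self.beta * alpha.mean()), 0.1, 5.0)
        target = np.tanh(self.W_x @ (alpha * x) + self.W_h @ h)
        # explicit Euler step of the liquid ODE dh/dt = (-h + target) / lambda
        return h + dt * (-h + target) / lam, alpha, lam

cell = AGLNNCell(n_in=7, n_hidden=8)                      # 7 water-quality channels
h1, alpha, lam = cell.step(np.zeros(8), 0.5 * np.ones(7), dt=0.25)
```

Returning alpha and lam alongside the state is what makes the attention heatmaps and time-constant trajectories used later for interpretability available at no extra cost.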
(14)
(15)
(16)
where ᾱτ denotes the mean attention weight across the C channels at timestep τ, and β is a sensitivity coefficient determining how strongly attention influences temporal responsiveness. When ᾱτ is high, λτ decreases, prompting the network to accelerate its response and better capture short-lived anomalies. Impose bound constraints on λτ to stabilize the training.
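The time-constant gate described above can be expressed as a small helper. The exponential dependence on the mean attention weight and the clipping bounds (lam_min, lam_max) are assumed forms standing in for the equation given in the protocol:

```python
import numpy as np

def adaptive_time_constant(alpha_bar, beta=0.3, lam0=1.0, lam_min=0.1, lam_max=5.0):
    """Higher mean attention -> smaller lambda -> faster temporal response.
    The exponential form and the [lam_min, lam_max] bounds are assumptions."""
    return float(np.clip(lam0 * np.exp(-beta * alpha_bar), lam_min, lam_max))
```

With β = 0.3, an attentive step (ᾱτ = 0.9) yields a smaller λτ than a quiet one (ᾱτ = 0.1), matching the behavior described above, while the bounds keep λτ from collapsing to zero or growing without limit.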
(17)
(18)
(20)
(21)
(22)
(23)
(24)
5. Training procedure
Overall performance comparison
Table 2 presents the comparative results of the different baseline models and the proposed AG-LNN for the water quality anomaly detection task. Among the traditional unsupervised baselines, Isolation Forest22 and One-Class SVM23 achieved only moderate precision and recall, reflecting their limited ability to capture the temporal dependencies inherent in multi-source water quality data. The Local Outlier Factor (LOF)24 performed slightly better by exploiting local density structures, but its performance remained suboptimal compared with deep learning models.
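For reference, these three unsupervised baselines can be reproduced with scikit-learn on synthetic stand-in data; the channel count and the injected mean-shift anomalies below are illustrative, not the CNEMC data:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, (500, 7))                  # 7 channels, normal regime
X_test = np.vstack([rng.normal(0.0, 1.0, (50, 7)),
                    rng.normal(6.0, 1.0, (5, 7))])        # last 5 rows: injected anomalies

preds = {
    "IsolationForest": IsolationForest(random_state=0).fit(X_train).predict(X_test),
    "OneClassSVM": OneClassSVM(nu=0.05).fit(X_train).predict(X_test),
    # LOF needs novelty=True to score data unseen at fit time
    "LOF": LocalOutlierFactor(novelty=True).fit(X_train).predict(X_test),
}
# each predictor returns +1 for inliers and -1 for anomalies
```

Because these models see each record independently, they cannot exploit the temporal dependencies that the sequence models discussed next are built around.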
For neural network-based methods, LSTM25 and Bi-GRU26 exhibited improved recall owing to their recurrent structures, although they struggled with long sequences and irregular sampling. TCN27 achieved competitive performance, particularly in terms of precision, by effectively modeling short-term fluctuations. The Transformer-based architecture demonstrated strong capability in long-range dependency modeling28, but its sensitivity to noise and missing values led to inconsistent recall and higher false-alarm rates.
LNN20 outperformed conventional RNNs and CNNs, benefiting from its continuous-time modeling ability, which is well suited to the non-uniform sampling common in water quality monitoring. However, the lack of selective input weighting limits its robustness, because irrelevant channels sometimes interfere with anomaly detection.
The proposed AG-LNN consistently outperformed all the baselines, achieving the highest precision (0.92), recall (0.89), F1-score (0.90), ROC-AUC (0.95), and PR-AUC (0.94). These results confirm that combining input-attention gating and adaptive time-constant control enables the network to emphasize anomaly-relevant variables (such as DO and CODMn) while dynamically adjusting its temporal responsiveness. Consequently, the AG-LNN demonstrated a superior capability in detecting both abrupt and subtle water quality anomalies.
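The five reported metrics can be computed with scikit-learn; the labels and scores below are toy values, and average_precision_score is used as the usual PR-AUC estimator:

```python
import numpy as np
from sklearn.metrics import (average_precision_score, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_true = np.array([0, 0, 0, 1, 1, 0, 1, 0, 0, 1])          # 1 = anomaly
scores = np.array([0.1, 0.2, 0.15, 0.9, 0.8, 0.3, 0.7, 0.05, 0.6, 0.85])
y_pred = (scores >= 0.5).astype(int)                        # threshold the anomaly score

precision = precision_score(y_true, y_pred)                 # TP / (TP + FP)
recall = recall_score(y_true, y_pred)                       # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)
roc_auc = roc_auc_score(y_true, scores)                     # threshold-free ranking quality
pr_auc = average_precision_score(y_true, scores)            # common PR-AUC estimator
```

Note that ROC-AUC and PR-AUC are computed from the raw scores rather than the thresholded predictions, which is why they can remain perfect even when a borderline false positive lowers the precision.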
Cross-region and sensor-network generalization with GNN baselines
This section evaluates the generalization capability of the proposed AG-LNN model in different hydrological and climatic contexts. Three complementary experiments were conducted without introducing new datasets: (i) cross-validation using the leave-one-basin-out (LOBO) strategy, (ii) temporal out-of-distribution (OOD) testing across different years and seasons, and (iii) robustness evaluation under sensor network perturbations. In addition to sequence-based baselines (LSTM, TCN, Transformer), three representative Graph Neural Network (GNN)-based models (ST-GNN, GDN, and ST-GPINN) were included for direct quantitative comparison.
Cross-region generalization (LOBO evaluation)
To assess spatial generalization, all monitoring stations were divided into five macro-regions (North, South, East, West, and Plateau) according to hydrological and climatic characteristics. In each round, one region was entirely excluded from training and validation and was used as the testing domain to simulate deployment in unseen basins.
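A leave-one-basin-out split of this kind can be sketched as follows; the station-to-region mapping is hypothetical:

```python
stations = {"S1": "North", "S2": "North", "S3": "South", "S4": "East",
            "S5": "West", "S6": "Plateau", "S7": "South", "S8": "East"}
regions = ["North", "South", "East", "West", "Plateau"]

def lobo_splits(stations, regions):
    """Leave-one-basin-out: each round holds one whole region out for testing."""
    for held_out in regions:
        test = sorted(s for s, r in stations.items() if r == held_out)
        train = sorted(s for s, r in stations.items() if r != held_out)
        yield held_out, train, test

splits = list(lobo_splits(stations, regions))
```

Holding out entire regions, rather than random stations, prevents spatial leakage between neighboring sites and therefore gives a more honest estimate of performance on unseen basins.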
As illustrated in Figure 3A, which plots the PR-AUC across the five held-out regions, the AG-LNN consistently achieved the highest score in every region, with an average PR-AUC ≈ 0.94 and F1 ≈ 0.91, surpassing the best GNN baseline (ST-GNN) by +0.02 to +0.04 in PR-AUC and +0.02 to +0.03 in F1. Figure 3B shows that the performance advantage is particularly evident in the Plateau region, where sparse sampling and unstable hydrology cause larger fluctuations in the other models.
Although GNN-based approaches (ST-GNN, GDN, and ST-GPINN) can model spatial correlations, they depend strongly on a complete and stable graph topology. In contrast, the AG-LNN employs continuous-time dynamics and attention-gated time constants to capture temporal dependencies without predefined edges, yielding higher transferability across heterogeneous hydrological environments.
Temporal out-of-distribution
To evaluate the temporal robustness, models trained on 2019-2022 data were directly applied to 2023-2024, with seasonal partitions (Spring, Summer, Autumn, Winter) reflecting inter-annual variability. Figure 4A shows the PR-AUC trends across the four seasons. The AG-LNN maintains a nearly constant performance (PR-AUC ≈ 0.93) throughout the year, with a maximum seasonal drop of ≤ 0.02. The GNN baselines show slightly lower but still stable scores (ST-GNN ≈ 0.92, GDN ≈ 0.91, ST-GPINN ≈ 0.91), whereas Transformer, TCN, and LSTM exhibit larger seasonal sensitivity.
The adaptive time-constant mechanism (λt) in the AG-LNN dynamically adjusts its effective temporal receptive field, allowing the model to represent both long-term hydrological cycles and short-term anomalies. In contrast, graph-based architectures, whose inter-station relations are assumed to be stationary, struggle when these relations vary seasonally. Consequently, the AG-LNN better mitigated seasonal distributional shifts, as demonstrated by the stability curves in Figure 4A.
Sensor-network perturbation robustness (Inference-phase shifts)
The third experiment explored the scalability of the AG-LNN under varying sensor network configurations during deployment. The following inference-phase perturbations were simulated without retraining: (i) coarser sampling (15 to 60 min), (ii) masking of two or three input variables, and (iii) Gaussian noise (+5% SD).
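The three inference-phase perturbations can be simulated directly on an input window; the array shapes (96 steps at 15-min sampling, 7 channels) follow the setup above, while the helper names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, (96, 7))           # one day at 15-min sampling, 7 channels

def coarsen(X, factor=4):
    """(i) 15 -> 60 min sampling: keep every 4th record."""
    return X[::factor]

def mask_channels(X, channels):
    """(ii) sensor dropout: mask selected variables with NaN."""
    Xm = X.copy()
    Xm[:, channels] = np.nan
    return Xm

def add_noise(X, rel_sd=0.05):
    """(iii) additive Gaussian noise with SD equal to 5% of each channel's SD."""
    return X + rng.normal(0.0, rel_sd * X.std(axis=0), X.shape)
```

Applying these transforms only at inference time, without retraining, isolates the model's robustness to deployment-condition shifts from its fitting capacity.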
As shown in Figure 4B, the AG-LNN experienced the smallest degradation among all the models. Its average PR-AUC drop remains within 0.01-0.015, whereas GNN baselines decline by 0.015-0.028 and conventional models degrade even more. The figure clearly demonstrates that the AG-LNN preserves stable accuracy even when the data sparsity and noise increase.
The attention-gating mechanism enables the AG-LNN to dynamically reweight available features, while its continuous-time formulation ensures numerical stability under irregular sampling and variable loss. These results confirm that the AG-LNN can flexibly adapt to heterogeneous sensor infrastructures and varying sampling resolutions, which is crucial for large-scale environmental monitoring networks.
Model efficiency and edge-deployment scalability
To evaluate the computational efficiency and deployment potential of the AG-LNN, additional experiments were conducted to analyze its complexity, scalability, and lightweight adaptation.
Computational complexity
The full AG-LNN model contains approximately 5.3 million parameters, corresponding to 1.8 GFLOPs per 96-step sequence inference. This complexity is smaller than that of most GNN-based baselines, such as ST-GNN (6.8M, 2.3 GFLOPs) and GDN (7.1M, 2.6 GFLOPs). On an NVIDIA RTX 3060 GPU, AG-LNN processes one 24 h sequence in 0.21 s, and on a standard CPU (Intel i7-12700), the same inference takes 0.68 s, which meets the real-time requirement for 15 min monitoring intervals.
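A simple harness of the following kind can reproduce such latency measurements; the dummy model and run counts are placeholders, and absolute timings will differ by hardware:

```python
import time
import numpy as np

def benchmark(fn, x, n_warmup=3, n_runs=10):
    """Median wall-clock latency of one inference call; absolute numbers
    depend on hardware, so this harness is illustrative only."""
    for _ in range(n_warmup):
        fn(x)                                   # warm up caches / lazy allocations
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        fn(x)
        times.append(time.perf_counter() - t0)
    return float(np.median(times))

# hypothetical stand-in for a model consuming one 96-step, 7-channel sequence
dummy_model = lambda x: np.tanh(x @ np.ones((7, 16)))
latency = benchmark(dummy_model, np.zeros((96, 7)))
```

Warm-up runs and the median (rather than the mean) make the measurement robust to one-off JIT, allocation, or scheduling spikes.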
Lightweight variant: AG-LNN-light
A compact version, AG-LNN-light, was developed to enable efficient edge deployment. It is derived from the original model through structured pruning and gate-parameter sharing. The following three strategies were adopted:
Hidden-dimension reduction: The hidden state size and feature embedding dimension were halved (from 256 to 128), directly reducing the matrix multiplications in both the attention and time-constant gates.
Gate-parameter sharing: The two gating submodules (input attention and time-constant modulation) were merged to share a single projection matrix, eliminating redundant parameters while preserving separate nonlinear activations.
Sparsity pruning: Low-magnitude weights (below a 0.02 threshold of normalized magnitude) were pruned and fine-tuned for five epochs to recover the accuracy.
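The sparsity-pruning step can be sketched as plain magnitude pruning against a normalized 0.02 threshold; the five fine-tuning epochs are omitted here:

```python
import numpy as np

def magnitude_prune(W, threshold=0.02):
    """Zero out weights whose magnitude, normalized by the largest weight,
    falls below the threshold; returns the pruned matrix and sparsity ratio.
    (The fine-tuning epochs mentioned above are not reproduced here.)"""
    mask = np.abs(W) / np.abs(W).max() >= threshold
    return W * mask, float(1.0 - mask.mean())

rng = np.random.default_rng(0)
W = rng.normal(0.0, 1.0, (128, 128))
W_pruned, sparsity = magnitude_prune(W)
```

This is unstructured pruning for illustration; structured variants remove whole rows or channels instead, which maps more directly onto FLOP savings on real hardware.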
These modifications reduced the total parameter count by 62% and the FLOPs by approximately 60%, with only a 5% loss in precision (PR-AUC = 0.90 vs 0.94). Table 3 summarizes the comparison of the computational cost and accuracy.
Scalability and deployment
Owing to its continuous-time formulation and reduced parameter coupling, the AG-LNN-light maintains numerical stability even after aggressive pruning. The lightweight variant was successfully deployed on Jetson Xavier NX (8 GB RAM) and Raspberry Pi 5, achieving inference times of 1.2-1.5 s per sequence without retraining. This confirms the feasibility of deploying the AG-LNN-family models on distributed edge devices for real-time water quality anomaly detection.
Ablation study
To investigate the contribution of the different components of the AG-LNN, ablation experiments were conducted by progressively removing the input attention gate and the time-constant gate; the results are reported in Table 4.
The LNN (without any gating mechanism) outperformed the RNN and CNN-based baselines, demonstrating the advantage of continuous-time modeling for nonuniform water quality time series. However, the recall was limited (0.77), suggesting reduced sensitivity to subtle anomalies.
Adding only the input attention gate significantly improved precision (from 0.85 to 0.88) by suppressing irrelevant channels and highlighting key variables (such as DO and CODMn), thus reducing false positives. However, the recall improvement was modest.
Adding only the time-constant gate resulted in a higher recall (0.83 vs. 0.77), as the network became more responsive to sudden changes by dynamically adapting the temporal scale. However, the precision remained similar to that of the LNN.
The full AG-LNN, combining both gates, achieved the best trade-off with an F1-score of 0.90, ROC-AUC of 0.95, and PR-AUC of 0.94. These results confirm that the two gating strategies are complementary: input attention enhances selectivity, whereas adaptive time constants improve responsiveness. Together, they enable robust and accurate anomaly detection for water quality monitoring.
Visualization of detection and interpretability
To further demonstrate the interpretability of the AG-LNN, its outputs were visualized in terms of anomaly scores, attention weights, and adaptive time constants (Figure 5).
Figure 5A illustrates a case study of anomaly detection at a lake-monitoring station. The AG-LNN successfully detected sharp CODMn spikes and gradual DO declines, closely aligned with the ground truth anomaly labels. Compared to the baselines (not shown here for clarity), the AG-LNN produced fewer false alarms and responded faster to transient events.
Figure 5B shows the attention heatmap for the seven input variables (pH, DO, TN, TP, CODMn, Turbidity, EC). During abnormal periods, higher weights were consistently assigned to DO and CODMn, which aligns with established knowledge of water quality monitoring. This confirms that the input attention mechanism effectively highlights anomaly-relevant variables, while suppressing irrelevant variables.
Figure 5C shows the dynamics of the learned time constant λτ. Under stable water quality conditions, the model maintained relatively large values of λτ, favoring long-term memory. In contrast, during anomaly intervals, λτ decreases significantly, accelerating the network's responsiveness. This adaptive modulation enables the AG-LNN to capture both gradual and abrupt changes in the water quality.
Together, these visualizations confirm that the AG-LNN not only achieves high detection accuracy but also provides interpretable insights into its decision-making process, thereby offering practical value for water quality management.
Beyond expert interpretation, the visualization results were also reformatted into a stakeholder-oriented dashboard. Attention heatmaps and adaptive λ trajectories were converted into intuitive graphics using traffic light colors (green = stable, orange = mild change, and red = critical anomaly). Textual explanations summarize the most influential variables and time intervals, enabling environmental managers to rapidly identify causes and prioritize responses without requiring deep technical knowledge. The overall workflow of the transformation of AG-LNN interpretability outputs into stakeholder-oriented visual information is summarized in Figure 6.
Beyond visual explanation, the interpretability outputs of the AG-LNN provide actionable insights for environmental managers. The attention heatmap in Figure 5B highlights the key variables that dominate anomaly detection. During abnormal intervals, increased attention weights on DO and CODMn correspond to oxygen depletion and organic-matter surges, two common indicators of water-quality deterioration. Meanwhile, the adaptive λτ trajectory in Figure 5C identifies precise time windows when the model becomes more sensitive to rapid fluctuations.
These interpretable outputs offer more than visual clarity; they provide guidance that can be acted upon in the field. When a sudden change is detected, operators can focus on the sensors or sampling sites most affected by variations in DO and CODMn rather than blindly scanning through all the data. The shifting pattern of λτ, rising and falling over time, signals when an anomaly begins and fades, making it easier to decide when to sample and when to respond. Taken together, these cues do not simply describe a problem; they point toward its source. By linking the highlighted variables and their timing, one can often infer whether an unexpected event stems from a local discharge, upstream inflow, or temporary system disturbance, allowing interventions to occur earlier and with greater confidence.
Impact of liquid dynamic parameters (β and λ)
To evaluate how the liquid dynamics mechanism contributes to the performance and stability of the AG-LNN, we conducted additional experiments focusing on two key parameters: the fundamental time constant (λ) and the adjustment coefficient (β). These parameters jointly regulate the temporal responsiveness of the network, analogous to the physical adjustment processes of aquatic systems, where λ controls the characteristic response time and β governs its adaptability to external disturbances. λ and β were varied in turn while keeping the other hyperparameters fixed. PR-AUC and F1 were used as evaluation metrics to assess accuracy and stability.
Effect of λ
As shown in Figure 7A, both the PR-AUC and F1 follow a unimodal pattern as λ increases. The performance peaks around λinit ≈ 1.0, indicating an optimal balance between temporal stability and responsiveness. A smaller λ (< 0.5) leads to overly reactive behavior, increasing false alarms, whereas a larger λ (> 1.5) causes delayed detection of sudden pollution spikes. This aligns with the physical interpretation of λ as the intrinsic adjustment timescale of the system.
Effect of β
Figure 7B shows that β determines how rapidly λ adapts to changing inputs. When β is too low (< 0.1), the model becomes sluggish to respond; when β is too high (> 0.5), λ fluctuates excessively, producing unstable predictions. The best performance occurs at βinit (≈0.3), suggesting that moderate adaptability ensures stable convergence, while retaining flexibility.
Joint influence
The contour in Figure 8 reveals a plateau of high performance (PR-AUC > 0.93) when λ∈[0.7, 1.3] and β∈[0.2, 0.4]. These ranges are consistent with physically plausible hydrological time scales (≈ 12 – 36 h) and moderate sensitivity to disturbances, reinforcing the interpretability of the learned liquid dynamics.
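A grid sweep over λ and β of the kind underlying such a contour can be organized as below; the evaluate() function is a hypothetical stand-in for retraining and scoring the model, shaped to be unimodal around the reported optimum rather than reproducing real results:

```python
import itertools
import numpy as np

def evaluate(lam, beta):
    """Hypothetical stand-in for retraining AG-LNN and measuring PR-AUC; the
    quadratic bowl mimics the reported unimodal trends, not real experiments."""
    return 0.94 - 0.05 * (lam - 1.0) ** 2 - 0.3 * (beta - 0.3) ** 2

grid = itertools.product(np.arange(0.25, 2.01, 0.25), np.arange(0.1, 0.51, 0.1))
results = {(round(l, 2), round(b, 2)): evaluate(l, b) for l, b in grid}
best = max(results, key=results.get)            # highest surrogate PR-AUC
```

In a real sweep each grid point would trigger a full training run, so coarse grids followed by local refinement around the emerging plateau keep the cost manageable.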
Effect of annotation ratio and weak label accuracy on model reliability
To quantitatively evaluate the effect of annotation composition on model reliability, controlled experiments were conducted by varying the proportion of experts and weak annotations during the training. These five settings are listed in Table 5.
Additionally, the expert-to-weak annotation ratio was fixed at 40:60, and different weak annotation accuracies (0.80, 0.85, 0.90, and 0.95) were used to evaluate the effect of weak label accuracy. Model performance was assessed using PR-AUC, F1-score, and Stability Index (standard deviation of prediction confidence across folds).
Effect of annotation ratio
As shown in Table 6 and Figure 9A, model performance exhibited a clear upward trend as the proportion of expert annotations increased from 20% to 40%. Both the PR-AUC and F1-score steadily improved within this range, indicating that a moderate inclusion of expert-labeled data substantially enhances the model's reliability and discriminative power. When the expert label ratio exceeded 40%, however, the improvement began to plateau and even slightly declined beyond 50%, suggesting diminishing returns once annotation precision surpassed a critical threshold. The 40:60 configuration emerged as the most balanced setting, achieving the highest PR-AUC (0.938) and F1-score (0.906) while maintaining the lowest Stability Index (0.036). This composition effectively reconciles data sufficiency with annotation quality, confirming that the 40:60 ratio maximizes label efficiency without compromising annotation reliability.
Effect of weak annotation accuracy
As presented in Table 7 and Figure 9B, the sensitivity of the model to weak-label accuracy revealed a clear transition pattern. When the weak-label accuracy drops below 0.85, both the PR-AUC and F1-score exhibit a noticeable decline, implying that label noise becomes dominant and undermines the reliability of supervision. As accuracy increases from 0.85 to 0.90, performance improves steadily, with PR-AUC rising from 0.922 to 0.938 and F1-score from 0.893 to 0.906, indicating that the model effectively benefits from cleaner annotations within this range. However, beyond the 0.90 threshold, the gain becomes marginal, with results nearly saturating at 0.95 accuracy, suggesting that further refinement of weak labels offers little return. This behavior aligns with the deviation curve in Table 7, where performance variation remains within ±0.1% when accuracy surpasses 0.90. The smooth upward trend in Figure 9B illustrates that the AG-LNN maintains consistent robustness under moderate noise levels, while still leveraging informative signals from weak supervision. In practical applications, maintaining weak-label accuracy around 0.90 appears sufficient to achieve reliable training without incurring unnecessary annotation costs or complexity.
These results demonstrate that the chosen ratio (Expert: Weak = 40:60) achieves the best balance between label diversity and reliability. Increasing weak labels beyond 60% introduces mild noise and instability, whereas higher expert proportions reduce data coverage and generalization.
Weighted learning ablation
To quantitatively evaluate the effectiveness of the proposed confidence-weighted loss formulation, an ablation experiment was performed by varying the relative weights of expert and weak annotations in the loss function. The objective was to determine the optimal configuration of α and β in Equation (2) that balances performance, robustness, and training stability.
Fix the expert-label weight (α) at 1.0 and vary the weak-label weight (β) within the range [0.5, 1.0] in increments of 0.1. Train each configuration under identical conditions using the same random initialization, learning rate, and data splits. Evaluate model performance using PR-AUC, F1-score, and a Stability Index, defined as the standard deviation of prediction confidence across five training folds. Additionally, conduct a sensitivity analysis by combining different weak-label accuracies (a_w = 0.85, 0.90, 0.95) with corresponding β values to visualize the interaction between label accuracy and loss weighting.
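The confidence-weighted objective of Equation (2) can be sketched as follows. This NumPy illustration assumes a binary cross-entropy base loss and a per-sample expert/weak indicator; both are assumptions where the text does not specify the exact form.

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Per-sample binary cross-entropy on predicted probabilities."""
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def confidence_weighted_loss(probs, targets, is_expert, alpha=1.0, beta=0.7):
    """Sketch of Equation (2): expert-labeled samples are weighted by alpha,
    weak-labeled samples by beta, then averaged."""
    per_sample = bce(probs, targets)
    weights = np.where(is_expert, alpha, beta)
    return float(np.mean(weights * per_sample))

# toy batch: one expert-labeled sample, two weak-labeled samples
probs = np.array([0.9, 0.2, 0.6])
targets = np.array([1.0, 0.0, 1.0])
is_expert = np.array([True, False, False])
loss = confidence_weighted_loss(probs, targets, is_expert)
```

Sweeping `beta` from 0.5 to 1.0 in steps of 0.1, as prescribed above, traces the curve summarized in Figure 9C.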
The experimental results are visualized in Figure 9C and Figure 10. As shown in Figure 9C, model precision and F1-score improve progressively as the weak-label weight β increases from 0.5 to 0.7, reaching a peak at β = 0.7 (PR-AUC = 0.939, F1 = 0.907). Beyond this point, both metrics show a marginal decline, indicating that excessive reliance on weak annotations introduces minor noise into gradient updates. Figure 10 further demonstrates that the Stability Index, which reflects training consistency, reaches its minimum at the same setting (β = 0.7), confirming that this configuration provides the most stable optimization trajectory.
To assess robustness under different levels of weak-label reliability, Figure 11 shows how the PR-AUC varies across combinations of weak-label accuracy and weight. The surface plot shows a broad plateau once the accuracy surpasses 0.90, and the performance remains stable for β values between 0.7 and 0.8. As the accuracy slips to 0.85, the ridge tilts downward and the optimal weighting shifts toward 0.6 to 0.7, suggesting that the model begins to rely more heavily on strong labels to offset noise. Interestingly, the selected configuration (α = 1.0, β = 0.7) is located near the center of this stable zone, delivering the best overall precision while maintaining resilience to moderate fluctuations in weak-label quality.
Robustness under noisy and missing data conditions
To further examine the reliability of the AG-LNN under realistic environmental variations, additional robustness tests were conducted covering multiple noise types and missing data patterns (Figure 12 and Figure 13).
In noise-type experiments, three perturbations were simulated: (i) Gaussian noise (σ = 0.05), (ii) impulse noise (5% spike probability), and (iii) sensor-drift noise (±3% linear bias per day). As shown in Figure 12, AG-LNN maintained PR-AUC ≥ 0.93 under Gaussian noise and ≥ 0.91 under drift noise, whereas ST-GNN and GDN declined below 0.87. This robustness arises because the attention-gated liquid-state formulation dynamically adjusts the effective time constant λ, suppressing transient perturbations and adapting to long-term biases.
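The three perturbations can be reproduced with simple generators. This sketch assumes 15-min sampling (96 steps/day, consistent with Table 1), a spike magnitude of three signal standard deviations, and a drift scaled by the signal range; these scale choices are assumptions where the text gives only the headline parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian(x, sigma=0.05):
    """(i) Zero-mean Gaussian measurement noise with sigma = 0.05."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def add_impulse(x, p=0.05, magnitude=3.0):
    """(ii) Impulse noise: random-sign spikes with probability p per sample;
    spike size of 3 * std(x) is an assumed choice."""
    spikes = rng.random(x.shape) < p
    signs = rng.choice([-1.0, 1.0], size=x.shape)
    return x + spikes * signs * magnitude * np.std(x)

def add_drift(x, bias_per_day=0.03, samples_per_day=96):
    """(iii) Sensor drift: linear bias of 3% of the signal range per day
    (positive direction shown; the paper states +/-3%)."""
    t = np.arange(len(x)) / samples_per_day
    return x + bias_per_day * np.ptp(x) * t

x = np.sin(np.linspace(0, 8 * np.pi, 960))  # ten simulated days, 96 samples/day
noisy = add_drift(add_impulse(add_gaussian(x)))
```

Applying each generator separately to held-out test windows yields the per-noise-type comparison shown in Figure 12.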
In missing-pattern experiments, the model was evaluated under three conditions: (i) random 10% data loss, (ii) block-wise missing intervals (1–3 h), and (iii) feature-correlated dropout (variable-specific). Figure 13 shows that AG-LNN sustained F1 = 0.88 and PR-AUC = 0.92, even with 15% block-wise missingness, outperforming ST-GNN and GDN by an average margin of +0.05.
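The three missingness patterns can likewise be generated from simple masks. This sketch assumes 15-min sampling (so a 3-h gap is 12 steps), three gaps per series, and a 30% dropout rate on one channel; those specific counts are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_missing(X, rate=0.10):
    """(i) Each entry dropped independently with probability `rate`."""
    M = X.copy()
    M[rng.random(X.shape) < rate] = np.nan
    return M

def blockwise_missing(X, block_len=12, n_blocks=3):
    """(ii) Contiguous gaps; block_len = 12 steps at 15-min sampling = 3 h."""
    M = X.copy()
    for _ in range(n_blocks):
        start = rng.integers(0, X.shape[0] - block_len)
        M[start:start + block_len, :] = np.nan
    return M

def feature_correlated_dropout(X, feature_idx=1, rate=0.3):
    """(iii) Variable-specific dropout concentrated on one sensor channel."""
    M = X.copy()
    M[rng.random(X.shape[0]) < rate, feature_idx] = np.nan
    return M

X = rng.normal(size=(960, 7))  # ten days x seven water quality variables
```

Each masked copy is then imputed and scored exactly as the clean data, giving the per-pattern F1 comparison of Figure 13.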
These results confirm that the AG-LNN exhibits high robustness against diverse data imperfections, demonstrating stable performance under both stochastic and systematic disturbances commonly observed in environmental sensor networks.

Figure 1: Overall study design of the proposed water quality anomaly detection framework. The workflow includes seven sequential stages: (1) acquisition of multi-source water quality sensor data, (2) missing and anomaly handling, (3) normalization, (4) windowed sample construction, (5) Attention-Gated Liquid Neural Network (AG-LNN) training with continuous-time modeling and learnable time constants as well as attention gates for channel/time weighting, (6) threshold/probability-based anomaly decision, and (7) result visualization and interpretation using attention heatmaps and liquid time constant curves. Please click here to view a larger version of this figure.

Figure 2: Architecture of the Attention-Gated Liquid Neural Network (AG-LNN). The model processes multi-source water quality data through an input attention gate, a time-constant gate, and a liquid neural layer. Predictions are generated by the readout module, and anomalies are identified via residual scoring. Attention heatmaps and time-constant trajectories provide interpretability. Please click here to view a larger version of this figure.

Figure 3: LOBO generalization performance across regions with GNN baselines. (A) The AG-LNN consistently achieves the highest PR-AUC across most regions, indicating strong spatial generalization and adaptability to unseen hydrological contexts. (B) Across five held-out regions, AG-LNN consistently demonstrates superior adaptability. Its F1 advantage is most evident in geographically diverse areas such as the plateau, suggesting that the model effectively captures spatial heterogeneity in water quality dynamics. Please click here to view a larger version of this figure.

Figure 4: Temporal and inference-time robustness of AG-LNN with GNN baselines. (A) Across spring, summer, autumn, and winter, AG-LNN sustains the most consistent precision, while other models exhibit stronger seasonal sensitivity. This stability indicates that its dynamic design effectively captures long-term temporal dependencies in changing aquatic environments. (B) The proposed AG-LNN maintains the highest precision across all inference time shift scenarios, showing strong robustness under sampling coarsening, variable masking, and added sensor noise. Please click here to view a larger version of this figure.

Figure 5: Visualization of anomaly detection and interpretability in AG-LNN. (A) Anomaly detection curve showing AG-LNN anomaly scores, threshold, and ground truth labels over time. (B) Attention heatmap across seven water quality variables (pH, DO, TN, TP, CODMn, Turbidity, EC), highlighting increased weights on DO and CODMn during anomaly periods. (C) Time-constant dynamics λ(t), illustrating larger values under stable conditions and reduced values during anomalies, enabling faster responsiveness. Please click here to view a larger version of this figure.

Figure 6: Integration of AG-LNN outputs into an intuitive dashboard for visualizing anomalies and variable importance. Through a compact layout, the system reveals which parameters drive each event and when the risk escalates, allowing experts to trace anomalies and plan timely interventions. Please click here to view a larger version of this figure.

Figure 7: Effect of liquid dynamic parameters on model performance. (A) Scores peak around λinit ≈ 1.0, reflecting a balance between responsiveness and stability. (B) Moderate β yields optimal accuracy; extreme values cause under-adaptation or unstable updates. Please click here to view a larger version of this figure.

Figure 8: Joint effect of λ and β on PR-AUC (contour). A stable high-accuracy region emerges for λ∈[0.7, 1.3] and β∈[0.2, 0.4]. Please click here to view a larger version of this figure.

Figure 9: Effects of expert–weak annotation configuration on model performance. (A) PR-AUC and F1 continue to rise as the expert ratio increases from 20% to 40%, peaking at 40:60 (PR-AUC = 0.938, F1 = 0.906), after which growth slows. The supplementary panel shows that the Stability Index (lower is better) is optimal at 40:60. (B) When weak annotation accuracy falls below 0.85, model performance declines significantly; at ≥0.90, PR-AUC and F1 approach saturation (0.938/0.906), and an accuracy of 0.95 yields only marginal improvement. (C) Both PR-AUC and F1-score peak at β = 0.7, confirming the optimal trade-off between weak-label contribution and noise control. Please click here to view a larger version of this figure.

Figure 10: Stability index versus weak-label weight. The lowest variance of prediction confidence is observed at β=0.7, indicating maximal training stability. Please click here to view a larger version of this figure.

Figure 11: Sensitivity of PR-AUC to weak-label accuracy and weight. The high-performance plateau (PR-AUC>0.935) consistently occurs within β∈[0.7, 0.8] when weak-label accuracy ≥0.90. Please click here to view a larger version of this figure.

Figure 12: Robustness under different noise types. The AG-LNN maintains the highest PR-AUC under Gaussian, impulse, and drift noise, demonstrating strong resistance to signal distortion and superior stability compared with graph-based baselines. Please click here to view a larger version of this figure.

Figure 13: Performance under different missing-data patterns. The AG-LNN maintains the highest F1 score across all missing-data types, demonstrating strong resilience to random loss, block-wise gaps, and feature-correlated missingness. Please click here to view a larger version of this figure.
| Field Name | Type | Unit / Format | Description |
| province | string | — | Name of the administrative province where the monitoring station is located. |
| region | string | — | Macro-region classification (North, South, East, West, or Plateau) assigned based on provincial geography. |
| basin | string | — | Primary river basin associated with the monitoring station (e.g., Yangtze, Pearl River, Huai). |
| station_id | string | — | Unique identifier for each station, composed of the province code + three-digit number (e.g., GD001). |
| section_name | string | — | Text label describing the monitored water section or site name. |
| monitor_time | datetime | YYYY-MM-DD HH:MM | Observation timestamp (local time, UTC+8), rounded to 15-minute intervals. |
| season | string | — | Season automatically derived from monitoring date (Spring, Summer, Autumn, Winter). |
| waterbody_type | string | — | Type of monitored waterbody: Lake or Reservoir. |
| water_quality_class | string | — | Overall surface-water quality class (I–V / Inferior V) following Chinese GB 3838-2002 standard categories. |
| water_temp_c | float | °C | Measured or simulated water temperature. |
| pH | float | — | Hydrogen-ion concentration index. |
| do_mgL | float | mg/L | Dissolved-oxygen concentration. |
| conductivity_uScm | float | μS/cm | Electrical conductivity of water sample. |
| turbidity_NTU | float | NTU | Turbidity value indicating suspended-solid content. |
| codmn_mgL | float | mg/L | Permanganate index (CODmn), measuring oxidizable organic matter. |
| ammonia_mgL | float | mg/L | Ammonia-nitrogen (NH₃-N) concentration. |
| tp_mgL | float | mg/L | Total phosphorus concentration. |
| tn_mgL | float | mg/L | Total nitrogen concentration. |
| chlorophyll_a_mgL | float | mg/L | Chlorophyll-a concentration (indicator of algal biomass). |
| algal_density_cellsL | integer | cells/L | Estimated algal cell density in the water column. |
| station_status | string | — | Operational state of the monitoring station (“Normal” / “Maintenance”). |
| lon | float | decimal degrees | Longitude placeholder (can be populated using CNEMC site coordinates). |
| lat | float | decimal degrees | Latitude placeholder (can be populated using CNEMC site coordinates). |
Table 1: Field Description of cnemc_data_2019_2024.
| Model | Precision | Recall | F1-score | ROC-AUC | PR-AUC |
| Isolation Forest22 | 0.68 | 0.55 | 0.61 | 0.70 | 0.64 |
| One-Class SVM23 | 0.65 | 0.58 | 0.61 | 0.72 | 0.66 |
| LOF24 | 0.70 | 0.62 | 0.66 | 0.74 | 0.69 |
| LSTM25 | 0.78 | 0.72 | 0.75 | 0.82 | 0.79 |
| Bi-GRU26 | 0.79 | 0.73 | 0.76 | 0.83 | 0.80 |
| TCN27 | 0.81 | 0.71 | 0.76 | 0.84 | 0.81 |
| Transformer28 | 0.82 | 0.74 | 0.78 | 0.86 | 0.83 |
| LNN20 | 0.85 | 0.77 | 0.81 | 0.89 | 0.86 |
| AG-LNN (proposed) | 0.92 | 0.89 | 0.90 | 0.95 | 0.95 |
Table 2: Quantitative comparison of anomaly detection models. Performance was evaluated across multiple metrics, including Precision, Recall, F1-score, ROC-AUC, and PR-AUC. Compared models include classical machine learning baselines (Isolation Forest, One-Class SVM, LOF), recurrent models (LSTM, Bi-GRU), sequence models (TCN, Transformer), the LNN, and the proposed AG-LNN.
| Model | Parameters (M) | FLOPs (G) | PR-AUC | Inference Time (s) | Relative Accuracy |
| AG-LNN | 5.3 | 1.8 | 0.94 | 0.21 (GPU) / 0.68 (CPU) | 100% |
| AG-LNN-light | 2.0 | 0.7 | 0.90 | 0.10 (GPU) / 0.34 (CPU) | 95% |
| ST-GNN | 6.8 | 2.3 | 0.92 | 0.27 (GPU) | 98% |
| GDN | 7.1 | 2.6 | 0.91 | 0.32 (GPU) | 97% |
Table 3: Comparison of Model Complexity, Efficiency, and Accuracy among AG-LNN, AG-LNN-light, and GNN-based Baselines.
| Variant | Precision | Recall | F1-score | ROC-AUC | PR-AUC |
| LNN | 0.85 | 0.77 | 0.81 | 0.89 | 0.86 |
| + Input Attention Gate only | 0.88 | 0.79 | 0.83 | 0.91 | 0.88 |
| + Time-constant Gate only | 0.86 | 0.83 | 0.84 | 0.92 | 0.89 |
| Full AG-LNN | 0.92 | 0.89 | 0.90 | 0.95 | 0.94 |
Table 4: Ablation study of AG-LNN variants. Performance of different model configurations was evaluated using precision, recall, F1-score, ROC-AUC, and PR-AUC. The full AG-LNN, which combines both the input attention gate and the time-constant gate, achieved the best overall performance, confirming the complementary effects of the two gating mechanisms.
| Experiment ID | Expert Labels (%) | Weak Labels (%) | Weak Label Accuracy (Simulated) |
| E1 | 20 | 80 | 0.9 |
| E2 | 30 | 70 | 0.9 |
| E3 | 40 | 60 | 0.9 |
| E4 | 50 | 50 | 0.9 |
| E5 | 60 | 40 | 0.9 |
Table 5: Experimental settings for evaluating the impact of expert–weak annotation composition and weak-label accuracy on model reliability.
| Weak Label Accuracy | PR-AUC | F1-score | Deviation (%) |
| 0.80 | 0.904 | 0.879 | 4.8 |
| 0.85 | 0.922 | 0.893 | 3.1 |
| 0.90 | 0.938 | 0.906 | — |
| 0.95 | 0.939 | 0.907 | 0.1 |
Table 6: Effect of expert–weak annotation ratio on model performance. This table reports the quantitative performance of AG-LNN under different proportions of expert and weak annotations used for training. Both PR-AUC and F1-score increase steadily as the expert label percentage rises from 20% to 40%, reaching a peak at the 40:60 configuration, where the stability index is also minimal. Beyond this ratio, performance gains saturate and slightly decline, indicating that the 40:60 composition provides the best trade-off between annotation reliability and data coverage.
| Weak Label Accuracy | PR-AUC | F1-score | Deviation (%) |
| 0.80 | 0.904 | 0.879 | 4.8 |
| 0.85 | 0.922 | 0.893 | 3.1 |
| 0.90 | 0.938 | 0.906 | — |
| 0.95 | 0.939 | 0.907 | 0.1 |
Table 7: Effect of weak annotation accuracy on model performance (at fixed 40:60 expert–weak ratio). This table presents the quantitative results of AG-LNN under different weak-annotation accuracy levels while keeping the expert–weak ratio fixed at 40:60. Both PR-AUC and F1-score increase notably as weak-label accuracy improves from 0.80 to 0.90, beyond which the gains become marginal. The deviation column denotes the relative performance change (%) with respect to the optimal configuration (accuracy = 0.90).
This study introduces an Attention-Gated Liquid Neural Network (AG-LNN), a novel architecture for water quality anomaly detection that combines continuous-time liquid dynamics with attention-based gating mechanisms. The liquid neural component of an AG-LNN is inspired by Liquid Time-constant Networks (LTCs), which model time series with learnable, input-dependent time constants20. However, AG-LNN extends beyond standard LTCs by integrating two additional gates: an input attention gate, which emphasizes anomaly-relevant variables such as DO and CODMn, and a time-constant modulation gate, which dynamically accelerates or decelerates the liquid dynamics according to the context. This dual gating ensures that the AG-LNN inherits the expressive continuous-time properties of LTCs while enhancing interpretability and anomaly sensitivity.
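The dual-gated update described above can be illustrated with a minimal single-step sketch. This NumPy reconstruction is not the authors' released implementation: the exact gating forms, layer sizes, and the upper bound λ_max = 2.0 are assumptions (λ_min = 0.5 follows the troubleshooting guidance below; λ_init ≈ 1.0 matches Figure 7A).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class AGLNNCell:
    """Illustrative single step of an attention-gated liquid cell (sketch).

    The input attention gate a_t reweights the D input variables; the
    time-constant gate maps context to lambda_t in [lam_min, lam_max],
    scaling how fast the hidden state relaxes toward its drive."""

    def __init__(self, d_in, d_hid, lam_min=0.5, lam_max=2.0, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        self.Wa = rng.normal(0, s, (d_in, d_in + d_hid))  # input attention gate
        self.Wl = rng.normal(0, s, (1, d_in + d_hid))     # time-constant gate
        self.W = rng.normal(0, s, (d_hid, d_in))          # input weights
        self.U = rng.normal(0, s, (d_hid, d_hid))         # recurrent weights
        self.lam_min, self.lam_max = lam_min, lam_max

    def step(self, x, h, dt=1.0):
        ctx = np.concatenate([x, h])
        a = sigmoid(self.Wa @ ctx)                         # channel weights
        lam = self.lam_min + (self.lam_max - self.lam_min) * sigmoid(self.Wl @ ctx)[0]
        drive = np.tanh(self.W @ (a * x) + self.U @ h)     # liquid drive
        h_new = h + (dt / lam) * (-h + drive)              # continuous-time update
        return h_new, a, lam

cell = AGLNNCell(d_in=7, d_hid=16)     # seven water quality variables
h = np.zeros(16)
x = np.ones(7)
h, attn, lam = cell.step(x, h, dt=0.25)  # irregular dt is passed in directly
```

Because the elapsed interval `dt` enters the update explicitly, irregular sampling needs no resampling step, which is the property the discussion contrasts with discrete-time RNNs; `attn` and `lam` are the quantities visualized in the attention heatmaps and time-constant curves.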
The proposed framework contributes to the advancement of the scientific field in several respects. First, the AG-LNN addresses the challenge of irregularly sampled multivariate sensor data, which is a common scenario in large-scale environmental monitoring networks. Unlike discrete-time RNNs, which require fixed intervals, AG-LNN’s continuous-time formulation naturally accommodates varying sampling intervals. Second, its explainable outputs (attention heatmaps and adaptive time-constant trajectories) make the model more transparent, bridging the gap between deep learning and environmental science, where decision making requires interpretability. By improving anomaly detection performance and interpretability, the AG-LNN provides a practical tool for early warning systems, supporting timely interventions in pollution control and sustainable water management.
Critical steps influencing success
The reproducibility and overall success of the AG-LNN depend on a few critical protocol stages that shape both data reliability and learning stability. Among them, season-aware weak labeling plays a decisive role: it governs the precision–recall balance by coupling quantile thresholding with residual-based detection to capture subtle seasonal fluctuations. Equally important is the Kalman-based long-gap imputation step, which restores temporal coherence in missing sequences through carefully tuned noise covariance parameters. The gradient-based anomaly correction further refines the signal; its threshold θ = 3σ_x·Δt⁻¹ balances anomaly sensitivity against unwanted fluctuations. Finally, attention warm-up and bounded λ adaptation stabilize early training and regulate how responsively the liquid dynamics evolve over time. Together, these interdependent steps form the backbone of AG-LNN's stability, enabling consistent optimization, reliable anomaly labeling, and robust generalization across diverse temporal and regional conditions.
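A minimal sketch of the gradient-based correction with θ = 3σ_x·Δt⁻¹: the forward-difference approximation of the gradient and the linear-interpolation repair are assumptions, since the protocol states only the threshold.

```python
import numpy as np

def gradient_anomaly_correction(x, dt=0.25, k=3.0):
    """Flag samples adjacent to finite-difference gradients exceeding
    theta = k * sigma_x / dt, then repair them by linear interpolation
    (repair strategy assumed; only the threshold is from the protocol)."""
    d = np.diff(x) / dt                  # forward-difference gradient
    theta = k * np.std(x) / dt           # theta = 3 * sigma_x * dt^-1
    jumps = np.abs(d) > theta
    bad = np.zeros(x.shape, dtype=bool)
    bad[:-1] |= jumps                    # point before each large jump
    bad[1:] |= jumps                     # point after each large jump
    xc = x.copy()
    idx = np.arange(len(x))
    if bad.any() and (~bad).any():
        xc[bad] = np.interp(idx[bad], idx[~bad], x[~bad])
    return xc, bad

t = np.arange(96) * 0.25                 # one day at 15-min sampling (hours)
x = np.sin(2 * np.pi * t / 24.0)         # smooth diurnal signal
x[40] += 8.0                             # injected impulse anomaly
corrected, flags = gradient_anomaly_correction(x, dt=0.25)
```

A smaller `k` tightens the threshold and increases sensitivity; the troubleshooting notes below describe the analogous trade-off for the quantile thresholds.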
Modifications and troubleshooting
Practical deployment and replication of AG-LNN may encounter a range of challenges, some rooted in data quality and others in model dynamics or hardware constraints. At the data level, when anomalies appear to be either over- or under-detected, recalibrate the quantile thresholds (for example, tightening them to 0.5%–99.5%) or adjust the residual window length to approximately 10–15 days to smooth seasonal noise. If weak labels begin to destabilize training, raise their confidence cutoff to 0.9 and lower their learning weight β from 0.7 to 0.6 to strike a better balance between precision and tolerance. At the model level, attention saturation or loss oscillation during early epochs often signals the need for a longer warm-up period (extending it from 10 to 15 epochs) or a smaller initial learning rate, such as 5 × 10⁻⁴. Slightly reducing λ_min from 0.5 to 0.4 can also make the system more responsive to sudden anomalies. On the hardware side, edge deployment may suffer from latency, which can be mitigated by adopting the lightweight AG-LNN-light variant or by shortening the sliding window from 96 to 64 steps without significant loss of accuracy. Collectively, these adjustments form a practical troubleshooting guide, helping researchers and engineers maintain model stability and efficiency in real-world environments.
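The quantile recalibration mentioned above can be sketched as a one-line labeling rule; applying it per season and per variable, as the season-aware weak labeling implies, is an assumption, and the synthetic dissolved-oxygen series is purely illustrative.

```python
import numpy as np

def quantile_weak_labels(x, lo=0.5, hi=99.5):
    """Weak anomaly labels from quantile thresholds: values outside the
    [lo, hi] percentile band are flagged. Tightening (lo, hi) toward
    (0.5, 99.5) reduces over-detection, as suggested above."""
    q_lo, q_hi = np.percentile(x, [lo, hi])
    return (x < q_lo) | (x > q_hi)

rng = np.random.default_rng(2)
do_series = rng.normal(8.0, 0.5, size=2000)  # synthetic DO readings (mg/L)
do_series[100] = 2.0                         # hypoxia-like excursion
labels = quantile_weak_labels(do_series)
```

Widening the band (e.g., `lo=1.0, hi=99.0`) flags more points, so the choice directly trades recall against the precision of the weak supervision.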
Limitations and future directions
Despite these strengths, this study has several limitations. AG-LNN is computationally more demanding than lightweight baselines such as LSTM25 and GRU26, potentially limiting deployment on resource-constrained devices. The evaluation relied primarily on the CNEMC datasets, which may restrict the generalizability of the results to other ecological regions. Moreover, supervised training requires reliable anomaly labels, which are expensive to obtain and partly subjective. Although AG-LNN enhances interpretability compared to black-box models, its visual outputs may still require additional interface design for non-expert stakeholders.
Alternative approaches can also be explored to test this hypothesis. For instance, GNNs such as the GDN and its variant GDN+ have shown promising results in river network anomaly detection by capturing spatial dependencies and offering graph-level interpretability29. Hybrid physics-informed GNNs (e.g., ST‑GPINN) combine hydraulic modeling with graph representations, thereby enhancing the generalization in water distribution system quality prediction30. Comparisons with these models may clarify the strengths of AG‑LNN in balancing dynamic responsiveness, interpretability, and temporal modeling.
The importance and applications of AG-LNN extend beyond water quality monitoring. Its attention-gated liquid design can be adapted to air quality forecasting, hydrological risk prediction, and ecological health monitoring, all of which involve noisy and irregularly sampled time series31. Beyond environmental science, similar methods can be applied to biomedical signal analysis and industrial fault detection, where real-time interpretability and robustness are essential. This cross-domain adaptability makes the AG-LNN a versatile methodology for anomaly detection in complex systems.
One promising direction is to enhance computational efficiency through closed-form continuous-time models (CfCs), which reduce dependence on iterative solvers while retaining interpretability32,33,34,35. Such an approach can simplify the training dynamics and make large-scale deployments more practical. In parallel, the use of neuromorphic hardware36,37,38 (e.g., Intel Loihi-239,40) offers an exciting possibility for achieving real-time, low-power inference in distributed sensor networks, where responsiveness and energy efficiency are often critical. Beyond these architectural advances, extending validation across diverse hydrological and climatic regions is essential to evaluate generalizability, ensuring that the model remains reliable under varied and unpredictable environmental conditions.
The authors have no conflicts of interest.
This research was supported by the 2024 Characteristic Innovation Project for Colleges and Universities in Guangdong Province, "Water Quality Monitoring and Early Warning System Based on Wireless Sensor Network" (Project Number: 2024KTSCX304); the 2024 school-level scientific research project of Guangzhou Nanyang Polytechnic College, "Water Tank Management System Based on Internet of Things" (Project Number: NY-2024KYZD-01); the 2022 Guangdong Province Key Area Special Project (New Generation Electronic Information), "Online Prediction, Early Warning and Linkage Prevention and Control System for Aquaculture Based on HarmonyOS" (Project Number: 2022ZDZX1081); and the 2021 Guangdong Province Vocational Colleges High-level Professional Group Construction Project, "Big Data Technology Professional Group" (Project Number: GSPZYQ2020089).
| 256 GB DDR4 RAM | Samsung | M393A4K40DB3 | High-capacity memory for handling large multivariate time-series datasets |
| CUDA 11.6 | NVIDIA | N/A | GPU acceleration toolkit for PyTorch |
| Intel Xeon Gold 6330 CPU (2.0GHz, 28 cores) | Intel | BX80708-6330 | Used as the main computation server for model training |
| Matplotlib 3.4, Seaborn 0.11 | Open Source | N/A | Visualization of experimental results |
| NumPy 1.21, Pandas 1.3 | Open Source | N/A | Data preprocessing and feature engineering |
| NVIDIA A100 GPU (40GB) | NVIDIA | 900-21001-0000-001 | Accelerated training of AG-LNN with CUDA support |
| Python 3.9 | Python Software Foundation | N/A | Main programming language for implementation |
| PyTorch 1.12 | Meta AI | N/A | Deep learning framework used for building AG-LNN |
| Scikit-learn 0.24 | Open Source | N/A | Evaluation metrics and baseline models |
| Ubuntu 20.04 LTS OS | Canonical | N/A | Operating system for the computational environment |