
Improving KPI Time Series Anomaly Detection in Cloud Computing Environments through Graph Neural Network-Based Structural and Temporal Modeling

Abstract

This paper proposes a graph neural network-based time series anomaly detection method for KPI monitoring in cloud computing environments, where metrics are high-dimensional, change rapidly, and depend on one another in complex ways. The method abstracts multi-source metrics into a dynamic graph with a topological structure and captures both the semantic correlations and the temporal evolution of system components by jointly modeling structure and learning temporal features. At the structural level, the model uses graph neural networks to capture interaction dependencies between metrics and to learn structure-aware global contextual representations. At the temporal level, it models multi-scale time series patterns to better separate anomalies from normal behavior and to improve detection accuracy. To improve adaptability in complex scenarios, an uncertainty calibration and bias constraint mechanism is introduced to handle boundary samples and noisy data more reliably. Experiments on real-world cloud KPI datasets evaluate the model under varying load conditions, data distributions, and noise levels. The results show that the proposed method outperforms existing approaches on metrics such as MSE, MAE, RMSE, and MAPE, and that it maintains stable detection performance under challenging conditions such as workload fluctuations, resource contention, and distribution shifts. This research provides an efficient, structure-aware solution for anomaly detection in multidimensional, multi-tenant cloud environments and supports automated operations, performance assurance, and fault diagnosis in cloud systems.
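The abstract does not say how the metric graph is built or how the error metrics are computed. The sketch below is therefore only an illustration under stated assumptions, not the authors' method. It shows one simple way to turn a window of KPIs into a graph, plus the standard definitions of the four reported error metrics. The correlation-threshold rule, the `threshold=0.8` default, the function names (`pearson`, `build_metric_graph`, `error_metrics`), and the choice to skip zero-valued targets in MAPE are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def build_metric_graph(window: List[List[float]], threshold: float = 0.8) -> List[List[int]]:
    """Hypothetical graph construction for one time window.

    Each KPI series is a node. Nodes i and j are linked when the absolute
    Pearson correlation of their values in this window is >= threshold.
    Rebuilding the graph for every sliding window yields a simple dynamic graph.
    Returns neighbor lists indexed by node.
    """
    k = len(window)
    neighbors: List[List[int]] = [[] for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            if abs(pearson(window[i], window[j])) >= threshold:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def error_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """MSE, MAE, RMSE and MAPE (in percent) between observed and predicted values.

    Assumed convention: MAPE skips points whose observed value is zero,
    because the relative error is undefined there.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    ratios = [abs(e / t) for e, t in zip(errors, y_true) if t != 0]
    mape = 100.0 * sum(ratios) / len(ratios) if ratios else float("nan")
    return {"MSE": mse, "MAE": mae, "RMSE": math.sqrt(mse), "MAPE": mape}


if __name__ == "__main__":
    # Four toy KPIs over one window: cpu, mem, disk, net (illustrative values only).
    window = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [1, 3, 1, 3]]
    print(build_metric_graph(window))
    print(error_metrics([2.0, 4.0, 5.0, 10.0], [2.5, 3.5, 5.0, 8.0]))
```

In the toy window, cpu, mem, and disk are perfectly (anti-)correlated, so they form a triangle. The oscillating net series stays isolated, because its correlation with each of the others is about 0.45. A graph neural network would then propagate information along these edges. The error metrics score reconstruction or forecast residuals: here MSE is 1.125, MAE is 0.75, and MAPE is 14.375%.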
