XFedGraph-Hunter: An Interpretable Federated Learning Framework for Hunting Advanced Persistent Threat in Provenance Graph

9:01 03/07/2023

Advanced persistent threats (APTs) are increasingly sophisticated and pose a significant threat to organizations' cybersecurity. Detecting APT attacks in a timely manner is crucial to limiting the damage they cause. However, hunting for APT attacks requires access to large amounts of sensitive data, which is typically spread across different organizations. This makes it challenging to train effective APT detection models while preserving data privacy. To address this challenge, this paper proposes XFedGraph-Hunter, an interpretable federated learning framework for detecting APT attacks in provenance graphs. The framework uses federated learning to train APT hunting models collaboratively on decentralized data stored on multiple devices, so raw data never leaves its owner and only model updates are shared. This approach helps preserve data privacy and security while improving the model's performance. The machine learning (ML) model employed in the framework is GraphSAGE. A pre-trained transformer model is incorporated into feature preprocessing to enhance GraphSAGE's performance. Additionally, GNNExplainer is employed to explain the APT hunting model's predictions, thereby increasing transparency and interpretability. The proposed framework is evaluated on the DARPA TC E3 datasets, using FedAvg as the federated learning algorithm. The results indicate that the proposed framework can effectively detect APT attacks, achieving high accuracy and F1 scores. The explanations produced by GNNExplainer help in understanding which features contribute to the detection of APT attacks. The collaborative approach to APT hunting presented in this paper enables multiple parties to contribute their data while preserving privacy, providing an effective and scalable solution for APT detection.
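
To illustrate the aggregation step that FedAvg performs, the following is a minimal sketch in plain Python, not the paper's implementation. It assumes each client reports its model parameters as a dictionary of flat float lists together with its local sample count; the function name fed_avg, the parameter names, and the toy values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Assumption: a model is represented as {parameter_name: flat list of floats}.
Weights = Dict[str, List[float]]


def fed_avg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its local sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        # Weighted mean of each scalar parameter across all clients.
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Toy example: two hypothetical clients holding 100 and 300 local samples.
clients = [
    {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]},
    {"layer.weight": [3.0, 6.0], "layer.bias": [1.0]},
]
sizes = [100, 300]
global_weights = fed_avg(clients, sizes)
print(global_weights)
```

In a full training loop, the server would send global_weights back to the clients, each client would train locally on its own provenance graph data, and the aggregation would repeat for several rounds. Only parameters are exchanged in this sketch, which is the property the framework relies on for privacy.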

The proliferation of connectivity through modern telecommunications has led to increased unwanted and disruptive calls. Such communications negatively impact user experience and trust in platforms. Currently, call filtering relies on centralized architectures that aggregate vast troves of sensitive user data within single entities, compromising privacy and ownership. Users have limited...