The UIT Information Security Laboratory

Software vulnerability detection tools have advanced rapidly in recent years, yet the prevalence and severity of vulnerabilities continue to rise, posing significant threats to computer security and information safety. Numerous detection methodologies have been proposed in response, with machine learning-based approaches showing notable promise. In this paper, we present a comprehensive review of state-of-the-art (SOTA) architectures that leverage deep learning (DL) together with natural language processing (NLP), or large language models (LLMs), for identifying vulnerabilities. We systematically examine the efficiency of these architectures and analyze their performance, with the aim of uncovering novel ways to maximize the potential of existing architectures for vulnerability detection. Concretely, we evaluate the models on advanced datasets of C/C++ vulnerabilities covering over 90 CWEs. Our study is organized around three pivotal research questions: how to integrate NLP and DL technologies effectively, what the strengths and limitations of LLMs are in this domain, and how LLMs compare with integrated NLP-DL approaches. Additionally, we discuss the challenges and experimental constraints encountered in this domain and offer insights into future research directions. This study aims to inspire further exploration of innovative methodologies and to contribute to the development of more robust cybersecurity solutions.

An empirical review of the effectiveness of different language processing approaches in Software Code Vulnerability Detection