Fuzzing is a popular and effective software testing technique that automatically generates or modifies inputs to test the stability of a software system and expose its vulnerabilities; it has been widely adopted and refined by security researchers and practitioners. The goal of fuzzing is to uncover potential weaknesses in software by feeding unexpected or invalid inputs to the target program and monitoring its behavior for errors or unintended outcomes. Recently, researchers have also integrated promising machine learning algorithms, such as reinforcement learning, into the fuzzing process. Reinforcement learning (RL) has been shown to improve the effectiveness of fuzzing by selecting and prioritizing transformation actions that yield higher coverage, reducing the effort required to uncover vulnerabilities. However, RL-based fuzzing models also face certain limitations, including an imbalance between exploitation and exploration. In this study, we propose a coverage-guided RL-based fuzzing model that enhances grey-box fuzzing: we leverage deep Q-learning, with code coverage as the reward signal, to predict and select input mutations that maximize coverage. The model is complemented by simple input selection and scheduling algorithms that promote a better balance between exploiting and exploring the software under test. Furthermore, we introduce a multi-level input mutation model combined with RL to build sequences of actions for comprehensive input variation. We compare the proposed model with other fuzzing tools on a variety of real-world programs; the results show notable improvements in code coverage, discovered paths, and execution speed.
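To make the overall idea concrete, the sketch below illustrates, under simplifying assumptions, how a coverage-guided Q-learning loop for mutation selection can be structured: coverage gain serves as the reward, and the agent learns which mutation operators tend to reach new branches. This is not the authors' implementation; the target program, its coverage instrumentation, the state abstraction, and the mutation operators are hypothetical placeholders, and a tabular Q function stands in for the deep Q-network used in the proposed model.

```python
"""Minimal sketch of coverage-guided Q-learning for mutation selection.
run_target(), state_of(), and the mutation operators are illustrative
stand-ins; a real grey-box fuzzer would obtain coverage from an
instrumented binary and approximate Q with a neural network."""
import random
from collections import defaultdict

ACTIONS = ["bitflip", "byte_random", "insert", "delete"]  # mutation operators

def mutate(data: bytes, action: str) -> bytes:
    """Apply one mutation operator to the input."""
    buf = bytearray(data) or bytearray(b"\x00")
    i = random.randrange(len(buf))
    if action == "bitflip":
        buf[i] ^= 1 << random.randrange(8)
    elif action == "byte_random":
        buf[i] = random.randrange(256)
    elif action == "insert":
        buf.insert(i, random.randrange(256))
    elif action == "delete" and len(buf) > 1:
        del buf[i]
    return bytes(buf)

def run_target(data: bytes) -> set:
    """Hypothetical instrumented execution: returns the set of 'branches'
    exercised by the input."""
    cov = {len(data) % 8}
    if data.startswith(b"FUZZ"):
        cov.add(100)
    if b"\xff\xff" in data:
        cov.add(200)
    return cov

def state_of(data: bytes) -> int:
    """Coarse state abstraction: bucket inputs by length."""
    return min(len(data) // 16, 7)

Q = defaultdict(float)            # Q[(state, action)] -> expected coverage gain
alpha, gamma, eps = 0.1, 0.9, 0.2
global_cov, corpus = set(), [b"hello"]

def choose(state: int) -> str:
    if random.random() < eps:                           # explore
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])    # exploit

for step in range(2000):
    seed = random.choice(corpus)                        # simple input selection
    s = state_of(seed)
    a = choose(s)
    child = mutate(seed, a)
    cov = run_target(child)
    new_edges = cov - global_cov
    reward = len(new_edges)                             # coverage gain as reward
    if new_edges:
        global_cov |= new_edges
        corpus.append(child)                            # keep coverage-increasing inputs
    s2 = state_of(child)
    best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])  # Q-learning update

print("branches covered:", len(global_cov), "corpus size:", len(corpus))
```

The epsilon-greedy policy in this sketch is one simple way to trade off exploitation of high-reward mutation operators against exploration of under-tried ones, which is the balance the input selection and scheduling algorithms in the proposed model are designed to improve.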