Investigating how attackers behave after a successful exploitation, a phase called post-exploitation, is important for discovering threats to a network system. Although various efficient tools support post-exploitation, completing this process still depends on experienced human experts, known as penetration testers or pen-testers. This study proposes the Raiju framework, a Reinforcement Learning (RL)-driven automation approach that automatically carries out the steps of the post-exploitation phase for security-level evaluation. We implement two well-known RL algorithms, Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO), to evaluate specialized agents capable of taking intelligent actions. With the support of Metasploit, the modules corresponding to the actions selected by the agent automatically launch real attacks of privilege escalation (PE), gathering hashdump (GH), and lateral movement (LM) on multiple platforms. By leveraging RL, we aim to produce agents that autonomously select suitable actions to exploit vulnerabilities within target systems. This approach automates specific components of the penetration testing (PT) workflow, thereby improving its efficiency and its adaptability to evolving threats and vulnerabilities. The experiments are performed in four real environments with agents trained over thousands of episodes. The agents can automatically launch exploits in all four environments and achieve a success ratio of over 84\% across the three attack types. Furthermore, our experiments show that the A2C algorithm is particularly effective for post-exploitation automation.