Master Thesis on Reinforcement Learning for Combat Aircraft

Project Overview

Top Summary: A Master’s thesis on developing and training a deep multi-agent reinforcement learning (MARL) agent to improve strategic planning and collaboration for Computer Generated Forces (CGF) in combat simulations. The project, conducted at Airbus Defence and Space, shifted the focus from direct flight control to high-level strategic decision-making in high-stakes environments.

Quick Stats

  • Role: Researcher / Developer
  • Duration: Jan 2024 - Jun 2024
  • Stack/Tools: Python, PyTorch, Gymnasium, Ray RLlib
  • Specific Algorithms: PPO, APPO, and SAC

1. The Challenge

Traditional rule-based systems for air combat are often predictable and lack the flexibility required for modern adversarial environments.

  • Problem: Existing rule-based Computer Generated Forces (CGFs) are laborious to develop and can be easily exploited by human pilots because they fail to adapt to novel tactics.
  • Goal: To investigate whether multi-agent reinforcement learning can enable high-level strategic planning and collaboration, targeting a win probability of over 90% against traditional rule-based systems in complex 1-vs-1 and 2-vs-1 scenarios.

2. My Solution & Technical Approach

I implemented a strategic behavioral controller that abstracted flight physics into high-level tactical decisions.

  • Architecture: Developed a system where the agent selects from discrete “behavioral states” (strategies). This high-level “brain” maps radar data and positioning to strategic actions executed over 4-second intervals; a minimal environment sketch follows this list.
  • Key Decisions: Designed a multi-component reward function that balanced individual survival, team mission goals, and behavior-shaping rewards so the agents learned robust, collaborative tactics (see the reward sketch below).
  • Code Highlights: Built a custom air combat simulator in Python optimized for reinforcement learning. This allowed rapid iteration: a basic agent could reach convergence in roughly 15 minutes of training on standard hardware.
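To make the abstraction concrete, here is a minimal Gymnasium sketch of the idea. The strategy names, observation layout, and toy inner dynamics are illustrative assumptions, not the thesis implementation; the structure is the point: one discrete action selects a behavioral state, which scripted low-level control then executes for a 4-second interval of fine-grained simulation.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

# Hypothetical strategy set; the thesis's actual behavioral states may differ.
BEHAVIORAL_STATES = ("pursue", "evade", "crank", "hold", "support_wingman")
DECISION_INTERVAL_S = 4.0  # one RL decision every ~4 seconds
SIM_STEP_S = 0.1           # inner simulator physics step


class StrategicAirCombatEnv(gym.Env):
    """Toy high-level environment: each discrete action selects a behavioral
    state that is executed for DECISION_INTERVAL_S seconds of simulation."""

    def __init__(self):
        # Illustrative radar picture: [range_km, bearing_rad, closing_rate, own_speed]
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(len(BEHAVIORAL_STATES))

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.range_km, self.closing, self.t = 40.0, 0.0, 0.0
        return self._obs(), {}

    def step(self, action):
        strategy = BEHAVIORAL_STATES[action]
        # Many fine-grained physics steps per RL step: the macro-action interval.
        for _ in range(int(DECISION_INTERVAL_S / SIM_STEP_S)):
            self.closing = 0.3 if strategy == "pursue" else -0.2  # toy dynamics
            self.range_km = max(self.range_km - self.closing * SIM_STEP_S, 0.0)
            self.t += SIM_STEP_S
        terminated = self.range_km == 0.0 or self.t > 600.0
        reward = 1.0 if self.range_km == 0.0 else 0.0
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        bearing = 0.0  # toy: opponent held dead ahead
        return np.array([self.range_km, bearing, self.closing, 0.25], dtype=np.float32)
```

This temporal abstraction shrinks the effective decision horizon dramatically, which is a large part of why a basic agent could converge in minutes rather than the hours typical of low-level flight control.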
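The reward design can be summarized the same way. The component names and weights below are illustrative placeholders, not the tuned values from the thesis; the key idea is a weighted sum in which a small shaping weight keeps dense guidance terms from drowning out the sparse mission reward.

```python
from dataclasses import dataclass


@dataclass
class RewardWeights:
    # Illustrative weights, not the tuned values from the thesis.
    survival: float = 1.0  # individual: stay alive, avoid being hit
    mission: float = 2.0   # team: kills / objective progress shared by the flight
    shaping: float = 0.1   # dense behavior-shaping terms (e.g. keep nose on target)


def combined_reward(survived: bool, team_progress: float,
                    shaping_signal: float, w: RewardWeights = RewardWeights()) -> float:
    """Weighted sum of individual, team, and behavior-shaping components.

    A deliberately small shaping weight guides exploration without letting
    dense terms dominate the sparse mission reward that defines success.
    """
    return (w.survival * (1.0 if survived else -1.0)
            + w.mission * team_progress
            + w.shaping * shaping_signal)
```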

3. Implementation & Overcoming Obstacles

  • Hardest Challenge: Managing the integration with complex RL frameworks such as Ray RLlib and keeping multi-agent training stable.
  • Resolution: I used Asynchronous Proximal Policy Optimization (APPO) and implemented league-based training, in which agents were progressively matched against stronger versions of themselves to prevent strategy stagnation and improve robustness; a simplified sketch of the league mechanism follows.
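Below is a framework-agnostic sketch of the league idea; the pool structure and sampling rule are my simplified assumptions, not RLlib built-in mechanics. The learner periodically freezes a snapshot of itself into an opponent pool, and training opponents are sampled with a bias toward snapshots the learner still loses to.

```python
import random


class League:
    """Minimal league: a pool of frozen policy snapshots plus per-opponent
    win rates, used to pick training opponents for the learning agent."""

    def __init__(self, hard_opponent_bias: float = 0.7):
        self.snapshots: list = []          # frozen policy weights (format is hypothetical)
        self.win_rates: list[float] = []   # learner's win rate vs. each snapshot
        self.hard_opponent_bias = hard_opponent_bias

    def add_snapshot(self, weights) -> None:
        """Freeze the current learner as a new league member."""
        self.snapshots.append(weights)
        self.win_rates.append(0.5)  # neutral prior until results arrive

    def record_result(self, idx: int, learner_won: bool, lr: float = 0.05) -> None:
        # Exponential moving average of the learner's win rate vs. snapshot idx.
        self.win_rates[idx] += lr * (float(learner_won) - self.win_rates[idx])

    def sample_opponent(self) -> int:
        """Mostly pick opponents the learner still struggles against,
        occasionally a uniform pick so old strategies are never forgotten."""
        losses = [1.0 - w for w in self.win_rates]
        if random.random() < self.hard_opponent_bias and sum(losses) > 0:
            return random.choices(range(len(self.snapshots)), weights=losses)[0]
        return random.randrange(len(self.snapshots))
```

Sampling by loss rate keeps training pressure on unsolved matchups, while the occasional uniform pick guards against the learner forgetting how to beat older strategies.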

4. Results & Conclusion

The agents successfully developed emergent, non-intuitive strategies that outperformed traditional baselines.

  • Measurable Impact: Achieved a win probability of over 90% in strategic 1-vs-1 experiments. In 2-vs-1 scenarios, the agents demonstrated sophisticated cooperative maneuvers that are typically difficult to program manually.
  • Lessons Learned: The research showed that high-level strategic control is more computationally efficient and adaptable than low-level flight control for tactical AI, paving the way for more intelligent “wingman” agents in professional simulations.