Abstract

Collaborative capture of dynamic targets is common in nature as an essential strategy for weaker species against the strong. Similar concepts have shown to be useful for numerous robotic applications, such as security and surveillance, search and rescue. However, most existing works focus on analytical and geometric solutions or end-to-end reinforcement learning methods, which are largely constrained to obstacle-free environments or scenarios with sparse, regularly distributed obstacles. This work tackles the problem from a unique perspective: the renowned strategy of ``ambush'' alone would suffice for multiple slower pursuers to capture one faster evader with different levels of intelligence efficiently in complex environments. A parameterized strategy of ambush (including discrete and continuous parameters) is designed first, which takes into account the topological properties of the workspace, the truncated line-of-sight visibility, the relative speed ratio and the limited capture range. Then, a Hybrid Monte Carlo Tree Search (H-MCTS) algorithm is proposed to optimize the associated parameters through long-term planning, enabling the identification of highly promising parameters for future capture. Lastly, the strategy is trained offline to learn the ranking of different choices of parameters across various environments, and to directly predict scores, replacing the rollout process in H-MCTS. The learned heuristics are adopted during online H-MCTS to accelerate the planning procedure while guaranteeing the planning quality. Its efficiency and effectiveness are validated in extensive simulations and hardware experiments, against the different capability and intelligence of the evaders, including two-times higher velocity and human-controlled behavior.

Proposed Method

(I) generates the role of pursuers as attacker or evader and their motion parameters; (II) determines the motion policy under the parameters; and (III) oversees the execution progress and reaction of the evader to trigger the adaptation scheme.

Experiments

Hardware Experiments

Autonomous Evader

Human-controled evader with limited FOV

Human-controled evader with global FOV

Baselines

Analytic-based Method:
A distributed algorithm enables slower pursuers to capture faster evaders through the adaptive switching between two strategies, i.e., the collaborative encirclement formation and the direct hunting approach.
Geometric-based Method:
Centralized algorithm using buffered Voronoi cells to coordinate pursuers in cluttered environments.
RL-based Method:
This approach implements a decentralized deep reinforcement learning framework to learn cooperative pursuit strategies through a structured curriculum mechanism.
Shooting Method:
Probabilistic gate selection method that replaces H-MCTS with a scoring mechanism. For each candidate gate, computes expected capture probability by: (i) generating evader's shortest paths to goals, (ii) evaluating overlap between pursuer zones and escape paths, (iii) aggregating probability-weighted coverage. Then executes the highest-scoring gate.
MCTS-based Method:
Direct implementation of H-MCTS algorithm without incorporating the NARE module.

Comparisons

Scenario-I

Ours

MCTS

Shooting

Analytic

Geometric

RL

Scenario-II

Ours

MCTS

Shooting

Analytic

Geometric

RL

Scenario-III

Ours

MCTS

Shooting

Analytic

Geometric

RL

Comparison: Numerical Result

(I) generates the role of pursuers as attacker or evader and their motion parameters; (II) determines the motion policy under the parameters; and (III) oversees the execution progress and reaction of the evader to trigger the adaptation scheme.