Reinforcement Learning for Scotland Yard: Strategies and Implementation

Jun 10, 2024 · Shauryasikt Jena · 3 min read

Reinforcement Learning techniques such as Deep Q-Networks and Monte Carlo Tree Search can strengthen the computer agents' strategies in the Scotland Yard board game, improving the overall player experience.

1. Introduction to Scotland Yard and Reinforcement Learning

Scotland Yard is a classic board game of strategy and deduction, in which a group of detectives chases a secretive criminal, Mr. X, across a map of London. Using different modes of transportation, players must outsmart each other in this game of hide-and-seek. By integrating Reinforcement Learning (RL) methods such as Deep Q-Networks (DQN) and Monte Carlo Tree Search (MCTS), we can significantly enhance the gameplay, particularly the strategies employed by Mr. X.

1.1 Game Rules

1.1.1 Setup
  • Scotland Yard involves 2-5 detectives and one criminal, Mr. X.
  • Players use taxis, buses, and the underground to move across 200 nodes on the map.
  • Detectives win by landing on Mr. X’s location, while Mr. X wins by evading capture for 20 turns.
1.1.2 Information Asymmetry
  • Mr. X’s location is hidden except at specific intervals.
  • Detectives know the transportation modes used by Mr. X but not his exact location.
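To make the setup concrete, below is a minimal sketch of one possible game-state representation. It assumes the board is stored as an adjacency map from station numbers to the stations reachable by each transport mode; the names, structure, and helpers are illustrative assumptions, not the project's actual code.

```python
from dataclasses import dataclass

# Illustrative sketch of a game state under the rules above. The board is assumed
# to be an adjacency map from station number to the stations reachable by each
# transport mode; none of these names come from the project itself.
Board = dict[int, dict[str, set[int]]]   # station -> {"taxi" | "bus" | "underground": neighbours}

@dataclass
class GameState:
    board: Board
    detective_positions: list[int]   # always visible to everyone
    mr_x_position: int               # hidden from the detectives between reveals
    turn: int = 0
    max_turns: int = 20              # Mr. X wins by surviving this many turns

    def detectives_win(self) -> bool:
        # A detective captures Mr. X by landing on his station.
        return self.mr_x_position in self.detective_positions

    def mr_x_wins(self) -> bool:
        return self.turn >= self.max_turns and not self.detectives_win()
```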

1.2 Game Design Challenges

  • Asymmetry: The game is biased against Mr. X due to the limited turns and collective detective strategy.
  • Partial Observability: Detectives operate with incomplete information, making standard RL algorithms less effective without modifications.
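The partial-observability challenge can be made concrete with a small belief-tracking sketch: the detectives maintain a set of stations Mr. X could occupy and update it from the transport modes he reveals. The function below assumes the adjacency-map board structure sketched earlier and is an illustration, not the project's API.

```python
# Minimal belief-tracking sketch for the detectives, assuming the adjacency-map
# board structure from the previous snippet. The belief set holds every station
# Mr. X could currently occupy.
def update_belief(board, belief: set[int], transport_used: str,
                  detective_positions: list[int]) -> set[int]:
    """Expand the belief along edges of the revealed transport mode, then prune
    stations occupied by detectives (Mr. X cannot end his move there)."""
    new_belief: set[int] = set()
    for station in belief:
        new_belief |= board[station].get(transport_used, set())
    return new_belief - set(detective_positions)

# Whenever Mr. X's position is revealed at the fixed intervals, the belief
# collapses back to a single station: belief = {revealed_station}.
```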

2. Reinforcement Learning in Gameplay

2.1 Methodology

The two main policy optimization techniques used are Deep Q-Networks and Monte Carlo Tree Search. Feature vectors are engineered from the detectives' locations, the modes of transport used by all players, and the nodes of the map graph; these vectors serve as the input data for both methods.
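Below is a hedged sketch of what such a feature vector might look like: one-hot detective locations, a one-hot last-used transport mode per player, and a (possibly uncertain) encoding of Mr. X's location. The exact features, ordering, and dimensions are assumptions for illustration, not the project's actual encoding.

```python
import numpy as np

NUM_NODES = 200
TRANSPORT_MODES = ["taxi", "bus", "underground"]

def encode_state(detective_positions, last_transports, mr_x_belief):
    """detective_positions: station numbers; last_transports: one mode string per
    player (detectives first, Mr. X last); mr_x_belief: stations Mr. X may occupy."""
    det_vec = np.zeros(NUM_NODES)
    for p in detective_positions:
        det_vec[p - 1] = 1.0                       # stations are 1-indexed

    mode_vecs = []
    for mode in last_transports:
        one_hot = np.zeros(len(TRANSPORT_MODES))
        if mode in TRANSPORT_MODES:
            one_hot[TRANSPORT_MODES.index(mode)] = 1.0
        mode_vecs.append(one_hot)

    belief_vec = np.zeros(NUM_NODES)
    for station in mr_x_belief:
        belief_vec[station - 1] = 1.0 / len(mr_x_belief)   # uniform over possible stations

    return np.concatenate([det_vec, *mode_vecs, belief_vec])
```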

Method: Mr. X and the detective agents are trained adversarially, so that each side's improvements push the other to adapt. This involves simulating games in which each side plays its current best policy, with both policies continuously refined through competition.
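A high-level sketch of such an adversarial training loop might look as follows; `play_game` and the agents' `update` methods are hypothetical placeholders rather than the project's classes.

```python
# High-level sketch of the adversarial training scheme described above: the two
# sides alternate, each learning against the other's current (frozen) policy.
# `play_game` and the agents' `update` methods are hypothetical placeholders.
def adversarial_training(mr_x_agent, detective_agent,
                         iterations=100, games_per_iteration=500):
    for _ in range(iterations):
        # Phase 1: improve Mr. X against the frozen detective policy.
        for _ in range(games_per_iteration):
            trajectory = play_game(mr_x_agent, detective_agent, learner="mr_x")
            mr_x_agent.update(trajectory)
        # Phase 2: improve the detectives against the now-stronger Mr. X.
        for _ in range(games_per_iteration):
            trajectory = play_game(mr_x_agent, detective_agent, learner="detectives")
            detective_agent.update(trajectory)
    return mr_x_agent, detective_agent
```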

Strategy: Integrating DQN with MCTS combines the strengths of deep learning and tree search, optimizing decision-making processes.
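One common way to combine the two, in the spirit of AlphaZero-style hybrids, is to let MCTS handle look-ahead while the trained Q-network evaluates leaf positions in place of random rollouts. The sketch below illustrates that idea from Mr. X's perspective; `legal_moves`, `apply_move`, and `state_features` are hypothetical helpers, and this is not necessarily how the project wired the two together.

```python
import math
import random

# Illustrative sketch: MCTS look-ahead with a DQN value estimate at the leaves,
# from Mr. X's perspective. `q_network` is assumed to map a feature vector to
# per-action Q-values.
class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}            # move -> Node
        self.visits, self.value = 0, 0.0

def ucb(child, parent_visits, c=1.4):
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts_with_dqn(root_state, q_network, simulations=200):
    root = Node(root_state)
    for _ in range(simulations):
        node = root
        # Selection: follow UCB while the node is fully expanded.
        while node.children and len(node.children) == len(legal_moves(node.state)):
            node = max(node.children.values(), key=lambda ch: ucb(ch, node.visits))
        # Expansion: try one previously unexplored move.
        untried = [m for m in legal_moves(node.state) if m not in node.children]
        if untried:
            move = random.choice(untried)
            node.children[move] = Node(apply_move(node.state, move), parent=node)
            node = node.children[move]
        # Evaluation: the DQN value estimate replaces a full random rollout.
        # (A full two-player version would negate this value at detective levels.)
        value = float(q_network(state_features(node.state)).max())
        # Backpropagation.
        while node is not None:
            node.visits += 1
            node.value += value
            node = node.parent
    # Play the most-visited move at the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```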

2.2 Results

2.2.1 Baseline

Initial simulations with random moves set a baseline win rate for Mr. X at around 9-10%.
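Such a baseline is straightforward to estimate by simulating games in which both sides choose uniformly random legal moves, as in the sketch below; `new_game`, `legal_moves`, and `apply_move` are hypothetical helpers.

```python
import random

# Sketch of estimating the random-play baseline: both sides pick uniformly
# random legal moves and Mr. X's win rate is averaged over many games.
def random_baseline(num_games=10_000):
    mr_x_wins = 0
    for _ in range(num_games):
        state = new_game()
        while not state.is_terminal():
            state = apply_move(state, random.choice(legal_moves(state)))
        mr_x_wins += int(state.mr_x_won())
    return mr_x_wins / num_games
```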

2.2.2 Deep Q-Networks

Mr. X’s win rate increased to approximately 50% after extensive training against random detective moves.
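For context, a minimal PyTorch sketch of the kind of Q-learning update involved is shown below. The state dimension, action-space size, network widths, and hyperparameters are illustrative assumptions, not the configuration used in the project.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS, GAMMA = 600, 24, 0.99

q_net = nn.Sequential(nn.Linear(STATE_DIM, 256), nn.ReLU(), nn.Linear(256, NUM_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 256), nn.ReLU(), nn.Linear(256, NUM_ACTIONS))
target_net.load_state_dict(q_net.state_dict())  # target network starts as a copy
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)
replay = deque(maxlen=100_000)                  # (state, action, reward, next_state, done)

def train_step(batch_size=64):
    if len(replay) < batch_size:
        return
    states, actions, rewards, next_states, dones = zip(*random.sample(replay, batch_size))
    states = torch.as_tensor(np.array(states), dtype=torch.float32)
    next_states = torch.as_tensor(np.array(next_states), dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    dones = torch.as_tensor(dones, dtype=torch.float32)

    # Q-value of the action actually taken in each sampled transition.
    q = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped target: immediate reward plus discounted best future value.
        target = rewards + GAMMA * target_net(next_states).max(1).values * (1 - dones)
    loss = nn.functional.mse_loss(q, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # In a full training loop, target_net would periodically be re-synced with q_net.
```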

2.2.3 Monte Carlo Tree Search

MCTS-based strategies yielded a maximum win rate of over 60% for Mr. X, highlighting its effectiveness in complex decision-making environments.

2.2.4 Combined Approach

The hybrid DQN + MCTS model showed promising results, on track to surpass MCTS alone, but it required extensive training iterations against adversarially trained, smarter detective agents to reach its full potential.

Conclusion

Reinforcement Learning offers a robust framework for enhancing the strategies in Scotland Yard. By leveraging DQN and MCTS, we can develop intelligent agents capable of outperforming traditional human strategies. However, the computational demands and the need for extensive training iterations present challenges that need to be addressed in future work.

Future Developments

  • Enhanced computational resources, so that models can be trained to their optimal performance levels.
  • Graph Neural Networks to model game states more accurately.
  • Asynchronous Actor-Critic Models for more stable learning.

The project report can be found here.