Game AI
13 June 2022
Alexandre Peyrot, Romain Trachel
Adaptive Game AI . Behavior Tree . Environment Query System . Reinforcement Learning

# Gameplay Prediction via Machine Learning for Adaptive Game AI

## Abstract

We investigate how machine learning can be combined with Unreal Engine’s Environment Query System to update behavior trees for adaptive gameplay. The main idea is to keep the interpretability and reliability of AI decisions given by behavior trees, while benefiting from a data-driven approach to update some of the tree’s values based on the current game state.

## Introduction

Artificial Intelligence (AI) in video games uses carefully authored computer programs that leverage the various gameplay mechanics to support fun and engaging games. In the iconic 1980 video game Pac-Man, each of the four ghosts has its own chasing mechanism, and the game alternates between intense chase modes and periods of ghost scattering. This creative human design keeps the game neither too easy nor too hard, so that the player stays engaged and in a state of flow. In more recent games like Left 4 Dead, Middle-earth: Shadow of Mordor or Alien: Isolation, the enemy behavior is dynamically adapted to maintain engagement and prevent the game from becoming too repetitive. Additionally, adaptive game AI could contribute to making games accessible to more players.

We introduce an adaptive game AI approach to learn the optimal game difficulty from gameplay data. The approach is based on machine learning (ML) as well as game AI tools like Behavior Trees (BTs) and Unreal Engine’s Environment Query System (EQS). Our ML framework is inspired by Reinforcement Learning, which makes it possible to train agents that play video games in the context of automated game testing, or that can compete and win against professional players in games like Starcraft II or Dota 2. The goal of our work, however, is not to have an unbeatable agent, but instead to build an enemy AI that is intuitive to understand and fairly challenging to play against. The main idea is to keep the interpretability and reliability of AI behavior given by BTs, while benefiting from a data-driven approach to update some of the tree’s values based on the current game state.

To achieve this goal, we trained an artificial neural network that learns to predict the player behavior from various game replays. These predictions are computed in real time with an inference runtime (ONNX) and used to dynamically update enemy BTs. Such predictions could be, for instance, “how likely is an item about to be picked up?” or “how likely is the player about to get caught by an enemy?”. Hence, the game difficulty can be adapted such that in hard mode, the enemy would move towards certain threatening locations, while in easy mode, it could be less aggressive and move further away from the player.

In the following sections, we first describe the scene and task we used to test our adaptive game AI approach. We then introduce the various game AI tools we combined with ML, and explain how we collected gameplay data and how the neural network is trained. Finally, we present our results and discuss limitations and future work.

Overview of our system implemented in Unreal. In the following sections, we will detail the gameplay and game AI elements displayed in this video.

To test our approach, we used the top-down template in Unreal Engine 4 (UE4) to build a simple demo based on a hide and seek game. The scene is a flat square surface with 29 uniformly distributed pillars where the player can hide from the enemy. The goal of the game is to pick up 16 items while avoiding the enemy. The player character is controlled by point and click and uses the navigation system to move and pick up items. At the beginning of each game, both the player and the enemy character are spawned at a random location on the map. Every item must be collected to complete the level, and if the player is caught by the enemy, it is game over.

To prevent an immediate capture, we kept a minimum distance between the characters’ spawning locations. The scene layout does not change, and pickup items are always spawned at the same location for simplicity. The game score is equal to the number of items collected before getting caught. The score is used to measure player performance and is reset when the level is completed or upon game over. At the end of each game, we asked the playtesters to report the difficulty of the enemy behavior between easy, medium, or hard. These subjective reports are used to compare the perceived difficulty level of several kinds of chasing behaviors.

## Behavior Trees and EQS

Our approach relies on the use of an auto-generated spatial query system such as Unreal’s EQS which we use in conjunction with Behavior Trees.

### Behavior Trees

A behavior tree is evaluated from its root down to its leaves, with branches prioritized from left to right. In UE4, a BT can store game variables in its blackboard component and may use them to execute a behavior, like moving towards a given location, if certain conditions are met. See Figure 1 for an example of a BT made in UE4.
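The left-to-right priority logic can be illustrated with a minimal sketch. This is not UE4's implementation: the `Selector`, `Sequence` and `Leaf` classes and the blackboard keys below are hypothetical names chosen for illustration, the blackboard is a plain dictionary, and the sketch omits the "running" status that engine implementations typically use for tasks spanning several frames.

```python
from typing import Callable, Dict, List

Blackboard = Dict[str, object]
SUCCESS, FAILURE = "success", "failure"


class Leaf:
    """Wraps a function that reads or writes the blackboard and returns a status."""

    def __init__(self, fn: Callable[[Blackboard], str]):
        self.fn = fn

    def tick(self, bb: Blackboard) -> str:
        return self.fn(bb)


class Sequence:
    """Runs children left to right and fails as soon as one child fails."""

    def __init__(self, children: List):
        self.children = children

    def tick(self, bb: Blackboard) -> str:
        for child in self.children:
            if child.tick(bb) == FAILURE:
                return FAILURE
        return SUCCESS


class Selector:
    """Tries children left to right and succeeds as soon as one child succeeds."""

    def __init__(self, children: List):
        self.children = children

    def tick(self, bb: Blackboard) -> str:
        for child in self.children:
            if child.tick(bb) == SUCCESS:
                return SUCCESS
        return FAILURE


def has_target(bb: Blackboard) -> str:
    # Condition: succeed only if the blackboard holds a target location.
    return SUCCESS if bb.get("target_location") is not None else FAILURE


def move_to_target(bb: Blackboard) -> str:
    bb["last_action"] = "move_to_target"
    return SUCCESS


def idle(bb: Blackboard) -> str:
    bb["last_action"] = "idle"
    return SUCCESS


# The leftmost branch has priority; the right branch is the fallback.
tree = Selector([Sequence([Leaf(has_target), Leaf(move_to_target)]), Leaf(idle)])

with_target: Blackboard = {"target_location": (3.0, 4.0)}
tree.tick(with_target)

without_target: Blackboard = {"target_location": None}
tree.tick(without_target)
```

With a target on the blackboard the left branch runs, and without one the selector falls through to the lower-priority branch on its right.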

### Environment Query System

The idea behind EQS is to generate a set of points in the scene, check if some criteria are met at each point, and select the best point to update a BT’s blackboard value, for instance. These selection criteria are called EQS tests and typical examples include distance, pathfinding, or line of sight checks between entities that heuristically should help the AI character achieve a certain goal. Multiple tests can be combined to output a selection score. See Figure 2 for an example of a BT that uses an EQS with two tests.
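The generate, test and select loop can be sketched in a few lines. This is only an illustration of the idea, not the UE4 API: `generate_grid`, `run_query`, the two toy tests and the equal weights are all assumptions made for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Test = Callable[[Point], float]  # maps a candidate point to a score


def generate_grid(center: Point, half_size: float, spacing: float) -> List[Point]:
    """Generate candidate points on a square grid around `center`."""
    steps = int(half_size // spacing)
    return [
        (center[0] + i * spacing, center[1] + j * spacing)
        for i in range(-steps, steps + 1)
        for j in range(-steps, steps + 1)
    ]


def run_query(points: Sequence[Point],
              weighted_tests: Sequence[Tuple[Test, float]]) -> Tuple[Point, float]:
    """Combine all tests into one weighted score and return the best point."""

    def total_score(point: Point) -> float:
        return sum(weight * test(point) for test, weight in weighted_tests)

    best = max(points, key=total_score)
    return best, total_score(best)


querier = (0.0, 0.0)  # hypothetical position of the AI character
target = (2.0, 2.0)   # hypothetical position of interest


def closeness_to_target(point: Point) -> float:
    # Distance test: 1.0 at the target, decreasing with distance.
    return 1.0 / (1.0 + math.dist(point, target))


def keeps_clear_of_querier(point: Point) -> float:
    # Pass/fail test: reject points right next to the querier.
    return 1.0 if math.dist(point, querier) >= 1.0 else 0.0


points = generate_grid(center=querier, half_size=2.0, spacing=1.0)
best_point, best_score = run_query(
    points, [(closeness_to_target, 1.0), (keeps_clear_of_querier, 1.0)]
)
```

In a BT, the resulting `best_point` would be the value written to the blackboard.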

## Enemy behaviors

A series of BTs was built to produce enemy behaviors of varying difficulty. We start from a random patrol and chasing enemy behavior described in the UE4 documentation, then incorporate an EQS node, and finally combine it with ML predictions.

### Random Patrol

Let us start by describing the original random patrol and chasing enemy behavior.

The left branch of this BT executes a chasing behavior for the enemy if it has line of sight on the player (Figure 1.A). Otherwise, it falls into patrol mode (Figure 1.B) where it picks a random location within a given radius and moves the enemy to this location. The line of sight is lost when the player hides behind pillars and the enemy goes back to patrolling if this happens for over a second.

### Line of Sight (LoS) based EQS

To make the patrol behavior more engaging, one may replace the random patrol node with a more elaborate search mechanism. In our chase scene, we replace the random patrol node in the BT with an EQS node that executes two tests (Figure 2.A). The first test checks for line of sight, while the second one scores according to shortest distance to the player. The point returned by the EQS query is then used as a new patrol location.
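A minimal 2D sketch of such a two-test query is given below. It rests on several assumptions that are not stated in the text: pillars are modeled as circles of a common radius, the line-of-sight test acts as a filter, and the distance test then picks the closest remaining point. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def segment_point_distance(a: Point, b: Point, c: Point) -> float:
    """Shortest distance from point c to the segment [a, b]."""
    abx, aby = b[0] - a[0], b[1] - a[1]
    length_sq = abx * abx + aby * aby
    if length_sq == 0.0:
        return math.dist(a, c)
    # Project c onto the segment and clamp to its endpoints.
    t = ((c[0] - a[0]) * abx + (c[1] - a[1]) * aby) / length_sq
    t = max(0.0, min(1.0, t))
    closest = (a[0] + t * abx, a[1] + t * aby)
    return math.dist(closest, c)


def has_line_of_sight(a: Point, b: Point,
                      pillars: Sequence[Point], radius: float) -> bool:
    """True if no circular pillar intersects the segment between a and b."""
    return all(segment_point_distance(a, b, p) > radius for p in pillars)


def los_query(points: Sequence[Point], player: Point,
              pillars: Sequence[Point], radius: float) -> Optional[Point]:
    """Test 1 filters on line of sight, test 2 prefers the shortest distance."""
    visible: List[Point] = [
        p for p in points if has_line_of_sight(p, player, pillars, radius)
    ]
    if not visible:
        return None
    return min(visible, key=lambda p: math.dist(p, player))


player = (0.0, 0.0)
pillars = [(2.0, 0.0)]
candidates = [(4.0, 0.0), (0.0, 5.0), (3.0, 3.0)]
patrol_location = los_query(candidates, player, pillars, radius=0.5)
```

Here the candidate at `(4.0, 0.0)` is the closest to the player but is hidden behind the pillar, so the query returns the closest visible point instead.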

### ML based EQS

Good EQS heuristics often combine multiple tests to produce a desired behavior. For instance, in the above example, we chose an EQS that combines tests based on line-of-sight and distance, as these were the most important considerations to be able to capture the player.

We propose instead to use an EQS with a single test based on an ML model that has been trained to combine multiple considerations automatically. The model outputs predictions about game events such as player capture or item collection, which can in turn be converted into an EQS score. In other words, the gameplay data informs the model how to generate a test, instead of selecting and tuning various tests manually.

Using a custom ONNX plugin for UE4, we can load our model in the engine and use the predictions as scoring tests for our custom EQS (Figure 3).

Our EQS blueprint now consists of a single test, which under the hood combines multiple features to make its selections. These predictions are obtained by running an ONNX inference at runtime when the EQS node is executed.
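The shape of such a single-test query can be sketched as follows. Everything here is a stand-in: the feature vector is hypothetical (the actual model inputs are not listed here), and `toy_model` is a hand-written function, not the trained network. In the engine, the `model` callable would be replaced by the inference call of the ONNX plugin.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
# A model maps a feature vector to (game score prediction, enemy score prediction).
Model = Callable[[Sequence[float]], Tuple[float, float]]


def build_features(candidate: Point, player: Point, enemy: Point) -> List[float]:
    """Hypothetical features for one candidate point."""
    return [math.dist(candidate, player), math.dist(candidate, enemy)]


def ml_eqs_scores(points: Sequence[Point], player: Point, enemy: Point,
                  model: Model) -> List[float]:
    """Single EQS test: score each point with the enemy score prediction."""
    return [model(build_features(p, player, enemy))[1] for p in points]


def toy_model(features: Sequence[float]) -> Tuple[float, float]:
    """Stand-in for the trained network, NOT the authors' model.

    It simply predicts a higher enemy score closer to the player.
    """
    dist_to_player = features[0]
    return 0.0, math.exp(-0.5 * dist_to_player)


scores = ml_eqs_scores(
    points=[(0.0, 0.0), (4.0, 0.0)],
    player=(0.0, 0.0),
    enemy=(5.0, 5.0),
    model=toy_model,
)
```

The query itself no longer knows about line of sight or distance: whatever the model has learned from gameplay data is folded into one score per point.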

### How to interpret the ML based EQS test

Having a reinforcement learning model learn multiple values simultaneously falls under the umbrella of General Value Functions (GVF) and is useful for having models that generalize better by gaining more information from their world exploration. Using temporal-difference learning, we trained a feedforward neural network with one hidden layer and two outputs consisting of two GVFs: game score and enemy score. We recall that the game score corresponds to the number of items collected, while the enemy score keeps track of whether the player has been captured. For brevity, we will focus here on a test based on the GVF which outputs enemy score predictions.
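To make the temporal-difference update concrete, here is a toy sketch with two value heads. It replaces the neural network with a lookup table on a four-state chain and assumes a cumulant of 1 when the corresponding event occurs (item pickup or capture) and 0 otherwise, with capture ending the episode. The chain, the learning rate and the function name are illustrative assumptions, not the training setup used for the model.

```python
from typing import List, Sequence

GAMMA = 0.9  # discount factor


def td0_update(values: List[List[float]], state: int, next_state: int,
               cumulants: Sequence[float], terminal: bool,
               lr: float = 0.1, gamma: float = GAMMA) -> None:
    """Apply one TD(0) step to every GVF head of `state`."""
    for head, cumulant in enumerate(cumulants):
        bootstrap = 0.0 if terminal else gamma * values[next_state][head]
        td_error = cumulant + bootstrap - values[state][head]
        values[state][head] += lr * td_error


# Toy chain 0 -> 1 -> 2 -> 3: an item is picked up when leaving state 1,
# and the player is captured (episode ends) when leaving state 3.
N_STATES = 4
values = [[0.0, 0.0] for _ in range(N_STATES)]  # [game score head, enemy score head]

for _ in range(2000):
    for s in range(N_STATES):
        terminal = s == N_STATES - 1
        item_cumulant = 1.0 if s == 1 else 0.0
        capture_cumulant = 1.0 if terminal else 0.0
        td0_update(values, s, min(s + 1, N_STATES - 1),
                   (item_cumulant, capture_cumulant), terminal)

enemy_head = [v[1] for v in values]
game_head = [v[0] for v in values]
```

On this chain the enemy score head converges to `GAMMA` raised to the number of steps remaining before capture, which is the sense in which a higher value means a sooner capture.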

The value output by our GVF should be thought of as a proxy for the probability of game over happening soon: the higher the value, the sooner the enemy is expected to capture the player.

### ML based EQS tests with reference values

Selecting a point with a high score typically leads the enemy to go to a location at which the player will be easily spotted and captured. On the other hand, low scoring points are locations at which the enemy would walk in directions opposite to that of the player. Finally, mid scoring points serve as an in between for locations that have some visibility and proximity to the player without being overly aggressive.

One way to leverage this is to use reference values: instead of finding the point with maximal model prediction value, we search for the point that minimizes the difference between the model’s prediction and the reference value. We produce in this fashion 3 ML based chase modes with intermediate difficulty levels by gradually increasing the reference values as follows:

• An easy mode: reference value of 0
• A medium mode: reference value of 0.12
• A hard mode: no reference value

The value of 0 for the easy mode is equivalent to choosing the point at which the model prediction is as low as possible. The value of 0.12 for the medium mode was chosen manually through trial and error according to the different EQS value grids shown in Figure 4, and according to player performance against various reference values. Finally, setting no reference value is equivalent to setting a reference value of 1 and choosing the point at which the model prediction is as high as possible.
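The three modes above reduce to one selection rule, sketched below. The function name and the example predictions are hypothetical; only the reference values 0 and 0.12 and the "no reference value" case come from the text.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def select_point(points: Sequence[Point], predictions: Sequence[float],
                 reference: Optional[float] = None) -> Point:
    """Pick the point whose prediction is closest to the reference value.

    With no reference value, pick the point with the highest prediction.
    """
    indices = range(len(points))
    if reference is None:
        best = max(indices, key=lambda i: predictions[i])
    else:
        best = min(indices, key=lambda i: abs(predictions[i] - reference))
    return points[best]


# Hypothetical candidate points and model predictions, for illustration only.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
predictions = [0.02, 0.10, 0.15, 0.60]

easy = select_point(points, predictions, reference=0.0)
medium = select_point(points, predictions, reference=0.12)
hard = select_point(points, predictions)  # no reference value
```

As long as predictions stay at or below 1, calling `select_point` with `reference=1.0` returns the same point as the hard mode.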

## Experimentation

### Data collection and model training

We collect data from games with an enemy playing with the random patrol BT (see Figure 1), and with an automated player which follows a simple heuristic: it looks for the closest item to pick up, while hiding behind the pillars on the map to break line of sight with the enemy. We collect gameplay information (e.g., player position, orientation, etc.) as well as the game score at regular intervals of 0.3 seconds, so that the time step stays consistent from sample to sample. In this way, we collected 400 games of automated gameplay for training our model, the equivalent of 2.6 hours of gameplay with an average game duration of 23 seconds. The model used in the subsequent experiments is trained on this dataset via temporal-difference learning with a discount factor γ = 0.9. A model prediction of 1 thus corresponds to the model expecting an immediate capture, a prediction of 0.12 corresponds to an expected capture in 6 seconds, while a prediction of 0 corresponds to no capture expected to happen.
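The correspondence between prediction values and time to capture can be checked with a short calculation. It assumes that the enemy score target is 1 at the capture step and 0 otherwise, so that a capture expected n samples ahead is valued at γ^n, with one sample every 0.3 seconds. The helper names are hypothetical.

```python
import math

GAMMA = 0.9          # discount factor
STEP_SECONDS = 0.3   # sampling interval


def prediction_to_seconds(prediction: float, gamma: float = GAMMA,
                          step: float = STEP_SECONDS) -> float:
    """Invert prediction = gamma ** n to get the expected time to capture."""
    if prediction <= 0.0:
        return math.inf  # no capture expected
    n_steps = math.log(prediction) / math.log(gamma)
    return n_steps * step


def seconds_to_prediction(seconds: float, gamma: float = GAMMA,
                          step: float = STEP_SECONDS) -> float:
    """Value of a capture expected `seconds` from now."""
    return gamma ** (seconds / step)


medium_seconds = prediction_to_seconds(0.12)   # about 20 steps of 0.3 s
immediate_seconds = prediction_to_seconds(1.0)
dataset_hours = 400 * 23 / 3600                # 400 games of 23 s on average
```

Under this assumption a prediction of 0.12 corresponds to about 20 samples, or roughly 6 seconds, and 400 games of 23 seconds add up to roughly 2.6 hours.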

Capturing the player happens infrequently, and the sparsity of the enemy score update could make it hard for the model to converge. We implemented a priority replay buffer, which helped the model converge to solutions with lower prediction error. Since there are 16 items on the map, we also noted an imbalance in the frequency of game score updates versus enemy score updates. This would lead the model to focus on optimizing its game score prediction while neglecting the enemy score. To counteract the imbalance between the frequency of scoring signals, we also implemented a PopArt layer, which allowed us to train models with both outputs simultaneously.
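For readers unfamiliar with PopArt, the sketch below shows the core idea for a single output head: targets are normalized with running statistics, and the last linear layer is rescaled whenever the statistics change so that the unnormalized prediction stays the same. It follows the generally published PopArt update rule rather than the layer actually implemented for this work; the class name, step size and initial weights are assumptions.

```python
import math
from typing import List, Sequence


class PopArtHead:
    """One linear output head with PopArt target normalization (sketch)."""

    def __init__(self, n_inputs: int, beta: float = 0.01):
        self.w: List[float] = [0.1] * n_inputs  # arbitrary initial weights
        self.b = 0.0
        self.mu = 0.0    # running mean of targets
        self.nu = 1.0    # running second moment of targets
        self.beta = beta

    @property
    def sigma(self) -> float:
        return math.sqrt(max(self.nu - self.mu ** 2, 1e-8))

    def normalized(self, h: Sequence[float]) -> float:
        return sum(wi * hi for wi, hi in zip(self.w, h)) + self.b

    def unnormalized(self, h: Sequence[float]) -> float:
        return self.sigma * self.normalized(h) + self.mu

    def normalize_target(self, target: float) -> float:
        """Target in normalized space, used for the loss of this head."""
        return (target - self.mu) / self.sigma

    def update_stats(self, target: float) -> None:
        """Update mean and scale, then rescale w and b to preserve outputs."""
        old_mu, old_sigma = self.mu, self.sigma
        self.mu = (1.0 - self.beta) * self.mu + self.beta * target
        self.nu = (1.0 - self.beta) * self.nu + self.beta * target ** 2
        new_sigma = self.sigma
        self.w = [wi * old_sigma / new_sigma for wi in self.w]
        self.b = (old_sigma * self.b + old_mu - self.mu) / new_sigma


head = PopArtHead(n_inputs=2)
hidden = [0.5, -1.0]           # hypothetical hidden-layer activations
before = head.unnormalized(hidden)
head.update_stats(target=5.0)  # a large target shifts the statistics
after = head.unnormalized(hidden)
```

Keeping one such head per output lets a frequent signal and a rare one be trained on a comparable scale, which is the imbalance described above.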

### Experiment 1: Comparisons of game AI with and without ML based EQS

In this experiment, we compared 3 BTs that share the same logic except for the patrol mode, which depends on their EQS configuration. The random patrol and the line-of-sight based EQS were used as game AI baselines and compared with our custom ML based EQS. The data collected here consists of 201 games of human gameplay. This experiment was done internally with three participants. For each new game, the enemy was given a BT selected at random from the three patrol behaviors. The participant was not told which BT was controlling the enemy.

In Figure 5, we use heatmaps to analyze the behavior of the player and enemy characters, showing how they moved across the level and where the player was captured.

The players’ trajectories favor the edges and corners of the map, where it is easier to hide from the enemy. We notice that the paths taken by our ML based EQS BT are homogeneous across the map, similarly to those of the random patrol and the line-of-sight based EQS BT. This indicates that there is little map-specific bias in the ML model, which we wanted to avoid because it would lead to repetitive patrol behaviors. This can be explained in part by the capture location heatmaps, which indicate the parts of the map at which the enemy score is updated.

To test the difficulty levels of the various BTs described above, we compare the average performance of human players as measured by gameplay time (i.e., how long they managed to play before getting captured or completing the level), as well as game score (i.e., how many items they managed to collect). These appear as boxplots in Figure 6, which exhibit the median within the box that extends from lower to upper quartile values for the game score and duration (in seconds), respectively.

The subjective difficulty reports given by the players at the end of each game also appear as pie charts below. Each chart shows the proportion of each perceived difficulty level under one type of patrol mode used by the enemy BT: random patrol, our proposed ML based EQS without reference value, or the line-of-sight based EQS, respectively labeled as “Random”, “EQSHard” and “EQSLoS”. The variable N indicates the number of games played with each patrol setting.

The game scores and durations are comparable under the EQS Hard and EQS LoS modes; however, they are significantly lower than under the random patrol mode. This indicates that the level was harder to play under both the EQS Hard and EQS LoS modes. Moreover, the ML model has learned to send the enemy to locations that are about as difficult for the player as those chosen by the EQS LoS mode. The subjective reports also show that the players noticed an increase in difficulty under both EQS modes.

### Experiment 2: ML based reference values difficulty comparisons

In this second experiment, we compare ML based EQS with different reference values to gain better control over the difficulty level. We collected 280 games played by our three internal testers. For each new game, an ML based EQS patrol mode was selected at random among reference values of 0 (EQS Easy), 0.12 (EQS Medium) or no reference value (EQS Hard), as described in the previous section. We use similar pie charts and boxplots as before to show the subjective difficulty reports as well as the game scores and durations (in seconds) achieved in this new experiment.

We note a clear increase in difficulty, with an average player score of 12 at easy, 10.75 at medium and 6.71 at hard. We also compared the performance of our automated bot, used for data collection, against the various proposed BTs. The bot achieved an average score of 3.76 at easy, 3.35 at medium and 2.46 at hard. These scores are much lower than those of the human participants; however, they maintain the trend of decreasing scores as difficulty increases.

The easy mode in the current experiment has similar perceived difficulty, game scores and durations to the random patrol mode in the previous experiment. Although the same hard mode was used in both experiments, we observe different perceived difficulty with comparable game scores and durations. This is likely because the EQS LoS mode in experiment 1 was harder than the medium EQS mode in this experiment; hence, human testers may have perceived the hard mode as less difficult in experiment 1.

## Conclusions and future work

We presented a novel approach mixing game AI tools with ML to create an adaptive chasing behavior. Our ML-based EQS test can perform assessments of in-game situations to control game difficulty by allowing the query system to prioritize points of low, medium, and high threat values. The performance of human players in a simple hide and seek scene indicates that our approach can modulate the game difficulty effectively.

We are currently investigating how other gameplay event predictions can be included in our EQS test. More specifically, our model also predicts the occurrence of an item pickup event which could be used to create a defensive behavior: the enemy moves to protect locations where the player’s next item target likely will be.

Preliminary exploration of this defensive behavior, however, shows a strong bias toward going to certain corners of the map to defend specific items. The player would then simply be able to collect the remaining items but never that final item, which results in stale gameplay. This contrasts with our test based on capture events, where experiments show that the enemy explores the scene homogeneously. Overcoming this bias will be necessary for using item pickup predictions for new mechanics, such as alternating between intense chasing episodes and a defensive item protection behavior. This idea is inspired by the chase versus scatter modes used in Pac-Man to keep the player in a state of flow. We are researching ways to overcome the bias in our pickup item predictions and how other game events can fit our framework for an engaging game experience.

Alexandre Peyrot

Alexandre Peyrot is a Machine Learning specialist in the AI & Machine Learning team of Eidos-Sherbrooke. He obtained his Ph.D. in mathematics at Ecole Polytechnique Fédérale de Lausanne (EPFL) in 2017. He continued his research in mathematics while working as a postdoctoral scholar at Stanford University before joining Eidos-Montréal in 2018. He sees an immense potential for how machine learning can be combined with game AI to create immersive adaptive gameplay as well as seamlessly adjusting game difficulties.

Romain Trachel

Romain Trachel is a Senior Machine Learning Specialist working in the AI & Machine Learning team of Eidos-Sherbrooke, since July 2018. He obtained his PhD in 2014 from University of Nice (south of France) where he worked on Brain-Computer Interfaces at the Neuroscience Institute of Timone (Marseille – www.int.univ-amu.fr) and INRIA (Sophia-Antipolis – https://www.inria.fr/fr/athena). He did a couple of post-docs and started working as an independent researcher for producing virtual reality movies and art installations based on artificial intelligence. His major interests include signal/image processing, machine/deep learning, neurosciences, cognitive sciences, digital/modern art, and recent technologies such as virtual/augmented reality.
