How we solved the Royal Game of Ur

We spent 7 years working on AI for the Royal Game of Ur, and now we have strongly solved it! All it took was a couple of pretty equations, which I will do my best to explain.

In fact, we have even gone beyond "just" strongly solving the game. Strongly solving the game requires us to find the best move from every position in the game. We have done that, and we have also calculated each player's precise chance of winning, under optimal play, from every position. All using our open-source library for the game, RoyalUr-Java.

Here's how we did it, and how you could also solve this ancient game yourself when you next have a spare weekend! We have also written a technical report here.

Watch flawless play

Before we go any further, here's a little live demo. Below, two bots engage in a flawless duel - perfect play! I find it mesmerising. How often do you get to watch something and know it is mathematically perfect?

This demo shows our flawless bot, the Panda, playing against itself in the Royal Game of Ur. You can play the perfect bot yourself here.

Perfection means maximising your chance of winning

In order to create a bot that plays perfectly, we need to create a bot that maximises its chance of winning.

Unlike in Chess or Go, when you play Ur you cannot plot a perfect victory in advance. The dice would never allow it. Instead, a flawless bot has to constantly adapt to whatever the dice throw its way. This makes solving the Royal Game of Ur fundamentally different to solving deterministic games like Connect-Four.

Instead of plotting out one single path to victory, we need to consider every possible future path and weigh them all in our choice of moves. In this way, solving the Royal Game of Ur is more similar to solving Poker than it is to solving Chess.

The special tetrahedral dice used for the Royal Game of Ur.

We use value iteration to maximise your chance of winning

So, how do we possibly figure out what move maximises our chance of winning, especially when we might be 100 moves away from the end of the game?! Our answer is to use value iteration.

$$V(\text{state}) = \begin{cases} V_{\text{light}}(\text{state}), & \text{if the current player is light,}\\ V_{\text{dark}}(\text{state}), & \text{if the current player is dark.} \end{cases}$$
$$V_{\text{light}}(\text{state}) = \sum_{\text{roll} \in \text{rolls}} P(\text{roll}) \cdot \max_{\text{move} \in \text{moves}} V(\text{state after roll \& move})$$
$$V_{\text{dark}}(\text{state}) = \sum_{\text{roll} \in \text{rolls}} P(\text{roll}) \cdot \min_{\text{move} \in \text{moves}} V(\text{state after roll \& move})$$

Value iteration really is one of my favourite algorithms of all time. It runs a simple kernel over a big state space over and over again until it converges. Similar to a fluid simulation, it allows a big set of rules and probabilities to settle over time. Here's how I like to imagine it:

...

Imagine a dam. Water up high on one side, and low on the other. Artificial. Now explode it! Chaos! Thrashing water. Rubble. Waves of debris. But wait, give the water some more time... The debris will be washed away. The water will start to run clear. The damaged riverbanks will be worn smooth. Eventually, you will have a beautiful and tranquil river.

That is how I see value iteration. It is a force that takes an initial bad estimate of your win percentages, allows those win percentages to flow, and then eventually smooths out those percentages into a big curve from the very end of the game back to the very start.

The especially neat thing about this is that the probabilities that it converges to are your chance of winning! So to play perfectly, you now just have to surf that probability wave, picking the moves that lead to your highest chance of winning at each turn.

Now, let's get into exactly how we can go from bad estimates of win percentages to win percentages accurate to within one trillionth of a percent.

The process that value iteration goes through to find the perfect solution to a game.

How does value iteration actually work?

Here is the core idea of value iteration: we propagate wins and losses backwards.

If a position is likely to lead to a win, then it must be a good position. If a position is likely to lead to a loss, it must be a bad position.

Going further: If a position is likely to lead to other good positions, then it must also be good. If a position is likely to lead to other bad positions, then it must be bad.

If you know how good the next positions you might reach are, then you can calculate how good your current position is. This allows us to work backwards through a game, updating the value of states based upon the states that come after them.

...

This "backwards pass" is implemented using a simple weighted average. Your chance of winning in one state is equal to the weighted average of your chance of winning in all the states you could reach after each roll of the dice.

$$V(\text{state}) = \sum_{\text{roll} \in \text{rolls}} P(\text{roll}) \cdot V(\text{state after roll \& best move})$$

This simple operation acts as the means by which wins and losses cascade through the state space.
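
To make this concrete, here is a minimal sketch of that update in Java. It treats states as plain array indices and leaves the best-move selection to a supplied `bestSuccessor` function; both are simplifications for illustration, not the actual RoyalUr-Java API.

```java
import java.util.function.IntBinaryOperator;

final class BackwardsPassSketch {

    /**
     * Probability of rolling 0 to 4 with the four binary tetrahedral
     * dice of the Finkel ruleset (a binomial distribution).
     */
    static final double[] ROLL_PROB = {1/16.0, 4/16.0, 6/16.0, 4/16.0, 1/16.0};

    /**
     * One backwards-pass update for a single state: the weighted
     * average, over all rolls, of the value after the best move.
     * bestSuccessor maps (state, roll) to the state reached after the
     * current player's best move - a hypothetical helper.
     */
    static double updateValue(int state, double[] values, IntBinaryOperator bestSuccessor) {
        double value = 0.0;
        for (int roll = 0; roll < ROLL_PROB.length; roll++) {
            int successor = bestSuccessor.applyAsInt(state, roll);
            value += ROLL_PROB[roll] * values[successor];
        }
        return value;
    }
}
```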

Deriving your chance of winning in one position based upon the neighbour positions you would reach after each roll of the dice.

Now do it again, and again, and again...

Unfortunately, we cannot just do one backwards pass through the Royal Game of Ur, calculate your chance of winning from each position, and call it a day. It's not quite that simple...

Why not? Loops. In the Royal Game of Ur, when you reach a state (which is just the technical name for a position), it is often possible to play a few moves and then get back to the same state as before you played those moves.

These loops make a single backwards pass insufficient: when you try to calculate the winning chance of a state, some of the states you rely on may themselves depend upon the state you are calculating (a circular dependency), and so their win percentages will still be incorrect.

A diagram showing how a capture of a piece can lead to a loop in the states that are visited in a game.

So, how do we solve this problem? We just do lots of backwards passes! Every time you do a backwards pass the estimates of the win percentages get better and better until eventually they barely change at all.

The animation below shows these backwards passes as they happen. This animation is particularly chaotic, but it doesn't actually need to be. It is chaotic because we edit the map in-place, and because of the order in which we loop through all the states to update them. If we used a copy of the map, or if we used a different ordering, then value iteration might converge faster and look calmer. But the animation would look much more boring. So, I like this chaotic version better!
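
Continuing the sketch from earlier, the repeated backwards passes might look like this: sweep every state, update in-place, and stop once the biggest change in a pass drops below some tolerance. (Real code would skip the terminal states, whose values are pinned at 0 or 1; that is omitted here for brevity.)

```java
/**
 * Repeated in-place sweeps until convergence, continuing the sketch
 * above. values holds the current estimate of light's win probability
 * for every state.
 */
static void valueIterate(double[] values, IntBinaryOperator bestSuccessor, double tolerance) {
    double maxChange;
    do {
        maxChange = 0.0;
        for (int state = 0; state < values.length; state++) {
            double updated = updateValue(state, values, bestSuccessor);
            maxChange = Math.max(maxChange, Math.abs(updated - values[state]));
            values[state] = updated; // in-place, hence the chaotic animation
        }
    } while (maxChange > tolerance);
}
```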

An animation showing the process of calculating the light player's winning chance from every position in the game.

But wait, how do you decide what moves players would make after each roll?

This is a crucial question. In reality, the best strategy to win depends heavily on what your opponent does. However, we can't predict every individual player's quirks and tendencies. Instead, we assume that both players play optimally, making the best possible decision at every turn.

By doing this, we can calculate the Nash Equilibrium for the Royal Game of Ur. That is, we calculate a strategy where neither player can improve their chance of winning by changing their moves. A strategy that is impossible to beat. But also a strategy that does not necessarily notice and exploit the weaknesses in your opponent's play. This is a tradeoff we make.

To calculate the Nash Equilibrium, it turns out that a simple greedy approach is all you need. You just pick the move that would lead to the highest winning chance for the current player. This is the same strategy that our Panda bot uses to play the game flawlessly. Even though the win percentages you are using to make this decision might be incorrect at first, this is still good enough to let the win percentages converge over time.

Within our value iteration kernel, the selection of the optimal move for each player is represented by a maximum or minimum over the value of their possible moves.

$$V_{\text{light}}(\text{state}, \text{roll}) = \max_{\text{move} \in \text{moves}} V(\text{state after roll \& move})$$
$$V_{\text{dark}}(\text{state}, \text{roll}) = \min_{\text{move} \in \text{moves}} V(\text{state after roll \& move})$$
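
In code, that max/min is just a greedy arg-max (or arg-min) over the states reachable after the roll. Here is a sketch, where `successors` is a hypothetical list of the states reachable after each legal move:

```java
/**
 * Greedy move selection inside the kernel: light picks the successor
 * with the highest light-win probability, dark picks the lowest.
 */
static int pickBestSuccessor(int[] successors, double[] values, boolean lightToMove) {
    int best = successors[0];
    for (int candidate : successors) {
        boolean better = lightToMove
                ? values[candidate] > values[best]
                : values[candidate] < values[best];
        if (better) best = candidate;
    }
    return best;
}
```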

Final overview of value iteration

So that's all there is to it!

We use a simple weighted average to update your chance of winning in each state based upon your chance of winning in neighbouring states (the sum over rolls, weighted by the probability of each roll). We assume that light and dark will make the move that is best for them (the max/min over the available moves). And we do that lots of times until it converges!

$$V(\text{state}) = \begin{cases} V_{\text{light}}(\text{state}), & \text{if the current player is light,}\\ V_{\text{dark}}(\text{state}), & \text{if the current player is dark.} \end{cases}$$
$$V_{\text{light}}(\text{state}) = \sum_{\text{roll} \in \text{rolls}} P(\text{roll}) \cdot \max_{\text{move} \in \text{moves}} V(\text{state after roll \& move})$$
$$V_{\text{dark}}(\text{state}) = \sum_{\text{roll} \in \text{rolls}} P(\text{roll}) \cdot \min_{\text{move} \in \text{moves}} V(\text{state after roll \& move})$$

You just run this one simple kernel over the entire state space a bunch of times and hey presto! A solved game. Ain't that pretty.

Now chop the game up for speed!

Beauty is nice, but let's be real. We're often happy to sacrifice it for some speed! And in the Royal Game of Ur, there's one big optimisation that we have available to us: chop the game up into many smaller mini-games.

Performing value iteration on the entire game at once is expensive and inefficient. During the thousand iterations it takes to train our models, only a narrow boundary of states is converging to its true value at any one time. States closer to the winning states have already converged, and states further away don't have good enough information to converge yet.

We'd really love to only update the states that are close to converging in every iteration, to avoid wasting time updating everything else. Luckily, we have one delicious little trick that lets us do just that!

Once you score a piece, you can never unscore it.

Once you score a piece, you can never unscore it. This detail gives us one huge benefit: it allows us to split the Royal Game of Ur into many smaller “mini-games”. Once you score a piece, you progress from one mini-game to the next. This gives us an advantage in training, as each mini-game depends only upon its own states and the states of the next mini-games you could reach.
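
As a sketch of that ordering, you solve the groups from the end of the game backwards. Here, `statesWithTotalScore` and `solveGroup` are hypothetical helpers standing in for the real machinery:

```java
/**
 * Solve the mini-games one-by-one, from the end of the game backwards,
 * assuming the 7-piece Finkel ruleset. A group never depends on groups
 * with fewer scored pieces, so later groups are fully solved before
 * earlier groups need their values.
 */
static void solveByMiniGames() {
    int piecesPerPlayer = 7;
    // At most 13 pieces are scored in a live state, since the game
    // ends as soon as one player scores all 7.
    for (int scored = 2 * piecesPerPlayer - 1; scored >= 0; scored--) {
        int[] group = statesWithTotalScore(scored); // hypothetical helper
        solveGroup(group);                          // value-iterate just this mini-game
    }
}
```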

The groups we organise states into, the “mini-games”, and their dependencies. The groups shown here represent a ruleset restricted to 3 starting pieces.

Instead of training the whole game at once, with all its waste, we are able to train each smaller game one-by-one. And since each of these mini-games is much smaller than the whole game, they can each be trained more efficiently, and with much less waste! This reduces the time it takes to solve the Finkel game from 11 hours to less than 5 hours on my M1 MacBook Pro.

Storing all the win percentages with a custom densely-packed map

One thing we haven't gone over yet, but that is very important to actually solving the game, is optimising how we store all the positions and their associated win percentages.

We need a map, or lookup-table, from every position in the game to the likelihood that you would win starting from that position. For the most common Finkel ruleset, this means creating a map of 275,827,872 unique positions. The Masters ruleset is even larger, with over a billion positions to store.

So how do we build these big maps? We decided our best bet was to overcomplicate the process! We were worried about memory, so we built our own map implementation that is optimised for it.

Our maps are made of densely-packed binary keys and values. Instead of the normal approach of hashing keys to find entries, we sort the keys and use binary search. This allows our maps to be as small as possible, while still being really fast.

Using our overcomplicated map, we are able to keep memory usage down to only 6 bytes per map entry! This means we can store the entire solution for the Finkel ruleset in just 827 MB.
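
A stripped-down sketch of the idea: keys and values live in two parallel arrays, with the keys kept sorted so that lookups are a binary search. At 4 bytes per key and 2 per value, that's the 6 bytes per entry mentioned above. The real RoyalUr-Java maps are more involved, but the principle is the same:

```java
import java.util.Arrays;

/**
 * A densely-packed map sketch: sorted 32-bit keys beside 16-bit
 * values, looked up with binary search instead of hashing. No
 * buckets, no empty slots - just 6 bytes per entry.
 */
final class PackedMapSketch {
    private final int[] keys;     // sorted state keys, 4 bytes each
    private final short[] values; // quantised win percentages, 2 bytes each

    PackedMapSketch(int[] sortedKeys, short[] values) {
        this.keys = sortedKeys;
        this.values = values;
    }

    short get(int key) {
        int index = Arrays.binarySearch(keys, key);
        if (index < 0)
            throw new IllegalArgumentException("Unknown state key: " + key);
        return values[index];
    }
}
```

Keeping the keys sorted would be costly if entries trickled in, but the full set of states is known up front, so the keys only ever need to be sorted once.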

A visualisation of the different memory structures of hash maps and our custom map implementation.

Game states in 32 bits, win percentages in 16

The lookup keys in our map represent each position that you can reach in the game. These positions, which we usually refer to as "states", are snapshots of the game at a fixed point, including whose turn it is, each player's pieces and score, and where all the pieces are on the board. For each of these states, we then store the chance that light wins the game if both players played perfectly from that point onwards.

To use as little memory as possible, we encode the states and their associated win percentages into small binary numbers. The states of the game are encoded into 32 bits, and the win percentages are quantised to 16 bits when we save our final models. The percentages are easy to manage, but getting the game states into 32-bit keys turns out to be a bit tricky…
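
For the percentages, a uniform 16-bit quantisation is enough. Something like this sketch would do it (the exact scheme we use may differ):

```java
/** Quantise a win probability in [0, 1] to 16 bits (a sketch). */
static short quantise(double winChance) {
    // Map [0, 1] onto the 65,536 values a 16-bit integer can hold.
    return (short) Math.round(winChance * 65535.0);
}

/** Recover an approximate win probability from its 16-bit form. */
static double dequantise(short stored) {
    return (stored & 0xFFFF) / 65535.0;
}
```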

If you were to naively pack a Finkel game state into binary, you might end up with 54 bits:

  • 1-bit: Whose turn is it?
  • 1-bit: Has the game been won?
  • 2 x 3-bits: Light player's pieces and score.
  • 2 x 3-bits: Dark player's pieces and score.
  • 20 x 2-bits: Contents of each tile on the board (tiles can be empty, contain a light piece, or contain a dark piece).
A simple, but inefficient, binary format that you could use to encode game states into keys.
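
In code, that naive packing might look like this sketch. The field order is illustrative only, not the format RoyalUr-Java actually uses:

```java
/**
 * Naive 54-bit packing of a Finkel game state: 1 + 1 + 4*3 + 20*2 bits.
 * tiles holds 20 entries, each 0 (empty), 1 (light), or 2 (dark).
 */
static long packNaive(int turn, int won, int lightPieces, int lightScore,
                      int darkPieces, int darkScore, int[] tiles) {
    long key = 0;
    key = (key << 1) | turn;        // 1 bit: whose turn is it?
    key = (key << 1) | won;         // 1 bit: has the game been won?
    key = (key << 3) | lightPieces; // 3 bits: light's pieces left to play
    key = (key << 3) | lightScore;  // 3 bits: light's score
    key = (key << 3) | darkPieces;  // 3 bits: dark's pieces left to play
    key = (key << 3) | darkScore;   // 3 bits: dark's score
    for (int tile : tiles)
        key = (key << 2) | tile;    // 2 bits per tile
    return key;                     // 54 bits in total
}
```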

54-bit keys aren't so bad. We could use standard 64-bit long numbers to store these keys, and it would still be quite fast and memory efficient. But wouldn't it be nice if we could fit it all in 32 bits instead! Luckily, there are some tricks we have up our sleeves to reduce the number of bits we need.

The first thing we can tackle is that some information in the keys is redundant. The number of pieces each player has left to play, and whether the game has been won, can both be calculated from other information we store in the keys. Therefore, we can remove those values and save 7 bits!

Another 12 bits can be axed by changing how we store tiles that can only be reached by one player. We call these the "safe-zone" tiles. Since each can only be accessed by one player, we only need 1 bit to store it: 0 for empty, and 1 for containing a piece. This halves the space we need for these tiles, bringing us down to an overall key size of 35 bits. That's pretty close to our goal of 32!

The safe-zone (S) and war-zone (W) tiles under the Finkel ruleset. The safe-zone tiles are green, and the war-zone tiles are red.

The final 3 bits can be removed by compressing the war-zone tiles. Before compression, each war-zone tile uses 2 bits to store its 3 states (empty, light, or dark). That means there's a fourth state that 2 bits can represent which we never use. In other words, an opportunity for compression!

To compress the war-zone, we list out all legal 16-bit variations that the war-zone could take, and assign them to new, smaller numbers using a little lookup table. This allows us to compress the 16 bits down to 13, bringing us to our 32-bit goal!
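
Building that lookup table is simple: walk every 16-bit pattern, keep the ones where no tile uses the unused fourth state, and number them in order. A sketch:

```java
import java.util.Arrays;

final class WarZoneCompressionSketch {

    /**
     * Maps each legal 16-bit war-zone pattern (8 tiles x 2 bits) to a
     * compact index. There are 3^8 = 6561 legal patterns, which fit in
     * 13 bits since 2^13 = 8192. Illegal patterns map to -1.
     */
    static int[] buildCompressionTable() {
        int[] compressed = new int[1 << 16];
        Arrays.fill(compressed, -1);
        int nextIndex = 0;
        for (int pattern = 0; pattern < (1 << 16); pattern++) {
            if (isLegal(pattern))
                compressed[pattern] = nextIndex++;
        }
        return compressed;
    }

    /** Legal patterns never use the unused fourth tile state, 0b11. */
    static boolean isLegal(int pattern) {
        for (int tile = 0; tile < 8; tile++) {
            if (((pattern >> (2 * tile)) & 0b11) == 0b11)
                return false;
        }
        return true;
    }
}
```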

Ur is symmetrical - symmetrical is Ur

As a cherry on top, we can also remove the turn bit, as the Royal Game of Ur is symmetrical. This is actually a pretty big deal! It means that if you swap the light and dark players, their chance of winning stays the same. This is really helpful, as it means we only have to store and calculate half the game states. That's a really significant saving!
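
As a sketch of how the turn bit disappears: store only light-to-move states, and answer dark-to-move queries by swapping the players first. Here, `GameState`, `encode`, and `swapPlayers` are hypothetical helpers, not the RoyalUr-Java API:

```java
/**
 * Answer queries for both players while only storing light-to-move
 * states. swapPlayers (hypothetical) exchanges the two players'
 * pieces and scores, producing an equivalent light-to-move state.
 */
static double lightWinChance(GameState state, PackedMapSketch map) {
    if (state.isLightTurn())
        return dequantise(map.get(encode(state)));
    // After swapping, the lookup gives the original dark player's
    // chance of winning, so light's chance is one minus that.
    return 1.0 - dequantise(map.get(encode(swapPlayers(state))));
}
```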

The binary format we use for encoding Finkel game states into keys.

We now have a 31-bit encoding that we can use for the keys in our map! Theoretically, the minimum number of bits we'd need to represent all 137,913,936 keys would be 28. So we're a bit off, but I'm pretty happy with getting within 3 bits of perfect!

Other rulesets need more than 32-bits

Unfortunately, when we try to move on to different rulesets, we hit a snag. For the Masters and Blitz rulesets the same techniques can only get us down to 34 bits, due to the longer path around the board. That means we have 2 more bits than we wanted...

But our trick bag isn't empty yet! Instead of doubling our key size to 64-bits, we can instead split the map into 4 sections. The 2 extra bits can then be used to select the section to look in, before we search for the actual entry using the remaining 32-bits. This allows us to fit the half a billion entries for the Masters ruleset into 3 GB instead of 5!
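
A sketch of that section trick, reusing the packed map from earlier: the top 2 of the 34 bits pick a section, and the low 32 bits are searched within it.

```java
/**
 * Split a 34-bit key space across four 32-bit maps, so keys can stay
 * 4 bytes each instead of ballooning to 8.
 */
final class SectionedMapSketch {
    private final PackedMapSketch[] sections; // one map per 2-bit prefix

    SectionedMapSketch(PackedMapSketch[] fourSections) {
        this.sections = fourSections;
    }

    short get(long key34) {
        int section = (int) (key34 >>> 32) & 0b11; // top 2 bits select the section
        int subKey = (int) key34;                  // low 32 bits searched within it
        return sections[section].get(subKey);
    }
}
```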

The binary format we use for encoding Blitz and Masters game states into keys.

And that's a map, folks! There's nothing too special in our implementation, but it works well for our purposes of reducing memory usage and allowing fast reads and writes. Now we get to move on to some real work.

We open-sourced the solved game

You now have all you need to solve the Royal Game of Ur yourself! Although, if you'd rather experiment with the solved game straight away, we have open-sourced a few different implementations for training and using our models.

If you do experiment with the solved game, we'd love to hear what you manage to do! If you're interested, I invite you to join our Discord. We have a channel dedicated to discussing research topics, just like everything I've talked about above.

Now, let's zoom-out from the technical details and discuss how solving the Royal Game of Ur fits into the larger landscape of AI.

Can we solve more games?

It's difficult to find a game that aligns as perfectly with value iteration as the Royal Game of Ur. Its element of chance, non-finite gameplay, and limited number of positions make value iteration the ideal choice for solving it. Value iteration is rarely as ideal when applied to other games.

Some games we'd love to solve, such as Backgammon, have too many positions for value iteration. Ur has ~276 million positions, while Backgammon contains ~100 quintillion... This means that solving Backgammon using value iteration would require so much storage it verges on the unimaginable. You would need nearly 3% of the world's total data storage, or a trillion gigabytes.

Other popular games, such as Connect-Four or Yahtzee, would be inefficient to solve using value iteration. Connect-Four contains no dice, so it is more efficient to solve using search. Yahtzee has its dice, but it is simplified by its inescapable forwards motion. Once you have taken an action in Yahtzee, you can never get back to a state from before you took that action. With no loops, a single backwards pass through the game is enough, so Yahtzee is more efficient to solve by evaluating the game back to front instead.

We estimate that any game with a state-space complexity on the order of 1E+11 or lower could feasibly be solved using value iteration. This rules out using value iteration to solve most complex games, including Backgammon.

The state-space and game-tree complexity metrics of popular games, including the Royal Game of Ur.

This is why, despite solving Ur being a remarkable achievement for the game, it is not a breakthrough for solving more complex games. There may be some games of chance with perfect information that could be elevated to the title of strongly solved using value iteration. But value iteration alone will not allow us to solve complex games like Backgammon.

Yet, we should not discount value iteration entirely! In the tense final moves of a board game, value iteration can still provide a lot of value. Advanced AI systems for games often rely on databases to play flawlessly in endgames, and value iteration is one method that can be used to build those databases. Bearoff databases, for instance, are commonly used for perfect endgame strategies in Backgammon. Similar techniques could also allow flawless play at the end of other games like Ludo or Parcheesi!

What do we use this for?

Solving the game has allowed us to calculate players' accuracy in games, to provide in-depth game reviews, and to let players challenge the perfect player by selecting the Panda as their opponent. This has all had a big impact on how our players play the Royal Game of Ur.

If you're interested to read more about what this means for the Royal Game of Ur, read our blog post about that here.

Screenshot from a game review.

Thanks for reading!

If you'd like to read more about our story of figuring out how to solve the Royal Game of Ur, or what it means for the game, we have also written a blog post on just that. And if you've done anything cool in regards to solving board games, then you might be interested in joining our Discord! We love that sort of thing.

Thanks for reading :)
~ Paddy L.

Would you like to hear about future updates?

Join our newsletter and we'll send you emails when we have new updates or Royal Game of Ur news to share 📮 We also send photos of cool boards! 🖼️