Solving the world's oldest board game

The Royal Game of Ur, a 4500-year-old mystery, is now solved.

Our new AI has not only defeated the world's best - it has surpassed the skill of every player to ever live. And with that, we have changed the Royal Game of Ur forever.

Let me tell you the story of how we uncovered the secrets of this ancient game, and created a ruthless little bot called the Panda in the process.

Our supercharged Panda bot breaking the Royal Game of Ur in two.

A few mornings before Christmas in 2017, I hunched over the table in my parents' house. My dad was asleep on the couch, and I had a pint of coffee in hand. It was one of those sleepy family days that I cherish. I opened up Eclipse and started writing plain HTML, JavaScript, and CSS. I wanted to make an online version of the Royal Game of Ur.

Six years later, I leaned back in my chair and watched Londoners shuffle past my window in the cold. They were on their way to work, and I was working from home in my pyjamas. There had been 401 games played on RoyalUr.net the day before, a slow day. My foot tapping, I would glance outside and then back down to my laptop. I was waiting for the final steps of training to complete.

Eventually, my laptop fans quieted. My foot stopped tapping. I looked down and read a generic message in my terminal: "Write took 1,036ms".

That was the moment we found the optimal move to play from every possible position in the game.

That was the moment the Royal Game of Ur was solved.

...

Creating this website and solving this game have become really meaningful to me. I now spend a lot of my time with friends I've made through the Royal Game of Ur. And solving this ancient game is one of the most significant things I have ever done. I'm also pretty confident that I was the first person to solve it, which is a pretty cool claim!

But I didn't get here on my own, and I definitely don't think of solving the game as only my achievement. No, I may have started the ball rolling and given it its final tap into the net, but there were many more people working to move the ball forward.

You see, when you make a nerdy website like this one, it tends to attract like-minded enthusiasts who are motivated and perhaps also a little nerdy. And that is how our band of nerds - ahem, I mean researchers - came to spend our nights having four-hour-long discussions about the central tile. Or three-hour-long debates on measuring luck. Picking apart the strategy of the Royal Game of Ur became our hobby, our obsession.

We dug into the corners of the Royal Game of Ur, hauled out whatever nuggets of insight we could find, and hammered them into something useful: AI that could play the game better and better every week. We spent three years working on it together.

Then one day, we solved it.

And just like that, we had changed the face of a ~4500-year-old board game forever. For the last four millennia the optimal strategy for the Royal Game of Ur has been a mystery. Now, for the rest of time, it will be solved and accessible to all. How cool is that!!

Photo of an excavated board of the Royal Game of Ur that has been dated back to 2500 BC. © The Trustees of the British Museum.

It's a pretty big deal!

In the Royal Game of Ur, you throw dice to see how far you can move. These dice bring a lot of fun to the game. But they also make it hard to objectively evaluate strategy. You could never really tell whether you got lucky, or whether you played really skillfully. Now you can!

Solving the game has brought your skill in the Royal Game of Ur out of obscurity and into sharp focus.

This is a big deal for a game where luck plays a significant role in who wins each game, especially at higher levels.

I am ecstatic and slightly nervous to finally share what we achieved, what it means for the game, and to provide you with a full report on how you can solve the game yourself! It has been a fun story to live, and hopefully it is fun to read as well.

Watch flawless play

Before we go any further, here's a little live demo. Below, two bots are engaging in a flawless duel - perfect play! I find it mesmerising. How often do you get to watch something and know it is mathematically perfect?

Our flawless bot, the Panda, playing against itself in the Royal Game of Ur using the Finkel ruleset. You can play the perfect bot yourself here.


The breakthrough 🎯

We spent years hiking through the woods to improve our AI for the game. And then we realised there was a highway slightly to our left...

Before solving the game, our bots were simple and search-based. They would simulate millions of moves to figure out the one move they should make. This approach was unsophisticated, and so it took years of fine-tuning to get it to work well.

Throughout the years, we would always give the same codename to our latest and best bot: the Panda!

We optimised the Panda, updated how it searched, and improved its heuristics. We made a lot of progress! Before solving the game, the Panda was playing with an average accuracy of 87%, whilst also running on most of our players' phones. That's only a few percentage points shy of Arthur, one of the strongest Royal Game of Ur players in the world.

However, just as we thought we might narrowly pass Arthur, a new researcher, Jeroen, joined our Discord. Soon after, we had not only beaten Arthur's skill - we had smashed it!

We set the Panda on a completely new training regimen. And only a couple of weeks later, we had done something we didn't even realise was possible. We had calculated the statistically best move to make from every possible position in the game. 100% accuracy. Perfect play.

We had solved the Royal Game of Ur!

That was a bit of a shock to the system.

...

Oh, and we also open-sourced the solution. Programmers, have at it!

The Panda now plays flawlessly

The Panda now ruthlessly cares about just one thing: winning. It does not care about trivial details such as losing pieces, scoring, or capturing. No, it only cares about how each and every move it makes affects its final chance of winning the game.

If sacrificing a piece gave the Panda a 0.01% improved chance of winning, the Panda would play that move. In this way, the Panda plays the Royal Game of Ur flawlessly. There is no strategy that you can use to outplay the Panda. You can only hope to match its skill!

How does it know what is best? It's pretty simple actually. We have calculated your chance of victory from every single position in the game. Once you know that, you can just pick the move that leads to your highest chance of winning. Easy!
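If you like to think in code, here's a minimal sketch of that decision in Python. The helpers (`legal_moves`, `apply`) and the `values` table are hypothetical stand-ins for our real implementation, which also has to track whose perspective each stored value is from:

```python
# Hypothetical helpers: `legal_moves` lists the moves available after a
# roll, `apply` returns the position after playing a move, and `values`
# maps each position to your chance of winning from it.
def choose_move(state, roll, values):
    """Pick the move that leads to your highest chance of winning."""
    return max(legal_moves(state, roll),
               key=lambda move: values[apply(state, roll, move)])
```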

Your winning chance from all positions in the Royal Game of Ur, grouped by how far each player's pieces have moved.

How did we do it?

The method we used to solve the game is actually rather old, even if it is still a youngster compared to the Royal Game of Ur itself. It's called value iteration, and it is one of the most well-known reinforcement learning algorithms around. In fact, similar methods have been used to solve endgames in Backgammon for decades!

But if the algorithm has been around so long, why hasn't someone solved the game sooner?

We were blinded by our incremental progress. Our group had spent so much time looking down the rabbit-hole of search-based AI that we failed to see that there was a simpler solution. We had to take our blinders off!

I had dismissed the idea of solving the game by brute-force for years, because I overestimated its difficulty. The Royal Game of Ur contains hundreds of millions of positions, so calculating your chance of winning from every single one was daunting. Oh boy was I wrong.

Jeroen showed us it was possible

Out of the blue, I received an email. Within it, Jeroen shared how he had solved a two-piece version of the Royal Game of Ur using value iteration. A month later, he joined our Discord and presented his work. It outlined exactly how value iteration could be used to solve the game.

At that point, I knew I had a great week ahead of me. We already had all the libraries we'd need to simulate the game, so all we had left to do was build the training program to solve it.

I spent the next week working in my pyjamas and subsisting on baked beans. Wake up. Classical music. Code. Repeat. One week of that, and the Royal Game of Ur was solved!

It's crazy to me that this was possible the whole time - for the entire three years we spent working on AI for the game.

The Panda chilling, knowing it is the best.

The final model we produced is very simple. It is just a big table of all the positions in the game, and the likelihood that you would win starting from each one. These simple tables have unlocked countless possibilities for the Royal Game of Ur.

I'll dive further into the technical challenges of solving the game later in this article. But first, let me try to illustrate how big of a deal it is that we managed to solve the game!

Our Panda bot now plays flawlessly. We can produce in-depth game reviews, calculate players' accuracy in games, and create a stable competitive rating system. We can even generate pretty graphs of your winning chance throughout a game!

This changes how the Royal Game of Ur is played.

A graph of the light player's winning chance throughout a game of the Royal Game of Ur.

The game has changed 👑

Solving the game has unlocked strategic depth for the Royal Game of Ur. The dice made it hard to determine whether you won or lost due to strategy or luck. No longer.

Using the solved game, we can now perceive the full strategic depth in every move, every decision, and every careful placement of your pieces. You may still be playing the same game that people played four millennia ago. But now we can appreciate the true quality of your play. The dice no longer hide that from us.

This has created a whole new game within a game - a sophisticated and challenging puzzle that asks not merely "Can you win?" but rather "Can you find the perfect answer to each position?"

Many of our best players now aim to play the best possible game, not just to win. This is a fundamental shift in the game. If you lost, but you played with an accuracy over 80%, you still played an outstanding game. This raises the ceiling on how good players can get.

Now our job is to build the tools to help our players reach these new heights!

The average accuracy of players, grouped by the number of games they have played, and the speculated accuracy of future players.

Let the percentages flow like water 🌊

Let us take a moment, breathe, ... and dive into the depths of the algorithm we used to master the Royal Game of Ur: value iteration. I know just saying the word "algorithm" can lead to people's eyes glazing over, but please don't doze off just yet! I promise it's worth it.

I find value iteration breathtakingly elegant, and I have a great desire to express that here!

I ask you to look at value iteration through my eyes. Imagine a dam. Water up high on one side, and low on the other. Artificial. Now explode it! Chaos! Thrashing water. Rubble. Waves of debris. But wait, give the water some more time... The debris will be washed away. The water will start to run clear. Fish will return. The damaged riverbanks will be worn smooth. Birds will start chirping. Eventually, you will have a beautiful and tranquil river.

That is how I see value iteration.

We have set up a dam with your wins on the high side, and your losses on the low side. We then remove the dam wall, and value iteration takes over. All the possible wins and losses in the game cascade through every possible way the game could play out. The probabilities will flow, flow, flow. Until they stop. And at that point, the game is solved.

You can now sail this quiet river of probabilities, and the current will bring you through your best path to victory.

An animation showing the process of calculating the light player's winning chance from every position in the game.

To solve the game, you just have to let the water settle.

...

Perhaps I'm alone in this. But I really find the mathematical perfection and simplicity in value iteration to be profoundly beautiful. Value iteration allows probabilities to flow as if they were water. All you have to do is set up the dam, and it does the rest.

This simple method could actually determine the impact of a butterfly flapping its wings on the result of the game. All through the flow of probabilities. That blows my mind! I will explain exactly how it does that later in this article.

What does it mean to solve a game containing chance? 🎲

The waves of chance are fundamental to playing the Royal Game of Ur. Probabilities play a huge role, and they matter a whole lot when we consider what it means to have solved this game.

Unlike in Chess or Go, when you play Ur you cannot plot a perfect victory in advance. The dice would never allow it. This makes solving the Royal Game of Ur quite different to solving deterministic games like Connect-Four.

In Connect-Four we know that the starting player, if they play perfectly, will always win. This is not the case when you play perfectly in a game where luck is involved.

Playing perfectly in the Royal Game of Ur requires you to constantly adapt to whatever the dice throw your way. Players must balance making the most of good dice rolls, whilst also minimising the downsides of bad dice rolls.

Sometimes a swing for the fences is what you need. Other times, you may be better off playing cautiously. Deciding between these strategies isn't always straightforward, but solving the game allows us to do it perfectly. The solved game tells us exactly when each strategy is better, down to the slimmest of margins. If one move gives you a 0.01% better chance to win over another, we know it!

Playing perfectly in a game like Connect-Four means you'll never lose. Playing perfectly in the Royal Game of Ur means you'll lose fewer games than anyone else. The real challenge of the Royal Game of Ur is not in winning itself - anyone can get lucky. The real challenge is in maximising your odds of winning. And the perfect player is a master of that.

There were fun technical challenges getting here! 🤓

We have actually solved a wide range of rulesets for the Royal Game of Ur, not just the most common Finkel ruleset. This, combined with the difficulty of creating, storing, and updating hundreds of millions of game positions, poses a few fun technical challenges!

We needed to 1) devise an efficient way to store all those positions, and 2) write and optimise the algorithm that calculates your winning chance from each position. That's what we'll get into next.

I dive into a lot of technical detail in this section. If that's not your thing, it might be worth jumping ahead to where we discuss solving other games.

To everyone else interested in the technical nitty-gritty, you're my type of person! Let's get into it.

We will dive into the map we use to store positions, the binary encoding of positions to use as keys, and all the little map optimisations and quirks we use. We'll get into value iteration itself, and how we can optimise it for the Royal Game of Ur. Then we'll finish it all off with links to all our open-source projects if you want to give solving the game a try yourself!

We made a custom densely-packed map

Our goal was to create a map, or lookup-table, from every position in the game to the likelihood that you would win starting from that position. For the most common Finkel ruleset, this means creating a map of 275,827,872 unique positions. The Masters ruleset is even larger, with over a billion positions to store.

So how do we build these big maps? We decided our best bet was to overcomplicate the process! We were worried about memory, so we built our own map implementation that is optimised for it.

Our maps are made of densely-packed binary keys and values. Instead of the normal approach of hashing keys to find entries, we sort the keys and use binary search. This allows our maps to be as small as possible, while still being really fast.

Using our overcomplicated map, we are able to keep memory usage down to just 6 bytes per map entry!
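To make that concrete, here's a minimal sketch of the idea in Python with NumPy. The class and its details are illustrative rather than our real implementation:

```python
import numpy as np

class DenseValueMap:
    """Sorted 32-bit keys with parallel 16-bit values: 6 bytes per entry.

    Lookups use binary search over the sorted keys instead of hashing,
    so there are no buckets, pointers, or empty slots wasting memory.
    """

    def __init__(self, keys, values):
        keys = np.asarray(keys, dtype=np.uint32)      # 4 bytes per key
        values = np.asarray(values, dtype=np.uint16)  # 2 bytes per value
        order = np.argsort(keys)
        self.keys, self.values = keys[order], values[order]

    def get(self, key):
        index = np.searchsorted(self.keys, key)
        if index == len(self.keys) or self.keys[index] != key:
            raise KeyError(key)
        return int(self.values[index])

# Example usage:
win_chances = DenseValueMap(keys=[7, 3, 5], values=[100, 200, 300])
print(win_chances.get(5))  # -> 300
```

Binary search makes each lookup logarithmic rather than a hash map's constant time, but for our purposes the memory savings are well worth it.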

A visualisation of the different memory structures of hash maps and our custom map implementation.

Game states in 32-bits, win percentages in 16

The lookup keys in our map represent each position that you can reach in the game. These positions, which we usually refer to as "states", are snapshots of the game at a fixed point, including whose turn it is, each player's pieces and score, and where all the pieces are on the board. For each of these states, we then store the chance that light wins the game if both players played perfectly from that point onwards.

To use as little memory as possible, we encode the states and their associated win percentages into small binary numbers. The states of the game are encoded into 32-bits, and the win percentages are quantised to 16-bits when we save our final models. The percentages are easy to manage, but getting the game states into 32-bit keys turns out to be a bit tricky…
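For the percentages, a quantisation sketch could be as simple as scaling into the 16-bit range (our exact scheme may differ):

```python
def quantise(win_chance: float) -> int:
    """Pack a winning chance in [0, 1] into 16 bits (0..65535)."""
    return round(win_chance * 0xFFFF)

def dequantise(encoded: int) -> float:
    """Recover the winning chance, accurate to within ~0.001%."""
    return encoded / 0xFFFF
```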

If you were to naively pack a Finkel game state into binary, you might end up with 54-bits:

  • 1-bit: Whose turn is it?
  • 1-bit: Has the game been won?
  • 2 x 3-bits: Light player's pieces and score.
  • 2 x 3-bits: Dark player's pieces and score.
  • 20 x 2-bits: Contents of each tile on the board (tiles can be empty, contain a light piece, or contain a dark piece).
A simple, but inefficient, binary format that you could use to encode game states into keys - 22 bits past our goal of 32.

54-bit keys aren't so bad. We could use standard 64-bit long numbers to store these keys, and it would still be quite fast and memory efficient. But wouldn't it be nice if we could fit it all in 32 bits instead! Luckily, we have some tricks up our sleeves to reduce the number of bits we need.

The first thing we can tackle is that some information in the keys is redundant. The number of pieces players have left to play, and whether the game has been won, can both be calculated from other information we store in the keys. Therefore, we can remove those values and save 7 bits!

Another 12 bits can also be axed by changing how we store tiles that can only be reached by one player. We call these tiles the "safe-zone" tiles. Since they can only be accessed by one player, we only need 1-bit to store each: 0 for empty, and 1 for containing a piece. This halves the space we need to store these tiles, bringing us down to an overall key-size of 35 bits. That's pretty close to our goal of 32!

The safe-zone (S) and war-zone (W) tiles under the Finkel ruleset. The safe-zone tiles are green, and the war-zone tiles are red.

The final 3 bits can be removed by compressing the war-zone tiles. Before compression, each war-zone tile uses 2-bits to store its 3 states (empty, light, or dark). That means there's an extra 4th state that 2-bits can represent, which we never use. In other words, an opportunity for compression!

To compress the war-zone, we list out all legal 16-bit variations that the war-zone could take, and assign them to new, smaller numbers using a little lookup table. This allows us to compress the 16 bits down to 13, bringing us to our 32-bit goal!
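Here's a sketch of how such a lookup table could be built. The Finkel war-zone has 8 tiles, so there are 3^8 = 6561 legal patterns, which comfortably fit in 13 bits (2^13 = 8192). Our real table construction may differ in its details:

```python
compress = {}    # legal 16-bit war-zone pattern -> small 13-bit index
decompress = []  # small 13-bit index -> legal 16-bit war-zone pattern

for pattern in range(1 << 16):
    tiles = [(pattern >> (2 * i)) & 0b11 for i in range(8)]
    if all(tile != 0b11 for tile in tiles):  # the 4th state is never legal
        compress[pattern] = len(decompress)
        decompress.append(pattern)

assert len(decompress) == 3 ** 8      # 6561 legal patterns
assert len(decompress) <= (1 << 13)   # they all fit in 13 bits
```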

Ur is symmetrical - symmetrical is Ur

As a cherry on top, we can also remove the turn bit, as the Royal Game of Ur is symmetrical. This is actually a pretty big deal! It means that if you swap the light and dark players, their chance of winning stays the same. This is really helpful, as it means we only have to store and calculate half the game states. That's a significant saving!

The binary format we use for encoding Finkel game states into keys.

We now have a 31-bit encoding that we can use for the keys in our map! Theoretically, the minimum number of bits we'd need to represent all 137,913,936 keys would be 28. So we're a bit off, but I'm pretty happy with getting within 3 bits of perfect!

Other rulesets need more than 32-bits

Unfortunately, when we try to move on to different rulesets, we hit a snag. For the Masters and Blitz rulesets the same techniques can only get us down to 34 bits, due to the longer path around the board. That means we have 2 more bits than we wanted...

But our trick bag isn't empty yet! Instead of doubling our key size to 64-bits, we can instead split the map into 4 sections. The 2 extra bits can then be used to select the section to look in, before we search for the actual entry using the remaining 32-bits. This allows us to fit the half a billion entries for the Masters ruleset into 3 GB instead of 5!
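A lookup then becomes a two-step affair. Here's a sketch with illustrative names, where `sections` holds four of the maps described earlier:

```python
def lookup(key_34_bits, sections):
    """Look up a 34-bit key across four maps of 32-bit keys."""
    section = key_34_bits >> 32        # top 2 bits pick one of 4 sub-maps
    key_32 = key_34_bits & 0xFFFFFFFF  # the remaining 32 bits are the key
    return sections[section].get(key_32)
```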

The binary format we use for encoding Blitz and Masters game states into keys.

And that's a map, folks! There's nothing too special in our implementation, but it works well for our purposes of reducing memory usage and allowing fast reads and writes. Now we get to move on to some real work.

Value iteration, for real this time

Enough with the poetic descriptions of this algorithm. We want the nuts and bolts!

Here's the kernel of value iteration in a couple of equations. The value we are calculating is the chance of light winning, so naturally light wants to maximise value, and dark wants to minimise it.

$$
V(\text{state}) = \begin{cases} V_{\text{light}}(\text{state}), & \text{if the current player is light},\\ V_{\text{dark}}(\text{state}), & \text{if the current player is dark}. \end{cases}
$$

$$
V_{\text{light}}(\text{state}) = \sum_{\text{roll} \in \text{rolls}} \text{P(roll)} \cdot \max_{\text{move} \in \text{moves}} \Bigl(V(\text{state after roll \& move})\Bigr)
$$

$$
V_{\text{dark}}(\text{state}) = \sum_{\text{roll} \in \text{rolls}} \text{P(roll)} \cdot \min_{\text{move} \in \text{moves}} \Bigl(V(\text{state after roll \& move})\Bigr)
$$

You iteratively run this operation over every non-winning state, updating your state map, until the maximum amount that a state changes in value is below a threshold. You read old values of V(state) from the map, and store new values back to the map. If you try to get recursive, you're going to have a bad time with infinite loops, so make sure to avoid that!

...

Okay... but what are these equations actually doing? They are updating the estimate of light's winning chance in one state based upon the estimates of light's winning chance in other states. If you do this for all the states enough times, they will eventually converge to light's actual winning chance (assuming both players are playing perfectly). Let me break it down a bit further!

I will walk you through the following steps of value iteration: setting up the state map, estimating your chance of winning in one state based upon other states, doing that lots of times for all the states, and how you consider what moves players would make.

Another detail for those familiar with value iteration: In our use of value iteration, we will not give value to actions. Instead, we will give value to states where a player has won. This is an important distinction! When you use value iteration by giving value to actions, you will probably also need to use a discount factor and your results will be imperfect. By assigning value to the win states directly, we don't have to worry about this.

The Setup

First, let's make sure to give the win states their values. The initial values of most states in the map don't matter that much, but the values of states where light or dark wins are vitally important. These states should be set to 100% if light wins, or 0% if light loses. These are the values that we will propagate through the entire state space.
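Sketched in code (with hypothetical `states`, `values`, and state attributes), the setup is just:

```python
# Illustrative setup: fix the win states before iterating.
for state in states:
    if state.is_won:
        # These fixed values will propagate through the entire state space.
        values[state] = 1.0 if state.winner_is_light else 0.0
    else:
        values[state] = 0.5  # Initial guesses barely matter.
```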

The values within the state map before running value iteration.

Estimating Win Percentages

Now comes the core idea of value iteration: propagating wins and losses backwards.

The basic idea is this: If you know how good the next positions you might reach are, then you can calculate how good your current position is. For example, if the positions you can reach after each roll are great, then your current position is also great! And now that you know how good your current position is, you can use that to calculate how good the positions before it are, and the positions before those, and so on.

This "backwards pass" is implemented using a simple weighted average. Your chance of winning in one state is equal to the weighted average of your chance of winning in all the states you could reach after each roll of the dice.

$$
V(\text{state}) = \sum_{\text{roll} \in \text{rolls}} \text{P(roll)} \cdot V(\text{state after roll \& best move})
$$

This simple operation acts as the means by which wins and losses cascade through the state space.
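Here's a sketch of that update for a single state. The helpers (`roll_probabilities`, `legal_moves`, `apply`) are hypothetical, the min/max move selection is explained in a moment, and real code must also handle rolls that leave you with no legal moves:

```python
def update_state(state, values):
    """Re-estimate light's winning chance in one state as the weighted
    average, over rolls, of the value after the roll's best move."""
    total = 0.0
    for roll, probability in roll_probabilities():
        outcomes = [values[apply(state, roll, move)]
                    for move in legal_moves(state, roll)]
        # Light picks the move that maximises light's winning chance,
        # while dark picks the move that minimises it (explained below).
        best = max(outcomes) if state.is_light_turn else min(outcomes)
        total += probability * best
    return total
```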

Deriving your winning chance based on the states you could reach after each roll.

Now do it again, and again, and again...

So, can we just do one backwards pass through the Royal Game of Ur, calculate your chance of winning from each state, and call it a day? Unfortunately, it's not so simple. One single pass backwards through all the states is not quite enough to solve the Royal Game of Ur. Why? Loops. When you reach a state in the Royal Game of Ur, you can play a few moves and then get back to the same state as before you played them.

These loops make a single backwards pass insufficient: when you use some states to calculate the winning chance from another state, their win percentages may still be incorrect. They are incorrect because the state you just calculated can influence the very states you used in the calculation - a circular dependency!

A diagram showing how a capture of a piece can lead to a loop in the states that are visited in a game.

So, how do we solve this problem? We just do lots of backwards passes! Every time you do a backwards pass the estimates of the win percentages get better and better until eventually they barely change at all. For our models, we stopped the backwards passes when the maximum change made to any state in the state map was less than 0.0001% after running through all the states.

The animation below shows these backwards passes as they happen. This animation is particularly chaotic, but it doesn't actually need to be. It is chaotic because we edit the map in-place, and because of the order in which we loop through all the states to update them. If we used a copy of the map, or if we used a different ordering, then value iteration might converge faster and look more calm. But the animation would look much more boring. So, I like this chaotic version better!

An animation showing the process of calculating the light player's winning chance from every position in the game.

But wait, what moves do players make after each roll?

This is a crucial question, but we've only skimmed over it. In reality, the best strategy to win depends heavily on what your opponent does. However, we can't predict every individual player's quirks and tendencies. Instead, we assume that both players play optimally, making the best possible decision at every turn.

By doing this, we can calculate the Nash Equilibrium. That is, we calculate a strategy where neither player can improve their chance of winning by changing their moves. A strategy that is impossible to beat. But also a strategy that does not necessarily notice and exploit the weaknesses in your opponent's play. This is a tradeoff we make.

To calculate the Nash Equilibrium, it turns out that a simple greedy approach is all you need. You just pick the move that would lead to the highest winning chance for the current player. This is the same strategy that the Panda uses to play the game flawlessly. Even though the win percentages you are using to make this decision might be incorrect at first, this is still good enough to let the win percentages converge over time.

Within our value iteration kernel, the selection of the optimal move for each player is represented by the maximum or minimum over the value of their possible moves.

$$
V_{\text{light}}(\text{state}, \text{roll}) = \max_{\text{move} \in \text{moves}} \Bigl(V(\text{state after roll \& move})\Bigr)
$$

$$
V_{\text{dark}}(\text{state}, \text{roll}) = \min_{\text{move} \in \text{moves}} \Bigl(V(\text{state after roll \& move})\Bigr)
$$

Putting it all together

So that's all there is to it!

We use a simple weighted average to calculate your chance of winning in each state based upon your chance of winning in other states (the sum over rolls, weighted by the probability of each roll). We assume that light and dark will make the move that is best for them (the min/max over the available moves). And we do that lots of times until it converges!

$$
V(\text{state}) = \begin{cases} V_{\text{light}}(\text{state}), & \text{if the current player is light},\\ V_{\text{dark}}(\text{state}), & \text{if the current player is dark}. \end{cases}
$$

$$
V_{\text{light}}(\text{state}) = \sum_{\text{roll} \in \text{rolls}} \text{P(roll)} \cdot \max_{\text{move} \in \text{moves}} \Bigl(V(\text{state after roll \& move})\Bigr)
$$

$$
V_{\text{dark}}(\text{state}) = \sum_{\text{roll} \in \text{rolls}} \text{P(roll)} \cdot \min_{\text{move} \in \text{moves}} \Bigl(V(\text{state after roll \& move})\Bigr)
$$

You just run this one simple kernel over the entire state space a bunch of times and hey presto! A solved game. I find that beautiful.
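Put into code, the whole solver is a loop around the `update_state` sketch from earlier (names still hypothetical):

```python
def solve(states, values):
    """Run backwards passes until every win percentage has converged."""
    while True:
        max_change = 0.0
        for state in states:
            if state.is_won:
                continue  # Win states keep their fixed 0%/100% values.
            new_value = update_state(state, values)
            max_change = max(max_change, abs(new_value - values[state]))
            values[state] = new_value  # Edit the map in-place, as we did.
        if max_change < 0.000001:  # Our 0.0001% stopping threshold.
            return values
```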

Now chop the game up for speed!

Beauty is nice, but let's be real. We're often happy to sacrifice it for some speed! And in the Royal Game of Ur, there's one big optimisation that we have available to us: chop the game up into many smaller mini-games.

Performing value iteration on the entire game at once is expensive, and inefficient. During the thousand iterations it takes to train our models, there is really only a narrow boundary of states that are converging to their true value at a time. States closer to the winning states have already converged, and states further away don't have good enough information to converge yet.

We'd really love to only update the states that are close to converging in every iteration, to avoid wasting time updating everything else. Luckily, we have one delicious little trick that lets us do just that!

Once you score a piece, you can never unscore it.

Once you score a piece, you can never unscore it. This detail gives us one huge benefit: it allows us to split the Royal Game of Ur into many smaller "mini-games". Once you score a piece, you progress from one mini-game to the next. This gives us an advantage in training, as each mini-game depends only upon its own states and the states of the next mini-games you could reach.

The groups we organise states into, the "mini-games", and their dependencies. The groups shown here represent a ruleset restricted to 3 starting pieces.

Instead of training the whole game at once, with all its waste, we are able to train each smaller game one-by-one. And since each of these mini-games is much smaller than the whole game, they can each be trained more efficiently, and with much less waste! This reduces the time it takes to solve the Finkel game from 11 hours to less than 5 hours on my M1 MacBook Pro.
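Sketched in code, with the grouping key and helpers still hypothetical, the mini-game ordering looks something like this:

```python
from collections import defaultdict

def solve_in_minigames(states, values):
    """Solve each mini-game one-by-one, from the end of the game backwards."""
    groups = defaultdict(list)
    for state in states:
        groups[state.light_score + state.dark_score].append(state)

    # States with the most scored pieces sit closest to the win states,
    # and each group depends only on itself and on already-solved groups.
    for total_score in sorted(groups, reverse=True):
        solve(groups[total_score], values)  # the loop from the last sketch
    return values
```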

We open-sourced the solved game

You now have all you need to solve the Royal Game of Ur yourself! Although, if you'd rather experiment with the solved game straight away, we have open-sourced a few different implementations for training and using our models.

If you do experiment with the solved game, we'd love to hear what you manage to do! If you're interested, I invite you to join our Discord. We have a channel dedicated to discussing research topics, just like everything I've talked about above.

Now, let's zoom-out from the technical details and discuss how solving the Royal Game of Ur fits into the larger landscape of AI.

Can we solve more games? 🔮

It's difficult to find a game that aligns as perfectly with value iteration as the Royal Game of Ur. With its element of chance, non-finite gameplay, and limited number of positions, value iteration is the ideal choice for solving the game. Value iteration is rarely as ideal when applied to other games.

Some games we'd love to solve, such as Backgammon, have too many positions for value iteration. Ur has ~276 million positions, while Backgammon contains ~100 quintillion... This means that solving Backgammon using value iteration would require so much storage it verges on the unimaginable. You would need nearly 3% of the world's total data storage, or a trillion gigabytes.

Other popular games, such as Connect-Four or Yahtzee, would be inefficient to solve using value iteration. Connect-Four contains no dice, so it is more efficient to solve using search. Yahtzee has its dice, but it is simplified by its inescapable forwards motion. Once you have made an action in Yahtzee, you can never get back to a state before you took that action. With no loops, a single backwards pass through the states is all you need, which makes Yahtzee more efficient to solve by evaluating the game back to front instead.

We estimate that any game with a state-space complexity on the order of 1E+11 or lower could feasibly be solved using value iteration. This cuts out using value iteration to solve most complex games, including Backgammon.

The state-space and game-tree complexity metrics of popular games, including the Royal Game of Ur.

This is why, despite solving Ur being a remarkable achievement for the game, it is not a breakthrough for solving more complex games. There may be some games of chance with perfect information that could be elevated to the title of strongly solved using value iteration. But value iteration alone will not allow us to solve complex games like Backgammon.

Yet, we should not discount value iteration entirely! In the tense final moves of a board game, value iteration can still provide a lot of value. Advanced AI systems for games often rely on databases to play flawlessly in endgames, and value iteration is one method that can be used to build those databases. Bearoff databases, for instance, are commonly used for perfect endgame strategies in Backgammon. Similar techniques could also allow flawless play at the end of other games like Ludo or Parcheesi.

You can defeat perfection! 🐼

You can now play the perfect player right here on RoyalUr by selecting the Panda as your opponent. The Panda is a ruthless opponent, but with some skill and a lot of luck, it is possible to win! If you'd like to see if you can beat the perfect AI yourself, you can try playing the Panda here.

Solving the game has also allowed us to release another major feature that we've been working on for a very long time: game review! Game review employs the perfect player to analyse your moves, highlight your best moments, and suggest improvements.

In other words, we've created a computer teacher to help you learn the game. I am truly proud of it, and we will continue working to make it the best teacher it can be. If you'd like to give it a try, you can access an example game review here.

Screenshot from a game review.

This was not a solo effort 🌍

Let me illustrate this with a vastly shortened list of the people who contributed to solving the game. There are also many more people who contributed through the research discussions we have held since 2021!

  • Jeroen Olieslagers. The problem-solving master. Jeroen was the first to prove the feasibility of solving the Royal Game of Ur using value iteration by solving the 2-piece game. I wouldn't be writing this article if it weren't for Jeroen joining our Discord a year ago!
  • Raphaël Côté. The man that gets stuff done. Raph is the reason that you can load the perfect-play maps in Python. He is also the person that brings us all together to research the Royal Game of Ur!
  • Mr. Carrot. The catalyst for critical thinking. Mr. Carrot came up with the idea that the game is symmetrical, which halved the memory requirements of our models.
  • Padraig Lamont. I created this website, RoyalUr.net, back in 2017, and our Discord community in 2021. This is also me writing this, hello!

Join us! 🚀

Solving the Royal Game of Ur isn't an endpoint - it's a launchpad. As this ancient game moves into the modern age, people are creating beautiful boards, organising enthralling tournaments, and fostering the most welcoming communities you'll ever see. The modern era of the Royal Game of Ur is quickly marching into reality.

Join us! If you are interested in history, probability, AI, or if you enjoy a fun casual game with friends, you will probably like the Royal Game of Ur. Join our community, try your hand against our perfect AI, and who knows? You might just defeat perfection.

Head to our homepage to start your journey with the Royal Game of Ur. I look forward to playing you some time :D

...

Wow. That was a long one... I have never ever spent so much time on a single piece of writing, not even for my thesis. Hundreds of hours writing, rewriting, scrapping, and starting again.

A massive thank you to Jeroen for helping to write and correct this post, and also to my family, whom I constantly pestered to make sure my writing was clear. It is a strange mix of relief and melancholy to finally be done with it.

Now I need to go and find something else to spend all my time on... like maybe Mancala?

Thanks :)
~ Paddy L.

Would you like to hear about future updates?

Join our newsletter and we'll send you emails when we have new updates or Royal Game of Ur news to share 📮 We also send photos of cool boards! 🖼️