This is one of those topics that LLMs (Opus 4, Gemini 2.5 pro, etc) seem bad at explaining.
I was trying to figure out the difference between the Stockfish approach (minimax, alpha-beta pruning) and the AlphaZero / Leela Chess Zero approach (MCTS). My very crude understanding is that Stockfish has a very light & fast neural net and goes for a very thorough search. Meanwhile, in MCTS (which I don't really understand at this point), you evaluate the neural net, sample some paths based on its output (similar to minimax), and then pick the path you sampled the most. There's also the training vs eval aspect to it. Would love a better explanation.
What you actually do is model every node (a game state) as a multi-armed bandit. Moves are levers and the final game results are payoffs.
So you basically keep a tree of multi-armed bandits and adjust it after each (semi-)random game, perhaps adding some nodes, for example the first node the game visited which is not yet in your move tree.
For the random game you pick the next node to maximise long-term payoff. The exploration/exploitation tradeoff applies here: usually you pick a move which gave a good win ratio on previous plays, but not always, because sometimes you try a less-visited move instead (exploration).
And obviously this only applies to the first part of the game, which is still in the memorized tree. After that it's random.
This alone does converge to optimal play, but sometimes impractically slowly. Here's where the neural network comes in: in every new node, assign the weights not uniformly but as directed by the NN, which seeks out promising moves and greatly speeds up the convergence.
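To make the bandit-tree description above concrete, here is a minimal sketch of plain MCTS with UCB1 (no neural network). The game is an assumption chosen for brevity: single-pile Nim, take 1 to 3 stones, whoever takes the last stone wins. All names (`Node`, `ucb1`, `rollout`, `mcts`) and the exploration constant `c=1.4` are illustrative, not taken from any engine.

```python
import math
import random

class Node:
    def __init__(self, stones):
        self.stones = stones   # stones left in this position
        self.children = {}     # move -> Node (the "levers" tried so far)
        self.visits = 0
        self.wins = 0.0        # wins for the player who moved INTO this node

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def ucb1(parent, child, c=1.4):
    # Win ratio (exploitation) plus a bonus for rarely tried moves (exploration).
    return child.wins / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)

def rollout(stones, rng):
    """Random playout; True if the player to move at the start takes the last stone."""
    starter_to_move = True
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return starter_to_move
        starter_to_move = not starter_to_move
    return False  # no stones left at the start: the player to move already lost

def mcts(root_stones, iterations=3000, seed=0):
    rng = random.Random(seed)
    root = Node(root_stones)
    for _ in range(iterations):
        node, path = root, [root]
        # 1. Selection: follow UCB1 while we are inside the memorized tree.
        while node.stones > 0 and len(node.children) == len(legal_moves(node.stones)):
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb1(parent, ch))
            path.append(node)
        # 2. Expansion: add the first position that is not yet in the tree.
        if node.stones > 0:
            untried = [m for m in legal_moves(node.stones) if m not in node.children]
            move = rng.choice(untried)
            node.children[move] = Node(node.stones - move)
            node = node.children[move]
            path.append(node)
        # 3. Simulation: the rest of the game is random.
        mover_wins = not rollout(node.stones, rng)
        # 4. Backpropagation: flip the result at every level (players alternate).
        for n in reversed(path):
            n.visits += 1
            n.wins += 1.0 if mover_wins else 0.0
            mover_wins = not mover_wins
    # Play the move that was sampled the most.
    return max(root.children, key=lambda m: root.children[m].visits)
```

In this toy game the losing positions are the multiples of 4, so from 5 stones the most-visited move should be taking 1 stone. The neural network variant replaces the uniform treatment of new moves (and usually the random rollout) with the network's suggestions.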
In old-fashioned AI, it was generally believed that the best way to spend resources was to exactly evaluate as much of the search tree as possible. To that end, you should use lightweight heuristics to guide the search in promising directions and optimizations like alpha-beta pruning to eliminate useless parts of the search space. For finite games of perfect information like chess this is hard to beat when the search is deep enough. (If you could evaluate the whole game tree from the start, you could always make optimal moves.) Stockfish follows this approach and provides ample evidence of the strength of this strategy.
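As a rough illustration of why alpha-beta counts as "lossless" pruning, here is a small sketch that compares it with plain minimax. The game is an assumption picked to keep things tiny (single-pile Nim, take 1 to 3 stones, last stone wins), and the `stats` counter is only there to show how many nodes each search visits. Real engines add move ordering, depth limits and a heuristic eval, none of which appear here.

```python
import math

def minimax(stones, maximizing, stats):
    """Exact game value: +1 if the maximizing player wins, -1 otherwise."""
    stats[0] += 1
    if stones == 0:
        # The previous mover took the last stone, so the side to move has lost.
        return -1 if maximizing else 1
    values = [minimax(stones - m, not maximizing, stats)
              for m in (1, 2, 3) if m <= stones]
    return max(values) if maximizing else min(values)

def alphabeta(stones, maximizing, stats, alpha=-math.inf, beta=math.inf):
    """Same value as minimax, but skips branches that cannot change the result."""
    stats[0] += 1
    if stones == 0:
        return -1 if maximizing else 1
    best = -math.inf if maximizing else math.inf
    for m in (1, 2, 3):
        if m > stones:
            break
        value = alphabeta(stones - m, not maximizing, stats, alpha, beta)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            break  # the opponent already has a better option elsewhere
    return best

plain_nodes, pruned_nodes = [0], [0]
plain_value = minimax(10, True, plain_nodes)
pruned_value = alphabeta(10, True, pruned_nodes)
```

Both searches return the same value for the root, but `pruned_nodes[0]` ends up smaller than `plain_nodes[0]`. That is the whole point: the pruned branches provably could not have changed the answer.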
Perhaps a bit flippantly, you can think of MCTS as “vibe search”, but more accurately it’s a sampling-based search. The basic theory is that we can summarize the information we’ve obtained to estimate our belief in the “goodness” of every possible move and (crucially) our confidence in that belief. Then we allocate search time to the branches that look best once that confidence is taken into account: mostly moves we believe are good, plus some that are too uncertain to rule out yet.
In this way MCTS iteratively constructs an explicit search tree for the game with associated statistics that is used to guide decisions during play. The neural network does a “vibe check” on each new position in the tree for the initial estimate of “goodness” and then the search process refines that estimate. (Ask the NN to guess at the current position; then play a bunch of simulations to make sure it doesn’t lead to obvious blunders.)
I think there was some debate on this, actually. I did a lot of research on the subject in the late 2010s and it seems like there were those who felt like limiting the branching factor was the goal, while others felt like fast eval to guide search in order to prune the tree was better.
For what it’s worth, “prune the tree” is still the winningest strategy. MCTS in AlphaGo/AlphaZero scored some wins when they came out, but eventually Stockfish invented the efficiently updatable neural network that now guides their search & it’s much stronger than any MCTS agent.
I suspect you are talking about a time a few decades after the one I am talking about. Many of the earliest chess programs used lossy pruning (Shannon Type B engines), under the assumption that the static evaluation at some node could be bad enough on its own to say "don't look down this branch anymore". But that pruning was not provably correct the way alpha-beta is. Shannon's paper explains a lot more about this. In the late 1940s some of these programs were being run on pen and paper.
For what it's worth, Stockfish didn't invent efficiently updatable neural networks, Yu Nasu did. Hisayori Noda ported it to Western chess and Stockfish. NNUE is really neat.
The first non-trivial chess programs were 'playing' in the late 40s (with pen-and-paper CPUs). Some of these include features you'll still see today.
Claude Shannon (https://www.chessprogramming.org/Claude_Shannon) proposed two types of chess programs: brute-force (Type A) and selective (Type B). Alpha-beta is an optimization for brute-force searchers, but many chess programs were selective searchers with heavyweight eval, or with delayed eval.
Champernowne (Turing's partner) mentions this about Turochamp: "We were particularly keen on the idea that whereas certain moves would be scorned as pointless and pursued no further others would be followed quite a long way down certain paths."
AlphaGo Zero is: assume you have a neural network that, given a board position, will answer: what's win probability, and how interesting is each move from here.
You use the followup moves as places to search down. It's a multi-armed bandit problem choosing which move(s) to explore down, but for simplicity in explanation you can just say: maybe just search the top few, vaguely in proportion to how interesting they are (the number the net gave you, updated if you find any surprises).
To search down further, you just play that move and then ask the network for the winrate (and followup moves) again. If there are any surprises, you can update upwards to say "hey this is better than expected!" or whatever.
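Here is a minimal sketch of the "search in proportion to how interesting the move is, then update upwards" idea, using a PUCT-style score of the kind used in AlphaZero-like engines. The priors are hard-coded stand-ins for a network's output, and `c_puct=1.5`, `Node`, `select_move` and `backup` are hypothetical names and values for illustration only.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    prior: float               # "how interesting" the net thought this move was
    visits: int = 0
    value_sum: float = 0.0     # summed values, from the view of whoever moved here
    children: dict = field(default_factory=dict)

    def q(self):
        # Mean value so far; unvisited moves start at 0.
        return self.value_sum / self.visits if self.visits else 0.0

def puct_score(parent, child, c_puct=1.5):
    # Exploration bonus: large for high-prior moves, shrinking as visits pile up.
    bonus = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.q() + bonus

def select_move(parent):
    return max(parent.children, key=lambda m: puct_score(parent, parent.children[m]))

def backup(path, leaf_value):
    """Propagate the net's value upwards. leaf_value is from the view of the
    player to move at the leaf, so the sign flips at every level."""
    value = -leaf_value
    for node in reversed(path):
        node.visits += 1
        node.value_sum += value
        value = -value

# Toy root with three candidate moves and made-up priors.
root = Node(prior=1.0, visits=1)
root.children = {"a": Node(0.6), "b": Node(0.3), "c": Node(0.1)}

first = select_move(root)                # nothing searched yet: follow the prior
backup([root, root.children[first]], leaf_value=0.9)  # surprise: good for the opponent
second = select_move(root)               # the search now looks elsewhere
```

With these made-up numbers the first pick is `"a"` (highest prior). After the surprise is backed up, `"a"` has a poor average value, so the next simulation goes to `"b"`.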
The key thing for training this network: spending computation from an existing network gives you better training data to train that same network. So you can start from scratch and use reinforcement learning to improve it without bound.
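One way to picture "search gives you better training data": in AlphaGo Zero style training, the visit counts from the search become the policy target, and the final game result becomes the value target. A minimal sketch follows, assuming visit counts keyed by move and at least one visit in total; `policy_target`, the `temperature` parameter name and the example tuple layout are illustrative choices.

```python
def policy_target(visit_counts, temperature=1.0):
    """Turn search visit counts into a probability distribution over moves.

    Lower temperatures sharpen the distribution towards the most-visited move.
    """
    weights = {move: count ** (1.0 / temperature)
               for move, count in visit_counts.items()}
    total = sum(weights.values())
    return {move: w / total for move, w in weights.items()}

# Made-up visit counts for one position after a search.
visits = {"a": 6, "b": 3, "c": 1}
pi = policy_target(visits)

# One training example: (position, search-improved policy, game outcome).
# The position string and the outcome are placeholders for illustration.
example = ("some position", pi, +1)
```

The network is then trained to predict `pi` and the outcome from the position. Since the search is usually a bit stronger than the raw network, those targets pull the network in the right direction.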
Cool! I'd love to tinker with this and see about adapting it to other perfect information games. If you have any suggestions (or warnings) before I do this, please let me know!
Again, I didn't write this, but in general, to take a chess engine and apply it to another game, the main things you'd have to change are the board representation and the neural net, which you'd have to retrain (and likely redesign as well). The tree search should work assuming the game you're going to is also a perfect information, minimax game, though it could also work for other games. There's a good chance there's prior work on applying bitboards (a board representation) to whichever game that is. The Chess Programming Wiki (chessprogramming.org) is an invaluable resource for information about how engines like this work. Godspeed.
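For a feel of what a bitboard representation looks like in a different game, here is a small sketch for tic-tac-toe (an assumption chosen because it fits in nine bits; this is not how any particular engine does it). Each player's pieces are one integer, with bit `i` set when that player occupies square `i`, numbered 0 to 8 row by row.

```python
# Every winning line as a 9-bit mask (bit i = square i, row by row).
WIN_MASKS = [
    0b000000111, 0b000111000, 0b111000000,  # rows
    0b001001001, 0b010010010, 0b100100100,  # columns
    0b100010001, 0b001010100,               # diagonals
]
FULL_BOARD = 0b111111111

def has_won(bits):
    """True if the pieces in `bits` cover some complete line."""
    return any((bits & mask) == mask for mask in WIN_MASKS)

def legal_moves(x_bits, o_bits):
    """Indices of the empty squares."""
    empty = ~(x_bits | o_bits) & FULL_BOARD
    return [i for i in range(9) if (empty >> i) & 1]

def play(bits, square):
    """Return the player's bitboard with `square` added."""
    return bits | (1 << square)

# X on the main diagonal (squares 0, 4, 8); O on squares 1 and 2.
x = play(play(play(0, 0), 4), 8)
o = play(play(0, 1), 2)
```

Here `has_won(x)` is true and `legal_moves(x, o)` is `[3, 5, 6, 7]`. The same trick (win checks and move generation as a handful of bitwise operations) is what makes bitboards attractive for games with larger boards.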
Not the author, but probably very poorly. This seems more like a proof of concept: it's written in Python and has a very basic tree search which is very light on heuristics. The NN is likely undertrained too, but I can't tell from the repo. In comparison, Stockfish is absurdly optimised in every aspect, from its data structures to its algorithms. Considering how long it took the Leela Chess Zero team to get their implementation to be competitive with the latest Stockfish, I'd be shocked if this thing stood a chance.
Of course, beating Stockfish is almost certainly not the goal for this project; it looks more like a project to get familiar with MLX.
You can read more about the A/B/A/B algorithm shift here: https://www.chessprogramming.org/Type_B_Strategy