You got to know when to hold ’em, and know when to fold ’em — and when it comes to betting on human superiority in the game of poker, it may be time to fold ’em.
Carnegie Mellon University researchers laid their cards on the table in a study published this week in the journal Science, explaining how they designed their Libratus AI program to beat four professional poker players in no-limit Texas Hold’em.
Unlike chess or Go, where both players can see the whole board, poker is an imperfect-information game: one player doesn’t know exactly what cards the other players hold. That leaves the door open for what would seem to be peculiarly human behaviors such as bluffing. Imperfect-information games like this were once thought to be tough to crack using machine learning.
Carnegie Mellon computer science professor Tuomas Sandholm and Ph.D. student Noam Brown showed how it could be done.
Libratus emerged victorious at the end of a 20-day competition against four poker pros, held in January at Pittsburgh’s Rivers Casino. The program beat each of the players individually in the two-player game of heads-up no-limit Texas Hold’em and amassed nearly $1.8 million in chips over 120,000 hands.
Sandholm and Brown reported that Libratus defeated the humans by 14.7 big blinds per 100 hands, a decisive win by poker standards.
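Poker win rates are conventionally quoted in big blinds per 100 hands, or in milli-big-blinds (thousandths of a big blind) per hand. The short Python sketch below is a back-of-the-envelope check, not anything taken from the paper: it reads the 14.7 figure as big blinds per 100 hands and converts it into a per-hand rate and a total over the 120,000 hands. The $100 big blind is an assumption for illustration; this article does not state the stakes.

```python
# Back-of-the-envelope win-rate arithmetic for the Libratus match.
HANDS_PLAYED = 120_000
WIN_RATE_BB_PER_100 = 14.7  # read as big blinds won per 100 hands
BIG_BLIND_DOLLARS = 100     # ASSUMPTION: stake size is not given in the article


def mbb_per_hand(bb_per_100: float) -> float:
    """Convert big blinds per 100 hands to milli-big-blinds per hand."""
    return bb_per_100 / 100 * 1000


def total_big_blinds(bb_per_100: float, hands: int) -> float:
    """Total big blinds won over `hands` hands at the given rate."""
    return bb_per_100 * hands / 100


bbs = total_big_blinds(WIN_RATE_BB_PER_100, HANDS_PLAYED)
print(f"{mbb_per_hand(WIN_RATE_BB_PER_100):.0f} mbb per hand")
print(f"{bbs:,.0f} big blinds in total")
print(f"${bbs * BIG_BLIND_DOLLARS:,.0f} at a ${BIG_BLIND_DOLLARS} big blind")
```

At that assumed stake the total lands near the chip figure reported above; a rate of 14.7 big blinds per single hand would instead imply a total roughly 100 times larger.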
“The techniques in Libratus do not use expert domain knowledge or human data and are not specific to poker,” they said in the Science paper. “Thus, they apply to a host of imperfect-information games.”
And not just games: Making decisions on the basis of imperfect information is key to real-world strategic interactions such as business negotiations, finance, cybersecurity and military planning.
So how’d the researchers do it? They used a three-pronged approach. First, they developed an algorithm that simplifies the roughly 10^161 decision points in heads-up no-limit Texas Hold’em. The algorithm produces an abstract blueprint for game play that’s detailed for the early rounds of betting but coarser for the later rounds.
“Intuitively, there is little difference between a king-high flush and a queen-high flush,” Brown explained in a news release. “Treating those hands as identical reduces the complexity of the game and, thus, makes it computationally easier.”
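Brown’s flush example can be made concrete with a toy card abstraction. The Python sketch below is hypothetical and far cruder than anything in Libratus: it describes a finished hand only by its category and high card, and the `Category` and `Hand` types and the `bucket` function are invented for illustration. A detailed abstraction keeps the high card; a coarse one keeps only the category, so a king-high flush and a queen-high flush land in the same bucket and share one strategy. The coarse mode mirrors the looser treatment of later betting rounds described above.

```python
from enum import IntEnum
from typing import NamedTuple, Tuple


class Category(IntEnum):
    """Standard poker hand categories, weakest to strongest."""
    HIGH_CARD = 0
    PAIR = 1
    TWO_PAIR = 2
    TRIPS = 3
    STRAIGHT = 4
    FLUSH = 5
    FULL_HOUSE = 6
    QUADS = 7
    STRAIGHT_FLUSH = 8


class Hand(NamedTuple):
    category: Category
    high_rank: int  # 2..14, where 11=J, 12=Q, 13=K, 14=A


def bucket(hand: Hand, detailed: bool) -> Tuple[int, ...]:
    """Map a hand to an abstraction bucket (hypothetical scheme)."""
    if detailed:
        return (hand.category, hand.high_rank)  # finer: category plus high card
    return (hand.category,)  # coarser: category only


king_flush = Hand(Category.FLUSH, 13)
queen_flush = Hand(Category.FLUSH, 12)
print(bucket(king_flush, detailed=True) == bucket(queen_flush, detailed=True))    # False
print(bucket(king_flush, detailed=False) == bucket(queen_flush, detailed=False))  # True

# Distinct buckets over a toy grid of categories x high cards
# (the grid includes some combinations that cannot occur in real poker).
grid = [Hand(c, r) for c in Category for r in range(2, 15)]
print(len({bucket(h, True) for h in grid}), len({bucket(h, False) for h in grid}))  # 117 9
```

Collapsing the high card shrinks this toy grid from 117 situations to 9, which is the kind of reduction that makes the full game computationally tractable.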
As a hand moves into its later betting rounds, a second software module fine-tunes the blueprint based on the state of play and works out its strategy going forward in real time. If an opponent makes a move that the strategy doesn’t anticipate, the module re-solves that part of the game with the unexpected move included. This part of the process is called nested subgame solving.
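Why re-solve instead of forcing an unexpected bet into the nearest size the blueprint already knows? The difference shows up even in a toy spot. The Python sketch below is not Libratus’ nested subgame solving, which re-solves a full subgame over both players’ possible hands; it swaps in a classic textbook river situation with a closed-form answer. The bettor holds either the best hand or a pure bluff, and the caller holds a hand that beats only bluffs. Calling with probability pot / (pot + bet) makes bluffing break even, so the right response depends on the exact bet size. The blueprint bet sizes are hypothetical.

```python
# Toy comparison: rounding an off-blueprint bet vs. solving for its exact size.
BLUEPRINT_BET_SIZES = (0.5, 1.0, 2.0)  # hypothetical, as fractions of the pot


def equilibrium_call_freq(bet_fraction: float) -> float:
    """Calling frequency that makes a polarized bettor's bluffs break even.

    With the pot normalized to 1, a bluff wins 1 when the caller folds and
    loses bet_fraction when called, so EV = (1 - c) - c * bet_fraction,
    which is zero at c = 1 / (1 + bet_fraction).
    """
    return 1.0 / (1.0 + bet_fraction)


def nearest_blueprint_size(bet_fraction: float) -> float:
    """Action translation: map the bet to the closest size the blueprint has."""
    return min(BLUEPRINT_BET_SIZES, key=lambda s: abs(s - bet_fraction))


def respond_by_rounding(bet_fraction: float) -> float:
    """Reuse the blueprint's answer for the nearest known bet size."""
    return equilibrium_call_freq(nearest_blueprint_size(bet_fraction))


def respond_by_resolving(bet_fraction: float) -> float:
    """Solve the toy spot again for the bet size actually made."""
    return equilibrium_call_freq(bet_fraction)


unexpected_bet = 0.7  # a size the blueprint never considered
rounded = nearest_blueprint_size(unexpected_bet)
print(f"rounded to {rounded} pot: call {respond_by_rounding(unexpected_bet):.3f}")
print(f"re-solved for {unexpected_bet} pot: call {respond_by_resolving(unexpected_bet):.3f}")
```

Rounding the 0.7-pot bet down to 0.5 pot makes the caller call about two-thirds of the time instead of the roughly 59 percent that makes bluffs break even at the real size; a bettor who noticed could stop bluffing that size and get paid off more often with strong hands.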
The third module, known as the self-improver, analyzes the bet sizes Libratus’ opponents use to detect potential gaps in its strategy. The software then fills those gaps by adding new branches to its decision tree.
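One simple reading of the self-improver’s first step is a frequency count: tally the bet sizes opponents actually use, keep the ones that fall outside the blueprint’s sizes, and queue the most common for new branches. The Python sketch below assumes that reading; the blueprint sizes, the tolerance and the `gaps_to_fill` helper are hypothetical, and it leaves out the expensive part, computing a strategy for each new branch.

```python
from collections import Counter
from typing import Iterable, List, Sequence

BLUEPRINT_BET_SIZES = [0.5, 1.0, 2.0]  # hypothetical, as fractions of the pot


def gaps_to_fill(
    observed_bets: Iterable[float],
    blueprint: Sequence[float],
    tolerance: float = 0.05,
    max_new: int = 2,
) -> List[float]:
    """Return the most frequent opponent bet sizes the blueprint lacks."""
    off_tree = Counter(
        round(bet, 2)
        for bet in observed_bets
        # a bet counts as a gap only if no blueprint size is within tolerance
        if all(abs(bet - size) > tolerance for size in blueprint)
    )
    return [size for size, _count in off_tree.most_common(max_new)]


observed = [0.5, 0.7, 0.7, 1.0, 0.7, 1.5, 1.5, 3.0, 2.0]
new_sizes = gaps_to_fill(observed, BLUEPRINT_BET_SIZES)
print(new_sizes)  # [0.7, 1.5]
# A strategy for each new size would then have to be computed before
# the branches are added to the tree (not shown here).
print(sorted(BLUEPRINT_BET_SIZES + new_sizes))  # [0.5, 0.7, 1.0, 1.5, 2.0]
```

Capping the number of new sizes per pass reflects the trade-off in this sketch: every added branch enlarges the tree that has to be solved.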
The triple-play strategy bested a different poker-playing AI known as Baby Tartanian8, and went on to clean up in the 20-day match against professional Texas Hold’em players Jason Les, Dong Kim, Daniel McAulay and Jimmy Chou.
“The most surprising thing was its ability to adjust, its ability to just learn every day and get better,” Chou said in a Carnegie Mellon video about the match. “It’s been taxing on us to try to find weaknesses.”
Les said that “you really have to pry every chip you can out of Libratus’ hands,” but added that the program didn’t shy away from big bets.
“You don’t often see play like Libratus, where it’s like 250 percent, 500 percent — all in for like $2,000 in the middle? Libratus is all in for $19,000,” Les said.
To some, Carnegie Mellon’s experiment may sound like a “Futurama” episode. “No need for an AI-induced nuclear war. The machines can just take our money in high-stakes poker games,” Alex Hanshaw, director of engineering at California-based Actian, joked in a tweet.
But Sandholm said the implications are deadly serious.
“If we are able to show that the best AI has surpassed the quality of the best humans in strategic thinking under imperfect information, that would have tremendous implications,” he said.
Libratus’ technology has been exclusively licensed to Strategic Machine Inc., a company founded by Sandholm to apply strategic reasoning technologies to a range of applications. The National Science Foundation and the Army Research Office supported the research.