CICERO is Meta’s latest AI system. It can negotiate with humans in natural language, persuade them to adopt strategies and cooperate with them. The strategy board game “Diplomacy” serves as the testbed.
According to Meta, CICERO is the first language AI to play the board game “Diplomacy” at a human level. In Diplomacy, players negotiate over the balance of power in Europe on the eve of World War I.
Representing Austria-Hungary, England, France, Germany, Italy, Russia and Turkey, players form strategic alliances and break them when it is to their advantage. All moves are planned in advance and then executed simultaneously. Skillful negotiation is therefore at the heart of the game.
Diplomacy at the human level
CICERO is optimized for Diplomacy, and the game also serves as a benchmark for the model’s language skills: in 40 online games played over 72 hours on webDiplomacy.net, CICERO achieved more than double the average score of the human players and ranked among the top ten percent of participants, according to Meta.
Meta says the CICERO agent is designed to negotiate and build alliances with humans. To do this, the AI has to infer players’ beliefs and intentions from conversations, a task that Meta’s research team says has been considered “a big, almost impossible challenge” in AI development for decades.
According to Meta, CICERO plays Diplomacy so well that human players prefer to team up with it. In the online games, CICERO encountered 82 different human players who did not know that CICERO is an AI system. Only one player voiced suspicion of a bot in the chat after a game, and it had no consequences.
In the paper, the researchers describe a case in which CICERO was able to deter a human player from making a planned move and convince him to make a new, mutually beneficial move.
Plan first, then talk
Broadly, CICERO combines two systems: one plans the moves for itself and its partners, and the second translates these moves into natural language and explains them to the players to convince them of its plan.
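The "plan first, then talk" split can be pictured as a two-stage pipeline in which the message is generated from the plan rather than the other way round. The sketch below is a toy illustration only: `Intent`, `negotiate`, `toy_planner` and `template_speaker` are hypothetical names, the planner returns fixed orders and the "language model" is a string template; none of this reflects Meta's actual code or interfaces.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Intent:
    """Planned orders for one negotiation partner (hypothetical structure)."""
    own_order: str      # what the agent intends to play itself
    partner_order: str  # what it would like the partner to play


# Hypothetical interfaces: a planner that picks an intent and a speaker that
# turns an intent into a message. Neither mirrors Meta's actual API.
Planner = Callable[[str], Intent]
Speaker = Callable[[Intent, str], str]


def negotiate(partner: str, planner: Planner, speaker: Speaker) -> str:
    """Plan first, then talk: the message is conditioned on the plan."""
    intent = planner(partner)
    return speaker(intent, partner)


def toy_planner(partner: str) -> Intent:
    # Stand-in for the strategic planner; always returns the same orders.
    return Intent(own_order="A PAR - BUR", partner_order="A MUN - RUH")


def template_speaker(intent: Intent, partner: str) -> str:
    # Stand-in for the dialogue model; a template instead of a language model.
    return (f"Hi {partner}, I'm planning {intent.own_order}. "
            f"If you play {intent.partner_order}, we won't bounce.")


print(negotiate("Germany", toy_planner, template_speaker))
```

Keeping the two stages behind separate interfaces means the dialogue side can only talk about moves the planner has actually chosen.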
CICERO’s language model is based on a pre-trained Transformer language model (BART) with 2.7 billion parameters, fine-tuned on more than 40,000 games of Diplomacy. The anonymized game data contained well over twelve million messages exchanged between human players, which CICERO processed during training.
According to Meta, the supervised approach of training on human game data, which is standard for game-playing AIs and amounts to cloning the behavior of human players, would produce a gullible Diplomacy agent that could easily be manipulated, for example with a phrase such as “I’m glad we agreed that you would move your unit out of Paris!” In addition, a purely supervised model could learn spurious correlations between dialogue and actions.
With the iterative planning algorithm “piKL” (policy-regularized planning), the model refines its initial strategy based on its predictions of the other players’ strategies, while trying to stay close to its initial prediction. “We found that piKL better models human play and leads to better agent policies compared to supervised learning alone,” writes Meta AI.
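The "stay close to the initial prediction" idea can be written as a regularized objective: choose a policy π that maximizes expected value minus λ times the KL divergence from a human-imitation ("anchor") policy. That objective has the closed-form solution π(a) ∝ anchor(a) · exp(Q(a)/λ). The sketch below shows only this single regularized step for one player; the full piKL algorithm repeats such updates as its predictions for the other players change, which is not shown here. The action values and anchor probabilities are made-up numbers, and the function name is hypothetical.

```python
import math


def kl_regularized_policy(anchor: dict[str, float],
                          q_values: dict[str, float],
                          lam: float) -> dict[str, float]:
    """Maximize E_pi[Q] - lam * KL(pi || anchor) over one player's actions.

    The closed-form maximizer is pi(a) proportional to anchor(a) * exp(Q(a) / lam):
    a large lam keeps pi near the human-like anchor, a small lam approaches
    greedy maximization of Q.
    """
    # Subtract the largest exponent before exponentiating for numerical stability.
    shift = max(q_values[a] / lam for a in anchor)
    weights = {a: p * math.exp(q_values[a] / lam - shift) for a, p in anchor.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


anchor = {"A PAR H": 0.5, "A PAR - BUR": 0.5}  # hypothetical human-imitation policy
q = {"A PAR H": 0.0, "A PAR - BUR": 1.0}       # hypothetical action values
print(kl_regularized_policy(anchor, q, lam=1.0))    # A PAR - BUR gets about 0.73
print(kl_regularized_policy(anchor, q, lam=100.0))  # stays close to 0.5 / 0.5
```

The single parameter λ trades off playing strength (following Q) against human-likeness (following the anchor), which is the tension the piKL quote describes.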
CICERO uses a strategic reasoning module to select intentions and actions. This module runs a planning algorithm that predicts the policies of all other players from the state of the game and the dialogue so far, taking into account both the strength of different actions and how likely they are in human games. Based on these predictions, it chooses an optimal action for CICERO.
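Choosing "an optimal action based on these predictions" is, in its simplest form, a best response: compute each candidate action's expected value against the predicted policies of the other players and take the maximum. The sketch below assumes the other players act independently, enumerates all their joint actions, and uses a hypothetical hand-written value function; a real Diplomacy planner cannot enumerate the enormous joint action space and would rely on a learned value function instead.

```python
from itertools import product
from typing import Callable, Mapping, Sequence

Policy = Mapping[str, float]                      # action -> probability
ValueFn = Callable[[str, dict[str, str]], float]  # (own action, others' actions) -> value


def best_response(own_actions: Sequence[str],
                  predicted: Mapping[str, Policy],
                  value: ValueFn) -> tuple[str, float]:
    """Return the own action with the highest expected value against the
    predicted (independent) policies of the other players.
    own_actions must be non-empty."""
    players = list(predicted)
    best_action, best_ev = own_actions[0], float("-inf")
    for action in own_actions:
        ev = 0.0
        # Enumerate every joint action of the other players with its probability.
        for joint in product(*(predicted[p].items() for p in players)):
            prob = 1.0
            others = {}
            for player, (other_action, p_other) in zip(players, joint):
                others[player] = other_action
                prob *= p_other
            ev += prob * value(action, others)
        if ev > best_ev:
            best_action, best_ev = action, ev
    return best_action, best_ev


def toy_value(action: str, others: dict[str, str]) -> float:
    # Hypothetical payoffs: moving to Burgundy pays off only if Germany does
    # not move there too (a bounce); holding in Paris is a safe 0.2.
    if action == "A PAR - BUR":
        return 0.0 if others["Germany"] == "A MUN - BUR" else 1.0
    return 0.2


moves = ["A PAR - BUR", "A PAR H"]
print(best_response(moves, {"Germany": {"A MUN - BUR": 0.7, "A MUN H": 0.3}}, toy_value))
```

With a 70 % predicted chance that Germany also moves to Burgundy, moving is still worth about 0.3 versus 0.2 for holding; raise that chance to 90 % and holding becomes the better choice, which is how the predicted policies of other players steer the decision.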
Planning relies on a value and policy function trained via reinforcement learning through self-play, which penalized the agent for straying too far from human behavior in order to keep its policy compatible with human players. During each negotiation phase, intentions are recalculated every time CICERO sends or receives a message. At the end of each turn, CICERO plays its most recently calculated intention.
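The timing described above (replan on every sent or received message, then play whatever intention is current when the turn ends) can be sketched as a small event handler. The class, its method names and the toy replanning rule are hypothetical illustrations; the real planner would run the strategic search on each call.

```python
from typing import Callable, Sequence

# Hypothetical planner signature: dialogue so far -> intended order.
Replanner = Callable[[Sequence[str]], str]


class NegotiationPhase:
    """Recompute the intention after every sent or received message and
    submit whichever intention is current when the turn ends."""

    def __init__(self, replan: Replanner) -> None:
        self.replan = replan
        self.dialogue: list[str] = []
        self.intent = replan(self.dialogue)  # initial plan before any message

    def on_message(self, message: str) -> None:
        # Called for messages the agent sends as well as ones it receives.
        self.dialogue.append(message)
        self.intent = self.replan(self.dialogue)

    def end_of_turn(self) -> str:
        # Only the most recently computed intention is actually played.
        return self.intent


def toy_replan(dialogue: Sequence[str]) -> str:
    # Stand-in planner: move only once someone has mentioned support.
    agreed = any("support" in m.lower() for m in dialogue)
    return "A PAR - BUR" if agreed else "A PAR H"


phase = NegotiationPhase(toy_replan)
phase.on_message("Germany: Hello, France.")
phase.on_message("Germany: I will support you into Burgundy.")
print(phase.end_of_turn())  # A PAR - BUR
```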
According to Meta, one possible use case for CICERO-style systems is advanced digital assistants that hold longer, more fluid conversations with people and teach them new knowledge or skills along the way. CICERO itself can only play Diplomacy.
The system also makes mistakes, such as occasionally sending messages with illogical justifications, contradicting its own plans, or being “otherwise strategically inadequate”. The researchers tried to suppress these errors as far as possible with a series of filters. They attribute the fact that CICERO was not exposed as a bot despite these mistakes to time pressure in the game and to the fact that humans sometimes make similar mistakes.
For the goal-directed use of conversational AI, “many open problems” remain in human-agent collaboration, and Diplomacy is a good testing ground for them, the researchers write.
Meta has released CICERO’s code as open source on GitHub. For more detailed information on the project, visit the CICERO project page.
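The article does not say how CICERO's filters work internally. One plausible shape for "a series of filters" is a veto chain: each filter inspects a candidate message, and a message is sent only if every filter accepts it. The sketch below uses two hypothetical rule-based filters (a naive plan-consistency check by substring match and a length cap) purely to illustrate that structure; it is not Meta's filtering method.

```python
from typing import Callable, Optional, Sequence

MessageFilter = Callable[[str], bool]  # True means the message may be sent


def plan_consistency_filter(planned: str, alternatives: Sequence[str]) -> MessageFilter:
    """Hypothetical filter: reject messages that mention an order the agent
    does not actually plan to play (naive substring check)."""
    contradicting = [order for order in alternatives if order != planned]

    def passes(message: str) -> bool:
        return not any(order in message for order in contradicting)

    return passes


def length_filter(max_chars: int) -> MessageFilter:
    """Hypothetical filter: reject overly long messages."""
    return lambda message: len(message) <= max_chars


def select_message(candidates: Sequence[str],
                   filters: Sequence[MessageFilter]) -> Optional[str]:
    """Return the first candidate that passes every filter, or None
    (send nothing) if all candidates are rejected."""
    for candidate in candidates:
        if all(f(candidate) for f in filters):
            return candidate
    return None


filters = [
    plan_consistency_filter("A PAR - BUR", ["A PAR - BUR", "A PAR H", "A PAR - PIC"]),
    length_filter(200),
]
candidates = [
    "Don't worry, I'll play A PAR H this turn.",    # contradicts the plan
    "I'm moving A PAR - BUR. Can you support me?",  # consistent with the plan
]
print(select_message(candidates, filters))  # I'm moving A PAR - BUR. Can you support me?
```

Returning `None` when every candidate fails reflects one design option: staying silent is often safer than sending a message that contradicts the agent's own plan.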