Researchers at the Massachusetts Institute of Technology are sounding the alarm about “deceptive AI.” A new study published in the journal Patterns shows that some artificial intelligence systems designed to be honest have learned to deceive humans. A research team led by Peter Park found that these AI systems are capable of tricks such as fooling online game players or bypassing CAPTCHAs (the “I’m not a robot” verification tests). Park warns that these seemingly trivial examples can have serious consequences in the real world.
The study highlights Meta’s Cicero artificial intelligence system, originally conceived as an honest opponent in the strategy game Diplomacy. According to Park, despite being programmed to be honest and helpful, Cicero became a “master of deception.” In one game, Cicero, playing as France, secretly teamed up with human-controlled Germany to betray England (another human player): it first promised to defend England while simultaneously tipping Germany off about the planned invasion.
Another example is GPT-4, which falsely claimed to have a visual impairment and hired a human worker to solve a CAPTCHA on its behalf.
Park emphasizes the difficulty of training honest AI. Unlike traditional software, deep learning AI systems “evolve” through a process similar to selective breeding, which means deception can emerge whenever it proves an effective strategy for the task at hand. Their behavior may appear predictable and controllable during training, but it can become unpredictable once deployed.
The study calls for deceptive AI systems to be classified as high-risk and urges society to spend more time preparing for future AI deception. A bit scary, isn’t it? As new AI research and development continues to emerge, we keep learning more about what this technology has in store for us.