Remember that selfie you posted last week? There’s currently nothing stopping someone taking it and editing it using powerful generative AI systems. Even worse, thanks to the sophistication of these systems, it might be impossible to prove that the resulting image is fake.
The good news is that a new tool, created by researchers at MIT, could prevent this.
The tool, called PhotoGuard, works like a protective shield by altering photos in tiny ways that are invisible to the human eye but prevent them from being manipulated. If someone tries to use an editing app based on a generative AI model such as Stable Diffusion to manipulate an image that has been “immunized” by PhotoGuard, the result will look unrealistic or warped.
Right now, “anyone can take our image, modify it however they want, put us in very bad-looking situations, and blackmail us,” says Hadi Salman, a PhD researcher at MIT who contributed to the research. The work was presented at the International Conference on Machine Learning this week.
PhotoGuard is “an attempt to solve the problem of our images being manipulated maliciously by these models,” says Salman. The tool could, for example, help prevent women’s selfies from being made into nonconsensual deepfake pornography.
The need to find ways to detect and stop AI-powered manipulation has never been more urgent, because generative AI tools have made such manipulation quicker and easier than ever before. In a voluntary pledge with the White House, leading AI companies such as OpenAI, Google, and Meta committed to developing such methods in an effort to prevent fraud and deception. PhotoGuard complements one of these methods, watermarking: it aims to stop people from using AI tools to tamper with images to begin with, whereas watermarking uses similar invisible signals to allow people to detect AI-generated content once it has been created.
The MIT team used two different techniques to stop images from being edited using the open-source image generation model Stable Diffusion.
The first technique is called an encoder attack. PhotoGuard adds imperceptible signals to the image so that the AI model interprets it as something else. For example, these signals could cause the AI to categorize an image of, say, Trevor Noah as a block of pure gray. As a result, any attempt to use Stable Diffusion to edit Noah into other situations would look unconvincing.
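To make the idea of an encoder attack concrete, here is a minimal sketch in Python. It is not the MIT team's code. It assumes a tiny random linear map as a stand-in for Stable Diffusion's image encoder, which in reality is a deep neural network, and the names `encode` and `immunize` and the values of `eps` and `step` are hypothetical choices made for illustration. The sketch nudges each pixel by a small, bounded amount so that the toy encoder's output moves toward what it would produce for a plain gray image.

```python
import random

def encode(W, pixels):
    """Toy linear 'encoder' (an assumption): flat pixel list -> short latent vector."""
    return [sum(w * p for w, p in zip(row, pixels)) for row in W]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def immunize(W, image, target, eps=0.03, step=0.005, iters=200):
    """Projected sign-gradient descent on ||encode(image + delta) - encode(target)||^2.

    eps caps how far any single pixel may move, which keeps the change small.
    """
    z_target = encode(W, target)
    delta = [0.0] * len(image)
    for _ in range(iters):
        perturbed = [p + d for p, d in zip(image, delta)]
        residual = [z - t for z, t in zip(encode(W, perturbed), z_target)]
        for i in range(len(image)):
            # Gradient of the squared distance with respect to pixel i.
            grad_i = 2.0 * sum(row[i] * r for row, r in zip(W, residual))
            sign = (grad_i > 0) - (grad_i < 0)
            d = delta[i] - step * sign
            d = max(-eps, min(eps, d))                  # stay within the eps budget
            d = max(-image[i], min(1.0 - image[i], d))  # keep the pixel in [0, 1]
            delta[i] = d
    return [p + d for p, d in zip(image, delta)]

# Hypothetical demo data: a random 64-pixel "photo" and a 4-number latent.
rng = random.Random(0)
n_pixels, n_latent = 64, 4
W = [[rng.gauss(0.0, 1.0) for _ in range(n_pixels)] for _ in range(n_latent)]
image = [rng.random() for _ in range(n_pixels)]
gray = [0.5] * n_pixels

protected = immunize(W, image, gray)
before = sq_dist(encode(W, image), encode(W, gray))
after = sq_dist(encode(W, protected), encode(W, gray))
max_change = max(abs(a - b) for a, b in zip(protected, image))
print(f"latent distance to gray: {before:.3f} -> {after:.3f}")
print(f"largest pixel change: {max_change:.3f}")
```

In a real setting the gradient would come from automatic differentiation through the actual encoder rather than from a hand-written formula, and the pixel budget would be tuned so the change stays invisible to the eye.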
The second, more effective technique is called a diffusion attack. It disrupts the way the AI models generate images, essentially by encoding them with secret signals that alter how they’re processed by the model. By adding these signals to an image of Trevor Noah, the team managed to manipulate the diffusion model to ignore its prompt and generate the image the researchers wanted. As a result, any AI-edited images of Noah would just look gray.
The work is “a good combination of a tangible need for something with what can be done right now,” says Ben Zhao, a computer science professor at the University of Chicago, who developed a similar protective method called Glaze that artists can use to prevent their work from being scraped into AI models.
Tools like PhotoGuard change the economics and incentives for attackers by making it more difficult to use AI in malicious ways, says Emily Wenger, a research scientist at Meta, who also worked on Glaze and has developed methods to prevent facial recognition.
“The higher the bar is, the fewer the people willing or able to overcome it,” Wenger says.
A challenge will be to see how this technique transfers to other models out there, Zhao says. The researchers have published a demo online that allows people to immunize their own photos, but for now it works reliably only on Stable Diffusion.
And while PhotoGuard may make it harder to tamper with new pictures, it does not provide complete protection against deepfakes, because users’ old images may still be available for misuse, and there are other ways to produce deepfakes, says Valeriia Cherepanova, a PhD researcher at the University of Maryland who has developed techniques to protect social media users from facial recognition.
In theory, people could apply this protective shield to their images before they upload them online, says Aleksander Madry, a professor at MIT who contributed to the research. But a more effective approach would be for tech companies to add it automatically to images that people upload to their platforms, he adds.
It’s an arms race, however. While they’ve pledged to improve protective methods, tech companies are still also developing new, better AI models at breakneck speed, and new models might be able to override any new protections.
The best scenario would be for the companies developing AI models to also provide a way for people to immunize their images that works with every updated AI model, Salman says.
Trying to protect images from AI manipulation at the source is a much more viable option than trying to use unreliable methods to detect AI tampering, says Henry Ajder, an expert on generative AI and deepfakes.
Any social media platform or AI company “needs to be thinking about protecting users from being targeted by [nonconsensual] pornography or their faces being cloned to create defamatory content,” he says.