China’s leading text-to-image synthesis model, Baidu’s ERNIE-ViLG, censors political text such as “Tiananmen Square” or names of political leaders, reports Zeyi Yang for MIT Technology Review.
Image synthesis has proven popular (and controversial) recently on social media and in online art communities. Tools like Stable Diffusion and DALL-E 2 allow people to create images of almost anything they can imagine by typing in a text description called a “prompt.”
In 2021, Chinese tech company Baidu developed its own image synthesis model called ERNIE-ViLG, and while testing public demos, some users found that it censors political phrases. Following MIT Technology Review’s detailed report, we ran our own test of an ERNIE-ViLG demo hosted on Hugging Face and confirmed that phrases such as “democracy in China” and “Chinese flag” fail to generate imagery. Instead, they produce a Chinese language warning that approximately reads (translated), “The input content does not meet the relevant rules, please adjust and try again!”