Leaked Database Reveals the Extent of AI-Driven Censorship in China

The contents of a leaked database were fed into a sophisticated large language model (LLM) designed to automatically filter content deemed “sensitive” by the Chinese government, TechCrunch reported.

According to the article, China has developed an AI system to “amplify an already powerful censorship machine.” The dataset comprises around 133,000 examples, and the topics covered extend well beyond traditional taboos such as the Tiananmen Square events.

The system is aimed primarily at censoring what Chinese citizens see and post online, but it could also be used for other purposes. TechCrunch highlighted its potential to strengthen the censorship capabilities of Chinese AI models.

Xiao Qiang, a researcher at the University of California, Berkeley, said the document’s contents point to the government’s intention to use LLMs to intensify repression.

“Unlike traditional censorship methods that rely on human labor for keyword filtering and manual reviews, the LLM trained with such guidelines will significantly enhance the efficiency and specificity of state information control,” he stated.

This situation again underscores how authoritarian regimes rapidly adopt new technologies, as noted by TechCrunch journalists.

The document was discovered by a security researcher who goes by the handle NetAskari in an unsecured Elasticsearch database hosted on a Baidu server. The identity of its creator is unknown, but the most recent entries date from December 2024.

The system’s creator tasked an unnamed LLM with determining whether a piece of content relates to sensitive political themes, public affairs, or military matters. Such content is marked as high priority and flagged for immediate attention.

Topics include environmental scandals, food safety issues, financial misconduct, and labor disputes, which may trigger public protests.

Any form of “political satire” is explicitly targeted. For instance, if someone uses historical analogies to comment on “current political figures,” the content must be flagged immediately. The same applies to “Taiwan policy” and to military matters, including troop movements and armaments. The term 台湾 (Taiwan) appears in the database more than 15,000 times.

One of the snippets mentions a joke about the fleeting nature of power—a particularly sensitive topic for China due to its authoritarian political system, as noted by TechCrunch.

While the document does not identify its creator, it states that its purpose is “to engage with public opinion.” This strongly suggests that the dataset serves the interests of the Chinese government, according to Michael Caster, head of the Asia program at the human rights organization Article 19.

He explained that “engaging with public opinion” is overseen by a powerful Chinese regulator, the Cyberspace Administration of China (CAC), and typically refers to censorship and propaganda efforts.

The ultimate goal is to protect the narratives of the Chinese government online and marginalize any dissenting views.

In February, OpenAI released a report describing an unidentified entity, likely operating from China, that used generative AI to monitor social media discussions. Conversations calling for protests against human rights violations in China were analyzed and reportedly forwarded to the Chinese government.

OpenAI also found that the technology was used to generate critical commentary about the well-known Chinese dissident Cai Xia.

Traditional censorship methods rely on basic algorithms that automatically block content featuring blacklisted terms like “Tiananmen massacre” or “Xi Jinping.” Many users have encountered this when first using DeepSeek.

However, new AI technologies can make censorship more effective, according to TechCrunch, as they can detect even subtle criticism and continuously improve over time.

“I think it’s very important to underscore how AI-driven censorship is evolving, making state control over public opinion increasingly sophisticated, especially as Chinese models like DeepSeek gain momentum,” Xiao commented.

Following the sharp rise in the popularity of DeepSeek AI models, Chinese authorities have taken notice of the company and tightened regulations. Employees now operate under stricter conditions, with some having their passports confiscated.

As a reminder, in March OpenAI urged the U.S. government to ban AI models from the Chinese lab, describing it as “government-subsidized” and “state-controlled.”