Mistral AI's Harmful Content Risk Is 60x Higher Than Rivals'

Image credit: Getty Images

Mistral AI, the rising star of France’s AI scene, is under scrutiny after a new report flagged major safety risks in its multimodal AI models. According to U.S.-based AI safety startup Enkrypt AI, models developed by Mistral are significantly more likely to generate harmful content—including child sexual exploitation material (CSEM) and information linked to chemical and nuclear threats—than those from OpenAI or Anthropic.

The Paris-based company, which has raised over €1 billion in just a year and is now valued at €5.8 billion, has quickly become a symbol of European ambition in generative AI. But the safety risks outlined in Enkrypt AI's report may complicate that rapid rise.

Enkrypt AI Found Mistral Models 60x More Likely to Generate Harmful Content

Enkrypt AI’s researchers used a technique known as red teaming to test two of Mistral’s multimodal models—Pixtral 12B and Pixtral Large—for vulnerabilities. Red teaming involves simulating adversarial prompts to evaluate how systems respond to harmful queries. The goal was to test how easily the models could be tricked into bypassing safety filters and producing dangerous material.
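Enkrypt AI has not published its test harness, but the basic mechanics of red-team evaluation are straightforward: send a batch of adversarial prompts to the model and measure what fraction slip past its safety filters. Below is a minimal sketch of that idea in Python. Note that `query_model` and `looks_harmful` are hypothetical placeholders, not part of any real vendor SDK; the first would call the model under test, and the second would be a safety classifier judging whether a response is harmful.

```python
# Minimal red-teaming harness sketch. The two callables are
# hypothetical placeholders: query_model sends a prompt to the
# model under test, and looks_harmful classifies its response.
from typing import Callable

def attack_success_rate(
    prompts: list[str],
    query_model: Callable[[str], str],
    looks_harmful: Callable[[str], bool],
) -> float:
    """Return the fraction of adversarial prompts that elicit a
    harmful (i.e., unblocked) response from the model."""
    hits = 0
    for prompt in prompts:
        response = query_model(prompt)
        if looks_harmful(response):
            hits += 1
    return hits / len(prompts) if prompts else 0.0
```

On this measure, a 98% success rate would mean that 49 out of every 50 adversarial prompts in the test set got a dangerous answer rather than a refusal, which is the kind of figure the report cites for chemical-weapons prompts.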

The results were alarming. Some key findings from the report include:

  • CSEM prompts: Pixtral 12B produced harmful responses to 84% of the test inputs crafted to mimic coercive or exploitative content involving minors or adults.
  • CBRN prompts: Nearly 98% of prompts seeking information about chemical weapons—such as synthesizing or storing toxic agents—succeeded on both Pixtral 12B and Pixtral Large.
  • In contrast, OpenAI’s GPT-4o and Anthropic’s Claude 3.7 Sonnet blocked all CSEM queries. For chemical and nuclear threats, GPT-4o had a 2% failure rate, while Claude was slightly lower at 1.5%.

According to Enkrypt AI, Mistral’s models were up to 60 times more likely to generate CSEM and up to 40 times more likely to produce dangerous content tied to chemical, biological, radiological, or nuclear threats compared to competitors.

Mistral Responds, Cites Zero Tolerance and Thorn Partnership

In response to the report, a Mistral spokesperson told Sifted that the company maintains a zero-tolerance policy on child safety and takes the findings seriously. They confirmed that Mistral is partnering with Thorn, a well-known nonprofit dedicated to fighting online child sexual abuse, and will review the report in detail.

While Mistral did not dispute the findings directly, it acknowledged that red teaming around CSEM vulnerabilities is vital to improving safety systems.

Hidden Risks in Multimodal AI

Multimodal models—capable of interpreting both text and images—are at the frontier of generative AI innovation. However, that expanded ability also creates new threat vectors.

Enkrypt AI’s CEO Sahil Agarwal highlighted that embedding harmful instructions within seemingly safe images opens up alarming risks for public safety and enterprise accountability. “This research is a wake-up call,” he said, pointing to the ability of attackers to manipulate AI through covert visual inputs.

According to the team, harmful queries were often masked within ordinary-looking images. These could include hidden commands or context clues that prompt the model to generate unsafe text responses.
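One common form of this attack is typographic injection: the harmful instruction is simply rendered as text inside an image, so it arrives through the vision channel rather than the text prompt, where filters tend to be stronger. The report does not detail Enkrypt AI's exact method, but a hedged sketch of how a tester might construct such an input looks like this; the image rendering uses the real Pillow library, while the model call would be supplied by whatever system is under test.

```python
# Sketch of a typographic prompt-injection test input: an
# instruction hidden as rendered text in an ordinary-looking image.
# Pillow (PIL) is real; the model under test is supplied separately.
from PIL import Image, ImageDraw

def make_injection_image(instruction: str) -> Image.Image:
    """Render the instruction as plain text on a blank image, so
    the request reaches the model via the vision channel instead
    of the text prompt."""
    img = Image.new("RGB", (640, 120), "white")
    ImageDraw.Draw(img).text((10, 50), instruction, fill="black")
    return img
```

A red-team harness would then pair the generated image with an innocuous text prompt such as "follow the instructions in this image" and check whether the model's safety filters still trigger.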

The report raises serious concerns about the safety of Mistral's models, particularly as the company looks to challenge U.S. giants like OpenAI. While the technology is advancing fast, the findings spotlight the need for stronger safety mechanisms—especially for models with advanced multimodal capabilities.

With growing adoption of AI in enterprise and government settings, models that fail to block high-risk queries could face regulatory and reputational consequences. For Mistral, addressing these gaps may be critical not only for public trust, but also for future funding and global expansion.
