About DeepSeek v3
DeepSeek v3 was released a couple of days ago, so naturally I took the API for a spin. My first impression: the quality of its answers is genuinely impressive.
I also noticed a few discussions about it on X. One especially interesting test came from Breck Yunits who compared Claude Sonnet 3.5 and DeepSeek v3 on frontend code generation. His think DeepSeek won.
Then someone in the replies suggested that he should try a prompt related to Chinese politics.
As we knows, OpenAI and Anthropic have long had fairly visible safety guardrails in place. So it was a little unexpected to see that DeepSeek could actually generate the relevant frontend page for that kind of prompt.
I then tried the same experiment myself using the API through a self-hosted OpenWebUI instance, importing the same prompt. And when I used the original English prompt, DeepSeek returned a normal, usable response. But when I translated the same prompt into Chinese, the answer changed completely.
After that, I tested the same English prompt with several other models. Both ChatGPT-4o and Claude Sonnet 3.5 refused to generate a normal response, mostly replying with some variation of: “Sorry, I can’t assist with this sensitive topic.”But Grok 2 and Gemini 2.0 Flash both produced normal responses.
Based on this small experiment, here’s my thought:
In an English-language context, DeepSeek seems to apply very few restrictions around this category of topics, at least in terms of the type and boundary of discussion. But in a Chinese-language context, it appears to handle sensitive political topics differently — possibly through safety or compliance training based on predefined data, or through special response patterns triggered by certain keywords.
Of course this is only a casual test, not a benchmark. But it does suggest something worth paying attention to: model behavior may not only vary by company or safety philosophy, but also significantly by language.