Deep Steal
Top-notch AI models cost billions to build, but almost nothing to steal. AI firms should be on their guard
Estimates vary, but AI firm Anthropic has spent over $8bn training its Claude model. Obviously, it doesn’t want rivals cloning Claude for free, but that’s what three Chinese firms, including last year’s newsmaker DeepSeek, allegedly tried to do. Anthropic says they used Claude as a “teacher” for their “student” models. Barely two weeks earlier, Anthropic’s bigger rival OpenAI had accused DeepSeek of extracting its model. And last year, DeepSeek’s market-shaking debut was clouded by similar allegations. But how is this possible when OpenAI and Anthropic run “closed-source” models? Chinese firms can’t just copy them like MP3s.
Enter “distillation”, a decade-old idea that was rejected when first presented at a conference. It works like this: a rookie AI poses millions of questions to a leading model such as ChatGPT, seeking not only the final answers but also the steps used to arrive at them. This reveals the larger model’s “thinking”, which the new model mimics to deliver pretty good answers most of the time, using a fraction of the hardware and energy.
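In its original, decade-old form, distillation meant training a student not on right-or-wrong labels but on the teacher’s full, temperature-softened probability distribution, so the student absorbs how confident the teacher is across every option. The chatbot-era variant described above swaps probabilities for question-and-answer text, but the principle is the same. A minimal sketch of the classic loss, with invented logits for illustration:

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; a higher T "softens" the distribution,
    # exposing the teacher's relative confidence in wrong answers too.
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, T=2.0):
    # Cross-entropy of the student against the teacher's softened output:
    # the student learns the shape of the teacher's beliefs, not just its top pick.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return -sum(p * math.log(q + 1e-12) for p, q in zip(p_t, p_s))

# Invented numbers: a teacher confident in option 0, and two students.
teacher = [6.0, 2.0, 1.0]
agreeing_student = [5.0, 2.5, 1.0]
disagreeing_student = [1.0, 2.0, 5.0]

# Training repeatedly nudges the student's weights to shrink this loss.
assert distillation_loss(teacher, agreeing_student) < distillation_loss(teacher, disagreeing_student)
```

The student that roughly shares the teacher’s confidences incurs the smaller loss, which is exactly what drives it to converge on the teacher’s behaviour.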
At a time when top AI models are all American, China and other rivals have every reason to use distillation as a short-cut. For one, consider the cost savings. DeepSeek last year claimed it had spent only $5.6mn to build its ChatGPT rival. Maybe it did, but around the same time Berkeley researchers “recreated” OpenAI’s reasoning model in 19 hours, spending all of $450. Then a Stanford and University of Washington team distilled its own reasoning model from Google’s Gemini in 26 minutes, for about $50. Gemini, too, has reportedly been targeted by model distillers.
Distillation isn’t always a bad thing. AI firms regularly distil their own models for speed and efficiency. As early as 2019, Google’s BERT model was distilled as DistilBERT. But it’s unfair when rivals use distillation to catch up, and it can also be dangerous. Recall that Claude was reportedly used in America’s Venezuela operation to extract Maduro. If Chinese firms figure out Claude’s reasoning but strip away its safeguards, the resulting AI could be used to wreak havoc. What can be done to prevent it? Counting on international treaties and declarations is futile. Instead, AI firms will have to get better at detecting and thwarting unauthorised distillation attempts. As India’s AI aspirations and capabilities grow, its models may face such attacks too, so it should start building defences now.
Disclaimer
Views expressed above are the author’s own.