OpenAI, the creator of the popular ChatGPT AI chatbot, claims to have found evidence suggesting that Chinese AI upstart DeepSeek used OpenAI’s data to train its own competing models.
The Verge reports that OpenAI and Microsoft are investigating whether Chinese AI rival DeepSeek violated OpenAI’s terms of service by using the company’s API to integrate OpenAI’s AI models into DeepSeek’s own offerings. According to sources cited by Bloomberg, Microsoft security researchers in late 2024 detected large amounts of data being exfiltrated through OpenAI developer accounts that the company believes are affiliated with DeepSeek.
OpenAI has stated that it found evidence linking DeepSeek to the use of distillation, a technique developers employ to train AI models by extracting data from larger, more capable ones. This method allows smaller models to be trained efficiently, at a fraction of the more than $100 million OpenAI spent to train its GPT-4 model. While developers can use OpenAI’s API to integrate its AI with their own applications, distilling the outputs to build rival models is strictly prohibited under OpenAI’s terms of service.
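For readers unfamiliar with the technique, the sketch below shows one common textbook form of distillation, in which a smaller "student" model is trained to match the output probabilities of a larger "teacher" model. It is an illustration only: the function names, the temperature value, and the example numbers are assumptions made for this sketch, and nothing in the reporting says this is the specific method DeepSeek is alleged to have used.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw model scores into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    Hypothetical helper for illustration: training a student to minimize
    this value pushes its outputs toward the teacher's.
    """
    p = softmax(teacher_logits, temperature)  # teacher probabilities
    q = softmax(student_logits, temperature)  # student probabilities
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Made-up scores over three possible answers.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 2.0, 1.0]
loss = distillation_loss(teacher, student)
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is why the teacher's outputs alone can serve as a training signal.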
The irony of the situation has not gone unnoticed, as OpenAI itself made significant advancements with its GPT models by ingesting vast amounts of web-based content without explicit consent. David Sacks, the artificial intelligence czar under President Donald Trump, acknowledged the possibility of IP theft, stating, “There’s substantial evidence that what DeepSeek did here is they distilled knowledge out of OpenAI models and I don’t think OpenAI is very happy about this.”
In a statement to Bloomberg, OpenAI emphasized the constant efforts by China-based companies and others to distill models from leading US AI companies. As a leading builder of AI, OpenAI engages in countermeasures to protect its intellectual property, including a careful process for determining which frontier capabilities to include in released models. The company also stressed the critical importance of working closely with the US government to best protect the most capable models from efforts by adversaries and competitors to appropriate US technology.
Lucas Nolan is a reporter for Breitbart News covering issues of free speech and online censorship.