Microsoft is investigating if DeepSeek illegally accessed OpenAI data to train its AI model

DragonSlayer101 · Wednesday at 12:40 PM

In context: AI startup DeepSeek stunned the world with the release of its R1 AI model, which can mimic human reasoning at a level comparable to the best OpenAI models to date. While the company has received widespread acclaim for its achievement, it is now reportedly under investigation by Microsoft and OpenAI for allegedly accessing OpenAI's data output illegally to train its AI model.

According to unnamed sources cited by Bloomberg, the probe started last fall after Microsoft's security researchers discovered that a group linked to DeepSeek had accessed a large amount of data through OpenAI's API. While developers can legally pay for a license to use this data in their applications, Microsoft suspects that DeepSeek's actions violated OpenAI's terms of service.

Following the discovery, Microsoft informed OpenAI that it had detected an attempt by a DeepSeek-affiliated group to bypass restrictions on how much data a single party can access. The two companies then launched a joint investigation into the incident, which is now being treated as a potential data breach.

An OpenAI spokesperson declined to confirm the alleged data theft but told Reuters that the company employs cutting-edge "countermeasures" to protect its intellectual property. They added that OpenAI is working closely with the U.S. government to prevent its AI models from falling into the hands of foreign adversaries and competitors.

However, the White House's AI and crypto czar, David Sacks, was less restrained in his response to the report. In an interview with Fox News, he claimed there was "substantial evidence" that DeepSeek had accessed OpenAI's data in an unethical and possibly illegal manner. Microsoft refused to comment, while DeepSeek could not be reached for its reaction on the controversy.

With the launch of its all-conquering AI model, DeepSeek appears poised to challenge OpenAI, Google, and Meta in the field of AI research. However, if the allegations turns out to be accurate, it could spell trouble for the Chinese AI firm, which saw its new app overtake ChatGPT to become the top free app on both the App Store and Play Store in the U.S. this week.

DeepSeek also received unexpected praise from President Trump this week when he described the release of the R1 LLM as a "positive" development and an "asset" for the global tech sector. He noted that if DeepSeek delivers on its promise of accelerating AI training at a lower cost, "that's good (and) I view that as a positive."

Permalink to story:

Microsoft is investigating if DeepSeek illegally accessed OpenAI data to train its AI model

MSIGamer · Wednesday at 12:54 PM

LOL talk about hypocrisy. Everything online is free bites you back in the *** now. Ridiculous companies.

axiomatic13 · Wednesday at 1:01 PM

Captain Paypal here doesn't know a damn thing. He's grifting.

AnilD · Wednesday at 1:13 PM

OpenAI discovering what it feels like to have your data taken without permission is like a pirate getting mad that someone else is looting their treasure.

The irony aside, this raises an important question: If OpenAI is upset about this, does that mean they do believe there should be some kind of ownership or protection over training data? Because that’s not the argument they’ve been making in court.

unoficialoficial · Wednesday at 1:15 PM

AI is a collaborative effort, similar to Wikipedia. Therefore, there's no need to feel embarrassed if a more efficient model temporarily surpasses the current one. Neural network weights are not a form of data that can be copyrighted. Without copyright, you cannot license or enforce rules related to them.
Furthermore, the outputs generated by any neural network stem from synthetic intelligence that isn't legally recognized as having a personality. Consequently, these outputs are not subject to intellectual property rights and are by default in the public domain. This is because freedom of access to information is a fundamental right.
Training AI is a hobby for the ultra-wealthy, a side project and a path to make a gift to humanity. Due to its nature, AI isn't inherently suited for large-scale commercial exploitation.
I was writing a few weeks ago that a $200/month and $2,000/month will backfire if a better model becomes available for less...

Dr Roboto · Wednesday at 1:31 PM

It serves you right, OpenAI. How does it feel?

LetTheWookieWin · Wednesday at 1:31 PM

China stealing someone else’s intellectual property? Who would have guessed? /sarc

DanteSmith · Wednesday at 2:45 PM

Stealing is cheaper. Got it, China.

The Talking Tech · Wednesday at 2:53 PM

Of course the U.S tech companies will try to find fault on them as that Chinese company showed them how they can have more efficient results with far less money.

NumberNine · Wednesday at 5:36 PM

All the pearl clutching has me in tears with laughter. As others have pointed out, Open AI doesn't have a leg to stand on. I'm sure the government puppets and their billionaire owners are seething at the thought of someone else stealing data better.

themastergoose · Wednesday at 5:40 PM

Did I just read that OpenAI is unhappy their stolen data to train AI has been stolen by another AI?

Never thought I’d ever be typing that in my lifetime.

BuckarooBonzaii · Wednesday at 5:41 PM

In other words "Don't Con the Con" sort of speak. Where money is concerned honor among thieves is out the window.

mcnabney · Wednesday at 5:57 PM

Oh no, some company stole data from another company whose data is created from stolen IP.

Mr Majestyk · Wednesday at 7:06 PM

Hilarious how US clowns think US doesn't do IP theft, only China. The entire AI industry is based on theft.

maxxcool7421 · Wednesday at 7:50 PM

<<"Substantial evidence" of data theft, says Trump's AI and crypto czar>>

Says the orange turd who gutted cyber security in his first term and opened tech trade with china

fazy shah · Wednesday at 8:54 PM

The openAi legally used the copyright materials to train and assassinate the whistleblower and made it look like a suicide.

SO only you have the right to steal and use private data.

scoffer · Wednesday at 10:13 PM

unoficialoficial said:
Training AI is a hobby for the ultra-wealthy, a side project and a path to make a gift to humanity. Due to its nature,....

I nearly fell off my chair laughing at this statement............."Ultra-wealthy.....gift to humanity"...........S**t, did you sit back and read your post at all? How could those words possibly belong in the same sentence? Oh...unless you happen to be Musky or Zuckerdick trolling...

toooooot · Thursday at 1:10 AM

The Talking Tech said:
Of course the U.S tech companies will try to find fault on them as that Chinese company showed them how they can have more efficient results with far less money.

That assuming they told the truth. If I know anything about communist or post-communist countries, it is that they are built on lies. Lies are not seen as a negative thing there, lies are woven deep into the fabric of society.
This is my biggest problem reading news from China and specially seeing people assuming 100% of what they say is true.
I do not know what it is specifically what makes countries that tasted communism to rely on lies so much, but I have studied how every bit of USSR was built around lies. Lies about harvests. Lies about population, I recall Stalin killed people who presented population growth numbers that were lower than he expected (the next executives of course showed tremendous population growth). They lied about everything like if it was their air. And when at the end of 80s Gorbachev started to allow a bit of truth to be told, the entire empire started melting like ice-cream on a summer day.

I know some people will take what I said wrong. But what I said comes from observing a communistic country not drastically different to what China looked like before it started to work with the West.
They both had mass starvation, concentration camps, 0 freedoms, bloody suppressions of protests. And a lot of lies.

Taking everything that comes from China as 100% truth is like sending sick Brad Pit money to help pay for treatment.

kiwigraeme · Thursday at 4:18 AM

Really interesting now as cat is out of the bag
Deepseek R1 is open source - so every man and his horse can download it

people already have , they can rip out censorship etc attract FBI attention etc

Now every country can have AI cheaply - take open source models and improve them
So who will will big expensive Meta, Google, MS, Apple or the WWW with open source LLM

This is much bigger than people think - I imagine individuals and collectives can now build purpose built LLM for $3000 to $300 000

Don't like costs of big guys , want less restrictions.

To my TS readers - another money making opportunity
Start a KS to built an model for X purpose

Not sure how open source game playing Models are - start a KS to play the most common board games , with strech goals of Catan, Dominion etc etc
Start a Role playing KS LLM with a Dungeon Master etc

LemmingO · Thursday at 4:19 AM

Lots of Czars in the US these days...

LemmingO · Thursday at 4:22 AM

themastergoose said:
Did I just read that OpenAI is unhappy their stolen data to train AI has been stolen by another AI?

Never thought I’d ever be typing that in my lifetime.

Hey! It may be stolen, but it's stolen by *American* companies! LOL

themastergoose · Thursday at 6:20 AM

LemmingO said:
Hey! It may be stolen, but it's stolen by *American* companies! LOL

Oh, well that’s alright then! /s

Microsoft is investigating if DeepSeek illegally accessed OpenAI data to train its AI model

Posts: 577 +3

Posts: 193 +333

Posts: 802 +1,036

Posts: 202 +278

Posts: 114 +75

Posts: 242 +504

Posts: 196 +394

Posts: 133 +232

Posts: 560 +1,158

Posts: 614 +1,182

Posts: 135 +349

Posts: 707 +494

Posts: 115 +114

Posts: 2,395 +2,225

Posts: 585 +864

Posts: 76 +63

Posts: 156 +100

Posts: 3,634 +2,185

Posts: 2,531 +1,803

Posts: 76 +74

Posts: 76 +74

Posts: 135 +349

Similar threads