Microsoft's cloud is now hosting DeepSeek's AI model, even though it's suspected of illegal data abuse

Cal Jeffrey

A hot potato: Microsoft is raising eyebrows after announcing that it will host DeepSeek R1 on its Azure cloud service. The decision comes just days after OpenAI accused DeepSeek of violating its terms of service by allegedly using ChatGPT outputs to train its system, allegations Microsoft is currently investigating.

DeepSeek R1 began making waves in the AI world when it launched last week. Chinese developer DeepSeek touted it as a freely available simulated reasoning model that rivals OpenAI's o1 in performance at a fraction of the training cost. While OpenAI has priced its o1 model at $60 per million output tokens, DeepSeek lists R1 at just $2.19 per million – a remarkable contrast that sank the stocks of AI-adjacent companies like Nvidia.
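For scale, those list prices imply roughly a 27x gap in output-token cost. A quick back-of-the-envelope check, using only the figures above:

```python
# Back-of-the-envelope comparison of the listed output-token prices.
O1_PRICE_PER_M = 60.00   # OpenAI o1, USD per million output tokens
R1_PRICE_PER_M = 2.19    # DeepSeek R1, USD per million output tokens

ratio = O1_PRICE_PER_M / R1_PRICE_PER_M
print(f"o1 is ~{ratio:.1f}x more expensive per output token")  # ~27.4x

# Cost of generating 10 million output tokens on each:
tokens_m = 10
print(f"o1: ${O1_PRICE_PER_M * tokens_m:,.2f}  vs  R1: ${R1_PRICE_PER_M * tokens_m:,.2f}")
```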

Microsoft's decision to host R1 on Azure is not too unusual on its surface. The tech giant already offers over 1,800 AI models through its Azure AI Foundry, giving developers access to a variety of AI systems for experimentation and integration.

Microsoft doesn't discriminate, since it profits from any AI platform operating on its cloud infrastructure. However, the decision seems ironic given that OpenAI – a company Microsoft has invested in and partnered with – has spent the last week aggressively criticizing the model for distilling ChatGPT outputs.

OpenAI claims the AI startup violated its terms of service through "distillation," as reported by Fox News. Distillation is a technique in which developers train an AI model on the outputs of a more advanced system. Suspicions arose after users discovered that an earlier model, DeepSeek V3, sometimes referred to itself as "ChatGPT," suggesting that DeepSeek used OpenAI-generated data to fine-tune its system.
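In practice, distillation at the data level amounts to collecting a stronger model's responses and using them as supervised fine-tuning examples for a smaller student. A minimal, illustrative sketch of the data-collection step – `query_teacher` here is a hypothetical stand-in for a real provider API call, which is exactly the kind of use most providers' terms of service forbid when the goal is training a rival model:

```python
import json

def query_teacher(prompt: str) -> str:
    """Hypothetical stand-in for a call to a more advanced model's API."""
    return f"(teacher's answer to: {prompt})"

def build_distillation_dataset(prompts, path="distill.jsonl"):
    """Pair each prompt with the teacher's output. The resulting JSONL
    file can serve as supervised fine-tuning data for a student model."""
    with open(path, "w") as f:
        for prompt in prompts:
            record = {"prompt": prompt, "completion": query_teacher(prompt)}
            f.write(json.dumps(record) + "\n")

build_distillation_dataset(["Explain gravity simply.", "Summarize WWI."])
```

The alleged violation is not the technique itself, which is standard in AI research, but applying it to another company's paid API outputs against that company's terms.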

The move also seems somewhat hypocritical, considering that Microsoft security researchers reportedly launched an ethics probe into DeepSeek on Wednesday. Anonymous sources claim the investigation focuses on whether DeepSeek extracted substantial amounts of data through OpenAI's API during the fall of 2024.

Despite the frustrations with DeepSeek, OpenAI CEO Sam Altman has publicly welcomed the competition. In a tweet on Monday, Altman acknowledged R1's cost efficiency, calling it "an impressive model" but vowing that OpenAI would soon deliver "much better results." Analysts expect the company may release a new model, o3-mini, as early as today.

OpenAI's outcry over DeepSeek's data practices is notable given its own history of alleged data abuse. The New York Times has filed a lawsuit against OpenAI and Microsoft, accusing them of using copyrighted journalism without permission. OpenAI has also struck deals with publishers and online communities – such as The Associated Press – to access content and user-generated data for training.

The whole situation exposes the AI industry's hypocritical relationship with data ownership. Investment firm Andreessen Horowitz, another OpenAI investor, argued in a 2023 legal filing that training AI models should not be considered copyright infringement, as they merely "extract information" from existing works. If OpenAI truly believes in that principle, then DeepSeek is just playing by the same rules.

The current landscape of the AI industry is more or less a free-for-all. We have no laws on the books to govern AI directly, and those laws that affect it indirectly, like copyright and trade laws, are twisted into a favorable interpretation by the AI firms that are breaking them.

 
"If OpenAI truly believes in that principle, then DeepSeek is just playing by the same rules."

No... OpenAI is claiming that DeepSeek illegally "hacked" their PRIVATE information against their TOS. This information was NOT available to the public and is NOT the same thing.

Of course, they'll have to prove that's true - and since OpenAI's "Private" information comes from information that is public, it will be difficult to prove that DeepSeek didn't get that info from public sources.

As for "choosing" to host DeepSeek - it's not really a choice. They would have to have a very good (and provable) reason not to host them - and they clearly don't have that... yet...
 
Would be interesting to see the minimum specs to download and run DeepSeek R1.
I mean, you could trim a lot of stuff off – e.g., complete languages that aren't common.
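For what it's worth, a rough lower bound on memory is parameter count times bytes per weight; activations and the KV cache add more on top. A back-of-the-envelope sketch – the ~671B figure for the full R1 and the small distilled variants are commonly cited sizes, not verified hardware specs:

```python
def weights_gib(params_billion: float, bits_per_weight: int) -> float:
    """Memory (GiB) needed just to hold the weights; runtime overhead excluded."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

# Full R1 (~671B params) vs. a small distilled variant (~7B params):
for name, params in [("R1 671B", 671), ("distill 7B", 7)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{weights_gib(params, bits):.0f} GiB")
```

By this estimate the full model needs well over a terabyte of memory at 16-bit precision, while a 4-bit quantized 7B distill fits in a few GiB – which is why the distilled variants are the ones people actually run locally.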
 
I own some Microsoft stock via a few mutual funds. I don't have a problem with this. Maybe DeepSeek will become a Chinese Microsoft joint venture. Microsoft still own 49% of OpenAI. Sam Altman has sold his soul and is a Judas. Larry Ellison is his latest suitor alongside Son of Softbank. Plus he is accused of anally raping his sister. True story. Microsoft are working with Musk on xAI. They bought Inflection and are supporting it. They have Copilot. Apple have nothing. Google Gemini is so lousy I don't bother using it on my Pixel 9 Pro XL in spite of Google giving me 2 years of Gemini Pro gratis. DeepSeek rocks. And is as good as Perplexity.
 
I feel obliged to repeat my comment from yesterday's story...

OpenAI discovering what it feels like to have your data taken without permission is like a pirate getting mad that someone else is looting their treasure.

The irony aside, this raises an important question: If OpenAI is upset about this, does that mean they do believe there should be some kind of ownership or protection over training data? Because that’s not the argument they’ve been making in court.
 
"The current landscape of the AI industry is more or less a free-for-all. We have no laws on the books to govern AI directly, and those laws that affect it indirectly, like copyright and trade laws, are twisted into a favorable interpretation by the AI firms that are breaking them."

Well, of course there are no effective laws. Politicians are more concerned with lining their pockets from the likes of billionaire 'techbros'. That, and basically too dumb to foresee the looming problems with AI. I'm surprised most of them can figure out how to wipe their own arses.
 
Having local laws regarding copyright and AI would probably create an advantage for other countries, which could benefit by not hindering AI development. And then you wonder how that model is so much better and so much cheaper.
 
This just adds to M$'s already extensive data abuse. It's just what they – and particularly ol' "Satan" Nadella – have (ostensibly) been looking for.
 
No OpenAI model gathered its data "ethically." Save it and stuff it. It's hilarious that a company scraping the internet and taking our information without our approval is upset that someone took its information.

Boohoo.
 
Microsoft in bed with the CCP? The Chinese ripped off something – they probably just hacked and stole the necessary data and code. Now they'll get millions to download and use their spyware – just what the CCP wanted.
 