How DeepSeek is able to compete with AI startups like OpenAI.
My previous post about @deepseek_ai was deleted by the moderators of r/ArtificialInteligence for undisclosed reasons after it went viral on Reddit, with around 700k views and 3.3k likes in 3 days.
https://www.reddit.com/r/ArtificialInteligence/s/8KJ1VEDBhe
I'm uploading it here for the people who care.
So here is take two, with added information.
In this post I try to explain everything I found in or around DeepSeek.
So why take two?
People really seemed to care about what is currently going on in the world of AI and the potential issues that come with it: privacy, the economy, social change, etc.
I care about freedom for the people, authenticity and transparency. Everyone deserves an amazing life, no matter their economic or social background.
I'm going to try my best to give you a logical view of what is going on, based on my own insight and what I found through sources. Feel free to educate me, since I'm just a person who cares about this.
Look, I'm by no means an expert, but having my own startup in AI and knowing a bit about that world, I can maybe give a clearer view of wtf is happening.
Getting started:
Why are AI companies worth so much money? And what did this do to the economy?
Startups, especially in AI, love to play on the “potential” of AI, since they know damn well this is the time to pull as much money as possible out of VCs.
Why? Well, VCs specialize in managing investments and returning profits to their shareholders. But most of them are awful at understanding market trends and making accurate predictions about the future (unless they specialize in a niche).
Just so you know: going by the figures I found, around 50% of VCs fail to make a return on their investments, and only 10 to 20% of them make significant returns. That is the average. In AI, where approximately 90% of startups fail in their first year, you can see for yourself that this is an extremely big and unpredictable gamble for VCs.
AI startups simply take advantage of this, since so many people are uneducated about AI. The public thinks AI is something extraterrestrial that is suddenly, out of nowhere, going to take your job and make you useless.
So why did DeepSeek shake things up? Well, they woke up the West by giving it back some of its senses.
As much as I love the West and the freedom it provides, a lot of questionable moves have been made by unicorn-status companies. OpenAI, for example, used to be a non-profit AI research lab and turned into the fastest-growing unicorn-status startup. This raises a lot of questions about morals and ethics and about how this was done.
Below I will try to explain how DeepSeek is positioned compared to OpenAI, to show you that even as a smaller company it's totally possible to compete.
How on earth can DeepSeek deliver this at such low costs?
So, this is a hard one. From what is publicly available, the narrative seems to point towards DeepSeek having a GPU cluster of around 50,000 H800 GPUs. This was publicly stated by the CEO of Scale AI during an interview, who said that they have way more access to GPUs than the public might think.
Another source estimates that the parent company of DeepSeek (High-Flyer) has between approximately 10,000 and 50,000 H100 GPUs. Another source mentioned that they have two AI supercomputers available (Fire-Flyer I and Fire-Flyer II). I think the estimated GPU count seems realistic, apart from the H100 part. Why?
H100 chips are not legally available in China: US export controls have covered them since October 2022. For the Chinese market Nvidia made the H800 as an alternative. The issue? The H800 is said to score about 50% lower in performance. So if we go by these sources, the performance of the DeepSeek cluster could come down to the equivalent of around 25,000 H100 GPUs.
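The conversion above is simple enough to write down. This is only a sketch of the arithmetic: the function name is made up for illustration, and the 0.5 factor is this post's rough "about 50% lower" figure, not a measured benchmark.

```python
# Rough H100-equivalent size of an H800 cluster.
# ASSUMPTION: relative_performance = 0.5 is the post's "about 50% lower"
# estimate, not a benchmark result.

def h100_equivalents(h800_count: int, relative_performance: float = 0.5) -> float:
    """Scale an H800 count by its assumed performance relative to an H100."""
    return h800_count * relative_performance


# The low and high ends of the cluster sizes mentioned by the sources.
for h800_count in (10_000, 50_000):
    equivalents = h100_equivalents(h800_count)
    print(f"{h800_count:,} H800 ~ {equivalents:,.0f} H100-equivalents")
```

If the real performance gap is smaller than 50%, the H100-equivalent number goes up accordingly, so treat the 25,000 as a lower-end guess.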
This could make sense, since OpenAI reportedly trained o1 on approximately 25,000 H100 GPUs. Where it gets weird is that OpenAI reportedly trained o1 at a cost of 100 to 150 million dollars. Makes me wonder where on earth all that money went. Operating costs are most likely way higher for OpenAI, since they were the first and are at the forefront of this innovation.
According to sources, DeepSeek V3, their previous flagship model, was trained on 2,048 H800 GPUs for around 5.5 million dollars, with a model size of 685B. This was confirmed by DeepSeek themselves. DeepSeek CEO Liang Wenfeng has also said: "Money has never been the problem for us; bans on shipments of advanced chips are the problem." This refers to them not being able to use the more powerful H100 GPU due to restrictions and policies. With DeepSeek R1 being 671B, I would estimate its training costs to be in a similar range.
There are some rumors that DeepSeek models have been trained on the outputs of AI models like GPT-4, o1, Llama 3.3 and Sonnet, basically by learning to imitate what those models produce. This has not been confirmed yet. But plenty of examples have been found online of DeepSeek R1 and V3 thinking they were a model by OpenAI. I don't know about the laws surrounding this, but it seems like there might be some legal issues around that.
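To get a feel for what "2,048 H800 GPUs for around 5.5 million dollars" means in practice, here is a rough back-of-the-envelope sketch. The rental rate of 2 dollars per GPU-hour is an assumption for illustration and does not come from this post, and the function name is made up.

```python
# Turn a quoted training budget into GPU-hours and wall-clock days.
# ASSUMPTION: usd_per_gpu_hour = 2.0 is an illustrative rental rate,
# not a figure taken from this post.

def training_run_estimate(total_cost_usd: float, gpu_count: int,
                          usd_per_gpu_hour: float = 2.0) -> tuple[float, float]:
    """Return (total GPU-hours, wall-clock days) for a given budget."""
    gpu_hours = total_cost_usd / usd_per_gpu_hour
    wall_clock_days = gpu_hours / gpu_count / 24
    return gpu_hours, wall_clock_days


gpu_hours, days = training_run_estimate(5_500_000, 2_048)
print(f"{gpu_hours:,.0f} GPU-hours, roughly {days:.0f} days on 2,048 GPUs")
```

Under that assumed rate, the budget works out to a training run of about two months on the full cluster. Note that this only covers the final training run, not the hardware itself, salaries or earlier experiments.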
Funding-wise
DeepSeek received an investment of 50 million. If they can in fact use the GPU clusters of their parent company High-Flyer (which seems reasonable), the calculations around their funding for producing these models could make a bit more sense.
What is up for question, though, is the investment in GPUs.
The cost of an H800 GPU varies between 17,500 and 75,000 dollars, depending on the condition of the cards and the size of the order.
So if you go by the full 50,000 H800 GPUs, we're looking at an investment that wasn't taken into consideration of 875 million dollars at the lowest, up to around 3.75 billion dollars.
If we only look at training costs and their previous training statements, it becomes more realistic if we talk about around 2,000 GPUs. There is also a possibility that the models weren't trained on clusters they own, but this doesn't seem likely.
Whether they actually own these GPUs is still up for debate. Either way, more pre-investment is definitely involved, either through more resources from High-Flyer or from another party.
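The hardware range is just GPU count times price per card. A quick sketch, using the per-card price range quoted above (the function name is made up for illustration):

```python
# Hardware investment range for a GPU cluster.
# The per-card prices are the 17,500 to 75,000 dollar range quoted in the post.

def hardware_cost_range(gpu_count: int,
                        low_price_usd: int = 17_500,
                        high_price_usd: int = 75_000) -> tuple[int, int]:
    """Return (lowest, highest) total cost in dollars for gpu_count cards."""
    return gpu_count * low_price_usd, gpu_count * high_price_usd


for gpu_count in (2_000, 50_000):
    low, high = hardware_cost_range(gpu_count)
    print(f"{gpu_count:,} GPUs: {low / 1e6:,.0f} million to {high / 1e6:,.0f} million dollars")
```

For the full 50,000 cards this gives 875 million to 3.75 billion dollars. For a cluster of around 2,000 cards it is 35 to 150 million dollars, which is a lot closer to what a company of this size could plausibly carry.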
Where they could be reducing a lot of potential costs:
Infrastructure costs:
DeepSeek is located in Hangzhou, which is around an hour away, by the fastest travel option, from the nearest factory in Taiwan where all these chips/GPUs (NVIDIA, AMD) are produced. Having factories very close to where operations take place could cut down costs tremendously, since import and export costs on goods from overseas are where a lot of money is lost, especially due to import laws in the United States. Rumor has it that OpenAI wants to produce its own chips to combat these costs, and also because of the rising tensions with China (TikTok ban etc.). Apple is another example: it went all in on its own silicon chips in November 2020 to cut down prices and have more oversight over operations.
As for the energy costs of running these clusters, I did a calculation comparing Hangzhou to Texas. This didn't make much of a difference, so I don't see this as a valid explanation.
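What is up for debate is how much energy goes into everything in and around AI. A figure of around 20% of the world's energy output gets thrown around, but estimates from the International Energy Agency put all data centers combined closer to 1 to 2% of global electricity use.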
Software efficiency and team:
CEO Liang Wenfeng has over 20 years of experience as a former AI researcher, quantitative trader and co-founder of High-Flyer, which manages around 8 billion dollars' worth of quantitative investments, so it seems safe to say that he knows quite a bit about AI and machine learning.
We don't know much about the other team members, so here is my personal take on the potential of the team:
There are plenty of extremely talented individuals around the world, especially in China, who could do the job perfectly well. Typically VCs want people with high credibility (C-suite experience or degrees from certain universities) and years of experience. This simply lowers the investment risk for VCs. But those people typically ask for a very high salary. Since DeepSeek is funded by its own parent company, with the same CEO/founder, this isn't an issue.
Most likely, thanks to his network, he is able to attract top-tier talent. On top of that, speaking as someone who has been around startups for some time now: you don't need multi-million dollar teams. You just need people with raw intelligence and willpower who share the same goal as the startup they work for.
Some sources mentioned talent that previously worked at OpenAI, Google Brain and Microsoft Research, and talent from top-tier Chinese universities known for AI, like Tsinghua and Peking University.
Big if true: the team would look competent and competitive.
Sources state that the following software optimizations were used for training the models:
- MoE (mixture of experts): only a few "expert" sub-networks are activated per token, similar to Mistral's Mixtral.
- MTP (multi-token prediction): the model learns to predict several upcoming tokens at once.
- FP8 (mixed-precision training): uses 8 bits instead of 16 or 32 to reduce memory usage.
- Distillation: used to create smaller and more efficient models that perform similarly to larger AI models.
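To show why the FP8 point matters at this scale, here is a small sketch of how much memory the weights alone take at different precisions. It uses the 671B parameter count mentioned for R1. It ignores activations, optimizer state and everything else a real training run needs, so it is a lower bound, and the function name is made up for illustration.

```python
# Memory needed just to store model weights at different precisions.
# ASSUMPTION: counts weights only, in decimal gigabytes (1 GB = 1e9 bytes);
# real training needs far more memory (gradients, optimizer state, activations).

def weight_memory_gb(parameter_count: int, bits_per_parameter: int) -> float:
    """Return the weight storage in GB for a given precision."""
    bytes_total = parameter_count * bits_per_parameter / 8
    return bytes_total / 1e9


PARAMETERS = 671_000_000_000  # 671B, the R1 size quoted in the post

for bits in (32, 16, 8):
    print(f"{bits:>2}-bit: {weight_memory_gb(PARAMETERS, bits):,.0f} GB")
```

Going from 16 bits to 8 bits halves the memory per weight, which means fewer GPUs are needed to hold the same model, or more room per GPU for the rest of the training state.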
How are DeepSeek's distribution costs so low?
This is where things get weird. Like, really weird. Looking at the artificialanalysis.ai leaderboard, you can clearly see the massive difference in API costs between OpenAI and DeepSeek.
In short, in dollars per million tokens:
o1 =
- blended token costs = 26.25
- input costs = 15.00
- output costs = 60.00
R1 =
- blended token costs = 2.00
- input costs = 2.00
- output costs = 2.50
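For anyone wondering where a "blended" price comes from: it is a weighted average of input and output prices. The sketch below assumes a 3:1 input-to-output weighting, which is my understanding of how the leaderboard blends prices. Treat that ratio as an assumption, and the function name as made up.

```python
# Blended price per million tokens as a weighted average of input and output.
# ASSUMPTION: a 3:1 input:output weighting; the leaderboard's exact method
# is not stated in this post.

def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average price per million tokens."""
    total_parts = input_parts + output_parts
    return (input_price * input_parts + output_price * output_parts) / total_parts


o1_blended = blended_price(15.00, 60.00)
r1_blended = blended_price(2.00, 2.50)
print(f"o1 blended: {o1_blended:.2f}")
print(f"R1 blended: {r1_blended:.3f}")
print(f"o1 output price is {60.00 / 2.50:.0f}x the R1 output price")
```

With that weighting the o1 numbers reproduce the listed 26.25 exactly. The R1 numbers come out at 2.125 rather than the listed 2.00, so the R1 figures were probably rounded or taken at different moments. Either way, the gap between the two models is more than a factor of ten.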
One explanation is that OpenAI has a lot of investment to recoup through the price of o1, which I personally think can be true. They raised more than 17.9 billion dollars in total and run far more than just consumer products like models. They also actively build infrastructure for companies and even governments. Among the non-OpenAI models, the only one with a high output token cost is Claude 3.5 Sonnet, at 15.00.
R1 is not the only model with extremely low API costs. Gemini Flash and Pro are well known for extremely low API costs while still delivering pretty good performance. And those were delivered by Google, so just so you know: this has nothing to do with China having some sort of secret, super-efficient AI computer. Alibaba models like Qwen 2.5 72B have output costs of 0.75, for example, priced similarly to Phi by Microsoft and Llama 3.3 by Meta.
So I would say that R1's distribution costs can definitely be realistic. OpenAI, on the other hand... well, I have no clue what is going on there.
Privacy issues:
I personally don't think mentioning this makes a lot of sense. People always have this media-driven narrative in their minds that everything out of China is either cheap, produced by children, or filled with malware and ways of collecting your data.
Here is my take.
DeepSeek is definitely politically biased in its answers. It doesn't answer certain questions. On the other hand, it states very clearly what user data it uses. R1 shows its thinking in real time, so you can literally see where the biases are and how the model has been partially prompt-engineered by its developers. On top of that, it is open source, so consumers and businesses can download these models, run them locally, fine-tune them on their own data and so on.
I personally recommend that nobody base their political opinion, or any kind of opinion, on media narratives. Don't take my word or my opinion for it either. Do your own research and be a free thinker. Doing that will make you realize that even your own government and the companies you love do a lot of questionable stuff they never mention to you.
Conclusion.
Writing this made me realize that DeepSeek is a respectable AI startup that has delivered amazing results so far compared to the AI giants. This is my opinion until more info is shared, of course.
What is happening with OpenAI is up for debate and definitely questionable. Its models have only become more expensive over time, while their performance has been questionable.
People can say that 20-dollar subscriptions are reasonably priced, and I would agree, but if you look beyond the surface I think this company might be suffering more than people think. Which is a shame: they made AI what it is today, and I hope to see things go back to normal.
I'm looking forward to healthy competition and collaboration in the field of AI. I hope to see a lot of cool stuff being made now, and a lot more small startups that challenge the tech giants in their own fields.
Again, power to the people, and let's hope that open-source artificial intelligence will result in the quality of life every individual on this planet deserves :)