Chinese AI research lab DeepSeek grabbed global attention last week with the release of its open-source AI model, DeepSeek-R1. The company says the model rivals offerings from industry giants like OpenAI in critical areas such as mathematical reasoning, code generation, and cost efficiency, signalling a shift in the global AI landscape.
What is DeepSeek?
DeepSeek is an artificial intelligence research lab which emerged from Fire-Flyer, a deep-learning branch of High-Flyer, a Chinese quantitative hedge fund. Established in 2015, High-Flyer gained prominence by leveraging advanced computing to analyse financial data. By 2023, its founder, Liang Wenfeng, redirected resources towards creating DeepSeek, aspiring to develop groundbreaking AI models.
Unlike most Chinese AI firms, DeepSeek operates independently of major tech giants such as Baidu and Alibaba. Liang’s motivation for this ambitious venture was rooted in scientific curiosity rather than immediate financial returns. “Basic science research rarely offers high returns on investment,” he remarked.
What is DeepSeek-R1?
DeepSeek-R1 is an advanced reasoning model that, the company claims, surpasses existing models on several critical benchmarks. The model and its variants, such as DeepSeek-R1-Zero, employ large-scale reinforcement learning (RL) and multi-stage training to achieve their capabilities.
DeepSeek has also taken a notable step by open-sourcing not just its flagship models but also six smaller distilled variants, ranging from 1.5 billion to 70 billion parameters. These models are MIT-licensed, enabling researchers and developers to freely distil, fine-tune, and commercialise their work.
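For readers who want to try one of the distilled variants, the sketch below shows a typical way to load and prompt such a model with the Hugging Face transformers library. The model identifier is an assumption based on DeepSeek's published naming; check the deepseek-ai organisation on Hugging Face for the exact IDs available.

```python
# Minimal sketch: loading an assumed distilled R1 variant via Hugging Face
# transformers. Verify the exact model ID on the deepseek-ai Hugging Face page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "What is 17 * 24? Think step by step."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```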
DeepSeek vs OpenAI: Is there a difference?
Both OpenAI and DeepSeek have built their own large language models (LLMs). However, unlike traditional models that depend on supervised fine-tuning, DeepSeek-R1-Zero is claimed to have developed robust reasoning abilities after training solely with RL. To enhance readability and address language inconsistencies, DeepSeek then introduced DeepSeek-R1, which matches OpenAI's o1 model in performance on reasoning tasks.
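To make the distinction concrete: in supervised fine-tuning the model is shown correct examples to imitate, whereas in RL-only training it is given nothing but a reward signal for outcomes, such as a verifiably correct answer. The toy sketch below, which is an illustration of that principle and not DeepSeek's actual pipeline, trains a tiny softmax "policy" over candidate answers using a plain REINFORCE update driven solely by rewards.

```python
# Toy REINFORCE illustration (not DeepSeek's pipeline): the policy is never
# shown the right answer, only rewarded when it samples a correct outcome.
import numpy as np

rng = np.random.default_rng(0)
candidates = [406, 407, 408, 409]   # candidate answers to "17 * 24 = ?"
correct = 408
logits = np.zeros(len(candidates))  # policy parameters

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

lr = 0.5
for step in range(200):
    probs = softmax(logits)
    a = rng.choice(len(candidates), p=probs)        # sample an answer
    reward = 1.0 if candidates[a] == correct else 0.0
    grad = -probs                                   # grad of log pi(a):
    grad[a] += 1.0                                  # one_hot(a) - probs
    logits += lr * reward * grad                    # reinforce rewarded choices

print({c: round(p, 3) for c, p in zip(candidates, softmax(logits))})
```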
DeepSeek also advanced technical designs such as multi-head latent attention (MLA) and a mixture-of-experts architecture, which made its models more cost-effective. The latest DeepSeek model required just one-tenth of the computing power used by Meta's comparable Llama 3.1 model, according to a report by Epoch AI.
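The cost saving of mixture of experts comes from routing: a small router network sends each token to only a few "expert" sub-networks, so most of the model's parameters sit idle for any given token. The sketch below is a generic top-k MoE layer in PyTorch, not DeepSeek's exact architecture, intended only to show the mechanism.

```python
# Generic top-k mixture-of-experts layer (illustrative, not DeepSeek's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalise over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):              # run only the selected experts
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

tokens = torch.randn(10, 64)
print(TopKMoE()(tokens).shape)  # torch.Size([10, 64])
```

With 8 experts and k=2, each token activates only a quarter of the expert parameters, which is the basic reason MoE models are cheaper to run per token than dense models of the same total size.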
Who are the people behind DeepSeek?
Liang Wenfeng, born in 1985, is a Chinese entrepreneur and the founder and CEO of DeepSeek. He is also the co-founder of the quantitative hedge fund High-Flyer. Liang's educational background includes a Bachelor of Engineering in electronic information engineering and a Master of Engineering in information and communication engineering from Zhejiang University.
In 2016, he co-founded the quantitative investment firm Ningbo High-Flyer, which utilised mathematics and AI for investment strategies. Liang expanded his focus on AI by founding High-Flyer AI in 2019, which specialised in AI algorithms and applications. Through DeepSeek, Liang has positioned himself at the forefront of AI research.
Young talents driving AI leadership at DeepSeek
DeepSeek’s workforce is largely composed of fresh graduates from prestigious Chinese universities like Peking University and Tsinghua University, founder Liang told Chinese publication 36Kr in 2023. Despite their lack of industry experience, these researchers bring a wealth of academic expertise and a collaborative mindset, which Liang considers ideal for tackling high-investment, low-profit challenges.
Young researchers at DeepSeek view their work as a way to overcome global technological barriers and elevate China’s status as a leader in innovation, Liang said.
Overcoming US chip restrictions with DeepSeek
DeepSeek’s accomplishment is particularly noteworthy given the constraints posed by the ongoing tech competition between the US and China.
In October 2022, the US government imposed export controls aimed at limiting Chinese AI firms’ access to advanced computing hardware, including Nvidia’s H100 chips. While DeepSeek began with a stockpile of 10,000 H100s, it quickly became apparent that more would be needed to compete with global leaders such as OpenAI and Meta.
DeepSeek’s founder Liang explained in a 2023 interview with 36Kr, “The problem we are facing has never been funding, but the export control on advanced chips.”
With limited access to advanced chips due to export restrictions, Chinese tech firms have often prioritised application-focused development rather than foundational AI research. DeepSeek, however, defied this trend by rethinking AI’s underlying architecture and optimising resource efficiency, as pointed out in a report by Wired.
Efficient strategies powering DeepSeek’s AI
Highlighting the significance of this, a tech industry analyst told Wired, “DeepSeek represents a new wave of Chinese companies focused on long-term innovation over short-term gains.”
To overcome these limitations, DeepSeek adopted a range of efficiency-focused strategies to refine its model architecture. By incorporating engineering techniques, the company managed to reduce resource requirements without compromising performance. These innovations included:
- Custom communication schemes: Improved data exchange between chips to save memory.
- Memory optimisation: Reduced field sizes to maximise efficiency (see the sketch after this list).
- Mix-of-models approach: A unique way of combining smaller models to achieve superior results.
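One plausible reading of "reduced field sizes" is the use of smaller numeric formats for stored values, a standard memory optimisation in deep learning. The sketch below is a generic illustration of that idea, not DeepSeek's code: casting a weight matrix from 32-bit to 16-bit floats halves its memory footprint at the cost of some precision.

```python
# Generic illustration of shrinking numeric field sizes (not DeepSeek's code).
import torch

weights = torch.randn(4096, 4096)        # fp32: 4 bytes per value
compact = weights.to(torch.bfloat16)     # bf16: 2 bytes per value

print(weights.element_size() * weights.nelement() / 2**20, "MiB")  # 64.0 MiB
print(compact.element_size() * compact.nelement() / 2**20, "MiB")  # 32.0 MiB
```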
DeepSeek’s global impact on AI research
By open-sourcing its models under an MIT licence and sharing its breakthroughs, DeepSeek has gained significant recognition within the global AI research community. By providing access to model weights and outputs, the company aims to empower developers worldwide to build on its technology. This move not only democratises access to cutting-edge AI tools but also challenges the dominance of Western firms in the artificial intelligence space.