What's the AI Buzz

Last week, the AI world was shocked by the release of DeepSeek R1. The release caught the community’s attention for several reasons:

  • The model didn’t come from one of the big, well-funded tech companies, but rather from a startup out of Hangzhou, China.
  • For certain tasks in math, coding, and complex reasoning, the model produces similar results to OpenAI’s o1 reasoning model.
  • Reinforcement learning (RL) was used during post-training as an alternative to supervised fine-tuning, with impressive results pointing to the possibility of a more scalable RL flywheel.
  • And to top it all off, the model weights were released online under an MIT license, going even further than Meta in openness.

Wow.

Creating a frontier reasoning model that rivals OpenAI’s was deemed out of reach for all but the largest tech companies, let alone on a (relatively speaking) shoestring budget. Yet it came from DeepSeek, a company born in 2023 out of a Chinese quantitative hedge fund named High-Flyer.

The researchers at DeepSeek had far fewer GPUs than the large US tech companies, so they focused on making the model architecture more efficient to train. As they say, necessity is the mother of invention. The team managed to achieve state-of-the-art performance using roughly a tenth of the training compute of a contemporary Llama model.

DeepSeek R1’s performance is comparable to OpenAI’s o1 reasoning model across a range of tasks. For example, DeepSeek R1 scored 79.8% on the 2024 AIME mathematics benchmark. It was only a few months ago that OpenAI’s o1 reached a similar score on that benchmark, a result that was itself a major jump in performance over previous models. DeepSeek R1 matches o1 on other benchmarks for coding and complex reasoning tasks as well.

Another notable result is that DeepSeek R1 leveraged reinforcement learning (RL) during its post-training regime. The more typical approach is supervised fine-tuning, which relies on manually curated datasets that are time-consuming and expensive to produce. RL, on the other hand, is a type of machine learning in which an agent learns to make decisions by interacting with its environment, aiming to maximize a reward signal. The team reported that, learning purely from a reward signal, the model automatically developed useful capabilities that until now have largely been hand-engineered or elicited through curated data, such as chain-of-thought (CoT) reasoning, self-verification, and reflection. This is an exciting finding, as it can lead to more scalable and cost-effective techniques for developing reasoning models.
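To make the idea concrete, here is a minimal toy sketch of learning purely from a verifiable reward signal, using the classic REINFORCE policy-gradient update. Everything in it (the eight-candidate task, the tiny softmax policy, the hyperparameters) is invented for illustration; DeepSeek’s actual pipeline, described in their paper as GRPO-based RL over a full language model, is far more involved.

```python
# Toy sketch of RL from a verifiable reward (REINFORCE), NOT DeepSeek's code.
# The "task" is trivially simple: learn to pick the one correct answer out of
# eight candidates. The key property it shares with math/coding RL is that
# the reward is checkable automatically, with no human-labeled traces.
import numpy as np

rng = np.random.default_rng(0)

NUM_ACTIONS = 8                  # hypothetical answer candidates
CORRECT = 3                      # index of the verifiably correct answer
logits = np.zeros(NUM_ACTIONS)   # parameters of a softmax policy

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

lr = 0.5
for step in range(200):
    probs = softmax(logits)
    action = rng.choice(NUM_ACTIONS, p=probs)   # sample an "answer"
    reward = 1.0 if action == CORRECT else 0.0  # automatic verification

    # REINFORCE update: ascend reward * grad(log pi(action)).
    # For a softmax policy, grad(log pi(a)) = onehot(a) - probs.
    grad = -probs
    grad[action] += 1.0
    logits += lr * reward * grad

print(softmax(logits).round(3))  # probability mass concentrates on index 3
```

Because the reward is computed by checking the answer rather than by consulting human labels, the training loop can in principle run at whatever scale compute allows, which is what makes the RL flywheel story so compelling.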

All this sounds amazing, right? Well, add to it that DeepSeek R1 is fully open source and available for anyone in the world to build upon. You can find the model weights on Hugging Face. DeepSeek itself charges a small fraction of what OpenAI charges for o1 access through its API. This large reduction in cost could push closed-source model providers to cut their own prices in turn.
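For those who want to try the model, here is a hypothetical usage sketch against DeepSeek’s hosted API. The API is OpenAI-compatible, so the standard openai Python client can be pointed at it; the base_url and model id below reflect DeepSeek’s documentation at the time of writing and should be verified before use.

```python
# Hypothetical sketch: querying DeepSeek R1 via DeepSeek's OpenAI-compatible
# API. The endpoint and model id below may change; check DeepSeek's docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # issued at platform.deepseek.com
    base_url="https://api.deepseek.com",  # DeepSeek's hosted endpoint
)

response = client.chat.completions.create(
    model="deepseek-reasoner",            # DeepSeek R1
    messages=[
        {"role": "user", "content": "How many primes lie between 10 and 50?"},
    ],
)
print(response.choices[0].message.content)
```

Alternatively, since the weights themselves are on Hugging Face (under the deepseek-ai organization, alongside smaller distilled variants), teams with sufficient GPU capacity can self-host rather than call any API at all.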

Whether DeepSeek can sustain these improvements in the rapid race of AI development is another story, but in some ways, that’s not the point. They’ve demonstrated that less capital than previously assumed is required to reach the forefront of today’s best AI models. They’ve also shown that open source will continue to be an important driver of innovation and of the democratization of AI for all. What an exciting time we’re in!

Implications

The release of DeepSeek R1 has implications for our own work at Embed Security, where we’re building an Agentic Security platform. We’ve been incorporating RL into our process of aligning AI reasoning with security analyst intuition. When teams like DeepSeek share their research openly, it opens new lines of investigation for creating reliable and useful agents within the cybersecurity domain… something that benefits our customers and the whole community.

Ready to explore the world of Agentic Security? Reach out to us to learn more about how Embed Security can help.