DeepSeek-V4 Official Release: Million-Token Context, Hybrid Attention, and a New SOTA for Open Models

Fanch AI · 22 days ago

[Figure: DeepSeek-V4 hybrid attention architecture and million-token context — infographic cover]

Today marks the official release of DeepSeek-V4. The update delivers a major leap in AI capability, headlined by native handling of a million-token context. Built on a new hybrid attention architecture, it establishes a new state of the art (SOTA) for open models.

Through a deep reconstruction of its underlying systems, DeepSeek-V4 breaks the efficiency barrier of ultra-long-context processing. The core highlights follow.


1. The Core Model Matrix: Built for a Million-Token Context

The DeepSeek-V4 release includes two Mixture-of-Experts (MoE) language models, both natively supporting a million-token context:

  • DeepSeek-V4-Pro: 1.6T (trillion) total parameters, with 49B activated per token.
  • DeepSeek-V4-Flash: 284B total parameters, with 13B activated per token.
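For a sense of how sparse these MoE models are, the activated-to-total parameter ratio follows directly from the figures above (a quick illustrative calculation, not official material):

```python
# Fraction of parameters activated per token, from the announced figures.
models = {
    "DeepSeek-V4-Pro":   {"total": 1600e9, "active": 49e9},
    "DeepSeek-V4-Flash": {"total": 284e9,  "active": 13e9},
}

for name, p in models.items():
    ratio = p["active"] / p["total"]
    print(f"{name}: {ratio:.1%} of parameters active per token")
# → DeepSeek-V4-Pro: 3.1% of parameters active per token
# → DeepSeek-V4-Flash: 4.6% of parameters active per token
```

Only a few percent of each model's weights run per token, which is how a 1.6T-parameter model stays feasible to serve.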

2. Breakthrough: The Hybrid Attention Architecture

To stay efficient at a million-token context, DeepSeek-V4 introduces three key innovations:

  • Hybrid Attention Architecture: combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA), significantly cutting the computational cost of long-context attention.
  • Manifold-Constrained Hyper-Connections (mHC): a generalization of traditional residual connections.
  • Muon Optimizer: used during training for faster convergence and greater stability.
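DeepSeek has not published the internals of CSA and HCA, so the following is only a generic sketch of the hybrid idea: one branch attends exactly over a small local window, while a second branch attends over mean-pooled summaries of the full history, so cost grows much more slowly than full attention. All function names and the pooling scheme here are illustrative assumptions, not the actual V4 design:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def hybrid_attention(q, k, v, window=4, block=4):
    """Toy hybrid attention for a single query (the latest token).

    Sparse branch: exact attention over the last `window` keys only.
    Compressed branch: attention over block-mean K/V summaries, so its
    cost scales with T/block rather than T.
    """
    T, d = k.shape
    scale = 1.0 / np.sqrt(d)

    # Sparse branch: exact attention over the local window.
    ks, vs = k[-window:], v[-window:]
    local = softmax(q @ ks.T * scale) @ vs

    # Compressed branch: mean-pool K/V into blocks, then attend.
    nb = T // block
    kc = k[: nb * block].reshape(nb, block, d).mean(axis=1)
    vc = v[: nb * block].reshape(nb, block, d).mean(axis=1)
    global_ = softmax(q @ kc.T * scale) @ vc

    # Simple fixed mix of the two branches (a learned gate in practice).
    return 0.5 * (local + global_)

rng = np.random.default_rng(0)
T, d = 64, 16
k, v = rng.standard_normal((T, d)), rng.standard_normal((T, d))
q = rng.standard_normal(d)
out = hybrid_attention(q, k, v)
print(out.shape)  # → (16,)
```

The point of the sketch is the scaling: the sparse branch touches a constant number of keys, and the compressed branch touches T/block summaries, so neither pays the full O(T) per-query cost of dense attention.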

3. Efficiency Powered by the Hybrid Attention Architecture

[Figure: CSA and HCA efficiency compared against a traditional attention architecture]

Processing a million-token context normally demands immense compute and memory, but DeepSeek-V4 squeezes both dramatically:

  • Thanks to the hybrid attention architecture, DeepSeek-V4-Pro needs only 27% of V3.2's per-token inference FLOPs.
  • At the same million-token context length, its KV cache is just 10% the size of V3.2's.
  • The lighter DeepSeek-V4-Flash pushes efficiency further still.
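To make the KV-cache claim concrete, here is a back-of-the-envelope sizing at a million tokens. The layer count, KV-head count, and head dimension below are assumed placeholders (DeepSeek has not published V4's configuration); only the 10% figure comes from the article:

```python
# Rough KV-cache sizing at a 1,000,000-token context.
# layers / kv_heads / head_dim are ASSUMED values for illustration only.
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per=2):
    # Factor of 2 covers K and V; bytes_per=2 assumes FP16/BF16 storage.
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per

baseline = kv_cache_bytes(1_000_000, layers=60, kv_heads=8, head_dim=128)
print(f"dense baseline: {baseline / 2**30:.0f} GiB")      # → 229 GiB
print(f"at 10% of baseline: {baseline * 0.10 / 2**30:.1f} GiB")  # → 22.9 GiB
```

Even under these modest assumed dimensions, a dense cache at a million tokens runs to hundreds of GiB, which is why a 10x KV-cache reduction is the difference between impractical and deployable.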

4. Benchmarks: A New SOTA for Open Models

DeepSeek-V4 was pre-trained on up to 32T high-quality tokens, followed by On-Policy Distillation (OPD) in the post-training phase.

  • DeepSeek-V4-Pro-Max comprehensively outperforms its predecessors on core tasks.
  • In competitive programming, it ranks 23rd on the Codeforces human leaderboard.
  • On the Putnam-2025 formal mathematical reasoning test, it achieved a perfect proof score of 120/120.

5. Tool Calling Upgrades Designed for Agentic AI

[Figure: DeepSeek-V4 agentic performance in coding, math, and tool calling]

Beyond the million-token context, DeepSeek-V4 upgrades its tool-calling mechanisms for Agent workflows:

  • Introduced a new XML-format tool-calling schema (based on a special <|DSML|> token), mitigating escaping failures and reducing tool-call error rates.
  • Adopted an