DeepSeek

164 posts

DeepSeek

@deepseek_ai

Unravel the mystery of AGI with curiosity. Answer the essential question with long-termism.

DeepSeek’s posts

DeepSeek

@deepseek_ai

Jan 28, 2025

To prevent any potential harm, we reiterate that

@deepseek_ai

is our sole official account on Twitter/X. Any accounts: - representing us - using identical avatars - using similar names are impersonations. Please stay vigilant to avoid being misled!

DeepSeek-R1 is here!

Performance on par with OpenAI-o1

Fully open-source model & technical report

MIT licensed: Distill & commercialize freely!

Website & API are live now! Try DeepThink at chat.deepseek.com today!

1/n

Day 0: Warming up for #OpenSourceWeek! We're a tiny team

@deepseek_ai

exploring AGI. Starting next week, we'll be open-sourcing 5 repos, sharing our small but sincere progress with full transparency. These humble building blocks in our online service have been documented,

Excited to see everyone’s enthusiasm for deploying DeepSeek-R1! Here are our recommended settings for the best experience: • No system prompt • Temperature: 0.6 • Official prompts for search & file upload: bit.ly/4hyH8np • Guidelines to mitigate model bypass

Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference! Core components of NSA: • Dynamic hierarchical sparse strategy • Coarse-grained token compression • Fine-grained token selection

With

Introducing DeepSeek-V3.1: our first step toward the agent era!

Hybrid inference: Think & Non-Think — one model, two modes

Faster thinking: DeepSeek-V3.1-Think reaches answers in less time vs. DeepSeek-R1-0528

Stronger agent skills: Post-training boosts tool use and

DeepSeek

From chat.deepseek.com

DeepSeek

@deepseek_ai

Dec 26, 2024

Introducing DeepSeek-V3! Biggest leap forward yet:

60 tokens/second (3x faster than V2!)

Enhanced capabilities

API compatibility intact

Fully open-source models & papers

1/n

GIF

DeepSeek-V3-0324 is out now!

Major boost in reasoning performance

Stronger front-end development skills

Smarter tool-use capabilities

For non-complex reasoning tasks, we recommend using V3 — just turn off “DeepThink”

API usage remains unchanged

Models are

GIF

Day 5 of #OpenSourceWeek: 3FS, Thruster for All DeepSeek Data Access Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks.

6.6 TiB/s aggregate read throughput in a 180-node cluster

3.66 TiB/min

GitHub - deepseek-ai/3FS: A high-performance distributed file system designed to address the...

Day 1 of #OpenSourceWeek: FlashMLA Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production.

BF16 support

Paged KV cache (block size 64)

3000 GB/s memory-bound & 580 TFLOPS

GitHub - deepseek-ai/FlashMLA: FlashMLA: Efficient Multi-head Latent Attention Kernels

DeepSeek-R1-0528 is here!

Improved benchmark performance

Enhanced front-end capabilities

Reduced hallucinations

Supports JSON output & function calling

Try it now: chat.deepseek.com

No change to API usage — docs here: api-docs.deepseek.com/guides/reasoni

GIF

Day 6 of #OpenSourceWeek: One More Thing – DeepSeek-V3/R1 Inference System Overview Optimized throughput and latency via:

Cross-node EP-powered batch scaling

Computation-communication overlap

Load balancing Statistics of DeepSeek's Online Service:

73.7k/14.8k

Day 2 of #OpenSourceWeek: DeepEP Excited to introduce DeepEP - the first open-source EP communication library for MoE model training and inference.

Efficient and optimized all-to-all communication

Both intranode and internode support with NVLink and RDMA

GitHub - deepseek-ai/DeepEP: DeepEP: an efficient expert-parallel communication library

Replying to

DeepSeek has not issued any cryptocurrency. Currently, there is only one official account on the Twitter platform. We will not contact anyone through other accounts.Please stay vigilant and guard against potential scams.

Off-Peak Discounts Alert! Starting today, enjoy off-peak discounts on the DeepSeek API Platform from 16:30–00:30 UTC daily:

DeepSeek-V3 at 50% off

DeepSeek-R1 at a massive 75% off Maximize your resources smarter — save more during these high-value hours!

Introducing DeepSeek-V3.2-Exp — our latest experimental model!

Built on V3.1-Terminus, it debuts DeepSeek Sparse Attention(DSA) for faster, more efficient training & inference on long context.

Now live on App, Web, and API.

API prices cut by 50%+! 1/n

Day 3 of #OpenSourceWeek: DeepGEMM Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.

Up to 1350+ FP8 TFLOPS on Hopper GPUs

No heavy dependency, as clean as a tutorial

Fully Just-In-Time compiled

GitHub - deepseek-ai/DeepGEMM: DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained...

From github.com