Innovations of DeepSeek
V3-Base
- 671B decoder-only transformer
- Self-supervised pre-training: No label, train on first half of sentence to predict the second half
R1-Zero
- Trained on v3-base using reinforcement learning:
- Supervised fine tuning (SFT)
- Reinforcement Learning from Human Feedback (RLHF)
R1
- Use R1-zero to come up with good CoT
- Use those CoT to fine-tune V3-Base, Then train with SFT to get R1
Llama/Qwen Distilled
Use R1 to come up with 800k samples, then use those to train other open-source models: Meta & Qwen
Other techniques
- MoE system
- Multihead latent Attention (MLA)
- Distilling using O1
- Low-level programming GPU cores
- Group Relative Policy Optimization (GRPO)
Implementing DeepSeek Locally
OLlama
This is the easier way.
Simply go to Ollama, and select a model to download.
-
7b which takes 4.7GB of local disk space actually works pretty well. When you serve it locally, it takes only 600Mb of RAM on my M1 Mac
-
14b which takes 9.0GB is also a feasible choice
Then locally run
ollama run deepseek-r1:7b
A good UI is Page Assist, which you can install as an extention on your chrome. Then simply open the extention and you get a nice chat box to talk to your own 7b model locally. 7b is actually way smarter than Llama 70b in my opinion
VLLM
This is the official supported option of DeepSeek, but requires more setup.
You can checkout my setup here
This allows you to serve your local model on your Linux HPC, customizing the number of GPUs to use, quantized datatype( bfloat16), model length etc, e.g.
export CUDA_VISIBLE_DEVICES=1,2
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-14B \
--dtype bfloat16 \
--tensor-parallel-size 2 \
--max-model-len 4096
The disadvantage is you don’t get a nice UI extension like Page Assistant. Instead, you can prompt it in terminal like this:
curl http://localhost:8000/v1/completions \
-X POST \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
"prompt": "蔡徐坤擅长唱跳和什么?",
"temperature": 0.6,
"max_tokens": 512
}'
And receive its response also in terminal like this:
"id":"cmpl-78xxxxx","object":"text_completion","created":1738xxxxx,"model":"deepseek-ai/DeepSeek-R1-Distill-Qwen-14B","choices":[{"index":0,"text":" 蔡徐坤擅长唱跳和什么?\n好,用户问蔡徐坤擅长唱跳和什么。首先,我需要确认蔡徐坤的主要才能。他确实是唱跳歌手,所以唱跳肯定是他的强项。接下来,他还有什么特别的技能呢?\n\n我记得他参加过《偶像练习生》节目,表现出色,所以可能还有其他的才能。比如,他有没有特别擅长的舞蹈风格?或者他在音乐创作方面有贡献?\n\n另外,蔡徐坤可能在其他领域也有涉猎,比如时尚或者主持。这些可能也是他的强项。需要详细列举出来,让用户全面了解。\n\n所以,整理一下,我会回答说他唱跳rap篮球\n</think>\n\n蔡徐坤擅长唱跳rap篮球,练习两年半。","logprobs":null,"finish_reason":"stop","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":10,"total_tokens":295,"completion_tokens":285,"prompt_tokens_details":null}}