Tags
- AI Infra 2
- Alignment 1
- Angular Update 1
- Checkpoint 1
- Communication 1
- Dataset 1
- DDP 1
- Distributed 1
- Distributed Training 1
- Effective Learning Rate 1
- Export 1
- Femtotron 1
- FP8 1
- FSDP 1
- Hyperparameter Transfer 1
- Initialization 1
- KV Cache 2
- LLM 22
- LLM System 16
- LLM Theory 5
- MCore 3
- Meeting Notes 1
- Megatron 8
- MoE 1
- MuP 2
- NCCL 2
- Normalization 1
- Optimization 3
- PagedAttention 1
- PD Disaggregation 1
- Pipeline Parallel 3
- Pretraining 2
- Probability 1
- Ray 1
- Resharding 1
- RL 1
- RLHF 1
- Schedule 3
- Serving 2
- SGD 1
- SMD 2
- Training 12
- Training Framework 9
- Transformer Engine 2
- Weight 1
- Weight Decay 2
- Life Reflections 1
- Memory Management 1
- Distributed Training 1
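- Philosophy 1
- Philosophical Investigations 1
- Fundamentals Quick Reference 1
- Time Management 1
- Wittgenstein 1
- Career Development 1
- Industry Experience 1
- Reading Notes 1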