Distributed Checkpoint Failures on 2× RTX 4090 for Local LLM Pretraining
Technical note on how FSDP checkpoint saves reproducibly deadlock NCCL on consumer GPUs without NVLink. 26 WandB runs documenting the boundary between what works and what fails.
pretraining distributed-training nvidia rtx-4090