Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Avoid common PyTorch mistakes — train/eval mode, gradient leaks, device mismatches, and checkpoint gotchas.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Train/eval mode
- model.train() enables dropout and BatchNorm running-stat updates — the default after init
- model.eval() disables dropout and uses running stats — you MUST call it for inference
- Mode is sticky — train/eval persists until explicitly changed
- model.eval() does not disable gradients — you still need torch.no_grad()
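A minimal sketch of the mode pitfalls above, assuming a standard PyTorch install (the model and tensor shapes are arbitrary illustrations):

```python
import torch
import torch.nn as nn

# A model with dropout, to show that eval() is sticky and that eval()
# alone does not stop gradient tracking.
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))

model.eval()                  # disable dropout for inference
x = torch.randn(2, 4)
y = model(x)
print(y.requires_grad)        # True — eval() does not disable autograd

with torch.no_grad():         # this is what actually disables gradients
    y = model(x)
print(y.requires_grad)        # False

print(model.training)         # False — mode persists until model.train()
```

Note that `eval()` and `no_grad()` solve different problems: the first changes layer behavior, the second stops graph construction; a correct validation loop uses both.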
Gradients
- torch.no_grad() for inference — reduces memory and speeds up computation
- loss.backward() accumulates gradients — call optimizer.zero_grad() before backward
- zero_grad() placement matters — before the forward pass, not after backward
- .detach() to stop gradient flow — prevents memory leaks when logging tensors
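A small sketch of gradient accumulation, assuming a toy linear model: skipping `zero_grad()` makes the second `backward()` add on top of the first gradient.

```python
import torch
import torch.nn as nn

# backward() ADDS to .grad, so the second call without zero_grad()
# doubles the stored gradient.
model = nn.Linear(3, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.ones(1, 3)

model(x).sum().backward()
first = model.weight.grad.clone()

model(x).sum().backward()                 # no zero_grad(): accumulates
accumulated = model.weight.grad.clone()
print(torch.allclose(accumulated, 2 * first))   # True

opt.zero_grad()                           # reset before the next backward
model(x).sum().backward()
fresh = model.weight.grad.clone()
print(torch.allclose(fresh, first))             # True
```

The accumulation behavior is deliberate — it is what makes gradient accumulation across micro-batches possible — but it means `zero_grad()` must be an explicit step in every normal training loop.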
Device mismatches
- Model AND data must be on the same device — model.to(device) and tensor.to(device)
- .cuda() vs .to('cuda') — both work; .to(device) is more flexible
- CUDA tensors can't convert to NumPy directly — .cpu().numpy() is required
- torch.device('cuda' if torch.cuda.is_available() else 'cpu') — portable code
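The portable-device pattern above can be sketched like this; the model and shapes are placeholders, and the code runs unchanged on CPU-only and CUDA machines:

```python
import torch

# Select the device once, then move BOTH model and data to it.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = torch.nn.Linear(4, 2).to(device)   # move parameters
x = torch.randn(8, 4).to(device)           # move data to the SAME device
y = model(x)

# CUDA tensors cannot be converted to NumPy directly; .cpu() first.
# Calling .cpu() on a CPU tensor is a no-op, so this line is always safe.
arr = y.detach().cpu().numpy()
print(arr.shape)                           # (8, 2)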
DataLoader workers
- num_workers > 0 uses multiprocessing — Windows needs the if __name__ == '__main__': guard
- pin_memory=True with CUDA — faster host-to-GPU transfer
- Workers don't share state — random seeds differ per worker; set them in worker_init_fn
- Large num_workers can cause memory issues — start with 2-4 and increase only if CPU-bound
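A sketch of the safe DataLoader setup, assuming a tiny tensor dataset; `worker_init_fn` and the `__main__` guard are the patterns named above, not project-specific code:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def worker_init_fn(worker_id: int) -> None:
    # Each worker process gets its own RNG state; seeding here keeps
    # augmentations different across workers but reproducible per run.
    torch.manual_seed(torch.initial_seed() % 2**32)

dataset = TensorDataset(torch.arange(16, dtype=torch.float32).unsqueeze(1))

# num_workers=0 (single process) needs no guard and works everywhere.
loader0 = DataLoader(dataset, batch_size=4)
n_batches = sum(1 for _ in loader0)
print(n_batches)                           # 4

if __name__ == '__main__':
    # The guard is mandatory on Windows (spawn start method) whenever
    # num_workers > 0, because worker processes re-import this module.
    loader = DataLoader(
        dataset,
        batch_size=4,
        num_workers=2,                     # start at 2-4; raise only if CPU-bound
        pin_memory=torch.cuda.is_available(),
        worker_init_fn=worker_init_fn,
    )
```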
Checkpoints
- torch.save(model.state_dict(), path) — recommended; saves only the weights
- Loading: create the model first, then model.load_state_dict(torch.load(path))
- map_location for cross-device loads — torch.load(path, map_location='cpu') if saved on GPU
- Saving the whole model pickles the code path — it breaks if the code changes
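The recommended round-trip can be sketched as follows, with a temp file and a toy model standing in for a real checkpoint path:

```python
import os
import tempfile
import torch
import torch.nn as nn

# Save ONLY the state_dict, then recreate the model class and load into it.
model = nn.Linear(4, 2)
path = os.path.join(tempfile.mkdtemp(), 'ckpt.pt')
torch.save(model.state_dict(), path)

restored = nn.Linear(4, 2)                     # create the model first
state = torch.load(path, map_location='cpu')   # safe even if saved on GPU
restored.load_state_dict(state)

x = torch.randn(3, 4)
print(torch.allclose(model(x), restored(x)))   # True — identical weights
```

Because only tensors are serialized, this survives refactors that would break a pickled whole-model checkpoint, whose unpickling re-imports the original class by module path.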
In-place operations
- In-place ops end with _ — tensor.add_(1) vs tensor.add(1)
- In-place on a leaf variable breaks autograd — raises an error about a modified leaf
- In-place on an intermediate can corrupt gradients — avoid inside the computation graph
- tensor.data bypasses autograd — legacy; prefer .detach() for safety
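A short sketch of the naming convention and the leaf-variable error; the `no_grad()` mutation at the end is one safe replacement for the legacy `tensor.data` idiom:

```python
import torch

# Trailing underscore = in-place.
t = torch.ones(3)
out = t.add(1)          # out-of-place: returns a new tensor, t unchanged
t.add_(1)               # in-place: t itself is now [2., 2., 2.]

# In-place on a leaf tensor that requires grad raises a RuntimeError.
leaf = torch.ones(3, requires_grad=True)
blocked = False
try:
    leaf.add_(1)
except RuntimeError:
    blocked = True
print(blocked)          # True

# Safe pattern: mutate under no_grad(), as optimizers do internally.
with torch.no_grad():
    leaf.add_(1)        # allowed; values become [2., 2., 2.]
```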
Memory leaks
- Accumulated tensors leak memory — .detach() logged metrics
- torch.cuda.empty_cache() releases cached memory — but doesn't fix leaks
- Delete references and call gc.collect() — before empty_cache() if needed
- with torch.no_grad(): prevents graph storage — crucial for the validation loop
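The logging leak can be sketched with a toy training loop: appending the raw loss keeps each step's whole graph alive, while appending the detached value (or `loss.item()`) frees it.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
x = torch.randn(32, 10)

history = []
for _ in range(3):
    loss = (model(x) ** 2).mean()
    history.append(loss.detach())   # NOT history.append(loss) — that would
                                    # retain the autograd graph of every step
    loss.backward()
    model.zero_grad()

print(all(not t.requires_grad for t in history))  # True — no graphs retained
```

The same principle explains why validation belongs inside `with torch.no_grad():` — if no graph is built, there is nothing to accidentally retain.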
Common gotchas
- BatchNorm with batch_size=1 fails in train mode — use eval mode or track_running_stats=False
- Loss reduction defaults to 'mean' — you may want 'sum' for gradient accumulation
- cross_entropy expects logits — not softmax output
- .item() to get a Python scalar — .numpy() or indexing with [0] is deprecated or errors
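Two of the gotchas above in one sketch, with made-up logits where the correct class has the highest score: `cross_entropy` applies `log_softmax` itself, so feeding it softmax output silently double-softmaxes and inflates the loss.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 3.0, 0.3]])
targets = torch.tensor([0, 1])

right = F.cross_entropy(logits, targets)                    # pass raw logits
wrong = F.cross_entropy(F.softmax(logits, dim=1), targets)  # double softmax

print(right.item() < wrong.item())   # True — the bug inflates the loss
print(type(right.item()))            # <class 'float'> — .item() gives a scalar
```

The double-softmax bug is nasty precisely because it does not crash: training still converges, just to a worse model, since the flattened distribution shrinks every gradient.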
Agent frameworks, memory systems, reasoning layers, and model-native orchestration.
Largest current source with strong distribution and engagement signals.