# Send PEFT Fine Tuning to your agent
Hand the extracted package to your coding agent with a concrete install brief instead of working through the setup manually.
## Fast path
- Download the package from Yavira.
- Extract it into a folder your agent can access.
- Paste one of the prompts below and point your agent at the extracted folder.
## Suggested prompts
### New install

```text
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
```
### Upgrade existing

```text
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
```
## Machine-readable fields
```json
{
  "schemaVersion": "1.0",
  "item": {
    "slug": "peft",
    "name": "Peft Fine Tuning",
    "source": "tencent",
    "type": "skill",
    "category": "AI 智能",
    "sourceUrl": "https://clawhub.ai/Desperado991128/peft",
    "canonicalUrl": "https://clawhub.ai/Desperado991128/peft",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadUrl": "/downloads/peft",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=peft",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "packageFormat": "ZIP package",
    "primaryDoc": "SKILL.md",
    "includedAssets": [
      "SKILL.md",
      "references/advanced-usage.md",
      "references/troubleshooting.md"
    ],
    "downloadMode": "redirect",
    "sourceHealth": {
      "source": "tencent",
      "slug": "peft",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-05-03T01:59:02.398Z",
      "expiresAt": "2026-05-10T01:59:02.398Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=peft",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=peft",
        "contentDisposition": "attachment; filename=\"peft-0.1.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null,
        "slug": "peft"
      },
      "scope": "item",
      "summary": "Item download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this item.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/peft"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    }
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/peft",
    "downloadUrl": "https://openagent3.xyz/downloads/peft",
    "agentUrl": "https://openagent3.xyz/skills/peft/agent",
    "manifestUrl": "https://openagent3.xyz/skills/peft/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/peft/agent.md"
  }
}
```
## Documentation

### PEFT (Parameter-Efficient Fine-Tuning)

Fine-tune LLMs by training <1% of parameters using LoRA, QLoRA, and 25+ adapter methods.

### When to use PEFT

Use PEFT/LoRA when:

- Fine-tuning 7B-70B models on consumer GPUs (RTX 4090, A100)
- You need to train <1% of parameters (6MB adapters vs a 14GB full model; see the sizing sketch after this list)
- You want fast iteration with multiple task-specific adapters
- You are deploying multiple fine-tuned variants from one base model

Use QLoRA (PEFT + quantization) when:

- Fine-tuning 70B models on a single 24GB GPU
- Memory is the primary constraint
- You can accept a ~5% quality trade-off vs full fine-tuning

Use full fine-tuning instead when:

- Training small models (<1B parameters)
- You need maximum quality and have the compute budget
- A significant domain shift requires updating all weights
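
For intuition on adapter sizes, here is a back-of-the-envelope sketch. All numbers are illustrative assumptions, not measurements of a specific checkpoint: each LoRA-targeted weight matrix W of shape d_out x d_in gains two factors B (d_out x r) and A (r x d_in), so the extra trainable parameters per matrix are r * (d_in + d_out).

```python
# Back-of-the-envelope LoRA adapter sizing (illustrative assumptions only).
# Each targeted matrix W (d_out x d_in) gains factors B (d_out x r) and
# A (r x d_in), adding r * (d_in + d_out) trainable parameters.
hidden = 4096            # Llama-style hidden size (assumed)
r = 16                   # LoRA rank
layers = 32              # transformer blocks (assumed)
targets_per_layer = 4    # q_proj, k_proj, v_proj, o_proj, treated as square here

per_matrix = r * (hidden + hidden)
total = per_matrix * targets_per_layer * layers
print(f"trainable params: {total / 1e6:.1f}M")             # ~16.8M (GQA models come in lower)
print(f"adapter size at fp16: ~{total * 2 / 1e6:.0f} MB")  # tens of MB vs a multi-GB base model
```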

### Installation

```bash
# Basic installation
pip install peft

# With quantization support (recommended)
pip install peft bitsandbytes

# Full stack
pip install peft transformers accelerate bitsandbytes datasets
```
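
A quick sanity check that the imports resolve, assuming the full-stack install above (version numbers will vary with your environment):

```python
# Verify the core libraries import and report their versions.
import peft
import transformers

print("peft", peft.__version__)
print("transformers", transformers.__version__)
```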

### LoRA fine-tuning (standard)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer
from peft import get_peft_model, LoraConfig, TaskType
from datasets import load_dataset

# Load base model
model_name = "meta-llama/Llama-3.1-8B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# LoRA configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                          # Rank (8-64, higher = more capacity)
    lora_alpha=32,                 # Scaling factor (typically 2*r)
    lora_dropout=0.05,             # Dropout for regularization
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],  # Attention layers
    bias="none"                    # Don't train biases
)

# Apply LoRA
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Output: trainable params: 13,631,488 || all params: 8,043,307,008 || trainable%: 0.17%

# Prepare dataset
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

def tokenize(example):
    text = f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['response']}"
    return tokenizer(text, truncation=True, max_length=512, padding="max_length")

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)
tokenized.set_format("torch")  # return tensors so the collator below can stack them

# Training
training_args = TrainingArguments(
    output_dir="./lora-llama",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    save_strategy="epoch"
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized,
    data_collator=lambda data: {"input_ids": torch.stack([f["input_ids"] for f in data]),
                                "attention_mask": torch.stack([f["attention_mask"] for f in data]),
                                "labels": torch.stack([f["input_ids"] for f in data])}
)

trainer.train()

# Save adapter only (tens of MB vs ~16GB for the full model)
model.save_pretrained("./lora-llama-adapter")
```

### QLoRA fine-tuning (memory-efficient)

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import get_peft_model, LoraConfig, prepare_model_for_kbit_training

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NormalFloat4 (best for LLMs)
    bnb_4bit_compute_dtype="bfloat16",   # Compute in bf16
    bnb_4bit_use_double_quant=True       # Nested quantization
)

# Load quantized model
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B",
    quantization_config=bnb_config,
    device_map="auto"
)

# Prepare for training (enables gradient checkpointing)
model = prepare_model_for_kbit_training(model)

# LoRA config for QLoRA
lora_config = LoraConfig(
    r=64,                              # Higher rank for 70B
    lora_alpha=128,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)
# The 70B model now fits on a single 24GB GPU
```

### Rank (r) - capacity vs efficiency

| Rank | Trainable Params | Memory | Quality | Use Case |
| --- | --- | --- | --- | --- |
| 4 | ~3M | Minimal | Lower | Simple tasks, prototyping |
| 8 | ~7M | Low | Good | Recommended starting point |
| 16 | ~14M | Medium | Better | General fine-tuning |
| 32 | ~27M | Higher | High | Complex tasks |
| 64 | ~54M | High | Highest | Domain adaptation, 70B models |

### Alpha (lora_alpha) - scaling factor

```python
from peft import LoraConfig

# LoRA scales its update by alpha / r, so alpha acts like a learning-rate
# multiplier for the adapter. Rule of thumb: alpha = 2 * rank.
LoraConfig(r=16, lora_alpha=32)  # Standard
LoraConfig(r=16, lora_alpha=16)  # Conservative (lower effective learning rate)
LoraConfig(r=16, lora_alpha=64)  # Aggressive (higher effective learning rate)
```

### Target modules by architecture

```python
# Llama / Mistral / Qwen
target_modules = ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]

# GPT-2 / GPT-Neo
target_modules = ["c_attn", "c_proj", "c_fc"]

# Falcon
target_modules = ["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"]

# BLOOM
target_modules = ["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"]

# Auto-detect all linear layers
target_modules = "all-linear"  # PEFT 0.6.0+
```

### Load trained adapter

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel, AutoPeftModelForCausalLM

# Option 1: Load with PeftModel
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
model = PeftModel.from_pretrained(base_model, "./lora-llama-adapter")

# Option 2: Load directly (recommended)
model = AutoPeftModelForCausalLM.from_pretrained(
    "./lora-llama-adapter",
    device_map="auto"
)
```
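
A quick smoke test with the loaded adapter, assuming the tokenizer and prompt format from the training example above:

```python
# Generate with the adapted model to confirm the adapter loads and responds.
prompt = "### Instruction:\nExplain LoRA in one sentence.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```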

### Merge adapter into base model

```python
# Merge for deployment (no adapter overhead at inference time)
merged_model = model.merge_and_unload()

# Save merged model
merged_model.save_pretrained("./llama-merged")
tokenizer.save_pretrained("./llama-merged")

# Push to Hub
merged_model.push_to_hub("username/llama-finetuned")
```

### Multi-adapter serving

```python
from peft import AutoPeftModelForCausalLM

# Load base with first adapter (registered under the name "default")
model = AutoPeftModelForCausalLM.from_pretrained("./adapter-task1")

# Load additional adapters
model.load_adapter("./adapter-task2", adapter_name="task2")
model.load_adapter("./adapter-task3", adapter_name="task3")

# Switch between adapters at runtime
model.set_adapter("default")  # Use the first (task1) adapter
output1 = model.generate(**inputs)

model.set_adapter("task2")  # Switch to task2
output2 = model.generate(**inputs)

# Disable adapters (use base model)
with model.disable_adapter():
    base_output = model.generate(**inputs)
```
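
Loaded LoRA adapters can also be blended into a new named adapter with PEFT's `add_weighted_adapter`. A sketch using the adapter names from the example above; supported combination types vary by PEFT release, so check your version's docs:

```python
# Blend two adapters into a new 50/50 adapter and activate it.
model.add_weighted_adapter(
    adapters=["task2", "task3"],
    weights=[0.5, 0.5],
    adapter_name="task2_task3_mix",
    combination_type="linear",  # other combination types exist; see the PEFT docs
)
model.set_adapter("task2_task3_mix")
```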

### PEFT methods comparison

| Method | Trainable % | Memory | Speed | Best For |
| --- | --- | --- | --- | --- |
| LoRA | 0.1-1% | Low | Fast | General fine-tuning |
| QLoRA | 0.1-1% | Very Low | Medium | Memory-constrained |
| AdaLoRA | 0.1-1% | Low | Medium | Automatic rank selection |
| IA3 | 0.01% | Minimal | Fastest | Few-shot adaptation |
| Prefix Tuning | 0.1% | Low | Medium | Generation control |
| Prompt Tuning | 0.001% | Minimal | Fast | Simple task adaptation |
| P-Tuning v2 | 0.1% | Low | Medium | NLU tasks |

### IA3 (minimal parameters)

```python
from peft import IA3Config, get_peft_model

ia3_config = IA3Config(
    target_modules=["q_proj", "v_proj", "k_proj", "down_proj"],
    feedforward_modules=["down_proj"]
)
model = get_peft_model(model, ia3_config)
# Trains only ~0.01% of parameters
```

### Prefix Tuning

```python
from peft import PrefixTuningConfig, get_peft_model

prefix_config = PrefixTuningConfig(
    task_type="CAUSAL_LM",
    num_virtual_tokens=20,      # Prepended virtual tokens
    prefix_projection=True      # Use an MLP projection
)
model = get_peft_model(model, prefix_config)
```

### With TRL (SFTTrainer)

```python
from trl import SFTTrainer, SFTConfig
from peft import LoraConfig

lora_config = LoraConfig(r=16, lora_alpha=32, target_modules="all-linear")

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="./output", max_seq_length=512),
    train_dataset=dataset,
    peft_config=lora_config,  # Pass the LoRA config directly
)
trainer.train()
```

### With Axolotl (YAML config)

```yaml
# axolotl config.yaml
adapter: lora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
lora_target_linear: true  # Target all linear layers
```

### With vLLM (inference)

```python
from vllm import LLM
from vllm.lora.request import LoRARequest

# Load base model with LoRA support
llm = LLM(model="meta-llama/Llama-3.1-8B", enable_lora=True)

# Serve with an adapter
prompts = ["### Instruction:\nExplain LoRA.\n\n### Response:\n"]
outputs = llm.generate(
    prompts,
    lora_request=LoRARequest("adapter1", 1, "./lora-adapter")
)
```

### Memory usage (Llama 3.1 8B)

| Method | GPU Memory | Trainable Params |
| --- | --- | --- |
| Full fine-tuning | 60+ GB | 8B (100%) |
| LoRA r=16 | 18 GB | 14M (0.17%) |
| QLoRA r=16 | 6 GB | 14M (0.17%) |
| IA3 | 16 GB | 800K (0.01%) |

### Training speed (A100 80GB)

| Method | Tokens/sec | vs Full FT |
| --- | --- | --- |
| Full FT | 2,500 | 1x |
| LoRA | 3,200 | 1.3x |
| QLoRA | 2,100 | 0.84x |

### Quality (MMLU benchmark)

| Model | Full FT | LoRA | QLoRA |
| --- | --- | --- | --- |
| Llama 2-7B | 45.3 | 44.8 | 44.1 |
| Llama 2-13B | 54.8 | 54.2 | 53.5 |

### CUDA OOM during training

```python
# Solution 1: Enable gradient checkpointing
model.gradient_checkpointing_enable()

# Solution 2: Reduce batch size and increase gradient accumulation
from transformers import TrainingArguments
TrainingArguments(
    output_dir="./output",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16
)

# Solution 3: Use QLoRA
from transformers import BitsAndBytesConfig
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
```
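
To see where the memory actually goes, a quick peak-memory probe around one training step (assumes a CUDA device; the step itself is elided):

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run one forward/backward step here ...
print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")
```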

### Adapter not applying

```python
# Verify the adapter is active
print(model.active_adapters)  # Should show the adapter name

# Check trainable parameters
model.print_trainable_parameters()

# Ensure the model is in training mode
model.train()
```

### Quality degradation

```python
# Increase the rank
LoraConfig(r=32, lora_alpha=64)

# Target more modules
target_modules = "all-linear"

# Use more training data and epochs
TrainingArguments(num_train_epochs=5)

# Lower the learning rate
TrainingArguments(learning_rate=1e-4)
```

### Best practices

- Start with r=8-16; increase if quality is insufficient
- Use alpha = 2 * rank as a starting point
- Target attention + MLP layers for the best quality/efficiency trade-off
- Enable gradient checkpointing for memory savings
- Save adapters frequently (small files, easy rollback)
- Evaluate on held-out data before merging (see the perplexity sketch below)
- Use QLoRA for 70B+ models on consumer hardware
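
A minimal held-out check before merging, assuming `model` and `tokenizer` from the earlier examples and a hypothetical list of held-out strings `eval_texts`:

```python
import math
import torch

model.eval()
losses = []
with torch.no_grad():
    for text in eval_texts:  # eval_texts: your held-out examples (hypothetical)
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(model.device)
        out = model(**enc, labels=enc["input_ids"])
        losses.append(out.loss.item())
print(f"held-out perplexity: {math.exp(sum(losses) / len(losses)):.2f}")
```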

### References

- Advanced usage (`references/advanced-usage.md`): DoRA, LoftQ, rank stabilization, custom modules
- Troubleshooting (`references/troubleshooting.md`): common errors, debugging, optimization

### Resources

- GitHub: https://github.com/huggingface/peft
- Docs: https://huggingface.co/docs/peft
- LoRA paper: arXiv:2106.09685
- QLoRA paper: arXiv:2305.14314
- Models: https://huggingface.co/models?library=peft
## Trust
- Source: tencent
- Verification: Indexed source record
- Publisher: Desperado991128
- Version: 0.1.0
## Source health
- Status: healthy
- Item download looks usable.
- Yavira can redirect you to the upstream package for this item.
- Health scope: item
- Reason: direct_download_ok
- Checked at: 2026-05-03T01:59:02.398Z
- Expires at: 2026-05-10T01:59:02.398Z
- Recommended action: Download for OpenClaw
## Links
- [Detail page](https://openagent3.xyz/skills/peft)
- [Send to Agent page](https://openagent3.xyz/skills/peft/agent)
- [JSON manifest](https://openagent3.xyz/skills/peft/agent.json)
- [Markdown brief](https://openagent3.xyz/skills/peft/agent.md)
- [Download page](https://openagent3.xyz/downloads/peft)