Requirements

- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Token-based context compaction for local models (MLX, llama.cpp, Ollama) that don't report context limits.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
Automatic context compaction for OpenClaw when using local models that don't properly report token limits or context overflow errors.
Cloud APIs (Anthropic, OpenAI) report context overflow errors, allowing OpenClaw's built-in compaction to trigger. Local models (MLX, llama.cpp, Ollama) often:

- Silently truncate context
- Return garbage when context is exceeded
- Don't report accurate token counts

This leaves you with broken conversations when context gets too long.
Context Compactor estimates tokens client-side and proactively summarizes older messages before hitting the model's limit.
```
┌──────────────────────────────────────────────────────────────┐
│ 1. Message arrives                                           │
│ 2. before_agent_start hook fires                             │
│ 3. Plugin estimates total context tokens                     │
│ 4. If over maxTokens:                                        │
│    a. Split into "old" and "recent" messages                 │
│    b. Summarize old messages (LLM or fallback)               │
│    c. Inject summary as compacted context                    │
│ 5. Agent sees: summary + recent + new message                │
└──────────────────────────────────────────────────────────────┘
```
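A minimal sketch of the threshold check in steps 3–4, assuming the chars-per-token heuristic behind the `charsPerToken` option; the `Message` shape and helper names here are illustrative, not OpenClaw's actual plugin API:

```typescript
// Illustrative message shape; OpenClaw's real type may differ.
interface Message {
  role: string;
  content: string;
}

// Estimate tokens from character count (the charsPerToken heuristic).
function estimateTokens(messages: Message[], charsPerToken = 4): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / charsPerToken);
}

// Decision made when the hook fires: compact only once the estimate
// crosses the configured maxTokens threshold.
function shouldCompact(messages: Message[], maxTokens = 8000): boolean {
  return estimateTokens(messages) > maxTokens;
}
```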
```bash
# One-command setup (recommended)
npx jasper-context-compactor setup

# Restart gateway
openclaw gateway restart
```

The setup command automatically:

- Copies plugin files to `~/.openclaw/extensions/context-compactor/`
- Adds plugin config to `openclaw.json` with sensible defaults
Add to `openclaw.json`:

```json
{
  "plugins": {
    "entries": {
      "context-compactor": {
        "enabled": true,
        "config": {
          "maxTokens": 8000,
          "keepRecentTokens": 2000,
          "summaryMaxTokens": 1000,
          "charsPerToken": 4
        }
      }
    }
  }
}
```
| Option | Default | Description |
| --- | --- | --- |
| `enabled` | `true` | Enable/disable the plugin |
| `maxTokens` | `8000` | Max context tokens before compaction |
| `keepRecentTokens` | `2000` | Tokens to preserve from recent messages |
| `summaryMaxTokens` | `1000` | Max tokens for the summary |
| `charsPerToken` | `4` | Token estimation ratio |
| `summaryModel` | (session model) | Model to use for summarization |
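For quick reference, the same options as a TypeScript shape; this interface is inferred from the table above, not a type exported by the plugin:

```typescript
// Inferred config shape; field names match the documented options.
interface ContextCompactorConfig {
  enabled: boolean;          // Enable/disable the plugin (default: true)
  maxTokens: number;         // Max context tokens before compaction (default: 8000)
  keepRecentTokens: number;  // Tokens to preserve from recent messages (default: 2000)
  summaryMaxTokens: number;  // Max tokens for the summary (default: 1000)
  charsPerToken: number;     // Token estimation ratio (default: 4)
  summaryModel?: string;     // Model for summarization (default: session model)
}
```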
MLX (8K context models):

```json
{ "maxTokens": 6000, "keepRecentTokens": 1500, "charsPerToken": 4 }
```

Larger context (32K models):

```json
{ "maxTokens": 28000, "keepRecentTokens": 4000, "charsPerToken": 4 }
```

Small context (4K models):

```json
{ "maxTokens": 3000, "keepRecentTokens": 800, "charsPerToken": 4 }
```
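The presets share a pattern: keep `maxTokens` comfortably below the model's context window so the reply and the injected summary still fit. A sketch of that arithmetic; the ratios are assumptions in the spirit of the presets (which are hand-tuned per model), not a documented rule:

```typescript
// Derive compaction settings from a model's context window, leaving
// headroom for the model's reply and the injected summary.
// The 0.75 / 0.25 ratios are illustrative assumptions.
function presetFor(contextWindow: number) {
  const maxTokens = Math.floor(contextWindow * 0.75); // ~25% headroom
  const keepRecentTokens = Math.floor(maxTokens * 0.25);
  return { maxTokens, keepRecentTokens, charsPerToken: 4 };
}

console.log(presetFor(8192)); // { maxTokens: 6144, keepRecentTokens: 1536, charsPerToken: 4 }
```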
Force-clear the summary cache and trigger fresh compaction on the next message:

```
/compact-now
```
When compaction triggers:

1. Split messages into "old" (to summarize) and "recent" (to keep)
2. Generate a summary using the session model (or configured `summaryModel`)
3. Cache the summary to avoid regenerating it for the same content
4. Inject context with the summary prepended

If the LLM runtime isn't available (e.g., during startup), a fallback truncation-based summary is used.
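Under the same illustrative assumptions as the earlier sketch (a hypothetical `Message` type and helpers, not the plugin's real internals), the split and the truncation fallback might look like:

```typescript
interface Message { role: string; content: string; }

const estimate = (text: string, charsPerToken = 4) =>
  Math.ceil(text.length / charsPerToken);

// Walk backwards from the newest message, keeping messages until the
// keepRecentTokens budget is spent; everything earlier gets summarized.
// (A message larger than the remaining budget ends the scan.)
function splitForCompaction(messages: Message[], keepRecentTokens = 2000) {
  const recent: Message[] = [];
  let budget = keepRecentTokens;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimate(messages[i].content);
    if (cost > budget) break;
    budget -= cost;
    recent.unshift(messages[i]);
  }
  const old = messages.slice(0, messages.length - recent.length);
  return { old, recent };
}

// Fallback used when no LLM runtime is available: truncate rather than
// summarize, capped at summaryMaxTokens worth of characters.
function fallbackSummary(old: Message[], summaryMaxTokens = 1000): string {
  const text = old.map((m) => `${m.role}: ${m.content}`).join("\n");
  return text.slice(0, summaryMaxTokens * 4); // inverse of charsPerToken=4
}
```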
| Feature | Built-in | Context Compactor |
| --- | --- | --- |
| Trigger | Model reports overflow | Token estimate threshold |
| Works with local models | ✗ (needs overflow error) | ✓ |
| Persists to transcript | ✓ | ✗ (session-only) |
| Summarization | Pi runtime | Plugin LLM call |

Context Compactor is complementary: it catches cases before they hit the model's hard limit.
Summary quality is poor:

- Try a better `summaryModel`
- Increase `summaryMaxTokens`
- The fallback truncation is used if the LLM runtime isn't available

Compaction triggers too often:

- Increase `maxTokens`
- Decrease `keepRecentTokens` (keeps less, summarizes earlier)

Not compacting when expected:

- Check `/context-stats` to see current usage
- Verify `enabled: true` in config
- Check logs for `[context-compactor]` messages

Characters per token wrong:

- Default of 4 works for English
- Try 3 for CJK languages
- Try 5 for highly technical content
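To see how the ratio shifts the trigger point, compare estimates for the same text at the suggested values; this is plain arithmetic, not plugin output:

```typescript
const text = "a".repeat(12000); // ~12,000 characters of context

for (const charsPerToken of [3, 4, 5]) {
  const tokens = Math.ceil(text.length / charsPerToken);
  console.log(`charsPerToken=${charsPerToken} -> ~${tokens} tokens`);
}
// charsPerToken=3 -> ~4000 tokens (compaction triggers sooner)
// charsPerToken=4 -> ~3000 tokens
// charsPerToken=5 -> ~2400 tokens (compaction triggers later)
```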
Enable debug logging:

```json
{
  "plugins": {
    "entries": {
      "context-compactor": {
        "config": { "logLevel": "debug" }
      }
    }
  }
}
```

Look for:

```
[context-compactor] Current context: ~XXXX tokens
[context-compactor] Compacted X messages → summary
```
- GitHub: https://github.com/E-x-O-Entertainment-Studios-Inc/openclaw-context-compactor
- OpenClaw Docs: https://docs.openclaw.ai/concepts/compaction