Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Install, configure, and operate the TokenRanger OpenClaw plugin. Use when you want to reduce cloud LLM token costs by 50-80% via local Ollama context compression.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
TokenRanger compresses session context through a local Ollama SLM before sending to cloud LLMs, reducing input token costs by 50-80% per turn with graceful fallthrough if anything goes wrong.

- Plugin repo: https://github.com/peterjohannmedina/openclaw-plugin-tokenranger
- npm: openclaw-plugin-tokenranger
- Maintained by: @peterjohannmedina
- User asks to install, configure, or troubleshoot TokenRanger
- User wants to reduce token costs or enable context compression
- User runs /tokenranger commands and needs help interpreting output
- User wants to switch compression strategy (GPU/CPU/off)
- User asks about upgrading or uninstalling TokenRanger
User message → OpenClaw gateway → before_agent_start hook
→ Turn 1: skip (full fidelity)
→ Turn 2+: send history to localhost:8100/compress
→ FastAPI sidecar runs LangChain LCEL chain via Ollama
→ Compressed summary prepended to context
→ Cloud LLM receives compressed context instead of full history

Inference strategy is auto-selected by GPU availability:

| Strategy | Trigger | Model | Approach |
| --- | --- | --- | --- |
| full | GPU available | mistral:7b | Deep semantic summarization |
| light | CPU only | phi3.5:3b | Extractive bullet points |
| passthrough | Ollama unreachable | — | Truncate to last 20 lines |
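The auto-selection and passthrough behavior above can be sketched in a few lines of Python. This is illustrative only: the function names and signatures are assumptions, not the plugin's actual API.

```python
def select_strategy(gpu_available: bool, ollama_reachable: bool) -> str:
    """Mirror the strategy table: full on GPU, light on CPU-only,
    passthrough when Ollama is unreachable. (Hypothetical helper.)"""
    if not ollama_reachable:
        return "passthrough"
    return "full" if gpu_available else "light"


def passthrough_truncate(history: str, keep_lines: int = 20) -> str:
    """Passthrough fallback: keep only the last 20 lines of context."""
    return "\n".join(history.splitlines()[-keep_lines:])
```

The key design point this mirrors is that every branch returns *something* usable, so a failure in model selection can never block the message.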
```
openclaw plugins install openclaw-plugin-tokenranger
```

To pin an exact version:

```
openclaw plugins install openclaw-plugin-tokenranger@1.0.0 --pin
```
```
openclaw tokenranger setup
```

This pulls Ollama models, creates the Python venv, installs FastAPI/LangChain deps, and registers the sidecar as a system service (systemd on Linux, launchd on macOS).
```
openclaw gateway restart
```

Verify:

```
openclaw tokenranger
```

This should show current settings and sidecar status (reachable / unreachable).
Set config values with:

```
openclaw config set plugins.entries.tokenranger.config.<key> <value>
openclaw gateway restart
```

| Key | Default | Description |
| --- | --- | --- |
| serviceUrl | http://127.0.0.1:8100 | TokenRanger sidecar URL |
| timeoutMs | 10000 | Max wait before fallthrough |
| minPromptLength | 500 | Min chars before compressing |
| ollamaUrl | http://127.0.0.1:11434 | Ollama API URL |
| preferredModel | mistral:7b | Model for GPU strategy |
| compressionStrategy | auto | auto / full / light / passthrough |
| inferenceMode | auto | auto / cpu / gpu / remote |

Force CPU-only mode:

```
openclaw config set plugins.entries.tokenranger.config.compressionStrategy light
openclaw config set plugins.entries.tokenranger.config.inferenceMode cpu
openclaw gateway restart
```
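The minPromptLength default implies a simple gate before any compression happens: turn 1 is always sent at full fidelity, and short prompts are skipped. A rough sketch of that gate (hypothetical helper, not the plugin's internal code):

```python
def should_compress(prompt: str, turn: int, min_prompt_length: int = 500) -> bool:
    """Skip compression on turn 1 (full fidelity) and for prompts
    shorter than minPromptLength (default 500 chars)."""
    return turn >= 2 and len(prompt) >= min_prompt_length
```

This is why short conversations show no token savings: they never reach the sidecar by design.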
| Command | Description |
| --- | --- |
| /tokenranger | Show current settings and sidecar health |
| /tokenranger mode gpu | Force GPU (full) compression |
| /tokenranger mode cpu | Force CPU (light) compression |
| /tokenranger mode off | Disable compression (passthrough) |
| /tokenranger model | List available Ollama models |
| /tokenranger toggle | Enable / disable the plugin |
```
# Check for updates (dry run)
openclaw plugins update tokenranger --dry-run

# Apply update
openclaw plugins update tokenranger
openclaw tokenranger setup   # re-runs setup if sidecar deps changed
openclaw gateway restart
```

To pin a specific version:

```
openclaw plugins install openclaw-plugin-tokenranger@2026.3.1 --pin
openclaw tokenranger setup
openclaw gateway restart
```

List all published versions:

```
npm view openclaw-plugin-tokenranger versions --json
```
```
openclaw plugins uninstall tokenranger
openclaw gateway restart
```

Remove the sidecar service manually:

```
# Linux
systemctl --user stop tokenranger && systemctl --user disable tokenranger
rm ~/.config/systemd/user/tokenranger.service

# macOS
launchctl unload ~/Library/LaunchAgents/com.peterjohannmedina.tokenranger.plist
rm ~/Library/LaunchAgents/com.peterjohannmedina.tokenranger.plist
```
Sidecar unreachable after setup:

```
# Linux
systemctl --user status tokenranger
journalctl --user -u tokenranger -n 50

# macOS
launchctl list | grep tokenranger
cat ~/Library/Logs/tokenranger.log

# Manual start (any platform)
~/.openclaw/extensions/tokenranger/service/start.sh
```

Ollama not found:

```
curl http://127.0.0.1:11434/api/tags
# If not running:
ollama serve
```

Compression not reducing tokens:
- Check minPromptLength (default 500 chars); short conversations are skipped by design
- Run /tokenranger to confirm the strategy is not passthrough
- Check sidecar logs for errors

Graceful degradation: TokenRanger never blocks a message. Any failure results in silent fallthrough to an uncompressed cloud LLM call.
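The two reachability checks above can be combined into one small probe script. This is a sketch that only assumes plain HTTP GETs against the two documented ports; it uses no TokenRanger-specific API:

```python
import urllib.request
import urllib.error


def probe(url: str, timeout: float = 2.0) -> str:
    """Return 'reachable' if the URL answers at all, else 'unreachable'."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return "reachable"
    except urllib.error.HTTPError:
        # The server answered, even if with an error status code.
        return "reachable"
    except (urllib.error.URLError, OSError):
        return "unreachable"


# Probe both services involved in the compression path:
for name, url in [("sidecar", "http://127.0.0.1:8100"),
                  ("ollama", "http://127.0.0.1:11434/api/tags")]:
    print(f"{name}: {probe(url)}")
```

If the sidecar is unreachable but Ollama responds, the setup/service registration is the likely culprit; if both fail, start with `ollama serve`.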
5-turn Discord benchmark (GPU, mistral:7b-instruct):

| Turn | Input tokens | Compressed | Reduction |
| --- | --- | --- | --- |
| 2 | 732 | 125 | 82.9% |
| 3 | 1,180 | 150 | 87.3% |
| 4 | 1,685 | 212 | 87.4% |
| 5 | 2,028 | 277 | 86.3% |

Cumulative: 5,866 → 885 tokens (84.9% reduction)
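The reduction column follows directly from the raw counts, which makes the table easy to sanity-check:

```python
# Per-turn (input tokens, compressed tokens) from the benchmark table.
turns = {2: (732, 125), 3: (1180, 150), 4: (1685, 212), 5: (2028, 277)}

for turn, (raw, compressed) in turns.items():
    reduction = (1 - compressed / raw) * 100
    print(f"Turn {turn}: {reduction:.1f}%")

# Cumulative figure as reported:
print(f"Cumulative: {(1 - 885 / 5866) * 100:.1f}%")  # 84.9%
```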