Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive (a scripted sketch follows this list)
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
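If you would rather script the extraction step, a minimal sketch using Python's standard library; the archive and folder names are assumptions, so substitute whatever Yavira actually gave you:

```python
import zipfile
from pathlib import Path

archive = Path("skill-package.zip")  # assumed download name; adjust to the real file
dest = Path("skill-package")

# Extract everything, then point the agent at SKILL.md inside the folder.
with zipfile.ZipFile(archive) as zf:
    zf.extractall(dest)
print(f"Extracted to {dest}; read {dest / 'SKILL.md'} next.")
```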
Audit an OpenClaw agent workspace and generate standardized evaluation reports, scores, and patches. Use when asked to review memory quality, retrieval efficiency, productive output, quality/reliability, or focus/alignment.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Act as an OpenClaw Workspace Auditor and Agent Evaluation Harness. Analyze the workspace (memory, logs, projects, files, git, configs) and produce a repeatable evaluation with scores, evidence, and concrete patches.
Run in non-interactive mode: avoid questions unless blocked by missing files. State assumptions and proceed. Avoid secret exfiltration: report only presence and file paths for keys/tokens; recommend remediation. Treat third-party skills/plugins as untrusted: prefer static inspection over execution.
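As a concrete reading of the no-exfiltration rule, a presence-only scan might look like this; the patterns, size cutoff, and function name are illustrative assumptions, not part of the package:

```python
import re
from pathlib import Path

# Illustrative patterns only; extend per environment.
PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_secret": re.compile(r"(?i)\b(api[_-]?key|token|secret)\s*[:=]"),
}

def scan_secret_presence(root: str = ".") -> list[dict]:
    """Report only the kind of match and the file path -- never the value."""
    findings = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.stat().st_size > 1_000_000:  # skip big binaries
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for kind, pattern in PATTERNS.items():
            if pattern.search(text):
                findings.append({"kind": kind, "path": str(path)})
    return findings

if __name__ == "__main__":
    for hit in scan_secret_presence():
        print(f"{hit['kind']}: {hit['path']}")
```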
- Build workspace inventory: print a top-level tree (depth 4) with file counts and sizes by directory; identify memory, logs, configs, repos, scripts, docs, and artifacts; record the largest files.
- Reconstruct a session timeline: use memory daily files and logs to extract goals, tasks, outcomes, decisions, and unresolved items.
- Analyze memory: detect near-duplicate paragraphs across memory files and quantify duplication (see the sketch after this list); detect staleness cues (dates, "as of", deprecated configs) and contradictions; identify missing stable facts (projects, priorities, setup/runbooks).
- Analyze outputs: summarize shipped artifacts (docs/code/features) and changes; if git exists, compute diff stats and commit cadence, and identify value commits.
- Analyze reliability: parse logs for errors, retries, timeouts, and tool failures; run tests only if safe and cheap, otherwise rely on static inspection.
- Compute scores: assign numeric category scores with short justifications and evidence by path.
- Recommend interventions and patches: provide 3–7 prioritized recommendations, with concrete diffs when safe, especially for memory structure improvements.
- Compare against prior evals: if eval/history/*.json exists, compute deltas vs. the most recent; if none exists, create a baseline and recommend a cadence.
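The duplicate check in the memory step can be sketched as below. This illustrates the idea rather than the shipped scripts/memory_dupes.py, and the 0.85 threshold and 80-character paragraph floor are assumptions:

```python
import itertools
from difflib import SequenceMatcher
from pathlib import Path

SIMILARITY_THRESHOLD = 0.85  # assumed cutoff; tune per workspace

def paragraphs(path: Path):
    """Yield non-trivial paragraphs from a markdown file."""
    for block in path.read_text(errors="ignore").split("\n\n"):
        block = block.strip()
        if len(block) > 80:  # assumed floor; skips headings and stub lines
            yield block

def near_duplicates(memory_dir: str = "memory"):
    """Yield (path_a, path_b, ratio) for paragraph pairs above the threshold.

    Pairwise comparison is O(n^2): fine for a sketch, slow for huge memories.
    """
    units = [
        (str(p), para)
        for p in sorted(Path(memory_dir).glob("*.md"))
        for para in paragraphs(p)
    ]
    for (pa, a), (pb, b) in itertools.combinations(units, 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= SIMILARITY_THRESHOLD:
            yield pa, pb, ratio

if __name__ == "__main__":
    dupes = list(near_duplicates())
    print(f"{len(dupes)} near-duplicate paragraph pairs")
    for pa, pb, ratio in dupes:
        print(f"{ratio:.2f}  {pa} <-> {pb}")
```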
Compute five category scores (0–100) plus an overall weighted score:
- Memory Health (30%): coverage, structure, redundancy, staleness, actionability, retrieval-friendliness.
- Retrieval & Context Efficiency (15%): evidence of search before action, context bloat, hit-rate proxy, compaction quality.
- Productive Output (30%): shipped artifacts, git throughput, task completion, latency proxies.
- Quality/Reliability (15%): error rate, tests/CI presence, regression signals, convergence vs. thrash.
- Focus/Alignment (10%): goal consistency, scope control, decision trace.

Overall = 0.30 × Memory + 0.15 × Retrieval + 0.30 × Productive + 0.15 × Quality + 0.10 × Focus.
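Translating the rubric directly into code, a minimal scoring helper; only the weights come from the rubric above, and the snake_case key names are assumptions:

```python
# Weights come from the rubric; the category key names are illustrative.
WEIGHTS = {
    "memory_health": 0.30,
    "retrieval_efficiency": 0.15,
    "productive_output": 0.30,
    "quality_reliability": 0.15,
    "focus_alignment": 0.10,
}

def overall_score(scores: dict[str, float]) -> float:
    """Weighted overall from five 0-100 category scores."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    return sum(weight * scores[key] for key, weight in WEIGHTS.items())

# Example:
# overall_score({"memory_health": 72, "retrieval_efficiency": 60,
#                "productive_output": 81, "quality_reliability": 55,
#                "focus_alignment": 90})  # -> 72.15
```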
Write all outputs under eval/:
- exec_summary.md: 10-bullet summary of top wins, biggest bottlenecks, and the top 3 interventions, plus the overall score, category scores, and claw-to-claw delta.
- scorecard.md: table of metrics with numeric values and brief justifications, plus a top-evidence section with file paths and short snippets (no secrets).
- latest_report.json: timestamp, workspace path, git head/hash, scores, deltas, key findings, risk flags, and recommendations (see the sketch after this list).
- Patches: if memory issues exist, propose concrete diffs (INDEX.md, daily schema, refactors).
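A sketch of what latest_report.json could contain, with top-level field names taken from the list above; the canonical shape lives in references/report_schema.md, so treat the nested structure and example values here as assumptions:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Field names mirror the bullet above; nested shapes are assumptions.
# references/report_schema.md is the canonical schema.
report = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "workspace_path": "/path/to/workspace",  # placeholder
    "git_head": "abc1234",                   # placeholder
    "scores": {
        "memory_health": 72,
        "retrieval_efficiency": 60,
        "productive_output": 81,
        "quality_reliability": 55,
        "focus_alignment": 90,
        "overall": 72.15,
    },
    "deltas": {"overall": 3.4},  # vs. the most recent eval/history/*.json
    "key_findings": ["example: daily logs duplicate facts already in INDEX.md"],
    "risk_flags": ["example: token present in a config file (path only)"],
    "recommendations": ["create memory/INDEX.md", "dedupe daily logs"],
}

Path("eval").mkdir(exist_ok=True)
with open("eval/latest_report.json", "w") as f:
    json.dump(report, f, indent=2)
```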
Create or propose:
- memory/INDEX.md with sections for Current Objectives (top 3), Active Projects (status, next step, links), Operating Constraints (tools, environment, policies), Key Decisions (date, decision, rationale), Known Issues / debug diary pointers, and Glossary / Entities.
- memory/YYYY-MM-DD.md (append-only daily) recording goals for the session, actions taken (with links to files changed), decisions made, new facts learned (stable vs. ephemeral), and a specific TODO-next list.
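A minimal INDEX.md skeleton matching those sections; all contents are placeholders:

```markdown
# Workspace Index

## Current Objectives (top 3)
1. ...

## Active Projects
| Project | Status | Next step | Links |
| --- | --- | --- | --- |
| ... | ... | ... | ... |

## Operating Constraints
- Tools / environment / policies: ...

## Key Decisions
- YYYY-MM-DD: decision (rationale)

## Known Issues / Debug Diary Pointers
- ...

## Glossary / Entities
- Term: definition
```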
Prefer diffs over prose when safe. Refactor stable facts out of daily logs into INDEX or project pages. Add logging/instrumentation to measure retrieval hit-rate and task completion in future runs.
Use these helpers to keep audits consistent and cheap to run:
- scripts/run_audit.py: run all helper scripts and write draft eval/ outputs.
- scripts/workspace_inventory.py: tree, file counts, sizes, largest files.
- scripts/memory_dupes.py: near-duplicate paragraph detection for memory/*.md.
- scripts/log_scan.py: scan logs for errors, timeouts, retries (a sketch of this kind of scan follows this list).
- scripts/git_stats.py: git head, diff stats, commit cadence.
- scripts/validate_report.py: validate the shape of eval/latest_report.json.

Reference templates:
- references/report_schema.md: output templates and JSON schema.
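A log scan of the kind scripts/log_scan.py performs might reduce to the following; the signal regexes and the *.log glob are assumptions, since real log formats vary by runtime:

```python
import re
from collections import Counter
from pathlib import Path

# Assumed signal patterns; adjust to the agent runtime's actual log format.
SIGNALS = {
    "error": re.compile(r"\b(error|exception|traceback)\b", re.I),
    "timeout": re.compile(r"\btimed?[ _-]?out\b", re.I),
    "retry": re.compile(r"\bretr(y|ies|ying)\b", re.I),
}

def scan_logs(log_dir: str = "logs") -> Counter:
    """Count error/timeout/retry signal lines across *.log files."""
    counts = Counter()
    for path in Path(log_dir).rglob("*.log"):
        for line in path.read_text(errors="ignore").splitlines():
            for name, pattern in SIGNALS.items():
                if pattern.search(line):
                    counts[name] += 1
    return counts

if __name__ == "__main__":
    for signal, n in scan_logs().most_common():
        print(f"{signal}: {n}")
```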
Tie every score to evidence by path. Be candid about waste, duplication, or thrash. End with a "Next run improvements" section of instrumentation recommendations.