{
  "schemaVersion": "1.0",
  "item": {
    "slug": "skill-engineer",
    "name": "Skill Engineer",
    "source": "tencent",
    "type": "skill",
    "category": "AI 智能",
    "sourceUrl": "https://clawhub.ai/chunhualiao/skill-engineer",
    "canonicalUrl": "https://clawhub.ai/chunhualiao/skill-engineer",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/skill-engineer",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=skill-engineer",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "CHANGELOG.md",
      "README.md",
      "SKILL.md",
      "references/README.md",
      "references/designer-guide.md",
      "references/reviewer-rubric.md"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-30T16:55:25.780Z",
      "expiresAt": "2026-05-07T16:55:25.780Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
        "contentDisposition": "attachment; filename=\"network-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/skill-engineer"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/skill-engineer",
    "agentPageUrl": "https://openagent3.xyz/skills/skill-engineer/agent",
    "manifestUrl": "https://openagent3.xyz/skills/skill-engineer/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/skill-engineer/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "Skill Engineer",
        "body": "Own the full lifecycle of agent skills in your OpenClaw agent kit. The entire multi-agent workflow depends on skill quality — a weak skill produces weak results across every run.\n\nCore principle: Builders don't evaluate their own work. This skill enforces separation of concerns through a multi-agent architecture where design, review, and testing are performed by independent subagents."
      },
      {
        "title": "Skill Taxonomy",
        "body": "Source: Anthropic \"Improving skill-creator\" (2026-03-03)\n\nSkills fall into two categories. This distinction drives design decisions, testing strategy, and lifecycle management."
      },
      {
        "title": "Capability Uplift Skills",
        "body": "The model can't do it well alone — the skill injects techniques, patterns, or constraints that produce better output than prompting alone.\n\nExamples: Document creation skills (PDF generation), complex formatting, specialized analysis pipelines.\n\nTesting focus: Monitor whether the base model has caught up. If the base model passes your evals without the skill loaded, the skill's techniques have been incorporated into model default behavior. The skill isn't broken — it's no longer necessary.\n\nLifecycle: These skills may \"retire\" as models improve. Build evals that can detect when retirement is appropriate."
      },
      {
        "title": "Encoded Preference Skills",
        "body": "The model can already do each step — the skill sequences operations according to your team's specific process.\n\nExamples: NDA review against set criteria, weekly report generation from specific data sources, brand compliance checks.\n\nTesting focus: Verify the skill faithfully reproduces your actual workflow, not the model's \"free improvisation.\" Fidelity to process is the metric.\n\nLifecycle: These skills are durable — they encode organizational knowledge that doesn't change with model capability. They need maintenance when processes change, not when models change."
      },
      {
        "title": "Design Implication",
        "body": "When the Designer begins work, classify the skill:\n\nClassificationDesign priorityTest priorityRetirement riskCapability upliftTechnique precisionBase model comparisonHigh — monitor model progressEncoded preferenceProcess fidelityWorkflow reproductionLow — tied to org process"
      },
      {
        "title": "Mandatory Dependencies",
        "body": "This skill requires the following to be installed and available:\n\nDependencyTypePurposeInstall fromdeepwikiSkillQuery OpenClaw source for current API behaviorliaosvcaf/openclaw-skill-deepwikiVector memory DBOpenClaw featureSemantic search across session history, notes, and memory filesEnable in openclaw.json (memory.enabled: true)\n\nBefore starting any skill design or update session, verify both are available:\n\n# Check deepwiki\nls ~/.openclaw/skills/deepwiki/deepwiki.sh || ls ~/.openclaw/workspace-*/skills/deepwiki/deepwiki.sh\n\n# Check vector memory (should return results, not empty)\n# Use the memory_search tool with a known topic from recent sessions\n\nIf deepwiki is missing, install from liaosvcaf/openclaw-skill-deepwiki.\nIf vector memory returns no results on known topics, check that memory.enabled is true in openclaw.json and that indexing has run."
      },
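      {
        "title": "Example: Dependency Preflight Script",
        "body": "A minimal sketch of the checks above as one script. The paths and the config location (~/.openclaw/openclaw.json) are assumptions from a default install; the grep is a coarse flag check, and the authoritative test for vector memory remains a live memory_search call.\n\n#!/usr/bin/env bash\n# Preflight check for skill-engineer dependencies (sketch; paths are assumptions).\nset -u\n\n# 1. deepwiki present in the shared or any workspace skills directory?\nif ls ~/.openclaw/skills/deepwiki/deepwiki.sh ~/.openclaw/workspace-*/skills/deepwiki/deepwiki.sh 2>/dev/null | grep -q .; then\n  echo \"deepwiki: OK\"\nelse\n  echo \"deepwiki: MISSING, install from liaosvcaf/openclaw-skill-deepwiki\" >&2\n  exit 1\nfi\n\n# 2. Memory flag present in config? (coarse: matches any enabled:true in the file)\nif grep -Eq '\"enabled\": *true' ~/.openclaw/openclaw.json 2>/dev/null; then\n  echo \"vector memory: config flag found, confirm with a live memory_search\"\nelse\n  echo \"vector memory: no enabled flag found, check memory.enabled in openclaw.json\" >&2\n  exit 1\nfi"
      },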
      {
        "title": "Why These Are Non-Negotiable",
        "body": "DeepWiki: OpenClaw APIs are version-specific. Without DeepWiki, skills are written against memory of past behavior — which drifts as OpenClaw updates. DeepWiki grounds skill content in actual source code. A skill engineer without DeepWiki is working blind.\n\nVector memory DB: Session history, Obsidian notes, and past decisions are indexed here. Without it, the agent falls back to manual file search — slower, less accurate, and misses cross-document connections. Critical context from past sessions (installation guides, design decisions, pitfalls) lives in this index."
      },
      {
        "title": "Memory Search Protocol (MANDATORY)",
        "body": "Before searching files manually, always query the vector memory database first. It indexes session history, Obsidian notes, and memory files — and finds cross-document connections that manual search misses.\n\nWhen to query vector memory:\n\nUser asks \"do you remember...\", \"find the notes about...\", \"we did X before...\"\nLooking for past installation guides, design decisions, or troubleshooting records\nAny question about prior work, configurations, or lessons learned\n\nHow to query correctly:\n\nmemory_search(\"your query here\", maxResults=5)\n\nCritical rule: try multiple queries before giving up.\n\nIf the first query returns empty, do NOT fall back to manual file search immediately. Try at least 3 different phrasings:\n\nFirst query failsTry instead\"Docker OpenClaw installation\"\"dockerized openclaw Titan\"\"dockerized openclaw Titan\"\"openclaw isolation install guide\"Still emptyThen fall back to manual file search\n\nLesson learned (2026-03-03): When asked to find Docker/OpenClaw installation notes, memory_search returned empty on the first query and the agent immediately switched to manual SQLite/file search. The correct approach was to try different query phrasings — the second attempt (\"dockerized OpenClaw installation Titan setup\") returned 5 relevant results directly from indexed Obsidian notes. Manual file search is a last resort, not a first response."
      },
      {
        "title": "DeepWiki Staleness Protocol (MANDATORY)",
        "body": "OpenClaw APIs, skill loading behavior, subagent mechanics, and frontmatter fields are version-specific. Information in this skill or any skill referencing OpenClaw internals may be outdated.\n\nALWAYS query DeepWiki when:\n\nDesigning a skill that uses sessions_spawn, tool calls, or OpenClaw-specific APIs\nReferencing skill frontmatter fields or loading precedence\nUpdating an existing skill that has version-tagged sections\nThe installed OpenClaw version differs from any version tag in the skill\nYou are unsure whether an API, field, or behavior still exists\n\nHow to check:\n\n# Check current OpenClaw version\nopenclaw --version\n\n# Query DeepWiki for current behavior\n~/.openclaw/skills/deepwiki/deepwiki.sh ask openclaw/openclaw \"YOUR QUESTION\"\n\nDo NOT rely on memory or this skill's documented behavior without verifying when the topic is OpenClaw internals. DeepWiki is grounded in the actual source code. This skill's documentation is not.\n\nVerification checklist before shipping any skill that references OpenClaw internals:\n\nChecked openclaw --version against version tags in the skill\n Queried DeepWiki to confirm API/field behavior is current\n Updated version tags if behavior has changed"
      },
      {
        "title": "What This Skill Handles",
        "body": "Skill design: SKILL.md, skill.yml, README.md, tests, scripts, references\nSkill review: quality evaluation, rubric scoring, gap analysis\nSkill testing: self-play validation, trigger testing, functional testing\nSkill maintenance: iteration based on feedback, refactoring\nAgent kit audit: inventory, consistency, quality scoring across all skills"
      },
      {
        "title": "What This Skill Does NOT Handle",
        "body": "Release pipeline — publishing, versioning, changelogs belong to release processes\nRepository management — git submodules, repo creation, branch strategy belong to your VCS workflow\nDeployment — installing skills to agents, configuration management\nTracking — progress tracking, task management, project boards\nInfrastructure — MCP servers, API keys, environment setup"
      },
      {
        "title": "Where This Skill Ends",
        "body": "This skill produces validated skill artifacts (SKILL.md, skill.yml, README.md, tests, scripts). Once artifacts pass quality gates, responsibility transfers to whatever system handles publishing and deployment."
      },
      {
        "title": "Success Criteria",
        "body": "A skill development cycle is considered successful when:\n\nQuality gates passed — Reviewer scores ≥28/33 (Deploy threshold)\nNo blocking issues — Tester reports no issues marked as \"blocking\"\nAll artifacts generated — SKILL.md, skill.yml, README.md, tests/, scripts/ (if needed), references/ (if needed)\nOPSEC clean — No hardcoded secrets, paths, org names, or private URLs\nScripts validated — All deterministic validation scripts execute successfully on target platform(s)\nTrigger accuracy — Tester reports ≥90% trigger accuracy (true positives + true negatives)\n\nIf any criterion fails, the skill returns to the Designer for revision."
      },
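      {
        "title": "Example: OPSEC Sweep Sketch",
        "body": "The OPSEC criterion can be partly automated. A minimal sketch, run from the skill directory before shipping; the patterns are illustrative assumptions, not an exhaustive secret scanner, and any hit still needs manual review.\n\n#!/usr/bin/env bash\n# Coarse OPSEC sweep: flag likely keys, home paths, and inline key assignments.\npatterns='AKIA[0-9A-Z]{16}|api[_-]?key[[:space:]]*[:=]|/Users/[a-z]|/home/[a-z]'\nif grep -rnE -e \"$patterns\" --include='*.md' --include='*.yml' --include='*.sh' .; then\n  echo \"OPSEC sweep: review the hits above\" >&2\n  exit 1\nfi\necho \"OPSEC sweep: no hits\""
      },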
      {
        "title": "Inputs",
        "body": "When invoking this skill, the orchestrator must gather:\n\nInputDescriptionRequiredSourceProblem descriptionWhat capability or workflow needs to be enabledYesUser conversationTarget audienceWhich agent(s) will use this skillYesUser or inferredExpected interactionsWith users, APIs, files, MCP servers, other skillsYesRequirements discussionInputs/OutputsWhat data the skill receives and producesYesRequirements discussionConstraintsPerformance limits, security requirements, dependenciesNoUser or systemPrior feedbackReview or test reports from previous iterationsNoPrevious Reviewer/TesterExisting artifactsIf refactoring/maintaining an existing skillNoFile system\n\nExample requirements gathering:\n\nUser: \"I need a skill for analyzing competitor websites\"\n\nOrchestrator gathers:\n- Problem: Automate competitor analysis with structured output\n- Audience: research-agent\n- Interactions: web_fetch, browser tool, writes markdown reports\n- Inputs: competitor URLs, analysis criteria\n- Outputs: comparison table, insights markdown\n- Constraints: must complete in <60s per site\n\nThese inputs are then passed to the Designer to begin the design process."
      },
      {
        "title": "Architecture Overview",
        "body": "The skill-engineer uses a three-role iterative architecture. The orchestrator spawns subagents for each role and never does creative or evaluation work directly."
      },
      {
        "title": "Pattern Selection (IMPORTANT)",
        "body": "Two architecture modes are available. Choose based on complexity:\n\nMode A: Director-Controlled (simple/short skill work)\nUse when: ≤2 phases, <10 minutes total, user interaction needed between phases (e.g., quick fixes, single-skill reviews).\n\nDirector/Orchestrator (main agent, depth 0)\n    ├─ Spawn ──→ Designer (depth 1)\n    ├─ Spawn ──→ Reviewer (depth 1)\n    └─ Spawn ──→ Tester (depth 1)\n\nRisk: announce-to-action gap — if user sends a message while waiting for a subagent, the main agent may handle that instead of chaining the next phase. Mitigate with cron safety net (see below).\n\nMode B: Orchestrator Subagent Pattern (complex/long skill work)\nUse when: 3+ phases, >10 minutes, pipeline must not stall, parallel workers needed.\n\nDirector (user-facing, depth 0)\n    └── Orchestrator (pipeline owner, depth 1)\n        ├─ Spawn ──→ Designer (depth 2)\n        ├─ Spawn ──→ Reviewer (depth 2)\n        └─ Spawn ──→ Tester (depth 2)\n\nThe Director spawns a single Orchestrator subagent with the full task description. The Orchestrator owns the entire Design→Review→Test loop without yielding control between phases. User messages go to the Director; the pipeline runs uninterrupted.\n\nRequired config for Mode B:\n\n{\n  \"agents\": { \"defaults\": { \"subagents\": { \"maxSpawnDepth\": 2 } } }\n}\n\nWhy Mode B is superior for complex work:\n\nNo announce-to-action gap (orchestrator chains phases immediately within the same session)\nImmune to user interruption between phases\nPersistent pipeline state without re-deriving from files each turn\n\nReference: orchestrator-subagent-pattern-2026-02-28.md (Obsidian notes) — documented after a real 70-minute pipeline stall incident."
      },
      {
        "title": "Mode A Safety Net (cron)",
        "body": "When using Mode A, set a cron safety net after each spawn to catch announce-to-action failures:\n\n\"Check if [designer/reviewer/tester] subagent has completed. If so and next phase not started, resume pipeline.\"\n(fires 15 min after spawn)"
      },
      {
        "title": "Iteration Loop",
        "body": "Designer → Reviewer ──pass──→ Tester ──pass──→ Ship\n              │                  │\n              fail               fail\n              │                  │\n              ▼                  ▼\n         Designer revises   Designer revises\n              │                  │\n              ▼                  ▼\n           Reviewer          Reviewer + Tester\n              │\n           (max 3 iterations, then fail)\n\nExit conditions:\n\nShip: Reviewer scores ≥ 28/33 (85%+) AND Tester reports no blocking issues\nRevise: Reviewer or Tester found fixable issues (iterate)\nFail: 3 iterations exhausted and still below quality bar"
      },
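      {
        "title": "Example: Ship Gate as Code",
        "body": "The exit conditions above reduce to one small decision. A sketch with placeholder values, only to make the gate explicit; wire in real Reviewer and Tester output in practice.\n\n# Placeholder values standing in for real Reviewer/Tester output.\nscore=29        # Reviewer total out of 33\nblocking=0      # count of blocking issues from the Tester\niteration=2     # current iteration number\n\nif [ \"$score\" -ge 28 ] && [ \"$blocking\" -eq 0 ]; then\n  echo \"SHIP\"\nelif [ \"$iteration\" -ge 3 ]; then\n  echo \"FAIL: iterations exhausted\"\nelse\n  echo \"REVISE: route feedback to the Designer\"\nfi"
      },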
      {
        "title": "Iteration Failure Path",
        "body": "After 3 failed iterations, the orchestrator must:\n\nStop iteration — do not continue trying\nReport failure to user with:\n\nSummary: \"Skill development failed after 3 iterations\"\nAll 3 iteration reports (Reviewer + Tester feedback)\nFinal quality score\nList of unresolved blocking issues\n\n\nPresent options to user:\n\nProvide more context or clarify requirements (restart with better inputs)\nSimplify scope (reduce skill complexity and retry)\nAbandon this skill (requirements may be infeasible)\n\n\nDo NOT silently fail — always report to user and await decision\n\nNever: Continue past 3 iterations or ship a skill that hasn't passed quality gates."
      },
      {
        "title": "Subagent Spawning Mechanism",
        "body": "Version note: Verified against OpenClaw v2026.2.26. API may change.\n\nIn OpenClaw, subagents are spawned using the sessions_spawn tool (not a CLI command). Subagents run in isolated sessions, announce results back to the requester's channel when complete, and are auto-archived after 60 minutes by default.\n\nKey constraints on subagents:\n\nDefault max spawn depth is 1 (subagents cannot spawn further subagents unless maxSpawnDepth: 2 is configured)\nDefault max 5 active children per agent at once\nSubagents do NOT receive SOUL.md, IDENTITY.md, or USER.md — only AGENTS.md and TOOLS.md\nUse runTimeoutSeconds to prevent hanging (900s for Designer, 600s for Reviewer/Tester)\nResults are announced back automatically; reply ANNOUNCE_SKIP to suppress"
      },
      {
        "title": "Director vs. Orchestrator Roles",
        "body": "This is the most important architectural decision. Understand it before proceeding."
      },
      {
        "title": "The Problem with Naive Single-Agent Control",
        "body": "The natural instinct is to have the main agent (you) directly manage the Design→Review→Test loop:\n\nMain agent\n    ├── spawns Designer → waits for announce → spawns Reviewer → waits → spawns Tester\n\nThis breaks in three ways:\n\nAnnounce-to-action gap: When a subagent finishes, OpenClaw sends a completion announce that triggers a new LLM turn. The LLM may report results to the user and stop — treating the announce as informational rather than a pipeline trigger. There is no mechanism that forces the next action.\n\n\nContext loss: Each new turn is a fresh LLM call. Between subagent completion and the next turn, there is no persistent state machine tracking \"we're in iteration 2, reviewer passed, now run Tester.\" The agent must re-derive this from files every time — fragile over 3+ iterations.\n\n\nUser message interruption: If the user sends a message while the pipeline is between phases, the main agent handles that message instead of continuing. The pipeline stalls silently until the user notices.\n\nReal incident: A book-writer pipeline stalled for 70 minutes because a research subagent completed and announced back, but the Director reported results to the user and stopped — never spawning the writing phase. (2026-02-28)"
      },
      {
        "title": "The Solution: One Level of Indirection",
        "body": "Add an intermediate Orchestrator subagent that owns the pipeline. The main agent becomes the Director — it talks to the user. The Orchestrator does the pipeline work. They don't share context.\n\nDirector (main agent, depth 0)  ←→  User\n    │\n    └── Orchestrator (subagent, depth 1) — owns Design→Review→Test loop\n        ├── Designer (depth 2)\n        ├── Reviewer (depth 2)\n        └── Tester (depth 2)\n\nWhy this works:\n\nThe Orchestrator runs as a single continuous session. It processes each subagent's completion announce immediately — no turn boundary between phases, no gap.\nUser messages go to the Director (depth 0), not the Orchestrator. The pipeline cannot be interrupted by user activity.\nThe Orchestrator maintains full pipeline state throughout its run without re-deriving from files.\n\nRequired config (add to openclaw.json before using this pattern):\n\n{\n  \"agents\": { \"defaults\": { \"subagents\": { \"maxSpawnDepth\": 2 } } }\n}"
      },
      {
        "title": "When to Use Each Mode",
        "body": "SituationUseWhyQuick fix, single skill review, <10 minDirector-only (depth 1 subagents)Simpler, fewer spawnsFull design cycle (Design+Review+Test)Director + Orchestrator (depth 2)Pipeline cannot afford to stallAny pipeline with 3+ sequential phasesDirector + Orchestrator (depth 2)Announce-to-action gap becomes criticalmaxSpawnDepth not set to 2Director-only with cron safety netNo choice — see fallback below"
      },
      {
        "title": "Fallback: Director-Only with Cron Safety Net",
        "body": "If maxSpawnDepth: 2 is not configured, use Director-only mode but add a cron safety net after each subagent spawn:\n\nAfter spawning Designer, register a cron job:\n\"Check if Designer has completed (look for output at /path/to/skill/SKILL.md).\n If completed and Reviewer not yet started, spawn Reviewer now.\"\n(fires 15 minutes after spawn)\n\nThis mitigates but does not eliminate the announce-to-action gap."
      },
      {
        "title": "Director Responsibilities",
        "body": "The Director (main agent) talks to the user and kicks off the pipeline. It does NOT do design, review, or testing work.\n\nGather requirements from the user (problem, audience, inputs/outputs, interactions)\nQuery DeepWiki — if the skill touches any OpenClaw internals, query DeepWiki FIRST:\n~/.openclaw/skills/deepwiki/deepwiki.sh ask openclaw/openclaw \"RELEVANT QUESTION\"\n\n\nChoose mode — Director-only (simple) or Director+Orchestrator (full cycle)\nFor Director+Orchestrator mode: Spawn a single Orchestrator subagent with complete task description including: requirements, DeepWiki findings, artifact output path, quality rubric location, max iterations\nFor Director-only mode: Execute Orchestrator Responsibilities directly (see below)\nRelay final result to user when pipeline completes"
      },
      {
        "title": "Orchestrator Responsibilities",
        "body": "The Orchestrator (depth-1 subagent in Mode B, or main agent in fallback mode) owns the Design→Review→Test loop. It does NOT write skill content or evaluate quality — it only coordinates.\n\nQuery DeepWiki for any OpenClaw-specific topics in the requirements (if Director hasn't already)\nSpawn Designer with requirements, DeepWiki findings, and any prior feedback\nsessions_spawn(\n  task=\"Act as Designer. Requirements: [...]. Write artifacts to /path/to/skill/\",\n  label=\"skill-v1-designer\",\n  runTimeoutSeconds=900\n)\n\n\nCollect Designer output — verify all required files exist at output path\nSpawn Reviewer with artifacts and quality rubric\nsessions_spawn(\n  task=\"Act as Reviewer. Evaluate skill at /path/to/skill/ using rubric: [...]. Score all 33 checks.\",\n  label=\"skill-v1-reviewer\",\n  runTimeoutSeconds=600\n)\n\n\nCollect Reviewer feedback (scores + structured issues)\nIf score <28/33 or blocking issues: feed feedback back to Designer → go to step 2, increment iteration count\nIf passing review: Spawn Tester\nsessions_spawn(\n  task=\"Act as Tester. Run self-play on skill at /path/to/skill/. Test triggers, functional steps, edge cases.\",\n  label=\"skill-v1-tester\",\n  runTimeoutSeconds=600\n)\n\n\nCollect Tester results (pass/fail + report)\nIf blocking issues: feed test results back to Designer → go to step 2\nIf all pass: add quality scorecard to README.md → announce completion to Director\nTrack iteration count — after 3 failed iterations, report failure with all iteration logs"
      },
      {
        "title": "Final Review Scores in README",
        "body": "Every shipped skill must include a quality scorecard in its README.md. This is the Reviewer's final scores, added by the Orchestrator before delivery:\n\n## Quality Scorecard\n\n| Category | Score | Details |\n|----------|-------|---------|\n| Completeness (SQ-A) | 7/7 | All checks pass |\n| Clarity (SQ-B) | 4/5 | Minor ambiguity in edge case handling |\n| Balance (SQ-C) | 4/4 | AI/script split appropriate |\n| Integration (SQ-D) | 4/4 | Compatible with standard agent kit |\n| Scope (SCOPE) | 3/3 | Clean boundaries, no leaks |\n| OPSEC | 2/2 | No violations |\n| References (REF) | 3/3 | All sources cited |\n| Architecture (ARCH) | 2/2 | Separation of concerns maintained |\n| **Total** | **29/30** | |\n\n*Scored by skill-engineer Reviewer (iteration 2)*\n\nThis scorecard serves as a quality certificate. Users can assess skill quality before installing."
      },
      {
        "title": "Version Control",
        "body": "The orchestrator manages git commits throughout the workflow:\n\nWhen to commit:\n\nAfter Designer produces initial artifacts (iteration 1): git add . && git commit -m \"feat: initial design for <skill-name>\"\nAfter Designer revisions (iteration 2+): git add . && git commit -m \"fix: address review issues (iteration N)\"\nAfter Tester passes and before ship: git add README.md && git commit -m \"docs: add quality scorecard for <skill-name>\"\n\nWhen to push:\n\nAfter final ship (all gates passed): git push origin main\nDo NOT push intermediate iterations — only ship-ready artifacts\n\nBranch strategy:\n\nWork in main branch for routine skill development\nUse feature branches for experimental or breaking changes"
      },
      {
        "title": "Error Handling",
        "body": "The orchestrator must handle technical failures gracefully:\n\nFailure TypeDetectionResponseGit push failsExit code ≠ 0Retry once. If fails again, report to user: \"Cannot push to remote. Check network/permissions.\"OPSEC scan script missingFile not foundSkip OPSEC automated check, but flag in review: \"Manual OPSEC review required — script not found.\"File write errorsPermission deniedReport: \"Cannot write to [path]. Check file permissions.\" Fail workflow.Subagent crashesTimeout or errorLog the error, attempt retry once. If fails again, report: \"Subagent failed. Manual intervention required.\"Review score = 0All checks failReport: \"Skill failed all quality checks. Requirements may be unclear or skill design is fundamentally flawed. Recommend starting over.\"\n\nRetry logic:\n\nGit operations: 1 retry after 5s delay\nFile operations: 1 retry after 2s delay\nSubagent spawns: 1 retry with fresh context\n\nFail-fast rules:\n\nIf iteration count exceeds 3, fail immediately (no further retries)\nIf OPSEC violations found, fail immediately (no iteration)\nIf required files cannot be written, fail immediately"
      },
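      {
        "title": "Example: Retry Helper",
        "body": "The retry policy above (one retry after a fixed delay) as a reusable shell helper; a sketch, with git push standing in for any retriable operation.\n\n# Single-retry helper: run the command, and on failure retry once after a delay.\nretry_once() {  # usage: retry_once <delay_seconds> <command...>\n  local delay=\"$1\"; shift\n  \"$@\" && return 0\n  sleep \"$delay\"\n  \"$@\"\n}\n\nretry_once 5 git push origin main ||\n  echo \"Cannot push to remote. Check network/permissions.\" >&2"
      },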
      {
        "title": "Performance Notes",
        "body": "Orchestrator workload: Coordinating Designer/Reviewer/Tester across 1-3 iterations can be complex, especially for large skills (1000+ lines). The orchestrator manages:\n\nRequirements gathering\nSubagent coordination (3-9 spawns in typical workflow)\nFeedback routing between roles\nIteration tracking\nFinal scorecard assembly\nGit operations\n\nToken considerations: A full 3-iteration cycle can consume 50k-150k tokens depending on skill complexity. For extremely complex skills, consider:\n\nBreaking into sub-skills (each with simpler scope)\nUsing separate agent sessions (Option 2 spawning) to isolate token contexts\nSimplifying requirements before starting iteration\n\nIf orchestrator feels overwhelmed: This is a signal that the skill being designed may be too complex. Revisit the scope definition and consider decomposition."
      },
      {
        "title": "Spawning Context",
        "body": "Each subagent receives only what it needs:\n\nRoleReceivesDoes NOT ReceiveDesignerRequirements, prior feedback (if any), design principlesReviewer rubric internalsReviewerSkill artifacts, quality rubric, scope boundariesRequirements discussionTesterSkill artifacts, test protocolReview scores"
      },
      {
        "title": "Designer Role",
        "body": "Purpose: Generate and revise skill content.\n\nFor complete Designer instructions, see: references/designer-guide.md"
      },
      {
        "title": "Quick Reference",
        "body": "Inputs: Requirements, design principles, feedback (on iterations 2+)\n\nOutputs: SKILL.md, skill.yml, README.md, tests/, scripts/, references/\n\nKey constraints:\n\nApply progressive disclosure (frontmatter → body → linked files)\nApply scoping rules (explicit boundaries, no scope creep)\nApply tool selection guardrails (validate before execution)\nREADME for strangers only (no internal org details)\nFollow AI vs. Script decision framework\n\nDesign principles:\n\nProgressive disclosure (3-level system)\nComposability (works alongside other skills)\nPortability (same skill works across Claude.ai, Claude Code, API)"
      },
      {
        "title": "Reviewer Role",
        "body": "Purpose: Independent quality evaluation. The Reviewer has never seen the requirements discussion — it evaluates artifacts on their own merits.\n\nFor complete Reviewer rubric and scoring guide, see: references/reviewer-rubric.md"
      },
      {
        "title": "Quick Reference",
        "body": "Inputs: Skill artifacts, quality rubric, scope boundaries\n\nOutputs: Review report with scores, verdict (PASS/REVISE/FAIL), issues, strengths\n\nQuality rubric (33 checks total):\n\nSQ-A: Completeness (8 checks)\nSQ-B: Clarity (5 checks)\nSQ-C: Balance (5 checks)\nSQ-D: Integration (5 checks)\nSCOPE: Boundaries (3 checks)\nOPSEC: Security (2 checks)\nREF: References (3 checks)\nARCH: Architecture (2 checks)\n\nScoring thresholds:\n\n28-33 pass → Deploy (PASS verdict)\n20-27 pass → Revise (fixable issues)\n10-19 pass → Redesign (major rework)\n0-9 pass → Reject (fundamentally flawed)\n\nPre-review: Run deterministic validation scripts before manual evaluation"
      },
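      {
        "title": "Example: Deterministic Pre-Review Script",
        "body": "A minimal example of the deterministic pre-review pass, assuming the skill directory is passed as the first argument. It checks only structural invariants from the frontmatter rules later in this document; rubric scoring stays with the Reviewer.\n\n#!/usr/bin/env bash\n# Deterministic pre-review checks (structure only, no rubric judgment).\ndir=\"${1:?usage: validate.sh <skill-dir>}\"\n\n[ -f \"$dir/SKILL.md\" ] || { echo \"FAIL: SKILL.md missing\" >&2; exit 1; }\n\n# Frontmatter must open with '---' and declare a kebab-case name plus a description.\nhead -n 1 \"$dir/SKILL.md\" | grep -qx -- '---' || { echo \"FAIL: no frontmatter\" >&2; exit 1; }\ngrep -Eq '^name: [a-z0-9]+(-[a-z0-9]+)*$' \"$dir/SKILL.md\" || { echo \"FAIL: bad or missing name\" >&2; exit 1; }\ngrep -q '^description:' \"$dir/SKILL.md\" || { echo \"FAIL: description missing\" >&2; exit 1; }\n\necho \"PASS: structural checks\""
      },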
      {
        "title": "Tester Role",
        "body": "Purpose: Empirical validation via self-play. The Tester loads the skill and attempts realistic tasks.\n\nFor complete Tester protocol, see: references/tester-protocol.md"
      },
      {
        "title": "Quick Reference",
        "body": "Inputs: Skill artifacts, test protocol\n\nOutputs: Test report with trigger accuracy, functional test results, edge cases, blocking/non-blocking issues, verdict (PASS/FAIL)\n\nTest protocol:\n\nTrigger tests — verify skill loads correctly (≥90% accuracy threshold)\nFunctional tests — execute 2-3 realistic tasks, note confusion points\nEdge case tests — missing inputs, ambiguous requirements, boundary cases\n\nIssue classification:\n\nBlocking: Prevents skill from functioning (must fix before ship)\nNon-blocking: Impacts quality but doesn't break core functionality\n\nPass criteria: No blocking issues + ≥90% trigger accuracy"
      },
      {
        "title": "Separation of Concerns Rule",
        "body": "The agent that DESIGNS a skill must NOT be the same agent that AUDITS it in the same session.\n\nThis is a hard architectural rule, not a guideline. When the same agent designs and audits in one session, it creates structural circularity: the designer unconsciously frames evaluation in terms of their own intentions, missing gaps that a fresh reader would catch.\n\nEnforcement:\n\nAll audit work (Reviewer role, Tester role) MUST be performed by a fresh subagent spawned after design is complete.\nUse openclaw agent --session-id <unique-id> (Option 2 spawning) when auditing a skill the current session has designed.\nThe orchestrator may never evaluate its own spawned Designer's output directly — it must route all evaluation through an independent Reviewer subagent.\nIn role-based execution (Option 1), the agent must explicitly transition: complete all Designer work, then start the Reviewer role with no reference to design-time reasoning.\n\nWhy this matters:\n\nA designer who audits their own work will score it against their intentions, not against what a new agent will actually experience.\nThe rubric (SQ-C3) explicitly prohibits a sub-agent from being both producer AND evaluator of the same output.\nThis rule is the implementation of that check at the session level.\n\nExample — correct:\n\n# Session A: Designer work\nsessions_spawn(\n  task=\"Design a skill for X. Write artifacts to /path/to/skill/\",\n  label=\"skill-v1-designer\",\n  runTimeoutSeconds=900\n)\n\n# Session B: Audit (fresh session, no context from Session A)\nsessions_spawn(\n  task=\"Audit the skill at /path/to/skill/ using the reviewer rubric.\",\n  label=\"skill-v1-auditor\",\n  runTimeoutSeconds=600\n)\n\nExample — incorrect:\n\n[Session A]\n1. Design the skill...\n2. Now let me review the skill I just designed...  ← VIOLATION"
      },
      {
        "title": "Evals Framework",
        "body": "Source: Anthropic \"Improving skill-creator\" (2026-03-03). Adapted for OpenClaw skill-engineer.\n\nEvals turn \"seems to work\" into \"verified to work.\" Every shipped skill should have persistent evals that can be re-run after model updates, skill edits, or environment changes."
      },
      {
        "title": "Eval Structure",
        "body": "An eval consists of:\n\nTest prompt — a realistic user input that should trigger the skill\nExpected behavior description — what \"good\" looks like (natural language, not exact match)\nPass/fail criteria — specific, observable conditions\n\nStore evals in the skill's tests/ directory:\n\ntests/\n├── evals.json           # Eval definitions\n├── benchmarks/          # Benchmark run results (timestamped)\n└── comparisons/         # A/B comparison results"
      },
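      {
        "title": "Example: evals.json Sketch",
        "body": "The package does not pin down an evals.json schema, so the shape below is an assumption that mirrors the three components above; field names are illustrative, not a published format.\n\n# Write one plausible evals.json (field names are assumptions).\ncat > tests/evals.json <<'EOF'\n{\n  \"skill\": \"my-skill\",\n  \"evals\": [\n    {\n      \"id\": \"functional-01\",\n      \"prompt\": \"Analyze competitor websites for example.com\",\n      \"expected_behavior\": \"Loads the skill and produces a comparison table plus insights\",\n      \"pass_criteria\": [\"skill triggered\", \"output contains a markdown table\"]\n    },\n    {\n      \"id\": \"no-trigger-01\",\n      \"prompt\": \"What is the weather today?\",\n      \"expected_behavior\": \"Skill does not load\",\n      \"pass_criteria\": [\"skill not triggered\"]\n    }\n  ]\n}\nEOF"
      },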
      {
        "title": "Eval Types",
        "body": "TypePurposeWhen to runRegression evalCatch quality drops after changesAfter every skill edit or model updateCapability evalDetect if base model has outgrown the skillMonthly, or after major model releasesTrigger evalVerify skill fires correctlyAfter description changes"
      },
      {
        "title": "Benchmark Mode",
        "body": "Run standardized assessments tracking:\n\nEval pass rate — what percentage of evals pass\nElapsed time — how long each eval takes\nToken usage — cost per eval run\n\nStore benchmark results with timestamps for trend tracking:\n\n{\n  \"timestamp\": \"2026-03-04T12:00:00Z\",\n  \"skill\": \"my-skill\",\n  \"model\": \"claude-sonnet-4-5\",\n  \"pass_rate\": 0.85,\n  \"avg_time_s\": 12.3,\n  \"avg_tokens\": 4200,\n  \"evals_run\": 10\n}"
      },
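      {
        "title": "Example: Benchmark Trend Summary",
        "body": "To watch the trend across runs, the timestamped result files can be summarized with jq (assuming jq is on PATH and results live in tests/benchmarks/ as above):\n\n# Print timestamp, model, and pass rate per benchmark run, oldest first.\njq -r '[.timestamp, .model, ((.pass_rate * 100) | tostring) + \"%\"] | @tsv' tests/benchmarks/*.json | sort"
      },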
      {
        "title": "Comparator Testing (A/B Blind Test)",
        "body": "Compare two skill versions — or skill vs. no skill — using a blind judge:\n\nRun the same test prompt through Version A and Version B\nA Comparator subagent (fresh context, no knowledge of which is which) evaluates both outputs\nThe Comparator scores on relevant dimensions without knowing the source\n\nWhen to use:\n\nBefore shipping a major revision (old vs. new)\nTo justify a skill's existence (with-skill vs. without-skill)\nTo compare two alternative approaches during design\n\nSpawning a Comparator:\n\nsessions_spawn(\n  task=\"You are a blind comparator. You will receive Output A and Output B for the same task. Score each on [dimensions]. You do NOT know which version produced which output. Be objective.\",\n  label=\"skill-comparator\",\n  runTimeoutSeconds=300\n)"
      },
      {
        "title": "Description Tuning",
        "body": "Skill descriptions determine trigger accuracy. As skill count grows, description precision becomes critical:\n\nToo broad → false triggers (skill loads when irrelevant)\nToo narrow → misses (skill doesn't load when needed)\n\nTuning protocol:\n\nCollect 10-20 sample prompts (mix of should-trigger and should-not-trigger)\nRun each prompt and check whether the skill triggers correctly\nAnalyze false positives and false negatives\nRevise the description field to be more precise\nRe-run trigger tests to verify improvement\n\nTarget: ≥90% trigger accuracy on sample prompts. Anthropic's internal testing improved 5 out of 6 public skills using this method."
      },
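      {
        "title": "Example: Trigger Accuracy Arithmetic",
        "body": "Trigger accuracy here is (true positives + true negatives) divided by total samples. A quick check with placeholder counts from a 20-prompt run:\n\n# Placeholder tallies from a 20-prompt tuning run; swap in your real counts.\ntp=9; tn=9; fp=1; fn=1\ntotal=$((tp + tn + fp + fn))\necho \"trigger accuracy: $(( (tp + tn) * 100 / total ))% (target: >=90%)\""
      },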
      {
        "title": "Skill Retirement Protocol",
        "body": "Skills are not forever. Capability uplift skills may become unnecessary as models improve.\n\nRetirement signal: Base model passes ≥80% of the skill's evals without the skill loaded.\n\nRetirement process:\n\nRun capability evals with skill disabled\nIf pass rate ≥80%, flag skill as \"retirement candidate\"\nRun comparator test (with-skill vs. without-skill) to confirm\nIf comparator shows no significant quality difference, retire the skill\nArchive (don't delete) — the skill may become relevant again with different models\n\nTrack in audit reports:\n\n## Retirement Candidates\n\n| Skill | Capability Eval (no skill) | Comparator Result | Recommendation |\n|-------|---------------------------|-------------------|----------------|\n| pdf-creator | 85% pass | No significant difference | Retire |"
      },
      {
        "title": "Agent Kit Audit Protocol",
        "body": "Periodic full audit of the agent kit:\n\nInventory all skills — list every SKILL.md with owner agent\nCheck for orphans — skills that no agent uses\nCheck for duplicates — overlapping functionality\nCheck for gaps — workflow steps that have no skill\nCheck balance — are some agents overloaded while others idle?\nCheck consistency — naming conventions, output formats\nRun quality score on each skill (SQ-A through SQ-D)\nProduce audit report with scores and recommendations"
      },
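      {
        "title": "Example: Inventory Script",
        "body": "Step 1 of the audit is mechanical and can be scripted. A sketch that pairs each SKILL.md with its declared name, reusing the discovery command shown later in this document:\n\n# Inventory every SKILL.md with its declared name and on-disk location.\nfind ~/.openclaw/ -name \"SKILL.md\" -not -path \"*/node_modules/*\" | sort |\nwhile read -r f; do\n  name=$(grep -m1 '^name:' \"$f\" | cut -d' ' -f2-)\n  printf '%-30s %s\\n' \"${name:-<unnamed>}\" \"$f\"\ndone"
      },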
      {
        "title": "Audit Output Template",
        "body": "# Agent Kit Audit Report\n\n**Date:** [date]\n**Skills audited:** [count]\n\n## Skill Inventory\n\n| # | Skill | Agent | Quality Score | Status |\n|---|-------|-------|--------------|--------|\n| 1 | [name] | [agent] | X/33 | Deploy/Revise/Redesign |\n\n## Issues Found\n1. ...\n\n## Recommendations\n1. ...\n\n## Action Items\n| # | Action | Priority | Owner |\n|---|--------|----------|-------|"
      },
      {
        "title": "Skill Interaction Map",
        "body": "Maintain a map of how skills interact:\n\norchestrator-agent (coordinates workflow)\n    ├── content-creator (writes content)\n    │   └── consumes: research outputs, review feedback\n    ├── content-reviewer (reviews content)\n    │   └── produces: review reports\n    ├── research-analyst (researches topics)\n    │   └── produces: research consumed by content-creator\n    ├── validator (validates outputs)\n    └── skill-engineer (this skill — meta)\n        └── consumes: all skills for audit\n\nAdapt this to your specific agent architecture."
      },
      {
        "title": "OpenClaw Skill System Reference",
        "body": "Version note: This section is based on OpenClaw v2026.2.26. Skill system behavior (frontmatter fields, loading precedence, subagent APIs) may change across versions. Verify against source or DeepWiki when upgrading."
      },
      {
        "title": "Skill Structure",
        "body": "A skill is a directory containing at minimum a SKILL.md file:\n\nmy-skill/\n├── SKILL.md          # Required: frontmatter + instructions\n├── skill.yml         # Optional: ClawhHub publish metadata\n├── README.md         # Optional: human-facing documentation\n├── scripts/          # Optional: deterministic helper scripts\n├── tests/            # Optional: test cases and fixtures\n└── references/       # Optional: detailed linked documentation"
      },
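      {
        "title": "Example: Scaffolding a Skill",
        "body": "A scaffold for the layout above; only SKILL.md is required, everything else is optional.\n\n# Create the directory layout and a frontmatter stub.\nmkdir -p my-skill/scripts my-skill/tests my-skill/references\ncat > my-skill/SKILL.md <<'EOF'\n---\nname: my-skill\ndescription: |\n  [What it does]. Use when user [trigger phrases]. [Key capabilities].\n---\n\nInstructions go here.\nEOF"
      },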
      {
        "title": "SKILL.md Frontmatter Format",
        "body": "Required fields:\n\n---\nname: skill-name          # kebab-case, no spaces/capitals/underscores\ndescription: |            # What it does + when to use it + trigger phrases\n  [What it does]. Use when user [trigger phrases]. [Key capabilities].\n---\n\nFull supported fields:\n\n---\nname: skill-name\ndescription: ...\nhomepage: https://...                    # URL for Skills UI\nuser-invocable: true                     # Expose as slash command (default: true)\ndisable-model-invocation: false          # Exclude from model prompt (default: false)\ncommand-dispatch: tool                   # Bypass model, dispatch to tool directly\ncommand-tool: tool-name                  # Tool to invoke when command-dispatch is set\ncommand-arg-mode: raw                    # Argument forwarding mode (default: raw)\nmetadata: {\"openclaw\": {\"always\": true, \"emoji\": \"🔧\", \"os\": [\"darwin\",\"linux\"], \"requires\": {\"bins\": [\"curl\",\"python3\"]}, \"primaryEnv\": \"MY_API_KEY\"}}\n---\n\nmetadata.openclaw load-time gates:\n\nFieldPurposealways: trueAlways include, skip all other gatesemojiEmoji shown in macOS Skills UIosLimit to platforms: darwin, linux, win32requires.binsAll binaries must exist on PATHrequires.anyBinsAt least one binary must existrequires.envEnvironment variables must existrequires.configopenclaw.json paths must be truthyprimaryEnvLinks to skills.entries.<name>.apiKey in config"
      },
      {
        "title": "Skill Loading Precedence",
        "body": "Skills are loaded from these locations (highest → lowest priority):\n\n<workspace>/skills/ — agent-specific, highest precedence\n~/.openclaw/skills/ — shared across all agents on machine\nskills.load.extraDirs in openclaw.json — additional directories\nBundled skills — shipped with OpenClaw, lowest precedence\nPlugin skills — listed in openclaw.plugin.json"
      },
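      {
        "title": "Example: Checking Where a Skill Resolves",
        "body": "To see which copy of a skill wins, check the locations in precedence order. A sketch covering only the first two locations; WORKSPACE is a placeholder for your agent's workspace path.\n\n# Resolve a skill name in precedence order; the first hit wins.\nskill=\"my-skill\"\nfor d in \"$WORKSPACE/skills\" \"$HOME/.openclaw/skills\"; do\n  if [ -f \"$d/$skill/SKILL.md\" ]; then\n    echo \"resolves to: $d/$skill/SKILL.md\"\n    break\n  fi\ndone"
      },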
      {
        "title": "When to Use Each Location",
        "body": "LocationUse when<workspace>/skills/Skill is specific to one agent's role; under active development~/.openclaw/skills/Skill should be available to all agents on this machine"
      },
      {
        "title": "How Skills Are Triggered",
        "body": "OpenClaw builds a system prompt with a compact XML list of available skills (name, description, location). The model reads this list and decides which skills to load. Skills are NOT auto-injected — the model must explicitly read the SKILL.md when needed.\n\nTrigger accuracy goal: ≥90% (skill loads when relevant, does NOT load when irrelevant)."
      },
      {
        "title": "Skill Discovery Command",
        "body": "To inventory all skills on a machine:\n\nfind ~/.openclaw/ -name \"SKILL.md\" -not -path \"*/node_modules/*\" | sort"
      },
      {
        "title": "Configuration",
        "body": "No persistent configuration required. The skill uses tools available in the agent's environment.\n\nRequirementDescriptiondeepwiki skillQuery OpenClaw source for current API behavior (liaosvcaf/openclaw-skill-deepwiki)Vector memorySemantic search across session history (memory.enabled: true in openclaw.json)gh CLIGitHub repo creation and visibility changes for release pipelineclawhub CLIPublish skills to ClawhHub registry (npm i -g clawhub)\n\nSee references/designer-guide.md for full environment setup."
      }
    ],
    "body": "Skill Engineer\n\nOwn the full lifecycle of agent skills in your OpenClaw agent kit. The entire multi-agent workflow depends on skill quality — a weak skill produces weak results across every run.\n\nCore principle: Builders don't evaluate their own work. This skill enforces separation of concerns through a multi-agent architecture where design, review, and testing are performed by independent subagents.\n\nSkill Taxonomy\n\nSource: Anthropic \"Improving skill-creator\" (2026-03-03)\n\nSkills fall into two categories. This distinction drives design decisions, testing strategy, and lifecycle management.\n\nCapability Uplift Skills\n\nThe model can't do it well alone — the skill injects techniques, patterns, or constraints that produce better output than prompting alone.\n\nExamples: Document creation skills (PDF generation), complex formatting, specialized analysis pipelines.\n\nTesting focus: Monitor whether the base model has caught up. If the base model passes your evals without the skill loaded, the skill's techniques have been incorporated into model default behavior. The skill isn't broken — it's no longer necessary.\n\nLifecycle: These skills may \"retire\" as models improve. Build evals that can detect when retirement is appropriate.\n\nEncoded Preference Skills\n\nThe model can already do each step — the skill sequences operations according to your team's specific process.\n\nExamples: NDA review against set criteria, weekly report generation from specific data sources, brand compliance checks.\n\nTesting focus: Verify the skill faithfully reproduces your actual workflow, not the model's \"free improvisation.\" Fidelity to process is the metric.\n\nLifecycle: These skills are durable — they encode organizational knowledge that doesn't change with model capability. They need maintenance when processes change, not when models change.\n\nDesign Implication\n\nWhen the Designer begins work, classify the skill:\n\nClassification\tDesign priority\tTest priority\tRetirement risk\nCapability uplift\tTechnique precision\tBase model comparison\tHigh — monitor model progress\nEncoded preference\tProcess fidelity\tWorkflow reproduction\tLow — tied to org process\nMandatory Dependencies\n\nThis skill requires the following to be installed and available:\n\nDependency\tType\tPurpose\tInstall from\ndeepwiki\tSkill\tQuery OpenClaw source for current API behavior\tliaosvcaf/openclaw-skill-deepwiki\nVector memory DB\tOpenClaw feature\tSemantic search across session history, notes, and memory files\tEnable in openclaw.json (memory.enabled: true)\n\nBefore starting any skill design or update session, verify both are available:\n\n# Check deepwiki\nls ~/.openclaw/skills/deepwiki/deepwiki.sh || ls ~/.openclaw/workspace-*/skills/deepwiki/deepwiki.sh\n\n# Check vector memory (should return results, not empty)\n# Use the memory_search tool with a known topic from recent sessions\n\n\nIf deepwiki is missing, install from liaosvcaf/openclaw-skill-deepwiki. If vector memory returns no results on known topics, check that memory.enabled is true in openclaw.json and that indexing has run.\n\nWhy These Are Non-Negotiable\n\nDeepWiki: OpenClaw APIs are version-specific. Without DeepWiki, skills are written against memory of past behavior — which drifts as OpenClaw updates. DeepWiki grounds skill content in actual source code. A skill engineer without DeepWiki is working blind.\n\nVector memory DB: Session history, Obsidian notes, and past decisions are indexed here. 
Without it, the agent falls back to manual file search — slower, less accurate, and misses cross-document connections. Critical context from past sessions (installation guides, design decisions, pitfalls) lives in this index.\n\nMemory Search Protocol (MANDATORY)\n\nBefore searching files manually, always query the vector memory database first. It indexes session history, Obsidian notes, and memory files — and finds cross-document connections that manual search misses.\n\nWhen to query vector memory:\n\nUser asks \"do you remember...\", \"find the notes about...\", \"we did X before...\"\nLooking for past installation guides, design decisions, or troubleshooting records\nAny question about prior work, configurations, or lessons learned\n\nHow to query correctly:\n\nmemory_search(\"your query here\", maxResults=5)\n\n\nCritical rule: try multiple queries before giving up.\n\nIf the first query returns empty, do NOT fall back to manual file search immediately. Try at least 3 different phrasings:\n\nFirst query fails\tTry instead\n\"Docker OpenClaw installation\"\t\"dockerized openclaw Titan\"\n\"dockerized openclaw Titan\"\t\"openclaw isolation install guide\"\nStill empty\tThen fall back to manual file search\n\nLesson learned (2026-03-03): When asked to find Docker/OpenClaw installation notes, memory_search returned empty on the first query and the agent immediately switched to manual SQLite/file search. The correct approach was to try different query phrasings — the second attempt (\"dockerized OpenClaw installation Titan setup\") returned 5 relevant results directly from indexed Obsidian notes. Manual file search is a last resort, not a first response.\n\nDeepWiki Staleness Protocol (MANDATORY)\n\nOpenClaw APIs, skill loading behavior, subagent mechanics, and frontmatter fields are version-specific. Information in this skill or any skill referencing OpenClaw internals may be outdated.\n\nALWAYS query DeepWiki when:\n\nDesigning a skill that uses sessions_spawn, tool calls, or OpenClaw-specific APIs\nReferencing skill frontmatter fields or loading precedence\nUpdating an existing skill that has version-tagged sections\nThe installed OpenClaw version differs from any version tag in the skill\nYou are unsure whether an API, field, or behavior still exists\n\nHow to check:\n\n# Check current OpenClaw version\nopenclaw --version\n\n# Query DeepWiki for current behavior\n~/.openclaw/skills/deepwiki/deepwiki.sh ask openclaw/openclaw \"YOUR QUESTION\"\n\n\nDo NOT rely on memory or this skill's documented behavior without verifying when the topic is OpenClaw internals. DeepWiki is grounded in the actual source code. 
This skill's documentation is not.\n\nVerification checklist before shipping any skill that references OpenClaw internals:\n\n Checked openclaw --version against version tags in the skill\n Queried DeepWiki to confirm API/field behavior is current\n Updated version tags if behavior has changed\nScope & Boundaries\nWhat This Skill Handles\nSkill design: SKILL.md, skill.yml, README.md, tests, scripts, references\nSkill review: quality evaluation, rubric scoring, gap analysis\nSkill testing: self-play validation, trigger testing, functional testing\nSkill maintenance: iteration based on feedback, refactoring\nAgent kit audit: inventory, consistency, quality scoring across all skills\nWhat This Skill Does NOT Handle\nRelease pipeline — publishing, versioning, changelogs belong to release processes\nRepository management — git submodules, repo creation, branch strategy belong to your VCS workflow\nDeployment — installing skills to agents, configuration management\nTracking — progress tracking, task management, project boards\nInfrastructure — MCP servers, API keys, environment setup\nWhere This Skill Ends\n\nThis skill produces validated skill artifacts (SKILL.md, skill.yml, README.md, tests, scripts). Once artifacts pass quality gates, responsibility transfers to whatever system handles publishing and deployment.\n\nSuccess Criteria\n\nA skill development cycle is considered successful when:\n\nQuality gates passed — Reviewer scores ≥28/33 (Deploy threshold)\nNo blocking issues — Tester reports no issues marked as \"blocking\"\nAll artifacts generated — SKILL.md, skill.yml, README.md, tests/, scripts/ (if needed), references/ (if needed)\nOPSEC clean — No hardcoded secrets, paths, org names, or private URLs\nScripts validated — All deterministic validation scripts execute successfully on target platform(s)\nTrigger accuracy — Tester reports ≥90% trigger accuracy (true positives + true negatives)\n\nIf any criterion fails, the skill returns to the Designer for revision.\n\nInputs\n\nWhen invoking this skill, the orchestrator must gather:\n\nInput\tDescription\tRequired\tSource\nProblem description\tWhat capability or workflow needs to be enabled\tYes\tUser conversation\nTarget audience\tWhich agent(s) will use this skill\tYes\tUser or inferred\nExpected interactions\tWith users, APIs, files, MCP servers, other skills\tYes\tRequirements discussion\nInputs/Outputs\tWhat data the skill receives and produces\tYes\tRequirements discussion\nConstraints\tPerformance limits, security requirements, dependencies\tNo\tUser or system\nPrior feedback\tReview or test reports from previous iterations\tNo\tPrevious Reviewer/Tester\nExisting artifacts\tIf refactoring/maintaining an existing skill\tNo\tFile system\n\nExample requirements gathering:\n\nUser: \"I need a skill for analyzing competitor websites\"\n\nOrchestrator gathers:\n- Problem: Automate competitor analysis with structured output\n- Audience: research-agent\n- Interactions: web_fetch, browser tool, writes markdown reports\n- Inputs: competitor URLs, analysis criteria\n- Outputs: comparison table, insights markdown\n- Constraints: must complete in <60s per site\n\n\nThese inputs are then passed to the Designer to begin the design process.\n\nArchitecture Overview\n\nThe skill-engineer uses a three-role iterative architecture. The orchestrator spawns subagents for each role and never does creative or evaluation work directly.\n\nPattern Selection (IMPORTANT)\n\nTwo architecture modes are available. 
Choose based on complexity:\n\nMode A: Director-Controlled (simple/short skill work) Use when: ≤2 phases, <10 minutes total, user interaction needed between phases (e.g., quick fixes, single-skill reviews).\n\nDirector/Orchestrator (main agent, depth 0)\n    ├─ Spawn ──→ Designer (depth 1)\n    ├─ Spawn ──→ Reviewer (depth 1)\n    └─ Spawn ──→ Tester (depth 1)\n\n\nRisk: announce-to-action gap — if user sends a message while waiting for a subagent, the main agent may handle that instead of chaining the next phase. Mitigate with cron safety net (see below).\n\nMode B: Orchestrator Subagent Pattern (complex/long skill work) Use when: 3+ phases, >10 minutes, pipeline must not stall, parallel workers needed.\n\nDirector (user-facing, depth 0)\n    └── Orchestrator (pipeline owner, depth 1)\n        ├─ Spawn ──→ Designer (depth 2)\n        ├─ Spawn ──→ Reviewer (depth 2)\n        └─ Spawn ──→ Tester (depth 2)\n\n\nThe Director spawns a single Orchestrator subagent with the full task description. The Orchestrator owns the entire Design→Review→Test loop without yielding control between phases. User messages go to the Director; the pipeline runs uninterrupted.\n\nRequired config for Mode B:\n\n{\n  \"agents\": { \"defaults\": { \"subagents\": { \"maxSpawnDepth\": 2 } } }\n}\n\n\nWhy Mode B is superior for complex work:\n\nNo announce-to-action gap (orchestrator chains phases immediately within the same session)\nImmune to user interruption between phases\nPersistent pipeline state without re-deriving from files each turn\n\nReference: orchestrator-subagent-pattern-2026-02-28.md (Obsidian notes) — documented after a real 70-minute pipeline stall incident.\n\nMode A Safety Net (cron)\n\nWhen using Mode A, set a cron safety net after each spawn to catch announce-to-action failures:\n\n\"Check if [designer/reviewer/tester] subagent has completed. If so and next phase not started, resume pipeline.\"\n(fires 15 min after spawn)\n\nIteration Loop\nDesigner → Reviewer ──pass──→ Tester ──pass──→ Ship\n              │                  │\n              fail               fail\n              │                  │\n              ▼                  ▼\n         Designer revises   Designer revises\n              │                  │\n              ▼                  ▼\n           Reviewer          Reviewer + Tester\n              │\n           (max 3 iterations, then fail)\n\n\nExit conditions:\n\nShip: Reviewer scores ≥ 28/33 (85%+) AND Tester reports no blocking issues\nRevise: Reviewer or Tester found fixable issues (iterate)\nFail: 3 iterations exhausted and still below quality bar\nIteration Failure Path\n\nAfter 3 failed iterations, the orchestrator must:\n\nStop iteration — do not continue trying\nReport failure to user with:\nSummary: \"Skill development failed after 3 iterations\"\nAll 3 iteration reports (Reviewer + Tester feedback)\nFinal quality score\nList of unresolved blocking issues\nPresent options to user:\nProvide more context or clarify requirements (restart with better inputs)\nSimplify scope (reduce skill complexity and retry)\nAbandon this skill (requirements may be infeasible)\nDo NOT silently fail — always report to user and await decision\n\nNever: Continue past 3 iterations or ship a skill that hasn't passed quality gates.\n\nSubagent Spawning Mechanism\n\nVersion note: Verified against OpenClaw v2026.2.26. API may change.\n\nIn OpenClaw, subagents are spawned using the sessions_spawn tool (not a CLI command). 
Subagent Spawning Mechanism\n\nVersion note: Verified against OpenClaw v2026.2.26. The API may change.\n\nIn OpenClaw, subagents are spawned with the sessions_spawn tool (not a CLI command). Subagents run in isolated sessions, announce results back to the requester's channel when complete, and are auto-archived after 60 minutes by default.\n\nKey constraints on subagents:\n\nDefault max spawn depth is 1 (subagents cannot spawn further subagents unless maxSpawnDepth: 2 is configured)\nDefault max of 5 active children per agent at once\nSubagents do NOT receive SOUL.md, IDENTITY.md, or USER.md — only AGENTS.md and TOOLS.md\nUse runTimeoutSeconds to prevent hanging (900s for Designer, 600s for Reviewer/Tester)\nResults are announced back automatically; reply ANNOUNCE_SKIP to suppress\n\nDirector vs. Orchestrator Roles\n\nThis is the most important architectural decision. Understand it before proceeding.\n\nThe Problem with Naive Single-Agent Control\n\nThe natural instinct is to have the main agent (you) directly manage the Design→Review→Test loop:\n\nMain agent\n    ├── spawns Designer → waits for announce → spawns Reviewer → waits → spawns Tester\n\nThis breaks in three ways:\n\nAnnounce-to-action gap: When a subagent finishes, OpenClaw sends a completion announce that triggers a new LLM turn. The LLM may report results to the user and stop — treating the announce as informational rather than as a pipeline trigger. No mechanism forces the next action.\n\nContext loss: Each new turn is a fresh LLM call. Between subagent completion and the next turn, there is no persistent state machine tracking \"we're in iteration 2, reviewer passed, now run Tester.\" The agent must re-derive this from files every time — fragile over 3+ iterations.\n\nUser message interruption: If the user sends a message while the pipeline is between phases, the main agent handles that message instead of continuing. The pipeline stalls silently until the user notices.\n\nReal incident: A book-writer pipeline stalled for 70 minutes because a research subagent completed and announced back, but the Director reported results to the user and stopped — never spawning the writing phase. (2026-02-28)\n\nThe Solution: One Level of Indirection\n\nAdd an intermediate Orchestrator subagent that owns the pipeline. The main agent becomes the Director — it talks to the user. The Orchestrator does the pipeline work. They don't share context.\n\nDirector (main agent, depth 0)  ←→  User\n    │\n    └── Orchestrator (subagent, depth 1) — owns Design→Review→Test loop\n        ├── Designer (depth 2)\n        ├── Reviewer (depth 2)\n        └── Tester (depth 2)\n\n
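Concretely, the Director's handoff is a single spawn carrying the entire brief. A sketch (the task text, label, and timeout are illustrative — size the timeout to your pipeline):\n\nsessions_spawn(\n  task=\"Act as Orchestrator. Own the Design→Review→Test loop for <skill-name>. Requirements: [...]. DeepWiki findings: [...]. Write artifacts to /path/to/skill/. Rubric: references/reviewer-rubric.md. Max 3 iterations.\",\n  label=\"skill-pipeline-orchestrator\",\n  runTimeoutSeconds=3600\n)\n\n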
Why this works:\n\nThe Orchestrator runs as a single continuous session. It processes each subagent's completion announce immediately — no turn boundary between phases, no gap.\nUser messages go to the Director (depth 0), not the Orchestrator. The pipeline cannot be interrupted by user activity.\nThe Orchestrator maintains full pipeline state throughout its run without re-deriving it from files.\n\nRequired config (add to openclaw.json before using this pattern):\n\n{\n  \"agents\": { \"defaults\": { \"subagents\": { \"maxSpawnDepth\": 2 } } }\n}\n\nWhen to Use Each Mode\n\nSituation\tUse\tWhy\nQuick fix, single skill review, <10 min\tDirector-only (depth 1 subagents)\tSimpler, fewer spawns\nFull design cycle (Design+Review+Test)\tDirector + Orchestrator (depth 2)\tPipeline cannot afford to stall\nAny pipeline with 3+ sequential phases\tDirector + Orchestrator (depth 2)\tAnnounce-to-action gap becomes critical\nmaxSpawnDepth not set to 2\tDirector-only with cron safety net\tNo choice — see fallback below\n\nFallback: Director-Only with Cron Safety Net\n\nIf maxSpawnDepth: 2 is not configured, use Director-only mode but add a cron safety net after each subagent spawn:\n\nAfter spawning Designer, register a cron job:\n\"Check if Designer has completed (look for output at /path/to/skill/SKILL.md).\n If completed and Reviewer not yet started, spawn Reviewer now.\"\n(fires 15 minutes after spawn)\n\nThis mitigates but does not eliminate the announce-to-action gap.\n\nDirector Responsibilities\n\nThe Director (main agent) talks to the user and kicks off the pipeline. It does NOT do design, review, or testing work.\n\n1. Gather requirements from the user (problem, audience, inputs/outputs, interactions)\n2. Query DeepWiki — if the skill touches any OpenClaw internals, query DeepWiki FIRST:\n~/.openclaw/skills/deepwiki/deepwiki.sh ask openclaw/openclaw \"RELEVANT QUESTION\"\n3. Choose mode — Director-only (simple) or Director+Orchestrator (full cycle)\n4. For Director+Orchestrator mode: spawn a single Orchestrator subagent with a complete task description including requirements, DeepWiki findings, artifact output path, quality rubric location, and max iterations\n5. For Director-only mode: execute the Orchestrator Responsibilities directly (see below)\n6. Relay the final result to the user when the pipeline completes\n\nOrchestrator Responsibilities\n\nThe Orchestrator (depth-1 subagent in Mode B, or the main agent in fallback mode) owns the Design→Review→Test loop. It does NOT write skill content or evaluate quality — it only coordinates, carrying a small amount of pipeline state between phases (see the sketch below).\n\n
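An illustrative sketch of that state (a hypothetical structure, not a real OpenClaw type — the Orchestrator may equally keep this in prose or scratch notes):\n\nclass PipelineState:\n    def __init__(self, requirements):\n        self.requirements = requirements\n        self.iteration = 1            # 1..3, capped by the failure path\n        self.phase = \"design\"         # design | review | test | done\n        self.feedback = []            # issues routed to the next Designer pass\n        self.scores = None            # latest Reviewer scorecard\n\nThe steps below are transitions over this state:\n\n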
1. Query DeepWiki for any OpenClaw-specific topics in the requirements (if the Director hasn't already)\n2. Spawn Designer with requirements, DeepWiki findings, and any prior feedback:\nsessions_spawn(\n  task=\"Act as Designer. Requirements: [...]. Write artifacts to /path/to/skill/\",\n  label=\"skill-v1-designer\",\n  runTimeoutSeconds=900\n)\n3. Collect Designer output — verify all required files exist at the output path\n4. Spawn Reviewer with artifacts and the quality rubric:\nsessions_spawn(\n  task=\"Act as Reviewer. Evaluate skill at /path/to/skill/ using rubric: [...]. Score all 33 checks.\",\n  label=\"skill-v1-reviewer\",\n  runTimeoutSeconds=600\n)\n5. Collect Reviewer feedback (scores + structured issues)\n6. If score <28/33 or blocking issues: feed feedback back to the Designer → go to step 2 and increment the iteration count\n7. If the review passes: spawn Tester:\nsessions_spawn(\n  task=\"Act as Tester. Run self-play on skill at /path/to/skill/. Test triggers, functional steps, edge cases.\",\n  label=\"skill-v1-tester\",\n  runTimeoutSeconds=600\n)\n8. Collect Tester results (pass/fail + report)\n9. If blocking issues: feed test results back to the Designer → go to step 2\n10. If all pass: add the quality scorecard to README.md → announce completion to the Director\n11. Track the iteration count — after 3 failed iterations, report failure with all iteration logs\n\nFinal Review Scores in README\n\nEvery shipped skill must include a quality scorecard in its README.md. These are the Reviewer's final scores, added by the Orchestrator before delivery:\n\n## Quality Scorecard\n\n| Category | Score | Details |\n|----------|-------|---------|\n| Completeness (SQ-A) | 8/8 | All checks pass |\n| Clarity (SQ-B) | 4/5 | Minor ambiguity in edge case handling |\n| Balance (SQ-C) | 5/5 | AI/script split appropriate |\n| Integration (SQ-D) | 5/5 | Compatible with standard agent kit |\n| Scope (SCOPE) | 3/3 | Clean boundaries, no leaks |\n| OPSEC | 2/2 | No violations |\n| References (REF) | 3/3 | All sources cited |\n| Architecture (ARCH) | 2/2 | Separation of concerns maintained |\n| **Total** | **32/33** | |\n\n*Scored by skill-engineer Reviewer (iteration 2)*\n\nThis scorecard serves as a quality certificate: users can assess skill quality before installing.\n\nVersion Control\n\nThe orchestrator manages git commits throughout the workflow.\n\nWhen to commit:\n\nAfter Designer produces initial artifacts (iteration 1): git add . && git commit -m \"feat: initial design for <skill-name>\"\nAfter Designer revisions (iteration 2+): git add . && git commit -m \"fix: address review issues (iteration N)\"\nAfter Tester passes and before ship: git add README.md && git commit -m \"docs: add quality scorecard for <skill-name>\"\n\nWhen to push:\n\nAfter final ship (all gates passed): git push origin main\nDo NOT push intermediate iterations — only ship-ready artifacts\n\nBranch strategy:\n\nWork in the main branch for routine skill development\nUse feature branches for experimental or breaking changes\n\nError Handling\n\nThe orchestrator must handle technical failures gracefully:\n\nFailure Type\tDetection\tResponse\nGit push fails\tExit code ≠ 0\tRetry once. If it fails again, report to the user: \"Cannot push to remote. Check network/permissions.\"\nOPSEC scan script missing\tFile not found\tSkip the automated OPSEC check, but flag it in the review: \"Manual OPSEC review required — script not found.\"\nFile write errors\tPermission denied\tReport: \"Cannot write to [path]. Check file permissions.\" Fail the workflow.\nSubagent crashes\tTimeout or error\tLog the error and retry once. If it fails again, report: \"Subagent failed. Manual intervention required.\"\nReview score = 0\tAll checks fail\tReport: \"Skill failed all quality checks. Requirements may be unclear or the skill design is fundamentally flawed. Recommend starting over.\"\n\nRetry logic:\n\nGit operations: 1 retry after a 5s delay\nFile operations: 1 retry after a 2s delay\nSubagent spawns: 1 retry with fresh context\n\nFail-fast rules:\n\nIf the iteration count exceeds 3, fail immediately (no further retries)\nIf OPSEC violations are found, fail immediately (no iteration)\nIf required files cannot be written, fail immediately\n\n
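As one concrete instance of the retry policy above, the final push might be wrapped like this (a minimal bash sketch; the branch and delay mirror the rules in this section):\n\npush_with_retry() {\n  git push origin main && return 0   # first attempt\n  sleep 5                            # retry delay per policy\n  git push origin main || echo \"Cannot push to remote. Check network/permissions.\" >&2\n}\n\n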
Performance Notes\n\nOrchestrator workload: Coordinating Designer/Reviewer/Tester across 1-3 iterations can be complex, especially for large skills (1000+ lines). The orchestrator manages:\n\nRequirements gathering\nSubagent coordination (3-9 spawns in a typical workflow)\nFeedback routing between roles\nIteration tracking\nFinal scorecard assembly\nGit operations\n\nToken considerations: A full 3-iteration cycle can consume 50k-150k tokens depending on skill complexity. For extremely complex skills, consider:\n\nBreaking the skill into sub-skills (each with a simpler scope)\nUsing separate agent sessions (Option 2 spawning) to isolate token contexts\nSimplifying requirements before starting iteration\n\nIf the orchestrator feels overwhelmed, that is a signal the skill being designed may be too complex. Revisit the scope definition and consider decomposition.\n\nSpawning Context\n\nEach subagent receives only what it needs:\n\nRole\tReceives\tDoes NOT Receive\nDesigner\tRequirements, prior feedback (if any), design principles\tReviewer rubric internals\nReviewer\tSkill artifacts, quality rubric, scope boundaries\tRequirements discussion\nTester\tSkill artifacts, test protocol\tReview scores\n\nDesigner Role\n\nPurpose: Generate and revise skill content.\n\nFor complete Designer instructions, see: references/designer-guide.md\n\nQuick Reference\n\nInputs: Requirements, design principles, feedback (on iterations 2+)\n\nOutputs: SKILL.md, skill.yml, README.md, tests/, scripts/, references/\n\nKey constraints:\n\nApply progressive disclosure (frontmatter → body → linked files)\nApply scoping rules (explicit boundaries, no scope creep)\nApply tool selection guardrails (validate before execution)\nWrite the README for strangers only (no internal org details)\nFollow the AI vs. Script decision framework\n\nDesign principles:\n\nProgressive disclosure (3-level system)\nComposability (works alongside other skills)\nPortability (the same skill works across Claude.ai, Claude Code, and the API)\n\nReviewer Role\n\nPurpose: Independent quality evaluation. The Reviewer has never seen the requirements discussion — it evaluates artifacts on their own merits.\n\nFor the complete Reviewer rubric and scoring guide, see: references/reviewer-rubric.md\n\nQuick Reference\n\nInputs: Skill artifacts, quality rubric, scope boundaries\n\nOutputs: Review report with scores, verdict (PASS/REVISE/FAIL), issues, strengths\n\nQuality rubric (33 checks total):\n\nSQ-A: Completeness (8 checks)\nSQ-B: Clarity (5 checks)\nSQ-C: Balance (5 checks)\nSQ-D: Integration (5 checks)\nSCOPE: Boundaries (3 checks)\nOPSEC: Security (2 checks)\nREF: References (3 checks)\nARCH: Architecture (2 checks)\n\nScoring thresholds:\n\n28-33 checks pass → Deploy (PASS verdict)\n20-27 checks pass → Revise (fixable issues)\n10-19 checks pass → Redesign (major rework)\n0-9 checks pass → Reject (fundamentally flawed)\n\nPre-review: Run the deterministic validation scripts before manual evaluation.\n\n
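The scoring thresholds above map mechanically to verdicts, so the Reviewer computes its verdict rather than judging it. A minimal sketch (the function name is illustrative):\n\ndef verdict(checks_passed):          # out of 33\n    if checks_passed >= 28:\n        return \"Deploy\"              # PASS\n    if checks_passed >= 20:\n        return \"Revise\"              # fixable issues\n    if checks_passed >= 10:\n        return \"Redesign\"            # major rework\n    return \"Reject\"                  # fundamentally flawed\n\n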
Tester Role\n\nPurpose: Empirical validation via self-play. The Tester loads the skill and attempts realistic tasks.\n\nFor the complete Tester protocol, see: references/tester-protocol.md\n\nQuick Reference\n\nInputs: Skill artifacts, test protocol\n\nOutputs: Test report with trigger accuracy, functional test results, edge cases, blocking/non-blocking issues, verdict (PASS/FAIL)\n\nTest protocol:\n\nTrigger tests — verify the skill loads correctly (≥90% accuracy threshold)\nFunctional tests — execute 2-3 realistic tasks, noting confusion points\nEdge case tests — missing inputs, ambiguous requirements, boundary cases\n\nIssue classification:\n\nBlocking: Prevents the skill from functioning (must fix before ship)\nNon-blocking: Impacts quality but doesn't break core functionality\n\nPass criteria: No blocking issues + ≥90% trigger accuracy\n\n
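Trigger accuracy is simply the fraction of sample prompts classified correctly — the skill fires when it should and stays silent when it shouldn't. A sketch (the data shape is illustrative):\n\n# results: (should_trigger, did_trigger) pairs from the trigger tests\ndef trigger_accuracy(results):\n    correct = sum(1 for should, did in results if should == did)   # TP + TN\n    return correct / len(results)\n\n# e.g. 19 of 20 prompts handled correctly → 0.95, which clears the 0.90 bar\n\n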
Separation of Concerns Rule\n\nThe agent that DESIGNS a skill must NOT be the same agent that AUDITS it in the same session.\n\nThis is a hard architectural rule, not a guideline. When the same agent designs and audits in one session, it creates structural circularity: the designer unconsciously frames evaluation in terms of their own intentions, missing gaps that a fresh reader would catch.\n\nEnforcement:\n\nAll audit work (Reviewer role, Tester role) MUST be performed by a fresh subagent spawned after design is complete.\nUse openclaw agent --session-id <unique-id> (Option 2 spawning) when auditing a skill the current session has designed.\nThe orchestrator may never evaluate its own spawned Designer's output directly — it must route all evaluation through an independent Reviewer subagent.\nIn role-based execution (Option 1), the agent must explicitly transition: complete all Designer work, then start the Reviewer role with no reference to design-time reasoning.\n\nWhy this matters:\n\nA designer who audits their own work will score it against their intentions, not against what a new agent will actually experience.\nThe rubric (SQ-C3) explicitly prohibits a subagent from being both producer AND evaluator of the same output.\nThis rule implements that check at the session level.\n\nExample — correct:\n\n# Session A: Designer work\nsessions_spawn(\n  task=\"Design a skill for X. Write artifacts to /path/to/skill/\",\n  label=\"skill-v1-designer\",\n  runTimeoutSeconds=900\n)\n\n# Session B: Audit (fresh session, no context from Session A)\nsessions_spawn(\n  task=\"Audit the skill at /path/to/skill/ using the reviewer rubric.\",\n  label=\"skill-v1-auditor\",\n  runTimeoutSeconds=600\n)\n\nExample — incorrect:\n\n[Session A]\n1. Design the skill...\n2. Now let me review the skill I just designed...  ← VIOLATION\n\nEvals Framework\n\nSource: Anthropic \"Improving skill-creator\" (2026-03-03), adapted for the OpenClaw skill-engineer.\n\nEvals turn \"seems to work\" into \"verified to work.\" Every shipped skill should have persistent evals that can be re-run after model updates, skill edits, or environment changes.\n\nEval Structure\n\nAn eval consists of:\n\nTest prompt — a realistic user input that should trigger the skill\nExpected behavior description — what \"good\" looks like (natural language, not exact match)\nPass/fail criteria — specific, observable conditions\n\nStore evals in the skill's tests/ directory:\n\ntests/\n├── evals.json           # Eval definitions\n├── benchmarks/          # Benchmark run results (timestamped)\n└── comparisons/         # A/B comparison results\n\nEval Types\n\nType\tPurpose\tWhen to run\nRegression eval\tCatch quality drops after changes\tAfter every skill edit or model update\nCapability eval\tDetect if the base model has outgrown the skill\tMonthly, or after major model releases\nTrigger eval\tVerify the skill fires correctly\tAfter description changes\n\nBenchmark Mode\n\nRun standardized assessments tracking:\n\nEval pass rate — what percentage of evals pass\nElapsed time — how long each eval takes\nToken usage — cost per eval run\n\nStore benchmark results with timestamps for trend tracking:\n\n{\n  \"timestamp\": \"2026-03-04T12:00:00Z\",\n  \"skill\": \"my-skill\",\n  \"model\": \"claude-sonnet-4-5\",\n  \"pass_rate\": 0.85,\n  \"avg_time_s\": 12.3,\n  \"avg_tokens\": 4200,\n  \"evals_run\": 10\n}\n\nComparator Testing (A/B Blind Test)\n\nCompare two skill versions — or skill vs. no skill — using a blind judge:\n\n1. Run the same test prompt through Version A and Version B\n2. A Comparator subagent (fresh context, no knowledge of which is which) evaluates both outputs\n3. The Comparator scores on relevant dimensions without knowing the source\n\nWhen to use:\n\nBefore shipping a major revision (old vs. new)\nTo justify a skill's existence (with-skill vs. without-skill)\nTo compare two alternative approaches during design\n\nSpawning a Comparator:\n\nsessions_spawn(\n  task=\"You are a blind comparator. You will receive Output A and Output B for the same task. Score each on [dimensions]. You do NOT know which version produced which output. Be objective.\",\n  label=\"skill-comparator\",\n  runTimeoutSeconds=300\n)\n\nDescription Tuning\n\nSkill descriptions determine trigger accuracy. As the skill count grows, description precision becomes critical:\n\nToo broad → false triggers (the skill loads when irrelevant)\nToo narrow → misses (the skill doesn't load when needed)\n\nTuning protocol:\n\n1. Collect 10-20 sample prompts (a mix of should-trigger and should-not-trigger)\n2. Run each prompt and check whether the skill triggers correctly\n3. Analyze false positives and false negatives\n4. Revise the description field to be more precise\n5. Re-run the trigger tests to verify improvement\n\nTarget: ≥90% trigger accuracy on sample prompts. Anthropic's internal testing improved 5 of 6 public skills using this method.\n\n
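Such a prompt set can live in the skill's tests/ directory alongside evals.json. An illustrative format (the file layout and field names are assumptions, not a fixed schema):\n\n{\n  \"skill\": \"my-skill\",\n  \"prompts\": [\n    { \"text\": \"Help me design a new skill for PDF parsing\", \"should_trigger\": true },\n    { \"text\": \"What's the weather in Lisbon?\", \"should_trigger\": false }\n  ]\n}\n\nRe-running the same set after each description edit yields accuracy numbers that are comparable across revisions.\n\n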
Skill Retirement Protocol\n\nSkills are not forever. Capability-uplift skills may become unnecessary as models improve.\n\nRetirement signal: The base model passes ≥80% of the skill's evals without the skill loaded.\n\nRetirement process:\n\n1. Run capability evals with the skill disabled\n2. If the pass rate is ≥80%, flag the skill as a \"retirement candidate\"\n3. Run a comparator test (with-skill vs. without-skill) to confirm\n4. If the comparator shows no significant quality difference, retire the skill\n5. Archive (don't delete) — the skill may become relevant again with different models\n\nTrack in audit reports:\n\n## Retirement Candidates\n\n| Skill | Capability Eval (no skill) | Comparator Result | Recommendation |\n|-------|---------------------------|-------------------|----------------|\n| pdf-creator | 85% pass | No significant difference | Retire |\n\nAgent Kit Audit Protocol\n\nPeriodically run a full audit of the agent kit:\n\n1. Inventory all skills — list every SKILL.md with its owner agent\n2. Check for orphans — skills that no agent uses\n3. Check for duplicates — overlapping functionality\n4. Check for gaps — workflow steps that have no skill\n5. Check balance — are some agents overloaded while others idle?\n6. Check consistency — naming conventions, output formats\n7. Run a quality score on each skill (SQ-A through SQ-D)\n8. Produce an audit report with scores and recommendations\n\nAudit Output Template\n\n# Agent Kit Audit Report\n\n**Date:** [date]\n**Skills audited:** [count]\n\n## Skill Inventory\n\n| # | Skill | Agent | Quality Score | Status |\n|---|-------|-------|--------------|--------|\n| 1 | [name] | [agent] | X/33 | Deploy/Revise/Redesign |\n\n## Issues Found\n1. ...\n\n## Recommendations\n1. ...\n\n## Action Items\n| # | Action | Priority | Owner |\n|---|--------|----------|-------|\n\nSkill Interaction Map\n\nMaintain a map of how skills interact:\n\norchestrator-agent (coordinates workflow)\n    ├── content-creator (writes content)\n    │   └── consumes: research outputs, review feedback\n    ├── content-reviewer (reviews content)\n    │   └── produces: review reports\n    ├── research-analyst (researches topics)\n    │   └── produces: research consumed by content-creator\n    ├── validator (validates outputs)\n    └── skill-engineer (this skill — meta)\n        └── consumes: all skills for audit\n\nAdapt this map to your specific agent architecture.\n\nOpenClaw Skill System Reference\n\nVersion note: This section is based on OpenClaw v2026.2.26. Skill system behavior (frontmatter fields, loading precedence, subagent APIs) may change across versions. Verify against source or DeepWiki when upgrading.\n\nSkill Structure\n\nA skill is a directory containing, at minimum, a SKILL.md file:\n\nmy-skill/\n├── SKILL.md          # Required: frontmatter + instructions\n├── skill.yml         # Optional: ClawHub publish metadata\n├── README.md         # Optional: human-facing documentation\n├── scripts/          # Optional: deterministic helper scripts\n├── tests/            # Optional: test cases and fixtures\n└── references/       # Optional: detailed linked documentation\n\nSKILL.md Frontmatter Format\n\nRequired fields:\n\n---\nname: skill-name          # kebab-case, no spaces/capitals/underscores\ndescription: |            # What it does + when to use it + trigger phrases\n  [What it does]. Use when user [trigger phrases]. [Key capabilities].\n---\n\n
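For instance, the competitor-analysis skill gathered earlier might carry a description like this (an invented example for illustration):\n\n---\nname: competitor-analysis\ndescription: |\n  Analyzes competitor websites and produces structured comparison reports.\n  Use when the user asks to \"analyze competitors\", \"compare websites\", or\n  \"benchmark a product page\". Outputs comparison tables and insight summaries.\n---\n\n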
Full supported fields:\n\n---\nname: skill-name\ndescription: ...\nhomepage: https://...                    # URL for Skills UI\nuser-invocable: true                     # Expose as slash command (default: true)\ndisable-model-invocation: false          # Exclude from model prompt (default: false)\ncommand-dispatch: tool                   # Bypass model, dispatch to tool directly\ncommand-tool: tool-name                  # Tool to invoke when command-dispatch is set\ncommand-arg-mode: raw                    # Argument forwarding mode (default: raw)\nmetadata: {\"openclaw\": {\"always\": true, \"emoji\": \"🔧\", \"os\": [\"darwin\",\"linux\"], \"requires\": {\"bins\": [\"curl\",\"python3\"]}, \"primaryEnv\": \"MY_API_KEY\"}}\n---\n\nmetadata.openclaw load-time gates:\n\nField\tPurpose\nalways: true\tAlways include, skip all other gates\nemoji\tEmoji shown in the macOS Skills UI\nos\tLimit to platforms: darwin, linux, win32\nrequires.bins\tAll binaries must exist on PATH\nrequires.anyBins\tAt least one binary must exist\nrequires.env\tEnvironment variables must exist\nrequires.config\topenclaw.json paths must be truthy\nprimaryEnv\tLinks to skills.entries.<name>.apiKey in config\n\nSkill Loading Precedence\n\nSkills are loaded from these locations (highest → lowest priority):\n\n<workspace>/skills/ — agent-specific, highest precedence\n~/.openclaw/skills/ — shared across all agents on the machine\nskills.load.extraDirs in openclaw.json — additional directories\nBundled skills — shipped with OpenClaw, lowest precedence\nPlugin skills — listed in openclaw.plugin.json\n\nWhen to Use Each Location\n\nLocation\tUse when\n<workspace>/skills/\tThe skill is specific to one agent's role or under active development\n~/.openclaw/skills/\tThe skill should be available to all agents on this machine\n\nHow Skills Are Triggered\n\nOpenClaw builds a system prompt with a compact XML list of available skills (name, description, location). The model reads this list and decides which skills to load. Skills are NOT auto-injected — the model must explicitly read the SKILL.md when needed.\n\nTrigger accuracy goal: ≥90% (the skill loads when relevant and does NOT load when irrelevant).\n\nSkill Discovery Command\n\nTo inventory all skills on a machine:\n\nfind ~/.openclaw/ -name \"SKILL.md\" -not -path \"*/node_modules/*\" | sort\n\nConfiguration\n\nNo persistent configuration is required; the skill uses tools available in the agent's environment. The following dependencies are helpful:\n\nRequirement\tDescription\ndeepwiki skill\tQuery OpenClaw source for current API behavior (liaosvcaf/openclaw-skill-deepwiki)\nVector memory\tSemantic search across session history (memory.enabled: true in openclaw.json)\ngh CLI\tGitHub repo creation and visibility changes for the release pipeline\nclawhub CLI\tPublish skills to the ClawHub registry (npm i -g clawhub)\n\nSee references/designer-guide.md for full environment setup."
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/chunhualiao/skill-engineer",
    "publisherUrl": "https://clawhub.ai/chunhualiao/skill-engineer",
    "owner": "chunhualiao",
    "version": "3.2.0",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/skill-engineer",
    "downloadUrl": "https://openagent3.xyz/downloads/skill-engineer",
    "agentUrl": "https://openagent3.xyz/skills/skill-engineer/agent",
    "manifestUrl": "https://openagent3.xyz/skills/skill-engineer/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/skill-engineer/agent.md"
  }
}