{
  "schemaVersion": "1.0",
  "item": {
    "slug": "forge",
    "name": "Forge",
    "source": "tencent",
    "type": "skill",
    "category": "Development Tools",
    "sourceUrl": "https://clawhub.ai/ikennaokpala/forge",
    "canonicalUrl": "https://clawhub.ai/ikennaokpala/forge",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/forge",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=forge",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "CHANGELOG.md",
      "README.md",
      "SKILL.md"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "slug": "forge",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-05-03T10:30:58.010Z",
      "expiresAt": "2026-05-10T10:30:58.010Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=forge",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=forge",
        "contentDisposition": "attachment; filename=\"forge-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null,
        "slug": "forge"
      },
      "scope": "item",
      "summary": "Item download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this item.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/forge"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/forge",
    "agentPageUrl": "https://openagent3.xyz/skills/forge/agent",
    "manifestUrl": "https://openagent3.xyz/skills/forge/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/forge/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "Forge — Autonomous Quality Engineering Swarm",
        "body": "Quality forged in, not bolted on.\n\nForge is a self-learning, autonomous quality engineering swarm that unifies three approaches into one:\n\nPillar | Source | What It Does\nBuild | DDD+ADR+TDD methodology | Structured development with quality gates, defect prediction, confidence-tiered fixes\nVerify | BDD/Gherkin behavioral specs | Continuous behavioral verification — the PRODUCT works, not just the CODE\nHeal | Autonomous E2E fix loop | Test → Analyze → Fix → Commit → Learn → Repeat\n\n\"DONE DONE\" means: the code compiles AND the product behaves as specified. Every Gherkin scenario passes. Every quality gate clears. Every dependency graph is satisfied."
      },
      {
        "title": "ARCHITECTURE ADAPTABILITY",
        "body": "Forge adapts to any project architecture. Before first run, it discovers your project structure:"
      },
      {
        "title": "Supported Architectures",
        "body": "Architecture | How Forge Adapts\nMonolith | Single backend process, all contexts in one codebase. Forge runs all tests against one server.\nModular Monolith | Single deployment with bounded contexts as modules. Forge discovers modules and tests each context independently.\nMicroservices | Multiple services. Forge discovers service endpoints, tests each service, validates inter-service contracts.\nMonorepo | Multiple apps/packages in one repo. Forge detects workspace structure (Turborepo, Nx, Lerna, Melos, Cargo workspace).\nMobile + Backend | Frontend app with backend API. Forge starts backend, then runs E2E tests against it.\nFull-Stack Monolith | Frontend and backend in same deployment. Forge tests through the UI layer against real backend."
      },
      {
        "title": "Project Discovery",
        "body": "On first invocation, Forge analyzes the project to build a context map:\n\n# Forge automatically discovers:\n# 1. Backend technology (Rust/Cargo, Node/npm, Python/pip, Go, Java/Maven/Gradle, .NET)\n# 2. Frontend technology (Flutter, React, Next.js, Vue, Angular, SwiftUI, Kotlin/Compose)\n# 3. Test framework (integration_test, Jest, Pytest, Go test, JUnit, xUnit)\n# 4. Project structure (monorepo layout, service boundaries, module boundaries)\n# 5. API protocol (REST, GraphQL, gRPC, WebSocket)\n# 6. Build system (Make, npm scripts, Gradle tasks, Cargo features)\n\nForge stores the discovered project map:\n\n{\n  \"architecture\": \"mobile-backend\",\n  \"backend\": {\n    \"technology\": \"rust\",\n    \"buildCommand\": \"cargo build --release --features test-endpoints\",\n    \"runCommand\": \"cargo run --release --features test-endpoints\",\n    \"healthEndpoint\": \"/health\",\n    \"port\": 8080,\n    \"migrationCommand\": \"cargo sqlx migrate run\"\n  },\n  \"frontend\": {\n    \"technology\": \"flutter\",\n    \"testCommand\": \"flutter drive --driver=test_driver/integration_test.dart --target={target}\",\n    \"testDir\": \"integration_test/e2e/\",\n    \"specDir\": \"integration_test/e2e/specs/\"\n  },\n  \"contexts\": [\"identity\", \"rides\", \"payments\", \"...\"],\n  \"testDataSeeding\": {\n    \"method\": \"api\",\n    \"endpoint\": \"/api/v1/test/seed\",\n    \"authHeader\": \"X-Test-Key\"\n  }\n}"
      },
      {
        "title": "Configuration Override",
        "body": "Projects can provide a forge.config.yaml at the repo root to override auto-discovery:\n\n# forge.config.yaml (optional — Forge auto-discovers if absent)\narchitecture: microservices\nbackend:\n  services:\n    - name: auth-service\n      port: 8081\n      healthEndpoint: /health\n      buildCommand: npm run build\n      runCommand: npm start\n    - name: payment-service\n      port: 8082\n      healthEndpoint: /health\n      buildCommand: npm run build\n      runCommand: npm start\nfrontend:\n  technology: react\n  testCommand: npx cypress run --spec {target}\n  testDir: cypress/e2e/\n  specDir: cypress/e2e/specs/\ncontexts:\n  - name: identity\n    testFile: auth.cy.ts\n    specFile: identity.feature\n  - name: payments\n    testFile: payments.cy.ts\n    specFile: payments.feature\ndependencies:\n  identity:\n    blocks: [payments, orders]\n  payments:\n    depends_on: [identity]\n    blocks: [orders]"
      },
      {
        "title": "CRITICAL: NO MOCKING OR STUBBING ALLOWED",
        "body": "ABSOLUTE RULE: This skill NEVER uses mocking or stubbing of the backend API.\n\nALL tests run against the REAL backend API\nNO mocking frameworks for API calls (no mockito, wiremock, MockClient, nock, msw, httpretty, etc.)\nNO stubbed responses or fake data from API endpoints\nThe backend MUST be running and healthy before any tests execute\nTest data is seeded through REAL API calls, not mocked state\n\nWhy No Mocking:\n\nMocks hide real integration bugs\nMocks create false confidence\nMocks don't test the actual data flow\nReal API tests catch serialization, validation, and timing issues"
      },
      {
        "title": "PHASE 0: BACKEND SETUP (MANDATORY FIRST STEP)",
        "body": "BEFORE ANY TESTING, the backend MUST be built, compiled, and running.\n\nThis is the FIRST thing the skill does — no exceptions."
      },
      {
        "title": "Step 1: Check and Start Backend",
        "body": "# 1. Read project config or auto-discover backend settings\n# 2. Check if backend is already running (-f treats HTTP errors as failure, not just refused connections)\ncurl -sf http://localhost:${BACKEND_PORT}/${HEALTH_ENDPOINT} || {\n  echo \"Backend not running. Starting...\"\n\n  # 3. Navigate to backend directory\n  cd ${BACKEND_DIR}\n\n  # 4. Ensure environment is configured\n  cp .env.example .env 2>/dev/null || true\n\n  # 5. Build the backend\n  ${BUILD_COMMAND}\n\n  # 6. Run database migrations (if applicable)\n  ${MIGRATION_COMMAND}\n\n  # 7. Start backend (background)\n  nohup ${RUN_COMMAND} > backend.log 2>&1 &\n  echo $! > backend.pid\n\n  # 8. Wait for backend to be healthy (up to 60 seconds)\n  for i in {1..60}; do\n    if curl -s http://localhost:${BACKEND_PORT}/${HEALTH_ENDPOINT} | grep -q \"ok\\|healthy\\|UP\"; then\n      echo \"Backend healthy on port ${BACKEND_PORT}\"\n      break\n    fi\n    sleep 1\n  done\n}"
      },
      {
        "title": "Step 2: Verify Backend Health",
        "body": "# Verify critical endpoints are responding\ncurl -s http://localhost:${BACKEND_PORT}/${HEALTH_ENDPOINT} | jq .\n\n# Verify test fixtures endpoint (for seeding)\ncurl -s -H \"${TEST_AUTH_HEADER}\" http://localhost:${BACKEND_PORT}/${TEST_STATUS_ENDPOINT} | jq ."
      },
      {
        "title": "Step 3: Contract Validation",
        "body": "# Verify API spec matches running API (if OpenAPI/Swagger available)\ncurl -s http://localhost:${BACKEND_PORT}/${OPENAPI_ENDPOINT} > /tmp/live-spec.json\n\n# Store contract snapshot for regression detection\nnpx @claude-flow/cli@latest memory store \\\n  --key \"contract-snapshot-$(date +%s)\" \\\n  --value \"$(cat /tmp/live-spec.json | head -c 5000)\" \\\n  --namespace forge-contracts"
      },
      {
        "title": "Step 4: Seed Test Data (Real API Calls)",
        "body": "# Seed test data through REAL API — adapt to your project's seeding endpoint\ncurl -X POST http://localhost:${BACKEND_PORT}/${SEED_ENDPOINT} \\\n  -H \"Content-Type: application/json\" \\\n  -H \"${TEST_AUTH_HEADER}\" \\\n  -d '${SEED_PAYLOAD}'"
      },
      {
        "title": "PHASE 1: BEHAVIORAL SPECIFICATION & ARCHITECTURE RECORDS",
        "body": "Before testing, verify Gherkin specs and architecture decision records exist for the target bounded context.\n\nBehavioral specifications define WHAT the product does from the user's perspective. Every test traces back to a Gherkin scenario. If tests pass but specs fail, the product is broken."
      },
      {
        "title": "Spec Location",
        "body": "Gherkin specs are stored alongside tests:\n\n${SPEC_DIR}/\n├── [context-a].feature\n├── [context-b].feature\n├── [context-c].feature\n└── ...\n\nThe exact location depends on your project's test structure. Forge auto-discovers this from the project map."
      },
      {
        "title": "Spec-to-Test Mapping",
        "body": "Each Gherkin Scenario maps to exactly one test function. The mapping is tracked:\n\nFeature: [Context Name]\n  As a [user role]\n  I want to [action]\n  So that [outcome]\n\n  Scenario: [Descriptive scenario name]\n    Given [precondition]\n    When [action]\n    Then [expected result]\n    And [additional verification]"
      },
      {
        "title": "Missing Spec Generation",
        "body": "If specs are missing for a target context, the Specification Verifier agent creates them:\n\nRead the screen/component/route implementation files for the context\nExtract all user-visible features, interactions, and states\nGenerate Gherkin scenarios covering every cyclomatic path\nWrite to ${SPEC_DIR}/[context].feature\nMap each scenario to its corresponding test function"
      },
      {
        "title": "Agent-Optimized ADR Generation",
        "body": "When Forge discovers a bounded context without an Architecture Decision Record, the Specification Verifier generates one. ADRs follow an agent-optimized format designed for machine consumption:\n\n# ADR-NNN: [Context] Architecture Decision\n\n## Status\nProposed | Accepted | Deprecated | Superseded by ADR-XXX\n\n## MUST\n- [Explicit required behaviors with contract references]\n- [Link to OpenAPI spec: /api/v1/[context]/openapi.json]\n- [Required integration patterns]\n\n## MUST NOT\n- [Explicit forbidden patterns]\n- [Anti-patterns to avoid]\n- [Coupling violations]\n\n## Verification\n- Command: [command to verify this decision holds]\n- Expected: [expected output or exit code]\n\n## Dependencies\n- Depends on: [list of upstream contexts with ADR links]\n- Blocks: [list of downstream contexts with ADR links]\n\nADR Storage:\n\nADRs are stored in docs/decisions/ or the project-configured ADR directory\nEach bounded context has exactly one ADR\nADRs are updated when contracts change or new dependencies are discovered\nThe Specification Verifier agent includes ADR generation in its workflow"
      },
      {
        "title": "Contract Validation",
        "body": "Before running tests, verify API response schemas match expected DTOs:\n\n# For each endpoint the context uses:\n# 1. Make a real API call\n# 2. Compare response structure against expected DTO/schema\n# 3. Flag any mismatches as contract violations\n\nContract violations are treated as Gate 7 failures and must be resolved before functional testing proceeds."
      },
      {
        "title": "Shared Types Validation",
        "body": "For bounded contexts that share dependencies, validate type consistency across context boundaries:\n\nIdentify shared DTOs/models — For each context, extract types used in API requests and responses\nCross-reference types — Compare DTOs between contexts that share dependencies (from the dependency graph)\nFlag type mismatches — e.g., context A expects userId: string but context B sends userId: number\nValidate value objects — Ensure value objects (email, money, address) follow consistent patterns across contexts\nReport violations — Flag as pre-Gate warnings with specific file locations and expected vs actual types\n\n{\n  \"sharedTypeViolation\": {\n    \"type\": \"UserId\",\n    \"contextA\": { \"name\": \"payments\", \"file\": \"types/payment.ts\", \"definition\": \"string\" },\n    \"contextB\": { \"name\": \"orders\", \"file\": \"types/order.ts\", \"definition\": \"number\" },\n    \"severity\": \"error\"\n  }\n}"
      },
      {
        "title": "Cross-Cutting Foundation Validation",
        "body": "Verify cross-cutting concerns are consistent across all bounded contexts:\n\nAuth patterns — Same header format (Authorization: Bearer <token>), same token validation approach across all endpoints\nError response format — All API endpoints return errors in the project's standard format (consistent structure, error codes, HTTP status codes)\nLogging patterns — Consistent log levels, structured format, and correlation IDs across contexts\nPagination format — Consistent pagination parameters and response format across collection endpoints\n\nCross-cutting violations are reported as warnings before Gate evaluation begins."
      },
      {
        "title": "PHASE 2: DEPENDENCY GRAPH",
        "body": "Bounded contexts have dependencies. When a fix touches context X, all contexts that depend on X must be re-tested.\n\n# Context Dependency Map — define in forge.config.yaml or auto-discover\n# Example for a typical application:\n#\n# authentication:\n#   depends_on: []\n#   blocks: [orders, payments, profile, messaging]\n#\n# payments:\n#   depends_on: [authentication]\n#   blocks: [orders, subscriptions]\n#\n# orders:\n#   depends_on: [authentication, payments]\n#   blocks: [reviews, notifications]"
      },
      {
        "title": "Cascade Re-Testing",
        "body": "When Bug Fixer modifies a file in context X:\n\nIdentify which context X belongs to\nLook up all contexts in blocks list for X\nAfter X's tests pass, automatically re-run tests for blocked contexts\nIf a cascade failure occurs, trace it back to the original fix"
      },
      {
        "title": "PHASE 3: SWARM INITIALIZATION",
        "body": "# Initialize anti-drift swarm for Forge\nnpx @claude-flow/cli@latest swarm init --topology hierarchical --max-agents 10 --strategy specialized\n\n# Load previous fix patterns from memory\nnpx @claude-flow/cli@latest memory search --query \"forge fix patterns\" --namespace forge-patterns\n\n# Check current coverage and gate status\nnpx @claude-flow/cli@latest memory retrieve --key \"forge-coverage-status\" --namespace forge-state\n\n# Load confidence tiers\nnpx @claude-flow/cli@latest memory search --query \"confidence tier\" --namespace forge-patterns\n\n# Check defect predictions for target context\nnpx @claude-flow/cli@latest memory search --query \"defect prediction\" --namespace forge-predictions"
      },
      {
        "title": "MODEL ROUTING",
        "body": "Forge routes each agent to the appropriate model tier based on task complexity, optimizing for cost without sacrificing quality:\n\nAgent | Model | Rationale\nSpecification Verifier | sonnet | Reads code + generates Gherkin — moderate reasoning\nTest Runner | haiku | Structured execution, output parsing — low reasoning\nFailure Analyzer | sonnet | Root cause analysis — moderate reasoning\nBug Fixer | opus | First-principles code fixes — high reasoning\nQuality Gate Enforcer | haiku | Threshold comparison — low reasoning\nAccessibility Auditor | sonnet | Code analysis + WCAG rules — moderate reasoning\nAuto-Committer | haiku | Git operations, message formatting — low reasoning\nLearning Optimizer | sonnet | Pattern analysis, prediction — moderate reasoning\n\nProjects can override model assignments in forge.config.yaml:\n\n# forge.config.yaml — Model routing overrides (optional)\nmodel_routing:\n  spec-verifier: sonnet\n  test-runner: haiku\n  failure-analyzer: sonnet\n  bug-fixer: opus\n  gate-enforcer: haiku\n  accessibility-auditor: sonnet\n  auto-committer: haiku\n  learning-optimizer: sonnet\n\nWhen no override is specified, the defaults above are used. This routing reduces token cost by ~60% compared to running all agents on the highest-tier model."
      },
      {
        "title": "PHASE 4: SPAWN AUTONOMOUS AGENTS",
        "body": "Claude Code MUST spawn these 8 agents in a SINGLE message with run_in_background: true:\n\n// Agent 1: Specification Verifier\nTask({\n  model: \"sonnet\",\n  prompt: `You are the Specification Verifier agent. Your mission:\n\n    1. VERIFY backend is running: curl -sf http://localhost:${BACKEND_PORT}/${HEALTH_ENDPOINT}\n    2. Check if Gherkin specs exist for the target bounded context:\n       - Look in the project's spec directory\n    3. If specs are MISSING:\n       - Read the screen/component/route implementation files for the context\n       - Extract all user-visible features, interactions, states\n       - Generate Gherkin feature files with scenarios for every cyclomatic path\n       - Write specs to the correct location\n    4. If specs EXIST:\n       - Read current implementations\n       - Compare against existing scenarios\n       - Flag scenarios that no longer match implementation (stale specs)\n       - Generate new scenarios for uncovered features\n    5. Create spec-to-test mapping:\n       - Each Scenario name → test function name\n       - Store mapping in memory for Test Runner\n    6. Store results:\n       npx @claude-flow/cli@latest memory store --key \"specs-[context]-[timestamp]\" \\\n         --value \"[spec status JSON]\" --namespace forge-specs\n\n    CONSTRAINTS:\n    - NEVER generate specs for code you haven't read\n    - NEVER assume UI elements exist without checking implementation\n    - NEVER create scenarios that duplicate existing coverage\n    - NEVER modify existing test files — only spec files\n\n    ACCEPTANCE:\n    - Every implementation file has at least one Gherkin scenario\n    - Spec-to-test mapping has zero unmapped entries\n    - All generated scenarios follow Given/When/Then format\n    - Results stored in forge-specs namespace\n\n    Output: List of all Gherkin scenarios with their mapped test functions, and any gaps found.`,\n  subagent_type: \"researcher\",\n  description: \"Spec Verification\",\n  run_in_background: true\n})\n\n// Agent 2: Test Runner\nTask({\n  model: \"haiku\",\n  prompt: `You are the Test Runner agent. Your mission:\n\n    1. VERIFY backend is running\n    2. Check defect predictions from memory:\n       npx @claude-flow/cli@latest memory search --query \"defect prediction [context]\" --namespace forge-predictions\n       - Run predicted-to-fail tests FIRST for faster convergence\n    3. Run the E2E test suite for the specified context using the project's test command\n    4. Capture ALL test output including stack traces\n    5. Parse failures into structured format:\n       {testId, gherkinScenario, error, stackTrace, file, line, context}\n    6. Map each failure to its Gherkin scenario (from spec-to-test mapping)\n    7. Store results in memory for other agents:\n       npx @claude-flow/cli@latest memory store \\\n         --key \"test-run-[timestamp]\" \\\n         --value \"[parsed results JSON]\" \\\n         --namespace forge-results\n\n    CONSTRAINTS:\n    - NEVER skip failing tests\n    - NEVER modify test code or source code\n    - NEVER mock API calls or stub responses\n    - NEVER continue if backend health check fails\n\n    ACCEPTANCE:\n    - All test results stored in memory with structured format\n    - Zero unparsed failures — every failure has testId, error, stackTrace, file, line\n    - Predicted-to-fail tests executed first\n    - Results include Gherkin scenario mapping for every test`,\n  subagent_type: \"tester\",\n  description: \"Test Runner\",\n  run_in_background: true\n})\n\n// Agent 3: Failure Analyzer\nTask({\n  model: \"sonnet\",\n  prompt: `You are the Failure Analyzer agent. Your mission:\n\n    1. Monitor memory for new test results from Test Runner\n    2. For each failure, analyze:\n       - Root cause category: element-not-found, assertion-failed, timeout,\n         api-mismatch, navigation-error, state-error, contract-violation\n       - Affected file and line number\n       - Which Gherkin scenario is violated\n       - Impact on dependent contexts (check dependency graph)\n    3. Search memory for matching fix patterns with confidence tiers:\n       npx @claude-flow/cli@latest memory search \\\n         --query \"[error pattern]\" --namespace forge-patterns\n    4. If pattern found with confidence >= 0.85 (Gold+):\n       - Recommend auto-apply\n       - Include pattern key and success rate\n    5. If pattern found with confidence >= 0.75 (Silver):\n       - Suggest fix but flag for review\n    6. If no matching pattern:\n       - Perform root cause analysis from first principles\n       - Generate fix hypothesis\n    7. Store analysis in memory for Bug Fixer:\n       npx @claude-flow/cli@latest memory store \\\n         --key \"analysis-[testId]-[timestamp]\" \\\n         --value \"[analysis JSON]\" \\\n         --namespace forge-results\n\n    CONSTRAINTS:\n    - NEVER assume root cause without stack trace evidence\n    - NEVER recommend fixes for passing tests\n    - NEVER skip dependency graph impact analysis\n    - NEVER override confidence tier thresholds\n\n    ACCEPTANCE:\n    - Every failure has a root cause category and affected file\n    - Zero unanalyzed failures\n    - Dependency impact documented for every failure\n    - Pattern search executed for every error type`,\n  subagent_type: \"researcher\",\n  description: \"Failure Analyzer\",\n  run_in_background: true\n})\n\n// Agent 4: Bug Fixer\nTask({\n  model: \"opus\",\n  prompt: `You are the Bug Fixer agent. Your mission:\n\n    1. Retrieve failure analysis from memory\n    2. For each failure, apply fix using confidence-tiered approach:\n\n       PLATINUM (>= 0.95 confidence):\n       - Auto-apply the stored fix pattern immediately\n       - No review needed\n\n       GOLD (>= 0.85 confidence):\n       - Auto-apply the stored fix pattern\n       - Flag in commit message for awareness\n\n       SILVER (>= 0.75 confidence):\n       - Read the failing test file and source file\n       - Apply suggested fix with extra verification\n       - Run targeted test before proceeding\n\n       BRONZE or NO PATTERN:\n       - Read the failing test file\n       - Read the source file causing the failure\n       - Implement fix from first principles\n       - Use defensive patterns appropriate to the test framework\n\n    3. After fixing, identify affected context:\n       - Check dependency graph for cascade impacts\n       - Flag dependent contexts for re-testing\n\n    4. Store the fix pattern with initial confidence:\n       npx @claude-flow/cli@latest memory store \\\n         --key \"fix-[error-type]-[hash]\" \\\n         --value '{\"pattern\":\"[fix]\",\"confidence\":0.75,\"tier\":\"silver\",\"applied\":1,\"successes\":0}' \\\n         --namespace forge-patterns\n\n    5. Signal Test Runner to re-run affected tests\n    6. Signal Quality Gate Enforcer to check all 7 gates\n\n    CONSTRAINTS:\n    - NEVER change test assertions to make tests pass\n    - NEVER modify Gherkin specs to match broken behavior\n    - NEVER introduce new dependencies without flagging\n    - NEVER apply fixes without reading both test file and source file\n\n    ACCEPTANCE:\n    - Every applied fix has a targeted test re-run result\n    - Zero fixes without verification\n    - Fix pattern stored with initial confidence score\n    - Cascade impacts identified and flagged for re-testing`,\n  subagent_type: \"coder\",\n  description: \"Bug Fixer\",\n  run_in_background: true\n})\n\n// Agent 5: Quality Gate Enforcer\nTask({\n  model: \"haiku\",\n  prompt: `You are the Quality Gate Enforcer agent. Your mission:\n\n    After each fix cycle, evaluate ALL 7 quality gates:\n\n    GATE 1 — FUNCTIONAL (100% required):\n    - All tests in the target context pass\n    - No regressions in previously passing tests\n\n    GATE 2 — BEHAVIORAL (100% of targeted scenarios):\n    - Every Gherkin scenario that was targeted has a passing test\n    - Spec-to-test mapping is complete (no unmapped scenarios)\n\n    GATE 3 — COVERAGE (>=85% overall, >=95% critical paths):\n    - Calculate path coverage for the context\n    - Critical paths: authentication, payment, core workflows\n    - Non-critical paths: preferences, history, settings\n\n    GATE 4 — SECURITY (0 critical/high violations):\n    - No hardcoded API keys, tokens, or secrets in test files\n    - No hardcoded test credentials (use env vars or test fixtures)\n    - Secure storage patterns used (no plaintext sensitive data)\n    - No SQL injection vectors in dynamic queries\n    - No XSS vectors in rendered output\n    - No path traversal in file operations\n    - Dependencies have no known critical CVEs (when lockfile available)\n    - When AQE available: delegate to security-scanner for full SAST analysis\n\n    GATE 5 — ACCESSIBILITY (WCAG AA):\n    - All interactive elements have accessible labels\n    - Touch/click targets meet minimum size requirements\n    - Color contrast meets WCAG AA ratios\n    - Screen reader navigation order is logical\n\n    GATE 6 — RESILIENCE (tested for target context):\n    - Offline/disconnected state handled gracefully\n    - Timeout handling shows user-friendly message\n    - Error states show retry option\n    - Server errors show generic error, not stack trace\n\n    GATE 7 — CONTRACT (0 mismatches):\n    - API responses match expected schemas\n    - No unexpected null fields\n    - Enum values match expected set\n    - Pagination format is consistent\n\n    For each gate:\n    - Status: PASS / FAIL / SKIP (with reason)\n    - Details: what passed, what failed\n    - Blocking: whether this gate blocks the commit\n\n    Store gate results:\n    npx @claude-flow/cli@latest memory store \\\n      --key \"gates-[context]-[timestamp]\" \\\n      --value \"[gate results JSON]\" \\\n      --namespace forge-state\n\n    ONLY signal Auto-Committer when ALL 7 GATES PASS.\n\n    CONSTRAINTS:\n    - NEVER approve a commit with ANY blocking gate failure\n    - NEVER lower thresholds below defined minimums\n    - NEVER skip gate evaluation — all 7 gates must be assessed\n    - NEVER mark a gate as PASS without evidence\n\n    ACCEPTANCE:\n    - Gate results stored in memory with PASS/FAIL/SKIP for all 7 gates\n    - Every FAIL includes specific details of what failed\n    - Every SKIP includes reason for skipping\n    - Auto-Committer only signaled when all blocking gates pass`,\n  subagent_type: \"reviewer\",\n  description: \"Quality Gate Enforcer\",\n  run_in_background: true\n})\n\n// Agent 6: Accessibility Auditor\nTask({\n  model: \"sonnet\",\n  prompt: `You are the Accessibility Auditor agent. Your mission:\n\n    1. For each screen/page/component in the target context, audit:\n\n    LABELS:\n    - Every interactive element has an accessible label/aria-label/Semantics label\n    - Labels are descriptive (not \"button1\" but \"Submit payment\")\n    - Images have alt text or semantic labels\n\n    TOUCH/CLICK TARGETS:\n    - All interactive elements meet minimum size (48x48dp mobile, 44x44px web)\n    - Flag any undersized targets\n\n    CONTRAST:\n    - Text on colored backgrounds meets WCAG AA ratio (4.5:1 normal, 3:1 large)\n    - Flag low-contrast combinations\n\n    SCREEN READER:\n    - Accessibility tree has logical reading order\n    - No duplicate or misleading labels\n    - Form fields have associated labels\n\n    FOCUS/TAB ORDER:\n    - Focus order follows visual layout\n    - Focus trap in modals/dialogs\n    - Focus returns to trigger after dialog closes\n\n    2. Generate findings as:\n       {severity: \"critical\"|\"warning\"|\"info\", element, file, line, issue, fix}\n\n    3. Store audit results:\n       npx @claude-flow/cli@latest memory store \\\n         --key \"a11y-[context]-[timestamp]\" \\\n         --value \"[audit JSON]\" \\\n         --namespace forge-state\n\n    CONSTRAINTS:\n    - NEVER skip interactive elements during audit\n    - NEVER report false positives for decorative images\n    - NEVER ignore focus/tab order analysis\n    - NEVER apply fixes — only report findings for Bug Fixer\n\n    ACCEPTANCE:\n    - Every interactive element audited\n    - Findings stored with severity, element, file, line, issue, fix\n    - Zero unaudited interactive elements in target context\n    - WCAG AA compliance level assessed for every screen`,\n  subagent_type: \"analyst\",\n  description: \"Accessibility Auditor\",\n  run_in_background: true\n})\n\n// Agent 7: Auto-Committer\nTask({\n  model: \"haiku\",\n  prompt: `You are the Auto-Committer agent. Your mission:\n\n    1. Monitor for successful fixes where ALL 7 QUALITY GATES PASS\n    2. For each successful fix:\n       - Stage only the fixed files (never git add -A)\n       - Create detailed commit message:\n\n         fix(forge): Fix [TEST_ID] - [brief description]\n\n         Behavioral Spec: [Gherkin scenario name]\n         Root Cause: [what caused the failure]\n         - [specific issue 1]\n         - [specific issue 2]\n\n         Fix Applied:\n         - [change 1]\n         - [change 2]\n\n         Quality Gates:\n         - Functional: PASS\n         - Behavioral: PASS\n         - Coverage: [X]%\n         - Security: PASS\n         - Accessibility: PASS\n         - Resilience: PASS\n         - Contract: PASS\n\n         Confidence Tier: [platinum|gold|silver|bronze]\n         Pattern Stored: fix-[error-type]-[hash]\n\n       - Commit with the message above\n    3. Update coverage report with new passing paths\n    4. Store commit hash in memory for rollback capability:\n       npx @claude-flow/cli@latest memory store \\\n         --key \"commit-[hash]\" \\\n         --value \"[commit details JSON]\" \\\n         --namespace forge-commits\n    5. Store last known good commit:\n       npx @claude-flow/cli@latest memory store \\\n         --key \"last-green-commit\" \\\n         --value \"[hash]\" \\\n         --namespace forge-state\n\n    CONSTRAINTS:\n    - NEVER use git add -A or git add .\n    - NEVER commit without all 7 gates passing\n    - NEVER amend previous commits\n    - NEVER push to remote — only local commits\n\n    ACCEPTANCE:\n    - Commit message includes Behavioral Spec, Root Cause, Fix Applied, all 7 gate statuses\n    - Only fixed files are staged (no unrelated files)\n    - Commit hash stored in forge-commits namespace\n    - Last green commit updated in forge-state namespace`,\n  subagent_type: \"reviewer\",\n  description: \"Auto-Committer\",\n  run_in_background: true\n})\n\n// Agent 8: Learning Optimizer\nTask({\n  model: \"sonnet\",\n  prompt: `You are the Learning Optimizer agent. Your mission:\n\n    1. After each test cycle, analyze patterns:\n       - Which error types fail most often?\n       - Which fix patterns have highest success rate?\n       - What new defensive patterns should be added?\n       - Which Gherkin scenarios are most fragile?\n\n    2. 
UPDATE CONFIDENCE TIERS:\n       For each fix pattern applied this cycle:\n       - If fix succeeded: confidence += 0.05 (cap at 1.0)\n         - If confidence crosses 0.95: promote to Platinum\n         - If confidence crosses 0.85: promote to Gold\n       - If fix failed: confidence -= 0.10 (floor at 0.0)\n         - If confidence drops below 0.70: demote to Bronze (learning-only)\n       Store updated pattern:\n       npx @claude-flow/cli@latest memory store \\\n         --key \"fix-[error-type]-[hash]\" \\\n         --value \"[updated pattern JSON]\" \\\n         --namespace forge-patterns\n\n    3. DEFECT PREDICTION:\n       Analyze which contexts/files are likely to fail next:\n       - Files changed since last green run\n       - Historical failure rate per context\n       - Complexity of recent changes\n       Store prediction:\n       npx @claude-flow/cli@latest memory store \\\n         --key \"prediction-[date]\" \\\n         --value \"[prediction JSON]\" \\\n         --namespace forge-predictions\n\n    4. Train neural patterns on successful fixes:\n       npx @claude-flow/cli@latest hooks post-task \\\n         --task-id \"forge-cycle\" --success true --store-results true\n\n    5. Update coverage status:\n       npx @claude-flow/cli@latest memory store \\\n         --key \"forge-coverage-status\" \\\n         --value \"[updated coverage JSON]\" \\\n         --namespace forge-state\n\n    6. Generate recommendations for test improvements\n    7. 
Export learning metrics:\n       npx @claude-flow/cli@latest neural train --pattern-type forge-fixes --epochs 5\n\n    CONSTRAINTS:\n    - NEVER promote a pattern that failed in the current cycle\n    - NEVER delete patterns — only demote below Bronze threshold\n    - NEVER override confidence scores without evidence from test results\n    - NEVER generate predictions without historical data\n\n    ACCEPTANCE:\n    - All applied patterns have updated confidence scores\n    - Prediction stored for next run with context-level probabilities\n    - Coverage status updated in forge-state namespace\n    - Zero patterns promoted without success evidence`,\n  subagent_type: \"researcher\",\n  description: \"Learning Optimizer\",\n  run_in_background: true\n})"
      },
      {
        "title": "PHASE 5: QUALITY GATES",
        "body": "7 gates evaluated after each fix cycle. ALL must pass before a commit is created.\n\nGate\tCheck\tThreshold\tBlocking\n1. Functional\tAll tests pass\t100% pass rate\tYES\n2. Behavioral\tGherkin scenarios satisfied\t100% of targeted scenarios\tYES\n3. Coverage\tPath coverage\t>=85% overall, >=95% critical\tYES (critical only)\n4. Security\tNo hardcoded secrets, secure storage, SAST checks\t0 critical/high violations\tYES\n5. Accessibility\tAccessible labels, target sizes, contrast\tWCAG AA\tWarning only\n6. Resilience\tOffline handling, timeout handling, error states\tTested for target context\tWarning only\n7. Contract\tAPI response matches expected schema\t0 mismatches\tYES"
      },
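      {
        "title": "Sketch: Gate Blocking Logic",
        "body": "A minimal sketch of the blocking rules above (hypothetical helper; the gate names and report shape are assumptions, not part of the shipped skill, and Gate 3's critical-only nuance is elided). Any non-PASS result on a blocking gate blocks the commit:\n\nconst BLOCKING = { functional: true, behavioral: true, coverage: true, security: true, accessibility: false, resilience: false, contract: true };\n\nfunction commitAllowed(results) {\n  // results: { [gate]: 'PASS' | 'FAIL' | 'SKIP' }\n  return Object.entries(BLOCKING).every(([gate, blocks]) => !blocks || results[gate] === 'PASS');\n}\n\ncommitAllowed({ functional: 'PASS', behavioral: 'PASS', coverage: 'PASS', security: 'PASS', accessibility: 'FAIL', resilience: 'PASS', contract: 'PASS' }); // true (accessibility is warning-only)\ncommitAllowed({ functional: 'FAIL', behavioral: 'PASS', coverage: 'PASS', security: 'PASS', accessibility: 'PASS', resilience: 'PASS', contract: 'PASS' }); // false"
      },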
      {
        "title": "Gate Failure Categories",
        "body": "When gates fail, failures are categorized for targeted re-runs:\n\nFunctional failures → Re-run Bug Fixer on failing tests\nBehavioral failures → Check spec-to-test mapping, may need new tests\nCoverage failures → Generate additional test paths\nSecurity failures → Fix hardcoded values, update storage patterns\nAccessibility failures → Add accessible labels, fix target sizes\nResilience failures → Add offline/error state handling\nContract failures → Update DTOs or flag API regression"
      },
      {
        "title": "AUTONOMOUS EXECUTION LOOP",
        "body": "┌────────────────────────────────────────────────────────────────────────┐\n│                      FORGE AUTONOMOUS LOOP                             │\n├────────────────────────────────────────────────────────────────────────┤\n│                                                                        │\n│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐        │\n│  │ Specify  │───▶│   Test   │───▶│ Analyze  │───▶│   Fix    │        │\n│  │ (Gherkin)│    │ (Run)    │    │ (Root    │    │ (Tiered) │        │\n│  └──────────┘    └──────────┘    │  Cause)  │    └──────────┘        │\n│       ▲                          └──────────┘         │               │\n│       │                                               ▼               │\n│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐        │\n│  │  Learn   │◀───│  Commit  │◀───│  Gate    │◀───│  Audit   │        │\n│  │ (Update  │    │ (Auto)   │    │ (7 Gates)│    │ (A11y)   │        │\n│  │  Tiers)  │    └──────────┘    └──────────┘    └──────────┘        │\n│  └──────────┘                                                         │\n│       │                                                                │\n│       └───────────────── REPEAT ──────────────────────────────────────│\n│                                                                        │\n│  Loop continues until: ALL 7 GATES PASS or MAX_ITERATIONS (10)        │\n│  Gate failures are categorized for targeted re-runs (not full re-run) │\n└────────────────────────────────────────────────────────────────────────┘"
      },
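      {
        "title": "Sketch: Loop Termination",
        "body": "The termination condition above can be sketched as (hypothetical helper, not part of the shipped skill; runCycle stands in for one Specify-Test-Analyze-Fix-Audit-Gate pass):\n\nfunction forgeLoop(runCycle, maxIterations = 10) {\n  // Repeat until all gates report PASS or the iteration cap is hit.\n  for (let i = 1; i <= maxIterations; i++) {\n    const gates = runCycle(i);\n    if (Object.values(gates).every(s => s === 'PASS')) {\n      return { converged: true, iterations: i };\n    }\n  }\n  return { converged: false, iterations: maxIterations };\n}\n\nforgeLoop(i => ({ functional: i >= 3 ? 'PASS' : 'FAIL', contract: 'PASS' })); // { converged: true, iterations: 3 }"
      },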
      {
        "title": "REAL-TIME PROGRESS REPORTING",
        "body": "Each agent emits structured progress events during execution for observability:\n\n{\"agent\": \"spec-verifier\", \"event\": \"spec_generated\", \"context\": \"payments\", \"scenarios\": 12}\n{\"agent\": \"test-runner\", \"event\": \"test_started\", \"context\": \"payments\", \"test\": \"user_can_pay\"}\n{\"agent\": \"test-runner\", \"event\": \"test_completed\", \"context\": \"payments\", \"passed\": 10, \"failed\": 2}\n{\"agent\": \"failure-analyzer\", \"event\": \"root_cause_found\", \"test\": \"user_can_pay\", \"cause\": \"timeout\"}\n{\"agent\": \"bug-fixer\", \"event\": \"fix_applied\", \"file\": \"payments.ts\", \"confidence\": 0.92}\n{\"agent\": \"gate-enforcer\", \"event\": \"gate_evaluated\", \"gate\": \"functional\", \"status\": \"PASS\"}\n{\"agent\": \"auto-committer\", \"event\": \"committed\", \"hash\": \"abc123\", \"tests_fixed\": 2}\n{\"agent\": \"learning-optimizer\", \"event\": \"pattern_updated\", \"pattern\": \"fix-timeout-xyz\", \"tier\": \"gold\"}\n\nProgress File:\n\nEvents are appended to .forge/progress.jsonl (one JSON object per line)\nFile is created at the start of each Forge run and truncated\nTools can tail this file for real-time monitoring: tail -f .forge/progress.jsonl\n\nIntegration with Agentic QE AG-UI:\n\nWhen the AQE AG-UI protocol is available, events stream directly to the user interface\nUsers see live progress: which gate is being evaluated, which test is running, which fix is being applied\nWhen running in Claude Code without AG-UI, progress is visible through agent output files"
      },
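      {
        "title": "Sketch: Emitting Progress Events",
        "body": "A minimal sketch of appending one event per line to .forge/progress.jsonl (hypothetical Node.js helper; the emitProgress name is an assumption, not part of the shipped skill):\n\nconst fs = require('fs');\nconst path = require('path');\n\nfunction emitProgress(event, file = '.forge/progress.jsonl') {\n  // One JSON object per line, appended so tail -f picks it up immediately.\n  fs.mkdirSync(path.dirname(file), { recursive: true });\n  fs.appendFileSync(file, JSON.stringify(event) + '\\n');\n}\n\nemitProgress({ agent: 'test-runner', event: 'test_started', context: 'payments', test: 'user_can_pay' });"
      },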
      {
        "title": "CONFIDENCE TIERS FOR FIX PATTERNS",
        "body": "Every fix pattern is tracked with a confidence score that evolves over time:\n\n{\n  \"key\": \"fix-element-not-found-abc123\",\n  \"pattern\": {\n    \"error\": \"Element not found / No element\",\n    \"fix\": \"Ensure element is rendered and visible before interaction\",\n    \"files_affected\": [\"*_test.*\"],\n    \"context\": \"any\"\n  },\n  \"tier\": \"gold\",\n  \"confidence\": 0.92,\n  \"auto_apply\": true,\n  \"applied_count\": 47,\n  \"success_count\": 43,\n  \"success_rate\": 0.915,\n  \"last_applied\": \"2026-02-06T14:30:00Z\",\n  \"last_failed\": \"2026-02-01T09:15:00Z\"\n}"
      },
      {
        "title": "Tier Thresholds",
        "body": "Tier\tConfidence\tAuto-Apply\tBehavior\nPlatinum\t>= 0.95\tYes\tApply immediately without review\nGold\t>= 0.85\tYes\tApply and flag in commit message\nSilver\t>= 0.75\tNo\tSuggest to Bug Fixer, don't auto-apply\nBronze\t>= 0.70\tNo\tStore for learning only, never auto-apply\nExpired\t< 0.70\tNo\tPattern demoted, needs revalidation"
      },
      {
        "title": "Confidence Updates",
        "body": "After each application:\n\nSuccess: confidence += 0.05 (capped at 1.0)\nFailure: confidence -= 0.10 (floored at 0.0)\nTier promotion when crossing threshold upward\nTier demotion when crossing threshold downward"
      },
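      {
        "title": "Sketch: Confidence Update and Tier Assignment",
        "body": "The update and promotion rules above can be sketched as a pair of pure functions (hypothetical helpers; the names are assumptions, not part of the shipped skill):\n\nfunction tierFor(confidence) {\n  if (confidence >= 0.95) return 'platinum';\n  if (confidence >= 0.85) return 'gold';\n  if (confidence >= 0.75) return 'silver';\n  if (confidence >= 0.70) return 'bronze';\n  return 'expired';\n}\n\nfunction updateConfidence(confidence, succeeded) {\n  // Success: +0.05 capped at 1.0; failure: -0.10 floored at 0.0.\n  const next = succeeded ? Math.min(1.0, confidence + 0.05) : Math.max(0.0, confidence - 0.10);\n  return { confidence: next, tier: tierFor(next) };\n}\n\nupdateConfidence(0.92, true);  // crosses 0.95: promoted to platinum\nupdateConfidence(0.78, false); // drops below 0.70: demoted"
      },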
      {
        "title": "DEFECT PREDICTION",
        "body": "Before running tests, the Learning Optimizer analyzes historical data to predict which tests are most likely to fail:"
      },
      {
        "title": "Input Signals",
        "body": "Files changed since last green run (git diff against last-green-commit)\nHistorical failure rates per bounded context (from forge-results namespace)\nFix pattern freshness — recently applied fixes are more likely to regress\nComplexity metrics — contexts with more cyclomatic paths fail more often\nDependency chain length — deeper dependency chains have higher failure rates"
      },
      {
        "title": "Prediction Output",
        "body": "{\n  \"date\": \"2026-02-07\",\n  \"predictions\": [\n    { \"context\": \"payments\", \"probability\": 0.73, \"reason\": \"3 files changed in payment module\" },\n    { \"context\": \"orders\", \"probability\": 0.45, \"reason\": \"depends on payments (changed)\" },\n    { \"context\": \"identity\", \"probability\": 0.12, \"reason\": \"no changes, stable history\" }\n  ],\n  \"recommended_order\": [\"payments\", \"orders\", \"identity\"]\n}\n\nTests are executed in descending probability order — predicted-to-fail tests run FIRST for faster convergence."
      },
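      {
        "title": "Sketch: Prediction-Ordered Execution",
        "body": "Deriving the recommended order from the prediction output above is a simple sort (hypothetical helper, not part of the shipped skill):\n\nfunction recommendedOrder(predictions) {\n  // Highest predicted failure probability first, for faster convergence.\n  return [...predictions]\n    .sort((a, b) => b.probability - a.probability)\n    .map(p => p.context);\n}\n\nrecommendedOrder([\n  { context: 'identity', probability: 0.12 },\n  { context: 'payments', probability: 0.73 },\n  { context: 'orders', probability: 0.45 }\n]); // ['payments', 'orders', 'identity']"
      },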
      {
        "title": "General UI Element Edge Cases",
        "body": "For EVERY interactive element, test:\n\nInteraction States\n\nSingle interaction → expected action\nRepeated rapid interaction → no duplicate action\nLong press / right-click → context menu if applicable\nDisabled state → no action, visual feedback\n\nInput Field States\n\nEmpty → placeholder visible\nFocus → visual focus indicator\nValid input → no error\nInvalid input → error message\nMax length reached → prevents further input\nPaste → validates pasted content\nClear → resets to empty\n\nAsync Operation States\n\nBefore load → loading indicator\nDuring load → spinner, disabled submit\nSuccess → data displayed, spinner gone\nError → error message, retry option\nTimeout → timeout message, retry option\n\nNavigation Edge Cases\n\nBack navigation → previous screen or exit confirmation\nDeep link → correct screen with params\nInvalid deep link → fallback/error screen\nBrowser forward/back (web) → correct state\n\nScroll Edge Cases\n\nOverscroll → appropriate feedback\nScroll to hidden content → content becomes visible\nKeyboard appears → scroll to focused field"
      },
      {
        "title": "Network Edge Cases",
        "body": "No internet → offline indicator, cached data if available\nSlow connection → loading states persist, timeout handling\nConnection restored → auto-retry pending operations\nServer error 500 → generic error message\nAuth error 401 → redirect to login\nPermission error 403 → permission denied message\nNot found 404 → \"not found\" message"
      },
      {
        "title": "Chaos Testing (Resilience)",
        "body": "For each target context, inject controlled failures:\n\nTimeout injection → API calls take >10s → verify timeout UI\nPartial response → API returns incomplete data → verify graceful degradation\nRate limiting → API returns 429 → verify retry-after behavior\nConcurrent mutations → Multiple clients modify same resource → verify conflict handling\nSession expiry → Token expires mid-flow → verify re-auth prompt"
      },
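      {
        "title": "Sketch: Timeout Injection Harness",
        "body": "The timeout-injection check above can be driven by a small race helper (hypothetical, not part of the shipped skill): wrap the real API call, then assert the UI shows its timeout state when the budget is exceeded:\n\nasync function withTimeout(promise, ms = 10000) {\n  let timer;\n  const timeout = new Promise((_, reject) => {\n    timer = setTimeout(() => reject(new Error('timeout')), ms);\n  });\n  try {\n    // Whichever settles first wins; the timer is always cleaned up.\n    return await Promise.race([promise, timeout]);\n  } finally {\n    clearTimeout(timer);\n  }\n}"
      },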
      {
        "title": "Visual Regression Testing",
        "body": "For UI-heavy projects, Forge captures and compares screenshots to detect unintended visual changes:\n\nBefore fix — Capture baseline screenshots of all screens in the target context\nAfter fix — Capture new screenshots of the same screens\nCompare — Pixel-by-pixel comparison with configurable threshold (default: 0.1% diff tolerance)\nReport — Flag visual regressions as Gate 5 (Accessibility) warnings\nStore — Save screenshot diffs in memory for review\n\nScreenshot Capture by Platform:\n\nPlatform\tMethod\nWeb (Playwright)\tpage.screenshot({ fullPage: true })\nWeb (Cypress)\tcy.screenshot()\nFlutter\tawait tester.binding.setSurfaceSize(size); await expectLater(find.byType(App), matchesGoldenFile('name.png'))\nMobile (native)\tPlatform-specific screenshot capture\n\nConfiguration:\n\n# forge.config.yaml — Visual regression settings (optional)\nvisual_regression:\n  enabled: true\n  threshold: 0.001  # 0.1% pixel diff tolerance\n  screenshot_dir: .forge/screenshots\n  full_page: true\n\nWhen Agentic QE is available, delegate to the visual-tester agent for parallel viewport comparison across multiple screen sizes."
      },
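      {
        "title": "Sketch: Diff Threshold Check",
        "body": "The pass/fail decision for the comparison step can be sketched as follows (hypothetical helper, not part of the shipped skill; assumes two equal-length arrays of pixel values):\n\nfunction isVisualRegression(baseline, current, threshold = 0.001) {\n  // Fraction of differing pixels compared against the configured tolerance.\n  let diff = 0;\n  for (let i = 0; i < baseline.length; i++) {\n    if (baseline[i] !== current[i]) diff++;\n  }\n  return diff / baseline.length > threshold;\n}\n\nisVisualRegression([1, 1, 1, 1], [1, 1, 1, 1]); // false (identical)\nisVisualRegression([1, 1, 1, 1], [1, 2, 1, 1]); // true (25% diff exceeds 0.1%)"
      },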
      {
        "title": "INVOCATION MODES",
        "body": "# Full autonomous run — all contexts, all gates\n/forge --autonomous --all\n\n# Single context autonomous\n/forge --autonomous --context [context-name]\n\n# Behavioral verification only (no fixes)\n/forge --verify-only\n/forge --verify-only --context [context-name]\n\n# Fix-only mode (fix failures, don't generate new tests)\n/forge --fix-only --context [context-name]\n\n# Learn mode (analyze patterns, update confidence tiers)\n/forge --learn\n\n# Add coverage for new screens/pages/components\n/forge --add-coverage --screens [name1],[name2]\n\n# Generate Gherkin specs for a context\n/forge --spec-gen --context [context-name]\n/forge --spec-gen --all\n\n# Run quality gates without test execution\n/forge --gates-only\n/forge --gates-only --context [context-name]\n\n# Defect prediction only\n/forge --predict\n/forge --predict --context [context-name]\n\n# Chaos/resilience testing for a context\n/forge --chaos --context [context-name]\n/forge --chaos --all"
      },
      {
        "title": "MEMORY NAMESPACES",
        "body": "Namespace\tPurpose\tKey Pattern\nforge-patterns\tFix patterns with confidence tiers\tfix-[error-type]-[hash]\nforge-results\tTest run results\ttest-run-[timestamp]\nforge-state\tCoverage + gate status\tforge-coverage-status, gates-[context]-[ts], last-green-commit\nforge-commits\tCommit history\tcommit-[hash]\nforge-screens\tImplemented screens/pages\tscreen-[name]\nforge-specs\tGherkin specifications\tspecs-[context]-[timestamp]\nforge-contracts\tAPI contract snapshots\tcontract-snapshot-[timestamp]\nforge-predictions\tDefect prediction history\tprediction-[date]"
      },
      {
        "title": "OPTIONAL: AGENTIC QE INTEGRATION",
        "body": "Forge can optionally integrate with the Agentic QE framework via MCP for enhanced capabilities. All AQE features are additive — Forge works identically without AQE."
      },
      {
        "title": "Detection",
        "body": "On startup, Forge checks for AQE availability:\n\n# Check if agentic-qe MCP server is registered\nclaude mcp list | grep -q \"aqe\" && echo \"AQE available\" || echo \"AQE not available — using defaults\""
      },
      {
        "title": "Enhanced Capabilities When AQE Is Available",
        "body": "Forge Component\tWithout AQE (Default)\tWith AQE\nPattern Storage\tclaude-flow memory (forge-patterns namespace)\tReasoningBank — HNSW vector-indexed, 150x faster pattern search, experience replay\nDefect Prediction\tHistorical failure rates + file changes\tdefect-intelligence domain — root-cause-analyzer + defect-predictor agents\nSecurity Scanning\tGate 4 static checks (secrets, injection vectors)\tsecurity-compliance domain — full SAST/DAST via security-scanner agent\nAccessibility Audit\tForge Accessibility Auditor agent\tvisual-accessibility domain — visual-tester + accessibility-auditor agents\nContract Testing\tGate 7 schema validation\tcontract-testing domain — contract-validator + graphql-tester agents\nProgress Reporting\t.forge/progress.jsonl file\tAG-UI streaming protocol for real-time UI updates"
      },
      {
        "title": "Fallback Behavior",
        "body": "When AQE is NOT available, Forge falls back to its built-in behavior for every capability. No configuration is required — the skill auto-detects and adapts."
      },
      {
        "title": "Configuration",
        "body": "# forge.config.yaml — AQE integration settings (optional)\nintegrations:\n  agentic-qe:\n    enabled: true  # auto-detected if not specified\n    domains:\n      - defect-intelligence\n      - security-compliance\n      - visual-accessibility\n      - contract-testing\n    reasoning_bank:\n      enabled: true  # replaces claude-flow memory for forge-patterns namespace\n    ag_ui:\n      enabled: true  # stream progress events to AG-UI protocol"
      },
      {
        "title": "AQE Agent Delegation Map",
        "body": "When AQE is enabled, Forge delegates specific subtasks to specialized AQE agents:\n\nForge Agent\tAQE Domain\tAQE Agents Used\nSpecification Verifier\trequirements-validation\tbdd-generator, requirements-validator\nFailure Analyzer\tdefect-intelligence\troot-cause-analyzer, defect-predictor\nQuality Gate Enforcer (Gate 4)\tsecurity-compliance\tsecurity-scanner, security-auditor\nAccessibility Auditor\tvisual-accessibility\tvisual-tester, accessibility-auditor\nQuality Gate Enforcer (Gate 7)\tcontract-testing\tcontract-validator, graphql-tester\nLearning Optimizer\tlearning-optimization\tlearning-coordinator, pattern-learner\n\nForge agents that have no AQE equivalent (Test Runner, Bug Fixer, Auto-Committer) continue to run as built-in agents regardless of AQE availability."
      },
      {
        "title": "DEFENSIVE TEST PATTERNS",
        "body": "The Bug Fixer agent uses defensive patterns appropriate to the project's test framework. Examples:"
      },
      {
        "title": "Flutter: Safe Tap",
        "body": "Future<bool> safeTap(WidgetTester tester, Finder finder) async {\n  await tester.pumpAndSettle();\n  final elements = finder.evaluate();\n  if (elements.isNotEmpty) {\n    await tester.tap(finder.first, warnIfMissed: false);\n    await tester.pumpAndSettle();\n    return true;\n  }\n  debugPrint('Widget not found: ${finder.description}');\n  return false;\n}"
      },
      {
        "title": "Flutter: Safe Text Entry",
        "body": "Future<bool> safeEnterText(WidgetTester tester, Finder finder, String text) async {\n  await tester.pumpAndSettle();\n  final elements = finder.evaluate();\n  if (elements.isNotEmpty) {\n    await tester.enterText(finder.first, text);\n    await tester.pumpAndSettle();\n    return true;\n  }\n  return false;\n}"
      },
      {
        "title": "Flutter: Visual Observation Delay",
        "body": "Future<void> visualDelay(WidgetTester tester, {String? label}) async {\n  if (label != null) debugPrint('Observing: $label');\n  await tester.pump(const Duration(milliseconds: 2500));\n}"
      },
      {
        "title": "Flutter: Scroll Until Visible",
        "body": "Future<bool> scrollUntilVisible(\n  WidgetTester tester,\n  Finder finder,\n  Finder scrollable,\n) async {\n  for (int i = 0; i < 10; i++) {\n    await tester.pumpAndSettle();\n    if (finder.evaluate().isNotEmpty) return true;\n    await tester.drag(scrollable, const Offset(0, -300));\n    await tester.pumpAndSettle();\n  }\n  return false;\n}"
      },
      {
        "title": "Flutter: Wait For API Response",
        "body": "Future<void> waitForApiResponse(WidgetTester tester, {int maxWaitMs = 5000}) async {\n  final startTime = DateTime.now();\n  while (DateTime.now().difference(startTime).inMilliseconds < maxWaitMs) {\n    await tester.pump(const Duration(milliseconds: 100));\n    if (find.byType(CircularProgressIndicator).evaluate().isEmpty) break;\n  }\n  await tester.pumpAndSettle();\n}"
      },
      {
        "title": "Cypress / Playwright: Safe Click",
        "body": "async function safeClick(selector, options = { timeout: 5000 }) {\n  try {\n    await page.waitForSelector(selector, { state: 'visible', timeout: options.timeout });\n    await page.click(selector);\n    return true;\n  } catch (e) {\n    console.warn(`Element not found: ${selector}`);\n    return false;\n  }\n}"
      },
      {
        "title": "Cypress / Playwright: Wait For API",
        "body": "async function waitForApi(urlPattern, options = { timeout: 10000 }) {\n  return page.waitForResponse(\n    response => response.url().includes(urlPattern) && response.status() === 200,\n    { timeout: options.timeout }\n  );\n}"
      },
      {
        "title": "Pattern: Element Not Found",
        "body": "{\n  \"error\": \"Element not found / No element / Bad state: No element\",\n  \"cause\": \"Element not rendered, wrong selector, or not in viewport\",\n  \"tier\": \"platinum\",\n  \"confidence\": 0.97,\n  \"fixes\": [\n    \"Wait for element to be rendered before interaction\",\n    \"Use safe interaction helpers instead of direct calls\",\n    \"Verify selector matches actual element\",\n    \"Scroll element into view before interaction\"\n  ]\n}"
      },
      {
        "title": "Pattern: Timeout",
        "body": "{\n  \"error\": \"Timeout / pumpAndSettle timed out / waiting for selector\",\n  \"cause\": \"Infinite animation, continuous rebuild, or slow API\",\n  \"tier\": \"gold\",\n  \"confidence\": 0.89,\n  \"fixes\": [\n    \"Use fixed-duration wait instead of settle/idle wait\",\n    \"Dispose animation controllers in tearDown\",\n    \"Check for infinite re-render loops\",\n    \"Increase timeout for slow API calls\"\n  ]\n}"
      },
      {
        "title": "Pattern: Assertion Failed",
        "body": "{\n  \"error\": \"Expected: X, Actual: Y / AssertionError\",\n  \"cause\": \"State not updated or wrong expectation\",\n  \"tier\": \"silver\",\n  \"confidence\": 0.78,\n  \"fixes\": [\n    \"Add delay before assertion for async state updates\",\n    \"Verify test data seeding completed\",\n    \"Check async operation completion before asserting\"\n  ]\n}"
      },
      {
        "title": "Pattern: API Response Mismatch",
        "body": "{\n  \"error\": \"Type error / null value / schema mismatch\",\n  \"cause\": \"Backend response format changed\",\n  \"tier\": \"gold\",\n  \"confidence\": 0.86,\n  \"fixes\": [\n    \"Update model/DTO to match current API response\",\n    \"Add null safety handling\",\n    \"Check API version compatibility\"\n  ]\n}"
      },
      {
        "title": "COVERAGE TRACKING",
        "body": "The Learning Optimizer maintains coverage status per context:\n\n{\n  \"lastRun\": \"2026-02-07T11:00:00Z\",\n  \"backendStatus\": {\n    \"healthy\": true,\n    \"port\": 8080\n  },\n  \"gateStatus\": {\n    \"functional\": \"PASS\",\n    \"behavioral\": \"PASS\",\n    \"coverage\": \"PASS\",\n    \"security\": \"PASS\",\n    \"accessibility\": \"WARNING\",\n    \"resilience\": \"PASS\",\n    \"contract\": \"PASS\"\n  },\n  \"contexts\": {\n    \"[context-a]\": { \"total\": 68, \"passing\": 68, \"failing\": 0, \"behavioralCoverage\": 100 },\n    \"[context-b]\": { \"total\": 72, \"passing\": 70, \"failing\": 2, \"behavioralCoverage\": 97 }\n  },\n  \"totalPaths\": 0,\n  \"passingPaths\": 0,\n  \"coveragePercent\": 0,\n  \"confidenceTiers\": {\n    \"platinum\": 0,\n    \"gold\": 0,\n    \"silver\": 0,\n    \"bronze\": 0,\n    \"expired\": 0\n  }\n}"
      },
      {
        "title": "AUTO-COMMIT MESSAGE FORMAT",
        "body": "fix(forge): Fix [TEST_ID] - [brief description]\n\nBehavioral Spec: [Gherkin scenario name]\nRoot Cause: [what caused the failure]\n- [specific issue 1]\n- [specific issue 2]\n\nFix Applied:\n- [change 1]\n- [change 2]\n\nQuality Gates:\n- Functional: PASS\n- Behavioral: PASS\n- Coverage: [X]%\n- Security: PASS\n- Accessibility: PASS/WARNING\n- Resilience: PASS\n- Contract: PASS\n\nTest Verification:\n- Test now passes after fix\n- No regression in related tests\n- Dependent contexts re-tested: [list]\n\nConfidence Tier: [platinum|gold|silver|bronze]\nPattern Stored: fix-[error-type]-[hash]"
      },
      {
        "title": "Rollback Capability",
        "body": "If a fix introduces regressions:\n\n# Retrieve last known good commit\nnpx @claude-flow/cli@latest memory retrieve --key \"last-green-commit\" --namespace forge-state\n\n# Rollback to that commit\ngit revert [bad-commit-hash]\n\n# Store rollback event for learning (prevents pattern from being re-applied)\nnpx @claude-flow/cli@latest memory store \\\n  --key \"rollback-[timestamp]\" \\\n  --value '{\"commit\":\"[hash]\",\"reason\":\"[reason]\",\"pattern\":\"[pattern-key]\"}' \\\n  --namespace forge-patterns\n\n# Demote the fix pattern confidence (-0.10)\n# Learning Optimizer will handle this automatically"
      },
      {
        "title": "Fix Conflict Protocol",
        "body": "When Bug Fixer's fix causes a cascade regression (tests in dependent contexts fail):\n\nHalt — Stop the fix loop for the affected context\nRe-analyze — Failure Analyzer examines both the original failure AND the cascade failure\nCategorize — Compare root cause categories:\n  Different root cause → The fix is kept; the cascade failure is treated as a new, independent failure in the next loop iteration\n  Same root cause → The fix is reverted and the pattern is demoted (-0.10 confidence)\nRevert limit — Maximum 2 revert cycles per test before escalating to user review\nEscalation — If 2 reverts occur for the same test, Forge pauses and reports:\n\nESCALATION: Test [testId] has regressed 2x after fix attempts.\nOriginal failure: [description]\nCascade failure: [description]\nAttempted fixes: [list]\nRecommendation: Manual review required."
      },
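      {
        "title": "Sketch: Revert Limit Tracking",
        "body": "The revert limit above can be sketched as a small per-test counter (hypothetical helper, not part of the shipped skill):\n\nconst revertCounts = new Map();\n\nfunction recordRevert(testId, maxReverts = 2) {\n  // Returns 'escalate' once a test has been reverted maxReverts times.\n  const count = (revertCounts.get(testId) || 0) + 1;\n  revertCounts.set(testId, count);\n  return count >= maxReverts ? 'escalate' : 'retry';\n}\n\nrecordRevert('user_can_pay'); // 'retry'\nrecordRevert('user_can_pay'); // 'escalate' (pause and report for manual review)"
      },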
      {
        "title": "Agent Disagreement Resolution",
        "body": "When two agents disagree (e.g., Bug Fixer wants to change a file that Spec Verifier says shouldn't change):\n\nQuality Gate Enforcer acts as arbiter — It evaluates both proposed states\nThe change that results in more gates passing wins\nTie-breaking order:\n\nFewer files changed (prefer minimal diff)\nHigher confidence tier (prefer proven patterns)\nBug Fixer defers to Spec Verifier (specs are source of truth)"
      },
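      {
        "title": "Sketch: Disagreement Tie-Breaking",
        "body": "The tie-breaking order above can be sketched as a comparator over candidate changes (hypothetical helper; the candidate shape is an assumption, not part of the shipped skill):\n\nconst TIER_RANK = { platinum: 4, gold: 3, silver: 2, bronze: 1 };\n\nfunction pickWinner(a, b) {\n  // 1. More gates passing wins outright.\n  if (a.gatesPassing !== b.gatesPassing) return a.gatesPassing > b.gatesPassing ? a : b;\n  // 2. Prefer the minimal diff (fewer files changed).\n  if (a.filesChanged !== b.filesChanged) return a.filesChanged < b.filesChanged ? a : b;\n  // 3. Prefer the higher confidence tier.\n  if (TIER_RANK[a.tier] !== TIER_RANK[b.tier]) return TIER_RANK[a.tier] > TIER_RANK[b.tier] ? a : b;\n  // 4. Specs are source of truth: Bug Fixer defers to Spec Verifier.\n  return a.agent === 'spec-verifier' ? a : b;\n}\n\npickWinner(\n  { agent: 'bug-fixer', gatesPassing: 6, filesChanged: 1, tier: 'gold' },\n  { agent: 'spec-verifier', gatesPassing: 6, filesChanged: 3, tier: 'gold' }\n).agent; // 'bug-fixer' (minimal diff wins the tie)"
      },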
      {
        "title": "POST-EXECUTION LEARNING",
        "body": "After each autonomous run, the skill triggers comprehensive learning:\n\n# Train on successful patterns\nnpx @claude-flow/cli@latest hooks post-task --task-id \"forge-run\" --success true --store-results true\n\n# Update neural patterns\nnpx @claude-flow/cli@latest neural train --pattern-type forge-fixes --epochs 5\n\n# Update defect predictions\nnpx @claude-flow/cli@latest memory store \\\n  --key \"prediction-$(date +%Y-%m-%d)\" \\\n  --value \"[prediction JSON from Learning Optimizer]\" \\\n  --namespace forge-predictions\n\n# Export metrics\nnpx @claude-flow/cli@latest hooks metrics --format json"
      },
      {
        "title": "PROJECT-SPECIFIC EXTENSIONS",
        "body": "Forge can be extended per-project by creating a forge.contexts.yaml file alongside the skill:\n\n# forge.contexts.yaml — Project-specific bounded contexts and screens\ncontexts:\n  - name: identity\n    testFile: click_through_identity_full_test.dart\n    specFile: identity.feature\n    paths: 68\n    subdomains: [Auth, Profiles, Verification]\n    screens:\n      - name: Identity Verification\n        file: lib/screens/compliance/identity_verification_screen.dart\n        route: /verification\n        cyclomaticPaths:\n          - All verifications incomplete -> show progress 0%\n          - Email only verified -> show 25%\n          - All verified -> show 100% + celebration state\n\n  - name: payments\n    testFile: click_through_payments_test.dart\n    specFile: payments.feature\n    paths: 89\n    subdomains: [Wallet, Cards, Transactions]\n\ndependencies:\n  identity:\n    blocks: [rides, payments, driver]\n  payments:\n    depends_on: [identity]\n    blocks: [rides, subscriptions]\n\nThis separates the generic Forge engine from project-specific configuration, making Forge reusable across any codebase."
      },
      {
        "title": "QUICK REFERENCE CHECKLIST",
        "body": "Before running Forge:\n\nBackend built and running\nHealth check passes\nTest data seeded via real API calls\nNo mocking or stubbing in test code\nGherkin specs exist for target context (or will be generated)\nAll new screens/pages have test coverage\nEdge cases documented and tested\n\nAfter Forge completes:\n\nGate 1 (Functional): All tests pass\nGate 2 (Behavioral): All targeted Gherkin scenarios satisfied\nGate 3 (Coverage): >=85% overall, >=95% critical paths\nGate 4 (Security): No hardcoded secrets, no injection vectors, no critical CVEs\nGate 5 (Accessibility): WCAG AA compliance checked\nGate 6 (Resilience): Offline/timeout/error states tested\nGate 7 (Contract): API responses match expected schemas\nConfidence tiers updated for all applied fix patterns\nDefect predictions updated for next run\nAll fixes committed with detailed messages"
      }
    ],
    "body": "Forge — Autonomous Quality Engineering Swarm\n\nQuality forged in, not bolted on.\n\nForge is a self-learning, autonomous quality engineering swarm that unifies three approaches into one:\n\nPillar\tSource\tWhat It Does\nBuild\tDDD+ADR+TDD methodology\tStructured development with quality gates, defect prediction, confidence-tiered fixes\nVerify\tBDD/Gherkin behavioral specs\tContinuous behavioral verification — the PRODUCT works, not just the CODE\nHeal\tAutonomous E2E fix loop\tTest → Analyze → Fix → Commit → Learn → Repeat\n\n\"DONE DONE\" means: the code compiles AND the product behaves as specified. Every Gherkin scenario passes. Every quality gate clears. Every dependency graph is satisfied.\n\nARCHITECTURE ADAPTABILITY\n\nForge adapts to any project architecture. Before first run, it discovers your project structure:\n\nSupported Architectures\nArchitecture\tHow Forge Adapts\nMonolith\tSingle backend process, all contexts in one codebase. Forge runs all tests against one server.\nModular Monolith\tSingle deployment with bounded contexts as modules. Forge discovers modules and tests each context independently.\nMicroservices\tMultiple services. Forge discovers service endpoints, tests each service, validates inter-service contracts.\nMonorepo\tMultiple apps/packages in one repo. Forge detects workspace structure (Turborepo, Nx, Lerna, Melos, Cargo workspace).\nMobile + Backend\tFrontend app with backend API. Forge starts backend, then runs E2E tests against it.\nFull-Stack Monolith\tFrontend and backend in same deployment. Forge tests through the UI layer against real backend.\nProject Discovery\n\nOn first invocation, Forge analyzes the project to build a context map:\n\n# Forge automatically discovers:\n# 1. Backend technology (Rust/Cargo, Node/npm, Python/pip, Go, Java/Maven/Gradle, .NET)\n# 2. Frontend technology (Flutter, React, Next.js, Vue, Angular, SwiftUI, Kotlin/Compose)\n# 3. 
Test framework (integration_test, Jest, Pytest, Go test, JUnit, xUnit)\n# 4. Project structure (monorepo layout, service boundaries, module boundaries)\n# 5. API protocol (REST, GraphQL, gRPC, WebSocket)\n# 6. Build system (Make, npm scripts, Gradle tasks, Cargo features)\n\n\nForge stores the discovered project map:\n\n{\n  \"architecture\": \"mobile-backend\",\n  \"backend\": {\n    \"technology\": \"rust\",\n    \"buildCommand\": \"cargo build --release --features test-endpoints\",\n    \"runCommand\": \"cargo run --release --features test-endpoints\",\n    \"healthEndpoint\": \"/health\",\n    \"port\": 8080,\n    \"migrationCommand\": \"cargo sqlx migrate run\"\n  },\n  \"frontend\": {\n    \"technology\": \"flutter\",\n    \"testCommand\": \"flutter drive --driver=test_driver/integration_test.dart --target={target}\",\n    \"testDir\": \"integration_test/e2e/\",\n    \"specDir\": \"integration_test/e2e/specs/\"\n  },\n  \"contexts\": [\"identity\", \"rides\", \"payments\", \"...\"],\n  \"testDataSeeding\": {\n    \"method\": \"api\",\n    \"endpoint\": \"/api/v1/test/seed\",\n    \"authHeader\": \"X-Test-Key\"\n  }\n}\n\nConfiguration Override\n\nProjects can provide a forge.config.yaml at the repo root to override auto-discovery:\n\n# forge.config.yaml (optional — Forge auto-discovers if absent)\narchitecture: microservices\nbackend:\n  services:\n    - name: auth-service\n      port: 8081\n      healthEndpoint: /health\n      buildCommand: npm run build\n      runCommand: npm start\n    - name: payment-service\n      port: 8082\n      healthEndpoint: /health\n      buildCommand: npm run build\n      runCommand: npm start\nfrontend:\n  technology: react\n  testCommand: npx cypress run --spec {target}\n  testDir: cypress/e2e/\n  specDir: cypress/e2e/specs/\ncontexts:\n  - name: identity\n    testFile: auth.cy.ts\n    specFile: identity.feature\n  - name: payments\n    testFile: payments.cy.ts\n    specFile: payments.feature\ndependencies:\n  identity:\n    
blocks: [payments, orders]\n  payments:\n    depends_on: [identity]\n    blocks: [orders]\n\nCRITICAL: NO MOCKING OR STUBBING ALLOWED\n\nABSOLUTE RULE: This skill NEVER uses mocking or stubbing of the backend API.\n\nALL tests run against the REAL backend API\nNO mocking frameworks for API calls (no mockito, wiremock, MockClient, nock, msw, httpretty, etc.)\nNO stubbed responses or fake data from API endpoints\nThe backend MUST be running and healthy before any tests execute\nTest data is seeded through REAL API calls, not mocked state\n\nWhy No Mocking:\n\nMocks hide real integration bugs\nMocks create false confidence\nMocks don't test the actual data flow\nReal API tests catch serialization, validation, and timing issues\nPHASE 0: BACKEND SETUP (MANDATORY FIRST STEP)\n\nBEFORE ANY TESTING, the backend MUST be built, compiled, and running.\n\nThis is the FIRST thing the skill does — no exceptions.\n\nStep 1: Check and Start Backend\n# 1. Read project config or auto-discover backend settings\n# 2. Check if backend is already running\ncurl -s http://localhost:${BACKEND_PORT}/${HEALTH_ENDPOINT} || {\n  echo \"Backend not running. Starting...\"\n\n  # 3. Navigate to backend directory\n  cd ${BACKEND_DIR}\n\n  # 4. Ensure environment is configured\n  cp .env.example .env 2>/dev/null || true\n\n  # 5. Build the backend\n  ${BUILD_COMMAND}\n\n  # 6. Run database migrations (if applicable)\n  ${MIGRATION_COMMAND}\n\n  # 7. Start backend (background)\n  nohup ${RUN_COMMAND} > backend.log 2>&1 &\n  echo $! > backend.pid\n\n  # 8. 
Wait for backend to be healthy (up to 60 seconds)\n  for i in {1..60}; do\n    if curl -s http://localhost:${BACKEND_PORT}/${HEALTH_ENDPOINT} | grep -q \"ok\\|healthy\\|UP\"; then\n      echo \"Backend healthy on port ${BACKEND_PORT}\"\n      break\n    fi\n    sleep 1\n  done\n}\n\nStep 2: Verify Backend Health\n# Verify critical endpoints are responding\ncurl -s http://localhost:${BACKEND_PORT}/${HEALTH_ENDPOINT} | jq .\n\n# Verify test fixtures endpoint (for seeding)\ncurl -s -H \"${TEST_AUTH_HEADER}\" http://localhost:${BACKEND_PORT}/${TEST_STATUS_ENDPOINT} | jq .\n\nStep 3: Contract Validation\n# Verify API spec matches running API (if OpenAPI/Swagger available)\ncurl -s http://localhost:${BACKEND_PORT}/${OPENAPI_ENDPOINT} > /tmp/live-spec.json\n\n# Store contract snapshot for regression detection\nnpx @claude-flow/cli@latest memory store \\\n  --key \"contract-snapshot-$(date +%s)\" \\\n  --value \"$(cat /tmp/live-spec.json | head -c 5000)\" \\\n  --namespace forge-contracts\n\nStep 4: Seed Test Data (Real API Calls)\n# Seed test data through REAL API — adapt to your project's seeding endpoint\ncurl -X POST http://localhost:${BACKEND_PORT}/${SEED_ENDPOINT} \\\n  -H \"Content-Type: application/json\" \\\n  -H \"${TEST_AUTH_HEADER}\" \\\n  -d '${SEED_PAYLOAD}'\n\nPHASE 1: BEHAVIORAL SPECIFICATION & ARCHITECTURE RECORDS\n\nBefore testing, verify Gherkin specs and architecture decision records exist for the target bounded context.\n\nBehavioral specifications define WHAT the product does from the user's perspective. Every test traces back to a Gherkin scenario. If tests pass but specs fail, the product is broken.\n\nSpec Location\n\nGherkin specs are stored alongside tests:\n\n${SPEC_DIR}/\n├── [context-a].feature\n├── [context-b].feature\n├── [context-c].feature\n└── ...\n\n\nThe exact location depends on your project's test structure. 
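As an illustration of the kind of spec discovery described here, the following is a minimal sketch (not Forge's actual implementation) that scans a spec directory for *.feature files and derives candidate test-function names from scenario titles; the snake_case `test_` naming convention is an assumption for illustration:\n\n```python\nimport re\nfrom pathlib import Path\n\ndef scenario_to_test_name(title: str) -> str:\n    # \"User can pay with saved card\" -> \"test_user_can_pay_with_saved_card\"\n    slug = re.sub(r\"[^a-z0-9]+\", \"_\", title.lower()).strip(\"_\")\n    return f\"test_{slug}\"\n\ndef map_specs(spec_dir: str) -> dict:\n    # Pair each .feature file with the test names its scenarios imply.\n    mapping = {}\n    for feature in Path(spec_dir).glob(\"*.feature\"):\n        titles = re.findall(r\"^\\s*Scenario:\\s*(.+)$\", feature.read_text(), re.M)\n        mapping[feature.name] = [scenario_to_test_name(t) for t in titles]\n    return mapping\n```\n\nA real run would then diff this mapping against the discovered test files to surface unmapped scenarios.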
Forge auto-discovers this from the project map.\n\nSpec-to-Test Mapping\n\nEach Gherkin Scenario maps to exactly one test function. The mapping is tracked:\n\nFeature: [Context Name]\n  As a [user role]\n  I want to [action]\n  So that [outcome]\n\n  Scenario: [Descriptive scenario name]\n    Given [precondition]\n    When [action]\n    Then [expected result]\n    And [additional verification]\n\nMissing Spec Generation\n\nIf specs are missing for a target context, the Specification Verifier agent creates them:\n\nRead the screen/component/route implementation files for the context\nExtract all user-visible features, interactions, and states\nGenerate Gherkin scenarios covering every cyclomatic path\nWrite to ${SPEC_DIR}/[context].feature\nMap each scenario to its corresponding test function\nAgent-Optimized ADR Generation\n\nWhen Forge discovers a bounded context without an Architecture Decision Record, the Specification Verifier generates one. ADRs follow an agent-optimized format designed for machine consumption:\n\n# ADR-NNN: [Context] Architecture Decision\n\n## Status\nProposed | Accepted | Deprecated | Superseded by ADR-XXX\n\n## MUST\n- [Explicit required behaviors with contract references]\n- [Link to OpenAPI spec: /api/v1/[context]/openapi.json]\n- [Required integration patterns]\n\n## MUST NOT\n- [Explicit forbidden patterns]\n- [Anti-patterns to avoid]\n- [Coupling violations]\n\n## Verification\n- Command: [command to verify this decision holds]\n- Expected: [expected output or exit code]\n\n## Dependencies\n- Depends on: [list of upstream contexts with ADR links]\n- Blocks: [list of downstream contexts with ADR links]\n\n\nADR Storage:\n\nADRs are stored in docs/decisions/ or the project-configured ADR directory\nEach bounded context has exactly one ADR\nADRs are updated when contracts change or new dependencies are discovered\nThe Specification Verifier agent includes ADR generation in its workflow\nPHASE 2: CONTRACT & DEPENDENCY VALIDATION\nContract 
Validation\n\nBefore running tests, verify API response schemas match expected DTOs:\n\n# For each endpoint the context uses:\n# 1. Make a real API call\n# 2. Compare response structure against expected DTO/schema\n# 3. Flag any mismatches as contract violations\n\n\nContract violations are treated as Gate 7 failures and must be resolved before functional testing proceeds.\n\nShared Types Validation\n\nFor bounded contexts that share dependencies, validate type consistency across context boundaries:\n\nIdentify shared DTOs/models — For each context, extract types used in API requests and responses\nCross-reference types — Compare DTOs between contexts that share dependencies (from the dependency graph)\nFlag type mismatches — e.g., context A expects userId: string but context B sends userId: number\nValidate value objects — Ensure value objects (email, money, address) follow consistent patterns across contexts\nReport violations — Flag as pre-Gate warnings with specific file locations and expected vs actual types\n{\n  \"sharedTypeViolation\": {\n    \"type\": \"UserId\",\n    \"contextA\": { \"name\": \"payments\", \"file\": \"types/payment.ts\", \"definition\": \"string\" },\n    \"contextB\": { \"name\": \"orders\", \"file\": \"types/order.ts\", \"definition\": \"number\" },\n    \"severity\": \"error\"\n  }\n}\n\nCross-Cutting Foundation Validation\n\nVerify cross-cutting concerns are consistent across all bounded contexts:\n\nAuth patterns — Same header format (Authorization: Bearer <token>), same token validation approach across all endpoints\nError response format — All API endpoints return errors in the project's standard format (consistent structure, error codes, HTTP status codes)\nLogging patterns — Consistent log levels, structured format, and correlation IDs across contexts\nPagination format — Consistent pagination parameters and response format across collection endpoints\n\nCross-cutting violations are reported as warnings before Gate evaluation 
begins.\n\nDependency Graph\n\nBounded contexts have dependencies. When a fix touches context X, all contexts that depend on X must be re-tested.\n\n# Context Dependency Map — define in forge.config.yaml or auto-discover\n# Example for a typical application:\n#\n# authentication:\n#   depends_on: []\n#   blocks: [orders, payments, profile, messaging]\n#\n# payments:\n#   depends_on: [authentication]\n#   blocks: [orders, subscriptions]\n#\n# orders:\n#   depends_on: [authentication, payments]\n#   blocks: [reviews, notifications]\n\nCascade Re-Testing\n\nWhen Bug Fixer modifies a file in context X:\n\nIdentify which context X belongs to\nLook up all contexts in blocks list for X\nAfter X's tests pass, automatically re-run tests for blocked contexts\nIf a cascade failure occurs, trace it back to the original fix\nPHASE 3: SWARM INITIALIZATION\n# Initialize anti-drift swarm for Forge\nnpx @claude-flow/cli@latest swarm init --topology hierarchical --max-agents 10 --strategy specialized\n\n# Load previous fix patterns from memory\nnpx @claude-flow/cli@latest memory search --query \"forge fix patterns\" --namespace forge-patterns\n\n# Check current coverage and gate status\nnpx @claude-flow/cli@latest memory retrieve --key \"forge-coverage-status\" --namespace forge-state\n\n# Load confidence tiers\nnpx @claude-flow/cli@latest memory search --query \"confidence tier\" --namespace forge-patterns\n\n# Check defect predictions for target context\nnpx @claude-flow/cli@latest memory search --query \"defect prediction\" --namespace forge-predictions\n\nMODEL ROUTING\n\nForge routes each agent to the appropriate model tier based on task complexity, optimizing for cost without sacrificing quality:\n\nAgent\tModel\tRationale\nSpecification Verifier\tsonnet\tReads code + generates Gherkin — moderate reasoning\nTest Runner\thaiku\tStructured execution, output parsing — low reasoning\nFailure Analyzer\tsonnet\tRoot cause analysis — moderate reasoning\nBug 
Fixer\topus\tFirst-principles code fixes — high reasoning\nQuality Gate Enforcer\thaiku\tThreshold comparison — low reasoning\nAccessibility Auditor\tsonnet\tCode analysis + WCAG rules — moderate reasoning\nAuto-Committer\thaiku\tGit operations, message formatting — low reasoning\nLearning Optimizer\tsonnet\tPattern analysis, prediction — moderate reasoning\n\nProjects can override model assignments in forge.config.yaml:\n\n# forge.config.yaml — Model routing overrides (optional)\nmodel_routing:\n  spec-verifier: sonnet\n  test-runner: haiku\n  failure-analyzer: sonnet\n  bug-fixer: opus\n  gate-enforcer: haiku\n  accessibility-auditor: sonnet\n  auto-committer: haiku\n  learning-optimizer: sonnet\n\n\nWhen no override is specified, the defaults above are used. This routing reduces token cost by ~60% compared to running all agents on the highest-tier model.\n\nPHASE 4: SPAWN AUTONOMOUS AGENTS\n\nClaude Code MUST spawn these 8 agents in a SINGLE message with run_in_background: true:\n\n// Agent 1: Specification Verifier\nTask({\n  model: \"sonnet\",\n  prompt: `You are the Specification Verifier agent. Your mission:\n\n    1. VERIFY backend is running: curl -sf http://localhost:${BACKEND_PORT}/${HEALTH_ENDPOINT}\n    2. Check if Gherkin specs exist for the target bounded context:\n       - Look in the project's spec directory\n    3. If specs are MISSING:\n       - Read the screen/component/route implementation files for the context\n       - Extract all user-visible features, interactions, states\n       - Generate Gherkin feature files with scenarios for every cyclomatic path\n       - Write specs to the correct location\n    4. If specs EXIST:\n       - Read current implementations\n       - Compare against existing scenarios\n       - Flag scenarios that no longer match implementation (stale specs)\n       - Generate new scenarios for uncovered features\n    5. 
Create spec-to-test mapping:\n       - Each Scenario name → test function name\n       - Store mapping in memory for Test Runner\n    6. Store results:\n       npx @claude-flow/cli@latest memory store --key \"specs-[context]-[timestamp]\" \\\n         --value \"[spec status JSON]\" --namespace forge-specs\n\n    CONSTRAINTS:\n    - NEVER generate specs for code you haven't read\n    - NEVER assume UI elements exist without checking implementation\n    - NEVER create scenarios that duplicate existing coverage\n    - NEVER modify existing test files — only spec files\n\n    ACCEPTANCE:\n    - Every implementation file has at least one Gherkin scenario\n    - Spec-to-test mapping has zero unmapped entries\n    - All generated scenarios follow Given/When/Then format\n    - Results stored in forge-specs namespace\n\n    Output: List of all Gherkin scenarios with their mapped test functions, and any gaps found.`,\n  subagent_type: \"researcher\",\n  description: \"Spec Verification\",\n  run_in_background: true\n})\n\n// Agent 2: Test Runner\nTask({\n  model: \"haiku\",\n  prompt: `You are the Test Runner agent. Your mission:\n\n    1. VERIFY backend is running\n    2. Check defect predictions from memory:\n       npx @claude-flow/cli@latest memory search --query \"defect prediction [context]\" --namespace forge-predictions\n       - Run predicted-to-fail tests FIRST for faster convergence\n    3. Run the E2E test suite for the specified context using the project's test command\n    4. Capture ALL test output including stack traces\n    5. Parse failures into structured format:\n       {testId, gherkinScenario, error, stackTrace, file, line, context}\n    6. Map each failure to its Gherkin scenario (from spec-to-test mapping)\n    7. 
Store results in memory for other agents:\n       npx @claude-flow/cli@latest memory store \\\n         --key \"test-run-[timestamp]\" \\\n         --value \"[parsed results JSON]\" \\\n         --namespace forge-results\n\n    CONSTRAINTS:\n    - NEVER skip failing tests\n    - NEVER modify test code or source code\n    - NEVER mock API calls or stub responses\n    - NEVER continue if backend health check fails\n\n    ACCEPTANCE:\n    - All test results stored in memory with structured format\n    - Zero unparsed failures — every failure has testId, error, stackTrace, file, line\n    - Predicted-to-fail tests executed first\n    - Results include Gherkin scenario mapping for every test`,\n  subagent_type: \"tester\",\n  description: \"Test Runner\",\n  run_in_background: true\n})\n\n// Agent 3: Failure Analyzer\nTask({\n  model: \"sonnet\",\n  prompt: `You are the Failure Analyzer agent. Your mission:\n\n    1. Monitor memory for new test results from Test Runner\n    2. For each failure, analyze:\n       - Root cause category: element-not-found, assertion-failed, timeout,\n         api-mismatch, navigation-error, state-error, contract-violation\n       - Affected file and line number\n       - Which Gherkin scenario is violated\n       - Impact on dependent contexts (check dependency graph)\n    3. Search memory for matching fix patterns with confidence tiers:\n       npx @claude-flow/cli@latest memory search \\\n         --query \"[error pattern]\" --namespace forge-patterns\n    4. If pattern found with confidence >= 0.85 (Gold+):\n       - Recommend auto-apply\n       - Include pattern key and success rate\n    5. If pattern found with confidence >= 0.75 (Silver):\n       - Suggest fix but flag for review\n    6. If no matching pattern:\n       - Perform root cause analysis from first principles\n       - Generate fix hypothesis\n    7. 
Store analysis in memory for Bug Fixer:\n       npx @claude-flow/cli@latest memory store \\\n         --key \"analysis-[testId]-[timestamp]\" \\\n         --value \"[analysis JSON]\" \\\n         --namespace forge-results\n\n    CONSTRAINTS:\n    - NEVER assume root cause without stack trace evidence\n    - NEVER recommend fixes for passing tests\n    - NEVER skip dependency graph impact analysis\n    - NEVER override confidence tier thresholds\n\n    ACCEPTANCE:\n    - Every failure has a root cause category and affected file\n    - Zero unanalyzed failures\n    - Dependency impact documented for every failure\n    - Pattern search executed for every error type`,\n  subagent_type: \"researcher\",\n  description: \"Failure Analyzer\",\n  run_in_background: true\n})\n\n// Agent 4: Bug Fixer\nTask({\n  model: \"opus\",\n  prompt: `You are the Bug Fixer agent. Your mission:\n\n    1. Retrieve failure analysis from memory\n    2. For each failure, apply fix using confidence-tiered approach:\n\n       PLATINUM (>= 0.95 confidence):\n       - Auto-apply the stored fix pattern immediately\n       - No review needed\n\n       GOLD (>= 0.85 confidence):\n       - Auto-apply the stored fix pattern\n       - Flag in commit message for awareness\n\n       SILVER (>= 0.75 confidence):\n       - Read the failing test file and source file\n       - Apply suggested fix with extra verification\n       - Run targeted test before proceeding\n\n       BRONZE or NO PATTERN:\n       - Read the failing test file\n       - Read the source file causing the failure\n       - Implement fix from first principles\n       - Use defensive patterns appropriate to the test framework\n\n    3. After fixing, identify affected context:\n       - Check dependency graph for cascade impacts\n       - Flag dependent contexts for re-testing\n\n    4. 
Store the fix pattern with initial confidence:\n       npx @claude-flow/cli@latest memory store \\\n         --key \"fix-[error-type]-[hash]\" \\\n         --value '{\"pattern\":\"[fix]\",\"confidence\":0.75,\"tier\":\"silver\",\"applied\":1,\"successes\":0}' \\\n         --namespace forge-patterns\n\n    5. Signal Test Runner to re-run affected tests\n    6. Signal Quality Gate Enforcer to check all 7 gates\n\n    CONSTRAINTS:\n    - NEVER change test assertions to make tests pass\n    - NEVER modify Gherkin specs to match broken behavior\n    - NEVER introduce new dependencies without flagging\n    - NEVER apply fixes without reading both test file and source file\n\n    ACCEPTANCE:\n    - Every applied fix has a targeted test re-run result\n    - Zero fixes without verification\n    - Fix pattern stored with initial confidence score\n    - Cascade impacts identified and flagged for re-testing`,\n  subagent_type: \"coder\",\n  description: \"Bug Fixer\",\n  run_in_background: true\n})\n\n// Agent 5: Quality Gate Enforcer\nTask({\n  model: \"haiku\",\n  prompt: `You are the Quality Gate Enforcer agent. 
Your mission:\n\n    After each fix cycle, evaluate ALL 7 quality gates:\n\n    GATE 1 — FUNCTIONAL (100% required):\n    - All tests in the target context pass\n    - No regressions in previously passing tests\n\n    GATE 2 — BEHAVIORAL (100% of targeted scenarios):\n    - Every Gherkin scenario that was targeted has a passing test\n    - Spec-to-test mapping is complete (no unmapped scenarios)\n\n    GATE 3 — COVERAGE (>=85% overall, >=95% critical paths):\n    - Calculate path coverage for the context\n    - Critical paths: authentication, payment, core workflows\n    - Non-critical paths: preferences, history, settings\n\n    GATE 4 — SECURITY (0 critical/high violations):\n    - No hardcoded API keys, tokens, or secrets in test files\n    - No hardcoded test credentials (use env vars or test fixtures)\n    - Secure storage patterns used (no plaintext sensitive data)\n    - No SQL injection vectors in dynamic queries\n    - No XSS vectors in rendered output\n    - No path traversal in file operations\n    - Dependencies have no known critical CVEs (when lockfile available)\n    - When AQE available: delegate to security-scanner for full SAST analysis\n\n    GATE 5 — ACCESSIBILITY (WCAG AA):\n    - All interactive elements have accessible labels\n    - Touch/click targets meet minimum size requirements\n    - Color contrast meets WCAG AA ratios\n    - Screen reader navigation order is logical\n\n    GATE 6 — RESILIENCE (tested for target context):\n    - Offline/disconnected state handled gracefully\n    - Timeout handling shows user-friendly message\n    - Error states show retry option\n    - Server errors show generic error, not stack trace\n\n    GATE 7 — CONTRACT (0 mismatches):\n    - API responses match expected schemas\n    - No unexpected null fields\n    - Enum values match expected set\n    - Pagination format is consistent\n\n    For each gate:\n    - Status: PASS / FAIL / SKIP (with reason)\n    - Details: what passed, what failed\n    - Blocking: 
whether this gate blocks the commit\n\n    Store gate results:\n    npx @claude-flow/cli@latest memory store \\\n      --key \"gates-[context]-[timestamp]\" \\\n      --value \"[gate results JSON]\" \\\n      --namespace forge-state\n\n    ONLY signal Auto-Committer when ALL 7 GATES PASS.\n\n    CONSTRAINTS:\n    - NEVER approve a commit with ANY blocking gate failure\n    - NEVER lower thresholds below defined minimums\n    - NEVER skip gate evaluation — all 7 gates must be assessed\n    - NEVER mark a gate as PASS without evidence\n\n    ACCEPTANCE:\n    - Gate results stored in memory with PASS/FAIL/SKIP for all 7 gates\n    - Every FAIL includes specific details of what failed\n    - Every SKIP includes reason for skipping\n    - Auto-Committer only signaled when all blocking gates pass`,\n  subagent_type: \"reviewer\",\n  description: \"Quality Gate Enforcer\",\n  run_in_background: true\n})\n\n// Agent 6: Accessibility Auditor\nTask({\n  model: \"sonnet\",\n  prompt: `You are the Accessibility Auditor agent. Your mission:\n\n    1. For each screen/page/component in the target context, audit:\n\n    LABELS:\n    - Every interactive element has an accessible label/aria-label/Semantics label\n    - Labels are descriptive (not \"button1\" but \"Submit payment\")\n    - Images have alt text or semantic labels\n\n    TOUCH/CLICK TARGETS:\n    - All interactive elements meet minimum size (48x48dp mobile, 44x44px web)\n    - Flag any undersized targets\n\n    CONTRAST:\n    - Text on colored backgrounds meets WCAG AA ratio (4.5:1 normal, 3:1 large)\n    - Flag low-contrast combinations\n\n    SCREEN READER:\n    - Accessibility tree has logical reading order\n    - No duplicate or misleading labels\n    - Form fields have associated labels\n\n    FOCUS/TAB ORDER:\n    - Focus order follows visual layout\n    - Focus trap in modals/dialogs\n    - Focus returns to trigger after dialog closes\n\n    2. 
Generate findings as:\n       {severity: \"critical\"|\"warning\"|\"info\", element, file, line, issue, fix}\n\n    3. Store audit results:\n       npx @claude-flow/cli@latest memory store \\\n         --key \"a11y-[context]-[timestamp]\" \\\n         --value \"[audit JSON]\" \\\n         --namespace forge-state\n\n    CONSTRAINTS:\n    - NEVER skip interactive elements during audit\n    - NEVER report false positives for decorative images\n    - NEVER ignore focus/tab order analysis\n    - NEVER apply fixes — only report findings for Bug Fixer\n\n    ACCEPTANCE:\n    - Every interactive element audited\n    - Findings stored with severity, element, file, line, issue, fix\n    - Zero unaudited interactive elements in target context\n    - WCAG AA compliance level assessed for every screen`,\n  subagent_type: \"analyst\",\n  description: \"Accessibility Auditor\",\n  run_in_background: true\n})\n\n// Agent 7: Auto-Committer\nTask({\n  model: \"haiku\",\n  prompt: `You are the Auto-Committer agent. Your mission:\n\n    1. Monitor for successful fixes where ALL 7 QUALITY GATES PASS\n    2. For each successful fix:\n       - Stage only the fixed files (never git add -A)\n       - Create detailed commit message:\n\n         fix(forge): Fix [TEST_ID] - [brief description]\n\n         Behavioral Spec: [Gherkin scenario name]\n         Root Cause: [what caused the failure]\n         - [specific issue 1]\n         - [specific issue 2]\n\n         Fix Applied:\n         - [change 1]\n         - [change 2]\n\n         Quality Gates:\n         - Functional: PASS\n         - Behavioral: PASS\n         - Coverage: [X]%\n         - Security: PASS\n         - Accessibility: PASS\n         - Resilience: PASS\n         - Contract: PASS\n\n         Confidence Tier: [platinum|gold|silver|bronze]\n         Pattern Stored: fix-[error-type]-[hash]\n\n       - Commit with the message above\n    3. Update coverage report with new passing paths\n    4. 
Store commit hash in memory for rollback capability:\n       npx @claude-flow/cli@latest memory store \\\n         --key \"commit-[hash]\" \\\n         --value \"[commit details JSON]\" \\\n         --namespace forge-commits\n    5. Store last known good commit:\n       npx @claude-flow/cli@latest memory store \\\n         --key \"last-green-commit\" \\\n         --value \"[hash]\" \\\n         --namespace forge-state\n\n    CONSTRAINTS:\n    - NEVER use git add -A or git add .\n    - NEVER commit without all 7 gates passing\n    - NEVER amend previous commits\n    - NEVER push to remote — only local commits\n\n    ACCEPTANCE:\n    - Commit message includes Behavioral Spec, Root Cause, Fix Applied, all 7 gate statuses\n    - Only fixed files are staged (no unrelated files)\n    - Commit hash stored in forge-commits namespace\n    - Last green commit updated in forge-state namespace`,\n  subagent_type: \"reviewer\",\n  description: \"Auto-Committer\",\n  run_in_background: true\n})\n\n// Agent 8: Learning Optimizer\nTask({\n  model: \"sonnet\",\n  prompt: `You are the Learning Optimizer agent. Your mission:\n\n    1. After each test cycle, analyze patterns:\n       - Which error types fail most often?\n       - Which fix patterns have highest success rate?\n       - What new defensive patterns should be added?\n       - Which Gherkin scenarios are most fragile?\n\n    2. 
UPDATE CONFIDENCE TIERS:\n       For each fix pattern applied this cycle:\n       - If fix succeeded: confidence += 0.05 (cap at 1.0)\n         - If confidence crosses 0.95: promote to Platinum\n         - If confidence crosses 0.85: promote to Gold\n       - If fix failed: confidence -= 0.10 (floor at 0.0)\n         - If confidence drops below 0.75: demote to Bronze (learning-only)\n         - If confidence drops below 0.70: mark the pattern Expired (needs revalidation)\n       Store updated pattern:\n       npx @claude-flow/cli@latest memory store \\\n         --key \"fix-[error-type]-[hash]\" \\\n         --value \"[updated pattern JSON]\" \\\n         --namespace forge-patterns\n\n    3. DEFECT PREDICTION:\n       Analyze which contexts/files are likely to fail next:\n       - Files changed since last green run\n       - Historical failure rate per context\n       - Complexity of recent changes\n       Store prediction:\n       npx @claude-flow/cli@latest memory store \\\n         --key \"prediction-[date]\" \\\n         --value \"[prediction JSON]\" \\\n         --namespace forge-predictions\n\n    4. Train neural patterns on successful fixes:\n       npx @claude-flow/cli@latest hooks post-task \\\n         --task-id \"forge-cycle\" --success true --store-results true\n\n    5. Update coverage status:\n       npx @claude-flow/cli@latest memory store \\\n         --key \"forge-coverage-status\" \\\n         --value \"[updated coverage JSON]\" \\\n         --namespace forge-state\n\n    6. Generate recommendations for test improvements\n    7. 
Export learning metrics:\n       npx @claude-flow/cli@latest neural train --pattern-type forge-fixes --epochs 5\n\n    CONSTRAINTS:\n    - NEVER promote a pattern that failed in the current cycle\n    - NEVER delete patterns — only demote below Bronze threshold\n    - NEVER override confidence scores without evidence from test results\n    - NEVER generate predictions without historical data\n\n    ACCEPTANCE:\n    - All applied patterns have updated confidence scores\n    - Prediction stored for next run with context-level probabilities\n    - Coverage status updated in forge-state namespace\n    - Zero patterns promoted without success evidence`,\n  subagent_type: \"researcher\",\n  description: \"Learning Optimizer\",\n  run_in_background: true\n})\n\nPHASE 5: QUALITY GATES\n\n7 gates evaluated after each fix cycle. ALL must pass before a commit is created.\n\nGate\tCheck\tThreshold\tBlocking\n1. Functional\tAll tests pass\t100% pass rate\tYES\n2. Behavioral\tGherkin scenarios satisfied\t100% of targeted scenarios\tYES\n3. Coverage\tPath coverage\t>=85% overall, >=95% critical\tYES (critical only)\n4. Security\tNo hardcoded secrets, secure storage, SAST checks\t0 critical/high violations\tYES\n5. Accessibility\tAccessible labels, target sizes, contrast\tWCAG AA\tWarning only\n6. Resilience\tOffline handling, timeout handling, error states\tTested for target context\tWarning only\n7. 
Contract\tAPI response matches expected schema\t0 mismatches\tYES\nGate Failure Categories\n\nWhen gates fail, failures are categorized for targeted re-runs:\n\nFunctional failures → Re-run Bug Fixer on failing tests\nBehavioral failures → Check spec-to-test mapping, may need new tests\nCoverage failures → Generate additional test paths\nSecurity failures → Fix hardcoded values, update storage patterns\nAccessibility failures → Add accessible labels, fix target sizes\nResilience failures → Add offline/error state handling\nContract failures → Update DTOs or flag API regression\nAUTONOMOUS EXECUTION LOOP\n┌────────────────────────────────────────────────────────────────────────┐\n│                      FORGE AUTONOMOUS LOOP                             │\n├────────────────────────────────────────────────────────────────────────┤\n│                                                                        │\n│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐        │\n│  │ Specify  │───▶│   Test   │───▶│ Analyze  │───▶│   Fix    │        │\n│  │ (Gherkin)│    │ (Run)    │    │ (Root    │    │ (Tiered) │        │\n│  └──────────┘    └──────────┘    │  Cause)  │    └──────────┘        │\n│       ▲                          └──────────┘         │               │\n│       │                                               ▼               │\n│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐        │\n│  │  Learn   │◀───│  Commit  │◀───│  Gate    │◀───│  Audit   │        │\n│  │ (Update  │    │ (Auto)   │    │ (7 Gates)│    │ (A11y)   │        │\n│  │  Tiers)  │    └──────────┘    └──────────┘    └──────────┘        │\n│  └──────────┘                                                         │\n│       │                                                                │\n│       └───────────────── REPEAT ──────────────────────────────────────│\n│                                                                        │\n│  Loop continues until: ALL 7 GATES 
PASS or MAX_ITERATIONS (10)        │\n│  Gate failures are categorized for targeted re-runs (not full re-run) │\n└────────────────────────────────────────────────────────────────────────┘\n\nREAL-TIME PROGRESS REPORTING\n\nEach agent emits structured progress events during execution for observability:\n\n{\"agent\": \"spec-verifier\", \"event\": \"spec_generated\", \"context\": \"payments\", \"scenarios\": 12}\n{\"agent\": \"test-runner\", \"event\": \"test_started\", \"context\": \"payments\", \"test\": \"user_can_pay\"}\n{\"agent\": \"test-runner\", \"event\": \"test_completed\", \"context\": \"payments\", \"passed\": 10, \"failed\": 2}\n{\"agent\": \"failure-analyzer\", \"event\": \"root_cause_found\", \"test\": \"user_can_pay\", \"cause\": \"timeout\"}\n{\"agent\": \"bug-fixer\", \"event\": \"fix_applied\", \"file\": \"payments.ts\", \"confidence\": 0.92}\n{\"agent\": \"gate-enforcer\", \"event\": \"gate_evaluated\", \"gate\": \"functional\", \"status\": \"PASS\"}\n{\"agent\": \"auto-committer\", \"event\": \"committed\", \"hash\": \"abc123\", \"tests_fixed\": 2}\n{\"agent\": \"learning-optimizer\", \"event\": \"pattern_updated\", \"pattern\": \"fix-timeout-xyz\", \"tier\": \"gold\"}\n\n\nProgress File:\n\nEvents are appended to .forge/progress.jsonl (one JSON object per line)\nThe file is truncated and recreated at the start of each Forge run\nTools can tail this file for real-time monitoring: tail -f .forge/progress.jsonl\n\nIntegration with Agentic QE AG-UI:\n\nWhen the AQE AG-UI protocol is available, events stream directly to the user interface\nUsers see live progress: which gate is being evaluated, which test is running, which fix is being applied\nWhen running in Claude Code without AG-UI, progress is visible through agent output files\nCONFIDENCE TIERS FOR FIX PATTERNS\n\nEvery fix pattern is tracked with a confidence score that evolves over time:\n\n{\n  \"key\": \"fix-element-not-found-abc123\",\n  \"pattern\": {\n    \"error\": \"Element not found / No element\",\n    \"fix\": \"Ensure element is rendered and visible before interaction\",\n    \"files_affected\": [\"*_test.*\"],\n    \"context\": \"any\"\n  },\n  \"tier\": \"gold\",\n  \"confidence\": 0.92,\n  \"auto_apply\": true,\n  \"applied_count\": 47,\n  \"success_count\": 43,\n  \"success_rate\": 0.915,\n  \"last_applied\": \"2026-02-06T14:30:00Z\",\n  \"last_failed\": \"2026-02-01T09:15:00Z\"\n}\n\nTier Thresholds\nTier\tConfidence\tAuto-Apply\tBehavior\nPlatinum\t>= 0.95\tYes\tApply immediately without review\nGold\t>= 0.85\tYes\tApply and flag in commit message\nSilver\t>= 0.75\tNo\tSuggest to Bug Fixer, don't auto-apply\nBronze\t>= 0.70\tNo\tStore for learning only, never auto-apply\nExpired\t< 0.70\tNo\tPattern demoted, needs revalidation\nConfidence Updates\n\nAfter each application:\n\nSuccess: confidence += 0.05 (capped at 1.0)\nFailure: confidence -= 0.10 (floored at 0.0)\nTier promotion when crossing threshold upward\nTier demotion when crossing threshold downward\nDEFECT PREDICTION\n\nBefore running tests, the Learning Optimizer analyzes historical data to predict which tests are most likely to fail:\n\nInput Signals\nFiles changed since last green run (git diff against last-green-commit)\nHistorical failure rates per bounded context (from forge-results namespace)\nFix pattern freshness — recently applied fixes are more likely to regress\nComplexity metrics — contexts with more cyclomatic paths fail more often\nDependency chain length — deeper dependency chains have higher failure rates\nPrediction Output\n{\n  \"date\": \"2026-02-07\",\n  \"predictions\": [\n    { \"context\": \"payments\", \"probability\": 0.73, \"reason\": \"3 files changed in payment module\" },\n    { \"context\": \"orders\", \"probability\": 0.45, \"reason\": \"depends on payments (changed)\" },\n    { \"context\": \"identity\", \"probability\": 0.12, \"reason\": \"no changes, stable history\" }\n  ],\n  \"recommended_order\": [\"payments\", \"orders\", 
\"identity\"]\n}\n\n\nTests are executed in descending probability order — predicted-to-fail tests run FIRST for faster convergence.\n\nEXHAUSTIVE EDGE CASE TESTING\nGeneral UI Element Edge Cases\n\nFor EVERY interactive element, test:\n\nInteraction States\n\nSingle interaction → expected action\nRepeated rapid interaction → no duplicate action\nLong press / right-click → context menu if applicable\nDisabled state → no action, visual feedback\n\nInput Field States\n\nEmpty → placeholder visible\nFocus → visual focus indicator\nValid input → no error\nInvalid input → error message\nMax length reached → prevents further input\nPaste → validates pasted content\nClear → resets to empty\n\nAsync Operation States\n\nBefore load → loading indicator\nDuring load → spinner, disabled submit\nSuccess → data displayed, spinner gone\nError → error message, retry option\nTimeout → timeout message, retry option\n\nNavigation Edge Cases\n\nBack navigation → previous screen or exit confirmation\nDeep link → correct screen with params\nInvalid deep link → fallback/error screen\nBrowser forward/back (web) → correct state\n\nScroll Edge Cases\n\nOverscroll → appropriate feedback\nScroll to hidden content → content becomes visible\nKeyboard appears → scroll to focused field\nNetwork Edge Cases\nNo internet → offline indicator, cached data if available\nSlow connection → loading states persist, timeout handling\nConnection restored → auto-retry pending operations\nServer error 500 → generic error message\nAuth error 401 → redirect to login\nPermission error 403 → permission denied message\nNot found 404 → \"not found\" message\nChaos Testing (Resilience)\n\nFor each target context, inject controlled failures:\n\nTimeout injection → API calls take >10s → verify timeout UI\nPartial response → API returns incomplete data → verify graceful degradation\nRate limiting → API returns 429 → verify retry-after behavior\nConcurrent mutations → Multiple clients modify same resource → verify 
conflict handling\nSession expiry → Token expires mid-flow → verify re-auth prompt\nVisual Regression Testing\n\nFor UI-heavy projects, Forge captures and compares screenshots to detect unintended visual changes:\n\nBefore fix — Capture baseline screenshots of all screens in the target context\nAfter fix — Capture new screenshots of the same screens\nCompare — Pixel-by-pixel comparison with configurable threshold (default: 0.1% diff tolerance)\nReport — Flag visual regressions as Gate 5 (Accessibility) warnings\nStore — Save screenshot diffs in memory for review\n\nScreenshot Capture by Platform:\n\nPlatform\tMethod\nWeb (Playwright)\tpage.screenshot({ fullPage: true })\nWeb (Cypress)\tcy.screenshot()\nFlutter\tawait tester.binding.setSurfaceSize(size); await expectLater(find.byType(App), matchesGoldenFile('name.png'))\nMobile (native)\tPlatform-specific screenshot capture\n\nConfiguration:\n\n# forge.config.yaml — Visual regression settings (optional)\nvisual_regression:\n  enabled: true\n  threshold: 0.001  # 0.1% pixel diff tolerance\n  screenshot_dir: .forge/screenshots\n  full_page: true\n\n\nWhen Agentic QE is available, delegate to the visual-tester agent for parallel viewport comparison across multiple screen sizes.\n\nINVOCATION MODES\n# Full autonomous run — all contexts, all gates\n/forge --autonomous --all\n\n# Single context autonomous\n/forge --autonomous --context [context-name]\n\n# Behavioral verification only (no fixes)\n/forge --verify-only\n/forge --verify-only --context [context-name]\n\n# Fix-only mode (fix failures, don't generate new tests)\n/forge --fix-only --context [context-name]\n\n# Learn mode (analyze patterns, update confidence tiers)\n/forge --learn\n\n# Add coverage for new screens/pages/components\n/forge --add-coverage --screens [name1],[name2]\n\n# Generate Gherkin specs for a context\n/forge --spec-gen --context [context-name]\n/forge --spec-gen --all\n\n# Run quality gates without test execution\n/forge --gates-only\n/forge 
--gates-only --context [context-name]\n\n# Defect prediction only\n/forge --predict\n/forge --predict --context [context-name]\n\n# Chaos/resilience testing for a context\n/forge --chaos --context [context-name]\n/forge --chaos --all\n\nMEMORY NAMESPACES\nNamespace\tPurpose\tKey Pattern\nforge-patterns\tFix patterns with confidence tiers\tfix-[error-type]-[hash]\nforge-results\tTest run results\ttest-run-[timestamp]\nforge-state\tCoverage + gate status\tforge-coverage-status, gates-[context]-[ts], last-green-commit\nforge-commits\tCommit history\tcommit-[hash]\nforge-screens\tImplemented screens/pages\tscreen-[name]\nforge-specs\tGherkin specifications\tspecs-[context]-[timestamp]\nforge-contracts\tAPI contract snapshots\tcontract-snapshot-[timestamp]\nforge-predictions\tDefect prediction history\tprediction-[date]\nOPTIONAL: AGENTIC QE INTEGRATION\n\nForge can optionally integrate with the Agentic QE framework via MCP for enhanced capabilities. All AQE features are additive — Forge works identically without AQE.\n\nDetection\n\nOn startup, Forge checks for AQE availability:\n\n# Check if agentic-qe MCP server is registered\nclaude mcp list | grep -q \"aqe\" && echo \"AQE available\" || echo \"AQE not available — using defaults\"\n\nEnhanced Capabilities When AQE Is Available\nForge Component\tWithout AQE (Default)\tWith AQE\nPattern Storage\tclaude-flow memory (forge-patterns namespace)\tReasoningBank — HNSW vector-indexed, 150x faster pattern search, experience replay\nDefect Prediction\tHistorical failure rates + file changes\tdefect-intelligence domain — root-cause-analyzer + defect-predictor agents\nSecurity Scanning\tGate 4 static checks (secrets, injection vectors)\tsecurity-compliance domain — full SAST/DAST via security-scanner agent\nAccessibility Audit\tForge Accessibility Auditor agent\tvisual-accessibility domain — visual-tester + accessibility-auditor agents\nContract Testing\tGate 7 schema validation\tcontract-testing domain — contract-validator + 
graphql-tester agents\nProgress Reporting\t.forge/progress.jsonl file\tAG-UI streaming protocol for real-time UI updates\nFallback Behavior\n\nWhen AQE is NOT available, Forge falls back to its built-in behavior for every capability. No configuration is required — the skill auto-detects and adapts.\n\nConfiguration\n# forge.config.yaml — AQE integration settings (optional)\nintegrations:\n  agentic-qe:\n    enabled: true  # auto-detected if not specified\n    domains:\n      - defect-intelligence\n      - security-compliance\n      - visual-accessibility\n      - contract-testing\n    reasoning_bank:\n      enabled: true  # replaces claude-flow memory for forge-patterns namespace\n    ag_ui:\n      enabled: true  # stream progress events to AG-UI protocol\n\nAQE Agent Delegation Map\n\nWhen AQE is enabled, Forge delegates specific subtasks to specialized AQE agents:\n\nForge Agent\tAQE Domain\tAQE Agents Used\nSpecification Verifier\trequirements-validation\tbdd-generator, requirements-validator\nFailure Analyzer\tdefect-intelligence\troot-cause-analyzer, defect-predictor\nQuality Gate Enforcer (Gate 4)\tsecurity-compliance\tsecurity-scanner, security-auditor\nAccessibility Auditor\tvisual-accessibility\tvisual-tester, accessibility-auditor\nQuality Gate Enforcer (Gate 7)\tcontract-testing\tcontract-validator, graphql-tester\nLearning Optimizer\tlearning-optimization\tlearning-coordinator, pattern-learner\n\nForge agents that have no AQE equivalent (Test Runner, Bug Fixer, Auto-Committer) continue to run as built-in agents regardless of AQE availability.\n\nDEFENSIVE TEST PATTERNS\n\nThe Bug Fixer agent uses defensive patterns appropriate to the project's test framework. 
Examples:\n\nFlutter: Safe Tap\nFuture<bool> safeTap(WidgetTester tester, Finder finder) async {\n  await tester.pumpAndSettle();\n  final elements = finder.evaluate();\n  if (elements.isNotEmpty) {\n    await tester.tap(finder.first, warnIfMissed: false);\n    await tester.pumpAndSettle();\n    return true;\n  }\n  debugPrint('Widget not found: ${finder.description}');\n  return false;\n}\n\nFlutter: Safe Text Entry\nFuture<bool> safeEnterText(WidgetTester tester, Finder finder, String text) async {\n  await tester.pumpAndSettle();\n  final elements = finder.evaluate();\n  if (elements.isNotEmpty) {\n    await tester.enterText(finder.first, text);\n    await tester.pumpAndSettle();\n    return true;\n  }\n  return false;\n}\n\nFlutter: Visual Observation Delay\nFuture<void> visualDelay(WidgetTester tester, {String? label}) async {\n  if (label != null) debugPrint('Observing: $label');\n  await tester.pump(const Duration(milliseconds: 2500));\n}\n\nFlutter: Scroll Until Visible\nFuture<bool> scrollUntilVisible(\n  WidgetTester tester,\n  Finder finder,\n  Finder scrollable,\n) async {\n  for (int i = 0; i < 10; i++) {\n    await tester.pumpAndSettle();\n    if (finder.evaluate().isNotEmpty) return true;\n    await tester.drag(scrollable, const Offset(0, -300));\n    await tester.pumpAndSettle();\n  }\n  return false;\n}\n\nFlutter: Wait For API Response\nFuture<void> waitForApiResponse(WidgetTester tester, {int maxWaitMs = 5000}) async {\n  final startTime = DateTime.now();\n  while (DateTime.now().difference(startTime).inMilliseconds < maxWaitMs) {\n    await tester.pump(const Duration(milliseconds: 100));\n    if (find.byType(CircularProgressIndicator).evaluate().isEmpty) break;\n  }\n  await tester.pumpAndSettle();\n}\n\nWeb (Playwright): Safe Click\nasync function safeClick(selector, options = { timeout: 5000 }) {\n  try {\n    await page.waitForSelector(selector, { state: 'visible', timeout: options.timeout });\n    await page.click(selector);\n    return true;\n  } catch (e) {\n    console.warn(\`Element not found: ${selector}\`);\n    return false;\n  }\n}\n\nWeb (Playwright): Wait For API\nasync function waitForApi(urlPattern, options = { timeout: 10000 }) {\n  return page.waitForResponse(\n    response => response.url().includes(urlPattern) && response.status() === 200,\n    { timeout: options.timeout }\n  );\n}\n\nCOMMON FIX PATTERNS\nPattern: Element Not Found\n{\n  \"error\": \"Element not found / No element / Bad state: No element\",\n  \"cause\": \"Element not rendered, wrong selector, or not in viewport\",\n  \"tier\": \"platinum\",\n  \"confidence\": 0.97,\n  \"fixes\": [\n    \"Wait for element to be rendered before interaction\",\n    \"Use safe interaction helpers instead of direct calls\",\n    \"Verify selector matches actual element\",\n    \"Scroll element into view before interaction\"\n  ]\n}\n\nPattern: Timeout\n{\n  \"error\": \"Timeout / pumpAndSettle timed out / waiting for selector\",\n  \"cause\": \"Infinite animation, continuous rebuild, or slow API\",\n  \"tier\": \"gold\",\n  \"confidence\": 0.89,\n  \"fixes\": [\n    \"Use fixed-duration wait instead of settle/idle wait\",\n    \"Dispose animation controllers in tearDown\",\n    \"Check for infinite re-render loops\",\n    \"Increase timeout for slow API calls\"\n  ]\n}\n\nPattern: Assertion Failed\n{\n  \"error\": \"Expected: X, Actual: Y / AssertionError\",\n  \"cause\": \"State not updated or wrong expectation\",\n  \"tier\": \"silver\",\n  \"confidence\": 0.78,\n  \"fixes\": [\n    \"Add delay before assertion for async state updates\",\n    \"Verify test data seeding completed\",\n    \"Check async operation completion before asserting\"\n  ]\n}\n\nPattern: API Response Mismatch\n{\n  \"error\": \"Type error / null value / schema mismatch\",\n  \"cause\": \"Backend response format changed\",\n  \"tier\": \"gold\",\n  \"confidence\": 0.86,\n  \"fixes\": [\n    \"Update model/DTO to match current API response\",\n    \"Add null safety handling\",\n    \"Check API version compatibility\"\n  ]\n}\n\nCOVERAGE TRACKING\n\nThe Learning Optimizer maintains coverage status per context:\n\n{\n  \"lastRun\": \"2026-02-07T11:00:00Z\",\n  \"backendStatus\": {\n    \"healthy\": true,\n    \"port\": 8080\n  },\n  \"gateStatus\": {\n    \"functional\": \"PASS\",\n    \"behavioral\": \"PASS\",\n    \"coverage\": \"PASS\",\n    \"security\": \"PASS\",\n    \"accessibility\": \"WARNING\",\n    \"resilience\": \"PASS\",\n    \"contract\": \"PASS\"\n  },\n  \"contexts\": {\n    \"[context-a]\": { \"total\": 68, \"passing\": 68, \"failing\": 0, \"behavioralCoverage\": 100 },\n    \"[context-b]\": { \"total\": 72, \"passing\": 70, \"failing\": 2, \"behavioralCoverage\": 97 }\n  },\n  \"totalPaths\": 0,\n  \"passingPaths\": 0,\n  \"coveragePercent\": 0,\n  \"confidenceTiers\": {\n    \"platinum\": 0,\n    \"gold\": 0,\n    \"silver\": 0,\n    \"bronze\": 0,\n    \"expired\": 0\n  }\n}\n\nAUTO-COMMIT MESSAGE FORMAT\nfix(forge): Fix [TEST_ID] - [brief description]\n\nBehavioral Spec: [Gherkin scenario name]\nRoot Cause: [what caused the failure]\n- [specific issue 1]\n- [specific issue 2]\n\nFix Applied:\n- [change 1]\n- [change 2]\n\nQuality Gates:\n- Functional: PASS\n- Behavioral: PASS\n- Coverage: [X]%\n- Security: PASS\n- Accessibility: PASS/WARNING\n- Resilience: PASS\n- Contract: PASS\n\nTest Verification:\n- Test now passes after fix\n- No regression in related tests\n- Dependent contexts re-tested: [list]\n\nConfidence Tier: [platinum|gold|silver|bronze]\nPattern Stored: fix-[error-type]-[hash]\n\nROLLBACK & CONFLICT RESOLUTION\nRollback Capability\n\nIf a fix introduces regressions:\n\n# Retrieve last known good commit\nnpx @claude-flow/cli@latest memory retrieve --key \"last-green-commit\" --namespace forge-state\n\n# Revert the offending commit\ngit revert [bad-commit-hash]\n\n# Store rollback event for learning (prevents pattern from being re-applied)\nnpx @claude-flow/cli@latest memory store \\\n  
--key \"rollback-[timestamp]\" \\\n  --value '{\"commit\":\"[hash]\",\"reason\":\"[reason]\",\"pattern\":\"[pattern-key]\"}' \\\n  --namespace forge-patterns\n\n# Demote the fix pattern confidence (-0.10)\n# Learning Optimizer will handle this automatically\n\nFix Conflict Protocol\n\nWhen Bug Fixer's fix causes a cascade regression (tests in dependent contexts fail):\n\nHalt — Stop the fix loop for the affected context\nRe-analyze — Failure Analyzer examines both the original failure AND the cascade failure\nCategorize — Compare root cause categories:\nDifferent root cause → The fix is kept; the cascade failure is treated as a new, independent failure in the next loop iteration\nSame root cause → The fix is reverted and the pattern is demoted (-0.10 confidence)\nRevert limit — Maximum 2 revert cycles per test before escalating to user review\nEscalation — If 2 reverts occur for the same test, Forge pauses and reports:\nESCALATION: Test [testId] has regressed 2x after fix attempts.\nOriginal failure: [description]\nCascade failure: [description]\nAttempted fixes: [list]\nRecommendation: Manual review required.\n\nAgent Disagreement Resolution\n\nWhen two agents disagree (e.g., Bug Fixer wants to change a file that Spec Verifier says shouldn't change):\n\nQuality Gate Enforcer acts as arbiter — It evaluates both proposed states\nThe change that results in more gates passing wins\nTie-breaking order:\nFewer files changed (prefer minimal diff)\nHigher confidence tier (prefer proven patterns)\nBug Fixer defers to Spec Verifier (specs are source of truth)\nPOST-EXECUTION LEARNING\n\nAfter each autonomous run, the skill triggers comprehensive learning:\n\n# Train on successful patterns\nnpx @claude-flow/cli@latest hooks post-task --task-id \"forge-run\" --success true --store-results true\n\n# Update neural patterns\nnpx @claude-flow/cli@latest neural train --pattern-type forge-fixes --epochs 5\n\n# Update defect predictions\nnpx @claude-flow/cli@latest memory store \\\n 
 --key \"prediction-$(date +%Y-%m-%d)\" \\\n  --value \"[prediction JSON from Learning Optimizer]\" \\\n  --namespace forge-predictions\n\n# Export metrics\nnpx @claude-flow/cli@latest hooks metrics --format json\n\nPROJECT-SPECIFIC EXTENSIONS\n\nForge can be extended per-project by creating a forge.contexts.yaml file alongside the skill:\n\n# forge.contexts.yaml — Project-specific bounded contexts and screens\ncontexts:\n  - name: identity\n    testFile: click_through_identity_full_test.dart\n    specFile: identity.feature\n    paths: 68\n    subdomains: [Auth, Profiles, Verification]\n    screens:\n      - name: Identity Verification\n        file: lib/screens/compliance/identity_verification_screen.dart\n        route: /verification\n        cyclomaticPaths:\n          - All verifications incomplete -> show progress 0%\n          - Email only verified -> show 25%\n          - All verified -> show 100% + celebration state\n\n  - name: payments\n    testFile: click_through_payments_test.dart\n    specFile: payments.feature\n    paths: 89\n    subdomains: [Wallet, Cards, Transactions]\n\ndependencies:\n  identity:\n    blocks: [rides, payments, driver]\n  payments:\n    depends_on: [identity]\n    blocks: [rides, subscriptions]\n\n\nThis separates the generic Forge engine from project-specific configuration, making Forge reusable across any codebase.\n\nQUICK REFERENCE CHECKLIST\n\nBefore running Forge:\n\n Backend built and running\n Health check passes\n Test data seeded via real API calls\n No mocking or stubbing in test code\n Gherkin specs exist for target context (or will be generated)\n All new screens/pages have test coverage\n Edge cases documented and tested\n\nAfter Forge completes:\n\n Gate 1 (Functional): All tests pass\n Gate 2 (Behavioral): All targeted Gherkin scenarios satisfied\n Gate 3 (Coverage): >=85% overall, >=95% critical paths\n Gate 4 (Security): No hardcoded secrets, no injection vectors, no critical CVEs\n Gate 5 (Accessibility): WCAG AA 
compliance checked\n Gate 6 (Resilience): Offline/timeout/error states tested\n Gate 7 (Contract): API responses match expected schemas\n Confidence tiers updated for all applied fix patterns\n Defect predictions updated for next run\n All fixes committed with detailed messages"
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/ikennaokpala/forge",
    "publisherUrl": "https://clawhub.ai/ikennaokpala/forge",
    "owner": "ikennaokpala",
    "version": "1.0.0",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/forge",
    "downloadUrl": "https://openagent3.xyz/downloads/forge",
    "agentUrl": "https://openagent3.xyz/skills/forge/agent",
    "manifestUrl": "https://openagent3.xyz/skills/forge/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/forge/agent.md"
  }
}