{
  "schemaVersion": "1.0",
  "item": {
    "slug": "perf-profiler",
    "name": "Performance Profiler",
    "source": "tencent",
    "type": "skill",
    "category": "开发工具",
    "sourceUrl": "https://clawhub.ai/gitgoodordietrying/perf-profiler",
    "canonicalUrl": "https://clawhub.ai/gitgoodordietrying/perf-profiler",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/perf-profiler",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=perf-profiler",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "SKILL.md"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-30T16:55:25.780Z",
      "expiresAt": "2026-05-07T16:55:25.780Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
        "contentDisposition": "attachment; filename=\"network-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/perf-profiler"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/perf-profiler",
    "agentPageUrl": "https://openagent3.xyz/skills/perf-profiler/agent",
    "manifestUrl": "https://openagent3.xyz/skills/perf-profiler/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/perf-profiler/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "Performance Profiler",
        "body": "Measure, profile, and optimize application performance. Covers CPU profiling, memory analysis, flame graphs, benchmarking, load testing, and language-specific optimization patterns."
      },
      {
        "title": "When to Use",
        "body": "Diagnosing why an application or function is slow\nMeasuring CPU and memory usage\nGenerating flame graphs to visualize hot paths\nBenchmarking functions or endpoints\nLoad testing APIs before deployment\nFinding and fixing memory leaks\nOptimizing database query performance\nComparing performance before and after changes"
      },
      {
        "title": "Command-line timing",
        "body": "# Time any command\ntime my-command --flag\n\n# More precise: multiple runs with stats\nfor i in $(seq 1 10); do\n  /usr/bin/time -f \"%e\" my-command 2>&1\ndone | awk '{sum+=$1; sumsq+=$1*$1; count++} END {\n  avg=sum/count;\n  stddev=sqrt(sumsq/count - avg*avg);\n  printf \"runs=%d avg=%.3fs stddev=%.3fs\\n\", count, avg, stddev\n}'\n\n# Hyperfine (better benchmarking tool)\n# Install: https://github.com/sharkdp/hyperfine\nhyperfine 'command-a' 'command-b'\nhyperfine --warmup 3 --runs 20 'my-command'\nhyperfine --export-json results.json 'old-version' 'new-version'"
      },
      {
        "title": "Inline timing (any language)",
        "body": "// Node.js\nconsole.time('operation');\nawait doExpensiveThing();\nconsole.timeEnd('operation'); // \"operation: 142.3ms\"\n\n// High-resolution\nconst start = performance.now();\nawait doExpensiveThing();\nconst elapsed = performance.now() - start;\nconsole.log(`Elapsed: ${elapsed.toFixed(2)}ms`);\n\n# Python\nimport time\n\nstart = time.perf_counter()\ndo_expensive_thing()\nelapsed = time.perf_counter() - start\nprint(f\"Elapsed: {elapsed:.4f}s\")\n\n# Context manager\nfrom contextlib import contextmanager\n\n@contextmanager\ndef timer(label=\"\"):\n    start = time.perf_counter()\n    yield\n    elapsed = time.perf_counter() - start\n    print(f\"{label}: {elapsed:.4f}s\")\n\nwith timer(\"data processing\"):\n    process_data()\n\n// Go\nstart := time.Now()\ndoExpensiveThing()\nfmt.Printf(\"Elapsed: %v\\n\", time.Since(start))"
      },
      {
        "title": "CPU profiling with V8 inspector",
        "body": "# Generate CPU profile (writes .cpuprofile file)\nnode --cpu-prof app.js\n# Open the .cpuprofile in Chrome DevTools > Performance tab\n\n# Profile for a specific duration\nnode --cpu-prof --cpu-prof-interval=100 app.js\n\n# Inspect running process\nnode --inspect app.js\n# Open chrome://inspect in Chrome, click \"inspect\"\n# Go to Performance tab, click Record"
      },
      {
        "title": "Heap snapshots (memory)",
        "body": "# Generate heap snapshot\nnode --heap-prof app.js\n\n# Take snapshots programmatically\nnode -e \"\nconst v8 = require('v8');\nconst fs = require('fs');\n\n// Take snapshot\nconst snapshotStream = v8.writeHeapSnapshot();\nconsole.log('Heap snapshot written to:', snapshotStream);\n\"\n\n# Compare heap snapshots to find leaks:\n# 1. Take snapshot A (baseline)\n# 2. Run operations that might leak\n# 3. Take snapshot B\n# 4. In Chrome DevTools > Memory, load both and use \"Comparison\" view"
      },
      {
        "title": "Memory usage monitoring",
        "body": "// Print memory usage periodically\nsetInterval(() => {\n  const usage = process.memoryUsage();\n  console.log({\n    rss: `${(usage.rss / 1024 / 1024).toFixed(1)}MB`,\n    heapUsed: `${(usage.heapUsed / 1024 / 1024).toFixed(1)}MB`,\n    heapTotal: `${(usage.heapTotal / 1024 / 1024).toFixed(1)}MB`,\n    external: `${(usage.external / 1024 / 1024).toFixed(1)}MB`,\n  });\n}, 5000);\n\n// Detect memory growth\nlet lastHeap = 0;\nsetInterval(() => {\n  const heap = process.memoryUsage().heapUsed;\n  const delta = heap - lastHeap;\n  if (delta > 1024 * 1024) { // > 1MB growth\n    console.warn(`Heap grew by ${(delta / 1024 / 1024).toFixed(1)}MB`);\n  }\n  lastHeap = heap;\n}, 10000);"
      },
      {
        "title": "Node.js benchmarking",
        "body": "// Simple benchmark function\nfunction benchmark(name, fn, iterations = 10000) {\n  // Warmup\n  for (let i = 0; i < 100; i++) fn();\n\n  const start = performance.now();\n  for (let i = 0; i < iterations; i++) fn();\n  const elapsed = performance.now() - start;\n\n  console.log(`${name}: ${(elapsed / iterations).toFixed(4)}ms/op (${iterations} iterations in ${elapsed.toFixed(1)}ms)`);\n}\n\nbenchmark('JSON.parse', () => JSON.parse('{\"key\":\"value\",\"num\":42}'));\nbenchmark('regex match', () => /^\\d{4}-\\d{2}-\\d{2}$/.test('2026-02-03'));"
      },
      {
        "title": "cProfile (built-in CPU profiler)",
        "body": "# Profile a script\npython3 -m cProfile -s cumulative my_script.py\n\n# Save to file for analysis\npython3 -m cProfile -o profile.prof my_script.py\n\n# Analyze saved profile\npython3 -c \"\nimport pstats\nstats = pstats.Stats('profile.prof')\nstats.sort_stats('cumulative')\nstats.print_stats(20)\n\"\n\n# Profile a specific function\npython3 -c \"\nimport cProfile\nfrom my_module import expensive_function\n\ncProfile.run('expensive_function()', sort='cumulative')\n\""
      },
      {
        "title": "line_profiler (line-by-line)",
        "body": "# Install\npip install line_profiler\n\n# Add @profile decorator to functions of interest, then:\nkernprof -l -v my_script.py\n\n# Programmatic usage\nfrom line_profiler import LineProfiler\n\ndef process_data(data):\n    result = []\n    for item in data:           # Is this loop the bottleneck?\n        transformed = transform(item)\n        if validate(transformed):\n            result.append(transformed)\n    return result\n\nprofiler = LineProfiler()\nprofiler.add_function(process_data)\nprofiler.enable()\nprocess_data(large_dataset)\nprofiler.disable()\nprofiler.print_stats()"
      },
      {
        "title": "Memory profiling (Python)",
        "body": "# memory_profiler\npip install memory_profiler\n\n# Profile memory line-by-line\npython3 -m memory_profiler my_script.py\n\nfrom memory_profiler import profile\n\n@profile\ndef load_data():\n    data = []\n    for i in range(1000000):\n        data.append({'id': i, 'value': f'item_{i}'})\n    return data\n\n# Track memory over time\nimport tracemalloc\n\ntracemalloc.start()\n\n# ... run code ...\n\nsnapshot = tracemalloc.take_snapshot()\ntop_stats = snapshot.statistics('lineno')\nfor stat in top_stats[:10]:\n    print(stat)"
      },
      {
        "title": "Python benchmarking",
        "body": "import timeit\n\n# Time a statement\nresult = timeit.timeit('sorted(range(1000))', number=10000)\nprint(f\"sorted: {result:.4f}s for 10000 iterations\")\n\n# Compare two approaches\nsetup = \"data = list(range(10000))\"\nt1 = timeit.timeit('list(filter(lambda x: x % 2 == 0, data))', setup=setup, number=1000)\nt2 = timeit.timeit('[x for x in data if x % 2 == 0]', setup=setup, number=1000)\nprint(f\"filter: {t1:.4f}s  |  listcomp: {t2:.4f}s  |  speedup: {t1/t2:.2f}x\")\n\n# pytest-benchmark\n# pip install pytest-benchmark\n# def test_sort(benchmark):\n#     benchmark(sorted, list(range(1000)))"
      },
      {
        "title": "Built-in pprof",
        "body": "// Add to main.go for HTTP-accessible profiling\nimport (\n    \"net/http\"\n    _ \"net/http/pprof\"\n)\n\nfunc main() {\n    go func() {\n        http.ListenAndServe(\"localhost:6060\", nil)\n    }()\n    // ... rest of app\n}\n\n# CPU profile (30 seconds)\ngo tool pprof http://localhost:6060/debug/pprof/profile?seconds=30\n\n# Memory profile\ngo tool pprof http://localhost:6060/debug/pprof/heap\n\n# Goroutine profile\ngo tool pprof http://localhost:6060/debug/pprof/goroutine\n\n# Inside pprof interactive mode:\n# top 20          - top functions by CPU/memory\n# list funcName   - source code with annotations\n# web             - open flame graph in browser\n# png > out.png   - save call graph as image"
      },
      {
        "title": "Go benchmarks",
        "body": "// math_test.go\nfunc BenchmarkAdd(b *testing.B) {\n    for i := 0; i < b.N; i++ {\n        Add(42, 58)\n    }\n}\n\nfunc BenchmarkSort1000(b *testing.B) {\n    data := make([]int, 1000)\n    for i := range data {\n        data[i] = rand.Intn(1000)\n    }\n    b.ResetTimer()\n    for i := 0; i < b.N; i++ {\n        sort.Ints(append([]int{}, data...))\n    }\n}\n\n# Run benchmarks\ngo test -bench=. -benchmem ./...\n\n# Compare before/after\ngo test -bench=. -count=5 ./... > old.txt\n# ... make changes ...\ngo test -bench=. -count=5 ./... > new.txt\ngo install golang.org/x/perf/cmd/benchstat@latest\nbenchstat old.txt new.txt"
      },
      {
        "title": "Generate flame graphs",
        "body": "# Node.js: 0x (easiest)\nnpx 0x app.js\n# Opens interactive flame graph in browser\n\n# Node.js: clinic.js (comprehensive)\nnpx clinic flame -- node app.js\nnpx clinic doctor -- node app.js\nnpx clinic bubbleprof -- node app.js\n\n# Python: py-spy (sampling profiler, no code changes needed)\npip install py-spy\npy-spy record -o flame.svg -- python3 my_script.py\n\n# Profile running Python process\npy-spy record -o flame.svg --pid 12345\n\n# Go: built-in\ngo tool pprof -http=:8080 http://localhost:6060/debug/pprof/profile?seconds=30\n# Navigate to \"Flame Graph\" view\n\n# Linux (any process): perf + flamegraph\nperf record -g -p PID -- sleep 30\nperf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg"
      },
      {
        "title": "Reading flame graphs",
        "body": "Key concepts:\n- X-axis: NOT time. It's alphabetical sort of stack frames. Width = % of samples.\n- Y-axis: Stack depth. Top = leaf function (where CPU time is spent).\n- Wide bars at the top = hot functions (optimize these first).\n- Narrow tall stacks = deep call chains (may indicate excessive abstraction).\n\nWhat to look for:\n1. Wide plateaus at the top → function that dominates CPU time\n2. Multiple paths converging to one function → shared bottleneck\n3. GC/runtime frames taking significant width → memory pressure\n4. Unexpected functions appearing wide → performance bug"
      },
      {
        "title": "curl-based quick test",
        "body": "# Single request timing\ncurl -o /dev/null -s -w \"HTTP %{http_code} | Total: %{time_total}s | TTFB: %{time_starttransfer}s | Connect: %{time_connect}s\\n\" https://api.example.com/endpoint\n\n# Multiple requests in sequence\nfor i in $(seq 1 20); do\n  curl -o /dev/null -s -w \"%{time_total}\\n\" https://api.example.com/endpoint\ndone | awk '{sum+=$1; count++; if($1>max)max=$1} END {printf \"avg=%.3fs max=%.3fs n=%d\\n\", sum/count, max, count}'"
      },
      {
        "title": "Apache Bench (ab)",
        "body": "# 100 requests, 10 concurrent\nab -n 100 -c 10 http://localhost:3000/api/endpoint\n\n# With POST data\nab -n 100 -c 10 -p data.json -T application/json http://localhost:3000/api/endpoint\n\n# Key metrics to watch:\n# - Requests per second (throughput)\n# - Time per request (latency)\n# - Percentage of requests served within a certain time (p50, p90, p99)"
      },
      {
        "title": "wrk (modern load testing)",
        "body": "# Install: https://github.com/wg/wrk\n# 10 seconds, 4 threads, 100 connections\nwrk -t4 -c100 -d10s http://localhost:3000/api/endpoint\n\n# With Lua script for custom requests\nwrk -t4 -c100 -d10s -s post.lua http://localhost:3000/api/endpoint\n\n-- post.lua\nwrk.method = \"POST\"\nwrk.body   = '{\"key\": \"value\"}'\nwrk.headers[\"Content-Type\"] = \"application/json\"\n\n-- Custom request generation\nrequest = function()\n  local id = math.random(1, 10000)\n  local path = \"/api/users/\" .. id\n  return wrk.format(\"GET\", path)\nend"
      },
      {
        "title": "Autocannon (Node.js load testing)",
        "body": "npx autocannon -c 100 -d 10 http://localhost:3000/api/endpoint\nnpx autocannon -c 100 -d 10 -m POST -b '{\"key\":\"value\"}' -H 'Content-Type=application/json' http://localhost:3000/api/endpoint"
      },
      {
        "title": "EXPLAIN analysis",
        "body": "# PostgreSQL\npsql -c \"EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT * FROM orders WHERE user_id = 123;\"\n\n# MySQL\nmysql -e \"EXPLAIN SELECT * FROM orders WHERE user_id = 123;\" mydb\n\n# SQLite\nsqlite3 mydb.sqlite \"EXPLAIN QUERY PLAN SELECT * FROM orders WHERE user_id = 123;\""
      },
      {
        "title": "Slow query detection",
        "body": "# PostgreSQL: enable slow query logging\n# In postgresql.conf:\n# log_min_duration_statement = 100  (ms)\n\n# MySQL: slow query log\n# In my.cnf:\n# slow_query_log = 1\n# long_query_time = 0.1\n\n# Find queries missing indexes (PostgreSQL)\npsql -c \"\nSELECT schemaname, relname, seq_scan, seq_tup_read,\n       idx_scan, idx_tup_fetch,\n       seq_tup_read / GREATEST(seq_scan, 1) AS avg_rows_per_scan\nFROM pg_stat_user_tables\nWHERE seq_scan > 100 AND seq_tup_read / GREATEST(seq_scan, 1) > 1000\nORDER BY seq_tup_read DESC\nLIMIT 10;\n\""
      },
      {
        "title": "Node.js",
        "body": "// Track object counts over time\nconst v8 = require('v8');\n\nfunction checkMemory() {\n  const heap = v8.getHeapStatistics();\n  const usage = process.memoryUsage();\n  return {\n    heapUsedMB: (usage.heapUsed / 1024 / 1024).toFixed(1),\n    heapTotalMB: (usage.heapTotal / 1024 / 1024).toFixed(1),\n    rssMB: (usage.rss / 1024 / 1024).toFixed(1),\n    externalMB: (usage.external / 1024 / 1024).toFixed(1),\n    arrayBuffersMB: (usage.arrayBuffers / 1024 / 1024).toFixed(1),\n  };\n}\n\n// Sample every 10s, alert on growth\nlet baseline = process.memoryUsage().heapUsed;\nsetInterval(() => {\n  const current = process.memoryUsage().heapUsed;\n  const growthMB = (current - baseline) / 1024 / 1024;\n  if (growthMB > 50) {\n    console.warn(`Memory grew ${growthMB.toFixed(1)}MB since start`);\n    console.warn(checkMemory());\n  }\n}, 10000);"
      },
      {
        "title": "Common leak patterns",
        "body": "Node.js:\n- Event listeners not removed (emitter.on without emitter.off)\n- Closures capturing large objects in long-lived scopes\n- Global caches without eviction (Map/Set that only grows)\n- Unresolved promises accumulating\n\nPython:\n- Circular references (use weakref for caches)\n- Global lists/dicts that grow unbounded\n- File handles not closed (use context managers)\n- C extension objects not properly freed\n\nGo:\n- Goroutine leaks (goroutine started, never returns)\n- Forgotten channel listeners\n- Unclosed HTTP response bodies\n- Global maps that grow forever"
      },
      {
        "title": "Performance Comparison Script",
        "body": "#!/bin/bash\n# perf-compare.sh - Compare performance before/after a change\n# Usage: perf-compare.sh <command> [runs]\nCMD=\"${1:?Usage: perf-compare.sh <command> [runs]}\"\nRUNS=\"${2:-10}\"\n\necho \"Benchmarking: $CMD\"\necho \"Runs: $RUNS\"\necho \"\"\n\ntimes=()\nfor i in $(seq 1 \"$RUNS\"); do\n  start=$(date +%s%N)\n  eval \"$CMD\" > /dev/null 2>&1\n  end=$(date +%s%N)\n  elapsed=$(echo \"scale=3; ($end - $start) / 1000000\" | bc)\n  times+=(\"$elapsed\")\n  printf \"  Run %2d: %sms\\n\" \"$i\" \"$elapsed\"\ndone\n\necho \"\"\nprintf '%s\\n' \"${times[@]}\" | awk '{\n  sum += $1\n  sumsq += $1 * $1\n  if (NR == 1 || $1 < min) min = $1\n  if (NR == 1 || $1 > max) max = $1\n  count++\n} END {\n  avg = sum / count\n  stddev = sqrt(sumsq/count - avg*avg)\n  printf \"Results: avg=%.1fms min=%.1fms max=%.1fms stddev=%.1fms (n=%d)\\n\", avg, min, max, stddev, count\n}'"
      },
      {
        "title": "Tips",
        "body": "Profile before optimizing. Guessing where bottlenecks are is wrong more often than right. Measure first.\nOptimize the hot path. Flame graphs show you exactly which functions consume the most time. A 10% improvement in a function that takes 80% of CPU time is worth more than a 50% improvement in one that takes 2%.\nMemory and CPU are different problems. A memory leak can exist in fast code. A CPU bottleneck can exist in code with stable memory. Profile both independently.\nBenchmark under realistic conditions. Microbenchmarks (empty loops, single-function timing) can be misleading due to JIT optimization, caching, and branch prediction. Use realistic data and workloads.\np99 matters more than average. An API with 50ms average but 2s p99 has a tail latency problem. Always look at percentiles, not just averages.\nLoad test before shipping. ab, wrk, or autocannon for 60 seconds at expected peak traffic reveals problems that unit tests never will.\nGC pauses are real. In Node.js, Python, Go, and Java, garbage collection can cause latency spikes. If flame graphs show significant GC time, reduce allocation pressure (reuse objects, use object pools, avoid unnecessary copies).\nDatabase queries are usually the bottleneck. Before optimizing application code, run EXPLAIN on your slowest queries. An index can turn a 2-second query into 2ms."
      }
    ],
    "body": "Performance Profiler\n\nMeasure, profile, and optimize application performance. Covers CPU profiling, memory analysis, flame graphs, benchmarking, load testing, and language-specific optimization patterns.\n\nWhen to Use\nDiagnosing why an application or function is slow\nMeasuring CPU and memory usage\nGenerating flame graphs to visualize hot paths\nBenchmarking functions or endpoints\nLoad testing APIs before deployment\nFinding and fixing memory leaks\nOptimizing database query performance\nComparing performance before and after changes\nQuick Timing\nCommand-line timing\n# Time any command\ntime my-command --flag\n\n# More precise: multiple runs with stats\nfor i in $(seq 1 10); do\n  /usr/bin/time -f \"%e\" my-command 2>&1\ndone | awk '{sum+=$1; sumsq+=$1*$1; count++} END {\n  avg=sum/count;\n  stddev=sqrt(sumsq/count - avg*avg);\n  printf \"runs=%d avg=%.3fs stddev=%.3fs\\n\", count, avg, stddev\n}'\n\n# Hyperfine (better benchmarking tool)\n# Install: https://github.com/sharkdp/hyperfine\nhyperfine 'command-a' 'command-b'\nhyperfine --warmup 3 --runs 20 'my-command'\nhyperfine --export-json results.json 'old-version' 'new-version'\n\nInline timing (any language)\n// Node.js\nconsole.time('operation');\nawait doExpensiveThing();\nconsole.timeEnd('operation'); // \"operation: 142.3ms\"\n\n// High-resolution\nconst start = performance.now();\nawait doExpensiveThing();\nconst elapsed = performance.now() - start;\nconsole.log(`Elapsed: ${elapsed.toFixed(2)}ms`);\n\n# Python\nimport time\n\nstart = time.perf_counter()\ndo_expensive_thing()\nelapsed = time.perf_counter() - start\nprint(f\"Elapsed: {elapsed:.4f}s\")\n\n# Context manager\nfrom contextlib import contextmanager\n\n@contextmanager\ndef timer(label=\"\"):\n    start = time.perf_counter()\n    yield\n    elapsed = time.perf_counter() - start\n    print(f\"{label}: {elapsed:.4f}s\")\n\nwith timer(\"data processing\"):\n    process_data()\n\n// Go\nstart := 
time.Now()\ndoExpensiveThing()\nfmt.Printf(\"Elapsed: %v\\n\", time.Since(start))\n\nNode.js Profiling\nCPU profiling with V8 inspector\n# Generate CPU profile (writes .cpuprofile file)\nnode --cpu-prof app.js\n# Open the .cpuprofile in Chrome DevTools > Performance tab\n\n# Profile for a specific duration\nnode --cpu-prof --cpu-prof-interval=100 app.js\n\n# Inspect running process\nnode --inspect app.js\n# Open chrome://inspect in Chrome, click \"inspect\"\n# Go to Performance tab, click Record\n\nHeap snapshots (memory)\n# Generate heap snapshot\nnode --heap-prof app.js\n\n# Take snapshots programmatically\nnode -e \"\nconst v8 = require('v8');\nconst fs = require('fs');\n\n// Take snapshot\nconst snapshotStream = v8.writeHeapSnapshot();\nconsole.log('Heap snapshot written to:', snapshotStream);\n\"\n\n# Compare heap snapshots to find leaks:\n# 1. Take snapshot A (baseline)\n# 2. Run operations that might leak\n# 3. Take snapshot B\n# 4. In Chrome DevTools > Memory, load both and use \"Comparison\" view\n\nMemory usage monitoring\n// Print memory usage periodically\nsetInterval(() => {\n  const usage = process.memoryUsage();\n  console.log({\n    rss: `${(usage.rss / 1024 / 1024).toFixed(1)}MB`,\n    heapUsed: `${(usage.heapUsed / 1024 / 1024).toFixed(1)}MB`,\n    heapTotal: `${(usage.heapTotal / 1024 / 1024).toFixed(1)}MB`,\n    external: `${(usage.external / 1024 / 1024).toFixed(1)}MB`,\n  });\n}, 5000);\n\n// Detect memory growth\nlet lastHeap = 0;\nsetInterval(() => {\n  const heap = process.memoryUsage().heapUsed;\n  const delta = heap - lastHeap;\n  if (delta > 1024 * 1024) { // > 1MB growth\n    console.warn(`Heap grew by ${(delta / 1024 / 1024).toFixed(1)}MB`);\n  }\n  lastHeap = heap;\n}, 10000);\n\nNode.js benchmarking\n// Simple benchmark function\nfunction benchmark(name, fn, iterations = 10000) {\n  // Warmup\n  for (let i = 0; i < 100; i++) fn();\n\n  const start = performance.now();\n  for (let i = 0; i < iterations; i++) fn();\n  const elapsed = 
performance.now() - start;\n\n  console.log(`${name}: ${(elapsed / iterations).toFixed(4)}ms/op (${iterations} iterations in ${elapsed.toFixed(1)}ms)`);\n}\n\nbenchmark('JSON.parse', () => JSON.parse('{\"key\":\"value\",\"num\":42}'));\nbenchmark('regex match', () => /^\\d{4}-\\d{2}-\\d{2}$/.test('2026-02-03'));\n\nPython Profiling\ncProfile (built-in CPU profiler)\n# Profile a script\npython3 -m cProfile -s cumulative my_script.py\n\n# Save to file for analysis\npython3 -m cProfile -o profile.prof my_script.py\n\n# Analyze saved profile\npython3 -c \"\nimport pstats\nstats = pstats.Stats('profile.prof')\nstats.sort_stats('cumulative')\nstats.print_stats(20)\n\"\n\n# Profile a specific function\npython3 -c \"\nimport cProfile\nfrom my_module import expensive_function\n\ncProfile.run('expensive_function()', sort='cumulative')\n\"\n\nline_profiler (line-by-line)\n# Install\npip install line_profiler\n\n# Add @profile decorator to functions of interest, then:\nkernprof -l -v my_script.py\n\n# Programmatic usage\nfrom line_profiler import LineProfiler\n\ndef process_data(data):\n    result = []\n    for item in data:           # Is this loop the bottleneck?\n        transformed = transform(item)\n        if validate(transformed):\n            result.append(transformed)\n    return result\n\nprofiler = LineProfiler()\nprofiler.add_function(process_data)\nprofiler.enable()\nprocess_data(large_dataset)\nprofiler.disable()\nprofiler.print_stats()\n\nMemory profiling (Python)\n# memory_profiler\npip install memory_profiler\n\n# Profile memory line-by-line\npython3 -m memory_profiler my_script.py\n\nfrom memory_profiler import profile\n\n@profile\ndef load_data():\n    data = []\n    for i in range(1000000):\n        data.append({'id': i, 'value': f'item_{i}'})\n    return data\n\n# Track memory over time\nimport tracemalloc\n\ntracemalloc.start()\n\n# ... 
run code ...\n\nsnapshot = tracemalloc.take_snapshot()\ntop_stats = snapshot.statistics('lineno')\nfor stat in top_stats[:10]:\n    print(stat)\n\nPython benchmarking\nimport timeit\n\n# Time a statement\nresult = timeit.timeit('sorted(range(1000))', number=10000)\nprint(f\"sorted: {result:.4f}s for 10000 iterations\")\n\n# Compare two approaches\nsetup = \"data = list(range(10000))\"\nt1 = timeit.timeit('list(filter(lambda x: x % 2 == 0, data))', setup=setup, number=1000)\nt2 = timeit.timeit('[x for x in data if x % 2 == 0]', setup=setup, number=1000)\nprint(f\"filter: {t1:.4f}s  |  listcomp: {t2:.4f}s  |  speedup: {t1/t2:.2f}x\")\n\n# pytest-benchmark\n# pip install pytest-benchmark\n# def test_sort(benchmark):\n#     benchmark(sorted, list(range(1000)))\n\nGo Profiling\nBuilt-in pprof\n// Add to main.go for HTTP-accessible profiling\nimport (\n    \"net/http\"\n    _ \"net/http/pprof\"\n)\n\nfunc main() {\n    go func() {\n        http.ListenAndServe(\"localhost:6060\", nil)\n    }()\n    // ... rest of app\n}\n\n# CPU profile (30 seconds)\ngo tool pprof http://localhost:6060/debug/pprof/profile?seconds=30\n\n# Memory profile\ngo tool pprof http://localhost:6060/debug/pprof/heap\n\n# Goroutine profile\ngo tool pprof http://localhost:6060/debug/pprof/goroutine\n\n# Inside pprof interactive mode:\n# top 20          - top functions by CPU/memory\n# list funcName   - source code with annotations\n# web             - open flame graph in browser\n# png > out.png   - save call graph as image\n\nGo benchmarks\n// math_test.go\nfunc BenchmarkAdd(b *testing.B) {\n    for i := 0; i < b.N; i++ {\n        Add(42, 58)\n    }\n}\n\nfunc BenchmarkSort1000(b *testing.B) {\n    data := make([]int, 1000)\n    for i := range data {\n        data[i] = rand.Intn(1000)\n    }\n    b.ResetTimer()\n    for i := 0; i < b.N; i++ {\n        sort.Ints(append([]int{}, data...))\n    }\n}\n\n# Run benchmarks\ngo test -bench=. -benchmem ./...\n\n# Compare before/after\ngo test -bench=. 
-count=5 ./... > old.txt\n# ... make changes ...\ngo test -bench=. -count=5 ./... > new.txt\ngo install golang.org/x/perf/cmd/benchstat@latest\nbenchstat old.txt new.txt\n\nFlame Graphs\nGenerate flame graphs\n# Node.js: 0x (easiest)\nnpx 0x app.js\n# Opens interactive flame graph in browser\n\n# Node.js: clinic.js (comprehensive)\nnpx clinic flame -- node app.js\nnpx clinic doctor -- node app.js\nnpx clinic bubbleprof -- node app.js\n\n# Python: py-spy (sampling profiler, no code changes needed)\npip install py-spy\npy-spy record -o flame.svg -- python3 my_script.py\n\n# Profile running Python process\npy-spy record -o flame.svg --pid 12345\n\n# Go: built-in\ngo tool pprof -http=:8080 http://localhost:6060/debug/pprof/profile?seconds=30\n# Navigate to \"Flame Graph\" view\n\n# Linux (any process): perf + flamegraph\nperf record -g -p PID -- sleep 30\nperf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg\n\nReading flame graphs\nKey concepts:\n- X-axis: NOT time. It's alphabetical sort of stack frames. Width = % of samples.\n- Y-axis: Stack depth. Top = leaf function (where CPU time is spent).\n- Wide bars at the top = hot functions (optimize these first).\n- Narrow tall stacks = deep call chains (may indicate excessive abstraction).\n\nWhat to look for:\n1. Wide plateaus at the top → function that dominates CPU time\n2. Multiple paths converging to one function → shared bottleneck\n3. GC/runtime frames taking significant width → memory pressure\n4. 
Unexpected functions appearing wide → performance bug\n\nLoad Testing\ncurl-based quick test\n# Single request timing\ncurl -o /dev/null -s -w \"HTTP %{http_code} | Total: %{time_total}s | TTFB: %{time_starttransfer}s | Connect: %{time_connect}s\\n\" https://api.example.com/endpoint\n\n# Multiple requests in sequence\nfor i in $(seq 1 20); do\n  curl -o /dev/null -s -w \"%{time_total}\\n\" https://api.example.com/endpoint\ndone | awk '{sum+=$1; count++; if($1>max)max=$1} END {printf \"avg=%.3fs max=%.3fs n=%d\\n\", sum/count, max, count}'\n\nApache Bench (ab)\n# 100 requests, 10 concurrent\nab -n 100 -c 10 http://localhost:3000/api/endpoint\n\n# With POST data\nab -n 100 -c 10 -p data.json -T application/json http://localhost:3000/api/endpoint\n\n# Key metrics to watch:\n# - Requests per second (throughput)\n# - Time per request (latency)\n# - Percentage of requests served within a certain time (p50, p90, p99)\n\nwrk (modern load testing)\n# Install: https://github.com/wg/wrk\n# 10 seconds, 4 threads, 100 connections\nwrk -t4 -c100 -d10s http://localhost:3000/api/endpoint\n\n# With Lua script for custom requests\nwrk -t4 -c100 -d10s -s post.lua http://localhost:3000/api/endpoint\n\n-- post.lua\nwrk.method = \"POST\"\nwrk.body   = '{\"key\": \"value\"}'\nwrk.headers[\"Content-Type\"] = \"application/json\"\n\n-- Custom request generation\nrequest = function()\n  local id = math.random(1, 10000)\n  local path = \"/api/users/\" .. 
id\n  return wrk.format(\"GET\", path)\nend\n\nAutocannon (Node.js load testing)\nnpx autocannon -c 100 -d 10 http://localhost:3000/api/endpoint\nnpx autocannon -c 100 -d 10 -m POST -b '{\"key\":\"value\"}' -H 'Content-Type=application/json' http://localhost:3000/api/endpoint\n\nDatabase Query Performance\nEXPLAIN analysis\n# PostgreSQL\npsql -c \"EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT * FROM orders WHERE user_id = 123;\"\n\n# MySQL\nmysql -e \"EXPLAIN SELECT * FROM orders WHERE user_id = 123;\" mydb\n\n# SQLite\nsqlite3 mydb.sqlite \"EXPLAIN QUERY PLAN SELECT * FROM orders WHERE user_id = 123;\"\n\nSlow query detection\n# PostgreSQL: enable slow query logging\n# In postgresql.conf:\n# log_min_duration_statement = 100  (ms)\n\n# MySQL: slow query log\n# In my.cnf:\n# slow_query_log = 1\n# long_query_time = 0.1\n\n# Find queries missing indexes (PostgreSQL)\npsql -c \"\nSELECT schemaname, relname, seq_scan, seq_tup_read,\n       idx_scan, idx_tup_fetch,\n       seq_tup_read / GREATEST(seq_scan, 1) AS avg_rows_per_scan\nFROM pg_stat_user_tables\nWHERE seq_scan > 100 AND seq_tup_read / GREATEST(seq_scan, 1) > 1000\nORDER BY seq_tup_read DESC\nLIMIT 10;\n\"\n\nMemory Leak Detection Patterns\nNode.js\n// Track heap usage over time\nfunction checkMemory() {\n  const usage = process.memoryUsage();\n  return {\n    heapUsedMB: (usage.heapUsed / 1024 / 1024).toFixed(1),\n    heapTotalMB: (usage.heapTotal / 1024 / 1024).toFixed(1),\n    rssMB: (usage.rss / 1024 / 1024).toFixed(1),\n    externalMB: (usage.external / 1024 / 1024).toFixed(1),\n    arrayBuffersMB: (usage.arrayBuffers / 1024 / 1024).toFixed(1),\n  };\n}\n\n// Sample every 10s, alert on growth\nlet baseline = process.memoryUsage().heapUsed;\nsetInterval(() => {\n  const current = process.memoryUsage().heapUsed;\n  const growthMB = (current - baseline) / 1024 / 1024;\n  if (growthMB > 50) {\n    console.warn(`Memory grew 
${growthMB.toFixed(1)}MB since start`);\n    console.warn(checkMemory());\n  }\n}, 10000);\n\nCommon leak patterns\nNode.js:\n- Event listeners not removed (emitter.on without emitter.off)\n- Closures capturing large objects in long-lived scopes\n- Global caches without eviction (Map/Set that only grows)\n- Unresolved promises accumulating\n\nPython:\n- Circular references (use weakref for caches)\n- Global lists/dicts that grow unbounded\n- File handles not closed (use context managers)\n- C extension objects not properly freed\n\nGo:\n- Goroutine leaks (goroutine started, never returns)\n- Forgotten channel listeners\n- Unclosed HTTP response bodies\n- Global maps that grow forever\n\nPerformance Comparison Script\n#!/bin/bash\n# perf-compare.sh - Compare performance before/after a change\n# Usage: perf-compare.sh <command> [runs]\nCMD=\"${1:?Usage: perf-compare.sh <command> [runs]}\"\nRUNS=\"${2:-10}\"\n\necho \"Benchmarking: $CMD\"\necho \"Runs: $RUNS\"\necho \"\"\n\ntimes=()\nfor i in $(seq 1 \"$RUNS\"); do\n  start=$(date +%s%N)\n  eval \"$CMD\" > /dev/null 2>&1\n  end=$(date +%s%N)\n  elapsed=$(echo \"scale=3; ($end - $start) / 1000000\" | bc)\n  times+=(\"$elapsed\")\n  printf \"  Run %2d: %sms\\n\" \"$i\" \"$elapsed\"\ndone\n\necho \"\"\nprintf '%s\\n' \"${times[@]}\" | awk '{\n  sum += $1\n  sumsq += $1 * $1\n  if (NR == 1 || $1 < min) min = $1\n  if (NR == 1 || $1 > max) max = $1\n  count++\n} END {\n  avg = sum / count\n  stddev = sqrt(sumsq/count - avg*avg)\n  printf \"Results: avg=%.1fms min=%.1fms max=%.1fms stddev=%.1fms (n=%d)\\n\", avg, min, max, stddev, count\n}'\n\nTips\nProfile before optimizing. Guessing where bottlenecks are is wrong more often than right. Measure first.\nOptimize the hot path. Flame graphs show you exactly which functions consume the most time. A 10% improvement in a function that takes 80% of CPU time is worth more than a 50% improvement in one that takes 2%.\nMemory and CPU are different problems. 
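The avg/min/max/stddev summary that perf-compare.sh computes in awk can also be done in-process. A stdlib-only Python sketch (the bench helper is hypothetical, not part of this skill):

```python
import statistics
import time

def bench(fn, runs=10):
    """Time fn over several runs and summarize the results in milliseconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append((time.perf_counter() - start) * 1000)
    return {
        "avg": statistics.mean(times),
        "min": min(times),
        "max": max(times),
        # Population stddev, matching the awk formula sqrt(sumsq/n - avg^2).
        "stddev": statistics.pstdev(times),
    }

stats = bench(lambda: sum(range(100_000)))
print({k: round(v, 2) for k, v in stats.items()})
```

As with the shell version, compare against a saved baseline rather than eyeballing one number; the stddev tells you whether an observed before/after difference is larger than run-to-run noise.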
A memory leak can exist in fast code. A CPU bottleneck can exist in code with stable memory. Profile both independently.\nBenchmark under realistic conditions. Microbenchmarks (empty loops, single-function timing) can be misleading due to JIT optimization, caching, and branch prediction. Use realistic data and workloads.\np99 matters more than average. An API with 50ms average but 2s p99 has a tail latency problem. Always look at percentiles, not just averages.\nLoad test before shipping. ab, wrk, or autocannon for 60 seconds at expected peak traffic reveals problems that unit tests never will.\nGC pauses are real. In Node.js, Python, Go, and Java, garbage collection can cause latency spikes. If flame graphs show significant GC time, reduce allocation pressure (reuse objects, use object pools, avoid unnecessary copies).\nDatabase queries are usually the bottleneck. Before optimizing application code, run EXPLAIN on your slowest queries. An index can turn a 2-second query into 2ms."
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/gitgoodordietrying/perf-profiler",
    "publisherUrl": "https://clawhub.ai/gitgoodordietrying/perf-profiler",
    "owner": "gitgoodordietrying",
    "version": "1.0.0",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/perf-profiler",
    "downloadUrl": "https://openagent3.xyz/downloads/perf-profiler",
    "agentUrl": "https://openagent3.xyz/skills/perf-profiler/agent",
    "manifestUrl": "https://openagent3.xyz/skills/perf-profiler/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/perf-profiler/agent.md"
  }
}