Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Be one of the first to benchmark your agent's memory — and help shape how AI remembers. Runs a peer-review-grade evaluation suite (LLM-as-judge, nDCG/MAP/MRR...
This item's download entry currently bounces back to a listing or homepage instead of returning a package file, so use the source page and any available docs to guide a manual install.
I tried to install a skill package from Yavira, but the item currently does not return a direct package file. Inspect the source page and any extracted docs, then tell me what you can confirm and any manual steps still required.
I tried to upgrade a skill package from Yavira, but the item currently does not return a direct package file. Compare the source page and any extracted docs with my current installation, then summarize what changed and what manual follow-up I still need.
Collect, assess, and submit anonymized memory system statistics for the ENGRAM and CORTEX research papers.
Run the standard test set (30 queries across 4 types × 3 difficulty levels) with LLM-as-judge:

```bash
# Full assessment with GPT-4o-mini judge + ablation (recommended)
python3 scripts/rate.py --queries 30 --judge openai --ablation

# Without an OpenAI key: local embedding judge (weaker, marked in output)
python3 scripts/rate.py --queries 30 --judge local --ablation

# Custom test set
python3 scripts/rate.py --testset path/to/queries.json --judge openai
```

What it measures:
- RAR (Recall Accuracy Ratio) and MRR (Mean Reciprocal Rank)
- nDCG@5, MAP@5, Precision@5, Hit Rate
- All metrics include 95% bootstrap confidence intervals
- Ablation: runs with and without spreading activation to isolate its contribution

Judge methods:
- openai: GPT-4o-mini rates each (query, result) pair 1-5. Independent of the retrieval system. ~$0.01 per run.
- local: embedding cosine similarity. Weaker, and marked as such in output. Zero cost.

Standard test set (scripts/testset.json): 30 queries stratified across semantic/episodic/procedural/strategic types and easy/medium/hard difficulty, with no lexical overlap with stored memories. All deployments run the same queries for cross-site comparability.
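For intuition about the ranking metrics, here is a minimal sketch of MRR, nDCG@5, and a percentile-bootstrap 95% CI. It is illustrative only and not the repo's rate.py; the function names and toy relevance data are invented:

```python
import math
import random

def mrr(ranked_relevance):
    # Reciprocal rank of the first relevant result (0 if none appears).
    for i, rel in enumerate(ranked_relevance, start=1):
        if rel:
            return 1.0 / i
    return 0.0

def ndcg_at_k(gains, k=5):
    # Gains are graded relevance scores (e.g. judge ratings 1-5) in ranked order.
    dcg = sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))
    ideal = sorted(gains, reverse=True)
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal[:k], start=1))
    return dcg / idcg if idcg > 0 else 0.0

def bootstrap_ci(values, n=10_000, alpha=0.05):
    # Percentile bootstrap over per-query scores: resample with replacement,
    # take the mean of each resample, and read off the 2.5/97.5 percentiles.
    means = sorted(
        sum(random.choice(values) for _ in values) / len(values) for _ in range(n)
    )
    lo = means[int(alpha / 2 * n)]
    hi = means[int((1 - alpha / 2) * n) - 1]
    return sum(values) / len(values), (lo, hi)

per_query_ndcg = [ndcg_at_k(q) for q in [[3, 2, 0, 1, 0], [5, 4, 4, 0, 0]]]
mean, (lo, hi) = bootstrap_ci(per_query_ndcg)
print(f"nDCG@5 = {mean:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```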
Collect a report:

```bash
python3 scripts/collect.py --contributor GITHUB_USER --days 14 --output /tmp/memory-bench-report.json
```

Collected (anonymized): memory counts/types/ages, strength/importance histograms, association graph size, hierarchy levels, consolidation history, retrieval metrics (RAR/MRR/nDCG/MAP with CIs), ablation results, judge method, algorithm version, embedding coverage. The instance ID is a random UUID (not reversible).

Never collected: memory content, queries, file paths, usernames, hostnames.
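To make the anonymization concrete, here is a minimal sketch of the two ideas above. It is illustrative, not the actual collect.py; the ID file location and histogram bucket edges are assumptions:

```python
import json
import uuid
from collections import Counter
from pathlib import Path

ID_FILE = Path.home() / ".memory-bench-instance-id"  # hypothetical location

def instance_id() -> str:
    # A random UUID generated once and reused. It encodes nothing about the
    # host, user, or memory content, so it cannot be reversed.
    if not ID_FILE.exists():
        ID_FILE.write_text(str(uuid.uuid4()))
    return ID_FILE.read_text().strip()

def age_histogram(ages_in_days: list[int]) -> dict[str, int]:
    # Bucket memory ages; only counts leave the machine, never content.
    buckets = Counter()
    for age in ages_in_days:
        buckets["<7d" if age < 7 else "<30d" if age < 30 else ">=30d"] += 1
    return dict(buckets)

report = {"instance_id": instance_id(), "age_histogram": age_histogram([1, 3, 12, 90])}
print(json.dumps(report, indent=2))
```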
Submit:

```bash
scripts/submit.sh /tmp/memory-bench-report.json GITHUB_USERNAME
```

Forks, branches, places the report, updates INDEX.json, and opens a PR. Requires the gh CLI.
For peer-review-ready data, contributors should:
- Run rate.py --ablation --judge openai (minimum N=30 queries)
- Collect at least 2 reports from the same instance, ≥7 days apart (longitudinal); see the check sketched below
- Report the algorithm version (auto-captured from git)
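A quick pre-submission check for the longitudinal requirement. Note that the instance_id and collected_at field names are assumptions about the report schema, not documented fields:

```python
import json
from datetime import datetime

def is_longitudinal_pair(path_a: str, path_b: str) -> bool:
    a, b = (json.load(open(p)) for p in (path_a, path_b))
    # Same anonymized instance... (field names assumed, see above)
    if a["instance_id"] != b["instance_id"]:
        return False
    # ...collected at least 7 days apart.
    t_a, t_b = (datetime.fromisoformat(r["collected_at"]) for r in (a, b))
    return abs((t_b - t_a).days) >= 7

print(is_longitudinal_pair("report-week1.json", "report-week3.json"))
```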
Custom test sets are JSON arrays:

```json
[
  {
    "id": "T01",
    "query": "...",
    "category": "semantic|episodic|procedural|strategic",
    "difficulty": "easy|medium|hard"
  }
]
```
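Before passing a custom test set to rate.py --testset, a small validation sketch like the following can catch schema mistakes early (illustrative; not part of the repo's scripts):

```python
import json
import sys

CATEGORIES = {"semantic", "episodic", "procedural", "strategic"}
DIFFICULTIES = {"easy", "medium", "hard"}

def validate_testset(path: str) -> None:
    queries = json.load(open(path))
    assert isinstance(queries, list) and queries, "expected a non-empty JSON array"
    seen_ids = set()
    for q in queries:
        # Every entry needs the four documented keys with valid enum values.
        assert q.keys() >= {"id", "query", "category", "difficulty"}, f"missing keys: {q}"
        assert q["category"] in CATEGORIES, f"bad category in {q['id']}"
        assert q["difficulty"] in DIFFICULTIES, f"bad difficulty in {q['id']}"
        assert q["id"] not in seen_ids, f"duplicate id {q['id']}"
        seen_ids.add(q["id"])
    print(f"{path}: {len(queries)} queries OK")

if __name__ == "__main__":
    validate_testset(sys.argv[1])
```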
When asked to submit benchmarks:
1. Run rate.py --ablation --judge openai.
2. Run collect.py and review the summary.
3. Run submit.sh and share the PR link.
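A minimal sketch chaining those steps with the commands documented above (the summary review and final link-sharing still happen manually):

```python
import subprocess

REPORT = "/tmp/memory-bench-report.json"
USER = "GITHUB_USER"  # replace with the contributor's GitHub username

def run(*cmd: str) -> None:
    print("->", " ".join(cmd))
    subprocess.run(cmd, check=True)  # stop the chain on the first failure

# 1. Evaluate with the OpenAI judge and the ablation pass.
run("python3", "scripts/rate.py", "--queries", "30", "--judge", "openai", "--ablation")
# 2. Collect the anonymized report (review it before submitting).
run("python3", "scripts/collect.py", "--contributor", USER, "--days", "14", "--output", REPORT)
# 3. Submit: opens a PR via the gh CLI; share the resulting PR link.
run("scripts/submit.sh", REPORT, USER)
```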