Requirements

- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Query, optimize, and administer ClickHouse OLAP databases with schema design, performance tuning, and data ingestion patterns.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Real-time analytics on billions of rows. Sub-second queries. No traditional indexes needed.
On first use, read setup.md for connection configuration.
User needs OLAP analytics, log analysis, time-series data, or real-time dashboards. Agent handles schema design, query optimization, data ingestion, and cluster administration.
Memory lives in `~/clickhouse/`. See memory-template.md for structure.

```
~/clickhouse/
├── memory.md    # Connection profiles + query patterns
├── schemas/     # Table definitions per database
└── queries/     # Saved analytical queries
```
| Topic | File |
| --- | --- |
| Setup & connection | setup.md |
| Memory template | memory-template.md |
| Query patterns | queries.md |
| Performance tuning | performance.md |
| Data ingestion | ingestion.md |
Every table needs an explicit engine. Default to the MergeTree family:

```sql
-- Time-series / logs
CREATE TABLE events (
    timestamp  DateTime,
    event_type String,
    data       String
) ENGINE = MergeTree()
ORDER BY (timestamp, event_type);

-- Aggregated metrics
CREATE TABLE daily_stats (
    date   Date,
    metric String,
    value  AggregateFunction(sum, UInt64)
) ENGINE = AggregatingMergeTree()
ORDER BY (date, metric);
```
ClickHouse has no traditional indexes. The ORDER BY clause determines data layout:

- Lead with the columns your queries filter on most often
- Put range columns (dates, timestamps) early
- Match your most common WHERE patterns

```sql
-- Good: filters by user_id, then date range
ORDER BY (user_id, date, event_type)

-- Bad: date first when you filter by user_id
ORDER BY (date, user_id, event_type)
```
| Use case | Type | Why |
| --- | --- | --- |
| Timestamps | DateTime or DateTime64 | Native time functions |
| Low-cardinality strings | LowCardinality(String) | 10x compression |
| Enums with few values | Enum8 or Enum16 | Smallest footprint |
| Nullable only if needed | Nullable(T) | Adds overhead |
| IPs | IPv4 or IPv6 | 4 bytes vs 16+ |
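The recommendations above can be combined in a single schema. A minimal sketch; the `access_log` table and its columns are illustrative, not part of the package:

```sql
-- Illustrative schema applying the recommended types
CREATE TABLE access_log (
    ts     DateTime64(3),           -- millisecond-precision timestamp
    status LowCardinality(String),  -- few distinct values, compresses well
    method Enum8('GET' = 1, 'POST' = 2, 'PUT' = 3, 'DELETE' = 4),
    ip     IPv4,                    -- 4 bytes instead of a string
    note   Nullable(String)         -- Nullable only where NULLs are real
) ENGINE = MergeTree()
ORDER BY (status, ts);
```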
Never insert row-by-row. ClickHouse is optimized for batch writes:

```bash
# Good: batch insert
clickhouse-client --query="INSERT INTO events FORMAT JSONEachRow" < batch.json
```

```
# Bad: individual inserts in a loop
for row in data:
    INSERT INTO events VALUES (...)
```

Minimum batch: 1,000 rows. Optimal: 10,000-100,000 rows.
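Batching can also happen at the SQL level: a single INSERT statement may carry many rows. A minimal sketch; the column values are illustrative:

```sql
-- One statement, many rows: ClickHouse creates one data part per insert,
-- so batching keeps the part count low
INSERT INTO events (timestamp, event_type, data) VALUES
    (now(), 'click', '{"page":"/home"}'),
    (now(), 'view',  '{"page":"/docs"}'),
    (now(), 'click', '{"page":"/pricing"}');
```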
Queries on ReplacingMergeTree/CollapsingMergeTree need FINAL for accuracy:

```sql
-- May return duplicates/old versions
SELECT * FROM users WHERE id = 123;

-- Guaranteed latest version
SELECT * FROM users FINAL WHERE id = 123;
```

FINAL has a performance cost. For dashboards, consider materialized views.
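For context, this is the kind of table that needs FINAL. A minimal sketch; the `users` schema and its `updated_at` version column are assumptions:

```sql
-- Rows sharing the same ORDER BY key are deduplicated at merge time;
-- the version column decides which row survives
CREATE TABLE users (
    id         UInt64,
    name       String,
    updated_at DateTime
) ENGINE = ReplacingMergeTree(updated_at)
ORDER BY id;
```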
Pre-aggregate expensive computations:

```sql
CREATE MATERIALIZED VIEW hourly_events
ENGINE = SummingMergeTree()
ORDER BY (hour, event_type)
AS SELECT
    toStartOfHour(timestamp) AS hour,
    event_type,
    count() AS events
FROM events
GROUP BY hour, event_type;
```
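Because SummingMergeTree collapses rows only during background merges, reads should still aggregate. A minimal sketch of querying the view above:

```sql
-- Re-aggregate at read time: unmerged parts may still hold
-- several rows per (hour, event_type)
SELECT hour, event_type, sum(events) AS events
FROM hourly_events
GROUP BY hour, event_type
ORDER BY hour;
```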
Before debugging, check system tables:

```sql
-- Running queries
SELECT * FROM system.processes;

-- Recent query performance
SELECT query, elapsed, read_rows, memory_usage
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY event_time DESC
LIMIT 10;

-- Table sizes
SELECT database, table, formatReadableSize(total_bytes) AS size
FROM system.tables
ORDER BY total_bytes DESC;
```
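If system.processes surfaces a runaway query, it can be stopped by id. A minimal sketch; the `query_id` and user values are illustrative:

```sql
-- Terminate a specific query found in system.processes
KILL QUERY WHERE query_id = 'abc-123';

-- Or everything from one user (use with care)
KILL QUERY WHERE user = 'dashboard_ro';
```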
- String instead of LowCardinality → 10x larger storage for status/type columns
- Wrong ORDER BY → full table scans instead of index lookups
- Row-by-row inserts → massive part fragmentation, slow writes
- Missing TTL → unbounded table growth, disk full
- `SELECT *` → reads all columns, kills the columnar advantage
- Nullable everywhere → overhead + NULL handling complexity
- Forgetting FINAL → stale/duplicate data in merge tables
Before running expensive queries:

1. Check EXPLAIN: `EXPLAIN SELECT ...` shows the execution plan
2. Sample first: `SELECT ... FROM table SAMPLE 0.01` for a 1% sample
3. Limit columns: only SELECT what you need
4. Use PREWHERE: filters before reading all columns
5. Check parts: `SELECT count() FROM system.parts WHERE table = 'X'`

```sql
-- PREWHERE optimization
SELECT user_id, event_type, data
FROM events
PREWHERE date = today()
WHERE event_type = 'click';
```
```sql
-- Delete old data
ALTER TABLE events MODIFY TTL timestamp + INTERVAL 90 DAY;

-- Move to cold storage
ALTER TABLE events MODIFY TTL timestamp + INTERVAL 30 DAY TO VOLUME 'cold';
```
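TTL can also be declared when the table is created, so retention is enforced from day one. A minimal sketch; the table name and retention period are illustrative:

```sql
-- Rows expire 90 days after their timestamp; ClickHouse drops
-- them during background merges
CREATE TABLE events_ttl (
    timestamp  DateTime,
    event_type String,
    data       String
) ENGINE = MergeTree()
ORDER BY (timestamp, event_type)
TTL timestamp + INTERVAL 90 DAY;
```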
```sql
SELECT
    database,
    table,
    formatReadableSize(sum(bytes_on_disk)) AS disk_size,
    sum(rows) AS total_rows,
    count() AS parts
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes_on_disk) DESC;
```
| Endpoint | Data sent | Purpose |
| --- | --- | --- |
| localhost:8123 | SQL queries | HTTP interface |
| localhost:9000 | SQL queries | Native TCP interface |

No external services are contacted. All queries run against user-specified ClickHouse instances.
Data saved locally (with user consent):

- Connection profiles (host, port, database) in `~/clickhouse/memory.md`
- Query patterns and schema documentation
- Authentication method preferences (password vs certificate)

Important: if you provide database passwords, they are stored in plain text in `~/clickhouse/`. Consider using environment variables or connection profiles managed by clickhouse-client instead.

This skill does NOT:

- Connect to any ClickHouse instance without explicit user configuration
- Send data to external services
- Automatically collect or store credentials without asking
Install with `clawhub install <slug>` if the user confirms:

- `sql` → SQL query patterns
- `analytics` → data analysis workflows
- `data-analysis` → structured data exploration
If useful: `clawhub star clickhouse`
Stay updated: `clawhub sync`
Code helpers, APIs, CLIs, browser automation, testing, and developer operations.
Largest current source with strong distribution and engagement signals.