Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Database Engineering Mastery covers schema design, indexing, query optimization, and migration for PostgreSQL, MySQL, and SQLite, supporting both OLTP and OLAP workloads.
Hand the extracted package to your coding agent with a concrete install brief rather than working through the steps manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
Complete database design, optimization, migration, and operations system. From schema design to production monitoring — covers PostgreSQL, MySQL, SQLite, and general SQL patterns.
Before writing any DDL, fill this out:

project: ""
domain: ""
primary_use_case: "OLTP | OLAP | mixed"
expected_scale:
  rows_year_1: ""
  rows_year_3: ""
  concurrent_users: ""
read_write_ratio: "80:20 | 50:50 | 20:80"
compliance: []  # GDPR, HIPAA, PCI-DSS, SOX
multi_tenancy: "none | schema-per-tenant | row-level | database-per-tenant"
| Form | Rule | When to Denormalize |
| --- | --- | --- |
| 1NF | No repeating groups, atomic values | Never skip |
| 2NF | No partial dependencies on composite keys | Never skip |
| 3NF | No transitive dependencies | Reporting tables, read-heavy aggregations |
| BCNF | Every determinant is a candidate key | Rarely needed unless complex key relationships |

Denormalization triggers:
- Query joins > 4 tables consistently
- Read latency > 100ms on indexed queries
- Cache invalidation complexity exceeds denormalization maintenance
- Reporting queries block OLTP workloads
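When those triggers fire, a materialized view is often the least invasive way to denormalize. A minimal sketch, assuming hypothetical orders and order_items tables (not part of the templates below):

-- Denormalized daily revenue rollup so reporting queries stop joining OLTP tables
CREATE MATERIALIZED VIEW daily_order_totals AS
SELECT DATE_TRUNC('day', o.created_at) AS day,
       COUNT(DISTINCT o.id) AS order_count,
       SUM(oi.quantity * oi.unit_price) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
GROUP BY 1;

-- Unique index enables REFRESH ... CONCURRENTLY; refresh on a schedule (cron or pg_cron)
CREATE UNIQUE INDEX idx_daily_order_totals_day ON daily_order_totals(day);
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_order_totals;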
- Tables: snake_case, plural (users, order_items, payment_methods)
- Columns: snake_case, singular (first_name, created_at, is_active)
- PKs: id (bigint/uuid) or {table_singular}_id
- FKs: {referenced_table_singular}_id
- Indexes: idx_{table}_{columns}
- Constraints: chk_{table}_{rule}, uq_{table}_{columns}, fk_{table}_{ref}
- Enums: Use VARCHAR + CHECK, not DB enums (easier to migrate)
- Booleans: is_, has_, can_ prefix (is_active, has_subscription)
- Timestamps: _at suffix (created_at, updated_at, deleted_at)
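A short illustration of these conventions on a hypothetical payment_methods table (the names are examples, not part of the templates that follow):

CREATE TABLE payment_methods (
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL REFERENCES users(id),   -- FK: {referenced_table_singular}_id
    method_type VARCHAR(20) NOT NULL,                -- VARCHAR + CHECK instead of a DB enum
    is_active BOOLEAN NOT NULL DEFAULT true,         -- boolean with is_ prefix
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),   -- timestamp with _at suffix
    CONSTRAINT chk_payment_methods_type CHECK (method_type IN ('card', 'bank', 'wallet')),
    CONSTRAINT uq_payment_methods_user_type UNIQUE (user_id, method_type)
);
CREATE INDEX idx_payment_methods_user_id ON payment_methods(user_id);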
- Text < 255 chars, fixed set? → VARCHAR(N) + CHECK
- Text < 255 chars, variable? → VARCHAR(255)
- Text > 255 chars? → TEXT
- Whole numbers < 2B? → INTEGER
- Whole numbers > 2B? → BIGINT
- Money/financial? → NUMERIC(precision, scale) — NEVER float
- True/false? → BOOLEAN
- Date only? → DATE
- Date + time? → TIMESTAMPTZ (always with timezone)
- Unique identifier? → UUID (distributed) or BIGSERIAL (single DB)
- JSON/flexible schema? → JSONB (Postgres) or JSON (MySQL)
- Binary/file? → Store in object storage, reference by URL
- IP address? → INET (Postgres) or VARCHAR(45)
- Geospatial? → PostGIS geometry/geography types
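Applying the decision list to a hypothetical invoices table (illustrative only; gen_random_uuid() assumes PostgreSQL 13+ or the pgcrypto extension):

CREATE TABLE invoices (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),   -- distributed-friendly identifier
    amount NUMERIC(12, 2) NOT NULL,                  -- money: NUMERIC, never float
    currency VARCHAR(3) NOT NULL CHECK (currency IN ('USD', 'EUR', 'GBP')),
    is_paid BOOLEAN NOT NULL DEFAULT false,
    due_on DATE,                                     -- date only, no time component
    notes TEXT,                                      -- unbounded text
    metadata JSONB NOT NULL DEFAULT '{}',            -- flexible schema
    client_ip INET,
    issued_at TIMESTAMPTZ NOT NULL DEFAULT NOW()     -- always timezone-aware
);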
CREATE TABLE {table_name} (
    id BIGSERIAL PRIMARY KEY,
    -- domain columns here
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by BIGINT REFERENCES users(id),
    version INTEGER NOT NULL DEFAULT 1, -- optimistic locking
    -- soft delete (optional)
    deleted_at TIMESTAMPTZ,
    -- multi-tenant (optional)
    tenant_id BIGINT NOT NULL REFERENCES tenants(id)
);

-- updated_at trigger (PostgreSQL)
CREATE OR REPLACE FUNCTION update_modified_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = NOW();
    NEW.version = OLD.version + 1;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_{table_name}_updated
    BEFORE UPDATE ON {table_name}
    FOR EACH ROW EXECUTE FUNCTION update_modified_column();
One-to-Many:

-- Parent
CREATE TABLE departments (
    id BIGSERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

-- Child
CREATE TABLE employees (
    id BIGSERIAL PRIMARY KEY,
    department_id BIGINT NOT NULL REFERENCES departments(id) ON DELETE RESTRICT
    -- ON DELETE options: RESTRICT (safe default), CASCADE (children die), SET NULL
);
CREATE INDEX idx_employees_department_id ON employees(department_id);

Many-to-Many:

CREATE TABLE user_roles (
    user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    role_id BIGINT NOT NULL REFERENCES roles(id) ON DELETE CASCADE,
    granted_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    granted_by BIGINT REFERENCES users(id),
    PRIMARY KEY (user_id, role_id)
);

Self-Referencing (hierarchy):

CREATE TABLE categories (
    id BIGSERIAL PRIMARY KEY,
    parent_id BIGINT REFERENCES categories(id) ON DELETE CASCADE,
    name VARCHAR(100) NOT NULL,
    depth INTEGER NOT NULL DEFAULT 0,
    path TEXT NOT NULL DEFAULT '' -- materialized path: '/1/5/12/'
);
CREATE INDEX idx_categories_parent ON categories(parent_id);
CREATE INDEX idx_categories_path ON categories(path text_pattern_ops);

Polymorphic (avoid if possible, use if you must):

-- Preferred: separate FKs
CREATE TABLE comments (
    id BIGSERIAL PRIMARY KEY,
    post_id BIGINT REFERENCES posts(id),
    ticket_id BIGINT REFERENCES tickets(id),
    body TEXT NOT NULL,
    CONSTRAINT chk_one_parent CHECK (
        (post_id IS NOT NULL)::int + (ticket_id IS NOT NULL)::int = 1
    )
);
| Index Type | Use When | Example |
| --- | --- | --- |
| B-tree (default) | Equality, range, sorting, LIKE 'prefix%' | CREATE INDEX idx_users_email ON users(email) |
| Hash | Equality only, no range | CREATE INDEX idx_sessions_token ON sessions USING hash(token) |
| GIN | JSONB, full-text search, arrays, tsvector | CREATE INDEX idx_products_tags ON products USING gin(tags) |
| GiST | Geospatial, range types, nearest-neighbor | CREATE INDEX idx_locations_geom ON locations USING gist(geom) |
| BRIN | Very large tables with natural ordering (time-series) | CREATE INDEX idx_events_created ON events USING brin(created_at) |
| Partial | Subset of rows | CREATE INDEX idx_orders_pending ON orders(created_at) WHERE status = 'pending' |
| Covering | Include columns to avoid table lookup | CREATE INDEX idx_orders_user ON orders(user_id) INCLUDE (status, total) |
- Always index: foreign keys, columns in WHERE/JOIN/ORDER BY
- Never index: low-cardinality columns alone (boolean, status with 3 values) — combine in a composite instead (see the sketch below)
- Composite order: most selective column first, then left-to-right matches query patterns
- Watch write overhead: each index slows INSERT/UPDATE; >8 indexes on a write-heavy table = review
- Unused index audit: run monthly — drop indexes with 0 scans
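A minimal sketch of the composite-order and partial-index rules, assuming a hypothetical orders table with customer_id and status columns:

-- Composite: most selective column first; matches
-- WHERE customer_id = ? AND status = ? ... ORDER BY created_at
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status, created_at);

-- Partial: don't index a 3-value status alone; narrow the index to the
-- hot subset the application actually polls
CREATE INDEX idx_orders_open ON orders(created_at)
    WHERE status IN ('pending', 'processing');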
SELECT schemaname, relname AS tablename, indexrelname AS indexname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelid NOT IN (
      SELECT conindid FROM pg_constraint WHERE contype IN ('p', 'u')
  )
ORDER BY pg_relation_size(indexrelid) DESC;
SELECT relname, seq_scan, seq_tup_read, idx_scan,
       seq_tup_read / GREATEST(seq_scan, 1) AS avg_tuples_per_scan
FROM pg_stat_user_tables
WHERE seq_scan > 100 AND seq_tup_read > 10000
ORDER BY seq_tup_read DESC;
-- High seq_scan + high seq_tup_read = missing index candidate
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...;

Red flags in query plans:

| Pattern | Problem | Fix |
| --- | --- | --- |
| Seq Scan on large table | Missing index | Add appropriate index |
| Nested Loop with large outer | O(n×m) join | Add index on join column, consider Hash Join |
| Sort with high cost | Missing index for ORDER BY | Add index matching sort order |
| Hash Join spilling to disk | work_mem too low | Increase work_mem or reduce result set |
| Bitmap Heap Scan with many recheck | Low selectivity index | More selective index or partial index |
| SubPlan (correlated subquery) | Executes per row | Rewrite as JOIN or lateral |
| Rows estimate wildly wrong | Stale statistics | ANALYZE table |
1. SELECT * in production:

-- Bad: fetches all columns, breaks covering indexes
SELECT * FROM orders WHERE user_id = 123;
-- Good: explicit columns
SELECT id, status, total, created_at FROM orders WHERE user_id = 123;

2. N+1 queries:

-- Bad: 1 query for users + N queries for orders
SELECT id FROM users WHERE active = true;  -- returns 100 rows
SELECT * FROM orders WHERE user_id = ?;    -- called 100 times
-- Good: single JOIN or IN
SELECT u.id, o.id, o.total
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.active = true;

3. Functions on indexed columns:

-- Bad: can't use index on created_at
WHERE EXTRACT(YEAR FROM created_at) = 2025
-- Good: range scan uses index
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'

-- Bad: can't use index on email
WHERE LOWER(email) = 'user@example.com'
-- Good: expression index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

4. OR conditions killing indexes:

-- Bad: often causes Seq Scan
WHERE status = 'pending' OR status = 'processing'
-- Good: IN uses index
WHERE status IN ('pending', 'processing')

5. Pagination with OFFSET:

-- Bad: OFFSET 10000 scans and discards 10000 rows
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 10000;
-- Good: keyset pagination
SELECT * FROM products WHERE id > :last_seen_id ORDER BY id LIMIT 20;

6. COUNT(*) on large tables:

-- Bad: full table scan
SELECT COUNT(*) FROM events;
-- Good: approximate count (PostgreSQL)
SELECT reltuples::bigint FROM pg_class WHERE relname = 'events';
-- Or maintain a counter cache table
-- Running total
SELECT id, amount,
       SUM(amount) OVER (ORDER BY created_at) AS running_total
FROM payments;

-- Rank within group
SELECT *,
       RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dept_rank
FROM employees;

-- Previous/next row
SELECT *,
       LAG(amount) OVER (ORDER BY created_at) AS prev_amount,
       LEAD(amount) OVER (ORDER BY created_at) AS next_amount
FROM payments;

-- Moving average
SELECT *,
       AVG(amount) OVER (ORDER BY created_at ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS ma_7
FROM daily_sales;

-- Percent of total
SELECT *,
       amount / SUM(amount) OVER () * 100 AS pct_of_total
FROM line_items
WHERE order_id = 1;
-- Recursive: org chart traversal
WITH RECURSIVE org AS (
    SELECT id, name, manager_id, 1 AS depth
    FROM employees
    WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.name, e.manager_id, o.depth + 1
    FROM employees e
    JOIN org o ON e.manager_id = o.id
    WHERE o.depth < 10 -- safety limit
)
SELECT * FROM org ORDER BY depth, name;

-- Data pipeline: clean → transform → aggregate
WITH cleaned AS (
    SELECT *, TRIM(LOWER(email)) AS clean_email
    FROM raw_signups
    WHERE email IS NOT NULL
),
deduped AS (
    SELECT DISTINCT ON (clean_email) *
    FROM cleaned
    ORDER BY clean_email, created_at DESC
)
SELECT DATE_TRUNC('week', created_at) AS week, COUNT(*)
FROM deduped
GROUP BY 1
ORDER BY 1;
- Never rename columns/tables in production without a multi-step process
- Never add NOT NULL without a DEFAULT on existing tables with data
- Never drop columns that application code still references
- Always test migrations on a copy of production data first
- Always have a rollback plan (down migration)
- Always take a backup before schema changes in production
Add column (safe):

-- Step 1: Add nullable column
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- Step 2: Backfill (in batches!)
UPDATE users SET phone = '' WHERE phone IS NULL AND id BETWEEN 1 AND 10000;
-- Step 3: Add NOT NULL after backfill
ALTER TABLE users ALTER COLUMN phone SET NOT NULL;
ALTER TABLE users ALTER COLUMN phone SET DEFAULT '';

Rename column (safe multi-step):

-- Step 1: Add new column
ALTER TABLE users ADD COLUMN full_name VARCHAR(200);
-- Step 2: Dual-write in application code (write to both old + new)
-- Step 3: Backfill
UPDATE users SET full_name = name WHERE full_name IS NULL;
-- Step 4: Switch application to read from new column
-- Step 5: Drop old column (after confirming no reads)
ALTER TABLE users DROP COLUMN name;

Add index without locking (PostgreSQL):

CREATE INDEX CONCURRENTLY idx_orders_customer ON orders(customer_id);
-- Takes longer but doesn't lock the table

Large table backfill (batched):

-- Don't: UPDATE millions of rows in one transaction
-- Do: batch it
DO $$
DECLARE
    batch_size INT := 5000;
    affected INT;
BEGIN
    LOOP
        UPDATE users SET normalized_email = LOWER(email)
        WHERE normalized_email IS NULL
          AND id IN (
              SELECT id FROM users WHERE normalized_email IS NULL LIMIT batch_size
          );
        GET DIAGNOSTICS affected = ROW_COUNT;
        RAISE NOTICE 'Updated % rows', affected;
        EXIT WHEN affected = 0;
        COMMIT;
    END LOOP;
END $$;
-- Migration: YYYYMMDDHHMMSS_description.sql
-- Author: [name]
-- Ticket: [JIRA/Linear ID]
-- Risk: low|medium|high
-- Rollback: see DOWN section
-- Estimated time: [for production data volume]
-- Requires: [prerequisite migrations]

-- ========== UP ==========
BEGIN;
-- [DDL/DML here]
COMMIT;

-- ========== DOWN ==========
-- BEGIN;
-- [Rollback DDL/DML here]
-- COMMIT;

-- ========== VERIFY ==========
-- [Queries to confirm migration succeeded]
-- SELECT COUNT(*) FROM ... WHERE ...;
health_metrics:
  connections:
    active: "SELECT count(*) FROM pg_stat_activity WHERE state = 'active'"
    idle: "SELECT count(*) FROM pg_stat_activity WHERE state = 'idle'"
    max: "SHOW max_connections"
    threshold: "active > 80% of max = ALERT"
  cache_hit_ratio:
    query: |
      SELECT ROUND(100.0 * sum(heap_blks_hit) /
             NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0), 2) AS ratio
      FROM pg_statio_user_tables
    healthy: "> 99%"
    warning: "< 95%"
    critical: "< 90%"
  index_hit_ratio:
    query: |
      SELECT ROUND(100.0 * sum(idx_blks_hit) /
             NULLIF(sum(idx_blks_hit) + sum(idx_blks_read), 0), 2) AS ratio
      FROM pg_statio_user_indexes
    healthy: "> 99%"
  table_bloat:
    query: |
      SELECT relname, n_dead_tup, n_live_tup,
             ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup, 0), 2) AS dead_pct
      FROM pg_stat_user_tables
      WHERE n_dead_tup > 10000
      ORDER BY n_dead_tup DESC LIMIT 10
    action: "VACUUM ANALYZE {table} when dead_pct > 20%"
  slow_queries:
    query: |
      SELECT query, calls, mean_exec_time, total_exec_time
      FROM pg_stat_statements
      ORDER BY mean_exec_time DESC LIMIT 20
    action: "Optimize top 5 by total_exec_time first"
  replication_lag:
    query: |
      SELECT EXTRACT(EPOCH FROM replay_lag) AS lag_seconds
      FROM pg_stat_replication
    warning: "> 5 seconds"
    critical: "> 30 seconds"
SELECT relname AS table,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
       pg_size_pretty(pg_relation_size(relid)) AS table_size,
       pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) AS index_size,
       n_live_tup AS row_count
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 20;
-- Find blocking queries
SELECT blocked.pid AS blocked_pid,
       blocked.query AS blocked_query,
       blocking.pid AS blocking_pid,
       blocking.query AS blocking_query,
       NOW() - blocked.query_start AS blocked_duration
FROM pg_stat_activity blocked
JOIN pg_locks bl ON bl.pid = blocked.pid
JOIN pg_locks kl ON kl.locktype = bl.locktype
                AND kl.relation = bl.relation
                AND kl.pid != bl.pid
JOIN pg_stat_activity blocking ON blocking.pid = kl.pid
WHERE NOT bl.granted;
| Method | RPO | Speed | Use When |
| --- | --- | --- | --- |
| pg_dump (logical) | Point-in-time | Slow for >50GB | Small-medium DBs, cross-version migration |
| pg_basebackup (physical) | Continuous (with WAL) | Fast | Large DBs, same-version restore |
| WAL archiving (PITR) | Seconds | N/A (continuous) | Production with near-zero RPO |
| Replica promotion | Seconds | Instant | HA failover |
# Logical backup (compressed, custom format)
pg_dump -Fc -Z 9 -d mydb -f backup_$(date +%Y%m%d_%H%M%S).dump

# Restore (parallel jobs supported with custom format)
pg_restore -d mydb -j 4 --clean --if-exists backup_20260216.dump

# Schema only
pg_dump -s -d mydb -f schema.sql

# Single table
pg_dump -t orders -d mydb -f orders_backup.dump

# Physical backup
pg_basebackup -D /backup/base -Ft -z -P -X stream
- Backup completes without errors
- Backup file size is within expected range (not suspiciously small)
- Restore to a test database succeeds
- Row counts match production (spot check 5 tables)
- Application can connect and query the restored database
- Run automated test suite against restored backup
- Backup encryption verified (if required)
- Offsite copy confirmed
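For the row-count spot check, one minimal approach is to run the same counting query against production and the restored copy and diff the output; the table names below are placeholders:

-- Run on both databases and compare the results (table names are examples)
SELECT 'users' AS table_name, COUNT(*) AS row_count FROM users
UNION ALL
SELECT 'orders', COUNT(*) FROM orders
UNION ALL
SELECT 'payments', COUNT(*) FROM payments
ORDER BY table_name;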
-- Create application role (least privilege)
CREATE ROLE app_user LOGIN PASSWORD 'use-vault-not-plaintext';
GRANT CONNECT ON DATABASE mydb TO app_user;
GRANT USAGE ON SCHEMA public TO app_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_user;
-- NO: GRANT ALL, superuser, CREATE, DROP

-- Read-only role for analytics
CREATE ROLE analyst LOGIN PASSWORD 'use-vault';
GRANT CONNECT ON DATABASE mydb TO analyst;
GRANT USAGE ON SCHEMA public TO analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO analyst;

-- Row-Level Security (multi-tenant)
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.tenant_id')::bigint);
- RULE 1: NEVER concatenate user input into SQL strings
- RULE 2: Always use parameterized queries / prepared statements
- RULE 3: Validate and whitelist table/column names if dynamic
- RULE 4: Use ORMs for CRUD, raw SQL only for complex queries
- RULE 5: Audit logs for unusual query patterns (UNION, DROP, --)
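A pure-SQL illustration of RULE 2 using PostgreSQL prepared statements; application drivers expose the same idea through placeholder parameters (find_user is a hypothetical name):

-- The parameter is bound by the server, never spliced into the SQL text
PREPARE find_user (text) AS
    SELECT id, email FROM users WHERE email = $1;
EXECUTE find_user('user@example.com');
DEALLOCATE find_user;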
-- Encrypt sensitive columns (application-level)
-- Store: pgp_sym_encrypt(data, key)
-- Read: pgp_sym_decrypt(encrypted_col, key)

-- Audit trail table
CREATE TABLE audit_log (
    id BIGSERIAL PRIMARY KEY,
    table_name VARCHAR(100) NOT NULL,
    record_id BIGINT NOT NULL,
    action VARCHAR(10) NOT NULL, -- INSERT, UPDATE, DELETE
    old_data JSONB,
    new_data JSONB,
    changed_by BIGINT REFERENCES users(id),
    changed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    ip_address INET
);

-- Generic audit trigger
CREATE OR REPLACE FUNCTION audit_trigger()
RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO audit_log (table_name, record_id, action, old_data, new_data, changed_by)
    VALUES (
        TG_TABLE_NAME,
        COALESCE(NEW.id, OLD.id),
        TG_OP,
        CASE WHEN TG_OP != 'INSERT' THEN to_jsonb(OLD) END,
        CASE WHEN TG_OP != 'DELETE' THEN to_jsonb(NEW) END,
        current_setting('app.user_id', true)::bigint
    );
    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;
| Setting | Small (4GB RAM) | Medium (16GB) | Large (64GB+) |
| --- | --- | --- | --- |
| shared_buffers | 1GB | 4GB | 16GB |
| effective_cache_size | 3GB | 12GB | 48GB |
| work_mem | 16MB | 64MB | 256MB |
| maintenance_work_mem | 256MB | 1GB | 2GB |
| max_connections | 100 | 200 | 300 |
| wal_buffers | 64MB | 128MB | 256MB |
| random_page_cost | 1.1 (SSD) | 1.1 (SSD) | 1.1 (SSD) |
| effective_io_concurrency | 200 (SSD) | 200 (SSD) | 200 (SSD) |
| max_parallel_workers_per_gather | 2 | 4 | 8 |
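As a sketch, the medium-tier values can be applied with ALTER SYSTEM rather than editing postgresql.conf by hand; shared_buffers still requires a restart to take effect:

ALTER SYSTEM SET shared_buffers = '4GB';         -- restart required
ALTER SYSTEM SET effective_cache_size = '12GB';
ALTER SYSTEM SET work_mem = '64MB';
ALTER SYSTEM SET maintenance_work_mem = '1GB';
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();                          -- picks up the reloadable settings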
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
pool_mode = transaction        # transaction pooling (best for most apps)
max_client_conn = 1000         # accept up to 1000 app connections
default_pool_size = 25         # 25 actual DB connections per database
reserve_pool_size = 5          # extra connections for burst
reserve_pool_timeout = 3       # seconds before using reserve
server_idle_timeout = 300      # close idle server connections after 5 min
-- Add to table
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMPTZ;
CREATE INDEX idx_users_active ON users(id) WHERE deleted_at IS NULL;

-- Application queries always filter
SELECT * FROM users WHERE deleted_at IS NULL AND ...;

-- Or use a view
CREATE VIEW active_users AS SELECT * FROM users WHERE deleted_at IS NULL;
UPDATE products
SET price = 29.99,
    version = version + 1,
    updated_at = NOW()
WHERE id = 123
  AND version = 5; -- expected version

-- If 0 rows affected → concurrent modification → retry or error
CREATE TABLE events (
    id BIGSERIAL PRIMARY KEY,
    aggregate_type VARCHAR(50) NOT NULL,
    aggregate_id UUID NOT NULL,
    event_type VARCHAR(100) NOT NULL,
    event_data JSONB NOT NULL,
    metadata JSONB DEFAULT '{}',
    version INTEGER NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (aggregate_id, version)
);
CREATE INDEX idx_events_aggregate ON events(aggregate_id, version);
CREATE INDEX idx_events_type ON events(event_type, created_at);
-- Partitioned by month
CREATE TABLE metrics (
    id BIGSERIAL,
    sensor_id INTEGER NOT NULL,
    value NUMERIC(12,4) NOT NULL,
    recorded_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (recorded_at);

CREATE TABLE metrics_2026_01 PARTITION OF metrics
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE metrics_2026_02 PARTITION OF metrics
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
-- Auto-create future partitions via cron or pg_partman

-- Use BRIN index for time-series
CREATE INDEX idx_metrics_time ON metrics USING brin(recorded_at);
-- Add search column
ALTER TABLE articles ADD COLUMN search_vector tsvector;
CREATE INDEX idx_articles_search ON articles USING gin(search_vector);

-- Populate
UPDATE articles SET search_vector =
    setweight(to_tsvector('english', COALESCE(title, '')), 'A') ||
    setweight(to_tsvector('english', COALESCE(body, '')), 'B');

-- Search with ranking
SELECT id, title, ts_rank(search_vector, query) AS rank
FROM articles, plainto_tsquery('english', 'database optimization') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
-- Store flexible attributes
CREATE TABLE products (
    id BIGSERIAL PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    attributes JSONB NOT NULL DEFAULT '{}',
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Index specific JSON paths
CREATE INDEX idx_products_color ON products((attributes->>'color'));
-- Or GIN for any key lookups
CREATE INDEX idx_products_attrs ON products USING gin(attributes);

-- Query patterns
SELECT * FROM products WHERE attributes->>'color' = 'red';
SELECT * FROM products WHERE attributes @> '{"size": "large"}';
SELECT * FROM products WHERE attributes ? 'warranty';
-- 1. Find and kill long-running queries
SELECT pid, NOW() - query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active' AND query_start < NOW() - INTERVAL '5 minutes'
ORDER BY duration DESC;

-- Kill a specific query
SELECT pg_cancel_backend(pid);    -- graceful
SELECT pg_terminate_backend(pid); -- force

-- 2. Check for lock contention (see Phase 5)

-- 3. Reduce max connections temporarily
-- In pgbouncer: pause database, reduce pool, resume

-- 4. Check if VACUUM is needed
SELECT relname, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 100000
ORDER BY n_dead_tup DESC;
# 1. Check what's consuming space
du -sh /var/lib/postgresql/*/main/ 2>/dev/null || du -sh /var/lib/mysql/

# 2. Clean up WAL files (PostgreSQL) — CAREFUL
# Check replication slot status first
SELECT slot_name, active FROM pg_replication_slots;
# Drop inactive slots consuming WAL
SELECT pg_drop_replication_slot('unused_slot');

# 3. VACUUM FULL largest tables (locks table!)
VACUUM FULL large_table;

# 4. Remove old backups / logs
find /backups -name "*.dump" -mtime +7 -delete
- Review slow query log (top 10 by total time)
- Check index usage stats — drop unused, add missing
- Verify backup success and test restore
- Check table bloat — schedule VACUUM where needed
- Review connection count trends
- Check disk space trajectory
- Review replication lag
- Update table statistics: ANALYZE;
| Feature | PostgreSQL | MySQL (InnoDB) | SQLite |
| --- | --- | --- | --- |
| Best for | Complex queries, extensions | Web apps, read-heavy | Embedded, dev, small apps |
| Max size | Unlimited (practical) | Unlimited (practical) | 281 TB (practical ~1TB) |
| JSON support | JSONB (indexable, fast) | JSON (limited indexing) | JSON1 extension |
| Full-text search | Built-in (tsvector) | Built-in (FULLTEXT) | FTS5 extension |
| Window functions | Full support | Full support (8.0+) | Full support (3.25+) |
| CTEs | Recursive + materialized | Recursive (8.0+) | Recursive (3.8+) |
| Partitioning | Declarative + list/range/hash | Range/list/hash/key | None |
| Row-level security | Yes | No (use views) | No |
| Replication | Streaming + logical | Binary log | None (use Litestream) |
| Connection model | Process per connection | Thread per connection | In-process |
| Dimension | Weight | 0 (Poor) | 5 (Good) | 10 (Excellent) |
| --- | --- | --- | --- | --- |
| Schema Design | 20% | No normalization, no constraints | 3NF, FKs, proper types | Optimal normal form, all constraints, audit fields |
| Indexing | 15% | No indexes beyond PK | Indexes on FKs and common queries | Covering indexes, partials, no unused indexes |
| Query Quality | 20% | SELECT *, N+1, no EXPLAIN | Specific columns, JOINs, basic optimization | Keyset pagination, window functions, optimized plans |
| Migration Safety | 10% | Raw DDL, no rollback | Versioned files, up/down | Zero-downtime, batched backfills, concurrent indexes |
| Security | 15% | Superuser access, no audit | Least privilege, parameterized queries | RLS, encryption, audit triggers, regular access review |
| Monitoring | 10% | No monitoring | Basic alerts on connections/disk | Full dashboard, slow query analysis, proactive tuning |
| Backup/Recovery | 10% | No backups | Daily dumps | PITR, tested restores, offsite copies |

Score interpretation: <40 = Critical risk | 40-60 = Needs work | 60-80 = Solid | 80-90 = Professional | 90+ = Expert
"Design a schema for [domain]" → Phase 1 full design process "Optimize this query: [SQL]" → EXPLAIN analysis + rewrite "Add an index for [query pattern]" → Index type selection + creation "Write a migration to [change]" → Safe migration with rollback "Audit this database" → Full scoring across all dimensions "Set up monitoring for [database]" → Phase 5 dashboard queries "Review this schema" → Naming, types, constraints, relationships check "Help me with [PostgreSQL/MySQL/SQLite] [topic]" → Platform-specific guidance "Troubleshoot slow queries" → pg_stat_statements analysis + top fixes "Plan a backup strategy" → Phase 6 decision framework "Make this table multi-tenant" → RLS + tenant_id pattern "Convert this to use partitioning" → Phase 9 time-series pattern