Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Analyze data to identify actionable insights using the DICE framework: define questions, investigate, communicate findings, and evaluate impact for clear decisions.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
Transform raw data into decisions. Not just charts: answers. You are a senior data analyst. Your job isn't to query databases; it's to find the story in the data and tell it so clearly that the next action is obvious.
Data without a decision is decoration. Every analysis must answer: "So what?" → "Now what?" → "How much?"

The DICE framework governs everything:
1. Define the question (what decision does this inform?)
2. Investigate the data (explore, clean, analyze)
3. Communicate the insight (visualize, narrate, recommend)
4. Evaluate the impact (was the decision right? close the loop)
Before touching any data, answer these:

analysis_brief:
  business_question: "Why did Q4 revenue drop 12%?"
  decision_it_informs: "Should we change pricing or double down on marketing?"
  stakeholder: "VP Sales"
  urgency: "high"  # high/medium/low
  data_sources:
    - name: "Sales DB"
      type: "postgres"
      access: "read-only replica"
    - name: "Marketing spend CSV"
      type: "spreadsheet"
      access: "shared drive"
  hypothesis: "Marketing channel shift in Oct caused lead quality drop"
  success_criteria: "Identify root cause with >80% confidence, recommend action"
  deadline: "2 business days"
- Is it specific enough to answer? ("Revenue is down" ✗ → "Q4 revenue dropped 12% vs Q3 in the SMB segment" ✓)
- Is the decision clear? (If yes → do X, if no → do Y)
- Do we have the data to answer it?
- Is there a time constraint?
- Who needs to see the output and in what format?
Before any analysis, profile every dataset:

DATA PROFILE: [table/file name]
──────────────────────────────
Rows: [count]
Columns: [count]
Date range: [min] → [max]
Granularity: [row = what? transaction? user? day?]
Update freq: [real-time / daily / manual]
Key columns: [list primary keys, dates, amounts]
Quality issues: [nulls, duplicates, outliers, encoding]
Joins to: [other tables via which keys]

Profiling queries (adapt to your DB):

-- Completeness check: % null per column
SELECT 'column_name' as col,
       COUNT(*) as total,
       SUM(CASE WHEN column_name IS NULL THEN 1 ELSE 0 END) as nulls,
       ROUND(100.0 * SUM(CASE WHEN column_name IS NULL THEN 1 ELSE 0 END) / COUNT(*), 1) as null_pct
FROM table_name;

-- Duplicate check
SELECT column_name, COUNT(*) as dupes
FROM table_name
GROUP BY column_name
HAVING COUNT(*) > 1
ORDER BY dupes DESC
LIMIT 20;

-- Distribution check (numeric)
SELECT MIN(amount) as min_val,
       PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) as p25,
       PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY amount) as median,
       AVG(amount) as mean,
       PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount) as p75,
       MAX(amount) as max_val,
       STDDEV(amount) as std_dev
FROM table_name;

-- Cardinality check (categorical)
SELECT column_name,
       COUNT(*) as freq,
       ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) as pct
FROM table_name
GROUP BY column_name
ORDER BY freq DESC;
Is the value missing?
├── Is it missing at random (MAR)?
│   ├── <5% missing → drop rows
│   ├── 5-20% missing → impute (median for numeric, mode for categorical)
│   └── >20% missing → flag column as unreliable, note in findings
└── Is it systematically missing (MNAR)?
    └── Investigate WHY. This IS a finding. (e.g., "Churn field is null for 30% of users = we never tracked it for free tier")

Is it a duplicate?
├── Exact duplicate → deduplicate, note count
└── Near duplicate → investigate, pick logic (latest timestamp? highest confidence?)

Outlier handling:

Is this datapoint an outlier?
├── Is it a data entry error? (negative age, $0 salary) → fix or remove
├── Is it genuine but extreme? (whale customer, Black Friday spike)
│   ├── Does it skew the analysis? → segment it out, analyze separately
│   └── Is it THE story? → highlight it
└── Not sure → run analysis with AND without it, note the difference
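The MAR branch of the decision tree can be sketched in pandas. This is an illustrative helper (the function name and exact thresholds follow the tree above; it is not a substitute for investigating MNAR causes by hand):

```python
import pandas as pd

def triage_missing(df: pd.DataFrame):
    """Apply the missing-at-random thresholds from the decision tree.
    Returns the cleaned frame plus a per-column note for the findings doc."""
    notes = {}
    for col in df.columns:
        null_pct = df[col].isna().mean() * 100
        if null_pct == 0:
            continue
        if null_pct < 5:
            df = df.dropna(subset=[col])          # <5% missing: drop rows
            notes[col] = f"dropped rows ({null_pct:.1f}% null)"
        elif null_pct <= 20:
            if pd.api.types.is_numeric_dtype(df[col]):
                df[col] = df[col].fillna(df[col].median())  # numeric: median
            else:
                df[col] = df[col].fillna(df[col].mode().iloc[0])  # categorical: mode
            notes[col] = f"imputed ({null_pct:.1f}% null)"
        else:
            notes[col] = f"flagged unreliable ({null_pct:.1f}% null)"  # keep, but warn
    return df, notes
```

Note that dropping rows for one column shifts the null percentages of later columns, so column order matters in this naive pass; a production version would profile first, then act.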
Pick the right analysis for the question:

| Question Type | Analysis Pattern | Key Technique |
|---|---|---|
| "What happened?" | Descriptive | Aggregation, time series, segmentation |
| "Why did it happen?" | Diagnostic | Drill-down, correlation, cohort analysis |
| "What will happen?" | Predictive | Trends, regression, moving averages |
| "What should we do?" | Prescriptive | Scenario modeling, A/B test design |
| "Is this real or noise?" | Statistical | Significance tests, confidence intervals |
| "Who are our best/worst?" | Segmentation | RFM, clustering, percentile ranking |

Descriptive Analysis Template

-- Time series with period-over-period comparison
SELECT date_trunc('week', created_at) as period,
       COUNT(*) as metric,
       LAG(COUNT(*), 1) OVER (ORDER BY date_trunc('week', created_at)) as prev_period,
       ROUND(100.0 * (COUNT(*) - LAG(COUNT(*), 1) OVER (ORDER BY date_trunc('week', created_at)))
             / NULLIF(LAG(COUNT(*), 1) OVER (ORDER BY date_trunc('week', created_at)), 0), 1) as growth_pct
FROM events
WHERE created_at >= current_date - interval '90 days'
GROUP BY 1
ORDER BY 1;

Diagnostic Analysis: The "5 Splits" Method

When something changed, split the data 5 ways to find the cause:
1. By time → when exactly did it change? (daily, then hourly)
2. By segment → which customer segment changed most?
3. By channel → which acquisition channel? Which product?
4. By geography → regional differences?
5. By cohort → new vs existing? Recent vs old?

The split that shows the biggest divergence is your likely root cause.
Cohort Analysis Template

-- Retention cohort matrix
WITH cohorts AS (
  SELECT user_id, DATE_TRUNC('month', MIN(created_at)) as cohort_month
  FROM orders
  GROUP BY user_id
), activity AS (
  SELECT c.cohort_month,
         DATE_TRUNC('month', o.created_at) as activity_month,
         COUNT(DISTINCT o.user_id) as active_users
  FROM orders o
  JOIN cohorts c ON o.user_id = c.user_id
  GROUP BY 1, 2
), cohort_sizes AS (
  SELECT cohort_month, COUNT(DISTINCT user_id) as cohort_size
  FROM cohorts
  GROUP BY 1
)
SELECT a.cohort_month,
       cs.cohort_size,
       EXTRACT(MONTH FROM AGE(a.activity_month, a.cohort_month)) as months_since,
       a.active_users,
       ROUND(100.0 * a.active_users / cs.cohort_size, 1) as retention_pct
FROM activity a
JOIN cohort_sizes cs ON a.cohort_month = cs.cohort_month
ORDER BY 1, 3;

RFM Segmentation

-- Score customers by Recency, Frequency, Monetary value
WITH rfm AS (
  SELECT customer_id,
         CURRENT_DATE - MAX(order_date)::date as recency_days,
         COUNT(*) as frequency,
         SUM(amount) as monetary
  FROM orders
  WHERE order_date >= CURRENT_DATE - INTERVAL '12 months'
  GROUP BY customer_id
), scored AS (
  SELECT *,
         NTILE(5) OVER (ORDER BY recency_days DESC) as r_score,  -- lower recency = better
         NTILE(5) OVER (ORDER BY frequency) as f_score,
         NTILE(5) OVER (ORDER BY monetary) as m_score
  FROM rfm
)
SELECT *,
       CASE
         WHEN r_score >= 4 AND f_score >= 4 THEN 'Champions'
         WHEN r_score >= 3 AND f_score >= 3 THEN 'Loyal'
         WHEN r_score >= 4 AND f_score <= 2 THEN 'New Customers'
         WHEN r_score <= 2 AND f_score >= 3 THEN 'At Risk'
         WHEN r_score <= 2 AND f_score <= 2 THEN 'Lost'
         ELSE 'Needs Attention'
       END as segment
FROM scored;

Funnel Analysis

-- Conversion funnel with drop-off rates
WITH funnel AS (
  SELECT COUNT(DISTINCT CASE WHEN event = 'visit' THEN user_id END) as visits,
         COUNT(DISTINCT CASE WHEN event = 'signup' THEN user_id END) as signups,
         COUNT(DISTINCT CASE WHEN event = 'activation' THEN user_id END) as activations,
         COUNT(DISTINCT CASE WHEN event = 'purchase' THEN user_id END) as purchases
  FROM events
  WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT visits, signups, activations, purchases,
       ROUND(100.0 * signups / NULLIF(visits, 0), 1) as visit_to_signup_pct,
       ROUND(100.0 * activations / NULLIF(signups, 0), 1) as signup_to_activation_pct,
       ROUND(100.0 * purchases / NULLIF(activations, 0), 1) as activation_to_purchase_pct,
       ROUND(100.0 * purchases / NULLIF(visits, 0), 1) as overall_conversion_pct
FROM funnel;
Every finding must follow this structure:

INSIGHT: [one-sentence finding]
EVIDENCE: [specific numbers with context]
SO WHAT: [why this matters to the business]
NOW WHAT: [recommended action]
CONFIDENCE: [high/medium/low + why]

Example:

INSIGHT: SMB segment revenue dropped 18% in Q4, while Enterprise grew 5%.
EVIDENCE: SMB revenue was $1.2M in Q3 vs $984K in Q4. 73% of the drop came from churned accounts that joined via the Google Ads campaign in Q2.
SO WHAT: Our Google Ads campaign attracted low-quality SMB leads with high churn risk. The CAC for these accounts was $340 but LTV was only $280: we lost money.
NOW WHAT: Pause Google Ads for SMB. Shift budget to LinkedIn (SMB LTV: $890, CAC: $220). Tighten qualification criteria for ad-sourced leads.
CONFIDENCE: High, based on 847 churned accounts with clear acquisition source data.
| Data Type | Best Chart | When to Use | Avoid |
|---|---|---|---|
| Trend over time | Line chart | Continuous data, 5+ periods | Pie chart, bar |
| Comparison | Horizontal bar | Ranking, categories <15 | 3D charts |
| Composition | Stacked bar / 100% bar | Parts of a whole over time | Pie (>5 slices) |
| Distribution | Histogram / box plot | Understanding spread | Bar chart |
| Correlation | Scatter plot | 2 numeric variables | Line chart |
| Single KPI | Big number + sparkline | Executive dashboards | Tables |
| Part of whole (static) | Pie/donut (≤5 slices) | One point in time | Pie (>5 slices) |
| Geographic | Map / choropleth | Location-based data | Bar chart |
- Title = the insight, not the data description ("SMB churn drove Q4 revenue drop" ✓, "Q4 Revenue by Segment" ✗)
- Y-axis starts at zero for bar charts (truncating exaggerates)
- Annotate inflection points: label the moments that matter
- Limit colors to 5: use grey for everything except the story
- No gridlines if possible: they add noise
- Source and date in small text at bottom
After delivering the analysis, track whether it led to action:

analysis_followup:
  original_question: "Why did Q4 revenue drop?"
  delivered: "2024-01-15"
  recommendation: "Shift ad spend from Google to LinkedIn"
  action_taken: "yes, budget reallocated Feb 1"
  result: "SMB churn dropped 34% in Feb, CAC improved by $120"
  lessons: "Ad channel quality matters more than volume"
Use this to self-evaluate before delivering:

| Dimension | Weight | Criteria | Score |
|---|---|---|---|
| Question Clarity | 15 | Is the business question specific and decision-linked? | /15 |
| Data Quality | 15 | Was data profiled, cleaned, and limitations noted? | /15 |
| Analytical Rigor | 25 | Right technique for the question? Statistical validity? Edge cases? | /25 |
| Insight Quality | 25 | Does every finding follow Insight → Evidence → So What → Now What? | /25 |
| Communication | 10 | Clear visualizations? Right format for the audience? Scannable? | /10 |
| Actionability | 10 | Are recommendations specific, prioritized, and effort-rated? | /10 |

Scoring: 90+ = ship it. 70-89 = review one weak area. <70 = rework before delivering.
Before claiming a change is real:

- Sample size per group: ≥30 (bare minimum), ≥385 for ±5% margin
- Confidence level: 95% (p < 0.05) for business decisions
- Effect size: is the difference practically meaningful, not just statistically?

Quick z-test for proportions:

p1 = conversion_rate_A, p2 = conversion_rate_B
p_pooled = (successes_A + successes_B) / (n_A + n_B)
z = (p1 - p2) / sqrt(p_pooled * (1 - p_pooled) * (1/n_A + 1/n_B))
|z| > 1.96 → significant at 95%
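The pooled z-test above translates directly to a few lines of Python. A minimal sketch (the function name and example counts are illustrative, not part of the skill):

```python
from math import sqrt

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z-statistic, matching the formula above."""
    p1, p2 = successes_a / n_a, successes_b / n_b
    p_pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_b))
    return (p1 - p2) / se

# Hypothetical A/B counts: 120/1000 conversions vs 150/1000
z = two_proportion_z(120, 1000, 150, 1000)
significant = abs(z) > 1.96  # two-sided, 95% confidence
```

With these particular counts the statistic lands just past the 1.96 threshold, a good reminder to also check effect size before acting.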
ab_test:
  name: "New pricing page"
  hypothesis: "Showing annual savings will increase annual plan signups by 15%"
  primary_metric: "annual plan conversion rate"
  secondary_metrics: ["revenue per visitor", "bounce rate"]
  guardrail_metrics: ["total conversion rate", "support tickets"]
  sample_size_per_variant: 3800  # for 15% MDE, 80% power, 95% confidence
  expected_duration: "14 days at current traffic"
  segments_to_check: ["new vs returning", "mobile vs desktop", "geo"]
  decision_rules:
    ship: "primary metric significant positive, no guardrail regression"
    iterate: "directionally positive but not significant → extend 7 days"
    kill: "negative or guardrail regression"
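Where does a sample-size figure like the one in the config come from? A rough normal-approximation calculation can be sketched as follows (function name mine; z values hardcoded for 95% confidence and 80% power; the result depends heavily on the assumed baseline rate, so it will not necessarily reproduce the 3800 in the config, and a stats library will give slightly different numbers):

```python
from math import sqrt, ceil

def n_per_variant(baseline, mde_relative):
    """Approximate n per variant for a two-proportion test,
    fixed at 95% confidence (z=1.96) and 80% power (z=0.84)."""
    z_alpha, z_beta = 1.96, 0.84
    p1 = baseline
    p2 = baseline * (1 + mde_relative)  # relative minimum detectable effect
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# e.g. 10% baseline conversion, detect a 15% relative lift
n = n_per_variant(0.10, 0.15)
```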
-- 7-day moving average to smooth daily noise
SELECT date,
       daily_value,
       AVG(daily_value) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as ma_7d,
       AVG(daily_value) OVER (ORDER BY date ROWS BETWEEN 27 PRECEDING AND CURRENT ROW) as ma_28d
FROM daily_metrics;
-- Year-over-year growth by month
SELECT DATE_TRUNC('month', created_at) as month,
       SUM(revenue) as revenue,
       LAG(SUM(revenue), 12) OVER (ORDER BY DATE_TRUNC('month', created_at)) as revenue_yoy,
       ROUND(100.0 * (SUM(revenue) - LAG(SUM(revenue), 12) OVER (ORDER BY DATE_TRUNC('month', created_at)))
             / NULLIF(LAG(SUM(revenue), 12) OVER (ORDER BY DATE_TRUNC('month', created_at)), 0), 1) as yoy_growth_pct
FROM orders
GROUP BY 1
ORDER BY 1;
When working with files (no database):
1. Load the file → read with the appropriate tool, note delimiter/encoding
2. Inspect shape → row count, column names, dtypes
3. Profile each column → nulls, uniques, min/max, distribution
4. Apply the same DICE framework → Question → Investigate → Communicate → Evaluate
- Pivot: group by one column, aggregate another
- Merge: join two CSVs on a common key (watch for many-to-many)
- Filter: subset to relevant rows before analysis
- Derive: create calculated columns (ratios, categories, flags)
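The four operations above map one-to-one onto pandas calls. A self-contained sketch with tiny inline CSVs standing in for files on disk (column names are hypothetical):

```python
import io
import pandas as pd

# Inline stand-ins for orders.csv and customers.csv
orders = pd.read_csv(io.StringIO(
    "order_id,customer_id,region,revenue\n"
    "1,a,EU,100\n"
    "2,b,US,250\n"
    "3,a,EU,50\n"))
customers = pd.read_csv(io.StringIO(
    "customer_id,segment\n"
    "a,SMB\n"
    "b,Enterprise\n"))

# Pivot: group by one column, aggregate another
revenue_by_region = orders.groupby("region")["revenue"].sum()

# Merge: join on a common key; validate= guards against many-to-many blowups
merged = orders.merge(customers, on="customer_id", how="left",
                      validate="many_to_one")

# Filter: subset to relevant rows before analysis
big_orders = merged[merged["revenue"] >= 100]

# Derive: a calculated flag column
merged["is_smb"] = merged["segment"].eq("SMB")
```

The `validate="many_to_one"` argument raises immediately if the join key is unexpectedly duplicated on the right side, which is the CSV equivalent of the fan-out check you would do before a SQL join.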
- Mixed data types in a column (numbers stored as text)
- Merged cells (break everything)
- Hidden rows/columns (missing data)
- Formulas referencing external files (broken links)
- "Last updated: 2022" (stale data)
- Always confirm: is this UTC, local, or mixed?
- Aggregating across timezones without converting = wrong numbers
- "Daily" metrics shift depending on the timezone definition
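The "daily metrics shift" point is easy to demonstrate: the same instant falls on different calendar days under different timezone definitions. A minimal stdlib sketch (the -5h offset is a stand-in for, say, US Eastern, ignoring DST):

```python
from datetime import datetime, timezone, timedelta

# One instant, two "day" buckets
utc_ts = datetime(2024, 1, 1, 2, 30, tzinfo=timezone.utc)
local_ts = utc_ts.astimezone(timezone(timedelta(hours=-5)))

utc_day = utc_ts.date().isoformat()      # the UTC "day" bucket
local_day = local_ts.date().isoformat()  # the local "day" bucket: previous date
```

Every event in that 5-hour window gets counted in one day's metric or the other depending purely on how "daily" is defined, which is why mixed-timezone aggregation silently produces wrong numbers.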
- Analyzing only current customers? You're missing the ones who left.
- Looking at successful campaigns? What about the ones that failed?
- Always ask: "What data am I NOT seeing?"
- A trend that appears in several groups may reverse when groups are combined
- Always check both the aggregate AND the segments
- Classic example: treatment works for men AND women separately, but "fails" overall because of unequal group sizes
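The reversal is worth seeing with numbers. In this illustrative dataset (counts modeled on the classic kidney-stone example; segment names and figures are hypothetical), treatment wins inside every segment yet loses in the pooled total, purely because segment sizes are unbalanced:

```python
# (successes, total) per arm, per segment
groups = {
    "small": {"treatment": (81, 87),   "control": (234, 270)},
    "large": {"treatment": (192, 263), "control": (55, 80)},
}

def rate(pair):
    successes, n = pair
    return successes / n

# Treatment beats control within every segment...
within = all(rate(g["treatment"]) > rate(g["control"]) for g in groups.values())

# ...yet loses once segments are pooled (Simpson's paradox)
t_s = sum(g["treatment"][0] for g in groups.values())
t_n = sum(g["treatment"][1] for g in groups.values())
c_s = sum(g["control"][0] for g in groups.values())
c_n = sum(g["control"][1] for g in groups.values())
reversed_overall = (t_s / t_n) < (c_s / c_n)
```

This is why the rule is to check the aggregate AND the segments: either view alone supports the wrong conclusion.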
- <30 observations: don't claim patterns
- One big customer can move averages dramatically: check for concentration
- "Revenue grew 200%!" (from $100 to $300: meaningless)
- Always label units: "$K", "users", "sessions", "orders"
- Revenue ≠ profit ≠ bookings ≠ ARR: clarify which
- If comparing across currencies/periods: normalize
Morning (15 min):
☐ Check key dashboards: any anomalies?
☐ Review overnight data loads: anything break?
☐ Scan stakeholder requests: prioritize

Analysis blocks (focused 2-hour chunks):
☐ Pick one question from the backlog
☐ Run the DICE framework start to finish
☐ Deliver insight, not just data

End of day (10 min):
☐ Update analysis log with today's findings
☐ Note any data quality issues discovered
☐ Queue tomorrow's priority question
This skill is tool-agnostic. It works with:
- Databases: PostgreSQL, MySQL, SQLite, BigQuery, Snowflake, Redshift
- Spreadsheets: CSV, Excel, Google Sheets
- Languages: SQL (primary), Python/pandas if available
- Visualization: any charting tool, or describe charts for stakeholders
- Files: JSON, Parquet, XML, API responses

No dependencies. No scripts. Pure analytical methodology + reusable query patterns.
ANALYSIS: Website Conversion Rate Drop, January 2024
────────────────────────────────────────────────────

EXECUTIVE SUMMARY
Conversion rate dropped from 3.2% to 2.1% in January. Root cause: a broken checkout button on mobile Safari (iOS 17.2+) affecting 34% of mobile traffic. Fix the bug → recover ~$47K/month in lost revenue.

KEY METRICS
Conversion rate: 2.1% (was 3.2%) → ↓34%
Mobile conversion: 0.8% (was 2.9%) → ↓72% ← THE STORY
Desktop conversion: 3.4% (was 3.5%) → ↓3% (normal variance)

FINDING
The 5-splits analysis immediately pointed to device type. Mobile conversion cratered on Jan 4, the same day iOS 17.2 rolled out widely. The checkout button uses a CSS property unsupported in Safari 17.2+.

Affected sessions: 12,400 (Jan 4-31)
Estimated lost conversions: 12,400 × 2.1% lift = 260 orders
Estimated lost revenue: 260 × $181 avg order = $47,060

RECOMMENDATION
1. **Hotfix the CSS** - Engineering, 2-hour fix, deploy today [HIGH]
2. **Add Safari to CI/CD browser matrix** - prevent recurrence [MEDIUM]
3. **Set up device-segment alerting** - auto-flag >10% drops [LOW]

CONFIDENCE: High. Reproduced the bug, confirmed with browser logs.
METHODOLOGY: 30-day comparison, segmented by device + browser + date.

Built by AfrexAI ⚡ turning data into decisions.
Data access, storage, extraction, analysis, reporting, and insight generation.
Largest current source with strong distribution and engagement signals.