{
  "schemaVersion": "1.0",
  "item": {
    "slug": "cost-prediction",
    "name": "Cost Prediction",
    "source": "tencent",
    "type": "skill",
    "category": "AI 智能",
    "sourceUrl": "https://clawhub.ai/datadrivenconstruction/cost-prediction",
    "canonicalUrl": "https://clawhub.ai/datadrivenconstruction/cost-prediction",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/cost-prediction",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=cost-prediction",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "claw.json",
      "instructions.md",
      "SKILL.md"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-04-30T16:55:25.780Z",
      "expiresAt": "2026-05-07T16:55:25.780Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=network",
        "contentDisposition": "attachment; filename=\"network-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/cost-prediction"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/cost-prediction",
    "agentPageUrl": "https://openagent3.xyz/skills/cost-prediction/agent",
    "manifestUrl": "https://openagent3.xyz/skills/cost-prediction/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/cost-prediction/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "Overview",
        "body": "Based on the DDC methodology (Chapter 4.5), this skill predicts construction project costs from historical data using machine learning algorithms. The approach transforms traditional expert-based estimation into data-driven prediction.\n\nBook Reference: \"Future: Predictions and Machine Learning\" (original Russian title: \"Будущее: прогнозы и машинное обучение\")\n\n\"Predictions and forecasts based on historical data allow companies to make more accurate decisions about project costs and schedules.\"\n(DDC Book, Chapter 4.5)"
      },
      {
        "title": "Core Concepts",
        "body": "Historical Data → Feature Engineering → ML Model → Cost Prediction\n    │                    │                │              │\n    ▼                    ▼                ▼              ▼\nPast projects      Prepare data      Train model    New project\nwith costs         for ML            on history     cost forecast"
      },
      {
        "title": "Quick Start",
        "body": "import pandas as pd\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.linear_model import LinearRegression\nfrom sklearn.metrics import mean_absolute_error, r2_score\n\n# Load historical project data\ndf = pd.read_csv(\"historical_projects.csv\")\n\n# Features and target\nfeature_cols = ['area_m2', 'floors', 'complexity_score']\nX = df[feature_cols]\ny = df['total_cost']\n\n# Split data (fixed seed so the split is reproducible)\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n\n# Train model\nmodel = LinearRegression()\nmodel.fit(X_train, y_train)\n\n# Evaluate on the held-out test set\npredictions = model.predict(X_test)\nprint(f\"R² Score: {r2_score(y_test, predictions):.2f}\")\nprint(f\"MAE: ${mean_absolute_error(y_test, predictions):,.0f}\")\n\n# Predict new project (a DataFrame with the training column names avoids feature-name warnings)\nnew_project = pd.DataFrame([[5000, 10, 3]], columns=feature_cols)  # area, floors, complexity\ncost = model.predict(new_project)\nprint(f\"Predicted cost: ${cost[0]:,.0f}\")"
      },
      {
        "title": "Prepare Historical Dataset",
        "body": "import pandas as pd\nimport numpy as np\n\ndef prepare_cost_dataset(df, current_year=2024, inflation_rate=0.03):\n    \"\"\"Prepare historical project data for ML\"\"\"\n    # Select relevant features\n    features = [\n        'area_m2',\n        'floors',\n        'building_type',\n        'location',\n        'year_completed',\n        'complexity_score',\n        'material_quality',\n        'total_cost'\n    ]\n\n    df = df[features].copy()\n\n    # Handle missing values\n    df = df.dropna(subset=['total_cost'])\n    df['complexity_score'] = df['complexity_score'].fillna(df['complexity_score'].median())\n\n    # Encode categorical variables\n    df = pd.get_dummies(df, columns=['building_type', 'location'])\n\n    # Derived ratios for analysis only: they are computed from the target,\n    # so exclude them from the model's input features to avoid target leakage\n    df['cost_per_m2'] = df['total_cost'] / df['area_m2']\n    df['cost_per_floor'] = df['total_cost'] / df['floors']\n\n    # Adjust for inflation (to current_year prices, flat annual rate; default 3%)\n    df['years_ago'] = current_year - df['year_completed']\n    df['adjusted_cost'] = df['total_cost'] * (1 + inflation_rate) ** df['years_ago']\n\n    return df\n\n# Usage\ndf = pd.read_csv(\"projects_history.csv\")\ndf_prepared = prepare_cost_dataset(df)"
      },
      {
        "title": "Feature Engineering",
        "body": "import numpy as np\nimport pandas as pd\n\ndef engineer_features(df):\n    \"\"\"Create additional features for better predictions\"\"\"\n    # Work on a copy so the caller's DataFrame is not modified\n    df = df.copy()\n\n    # Interaction features\n    df['area_x_floors'] = df['area_m2'] * df['floors']\n    df['area_x_complexity'] = df['area_m2'] * df['complexity_score']\n\n    # Polynomial features\n    df['area_squared'] = df['area_m2'] ** 2\n\n    # Log transforms (for skewed features)\n    df['log_area'] = np.log1p(df['area_m2'])\n\n    # Binned features (categorical: encode before passing to a model)\n    df['size_category'] = pd.cut(\n        df['area_m2'],\n        bins=[0, 1000, 5000, 10000, float('inf')],\n        labels=['small', 'medium', 'large', 'xlarge']\n    )\n\n    return df"
      },
      {
        "title": "Linear Regression",
        "body": "import pandas as pd\nfrom sklearn.linear_model import LinearRegression\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.pipeline import Pipeline\n\ndef train_linear_model(X_train, y_train):\n    \"\"\"Train Linear Regression model with scaling\"\"\"\n    pipeline = Pipeline([\n        ('scaler', StandardScaler()),\n        ('regressor', LinearRegression())\n    ])\n\n    pipeline.fit(X_train, y_train)\n\n    # Feature importance: coefficients on standardized features,\n    # sorted by absolute magnitude\n    coefficients = pd.DataFrame({\n        'feature': X_train.columns,\n        'coefficient': pipeline.named_steps['regressor'].coef_\n    }).sort_values('coefficient', key=abs, ascending=False)\n\n    return pipeline, coefficients\n\n# Usage\nmodel, importance = train_linear_model(X_train, y_train)\nprint(\"Feature Importance:\")\nprint(importance)"
      },
      {
        "title": "K-Nearest Neighbors (KNN)",
        "body": "from sklearn.neighbors import KNeighborsRegressor\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.model_selection import GridSearchCV\n\ndef train_knn_model(X_train, y_train):\n    \"\"\"Train KNN model with optimal k\"\"\"\n    # Scale inside the pipeline so each CV fold fits its own scaler\n    # and the returned model scales new data automatically\n    pipeline = Pipeline([\n        ('scaler', StandardScaler()),\n        ('knn', KNeighborsRegressor())\n    ])\n\n    # Find optimal k using cross-validation\n    param_grid = {'knn__n_neighbors': range(3, 20)}\n    grid_search = GridSearchCV(pipeline, param_grid, cv=5, scoring='neg_mean_absolute_error')\n    grid_search.fit(X_train, y_train)\n\n    print(f\"Best k: {grid_search.best_params_['knn__n_neighbors']}\")\n    print(f\"Best MAE: ${-grid_search.best_score_:,.0f}\")\n\n    # The best estimator is the full pipeline; pass it unscaled features\n    return grid_search.best_estimator_\n\n# Usage\nknn_model = train_knn_model(X_train, y_train)"
      },
      {
        "title": "Random Forest",
        "body": "import pandas as pd\nfrom sklearn.ensemble import RandomForestRegressor\n\ndef train_random_forest(X_train, y_train):\n    \"\"\"Train Random Forest model\"\"\"\n    rf = RandomForestRegressor(\n        n_estimators=100,\n        max_depth=10,\n        min_samples_split=5,\n        random_state=42\n    )\n\n    rf.fit(X_train, y_train)\n\n    # Feature importance (impurity-based, sums to 1 across features)\n    importance = pd.DataFrame({\n        'feature': X_train.columns,\n        'importance': rf.feature_importances_\n    }).sort_values('importance', ascending=False)\n\n    return rf, importance\n\n# Usage\nrf_model, importance = train_random_forest(X_train, y_train)\nprint(\"Feature Importance:\")\nprint(importance.head(10))"
      },
      {
        "title": "Gradient Boosting",
        "body": "from sklearn.ensemble import GradientBoostingRegressor\n\ndef train_gradient_boosting(X_train, y_train):\n    \"\"\"Train Gradient Boosting model\"\"\"\n    gb = GradientBoostingRegressor(\n        n_estimators=200,\n        learning_rate=0.1,\n        max_depth=5,\n        random_state=42\n    )\n\n    gb.fit(X_train, y_train)\n    return gb\n\n# Usage\ngb_model = train_gradient_boosting(X_train, y_train)"
      },
      {
        "title": "Comprehensive Evaluation",
        "body": "from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score\nimport numpy as np\n\ndef evaluate_model(model, X_test, y_test, model_name=\"Model\"):\n    \"\"\"Comprehensive model evaluation\"\"\"\n    predictions = model.predict(X_test)\n    y_true = np.asarray(y_test)\n\n    metrics = {\n        'MAE': mean_absolute_error(y_true, predictions),\n        'RMSE': np.sqrt(mean_squared_error(y_true, predictions)),\n        'R²': r2_score(y_true, predictions),\n        # MAPE in percent; assumes no project has a zero cost\n        'MAPE': np.mean(np.abs((y_true - predictions) / y_true)) * 100\n    }\n\n    print(f\"\\n{model_name} Evaluation:\")\n    print(f\"  MAE:  ${metrics['MAE']:,.0f}\")\n    print(f\"  RMSE: ${metrics['RMSE']:,.0f}\")\n    print(f\"  R²:   {metrics['R²']:.3f}\")\n    print(f\"  MAPE: {metrics['MAPE']:.1f}%\")\n\n    return metrics, predictions\n\n# Usage\nmetrics, predictions = evaluate_model(model, X_test, y_test, \"Linear Regression\")"
      },
      {
        "title": "Compare Multiple Models",
        "body": "import pandas as pd\n\ndef compare_models(models, X_test, y_test):\n    \"\"\"Compare multiple models\"\"\"\n    results = []\n\n    for name, model in models.items():\n        metrics, _ = evaluate_model(model, X_test, y_test, name)\n        metrics['Model'] = name\n        results.append(metrics)\n\n    comparison = pd.DataFrame(results)\n    comparison = comparison.set_index('Model')\n\n    print(\"\\nModel Comparison:\")\n    print(comparison.round(2))\n\n    return comparison\n\n# Usage\n# Every model must accept the raw columns of X_test, so any scaling\n# has to live inside the model (for example in a Pipeline)\nmodels = {\n    'Linear Regression': model,  # pipeline returned by train_linear_model\n    'KNN': knn_model,\n    'Random Forest': rf_model,\n    'Gradient Boosting': gb_model\n}\ncomparison = compare_models(models, X_test, y_test)"
      },
      {
        "title": "Cross-Validation",
        "body": "from sklearn.model_selection import cross_val_score\n\ndef cross_validate_model(model, X, y, cv=5):\n    \"\"\"Perform cross-validation\"\"\"\n    scores = cross_val_score(model, X, y, cv=cv, scoring='neg_mean_absolute_error')\n    mae_scores = -scores\n\n    print(f\"Cross-Validation MAE: ${mae_scores.mean():,.0f} (+/- ${mae_scores.std():,.0f})\")\n    return mae_scores\n\n# Usage\ncv_scores = cross_validate_model(rf_model, X, y)"
      },
      {
        "title": "Complete Prediction Function",
        "body": "import pandas as pd\n\ndef create_prediction_pipeline(model, feature_names, scaler=None):\n    \"\"\"Create a reusable prediction pipeline\"\"\"\n\n    def predict_cost(project_data):\n        \"\"\"\n        Predict cost for new project\n\n        Args:\n            project_data: dict with project features\n\n        Returns:\n            Predicted cost with a heuristic +/- margin range\n            (not a statistical confidence interval)\n        \"\"\"\n        # Create DataFrame from input\n        df = pd.DataFrame([project_data])\n\n        # Fill features missing from the input with 0\n        # (e.g. one-hot columns for categories that do not apply)\n        for col in feature_names:\n            if col not in df.columns:\n                df[col] = 0\n\n        df = df[feature_names]\n\n        # Scale if necessary\n        if scaler is not None:\n            df = scaler.transform(df)\n\n        # Predict\n        prediction = model.predict(df)[0]\n\n        # Simple range: fixed percentage margin around the prediction\n        margin = 0.15  # 15% margin\n        lower = prediction * (1 - margin)\n        upper = prediction * (1 + margin)\n\n        return {\n            'predicted_cost': prediction,\n            'lower_bound': lower,\n            'upper_bound': upper,\n            'margin': f\"+/-{margin*100:.0f}%\"\n        }\n\n    return predict_cost\n\n# Usage\npredictor = create_prediction_pipeline(rf_model, X.columns.tolist())\n\n# Predict new project\nnew_project = {\n    'area_m2': 5000,\n    'floors': 8,\n    'complexity_score': 3,\n    'material_quality': 2\n}\n\nresult = predictor(new_project)\nprint(f\"Predicted Cost: ${result['predicted_cost']:,.0f}\")\nprint(f\"Range: ${result['lower_bound']:,.0f} - ${result['upper_bound']:,.0f}\")"
      },
      {
        "title": "Save and Load Model",
        "body": "import joblib\n\n# Save model\ndef save_model(model, filepath):\n    \"\"\"Save trained model to file\"\"\"\n    joblib.dump(model, filepath)\n    print(f\"Model saved to {filepath}\")\n\n# Load model\ndef load_model(filepath):\n    \"\"\"Load model from file\"\"\"\n    model = joblib.load(filepath)\n    print(f\"Model loaded from {filepath}\")\n    return model\n\n# Usage\nsave_model(rf_model, \"cost_prediction_model.pkl\")\nloaded_model = load_model(\"cost_prediction_model.pkl\")"
      },
      {
        "title": "Using with ChatGPT",
        "body": "# Prompt for ChatGPT to help with cost prediction\n\nprompt = \"\"\"\nI have historical construction project data with these columns:\n- area_m2: Building area in square meters\n- floors: Number of floors\n- building_type: residential, commercial, industrial\n- total_cost: Total project cost in USD\n\nWrite Python code using scikit-learn to:\n1. Prepare the data for machine learning\n2. Train a Random Forest model\n3. Evaluate the model\n4. Predict cost for a new 3000 m² commercial building with 5 floors\n\"\"\""
      },
      {
        "title": "Quick Reference",
        "body": "Task\tCode\nSplit data\ttrain_test_split(X, y, test_size=0.2)\nLinear Regression\tLinearRegression().fit(X, y)\nKNN\tKNeighborsRegressor(n_neighbors=5)\nRandom Forest\tRandomForestRegressor(n_estimators=100)\nPredict\tmodel.predict(X_new)\nMAE\tmean_absolute_error(y_true, y_pred)\nR² Score\tr2_score(y_true, y_pred)\nCross-validate\tcross_val_score(model, X, y, cv=5)\nSave model\tjoblib.dump(model, 'file.pkl')"
      },
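      {
        "title": "Best Practices",
        "body": "Data Quality: Train on as many clean, comparable historical projects as possible; more representative data generally improves predictions\nFeature Selection: Include project characteristics that drive cost (area, floors, building type, location, complexity)\nInflation Adjustment: Normalize historical costs to current prices before training\nRegular Retraining: Update the model as newly completed projects become available\nEnsemble Methods: Combine multiple models for robustness\nConfidence Intervals: Always report a prediction range, not just a point estimate"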
      },
      {
        "title": "Resources",
        "body": "Book: \"Data-Driven Construction\" by Artem Boiko, Chapter 4.5\nWebsite: https://datadrivenconstruction.io\nscikit-learn: https://scikit-learn.org"
      },
      {
        "title": "Next Steps",
        "body": "See duration-prediction for project duration forecasting\nSee ml-model-builder for custom ML workflows\nSee kpi-dashboard for visualization\nSee big-data-analysis for large dataset processing"
      }
    ],
    "body": "Construction Cost Prediction with Machine Learning\nOverview\n\nBased on DDC methodology (Chapter 4.5), this skill enables predicting construction project costs using historical data and machine learning algorithms. The approach transforms traditional expert-based estimation into data-driven prediction.\n\nBook Reference: \"Будущее: прогнозы и машинное обучение\" / \"Future: Predictions and Machine Learning\"\n\n\"Предсказания и прогнозы на основе исторических данных позволяют компаниям принимать более точные решения о стоимости и сроках проектов.\" — DDC Book, Chapter 4.5\n\nCore Concepts\nHistorical Data → Feature Engineering → ML Model → Cost Prediction\n    │                    │                │              │\n    ▼                    ▼                ▼              ▼\nPast projects      Prepare data      Train model    New project\nwith costs         for ML            on history     cost forecast\n\nQuick Start\nimport pandas as pd\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.linear_model import LinearRegression\nfrom sklearn.metrics import mean_absolute_error, r2_score\n\n# Load historical project data\ndf = pd.read_csv(\"historical_projects.csv\")\n\n# Features and target\nX = df[['area_m2', 'floors', 'complexity_score']]\ny = df['total_cost']\n\n# Split data\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n\n# Train model\nmodel = LinearRegression()\nmodel.fit(X_train, y_train)\n\n# Predict\npredictions = model.predict(X_test)\nprint(f\"R² Score: {r2_score(y_test, predictions):.2f}\")\nprint(f\"MAE: ${mean_absolute_error(y_test, predictions):,.0f}\")\n\n# Predict new project\nnew_project = [[5000, 10, 3]]  # area, floors, complexity\ncost = model.predict(new_project)\nprint(f\"Predicted cost: ${cost[0]:,.0f}\")\n\nData Preparation\nPrepare Historical Dataset\nimport pandas as pd\nimport numpy as np\n\ndef prepare_cost_dataset(df):\n    \"\"\"Prepare historical project data for ML\"\"\"\n    # 
Select relevant features\n    features = [\n        'area_m2',\n        'floors',\n        'building_type',\n        'location',\n        'year_completed',\n        'complexity_score',\n        'material_quality',\n        'total_cost'\n    ]\n\n    df = df[features].copy()\n\n    # Handle missing values\n    df = df.dropna(subset=['total_cost'])\n    df['complexity_score'] = df['complexity_score'].fillna(df['complexity_score'].median())\n\n    # Encode categorical variables\n    df = pd.get_dummies(df, columns=['building_type', 'location'])\n\n    # Calculate derived features\n    df['cost_per_m2'] = df['total_cost'] / df['area_m2']\n    df['cost_per_floor'] = df['total_cost'] / df['floors']\n\n    # Adjust for inflation (to current year prices)\n    current_year = 2024\n    inflation_rate = 0.03  # 3% annual\n    df['years_ago'] = current_year - df['year_completed']\n    df['adjusted_cost'] = df['total_cost'] * (1 + inflation_rate) ** df['years_ago']\n\n    return df\n\n# Usage\ndf = pd.read_csv(\"projects_history.csv\")\ndf_prepared = prepare_cost_dataset(df)\n\nFeature Engineering\ndef engineer_features(df):\n    \"\"\"Create additional features for better predictions\"\"\"\n    # Interaction features\n    df['area_x_floors'] = df['area_m2'] * df['floors']\n    df['area_x_complexity'] = df['area_m2'] * df['complexity_score']\n\n    # Polynomial features\n    df['area_squared'] = df['area_m2'] ** 2\n\n    # Log transforms (for skewed features)\n    df['log_area'] = np.log1p(df['area_m2'])\n\n    # Binned features\n    df['size_category'] = pd.cut(\n        df['area_m2'],\n        bins=[0, 1000, 5000, 10000, float('inf')],\n        labels=['small', 'medium', 'large', 'xlarge']\n    )\n\n    return df\n\nMachine Learning Models\nLinear Regression\nfrom sklearn.linear_model import LinearRegression\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.pipeline import Pipeline\n\ndef train_linear_model(X_train, y_train):\n    \"\"\"Train Linear Regression 
model with scaling\"\"\"\n    pipeline = Pipeline([\n        ('scaler', StandardScaler()),\n        ('regressor', LinearRegression())\n    ])\n\n    pipeline.fit(X_train, y_train)\n\n    # Feature importance (coefficients)\n    coefficients = pd.DataFrame({\n        'feature': X_train.columns,\n        'coefficient': pipeline.named_steps['regressor'].coef_\n    }).sort_values('coefficient', key=abs, ascending=False)\n\n    return pipeline, coefficients\n\n# Usage\nmodel, importance = train_linear_model(X_train, y_train)\nprint(\"Feature Importance:\")\nprint(importance)\n\nK-Nearest Neighbors (KNN)\nfrom sklearn.neighbors import KNeighborsRegressor\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.model_selection import GridSearchCV\n\ndef train_knn_model(X_train, y_train):\n    \"\"\"Train KNN model with optimal k\"\"\"\n    # Scale features\n    scaler = StandardScaler()\n    X_scaled = scaler.fit_transform(X_train)\n\n    # Find optimal k using cross-validation\n    param_grid = {'n_neighbors': range(3, 20)}\n    knn = KNeighborsRegressor()\n    grid_search = GridSearchCV(knn, param_grid, cv=5, scoring='neg_mean_absolute_error')\n    grid_search.fit(X_scaled, y_train)\n\n    print(f\"Best k: {grid_search.best_params_['n_neighbors']}\")\n    print(f\"Best MAE: ${-grid_search.best_score_:,.0f}\")\n\n    return grid_search.best_estimator_, scaler\n\n# Usage\nknn_model, scaler = train_knn_model(X_train, y_train)\n\nRandom Forest\nfrom sklearn.ensemble import RandomForestRegressor\n\ndef train_random_forest(X_train, y_train):\n    \"\"\"Train Random Forest model\"\"\"\n    rf = RandomForestRegressor(\n        n_estimators=100,\n        max_depth=10,\n        min_samples_split=5,\n        random_state=42\n    )\n\n    rf.fit(X_train, y_train)\n\n    # Feature importance\n    importance = pd.DataFrame({\n        'feature': X_train.columns,\n        'importance': rf.feature_importances_\n    }).sort_values('importance', ascending=False)\n\n    return rf, 
importance\n\n# Usage\nrf_model, importance = train_random_forest(X_train, y_train)\nprint(\"Feature Importance:\")\nprint(importance.head(10))\n\nGradient Boosting\nfrom sklearn.ensemble import GradientBoostingRegressor\n\ndef train_gradient_boosting(X_train, y_train):\n    \"\"\"Train Gradient Boosting model\"\"\"\n    gb = GradientBoostingRegressor(\n        n_estimators=200,\n        learning_rate=0.1,\n        max_depth=5,\n        random_state=42\n    )\n\n    gb.fit(X_train, y_train)\n    return gb\n\n# Usage\ngb_model = train_gradient_boosting(X_train, y_train)\n\nModel Evaluation\nComprehensive Evaluation\nfrom sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score\nimport numpy as np\n\ndef evaluate_model(model, X_test, y_test, model_name=\"Model\"):\n    \"\"\"Comprehensive model evaluation\"\"\"\n    predictions = model.predict(X_test)\n\n    metrics = {\n        'MAE': mean_absolute_error(y_test, predictions),\n        'RMSE': np.sqrt(mean_squared_error(y_test, predictions)),\n        'R²': r2_score(y_test, predictions),\n        'MAPE': np.mean(np.abs((y_test - predictions) / y_test)) * 100\n    }\n\n    print(f\"\\n{model_name} Evaluation:\")\n    print(f\"  MAE:  ${metrics['MAE']:,.0f}\")\n    print(f\"  RMSE: ${metrics['RMSE']:,.0f}\")\n    print(f\"  R²:   {metrics['R²']:.3f}\")\n    print(f\"  MAPE: {metrics['MAPE']:.1f}%\")\n\n    return metrics, predictions\n\n# Usage\nmetrics, predictions = evaluate_model(model, X_test, y_test, \"Linear Regression\")\n\nCompare Multiple Models\ndef compare_models(models, X_test, y_test):\n    \"\"\"Compare multiple models\"\"\"\n    results = []\n\n    for name, model in models.items():\n        metrics, _ = evaluate_model(model, X_test, y_test, name)\n        metrics['Model'] = name\n        results.append(metrics)\n\n    comparison = pd.DataFrame(results)\n    comparison = comparison.set_index('Model')\n\n    print(\"\\nModel Comparison:\")\n    print(comparison.round(2))\n\n    return 
comparison\n\n# Usage\nmodels = {\n    'Linear Regression': linear_model,\n    'KNN': knn_model,\n    'Random Forest': rf_model,\n    'Gradient Boosting': gb_model\n}\ncomparison = compare_models(models, X_test, y_test)\n\nCross-Validation\nfrom sklearn.model_selection import cross_val_score\n\ndef cross_validate_model(model, X, y, cv=5):\n    \"\"\"Perform cross-validation\"\"\"\n    scores = cross_val_score(model, X, y, cv=cv, scoring='neg_mean_absolute_error')\n    mae_scores = -scores\n\n    print(f\"Cross-Validation MAE: ${mae_scores.mean():,.0f} (+/- ${mae_scores.std():,.0f})\")\n    return mae_scores\n\n# Usage\ncv_scores = cross_validate_model(rf_model, X, y)\n\nPrediction Pipeline\nComplete Prediction Function\nimport joblib\n\ndef create_prediction_pipeline(model, feature_names, scaler=None):\n    \"\"\"Create a reusable prediction pipeline\"\"\"\n\n    def predict_cost(project_data):\n        \"\"\"\n        Predict cost for new project\n\n        Args:\n            project_data: dict with project features\n\n        Returns:\n            Predicted cost and confidence interval\n        \"\"\"\n        # Create DataFrame from input\n        df = pd.DataFrame([project_data])\n\n        # Ensure all required features\n        for col in feature_names:\n            if col not in df.columns:\n                df[col] = 0\n\n        df = df[feature_names]\n\n        # Scale if necessary\n        if scaler:\n            df = scaler.transform(df)\n\n        # Predict\n        prediction = model.predict(df)[0]\n\n        # Confidence interval (simple estimation)\n        confidence = 0.15  # 15% margin\n        lower = prediction * (1 - confidence)\n        upper = prediction * (1 + confidence)\n\n        return {\n            'predicted_cost': prediction,\n            'lower_bound': lower,\n            'upper_bound': upper,\n            'confidence_level': f\"{(1-confidence)*100:.0f}%\"\n        }\n\n    return predict_cost\n\n# Usage\npredictor = 
create_prediction_pipeline(rf_model, X.columns.tolist())\n\n# Predict new project\nnew_project = {\n    'area_m2': 5000,\n    'floors': 8,\n    'complexity_score': 3,\n    'material_quality': 2\n}\n\nresult = predictor(new_project)\nprint(f\"Predicted Cost: ${result['predicted_cost']:,.0f}\")\nprint(f\"Range: ${result['lower_bound']:,.0f} - ${result['upper_bound']:,.0f}\")\n\nSave and Load Model\nimport joblib\n\n# Save model\ndef save_model(model, filepath):\n    \"\"\"Save trained model to file\"\"\"\n    joblib.dump(model, filepath)\n    print(f\"Model saved to {filepath}\")\n\n# Load model\ndef load_model(filepath):\n    \"\"\"Load model from file\"\"\"\n    model = joblib.load(filepath)\n    print(f\"Model loaded from {filepath}\")\n    return model\n\n# Usage\nsave_model(rf_model, \"cost_prediction_model.pkl\")\nloaded_model = load_model(\"cost_prediction_model.pkl\")\n\nUsing with ChatGPT\n# Prompt for ChatGPT to help with cost prediction\n\nprompt = \"\"\"\nI have historical construction project data with these columns:\n- area_m2: Building area in square meters\n- floors: Number of floors\n- building_type: residential, commercial, industrial\n- total_cost: Total project cost in USD\n\nWrite Python code using scikit-learn to:\n1. Prepare the data for machine learning\n2. Train a Random Forest model\n3. Evaluate the model\n4. 
Predict cost for a new 3000 m² commercial building with 5 floors\n\"\"\"\n\nQuick Reference\nTask\tCode\nSplit data\ttrain_test_split(X, y, test_size=0.2)\nLinear Regression\tLinearRegression().fit(X, y)\nKNN\tKNeighborsRegressor(n_neighbors=5)\nRandom Forest\tRandomForestRegressor(n_estimators=100)\nPredict\tmodel.predict(X_new)\nMAE\tmean_absolute_error(y_true, y_pred)\nR² Score\tr2_score(y_true, y_pred)\nCross-validate\tcross_val_score(model, X, y, cv=5)\nSave model\tjoblib.dump(model, 'file.pkl')\nBest Practices\nData Quality: More historical data = better predictions\nFeature Selection: Include relevant project characteristics\nInflation Adjustment: Normalize costs to current prices\nRegular Retraining: Update model with new completed projects\nEnsemble Methods: Combine multiple models for robustness\nConfidence Intervals: Always provide prediction ranges\nResources\nBook: \"Data-Driven Construction\" by Artem Boiko, Chapter 4.5\nWebsite: https://datadrivenconstruction.io\nscikit-learn: https://scikit-learn.org\nNext Steps\nSee duration-prediction for project duration forecasting\nSee ml-model-builder for custom ML workflows\nSee kpi-dashboard for visualization\nSee big-data-analysis for large dataset processing"
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/datadrivenconstruction/cost-prediction",
    "publisherUrl": "https://clawhub.ai/datadrivenconstruction/cost-prediction",
    "owner": "datadrivenconstruction",
    "version": "2.0.0",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/cost-prediction",
    "downloadUrl": "https://openagent3.xyz/downloads/cost-prediction",
    "agentUrl": "https://openagent3.xyz/skills/cost-prediction/agent",
    "manifestUrl": "https://openagent3.xyz/skills/cost-prediction/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/cost-prediction/agent.md"
  }
}