
Cost Prediction

Predict construction project costs using Machine Learning. Use Linear Regression, K-Nearest Neighbors, and Random Forest models on historical project data. Train, evaluate, and deploy cost prediction models.

skill · openclaw · clawhub · Free
0 Downloads
0 Stars
0 Installs
0 Score
High Signal

⬇ 0 downloads · ★ 0 stars · Unverified but indexed

Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform
OpenClaw
Install method
Manual import
Extraction
Extract archive
Prerequisites
OpenClaw
Primary doc
SKILL.md

Package facts

Download mode
Yavira redirect
Package format
ZIP package
Source platform
Tencent SkillHub
What's included
claw.json, instructions.md, SKILL.md

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of working through the steps manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source
Tencent SkillHub
Verification
Indexed source record
Version
2.0.0

Documentation

Primary doc: SKILL.md (19 sections, hosted on ClawHub)

Overview

Based on the DDC methodology (Chapter 4.5), this skill predicts construction project costs from historical data using machine learning algorithms. The approach transforms traditional expert-based estimation into data-driven prediction.

Book Reference: "Будущее: прогнозы и машинное обучение" / "Future: Predictions and Machine Learning"

"Predictions and forecasts based on historical data allow companies to make more accurate decisions about project costs and timelines." — DDC Book, Chapter 4.5

Core Concepts

Historical Data → Feature Engineering → ML Model → Cost Prediction
      │                   │                │              │
      ▼                   ▼                ▼              ▼
Past projects        Prepare data     Train model    New project
  with costs            for ML        on history    cost forecast

Quick Start

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

# Load historical project data
df = pd.read_csv("historical_projects.csv")

# Features and target
X = df[['area_m2', 'floors', 'complexity_score']]
y = df['total_cost']

# Split data (fixed seed for reproducible splits)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict
predictions = model.predict(X_test)
print(f"R² Score: {r2_score(y_test, predictions):.2f}")
print(f"MAE: ${mean_absolute_error(y_test, predictions):,.0f}")

# Predict a new project; wrap the input in a DataFrame so feature names
# match the training data (avoids scikit-learn's feature-name warning)
new_project = pd.DataFrame(
    [[5000, 10, 3]],  # area, floors, complexity
    columns=['area_m2', 'floors', 'complexity_score']
)
cost = model.predict(new_project)
print(f"Predicted cost: ${cost[0]:,.0f}")

Prepare Historical Dataset

import pandas as pd
import numpy as np

def prepare_cost_dataset(df):
    """Prepare historical project data for ML"""
    # Select relevant features
    features = [
        'area_m2', 'floors', 'building_type', 'location',
        'year_completed', 'complexity_score', 'material_quality',
        'total_cost'
    ]
    df = df[features].copy()

    # Handle missing values
    df = df.dropna(subset=['total_cost'])
    df['complexity_score'] = df['complexity_score'].fillna(df['complexity_score'].median())

    # Encode categorical variables
    df = pd.get_dummies(df, columns=['building_type', 'location'])

    # Calculate derived ratios (useful for analysis and benchmarking;
    # do not feed them to the model as inputs, since they contain the target)
    df['cost_per_m2'] = df['total_cost'] / df['area_m2']
    df['cost_per_floor'] = df['total_cost'] / df['floors']

    # Adjust for inflation (normalize to current-year prices)
    current_year = 2024
    inflation_rate = 0.03  # 3% annual
    df['years_ago'] = current_year - df['year_completed']
    df['adjusted_cost'] = df['total_cost'] * (1 + inflation_rate) ** df['years_ago']

    return df

# Usage
df = pd.read_csv("projects_history.csv")
df_prepared = prepare_cost_dataset(df)

Feature Engineering

import numpy as np
import pandas as pd

def engineer_features(df):
    """Create additional features for better predictions"""
    # Interaction features
    df['area_x_floors'] = df['area_m2'] * df['floors']
    df['area_x_complexity'] = df['area_m2'] * df['complexity_score']

    # Polynomial features
    df['area_squared'] = df['area_m2'] ** 2

    # Log transforms (for skewed features)
    df['log_area'] = np.log1p(df['area_m2'])

    # Binned features
    df['size_category'] = pd.cut(
        df['area_m2'],
        bins=[0, 1000, 5000, 10000, float('inf')],
        labels=['small', 'medium', 'large', 'xlarge']
    )

    return df
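Note that size_category is produced as a categorical column, which scikit-learn regressors cannot consume directly. A minimal sketch of one-hot encoding it before training, assuming the engineer_features output above:

import pandas as pd

# One-hot encode the binned size_category so the model sees numeric columns
df = pd.get_dummies(df, columns=['size_category'])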

Linear Regression

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

def train_linear_model(X_train, y_train):
    """Train Linear Regression model with scaling"""
    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('regressor', LinearRegression())
    ])
    pipeline.fit(X_train, y_train)

    # Feature importance (coefficients on the scaled features)
    coefficients = pd.DataFrame({
        'feature': X_train.columns,
        'coefficient': pipeline.named_steps['regressor'].coef_
    }).sort_values('coefficient', key=abs, ascending=False)

    return pipeline, coefficients

# Usage
model, importance = train_linear_model(X_train, y_train)
print("Feature Importance:")
print(importance)

K-Nearest Neighbors (KNN)

from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

def train_knn_model(X_train, y_train):
    """Train KNN model with optimal k"""
    # Scale features
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X_train)

    # Find optimal k using cross-validation
    param_grid = {'n_neighbors': range(3, 20)}
    knn = KNeighborsRegressor()
    grid_search = GridSearchCV(knn, param_grid, cv=5, scoring='neg_mean_absolute_error')
    grid_search.fit(X_scaled, y_train)

    print(f"Best k: {grid_search.best_params_['n_neighbors']}")
    print(f"Best MAE: ${-grid_search.best_score_:,.0f}")

    return grid_search.best_estimator_, scaler

# Usage
knn_model, scaler = train_knn_model(X_train, y_train)
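Because the grid search was fit on scaled features, any new input must pass through the same scaler before prediction. A one-line sketch, assuming the X_test split from the Quick Start:

# Scale test features with the training-time scaler before predicting
knn_predictions = knn_model.predict(scaler.transform(X_test))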

Random Forest

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def train_random_forest(X_train, y_train):
    """Train Random Forest model"""
    rf = RandomForestRegressor(
        n_estimators=100,
        max_depth=10,
        min_samples_split=5,
        random_state=42
    )
    rf.fit(X_train, y_train)

    # Feature importance
    importance = pd.DataFrame({
        'feature': X_train.columns,
        'importance': rf.feature_importances_
    }).sort_values('importance', ascending=False)

    return rf, importance

# Usage
rf_model, importance = train_random_forest(X_train, y_train)
print("Feature Importance:")
print(importance.head(10))

Gradient Boosting

from sklearn.ensemble import GradientBoostingRegressor

def train_gradient_boosting(X_train, y_train):
    """Train Gradient Boosting model"""
    gb = GradientBoostingRegressor(
        n_estimators=200,
        learning_rate=0.1,
        max_depth=5,
        random_state=42
    )
    gb.fit(X_train, y_train)
    return gb

# Usage
gb_model = train_gradient_boosting(X_train, y_train)
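Boosted models can overfit as stages accumulate, so it is worth checking how test error evolves per stage. A hedged sketch using scikit-learn's staged_predict, assuming gb_model and the X_test/y_test split from earlier:

import numpy as np
from sklearn.metrics import mean_absolute_error

# Track test MAE after each boosting stage to spot where overfitting begins
stage_mae = [
    mean_absolute_error(y_test, stage_preds)
    for stage_preds in gb_model.staged_predict(X_test)
]
best_stage = int(np.argmin(stage_mae)) + 1
print(f"Lowest test MAE at stage {best_stage}: ${min(stage_mae):,.0f}")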

Comprehensive Evaluation

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate_model(model, X_test, y_test, model_name="Model"):
    """Comprehensive model evaluation"""
    predictions = model.predict(X_test)

    metrics = {
        'MAE': mean_absolute_error(y_test, predictions),
        'RMSE': np.sqrt(mean_squared_error(y_test, predictions)),
        'R²': r2_score(y_test, predictions),
        # MAPE assumes no zero-cost projects in y_test
        'MAPE': np.mean(np.abs((y_test - predictions) / y_test)) * 100
    }

    print(f"\n{model_name} Evaluation:")
    print(f"  MAE:  ${metrics['MAE']:,.0f}")
    print(f"  RMSE: ${metrics['RMSE']:,.0f}")
    print(f"  R²:   {metrics['R²']:.3f}")
    print(f"  MAPE: {metrics['MAPE']:.1f}%")

    return metrics, predictions

# Usage
metrics, predictions = evaluate_model(model, X_test, y_test, "Linear Regression")

Compare Multiple Models

import pandas as pd

def compare_models(models, X_test, y_test):
    """Compare multiple models"""
    results = []
    for name, model in models.items():
        metrics, _ = evaluate_model(model, X_test, y_test, name)
        metrics['Model'] = name
        results.append(metrics)

    comparison = pd.DataFrame(results)
    comparison = comparison.set_index('Model')
    print("\nModel Comparison:")
    print(comparison.round(2))
    return comparison

# Usage
# linear_model: the scaled pipeline returned by train_linear_model above.
# Note: knn_model was trained on scaled features, so scale X_test first
# (see the KNN section) before including it in this comparison.
models = {
    'Linear Regression': linear_model,
    'KNN': knn_model,
    'Random Forest': rf_model,
    'Gradient Boosting': gb_model
}
comparison = compare_models(models, X_test, y_test)

Cross-Validation

from sklearn.model_selection import cross_val_score

def cross_validate_model(model, X, y, cv=5):
    """Perform cross-validation"""
    scores = cross_val_score(model, X, y, cv=cv, scoring='neg_mean_absolute_error')
    mae_scores = -scores
    print(f"Cross-Validation MAE: ${mae_scores.mean():,.0f} (+/- ${mae_scores.std():,.0f})")
    return mae_scores

# Usage
cv_scores = cross_validate_model(rf_model, X, y)

Complete Prediction Function

import pandas as pd

def create_prediction_pipeline(model, feature_names, scaler=None):
    """Create a reusable prediction pipeline"""

    def predict_cost(project_data):
        """
        Predict cost for a new project

        Args:
            project_data: dict with project features

        Returns:
            Predicted cost and confidence interval
        """
        # Create DataFrame from input
        df = pd.DataFrame([project_data])

        # Ensure all required features are present
        for col in feature_names:
            if col not in df.columns:
                df[col] = 0
        df = df[feature_names]

        # Scale if necessary
        if scaler:
            df = scaler.transform(df)

        # Predict
        prediction = model.predict(df)[0]

        # Confidence interval (simple fixed-margin estimation)
        confidence = 0.15  # 15% margin
        lower = prediction * (1 - confidence)
        upper = prediction * (1 + confidence)

        return {
            'predicted_cost': prediction,
            'lower_bound': lower,
            'upper_bound': upper,
            'confidence_level': f"{(1 - confidence) * 100:.0f}%"
        }

    return predict_cost

# Usage
predictor = create_prediction_pipeline(rf_model, X.columns.tolist())

# Predict new project
new_project = {
    'area_m2': 5000,
    'floors': 8,
    'complexity_score': 3,
    'material_quality': 2
}
result = predictor(new_project)
print(f"Predicted Cost: ${result['predicted_cost']:,.0f}")
print(f"Range: ${result['lower_bound']:,.0f} - ${result['upper_bound']:,.0f}")
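The fixed 15% margin above is a placeholder rather than a statistical interval. One data-driven alternative is to use the spread of per-tree predictions from the Random Forest; rf_prediction_interval below is an illustrative helper, and df must already be feature-aligned as in predict_cost:

import numpy as np

def rf_prediction_interval(rf_model, df, lower_q=5, upper_q=95):
    """Empirical range from the spread of individual tree predictions.
    Trees share bootstrap data, so this understates true uncertainty."""
    tree_preds = np.array([tree.predict(df)[0] for tree in rf_model.estimators_])
    return {
        'predicted_cost': tree_preds.mean(),
        'lower_bound': np.percentile(tree_preds, lower_q),
        'upper_bound': np.percentile(tree_preds, upper_q),
    }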

Save and Load Model

import joblib

# Save model
def save_model(model, filepath):
    """Save trained model to file"""
    joblib.dump(model, filepath)
    print(f"Model saved to {filepath}")

# Load model
def load_model(filepath):
    """Load model from file"""
    model = joblib.load(filepath)
    print(f"Model loaded from {filepath}")
    return model

# Usage
save_model(rf_model, "cost_prediction_model.pkl")
loaded_model = load_model("cost_prediction_model.pkl")
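A saved model alone does not reproduce predictions; the scaler (if any) and the expected feature order must travel with it. A minimal sketch of bundling all three, assuming the objects from earlier sections (joblib serializes a plain dict; the bundle filename is illustrative):

import joblib

# Bundle everything a prediction needs into one artifact
artifact = {
    'model': rf_model,
    'scaler': None,  # or a fitted StandardScaler, if the model expects scaled input
    'feature_names': X.columns.tolist(),
}
joblib.dump(artifact, "cost_prediction_bundle.pkl")

# Restore and rebuild the predictor from the bundle
bundle = joblib.load("cost_prediction_bundle.pkl")
predictor = create_prediction_pipeline(bundle['model'], bundle['feature_names'], bundle['scaler'])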

Using with ChatGPT

# Prompt for ChatGPT to help with cost prediction
prompt = """
I have historical construction project data with these columns:
- area_m2: Building area in square meters
- floors: Number of floors
- building_type: residential, commercial, industrial
- total_cost: Total project cost in USD

Write Python code using scikit-learn to:
1. Prepare the data for machine learning
2. Train a Random Forest model
3. Evaluate the model
4. Predict cost for a new 3000 m² commercial building with 5 floors
"""

Quick Reference

Task                 Code
Split data           train_test_split(X, y, test_size=0.2)
Linear Regression    LinearRegression().fit(X, y)
KNN                  KNeighborsRegressor(n_neighbors=5)
Random Forest        RandomForestRegressor(n_estimators=100)
Predict              model.predict(X_new)
MAE                  mean_absolute_error(y_true, y_pred)
R² Score             r2_score(y_true, y_pred)
Cross-validate       cross_val_score(model, X, y, cv=5)
Save model           joblib.dump(model, 'file.pkl')

Best Practices

  • Data Quality: More historical data = better predictions
  • Feature Selection: Include relevant project characteristics
  • Inflation Adjustment: Normalize costs to current prices
  • Regular Retraining: Update the model with newly completed projects
  • Ensemble Methods: Combine multiple models for robustness (see the sketch below)
  • Confidence Intervals: Always provide prediction ranges
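One way to act on the ensemble advice is simple prediction averaging. A hedged sketch using scikit-learn's VotingRegressor, assuming the X_train/y_train/X_test/y_test split from the Quick Start; hyperparameters mirror the earlier sections, and KNN and Linear Regression are wrapped in pipelines so scaling travels with them:

from sklearn.ensemble import VotingRegressor, RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Average predictions from several model families for robustness
ensemble = VotingRegressor([
    ('linear', make_pipeline(StandardScaler(), LinearRegression())),
    ('knn', make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))),
    ('rf', RandomForestRegressor(n_estimators=100, random_state=42)),
    ('gb', GradientBoostingRegressor(n_estimators=200, random_state=42)),
])
ensemble.fit(X_train, y_train)
print(f"Ensemble R²: {ensemble.score(X_test, y_test):.3f}")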

Resources

  • Book: "Data-Driven Construction" by Artem Boiko, Chapter 4.5
  • Website: https://datadrivenconstruction.io
  • scikit-learn: https://scikit-learn.org

Next Steps

  • See duration-prediction for project duration forecasting
  • See ml-model-builder for custom ML workflows
  • See kpi-dashboard for visualization
  • See big-data-analysis for large dataset processing

Category context

Agent frameworks, memory systems, reasoning layers, and model-native orchestration.

Source: Tencent SkillHub

Largest current source with strong distribution and engagement signals.

Package contents

Included in package
2 Docs · 1 Config
  • SKILL.md Primary doc
  • instructions.md Docs
  • claw.json Config