← All skills
Tencent SkillHub · Data Analysis

Pandas Construction Analysis

Comprehensive Pandas toolkit for construction data analysis. Filter, group, aggregate BIM elements, calculate quantities, merge datasets, and generate report...

skill openclawclawhub Free
0 Downloads
0 Stars
0 Installs
0 Score
High Signal

Comprehensive Pandas toolkit for construction data analysis. Filter, group, aggregate BIM elements, calculate quantities, merge datasets, and generate report...

⬇ 0 downloads ★ 0 stars Unverified but indexed

Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform
OpenClaw
Install method
Manual import
Extraction
Extract archive
Prerequisites
OpenClaw
Primary doc
SKILL.md

Package facts

Download mode
Yavira redirect
Package format
ZIP package
Source platform
Tencent SkillHub
What's included
claw.json, instructions.md, SKILL.md

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source
Tencent SkillHub
Verification
Indexed source record
Version
2.1.0

Documentation

ClawHub primary doc Primary doc: SKILL.md 23 sections Open source page

Overview

Based on DDC methodology (Chapter 2.3), this skill provides comprehensive Pandas operations for construction data processing. Pandas is the Swiss Army knife for data analysts - handling everything from simple data filtering to complex aggregations across millions of rows. Book Reference: "Pandas DataFrame и LLM ChatGPT" / "Pandas DataFrame and LLM ChatGPT" "Используя Pandas, вы можете управлять и анализировать наборы данных, намного превосходящие возможности Excel. В то время как Excel способен обрабатывать до 1 миллиона строк данных, Pandas может без труда работать с наборами данных, содержащими десятки миллионов строк." — DDC Book, Chapter 2.3

Quick Start

import pandas as pd # Read construction data df = pd.read_excel("bim_export.xlsx") # Basic operations print(df.head()) # First 5 rows print(df.info()) # Column types and memory print(df.describe()) # Statistics for numeric columns # Filter structural elements structural = df[df['Category'] == 'Structural'] # Calculate total volume total_volume = df['Volume'].sum() print(f"Total volume: {total_volume:.2f} m³")

Creating DataFrames

import pandas as pd # From dictionary (construction elements) elements = pd.DataFrame({ 'ElementId': ['E001', 'E002', 'E003', 'E004'], 'Category': ['Wall', 'Floor', 'Wall', 'Column'], 'Material': ['Concrete', 'Concrete', 'Brick', 'Steel'], 'Volume_m3': [45.5, 120.0, 32.0, 8.5], 'Level': ['Level 1', 'Level 1', 'Level 2', 'Level 1'] }) # From CSV df_csv = pd.read_csv("construction_data.csv") # From Excel df_excel = pd.read_excel("project_data.xlsx", sheet_name="Elements") # From multiple Excel sheets all_sheets = pd.read_excel("project.xlsx", sheet_name=None) # Dict of DataFrames

Data Types in Construction

# Common data types for construction df = pd.DataFrame({ 'element_id': pd.Series(['W001', 'W002'], dtype='string'), 'quantity': pd.Series([10, 20], dtype='int64'), 'volume': pd.Series([45.5, 32.0], dtype='float64'), 'is_structural': pd.Series([True, False], dtype='bool'), 'created_date': pd.to_datetime(['2024-01-15', '2024-01-16']), 'category': pd.Categorical(['Wall', 'Slab']) }) # Check data types print(df.dtypes) # Convert types df['quantity'] = df['quantity'].astype('float64') df['volume'] = pd.to_numeric(df['volume'], errors='coerce')

Basic Filtering

# Single condition walls = df[df['Category'] == 'Wall'] # Multiple conditions (AND) large_concrete = df[(df['Material'] == 'Concrete') & (df['Volume_m3'] > 50)] # Multiple conditions (OR) walls_or_floors = df[(df['Category'] == 'Wall') | (df['Category'] == 'Floor')] # Using isin for multiple values structural = df[df['Category'].isin(['Wall', 'Column', 'Beam', 'Foundation'])] # String contains insulated = df[df['Description'].str.contains('insulated', case=False, na=False)] # Null value filtering incomplete = df[df['Cost'].isna()] complete = df[df['Cost'].notna()]

Advanced Selection

# Select columns volumes = df[['ElementId', 'Category', 'Volume_m3']] # Query syntax (SQL-like) result = df.query("Category == 'Wall' and Volume_m3 > 30") # Loc and iloc specific_row = df.loc[0] # By label range_rows = df.iloc[0:10] # By position specific_cell = df.loc[0, 'Volume_m3'] # Row and column subset = df.loc[0:5, ['Category', 'Volume_m3']] # Range with columns

GroupBy Operations

# Basic groupby by_category = df.groupby('Category')['Volume_m3'].sum() # Multiple aggregations summary = df.groupby('Category').agg({ 'Volume_m3': ['sum', 'mean', 'count'], 'Cost': ['sum', 'mean'] }) # Named aggregations (cleaner output) summary = df.groupby('Category').agg( total_volume=('Volume_m3', 'sum'), avg_volume=('Volume_m3', 'mean'), element_count=('ElementId', 'count'), total_cost=('Cost', 'sum') ).reset_index() # Multiple grouping columns by_level_cat = df.groupby(['Level', 'Category']).agg({ 'Volume_m3': 'sum', 'Cost': 'sum' }).reset_index()

Pivot Tables

# Create pivot table pivot = pd.pivot_table( df, values='Volume_m3', index='Level', columns='Category', aggfunc='sum', fill_value=0, margins=True, # Add totals margins_name='Total' ) # Multiple values pivot_detailed = pd.pivot_table( df, values=['Volume_m3', 'Cost'], index='Level', columns='Category', aggfunc={'Volume_m3': 'sum', 'Cost': 'mean'} )

Adding Calculated Columns

# Simple calculation df['Cost_Total'] = df['Volume_m3'] * df['Unit_Price'] # Conditional column df['Size_Category'] = df['Volume_m3'].apply( lambda x: 'Large' if x > 50 else ('Medium' if x > 20 else 'Small') ) # Using np.where for binary conditions import numpy as np df['Is_Large'] = np.where(df['Volume_m3'] > 50, True, False) # Using cut for binning df['Volume_Bin'] = pd.cut( df['Volume_m3'], bins=[0, 10, 50, 100, float('inf')], labels=['XS', 'S', 'M', 'L'] )

String Operations

# Extract from strings df['Level_Number'] = df['Level'].str.extract(r'(\d+)').astype(int) # Split and expand df[['Building', 'Floor']] = df['Location'].str.split('-', expand=True) # Clean strings df['Category'] = df['Category'].str.strip().str.lower().str.title() # Replace values df['Material'] = df['Material'].str.replace('Reinforced Concrete', 'RC')

Date Operations

# Parse dates df['Start_Date'] = pd.to_datetime(df['Start_Date']) # Extract components df['Year'] = df['Start_Date'].dt.year df['Month'] = df['Start_Date'].dt.month df['Week'] = df['Start_Date'].dt.isocalendar().week df['DayOfWeek'] = df['Start_Date'].dt.day_name() # Calculate duration df['Duration_Days'] = (df['End_Date'] - df['Start_Date']).dt.days # Filter by date range recent = df[df['Start_Date'] >= '2024-01-01']

Merge DataFrames

# Elements data elements = pd.DataFrame({ 'ElementId': ['E001', 'E002', 'E003'], 'Category': ['Wall', 'Floor', 'Column'], 'Volume_m3': [45.5, 120.0, 8.5] }) # Unit prices prices = pd.DataFrame({ 'Category': ['Wall', 'Floor', 'Column', 'Beam'], 'Unit_Price': [150, 80, 450, 200] }) # Inner join (only matching) merged = elements.merge(prices, on='Category', how='inner') # Left join (keep all elements) merged = elements.merge(prices, on='Category', how='left') # Join on different column names result = df1.merge(df2, left_on='elem_id', right_on='ElementId')

Concatenating DataFrames

# Vertical concatenation (stacking) all_floors = pd.concat([floor1_df, floor2_df, floor3_df], ignore_index=True) # Horizontal concatenation combined = pd.concat([quantities, costs, schedule], axis=1) # Append new rows new_elements = pd.DataFrame({'ElementId': ['E004'], 'Category': ['Beam']}) df = pd.concat([df, new_elements], ignore_index=True)

Quantity Take-Off (QTO)

def generate_qto_report(df): """Generate Quantity Take-Off summary by category""" qto = df.groupby(['Category', 'Material']).agg( count=('ElementId', 'count'), total_volume=('Volume_m3', 'sum'), total_area=('Area_m2', 'sum'), avg_volume=('Volume_m3', 'mean') ).round(2) # Add percentage column qto['volume_pct'] = (qto['total_volume'] / qto['total_volume'].sum() * 100).round(1) return qto.sort_values('total_volume', ascending=False) # Usage qto_report = generate_qto_report(df) qto_report.to_excel("qto_report.xlsx")

Cost Estimation

def calculate_project_cost(elements_df, prices_df, markup=0.15): """Calculate total project cost with markup""" # Merge with prices df = elements_df.merge(prices_df, on='Category', how='left') # Calculate base cost df['Base_Cost'] = df['Volume_m3'] * df['Unit_Price'] # Apply markup df['Total_Cost'] = df['Base_Cost'] * (1 + markup) # Summary by category summary = df.groupby('Category').agg( volume=('Volume_m3', 'sum'), base_cost=('Base_Cost', 'sum'), total_cost=('Total_Cost', 'sum') ).round(2) return df, summary, summary['total_cost'].sum() # Usage detailed, summary, total = calculate_project_cost(elements, prices) print(f"Project Total: ${total:,.2f}")

Material Summary

def material_summary(df): """Summarize materials across project""" summary = df.groupby('Material').agg({ 'Volume_m3': 'sum', 'Weight_kg': 'sum', 'ElementId': 'nunique' }).rename(columns={'ElementId': 'Element_Count'}) summary['Volume_Pct'] = (summary['Volume_m3'] / summary['Volume_m3'].sum() * 100).round(1) return summary.sort_values('Volume_m3', ascending=False)

Level-by-Level Analysis

def analyze_by_level(df): """Analyze construction quantities by building level""" level_summary = df.pivot_table( values=['Volume_m3', 'Cost'], index='Level', columns='Category', aggfunc='sum', fill_value=0 ) level_summary['Total_Volume'] = level_summary['Volume_m3'].sum(axis=1) level_summary['Total_Cost'] = level_summary['Cost'].sum(axis=1) return level_summary

Export to Excel with Multiple Sheets

def export_to_excel_formatted(df, summary, filepath): """Export with multiple sheets""" with pd.ExcelWriter(filepath, engine='openpyxl') as writer: df.to_excel(writer, sheet_name='Details', index=False) summary.to_excel(writer, sheet_name='Summary') pivot = pd.pivot_table(df, values='Volume_m3', index='Level', columns='Category') pivot.to_excel(writer, sheet_name='By_Level') # Usage export_to_excel_formatted(elements, qto_summary, "project_report.xlsx")

Export to CSV

# Basic export df.to_csv("output.csv", index=False) # With encoding for special characters df.to_csv("output.csv", index=False, encoding='utf-8-sig') # Specific columns df[['ElementId', 'Category', 'Volume_m3']].to_csv("volumes.csv", index=False)

Performance Tips

# Use categories for string columns with few unique values df['Category'] = df['Category'].astype('category') # Read only needed columns df = pd.read_csv("large_file.csv", usecols=['ElementId', 'Category', 'Volume']) # Use chunking for very large files chunks = pd.read_csv("huge_file.csv", chunksize=100000) result = pd.concat([chunk[chunk['Category'] == 'Wall'] for chunk in chunks]) # Check memory usage print(df.memory_usage(deep=True).sum() / 1024**2, "MB")

Quick Reference

OperationCodeRead Excelpd.read_excel("file.xlsx")Read CSVpd.read_csv("file.csv")Filter rowsdf[df['Column'] == 'Value']Select columnsdf[['Col1', 'Col2']]Group and sumdf.groupby('Cat')['Vol'].sum()Pivot tablepd.pivot_table(df, values='Vol', index='Level')Mergedf1.merge(df2, on='key')Add columndf['New'] = df['A'] * df['B']Export Exceldf.to_excel("out.xlsx", index=False)

Resources

Book: "Data-Driven Construction" by Artem Boiko, Chapter 2.3 Website: https://datadrivenconstruction.io Pandas Docs: https://pandas.pydata.org/docs/

Next Steps

See llm-data-automation for generating Pandas code with AI See qto-report for specialized QTO calculations See cost-estimation-resource for detailed cost calculations

Category context

Data access, storage, extraction, analysis, reporting, and insight generation.

Source: Tencent SkillHub

Largest current source with strong distribution and engagement signals.

Package contents

Included in package
2 Docs1 Config
  • SKILL.md Primary doc
  • instructions.md Docs
  • claw.json Config