Tencent SkillHub · Productivity

Office To Md V2

Convert PDF, DOC, DOCX, and PPTX office documents to Markdown, supporting legacy .doc files with text extraction and basic formatting preservation.

skill, openclaw, clawhub, Free
0 Downloads
0 Stars
0 Installs
0 Score
High Signal


Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform
OpenClaw
Install method
Manual import
Extraction
Extract archive
Prerequisites
OpenClaw
Primary doc
SKILL.md

Package facts

Download mode
Yavira redirect
Package format
ZIP package
Source platform
Tencent SkillHub
What's included
README.md, SKILL.md, examples/basic-usage.js, office-to-md/SKILL.md, office-to-md/convert.js, office-to-md/openclaw-skill.js

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source
Tencent SkillHub
Verification
Indexed source record
Version
0.1.0

Documentation

Primary doc: SKILL.md (22 sections)

Description

Convert office documents (PDF, DOC, DOCX, PPTX) to Markdown format. This skill uses the word-extractor library for .doc support and provides full OpenClaw integration.

When to Use

  • When you need to extract text from office documents
  • When you want to convert documents to readable Markdown
  • When analyzing document content in OpenClaw
  • Especially when dealing with legacy .doc files

Supported Formats

  • PDF (.pdf): Text extraction using pdf-parse
  • Word (.docx): Formatting preservation using mammoth + turndown
  • Legacy Word (.doc): Text extraction using word-extractor (supports Chinese encoding)
  • PowerPoint (.pptx): Basic text extraction using python-pptx
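
To illustrate how the format list above could drive a dispatch step, here is a minimal sketch that looks up the library for a file by its extension. The names `CONVERTERS` and `detectFormat` are hypothetical and not taken from the skill's source; only the extension-to-library pairing comes from the list above.

```javascript
const path = require('path');

// Extension -> library pairing, copied from the Supported Formats list.
// The table itself is illustrative, not the skill's internal data structure.
const CONVERTERS = {
  '.pdf': 'pdf-parse',
  '.docx': 'mammoth + turndown',
  '.doc': 'word-extractor',
  '.pptx': 'python-pptx',
};

// Hypothetical helper: returns { ext, library }, or null when unsupported.
function detectFormat(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  const library = CONVERTERS[ext];
  return library ? { ext, library } : null;
}

console.log(detectFormat('/path/to/Report.DOCX'));
console.log(detectFormat('/path/to/notes.txt'));
```

Lower-casing the extension before the lookup means files such as `Report.DOCX` are treated the same as `report.docx`.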

Dependencies

  • Node.js with npm packages: pdf-parse, mammoth, turndown, word-extractor
  • Python 3 with python-pptx (optional, for PPTX conversion)
  • OpenClaw exec tool permission

1. Copy the skill to your workspace:

cp -r /root/.openclaw/workspace/office-to-md-v2/office-to-md /path/to/your/workspace/

2. Install dependencies:

cd /path/to/your/workspace/office-to-md
npm install

3. For PPTX support (optional):

pip3 install python-pptx
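
After installing, it can help to confirm that Node.js can actually resolve the npm packages. The sketch below is one way to do that, assuming it is run from the skill directory; `findMissingPackages` is a hypothetical helper and is not part of the package.

```javascript
// Package names come from the Dependencies list; the helper name is hypothetical.
const REQUIRED_PACKAGES = ['pdf-parse', 'mammoth', 'turndown', 'word-extractor'];

// Returns the subset of `names` that Node cannot resolve from this directory.
function findMissingPackages(names) {
  return names.filter((name) => {
    try {
      require.resolve(name);
      return false;
    } catch (err) {
      return true;
    }
  });
}

const missing = findMissingPackages(REQUIRED_PACKAGES);
console.log(
  missing.length === 0
    ? 'All Node.js dependencies resolved'
    : `Missing packages: ${missing.join(', ')}`
);
```

This only checks the Node.js side; the optional python-pptx dependency would need a separate check in Python.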

Method 1: Direct exec call

    // Convert any supported document
    const result = await exec(
      'node /path/to/office-to-md/openclaw-skill.js /path/to/document.doc',
      { workdir: '/path/to/workspace', timeout: 60000 }
    );

    if (result.exitCode === 0) {
      console.log('✅ Document converted successfully');
      // Output file: /path/to/document.md
    } else {
      console.error('❌ Conversion failed:', result.stderr);
    }

Method 2: Using the wrapper function

    // Import the converter
    const { convertOfficeToMarkdown } = require('/path/to/office-to-md/openclaw-skill.js');

    // Convert document
    const conversionResult = await convertOfficeToMarkdown('/path/to/document.pdf');

    if (conversionResult.success) {
      console.log(`Output: ${conversionResult.outputPath}`);
      console.log(`Preview: ${conversionResult.preview}`);
    } else {
      console.error(`Error: ${conversionResult.error}`);
    }

Method 3: Complete OpenClaw integration function

    async function convertDocumentToMarkdown(filePath) {
      // Validate that the file exists and is readable
      try {
        await read(filePath);
      } catch (error) {
        return { success: false, error: `File not found: ${filePath}` };
      }

      // Check the file extension (case-insensitive)
      const supported = ['.pdf', '.doc', '.docx', '.pptx'];
      const lowerPath = filePath.toLowerCase();
      if (!supported.some(ext => lowerPath.endsWith(ext))) {
        return {
          success: false,
          error: `Unsupported file type. Supported: ${supported.join(', ')}`
        };
      }

      // Convert using the skill
      const cmd = `node /path/to/office-to-md/openclaw-skill.js "${filePath}"`;
      const result = await exec(cmd, {
        workdir: '/path/to/workspace',
        timeout: 120000 // 2 minutes for large files
      });

      if (result.exitCode === 0) {
        // Swap the final extension for .md
        const outputPath = filePath.replace(/\.[^/.]+$/, '.md');
        return { success: true, outputPath, message: `Converted to: ${outputPath}` };
      }
      return { success: false, error: result.stderr || 'Conversion failed' };
    }

    // Usage example
    const result = await convertDocumentToMarkdown('/path/to/document.doc');
    if (result.success) {
      const markdown = await read(result.outputPath);
      console.log(markdown.substring(0, 1000));
    }
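
The output-path rule used in Method 3 (swap the final extension for `.md`) can be isolated into a small helper, which makes it easier to reuse and test. This is a sketch under two assumptions: that the skill writes the Markdown file next to the input, as the examples suggest, and that a path without an extension should simply get `.md` appended. `markdownPathFor` is a hypothetical name.

```javascript
// Hypothetical helper: derive the Markdown output path for a source document.
// Assumes the skill writes the .md file next to the input, as the examples suggest.
function markdownPathFor(filePath) {
  const extensionPattern = /\.[^/.]+$/; // final ".ext" in the last path segment
  return extensionPattern.test(filePath)
    ? filePath.replace(extensionPattern, '.md')
    : `${filePath}.md`; // assumption: append when there is no extension
}

console.log(markdownPathFor('/path/to/document.doc'));
console.log(markdownPathFor('/path/to/archive.docs/report.docx'));
```

Anchoring the pattern at the end of the string keeps directory names that contain a dot untouched, and `.docx` inputs produce `.md` rather than `.mdx`.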

Example 1: Convert and analyze a document

    // Convert a .doc file and analyze its content
    const docPath = '/path/to/document.doc';
    const convertResult = await exec(
      `node /path/to/office-to-md/openclaw-skill.js "${docPath}"`,
      { workdir: '/path/to/workspace' }
    );

    if (convertResult.exitCode === 0) {
      // Swap only the final extension; a plain replace('.doc', '.md') would
      // also match ".doc" earlier in the path
      const mdPath = docPath.replace(/\.[^/.]+$/, '.md');
      const content = await read(mdPath);

      // Analyze the content (filter out empty tokens from leading/trailing whitespace)
      const wordCount = content.split(/\s+/).filter(Boolean).length;
      const lines = content.split('\n').length;
      const hasChinese = /[\u4e00-\u9fff]/.test(content);

      console.log(`Document analysis:`);
      console.log(`- Word count: ${wordCount}`);
      console.log(`- Lines: ${lines}`);
      console.log(`- Contains Chinese: ${hasChinese}`);
      console.log(`- Preview: ${content.substring(0, 200)}...`);
    }

Example 2: Batch conversion

    // Convert multiple documents of different formats
    const documents = [
      '/path/to/report.pdf',
      '/path/to/legacy.doc',
      '/path/to/modern.docx',
      '/path/to/presentation.pptx'
    ];

    const results = [];
    for (const doc of documents) {
      console.log(`Converting ${doc}...`);
      const result = await exec(
        `node /path/to/office-to-md/openclaw-skill.js "${doc}"`,
        { workdir: '/path/to/workspace', timeout: 90000 }
      );
      const success = result.exitCode === 0;
      results.push({
        file: doc,
        success: success,
        error: success ? null : result.stderr
      });
      console.log(success ? '✅ Success' : '❌ Failed');
    }

    // Summary
    const successful = results.filter(r => r.success).length;
    console.log(`\nConversion summary: ${successful}/${results.length} successful`);

convertOfficeToMarkdown(filePath)

Returns a Promise that resolves to:

    {
      success: boolean,
      outputPath?: string,
      markdown?: string,
      preview?: string,
      fileType?: string,
      message?: string,
      stats?: {
        lines: number,
        characters: number,
        words: number
      },
      error?: string,
      stack?: string
    }
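
The `stats` field carries line, character, and word counts. How the skill computes them is not documented here, so the following is only a plausible sketch of such a calculation; `computeStats` is a hypothetical function, and the skill's own counts may differ (for example in how it treats trailing newlines or CJK text).

```javascript
// Hypothetical reimplementation of the `stats` field; the skill may count differently.
function computeStats(markdown) {
  const trimmed = markdown.trim();
  return {
    lines: markdown.split('\n').length,
    characters: markdown.length,
    // Whitespace-delimited tokens; an empty document has zero words.
    words: trimmed === '' ? 0 : trimmed.split(/\s+/).length,
  };
}

console.log(computeStats('# Title\n\nHello world'));
```

Note that whitespace-based word counting undercounts Chinese text, which has no spaces between words, so a script-aware tokenizer would be needed for accurate CJK counts.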

Timeout Settings

  • Small files (<1 MB): 30 seconds
  • Medium files (1-10 MB): 60 seconds
  • Large files (>10 MB): 120 seconds
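
These tiers can be turned into a timeout value that is passed to `exec`. The sketch below assumes 1 MB means 1024 × 1024 bytes, which the source does not specify; `timeoutForSize` and `timeoutForFile` are hypothetical helpers.

```javascript
const fs = require('fs');

const MB = 1024 * 1024; // assumption: binary megabytes

// Thresholds mirror the Timeout Settings list; returns milliseconds.
function timeoutForSize(bytes) {
  if (bytes < 1 * MB) return 30000;   // small files
  if (bytes <= 10 * MB) return 60000; // medium files
  return 120000;                      // large files
}

// Hypothetical convenience wrapper that reads the size from disk.
function timeoutForFile(filePath) {
  return timeoutForSize(fs.statSync(filePath).size);
}

console.log(timeoutForSize(5 * MB));
```

The result can be supplied as the `timeout` option in any of the `exec` calls shown in this document.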

Memory Limits

  • The default Node.js memory limit is sufficient for most documents.
  • For very large files, you may need to raise the heap limit:

    node --max-old-space-size=4096 openclaw-skill.js large-file.doc

Common Issues

"File not found"
  • Check the file path and permissions
  • Use absolute paths for reliability

"Unsupported file type"
  • Ensure the file has the correct extension
  • Check that the file is actually in the claimed format

Conversion errors with .doc files
  • The file may be corrupted or in an unusual format
  • Try opening it in Word and saving it as .docx first

Chinese text appears as gibberish
  • word-extractor should handle Chinese encoding automatically
  • If issues persist, the file may use an unusual encoding

Timeout errors
  • Increase the timeout for large files
  • Check system resources
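
To check whether a file is actually in the claimed format, one option is to inspect its leading bytes. The sketch below compares against three well-known signatures: `%PDF` for PDF, `PK\x03\x04` for ZIP containers (which .docx and .pptx are), and the OLE compound-file header used by legacy .doc. The helper names `sniffContainer` and `sniffFile` are hypothetical, and this check is not something the skill is documented to perform.

```javascript
const fs = require('fs');

// Well-known file signatures (magic bytes).
const PDF_MAGIC = Buffer.from('%PDF', 'latin1');
const ZIP_MAGIC = Buffer.from([0x50, 0x4b, 0x03, 0x04]); // .docx and .pptx are ZIP containers
const OLE_MAGIC = Buffer.from([0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]); // legacy .doc

// Hypothetical helper: classify a buffer holding the first bytes of a file.
function sniffContainer(header) {
  const startsWith = (magic) =>
    header.length >= magic.length && header.subarray(0, magic.length).equals(magic);
  if (startsWith(PDF_MAGIC)) return 'pdf';
  if (startsWith(ZIP_MAGIC)) return 'zip';
  if (startsWith(OLE_MAGIC)) return 'ole';
  return 'unknown';
}

// Reads only the first 8 bytes of the file, then classifies them.
function sniffFile(filePath) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const header = Buffer.alloc(8);
    const bytesRead = fs.readSync(fd, header, 0, 8, 0);
    return sniffContainer(header.subarray(0, bytesRead));
  } finally {
    fs.closeSync(fd);
  }
}
```

A result of `zip` does not distinguish .docx from .pptx, and `ole` also covers other legacy Office formats, so this is a coarse sanity check rather than full format validation. A `.doc` file that sniffs as `zip`, for instance, is probably a renamed .docx.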

Debug Mode

Enable debug logging by setting the DEBUG environment variable:

    DEBUG=office-to-md node openclaw-skill.js document.doc

Performance

  • PDF: Fast, depends on file size
  • DOCX: Fast to medium, good formatting preservation
  • DOC: Medium, requires binary parsing
  • PPTX: Slow, requires Python and an external library

Limitations

  • Images in documents are not extracted
  • Complex formatting may not be fully preserved
  • Tables may convert imperfectly to Markdown
  • Very old or corrupted .doc files may fail
  • Password-protected files are not supported

v2.0.0 (2026-02-15)

  • Added full .doc support using word-extractor
  • Fixed ESM compatibility issues with pptConverter
  • Added comprehensive OpenClaw integration
  • Improved Chinese text extraction
  • Added structured output with statistics

v1.0.0 (Initial)

  • Basic PDF, DOCX, PPTX support
  • Simple conversion without .doc support

License

This skill is provided as-is. The underlying libraries have their own licenses:
  • pdf-parse: MIT
  • mammoth: BSD-2-Clause
  • turndown: MIT
  • word-extractor: MIT
  • python-pptx: MIT

Category context

Workflow acceleration for inboxes, docs, calendars, planning, and execution loops.

Source: Tencent SkillHub

Largest current source with strong distribution and engagement signals.

Package contents

Included in package
3 Docs, 3 Scripts
  • SKILL.md Primary doc
  • office-to-md/SKILL.md Docs
  • README.md Docs
  • examples/basic-usage.js Scripts
  • office-to-md/convert.js Scripts
  • office-to-md/openclaw-skill.js Scripts