Requirements
- Target platform
- OpenClaw
- Install method
- Manual import
- Extraction
- Extract archive
- Prerequisites
- OpenClaw
- Primary doc
- SKILL.md
Convert PDF, DOC, DOCX, and PPTX office documents to Markdown, supporting legacy .doc files with text extraction and basic formatting preservation.
Convert PDF, DOC, DOCX, and PPTX office documents to Markdown, supporting legacy .doc files with text extraction and basic formatting preservation.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
Convert office documents (PDF, DOC, DOCX, PPTX) to Markdown format. This skill uses the word-extractor library for .doc support and provides full OpenClaw integration.
When you need to extract text from office documents When you want to convert documents to readable Markdown format When analyzing document content in OpenClaw Specifically when dealing with legacy .doc format files
PDF (.pdf): Text extraction using pdf-parse Word (.docx): Formatting preservation using mammoth + turndown Legacy Word (.doc): Text extraction using word-extractor (supports Chinese encoding) PowerPoint (.pptx): Basic text extraction using python-pptx
Node.js with npm packages: pdf-parse, mammoth, turndown, word-extractor Python3 with python-pptx (for PPTX conversion, optional) OpenClaw exec tool permission
cp -r /root/.openclaw/workspace/office-to-md-v2/office-to-md /path/to/your/workspace/
cd /path/to/your/workspace/office-to-md npm install
pip3 install python-pptx
// Convert any supported document const result = await exec( 'node /path/to/office-to-md/openclaw-skill.js /path/to/document.doc', { workdir: '/path/to/workspace', timeout: 60000 } ); if (result.exitCode === 0) { console.log('โ Document converted successfully'); // Output file: /path/to/document.md } else { console.error('โ Conversion failed:', result.stderr); }
// Import the converter const { convertOfficeToMarkdown } = require('/path/to/office-to-md/openclaw-skill.js'); // Convert document const conversionResult = await convertOfficeToMarkdown('/path/to/document.pdf'); if (conversionResult.success) { console.log(`Output: ${conversionResult.outputPath}`); console.log(`Preview: ${conversionResult.preview}`); } else { console.error(`Error: ${conversionResult.error}`); }
async function convertDocumentToMarkdown(filePath) { // Validate file exists try { await read(filePath); } catch (error) { return { success: false, error: `File not found: ${filePath}` }; } // Check file extension const ext = filePath.toLowerCase().slice(-5); const supported = ['.pdf', '.doc', '.docx', '.pptx']; if (!supported.some(s => ext.endsWith(s))) { return { success: false, error: `Unsupported file type. Supported: ${supported.join(', ')}` }; } // Convert using the skill const cmd = `node /path/to/office-to-md/openclaw-skill.js "${filePath}"`; const result = await exec(cmd, { workdir: '/path/to/workspace', timeout: 120000 // 2 minutes for large files }); if (result.exitCode === 0) { const outputPath = filePath.replace(/\.[^/.]+$/, '.md'); return { success: true, outputPath: outputPath, message: `Converted to: ${outputPath}` }; } else { return { success: false, error: result.stderr || 'Conversion failed' }; } } // Usage example const result = await convertDocumentToMarkdown('/path/to/document.doc'); if (result.success) { const markdown = await read(result.outputPath); console.log(markdown.substring(0, 1000)); }
// Convert a .doc file and analyze its content const docPath = '/path/to/document.doc'; const convertResult = await exec( `node /path/to/office-to-md/openclaw-skill.js "${docPath}"`, { workdir: '/path/to/workspace' } ); if (convertResult.exitCode === 0) { const mdPath = docPath.replace('.doc', '.md'); const content = await read(mdPath); // Analyze the content const wordCount = content.split(/\s+/).length; const lines = content.split('\n').length; const hasChinese = /[\u4e00-\u9fff]/.test(content); console.log(`Document analysis:`); console.log(`- Word count: ${wordCount}`); console.log(`- Lines: ${lines}`); console.log(`- Contains Chinese: ${hasChinese}`); console.log(`- Preview: ${content.substring(0, 200)}...`); }
// Convert multiple documents of different formats const documents = [ '/path/to/report.pdf', '/path/to/legacy.doc', '/path/to/modern.docx', '/path/to/presentation.pptx' ]; const results = []; for (const doc of documents) { console.log(`Converting ${doc}...`); const result = await exec( `node /path/to/office-to-md/openclaw-skill.js "${doc}"`, { workdir: '/path/to/workspace', timeout: 90000 } ); const success = result.exitCode === 0; results.push({ file: doc, success: success, error: success ? null : result.stderr }); console.log(success ? 'โ Success' : 'โ Failed'); } // Summary const successful = results.filter(r => r.success).length; console.log(`\nConversion summary: ${successful}/${results.length} successful`);
Returns a Promise that resolves to: { success: boolean, outputPath?: string, markdown?: string, preview?: string, fileType?: string, message?: string, stats?: { lines: number, characters: number, words: number }, error?: string, stack?: string }
Small files (<1MB): 30 seconds Medium files (1-10MB): 60 seconds Large files (>10MB): 120 seconds
Default Node.js memory limit is sufficient for most documents For very large files, you may need to increase memory: node --max-old-space-size=4096 openclaw-skill.js large-file.doc
"File not found" Check file path and permissions Use absolute paths for reliability "Unsupported file type" Ensure file has correct extension Check if file is actually the claimed format Conversion errors with .doc files The file may be corrupted or in an unusual format Try opening in Word and saving as .docx first Chinese text appears as gibberish word-extractor should handle Chinese encoding automatically If issues persist, the file may use unusual encoding Timeout errors Increase timeout for large files Check system resources
Enable debug logging by setting environment variable: DEBUG=office-to-md node openclaw-skill.js document.doc
PDF: Fast, depends on file size DOCX: Fast to medium, good formatting preservation DOC: Medium, requires binary parsing PPTX: Slow, requires Python and external library
Images in documents are not extracted Complex formatting may not be fully preserved Tables may convert imperfectly to Markdown Very old or corrupted .doc files may fail Password-protected files are not supported
Added full .doc support using word-extractor Fixed ESM compatibility issues with pptConverter Added comprehensive OpenClaw integration Improved Chinese text extraction Added structured output with statistics
Basic PDF, DOCX, PPTX support Simple conversion without .doc support
This skill is provided as-is. The underlying libraries have their own licenses: pdf-parse: MIT mammoth: BSD-2-Clause turndown: MIT word-extractor: MIT python-pptx: MIT
Workflow acceleration for inboxes, docs, calendars, planning, and execution loops.
Largest current source with strong distribution and engagement signals.