Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Extract structured data from websites using Tabstack API. Use when you need to scrape job listings, news articles, product pages, or any structured web content. Provides JSON schema-based extraction and clean markdown conversion. Requires TABSTACK_API_KEY environment variable.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
This skill enables structured data extraction from websites using the Tabstack API. It's ideal for web scraping tasks where you need consistent, schema-based data extraction from job boards, news sites, product pages, or any structured content.
```bash
# Option A: From GitHub (recommended for sharing)
curl -s https://raw.githubusercontent.com/babashka/babashka/master/install | bash

# Option B: From Nix
nix-shell -p babashka

# Option C: From Homebrew
brew install borkdude/brew/babashka
```
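Whichever route you pick, a quick version check confirms the binary landed on your PATH before moving on:

```bash
# Confirm babashka is installed and on the PATH
bb --version
```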
Option A: Environment variable (recommended)

```bash
export TABSTACK_API_KEY="your_api_key_here"
```

Option B: Configuration file

```bash
mkdir -p ~/.config/tabstack
echo '{:api-key "your_api_key_here"}' > ~/.config/tabstack/config.edn
```

Get an API key: sign up at the Tabstack Console.
```bash
bb scripts/tabstack.clj test
```
```bash
bb scripts/tabstack.clj markdown "https://example.com"
```
```bash
# Start with simple schema (fast, reliable)
bb scripts/tabstack.clj json "https://example.com" references/simple_article.json

# Try more complex schemas (may be slower)
bb scripts/tabstack.clj json "https://news.site" references/news_schema.json
```
```bash
# Extract with retry logic (3 retries, 1s delay)
bb scripts/tabstack.clj json-retry "https://example.com" references/simple_article.json

# Extract with caching (24-hour cache)
bb scripts/tabstack.clj json-cache "https://example.com" references/simple_article.json

# Batch extract from URLs file
echo "https://example.com" > urls.txt
echo "https://example.org" >> urls.txt
bb scripts/tabstack.clj batch urls.txt references/simple_article.json
```
Extract clean, readable markdown from any webpage. Useful for content analysis, summarization, or archiving.

When to use: when you need the textual content of a page without the HTML clutter.

Example use cases:
- Extract article content for summarization
- Archive webpage content
- Analyze blog post content
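A minimal sketch of the archiving use case, assuming the `markdown` subcommand writes to stdout as in the example above (the URL and file path are placeholders):

```bash
# Archive a page's readable content to a dated file (assumes stdout output)
bb scripts/tabstack.clj markdown "https://example.com/post" > "archive-$(date +%F).md"
```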
Extract structured data using JSON schemas. Define exactly what data you want and get it in a consistent format.

When to use: when scraping job listings, product pages, news articles, or any structured data.

Example use cases:
- Scrape job listings from BuiltIn/LinkedIn
- Extract product details from e-commerce sites
- Gather news articles with consistent metadata
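Assuming the `json` subcommand prints the extracted JSON to stdout (an assumption; the skill's scripts define the actual behavior), the output pipes cleanly into jq for spot checks:

```bash
# Inspect one field of the structured output (assumes stdout JSON and jq installed;
# .title is an assumed field name, not confirmed by the shipped schema)
bb scripts/tabstack.clj json "https://example.com" references/simple_article.json | jq '.title'
```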
Pre-built schemas for common scraping tasks. See the references/ directory for templates.

Available schemas:
- Job listing schema (see references/job_schema.json)
- News article schema
- Product page schema
- Contact information schema
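Only the job schema is reproduced in this document. As a hedged illustration of the same pattern, a minimal article schema along the lines of references/simple_article.json might look like the sketch below; the field names are assumptions, not the contents of the shipped file:

```bash
# Hypothetical minimal article schema -- the real simple_article.json ships in references/
cat > /tmp/simple_article_sketch.json <<'EOF'
{
  "type": "object",
  "properties": {
    "title": {"type": "string"},
    "author": {"type": "string"},
    "date": {"type": "string"},
    "content": {"type": "string"}
  }
}
EOF
```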
Follow this workflow to scrape job listings:

1. Identify target sites - BuiltIn, LinkedIn, company career pages
2. Choose or create a schema - use references/job_schema.json or customize it
3. Test extraction - run a single page to verify the schema works
4. Scale up - process multiple URLs
5. Store results - save to a database or file

Example job schema:

```json
{
  "type": "object",
  "properties": {
    "title": {"type": "string"},
    "company": {"type": "string"},
    "location": {"type": "string"},
    "description": {"type": "string"},
    "salary": {"type": "string"},
    "apply_url": {"type": "string"},
    "posted_date": {"type": "string"},
    "requirements": {"type": "array", "items": {"type": "string"}}
  }
}
```
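Steps 3 and 4 map directly onto the `json` and `batch` subcommands shown earlier; the URLs below are placeholders:

```bash
# Step 3: verify the schema against a single listing first
bb scripts/tabstack.clj json "https://builtin.com/job/example-posting" references/job_schema.json

# Step 4: once the output looks right, scale up over a file of URLs
bb scripts/tabstack.clj batch job_urls.txt references/job_schema.json
```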
1. Use web_search to find relevant URLs
2. Use Tabstack to extract structured data from those URLs
3. Store results in Datalevin (future skill)
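A sketch of the hand-off point between steps 1 and 2, assuming the agent has written the search results to a plain one-URL-per-line file and that batch output can be redirected (both assumptions):

```bash
# found_urls.txt: one URL per line, produced by the web_search step
bb scripts/tabstack.clj batch found_urls.txt references/job_schema.json > results.json
# The Datalevin import would consume results.json once that skill exists
```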
1. Use the browser tool to navigate complex sites
2. Extract page URLs
3. Use Tabstack for structured extraction
Common issues and solutions:
- Authentication failed - check the TABSTACK_API_KEY environment variable
- Invalid URL - ensure the URL is accessible and correct
- Schema mismatch - adjust the schema to match the page structure
- Rate limiting - add delays between requests
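The first two issues are quick to rule out from the shell before re-running an extraction:

```bash
# Authentication: the key must actually be set in the environment
[ -n "$TABSTACK_API_KEY" ] || echo "TABSTACK_API_KEY is not set"

# Invalid URL: confirm the page is reachable at all (independent of Tabstack)
curl -sI "https://example.com" | head -n 1

# Then re-check connectivity through the skill itself
bb scripts/tabstack.clj test
```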
- tabstack.clj - Main API wrapper in Babashka (recommended; has retry logic, caching, and batch processing)
- tabstack_curl.sh - Bash/curl fallback (simple, no dependencies)
- tabstack_api.py - Python API wrapper (requires the requests module)
- job_schema.json - Template schema for job listings
- api_reference.md - Tabstack API documentation
- Start small - test with single pages before scaling
- Respect robots.txt - check site scraping policies
- Add delays - avoid overwhelming target sites (see the sketch after this list)
- Validate schemas - test schemas on sample pages
- Handle errors gracefully - implement retry logic for failed requests
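A hedged sketch of the "add delays" practice, for cases where the built-in batch parallelism would be too aggressive for a target site:

```bash
# Serial per-URL loop with a polite delay between requests
while IFS= read -r url; do
  bb scripts/tabstack.clj json-retry "$url" references/job_schema.json
  sleep 2  # throttle to avoid overwhelming the target site
done < urls.txt
```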
This skill is designed to teach agents how to use the Tabstack API effectively. The key is learning to create appropriate JSON schemas for different websites.
1. Start Simple - use references/simple_article.json (4 basic fields)
2. Test Extensively - try schemas on multiple page types
3. Iterate - add fields based on what the page actually contains
4. Optimize - remove unnecessary fields for speed

See the Schema Creation Guide for detailed instructions and examples.
- Over-complex schemas - start with 2-3 fields, not 20
- Missing fields - don't require fields that don't exist on the page
- No testing - always test with example.com first, then target sites
- Ignoring timeouts - complex schemas take longer (45s timeout)
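The first three mistakes are cheap to avoid with one throwaway schema; the file path and fields here are illustrative, not part of the package:

```bash
# Start with two fields, not twenty, and test against example.com first
cat > /tmp/tiny_schema.json <<'EOF'
{
  "type": "object",
  "properties": {
    "title": {"type": "string"},
    "content": {"type": "string"}
  }
}
EOF
bb scripts/tabstack.clj json "https://example.com" /tmp/tiny_schema.json
```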
Using Babashka for this skill provides:
- Single binary - easy to share/install (GitHub releases, brew, nix)
- Fast startup - no JVM warmup, ~50ms startup time
- Built-in HTTP client - no external dependencies
- Clojure syntax - familiar to you (Wes), expressive
- Retry logic & caching - built into the skill
- Batch processing - parallel extraction for multiple URLs
Example prompts that should trigger this skill:
- "Scrape job listings from Docker careers page"
- "Extract the main content from this article"
- "Get structured product data from this e-commerce page"
- "Pull all the news articles from this site"
- "Extract contact information from this company page"
- "Batch extract job listings from these 20 URLs"
- "Get cached results for this page (avoid API calls)"