Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
A skill for discovering and scraping YouTube channels based on categories and locations without requiring API keys or login.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
A browser-based YouTube channel discovery and scraping tool. Part of ScrapeClaw, a suite of production-ready, agentic social media scrapers for Instagram, YouTube, X/Twitter, and Facebook built with Python & Playwright, no API keys required.

    ---
    name: youtube-scrapper
    description: Discover and scrape YouTube channels from your browser.
    emoji: 📺
    version: 1.0.2
    author: influenza
    tags:
      - youtube
      - scraping
      - social-media
      - channel-discovery
      - influencer-discovery
    metadata:
      clawdbot:
        requires:
          bins:
            - python3
            - chromium
        config:
          stateDirs:
            - data/output
            - data/queue
            - thumbnails
          outputFormats:
            - json
            - csv
    ---
This skill provides a two-phase YouTube scraping system:

1. Channel Discovery: Find YouTube channels via Google Search (browser-based, no API key required).
2. Browser Scraping: Scrape public channel data using Playwright with anti-detection (no login required).
- Discover YouTube channels by location and category
- Full browser simulation for accurate scraping
- Browser fingerprinting, human behavior simulation, and stealth scripts
- Channel info, subscribers, views, videos, engagement data, and media
- JSON export with downloaded thumbnails
- Resume interrupted scraping sessions
- Auto-skip unavailable channels and low-subscriber profiles
- Built-in residential proxy support with 4 providers
- Regional configs for US, UK, Europe, India, Gulf, and East Asia
For OpenClaw agent integration, the skill provides JSON output:

    # Discover YouTube channels (returns JSON queue)
    python scripts/youtube_channel_discovery.py --categories tech --locations India

    # Scrape from a queue file
    python scripts/youtube_channel_scraper.py --queue data/queue/your_queue_file.json

    # Full orchestration: discover + scrape in one go
    python scripts/youtube_orchestrator.py --config resources/scraper_config_ind.json
Example scraped channel record:

    {
      "channel_name": "Marques Brownlee",
      "channel_url": "https://www.youtube.com/@mkbhd",
      "subscribers": 19200000,
      "total_views": 4500000000,
      "video_count": 1800,
      "description": "MKBHD: Quality Tech Videos...",
      "joined_date": "Mar 21, 2008",
      "country": "United States",
      "profile_pic_url": "https://...",
      "profile_pic_local": "thumbnails/mkbhd/profile_abc123.jpg",
      "banner_url": "https://...",
      "banner_local": "thumbnails/mkbhd/banner_def456.jpg",
      "influencer_tier": "mega",
      "category": "tech",
      "scrape_location": "New York",
      "scraped_at": "2026-02-17T12:00:00",
      "recent_videos": [
        {
          "title": "Galaxy S26 Ultra Review",
          "url": "https://www.youtube.com/watch?v=...",
          "views": 5200000,
          "published": "2 days ago",
          "duration": "14:32",
          "thumbnail_url": "https://...",
          "thumbnail_local": "thumbnails/mkbhd/video_0_ghi789.jpg"
        }
      ]
    }
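The front matter lists both json and csv as output formats, but only the JSON shape is shown. The following is a minimal sketch, assuming a record shaped like the example above, of flattening its top-level fields into one CSV row. The helper name `record_to_csv_row` and the column selection are illustrative assumptions, not the skill's documented CSV layout.

```python
import csv
import io

# Hypothetical column choice; the skill's real CSV columns are not documented here.
CSV_FIELDS = [
    "channel_name", "channel_url", "subscribers", "total_views",
    "video_count", "country", "influencer_tier", "category",
]


def record_to_csv_row(record: dict) -> str:
    """Return a header line plus one CSV row for a scraped channel record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_FIELDS, lineterminator="\n")
    writer.writeheader()
    # Missing keys become empty cells instead of raising KeyError.
    writer.writerow({field: record.get(field, "") for field in CSV_FIELDS})
    return buffer.getvalue()


if __name__ == "__main__":
    sample = {"channel_name": "Marques Brownlee", "subscribers": 19200000}
    print(record_to_csv_row(sample))
```

Nested data such as recent_videos is left out on purpose; it would need its own table or a serialized column.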
Example queue file:

    {
      "location": "India",
      "category": "tech",
      "total": 20,
      "channels": ["@channel1", "@channel2", "..."],
      "completed": ["@channel1"],
      "failed": {"@channel3": "not_found"},
      "current_index": 2,
      "created_at": "2026-02-17T12:00:00",
      "source": "google_search"
    }
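Because the queue records completed and failed handles, the remaining work can be derived from the file itself. This is a minimal sketch, assuming the field names shown above; `pending_channels` is a hypothetical helper. Whether the real scraper retries failed entries or relies on current_index instead is not stated, so treat this as one possible reading.

```python
import json


def pending_channels(queue: dict) -> list[str]:
    """Channels that are neither completed nor recorded as failed, in queue order."""
    done = set(queue.get("completed", [])) | set(queue.get("failed", {}))
    return [handle for handle in queue.get("channels", []) if handle not in done]


if __name__ == "__main__":
    queue = json.loads("""
    {
      "channels": ["@channel1", "@channel2", "@channel3"],
      "completed": ["@channel1"],
      "failed": {"@channel3": "not_found"}
    }
    """)
    print(pending_channels(queue))  # ['@channel2']
```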
| Tier  | Subscribers Range    |
|-------|----------------------|
| nano  | < 1,000              |
| micro | 1,000 to 10,000      |
| mid   | 10,000 to 100,000    |
| macro | 100,000 to 1M        |
| mega  | > 1,000,000          |
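The tier ranges can be expressed as a small lookup function. This is a sketch under one assumption: the table does not say which tier owns a shared boundary such as exactly 10,000, so the code assigns 10,000 and 100,000 to the higher tier and keeps exactly 1,000,000 in macro, since mega is listed as strictly greater than 1,000,000. The function name `influencer_tier` simply mirrors the JSON field and is not taken from the skill's code.

```python
def influencer_tier(subscribers: int) -> str:
    """Map a subscriber count to a tier label using the documented ranges."""
    if subscribers < 1_000:
        return "nano"
    if subscribers < 10_000:      # assumption: 10,000 itself counts as "mid"
        return "micro"
    if subscribers < 100_000:     # assumption: 100,000 itself counts as "macro"
        return "mid"
    if subscribers <= 1_000_000:  # mega is documented as > 1,000,000
        return "macro"
    return "mega"


if __name__ == "__main__":
    print(influencer_tier(19_200_000))  # mega
```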
- Queue files: data/queue/{region}/{location}_{category}_{timestamp}.json
- Scraped data: data/output_{region}/{channel_name}.json
- Thumbnails: thumbnails_{region}/{channel}/profile_*.jpg, thumbnails_{region}/{channel}/video_*.jpg
- Progress: data/progress/discovery_progress_{region}.json
Regional config files live in resources/:

- resources/scraper_config_us.json
- resources/scraper_config_uk.json
- resources/scraper_config_eur.json
- resources/scraper_config_ind.json
- resources/scraper_config_gulf.json
- resources/scraper_config_east.json

Example config (resources/scraper_config_ind.json):

    {
      "proxy": {
        "enabled": false,
        "provider": "brightdata",
        "country": "",
        "sticky": true,
        "sticky_ttl_minutes": 10
      },
      "categories": [
        "gaming", "tech", "beauty", "fashion", "fitness", "food",
        "travel", "music", "education", "comedy", "lifestyle", "cooking",
        "diy", "art", "finance", "health", "entertainment"
      ],
      "locations": [
        "India", "Mumbai", "Delhi", "Bangalore", "Hyderabad",
        "Chennai", "Kolkata", "Pune", "Ahmedabad", "Jaipur"
      ],
      "max_videos_to_scrape": 6,
      "headless": false,
      "results_per_search": 20,
      "search_delay": [3, 7],
      "scrape_delay": [2, 5],
      "rate_limit_wait": 60,
      "max_retries": 3
    }
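The search_delay and scrape_delay entries are [min, max] pairs. This is a minimal sketch of turning such a pair into a randomized wait; the helper `pick_delay` is hypothetical, and it assumes the values are seconds and are sampled uniformly. The skill's actual sampling strategy is not documented.

```python
import json
import random


def pick_delay(bounds, rng=random) -> float:
    """Pick a random delay in seconds from a [min, max] pair."""
    low, high = bounds
    if low > high:
        raise ValueError(f"invalid delay range: {bounds}")
    return rng.uniform(low, high)


if __name__ == "__main__":
    # Inline excerpt of the example config; a real run would read the JSON file.
    config = json.loads('{"search_delay": [3, 7], "scrape_delay": [2, 5]}')
    print(f"next scrape in {pick_delay(config['scrape_delay']):.1f}s")
```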
The scraper automatically filters out:

- Unavailable or terminated channels
- Channels with < 500 subscribers (configurable)
- Non-existent channel URLs
- Already scraped entries (deduplication)
- Rate-limited requests (auto-retry with backoff)
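Two of these rules, the subscriber threshold and deduplication, can be illustrated with a small predicate. This is a sketch only: `should_skip`, its reason strings, and the `min_subscribers` parameter name are assumptions for illustration, and it does not cover how the scraper detects unavailable channels or rate limits.

```python
from typing import Optional


def should_skip(record: dict, seen_urls: set, min_subscribers: int = 500) -> Optional[str]:
    """Return a reason string if the record should be skipped, else None."""
    url = record.get("channel_url")
    if not url:
        return "missing_url"
    if url in seen_urls:
        return "duplicate"
    if record.get("subscribers", 0) < min_subscribers:
        return "low_subscribers"
    return None


if __name__ == "__main__":
    seen = {"https://www.youtube.com/@mkbhd"}
    candidate = {"channel_url": "https://www.youtube.com/@mkbhd", "subscribers": 19200000}
    print(should_skip(candidate, seen))  # duplicate
```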
The scraper uses multiple anti-detection techniques:

- Browser fingerprinting: Rotating fingerprint profiles (viewport, user agent, timezone, WebGL, etc.)
- Stealth JavaScript: Hides navigator.webdriver, spoofs plugins/languages/hardware, canvas noise, fake chrome object
- Human behavior simulation: Random delays, mouse movements, scrolling patterns
- Network randomization: Variable timing between requests
- Request interception: Blocks known fingerprinting and tracking scripts
- Try different location/category combinations
- Check if Google Search is returning CAPTCHA pages
- Run with --headless false to debug visually

- Reduce scraping speed (increase delays in config)
- Run during off-peak hours
- Use a residential proxy (see below)

- The orchestrator auto-restarts the browser every 50 channels
- Interrupted scrapes can be resumed; queue files track progress automatically
Running a scraper at scale without a residential proxy will get your IP blocked fast. Here's why proxies are essential for long-running scrapes:

| Advantage | Description |
|-----------|-------------|
| Avoid IP Bans | Residential IPs look like real household users, not data-center bots. YouTube is far less likely to flag them. |
| Automatic IP Rotation | Each request (or session) gets a fresh IP, so rate limits never stack up on one address. |
| Geo-Targeting | Route traffic through a specific country/city so scraped content matches the target audience's locale. |
| Sticky Sessions | Keep the same IP for a configurable window (e.g. 10 min), which is critical for maintaining a consistent browsing session. |
| Higher Success Rate | Rotating residential IPs deliver 95%+ success rates compared to ~30% with data-center proxies on YouTube. |
| Long-Running Scrapes | Scrape thousands of channels over hours or days without interruption. |
| Concurrent Scraping | Run multiple browser instances across different IPs simultaneously. |
We have affiliate partnerships with top residential proxy providers. Using these links supports continued development of this skill:

| Provider | Best For | Sign Up |
|----------|----------|---------|
| Bright Data | World's largest network, 72M+ IPs, enterprise-grade | Get Bright Data |
| IProyal | Pay-as-you-go, 195+ countries, no traffic expiry | Get IProyal |
| Storm Proxies | Fast & reliable, developer-friendly API, competitive pricing | Get Storm Proxies |
| NetNut | ISP-grade network, 52M+ IPs, direct connectivity | Get NetNut |
1. Get Your Proxy Credentials

Sign up with any provider above, then grab:

- Username (from your provider dashboard)
- Password (from your provider dashboard)
- Host and Port are pre-configured per provider (or use custom)

2. Configure via Environment Variables

    export PROXY_ENABLED=true
    export PROXY_PROVIDER=brightdata   # brightdata | iproyal | stormproxies | netnut | custom
    export PROXY_USERNAME=your_user
    export PROXY_PASSWORD=your_pass
    export PROXY_COUNTRY=us            # optional: two-letter country code
    export PROXY_STICKY=true           # optional: keep same IP per session

3. Provider-Specific Host/Port Defaults

These are auto-configured when you set the provider name:

| Provider | Host | Port |
|----------|------|------|
| Bright Data | brd.superproxy.io | 22225 |
| IProyal | proxy.iproyal.com | 12321 |
| Storm Proxies | rotating.stormproxies.com | 9999 |
| NetNut | gw-resi.netnut.io | 5959 |

Override with PROXY_HOST / PROXY_PORT env vars if your plan uses a different gateway.

4. Custom Proxy Provider

For any other proxy service, set provider to custom and supply host/port manually:

    {
      "proxy": {
        "enabled": true,
        "provider": "custom",
        "host": "your.proxy.host",
        "port": 8080,
        "username": "user",
        "password": "pass"
      }
    }
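To make the precedence in steps 2 to 4 concrete, here is a minimal sketch of resolving the environment variables against the provider defaults from the table. The function `proxy_settings_from_env` is hypothetical, and so are its choices to treat an unset provider as custom and to raise on a missing host or port. The skill's own parsing code is not shown in this document.

```python
import os

# Host/port defaults copied from the provider table above.
PROVIDER_DEFAULTS = {
    "brightdata": ("brd.superproxy.io", 22225),
    "iproyal": ("proxy.iproyal.com", 12321),
    "stormproxies": ("rotating.stormproxies.com", 9999),
    "netnut": ("gw-resi.netnut.io", 5959),
}


def proxy_settings_from_env(env=None):
    """Resolve proxy settings from PROXY_* variables; returns None when disabled."""
    env = os.environ if env is None else env
    if env.get("PROXY_ENABLED", "false").lower() != "true":
        return None
    provider = env.get("PROXY_PROVIDER", "custom")  # assumption: unset means custom
    default_host, default_port = PROVIDER_DEFAULTS.get(provider, (None, None))
    # PROXY_HOST / PROXY_PORT override the provider defaults when present.
    host = env.get("PROXY_HOST", default_host)
    port = env.get("PROXY_PORT", default_port)
    if host is None or port is None:
        raise ValueError("custom providers need PROXY_HOST and PROXY_PORT")
    return {
        "provider": provider,
        "host": host,
        "port": int(port),
        "username": env.get("PROXY_USERNAME", ""),
        "password": env.get("PROXY_PASSWORD", ""),
        "country": env.get("PROXY_COUNTRY", ""),
        "sticky": env.get("PROXY_STICKY", "false").lower() == "true",
    }


if __name__ == "__main__":
    print(proxy_settings_from_env({"PROXY_ENABLED": "true", "PROXY_PROVIDER": "netnut"}))
```

Passing a plain dict instead of reading os.environ directly keeps the function easy to exercise without touching the real environment.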
Once configured, the scraper picks up the proxy automatically, with no extra flags needed:

    # Discover and scrape as usual; proxy is applied automatically
    python scripts/youtube_orchestrator.py --config resources/scraper_config_ind.json

    # The log will confirm proxy is active:
    # INFO - Proxy enabled: <ProxyManager provider=brightdata enabled host=brd.superproxy.io:22225>
    # INFO - Browser using proxy: brightdata → brd.superproxy.io:22225
    from proxy_manager import ProxyManager

    # From config (auto-reads config from resources/)
    pm = ProxyManager.from_config()

    # From environment variables
    pm = ProxyManager.from_env()

    # Manual construction
    pm = ProxyManager(
        provider="brightdata",
        username="your_user",
        password="your_pass",
        country="us",
        sticky=True,
    )

    # For Playwright browser context
    proxy = pm.get_playwright_proxy()
    # → {"server": "http://brd.superproxy.io:22225", "username": "user-country-us-session-abc123", "password": "pass"}

    # For requests / aiohttp
    proxies = pm.get_requests_proxy()
    # → {"http": "http://user:pass@host:port", "https": "http://user:pass@host:port"}

    # Force new IP (rotates session ID)
    pm.rotate_session()

    # Debug info
    print(pm.info())
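The Playwright proxy example above shows a username of the form user-country-us-session-abc123, which suggests that country and session ID are encoded into the proxy username. This is a sketch of that encoding, inferred only from the one example: `build_proxy_username` and `new_session_id` are hypothetical helpers, and real providers may use different separators or keywords.

```python
import secrets


def new_session_id() -> str:
    """Random 8-character hex token; a fresh token stands in for a rotated session."""
    return secrets.token_hex(4)


def build_proxy_username(username: str, country: str = "", session_id: str = "") -> str:
    """Append optional country and sticky-session markers to the base username."""
    parts = [username]
    if country:
        parts += ["country", country.lower()]
    if session_id:
        parts += ["session", session_id]
    return "-".join(parts)


if __name__ == "__main__":
    print(build_proxy_username("user", "us", "abc123"))  # user-country-us-session-abc123
```

Under this reading, rotating a session amounts to generating a new session ID and rebuilding the username.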
- Use sticky sessions: YouTube requires consistent IPs during a browsing session. Set "sticky": true.
- Target the right country: Set "country": "us" (or your target region) so YouTube serves content in the expected locale.
- Combine with existing anti-detection: This scraper already has fingerprinting, stealth scripts, and human behavior simulation. The proxy is the final layer.
- Rotate sessions between batches: Call pm.rotate_session() between large batches of channels to get a fresh IP.
- Use delays: Even with proxies, respect scrape_delay in config (default 2-5s) to avoid aggressive patterns.
- Monitor your proxy dashboard: All providers have dashboards showing bandwidth usage and success rates.
- No login required: Only scrapes publicly visible content
- Checkpoint/resume: Queue files track progress; interrupted scrapes can be resumed automatically
- Rate limiting: Waits 60s on rate limit, exponential backoff on consecutive failures
- Resilient orchestration: Auto-restarts browser, retries failed channels, graceful shutdown on SIGINT/SIGTERM
- Regional configs: Pre-built configs for 6 regions covering 200+ cities worldwide
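The rate-limiting note combines a 60-second base wait with exponential backoff on consecutive failures. This is a minimal sketch of such a schedule; the doubling factor, the 900-second cap, and the name `backoff_wait` are assumptions, since the document gives only the 60s base and the word "exponential".

```python
def backoff_wait(consecutive_failures: int, base_wait: float = 60.0, max_wait: float = 900.0) -> float:
    """Seconds to wait after N consecutive failures: base, 2x base, 4x base, capped."""
    if consecutive_failures < 1:
        return 0.0
    return min(base_wait * 2 ** (consecutive_failures - 1), max_wait)


if __name__ == "__main__":
    for failures in range(1, 4):
        print(failures, backoff_wait(failures))
```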