{
  "schemaVersion": "1.0",
  "item": {
    "slug": "youtube-scrapper",
    "name": "Youtube Scrapper",
    "source": "tencent",
    "type": "skill",
    "category": "Developer Tools",
    "sourceUrl": "https://clawhub.ai/ArulmozhiV/youtube-scrapper",
    "canonicalUrl": "https://clawhub.ai/ArulmozhiV/youtube-scrapper",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadMode": "redirect",
    "downloadUrl": "/downloads/youtube-scrapper",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=youtube-scrapper",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "installMethod": "Manual import",
    "extraction": "Extract archive",
    "prerequisites": [
      "OpenClaw"
    ],
    "packageFormat": "ZIP package",
    "includedAssets": [
      "SKILL.md"
    ],
    "primaryDoc": "SKILL.md",
    "quickSetup": [
      "Download the package from Yavira.",
      "Extract the archive and review SKILL.md first.",
      "Import or place the package into your OpenClaw setup."
    ],
    "agentAssist": {
      "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
      "steps": [
        "Download the package from Yavira.",
        "Extract it into a folder your agent can access.",
        "Paste one of the prompts below and point your agent at the extracted folder."
      ],
      "prompts": [
        {
          "label": "New install",
          "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
        },
        {
          "label": "Upgrade existing",
          "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
        }
      ]
    },
    "sourceHealth": {
      "source": "tencent",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-05-07T17:22:31.273Z",
      "expiresAt": "2026-05-14T17:22:31.273Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=afrexai-annual-report",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=afrexai-annual-report",
        "contentDisposition": "attachment; filename=\"afrexai-annual-report-1.0.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null
      },
      "scope": "source",
      "summary": "Source download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this source.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/youtube-scrapper"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    },
    "downloadPageUrl": "https://openagent3.xyz/downloads/youtube-scrapper",
    "agentPageUrl": "https://openagent3.xyz/skills/youtube-scrapper/agent",
    "manifestUrl": "https://openagent3.xyz/skills/youtube-scrapper/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/youtube-scrapper/agent.md"
  },
  "agentAssist": {
    "summary": "Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.",
    "steps": [
      "Download the package from Yavira.",
      "Extract it into a folder your agent can access.",
      "Paste one of the prompts below and point your agent at the extracted folder."
    ],
    "prompts": [
      {
        "label": "New install",
        "body": "I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete."
      },
      {
        "label": "Upgrade existing",
        "body": "I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run."
      }
    ]
  },
  "documentation": {
    "source": "clawhub",
    "primaryDoc": "SKILL.md",
    "sections": [
      {
        "title": "YouTube Channel Scraper",
        "body": "A browser-based YouTube channel discovery and scraping tool.\n\nPart of ScrapeClaw — a suite of production-ready, agentic social media scrapers for Instagram, YouTube, X/Twitter, and Facebook built with Python & Playwright, no API keys required.\n\n---\nname: youtube-scrapper\ndescription: Discover and scrape YouTube channels from your browser.\nemoji: 📺\nversion: 1.0.2\nauthor: influenza\ntags:\n  - youtube\n  - scraping\n  - social-media\n  - channel-discovery\n  - influencer-discovery\nmetadata:\n  clawdbot:\n    requires:\n      bins:\n        - python3\n        - chromium\n\n    config:\n      stateDirs:\n        - data/output\n        - data/queue\n        - thumbnails\n      outputFormats:\n        - json\n        - csv\n---"
      },
      {
        "title": "Overview",
        "body": "This skill provides a two-phase YouTube scraping system:\n\nChannel Discovery — Find YouTube channels via Google Search (browser-based, no API key required)\nBrowser Scraping — Scrape public channel data using Playwright with anti-detection (no login required)"
      },
      {
        "title": "Features",
        "body": "🔍 Discover YouTube channels by location and category\n🌐 Full browser simulation for accurate scraping\n🛡️ Browser fingerprinting, human behavior simulation, and stealth scripts\n📊 Channel info, subscribers, views, videos, engagement data, and media\n💾 JSON export with downloaded thumbnails\n🔄 Resume interrupted scraping sessions\n⚡ Auto-skip unavailable channels and low-subscriber profiles\n🌍 Built-in residential proxy support with 4 providers\n🗺️ Regional configs for US, UK, Europe, India, Gulf, and East Asia"
      },
      {
        "title": "Agent Tool Interface",
        "body": "For OpenClaw agent integration, the skill provides JSON output:\n\n# Discover YouTube channels (returns JSON queue)\npython scripts/youtube_channel_discovery.py --categories tech --locations India\n\n# Scrape from a queue file\npython scripts/youtube_channel_scraper.py --queue data/queue/your_queue_file.json\n\n# Full orchestration — discover + scrape in one go\npython scripts/youtube_orchestrator.py --config resources/scraper_config_ind.json"
      },
      {
        "title": "Channel Data Structure",
        "body": "{\n  \"channel_name\": \"Marques Brownlee\",\n  \"channel_url\": \"https://www.youtube.com/@mkbhd\",\n  \"subscribers\": 19200000,\n  \"total_views\": 4500000000,\n  \"video_count\": 1800,\n  \"description\": \"MKBHD: Quality Tech Videos...\",\n  \"joined_date\": \"Mar 21, 2008\",\n  \"country\": \"United States\",\n  \"profile_pic_url\": \"https://...\",\n  \"profile_pic_local\": \"thumbnails/mkbhd/profile_abc123.jpg\",\n  \"banner_url\": \"https://...\",\n  \"banner_local\": \"thumbnails/mkbhd/banner_def456.jpg\",\n  \"influencer_tier\": \"mega\",\n  \"category\": \"tech\",\n  \"scrape_location\": \"New York\",\n  \"scraped_at\": \"2026-02-17T12:00:00\",\n  \"recent_videos\": [\n    {\n      \"title\": \"Galaxy S26 Ultra Review\",\n      \"url\": \"https://www.youtube.com/watch?v=...\",\n      \"views\": 5200000,\n      \"published\": \"2 days ago\",\n      \"duration\": \"14:32\",\n      \"thumbnail_url\": \"https://...\",\n      \"thumbnail_local\": \"thumbnails/mkbhd/video_0_ghi789.jpg\"\n    }\n  ]\n}"
      },
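      {
        "title": "Example: Post-Processing a Channel Record",
        "body": "This is a minimal Python sketch for post-processing a record shaped like the example above. It assumes the field names shown (joined_date, recent_videos, views) and assumes joined_date always uses the Mar 21, 2008 style. The helper name normalize_channel is hypothetical and is not part of the skill.\n\n```python\nfrom datetime import datetime\n\n\ndef normalize_channel(record: dict) -> dict:\n    # Hypothetical helper, not shipped with the skill\n    out = dict(record)\n    joined = record.get('joined_date')\n    if joined:\n        # 'Mar 21, 2008' -> '2008-03-21' (assumes English month abbreviations)\n        out['joined_date_iso'] = datetime.strptime(joined, '%b %d, %Y').date().isoformat()\n    # Total views across the scraped recent videos\n    out['recent_video_views'] = sum(v.get('views', 0) for v in record.get('recent_videos', []))\n    return out\n\n\nsample = {'joined_date': 'Mar 21, 2008', 'recent_videos': [{'views': 5200000}]}\nprint(normalize_channel(sample)['joined_date_iso'])  # 2008-03-21\n```\n\nThe sketch adds a separate ISO field and leaves the original joined_date string in place. Anything that already reads the scraped record therefore keeps working."
      },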
      {
        "title": "Queue File Structure",
        "body": "{\n  \"location\": \"India\",\n  \"category\": \"tech\",\n  \"total\": 20,\n  \"channels\": [\"@channel1\", \"@channel2\", \"...\"],\n  \"completed\": [\"@channel1\"],\n  \"failed\": {\"@channel3\": \"not_found\"},\n  \"current_index\": 2,\n  \"created_at\": \"2026-02-17T12:00:00\",\n  \"source\": \"google_search\"\n}"
      },
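      {
        "title": "Example: Computing the Remaining Queue",
        "body": "A queue file records the full channel list alongside the completed and failed entries, so a resumed run only needs the remainder. The Python sketch below assumes that both completed and failed channels are skipped on resume. The function names are hypothetical, and the skill's own resume logic may differ (for example, it may use current_index or retry failed entries).\n\n```python\nimport json\nfrom pathlib import Path\n\n\ndef load_queue(path: str) -> dict:\n    # Queue files are plain JSON under data/queue/{region}/\n    return json.loads(Path(path).read_text(encoding='utf-8'))\n\n\ndef pending_channels(queue: dict) -> list:\n    # Channels already handled, successfully or not\n    done = set(queue.get('completed', [])) | set(queue.get('failed', {}))\n    # Keep the original discovery order\n    return [c for c in queue.get('channels', []) if c not in done]\n\n\nqueue = {\n    'channels': ['@channel1', '@channel2', '@channel3', '@channel4'],\n    'completed': ['@channel1'],\n    'failed': {'@channel3': 'not_found'},\n}\nprint(pending_channels(queue))  # ['@channel2', '@channel4']\n```"
      },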
      {
        "title": "Influencer Tiers",
        "body": "Tier\tSubscribers Range\nnano\t< 1,000\nmicro\t1,000 – 10,000\nmid\t10,000 – 100,000\nmacro\t100,000 – 1M\nmega\t> 1,000,000"
      },
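      {
        "title": "Example: Assigning an Influencer Tier",
        "body": "The tier table can be expressed as a simple threshold function. This Python sketch makes an assumption about boundary values. Values shared by two rows (1,000, 10,000 and 100,000) go to the higher tier, and exactly 1,000,000 stays macro because mega is defined as > 1,000,000. The function name influencer_tier is hypothetical.\n\n```python\ndef influencer_tier(subscribers: int) -> str:\n    # Assumed boundary handling: shared values go to the higher tier,\n    # while exactly 1,000,000 stays macro because mega is > 1,000,000\n    if subscribers < 1_000:\n        return 'nano'\n    if subscribers < 10_000:\n        return 'micro'\n    if subscribers < 100_000:\n        return 'mid'\n    if subscribers <= 1_000_000:\n        return 'macro'\n    return 'mega'\n\n\nprint(influencer_tier(19_200_000))  # mega\n```"
      },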
      {
        "title": "File Outputs",
        "body": "Queue files: data/queue/{region}/{location}_{category}_{timestamp}.json\nScraped data: data/output_{region}/{channel_name}.json\nThumbnails: thumbnails_{region}/{channel}/profile_*.jpg, thumbnails_{region}/{channel}/video_*.jpg\nProgress: data/progress/discovery_progress_{region}.json"
      },
      {
        "title": "Configuration",
        "body": "Regional config files live in resources/:\n\nresources/scraper_config_us.json\nresources/scraper_config_uk.json\nresources/scraper_config_eur.json\nresources/scraper_config_ind.json\nresources/scraper_config_gulf.json\nresources/scraper_config_east.json\n\nExample config (resources/scraper_config_ind.json):\n\n{\n  \"proxy\": {\n    \"enabled\": false,\n    \"provider\": \"brightdata\",\n    \"country\": \"\",\n    \"sticky\": true,\n    \"sticky_ttl_minutes\": 10\n  },\n  \"categories\": [\n    \"gaming\", \"tech\", \"beauty\", \"fashion\", \"fitness\",\n    \"food\", \"travel\", \"music\", \"education\", \"comedy\",\n    \"lifestyle\", \"cooking\", \"diy\", \"art\", \"finance\",\n    \"health\", \"entertainment\"\n  ],\n  \"locations\": [\n    \"India\", \"Mumbai\", \"Delhi\", \"Bangalore\", \"Hyderabad\",\n    \"Chennai\", \"Kolkata\", \"Pune\", \"Ahmedabad\", \"Jaipur\"\n  ],\n  \"max_videos_to_scrape\": 6,\n  \"headless\": false,\n  \"results_per_search\": 20,\n  \"search_delay\": [3, 7],\n  \"scrape_delay\": [2, 5],\n  \"rate_limit_wait\": 60,\n  \"max_retries\": 3\n}"
      },
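      {
        "title": "Example: Reading Delay Ranges from a Config",
        "body": "In the config, search_delay and scrape_delay are [min, max] ranges in seconds. This minimal Python sketch loads a regional config and turns such a range into a randomized pause, assuming uniform sampling. The helper names are hypothetical, and the scraper's own sampling may differ.\n\n```python\nimport json\nimport random\nimport time\n\n\ndef load_config(path: str) -> dict:\n    # For example resources/scraper_config_ind.json\n    with open(path, encoding='utf-8') as fh:\n        return json.load(fh)\n\n\ndef sample_delay(bounds, rng=random) -> float:\n    # bounds is a [min, max] pair in seconds, such as [2, 5]\n    low, high = bounds\n    if low > high:\n        raise ValueError('delay range must be [min, max]')\n    return rng.uniform(low, high)\n\n\ndef pause(config: dict, key: str = 'scrape_delay') -> float:\n    # Sleep for a random duration drawn from the configured range\n    seconds = sample_delay(config[key])\n    time.sleep(seconds)\n    return seconds\n\n\nconfig = {'search_delay': [3, 7], 'scrape_delay': [2, 5]}\nprint(sample_delay(config['scrape_delay'], random.Random(0)))\n```\n\nPassing a seeded random.Random instance makes the sampled delays reproducible in tests."
      },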
      {
        "title": "Filters Applied",
        "body": "The scraper automatically filters out:\n\n❌ Unavailable or terminated channels\n❌ Channels with fewer than 500 subscribers (configurable)\n❌ Non-existent channel URLs\n❌ Already scraped entries (deduplication)\n\nRate-limited requests are not discarded; they are retried automatically with backoff."
      },
      {
        "title": "Anti-Detection",
        "body": "The scraper uses multiple anti-detection techniques:\n\nBrowser fingerprinting — Rotating fingerprint profiles (viewport, user agent, timezone, WebGL, etc.)\nStealth JavaScript — Hides navigator.webdriver, spoofs plugins/languages/hardware, canvas noise, fake chrome object\nHuman behavior simulation — Random delays, mouse movements, scrolling patterns\nNetwork randomization — Variable timing between requests\nRequest interception — Blocks known fingerprinting and tracking scripts"
      },
      {
        "title": "No Channels Discovered",
        "body": "Try different location/category combinations\nCheck whether Google Search is returning CAPTCHA pages\nRun with --headless false (or set \"headless\": false in the config) to watch the browser and debug visually"
      },
      {
        "title": "Rate Limiting",
        "body": "Reduce scraping speed by increasing search_delay and scrape_delay in the config\nRun during off-peak hours\nUse a residential proxy (see below)"
      },
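      {
        "title": "Example: Exponential Backoff on Rate Limits",
        "body": "The Notes section states that the scraper waits 60 s on a rate limit and backs off exponentially on consecutive failures. The example config also defines rate_limit_wait and max_retries. The exact formula is not documented. This Python sketch therefore assumes the wait starts at rate_limit_wait and doubles with each consecutive failure, up to a hypothetical cap. RateLimited, backoff_schedule and with_retries are illustrative names, not the skill's API.\n\n```python\nimport time\n\n\nclass RateLimited(Exception):\n    # Hypothetical signal raised by a fetch function when requests are throttled\n    pass\n\n\ndef backoff_schedule(rate_limit_wait=60, max_retries=3, cap=900):\n    # Wait before retry n (1-based): rate_limit_wait * 2**(n - 1), capped\n    return [min(rate_limit_wait * 2 ** (n - 1), cap) for n in range(1, max_retries + 1)]\n\n\ndef with_retries(fetch, rate_limit_wait=60, max_retries=3, sleep=time.sleep):\n    # One initial attempt plus up to max_retries retries\n    for wait in backoff_schedule(rate_limit_wait, max_retries):\n        try:\n            return fetch()\n        except RateLimited:\n            sleep(wait)\n    return fetch()  # final attempt; a further RateLimited propagates\n\n\nprint(backoff_schedule())  # [60, 120, 240]\n```\n\nThe sleep function is injectable, so the retry loop can be exercised without actually waiting."
      },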
      {
        "title": "Browser Crashes",
        "body": "The orchestrator auto-restarts the browser every 50 channels\nInterrupted scrapes can be resumed — queue files track progress automatically"
      },
      {
        "title": "Why Use a Residential Proxy?",
        "body": "Running a scraper at scale without a residential proxy is likely to get your IP blocked quickly. Here's why proxies are essential for long-running scrapes:\n\nAdvantage\tDescription\nAvoid IP Bans\tResidential IPs look like real household users, not data-center bots. YouTube is far less likely to flag them.\nAutomatic IP Rotation\tEach request (or session) gets a fresh IP, so rate limits never stack up on one address.\nGeo-Targeting\tRoute traffic through a specific country/city so scraped content matches the target audience's locale.\nSticky Sessions\tKeep the same IP for a configurable window (e.g. 10 min), which is critical for maintaining a consistent browsing session.\nHigher Success Rate\tRotating residential IPs deliver 95%+ success rates compared to ~30% with data-center proxies on YouTube.\nLong-Running Scrapes\tScrape thousands of channels over hours or days without interruption.\nConcurrent Scraping\tRun multiple browser instances across different IPs simultaneously."
      },
      {
        "title": "Recommended Proxy Providers",
        "body": "We have affiliate partnerships with top residential proxy providers. Using these links supports continued development of this skill:\n\nProvider\tBest For\tSign Up\nBright Data\tWorld's largest network, 72M+ IPs, enterprise-grade\t👉 Get Bright Data\nIProyal\tPay-as-you-go, 195+ countries, no traffic expiry\t👉 Get IProyal\nStorm Proxies\tFast & reliable, developer-friendly API, competitive pricing\t👉 Get Storm Proxies\nNetNut\tISP-grade network, 52M+ IPs, direct connectivity\t👉 Get NetNut"
      },
      {
        "title": "Setup Steps",
        "body": "1. Get Your Proxy Credentials\n\nSign up with any provider above, then grab:\n\nUsername (from your provider dashboard)\nPassword (from your provider dashboard)\nHost and Port are pre-configured per provider (or use custom)\n\n2. Configure via Environment Variables\n\nexport PROXY_ENABLED=true\nexport PROXY_PROVIDER=brightdata    # brightdata | iproyal | stormproxies | netnut | custom\nexport PROXY_USERNAME=your_user\nexport PROXY_PASSWORD=your_pass\nexport PROXY_COUNTRY=us             # optional: two-letter country code\nexport PROXY_STICKY=true            # optional: keep same IP per session\n\n3. Provider-Specific Host/Port Defaults\n\nThese are auto-configured when you set the provider name:\n\nProvider\tHost\tPort\nBright Data\tbrd.superproxy.io\t22225\nIProyal\tproxy.iproyal.com\t12321\nStorm Proxies\trotating.stormproxies.com\t9999\nNetNut\tgw-resi.netnut.io\t5959\n\nOverride with PROXY_HOST / PROXY_PORT env vars if your plan uses a different gateway.\n\n4. Custom Proxy Provider\n\nFor any other proxy service, set provider to custom and supply host/port manually:\n\n{\n  \"proxy\": {\n    \"enabled\": true,\n    \"provider\": \"custom\",\n    \"host\": \"your.proxy.host\",\n    \"port\": 8080,\n    \"username\": \"user\",\n    \"password\": \"pass\"\n  }\n}"
      },
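      {
        "title": "Example: Building a Proxy URL from Environment Variables",
        "body": "The environment variables and per-provider defaults above can be combined into a standard proxy URL. This is a hypothetical stand-alone Python helper, not the skill's ProxyManager. It reads only the documented PROXY_* variables, uses the host/port table, and assumes a plain http:// proxy scheme. It does not apply provider-specific username formats (such as country or session suffixes), because these vary by provider.\n\n```python\nimport os\nfrom urllib.parse import quote\n\n# Gateway defaults from the host/port table\nPROVIDER_DEFAULTS = {\n    'brightdata': ('brd.superproxy.io', 22225),\n    'iproyal': ('proxy.iproyal.com', 12321),\n    'stormproxies': ('rotating.stormproxies.com', 9999),\n    'netnut': ('gw-resi.netnut.io', 5959),\n}\n\n\ndef proxy_url_from_env(env=os.environ):\n    # Returns None unless PROXY_ENABLED is 'true' (case-insensitive)\n    if env.get('PROXY_ENABLED', '').lower() != 'true':\n        return None\n    provider = env.get('PROXY_PROVIDER', 'custom')\n    host, port = PROVIDER_DEFAULTS.get(provider, (None, None))\n    # PROXY_HOST / PROXY_PORT override the defaults (required for custom)\n    host = env.get('PROXY_HOST', host)\n    port = int(env.get('PROXY_PORT', port or 0))\n    if not host or not port:\n        raise ValueError('set PROXY_HOST and PROXY_PORT for provider ' + provider)\n    # Percent-encode credentials so characters like @ or : survive in the URL\n    user = quote(env.get('PROXY_USERNAME', ''), safe='')\n    password = quote(env.get('PROXY_PASSWORD', ''), safe='')\n    auth = user + ':' + password + '@' if user else ''\n    return 'http://' + auth + host + ':' + str(port)\n```\n\nThe result has the same http://user:pass@host:port shape shown for get_requests_proxy(). With the requests library, it could therefore be passed as {'http': url, 'https': url}."
      },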
      {
        "title": "Running the Scraper with Proxy",
        "body": "Once configured, the scraper picks up the proxy automatically — no extra flags needed:\n\n# Discover and scrape as usual — proxy is applied automatically\npython scripts/youtube_orchestrator.py --config resources/scraper_config_ind.json\n\n# The log will confirm proxy is active:\n# INFO - Proxy enabled: <ProxyManager provider=brightdata enabled host=brd.superproxy.io:22225>\n# INFO - Browser using proxy: brightdata → brd.superproxy.io:22225"
      },
      {
        "title": "Using the Proxy Manager Programmatically",
        "body": "from proxy_manager import ProxyManager\n\n# From config (auto-reads config from resources/)\npm = ProxyManager.from_config()\n\n# From environment variables\npm = ProxyManager.from_env()\n\n# Manual construction\npm = ProxyManager(\n    provider=\"brightdata\",\n    username=\"your_user\",\n    password=\"your_pass\",\n    country=\"us\",\n    sticky=True\n)\n\n# For Playwright browser context\nproxy = pm.get_playwright_proxy()\n# → {\"server\": \"http://brd.superproxy.io:22225\", \"username\": \"user-country-us-session-abc123\", \"password\": \"pass\"}\n\n# For requests / aiohttp\nproxies = pm.get_requests_proxy()\n# → {\"http\": \"http://user:pass@host:port\", \"https\": \"http://user:pass@host:port\"}\n\n# Force new IP (rotates session ID)\npm.rotate_session()\n\n# Debug info\nprint(pm.info())"
      },
      {
        "title": "Best Practices for Long-Running Scrapes",
        "body": "Use sticky sessions — YouTube requires consistent IPs during a browsing session. Set \"sticky\": true.\nTarget the right country — Set \"country\": \"us\" (or your target region) so YouTube serves content in the expected locale.\nCombine with existing anti-detection — This scraper already has fingerprinting, stealth scripts, and human behavior simulation. The proxy is the final layer.\nRotate sessions between batches — Call pm.rotate_session() between large batches of channels to get a fresh IP.\nUse delays — Even with proxies, respect scrape_delay in config (default 2-5s) to avoid aggressive patterns.\nMonitor your proxy dashboard — All providers have dashboards showing bandwidth usage and success rates."
      },
      {
        "title": "Notes",
        "body": "No login required — Only scrapes publicly visible content\nCheckpoint/resume — Queue files track progress; interrupted scrapes can be resumed automatically\nRate limiting — Waits 60s on rate limit, exponential backoff on consecutive failures\nResilient orchestration — Auto-restarts browser, retries failed channels, graceful shutdown on SIGINT/SIGTERM\nRegional configs — Pre-built configs for 6 regions covering 200+ cities worldwide"
      }
    ],
    "body": "YouTube Channel Scraper\n\nA browser-based YouTube channel discovery and scraping tool.\n\nPart of ScrapeClaw — a suite of production-ready, agentic social media scrapers for Instagram, YouTube, X/Twitter, and Facebook built with Python & Playwright, no API keys required.\n\n---\nname: youtube-scrapper\ndescription: Discover and scrape YouTube channels from your browser.\nemoji: 📺\nversion: 1.0.2\nauthor: influenza\ntags:\n  - youtube\n  - scraping\n  - social-media\n  - channel-discovery\n  - influencer-discovery\nmetadata:\n  clawdbot:\n    requires:\n      bins:\n        - python3\n        - chromium\n\n    config:\n      stateDirs:\n        - data/output\n        - data/queue\n        - thumbnails\n      outputFormats:\n        - json\n        - csv\n---\n\nOverview\n\nThis skill provides a two-phase YouTube scraping system:\n\nChannel Discovery — Find YouTube channels via Google Search (browser-based, no API key required)\nBrowser Scraping — Scrape public channel data using Playwright with anti-detection (no login required)\nFeatures\n🔍 - Discover YouTube channels by location and category\n🌐 - Full browser simulation for accurate scraping\n🛡️ - Browser fingerprinting, human behavior simulation, and stealth scripts\n📊 - Channel info, subscribers, views, videos, engagement data, and media\n💾 - JSON export with downloaded thumbnails\n🔄 - Resume interrupted scraping sessions\n⚡ - Auto-skip unavailable channels and low-subscriber profiles\n🌍 - Built-in residential proxy support with 4 providers\n🗺️ - Regional configs for US, UK, Europe, India, Gulf, and East Asia\nUsage\nAgent Tool Interface\n\nFor OpenClaw agent integration, the skill provides JSON output:\n\n# Discover YouTube channels (returns JSON queue)\npython scripts/youtube_channel_discovery.py --categories tech --locations India\n\n# Scrape from a queue file\npython scripts/youtube_channel_scraper.py --queue data/queue/your_queue_file.json\n\n# Full orchestration — discover + scrape in one go\npython scripts/youtube_orchestrator.py --config resources/scraper_config_ind.json\n\nOutput Data\nChannel Data Structure\n{\n  \"channel_name\": \"Marques Brownlee\",\n  \"channel_url\": \"https://www.youtube.com/@mkbhd\",\n  \"subscribers\": 19200000,\n  \"total_views\": 4500000000,\n  \"video_count\": 1800,\n  \"description\": \"MKBHD: Quality Tech Videos...\",\n  \"joined_date\": \"Mar 21, 2008\",\n  \"country\": \"United States\",\n  \"profile_pic_url\": \"https://...\",\n  \"profile_pic_local\": \"thumbnails/mkbhd/profile_abc123.jpg\",\n  \"banner_url\": \"https://...\",\n  \"banner_local\": \"thumbnails/mkbhd/banner_def456.jpg\",\n  \"influencer_tier\": \"mega\",\n  \"category\": \"tech\",\n  \"scrape_location\": \"New York\",\n  \"scraped_at\": \"2026-02-17T12:00:00\",\n  \"recent_videos\": [\n    {\n      \"title\": \"Galaxy S26 Ultra Review\",\n      \"url\": \"https://www.youtube.com/watch?v=...\",\n      \"views\": 5200000,\n      \"published\": \"2 days ago\",\n      \"duration\": \"14:32\",\n      \"thumbnail_url\": \"https://...\",\n      \"thumbnail_local\": \"thumbnails/mkbhd/video_0_ghi789.jpg\"\n    }\n  ]\n}\n\nQueue File Structure\n{\n  \"location\": \"India\",\n  \"category\": \"tech\",\n  \"total\": 20,\n  \"channels\": [\"@channel1\", \"@channel2\", \"...\"],\n  \"completed\": [\"@channel1\"],\n  \"failed\": {\"@channel3\": \"not_found\"},\n  \"current_index\": 2,\n  \"created_at\": \"2026-02-17T12:00:00\",\n  \"source\": \"google_search\"\n}\n\nInfluencer Tiers\nTier\tSubscribers Range\nnano\t< 1,000\nmicro\t1,000 – 10,000\nmid\t10,000 – 100,000\nmacro\t100,000 – 1M\nmega\t> 1,000,000\nFile Outputs\nQueue files: data/queue/{region}/{location}_{category}_{timestamp}.json\nScraped data: data/output_{region}/{channel_name}.json\nThumbnails: thumbnails_{region}/{channel}/profile_*.jpg, thumbnails_{region}/{channel}/video_*.jpg\nProgress: data/progress/discovery_progress_{region}.json\nConfiguration\n\nRegional config files live in resources/:\n\nresources/scraper_config_us.json\nresources/scraper_config_uk.json\nresources/scraper_config_eur.json\nresources/scraper_config_ind.json\nresources/scraper_config_gulf.json\nresources/scraper_config_east.json\n\nExample config (resources/scraper_config_ind.json):\n\n{\n  \"proxy\": {\n    \"enabled\": false,\n    \"provider\": \"brightdata\",\n    \"country\": \"\",\n    \"sticky\": true,\n    \"sticky_ttl_minutes\": 10\n  },\n  \"categories\": [\n    \"gaming\", \"tech\", \"beauty\", \"fashion\", \"fitness\",\n    \"food\", \"travel\", \"music\", \"education\", \"comedy\",\n    \"lifestyle\", \"cooking\", \"diy\", \"art\", \"finance\",\n    \"health\", \"entertainment\"\n  ],\n  \"locations\": [\n    \"India\", \"Mumbai\", \"Delhi\", \"Bangalore\", \"Hyderabad\",\n    \"Chennai\", \"Kolkata\", \"Pune\", \"Ahmedabad\", \"Jaipur\"\n  ],\n  \"max_videos_to_scrape\": 6,\n  \"headless\": false,\n  \"results_per_search\": 20,\n  \"search_delay\": [3, 7],\n  \"scrape_delay\": [2, 5],\n  \"rate_limit_wait\": 60,\n  \"max_retries\": 3\n}\n\nFilters Applied\n\nThe scraper automatically filters out:\n\n❌ Unavailable or terminated channels\n❌ Channels with < 500 subscribers (configurable)\n❌ Non-existent channel URLs\n❌ Already scraped entries (deduplication)\n❌ Rate-limited requests (auto-retry with backoff)\nAnti-Detection\n\nThe scraper uses multiple anti-detection techniques:\n\nBrowser fingerprinting — Rotating fingerprint profiles (viewport, user agent, timezone, WebGL, etc.)\nStealth JavaScript — Hides navigator.webdriver, spoofs plugins/languages/hardware, canvas noise, fake chrome object\nHuman behavior simulation — Random delays, mouse movements, scrolling patterns\nNetwork randomization — Variable timing between requests\nRequest interception — Blocks known fingerprinting and tracking scripts\nTroubleshooting\nNo Channels Discovered\nTry different location/category combinations\nCheck if Google Search is returning CAPTCHA pages\nRun with --headless false to debug visually\nRate Limiting\nReduce scraping speed (increase delays in config)\nRun during off-peak hours\nUse a residential proxy (see below)\nBrowser Crashes\nThe orchestrator auto-restarts the browser every 50 channels\nInterrupted scrapes can be resumed — queue files track progress automatically\n🌐 Residential Proxy Support\nWhy Use a Residential Proxy?\n\nRunning a scraper at scale without a residential proxy will get your IP blocked fast. Here's why proxies are essential for long-running scrapes:\n\nAdvantage\tDescription\nAvoid IP Bans\tResidential IPs look like real household users, not data-center bots. YouTube is far less likely to flag them.\nAutomatic IP Rotation\tEach request (or session) gets a fresh IP, so rate-limits never stack up on one address.\nGeo-Targeting\tRoute traffic through a specific country/city so scraped content matches the target audience's locale.\nSticky Sessions\tKeep the same IP for a configurable window (e.g. 10 min) — critical for maintaining a consistent browsing session.\nHigher Success Rate\tRotating residential IPs deliver 95%+ success rates compared to ~30% with data-center proxies on YouTube.\nLong-Running Scrapes\tScrape thousands of channels over hours or days without interruption.\nConcurrent Scraping\tRun multiple browser instances across different IPs simultaneously.\nRecommended Proxy Providers\n\nWe have affiliate partnerships with top residential proxy providers. Using these links supports continued development of this skill:\n\nProvider\tBest For\tSign Up\nBright Data\tWorld's largest network, 72M+ IPs, enterprise-grade\t👉 Get Bright Data\nIProyal\tPay-as-you-go, 195+ countries, no traffic expiry\t👉 Get IProyal\nStorm Proxies\tFast & reliable, developer-friendly API, competitive pricing\t👉 Get Storm Proxies\nNetNut\tISP-grade network, 52M+ IPs, direct connectivity\t👉 Get NetNut\nSetup Steps\n1. Get Your Proxy Credentials\n\nSign up with any provider above, then grab:\n\nUsername (from your provider dashboard)\nPassword (from your provider dashboard)\nHost and Port are pre-configured per provider (or use custom)\n2. Configure via Environment Variables\nexport PROXY_ENABLED=true\nexport PROXY_PROVIDER=brightdata    # brightdata | iproyal | stormproxies | netnut | custom\nexport PROXY_USERNAME=your_user\nexport PROXY_PASSWORD=your_pass\nexport PROXY_COUNTRY=us             # optional: two-letter country code\nexport PROXY_STICKY=true            # optional: keep same IP per session\n\n3. Provider-Specific Host/Port Defaults\n\nThese are auto-configured when you set the provider name:\n\nProvider\tHost\tPort\nBright Data\tbrd.superproxy.io\t22225\nIProyal\tproxy.iproyal.com\t12321\nStorm Proxies\trotating.stormproxies.com\t9999\nNetNut\tgw-resi.netnut.io\t5959\n\nOverride with PROXY_HOST / PROXY_PORT env vars if your plan uses a different gateway.\n\n4. Custom Proxy Provider\n\nFor any other proxy service, set provider to custom and supply host/port manually:\n\n{\n  \"proxy\": {\n    \"enabled\": true,\n    \"provider\": \"custom\",\n    \"host\": \"your.proxy.host\",\n    \"port\": 8080,\n    \"username\": \"user\",\n    \"password\": \"pass\"\n  }\n}\n\nRunning the Scraper with Proxy\n\nOnce configured, the scraper picks up the proxy automatically — no extra flags needed:\n\n# Discover and scrape as usual — proxy is applied automatically\npython scripts/youtube_orchestrator.py --config resources/scraper_config_ind.json\n\n# The log will confirm proxy is active:\n# INFO - Proxy enabled: <ProxyManager provider=brightdata enabled host=brd.superproxy.io:22225>\n# INFO - Browser using proxy: brightdata → brd.superproxy.io:22225\n\nUsing the Proxy Manager Programmatically\nfrom proxy_manager import ProxyManager\n\n# From config (auto-reads config from resources/)\npm = ProxyManager.from_config()\n\n# From environment variables\npm = ProxyManager.from_env()\n\n# Manual construction\npm = ProxyManager(\n    provider=\"brightdata\",\n    username=\"your_user\",\n    password=\"your_pass\",\n    country=\"us\",\n    sticky=True\n)\n\n# For Playwright browser context\nproxy = pm.get_playwright_proxy()\n# → {\"server\": \"http://brd.superproxy.io:22225\", \"username\": \"user-country-us-session-abc123\", \"password\": \"pass\"}\n\n# For requests / aiohttp\nproxies = pm.get_requests_proxy()\n# → {\"http\": \"http://user:pass@host:port\", \"https\": \"http://user:pass@host:port\"}\n\n# Force new IP (rotates session ID)\npm.rotate_session()\n\n# Debug info\nprint(pm.info())\n\nBest Practices for Long-Running Scrapes\nUse sticky sessions — YouTube requires consistent IPs during a browsing session. Set \"sticky\": true.\nTarget the right country — Set \"country\": \"us\" (or your target region) so YouTube serves content in the expected locale.\nCombine with existing anti-detection — This scraper already has fingerprinting, stealth scripts, and human behavior simulation. The proxy is the final layer.\nRotate sessions between batches — Call pm.rotate_session() between large batches of channels to get a fresh IP.\nUse delays — Even with proxies, respect scrape_delay in config (default 2-5s) to avoid aggressive patterns.\nMonitor your proxy dashboard — All providers have dashboards showing bandwidth usage and success rates.\nNotes\nNo login required — Only scrapes publicly visible content\nCheckpoint/resume — Queue files track progress; interrupted scrapes can be resumed automatically\nRate limiting — Waits 60s on rate limit, exponential backoff on consecutive failures\nResilient orchestration — Auto-restarts browser, retries failed channels, graceful shutdown on SIGINT/SIGTERM\nRegional configs — Pre-built configs for 6 regions covering 200+ cities worldwide"
  },
  "trust": {
    "sourceLabel": "tencent",
    "provenanceUrl": "https://clawhub.ai/ArulmozhiV/youtube-scrapper",
    "publisherUrl": "https://clawhub.ai/ArulmozhiV/youtube-scrapper",
    "owner": "ArulmozhiV",
    "version": "0.1.1",
    "license": null,
    "verificationStatus": "Indexed source record"
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/youtube-scrapper",
    "downloadUrl": "https://openagent3.xyz/downloads/youtube-scrapper",
    "agentUrl": "https://openagent3.xyz/skills/youtube-scrapper/agent",
    "manifestUrl": "https://openagent3.xyz/skills/youtube-scrapper/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/youtube-scrapper/agent.md"
  }
}