← All skills
Tencent SkillHub Β· Developer Tools

Twitter Scraper

Scrapes public Twitter/X profiles and recent tweets using browser automation with anti-detection and optional profile discovery via Google or DuckDuckGo.

Install for OpenClaw

Quick setup
  1. Download the package from Yavira.
  2. Extract the archive and review SKILL.md first.
  3. Import or place the package into your OpenClaw setup.

Requirements

Target platform
OpenClaw
Install method
Manual import
Extraction
Extract archive
Prerequisites
OpenClaw
Primary doc
SKILL.md

Package facts

Download mode
Yavira redirect
Package format
ZIP package
Source platform
Tencent SkillHub
What's included
SKILL.md

Validation

  • Use the Yavira download entry.
  • Review SKILL.md after the package is downloaded.
  • Confirm the extracted package contains the expected setup assets.

Install with your agent

Agent handoff

Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.

  1. Download the package from Yavira.
  2. Extract it into a folder your agent can access.
  3. Paste one of the prompts below and point your agent at the extracted folder.
New install

I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.

Upgrade existing

I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.

Trust & source

Release facts

Source
Tencent SkillHub
Verification
Indexed source record
Version
0.1.2

Documentation

Twitter/X Profile Scraper

A browser-based Twitter/X profile discovery and scraping tool. Part of ScrapeClaw, a suite of production-ready, agentic social media scrapers for Instagram, YouTube, X/Twitter, and Facebook built with Python & Playwright, no API keys required.

    ---
    name: twitter-scraper
    description: Discover and scrape Twitter/X public profiles from your browser.
    emoji: 🐦
    version: 1.0.2
    author: influenza
    tags:
      - twitter
      - x
      - scraping
      - social-media
      - profile-discovery
      - influencer-discovery
    metadata:
      clawdbot:
        requires:
          bins:
            - python3
            - chromium
        config:
          stateDirs:
            - data/output
            - data/queue
            - thumbnails
          outputFormats:
            - json
            - csv
    ---

Overview

This skill provides a two-phase Twitter/X scraping system:

  1. Profile Discovery: find Twitter accounts via the Google Custom Search API or DuckDuckGo.
  2. Browser Scraping: scrape public profiles using Playwright with anti-detection (no login required).

Features

  • 🔍 Discover Twitter/X profiles by location and category
  • 🌐 Full browser simulation for accurate scraping
  • 🛡️ Browser fingerprinting, human behavior simulation, and stealth scripts
  • 📊 Profile info, followers, tweets, engagement data, and media
  • 💾 JSON/CSV export with downloaded thumbnails
  • 🔄 Resume interrupted scraping sessions
  • ⚡ Auto-skip private accounts, low-follower profiles, suspended users
  • 🌍 Built-in residential proxy support with 4 providers

Getting Google API Credentials (Optional)

  1. Go to Google Cloud Console.
  2. Create a new project or select an existing one.
  3. Enable "Custom Search API".
  4. Create API credentials → API Key.
  5. Go to Programmable Search Engine.
  6. Create a search engine with x.com and twitter.com as the sites to search.
  7. Copy the Search Engine ID.

If not configured, discovery falls back to DuckDuckGo (no API key needed).

Agent Tool Interface

For OpenClaw agent integration, the skill provides JSON output:

    # Discover Twitter profiles (returns JSON)
    discover --location "Miami" --category "tech" --output json

    # Discover profiles in a specific category (returns JSON)
    discover --location "New York" --category "crypto" --output json

    # Scrape single profile (returns JSON)
    scrape --username elonmusk --output json

    # Scrape from a queue file
    scrape data/queue/Miami_tech_20260220_120000.json

Profile Data Structure

    {
      "username": "elonmusk",
      "display_name": "Elon Musk",
      "bio": "...",
      "followers": 180000000,
      "following": 800,
      "tweets_count": 45000,
      "is_verified": true,
      "profile_pic_url": "https://...",
      "profile_pic_local": "thumbnails/elonmusk/profile_abc123.jpg",
      "user_location": "Mars & Earth",
      "join_date": "June 2009",
      "website": "https://x.ai",
      "influencer_tier": "mega",
      "category": "tech",
      "scrape_location": "New York",
      "scraped_at": "2026-02-17T12:00:00",
      "recent_tweets": [
        {
          "id": "1234567890",
          "text": "Tweet content...",
          "timestamp": "2026-02-17T10:30:00.000Z",
          "likes": 50000,
          "retweets": 12000,
          "replies": 3000,
          "views": "5.2M",
          "media_urls": ["https://..."],
          "media_local": ["thumbnails/elonmusk/tweet_media_0_def456.jpg"],
          "is_retweet": false,
          "is_reply": false,
          "url": "https://x.com/elonmusk/status/1234567890"
        }
      ]
    }
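Note that views arrives as an abbreviated string ("5.2M") while likes, retweets, and replies are integers. A minimal sketch of how a consumer might normalize that field and summarize engagement, assuming the profile structure above; parse_count and engagement_summary are hypothetical helper names and are not part of the skill:

```python
import re

# Multipliers for abbreviated counts such as "5.2M" (assumed suffix set).
_SUFFIXES = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(value):
    """Convert 450, "12,300" or "5.2M" into an integer."""
    if isinstance(value, (int, float)):
        return int(value)
    match = re.fullmatch(r"\s*([\d,]*\.?\d+)\s*([KMB]?)\s*", str(value), re.IGNORECASE)
    if not match:
        raise ValueError(f"unrecognized count: {value!r}")
    number = float(match.group(1).replace(",", ""))
    return round(number * _SUFFIXES[match.group(2).upper()])


def engagement_summary(profile):
    """Average interactions and views over original (non-retweet) tweets."""
    tweets = [t for t in profile.get("recent_tweets", []) if not t.get("is_retweet")]
    if not tweets:
        return {"tweets": 0, "avg_interactions": 0.0, "avg_views": 0.0}
    interactions = [t["likes"] + t["retweets"] + t["replies"] for t in tweets]
    views = [parse_count(t.get("views", 0)) for t in tweets]
    return {
        "tweets": len(tweets),
        "avg_interactions": sum(interactions) / len(tweets),
        "avg_views": sum(views) / len(tweets),
    }
```

Retweets are excluded here on the assumption that their counters belong to the original author; drop that filter if the distinction does not matter for your analysis.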

Queue File Structure

    {
      "location": "New York",
      "category": "tech",
      "total": 15,
      "usernames": ["user1", "user2", "..."],
      "completed": ["user1"],
      "failed": {"user3": "not_found"},
      "current_index": 2,
      "created_at": "2026-02-17T12:00:00",
      "source": "google_api"
    }
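As an illustration of how the completed and failed fields support resuming, the sketch below derives the usernames still to be scraped from a queue dictionary. It assumes only the fields shown above; pending_usernames and queue_progress are hypothetical helpers, and the skill itself may rely on current_index instead:

```python
def pending_usernames(queue):
    """Usernames that are neither completed nor recorded as failed."""
    done = set(queue.get("completed", [])) | set(queue.get("failed", {}))
    return [name for name in queue.get("usernames", []) if name not in done]


def queue_progress(queue):
    """Fraction of the queue that has been processed (completed or failed)."""
    total = len(queue.get("usernames", []))
    if total == 0:
        return 1.0
    return (total - len(pending_usernames(queue))) / total
```

Preserving the original order of usernames keeps a resumed run deterministic.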

Influencer Tiers

| Tier  | Followers Range  |
|-------|------------------|
| nano  | < 1,000          |
| micro | 1,000 - 10,000   |
| mid   | 10,000 - 100,000 |
| macro | 100,000 - 1M     |
| mega  | > 1,000,000      |
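The tier ranges share their endpoints (10,000 appears in both micro and mid), so the exact boundary handling is not specified. A minimal sketch of the classification, assuming each lower bound is inclusive and that mega starts strictly above 1,000,000 as the table states; influencer_tier is a hypothetical function name:

```python
def influencer_tier(followers):
    """Map a follower count to a tier name from the table above."""
    if followers < 0:
        raise ValueError("followers must be non-negative")
    if followers > 1_000_000:
        return "mega"
    if followers >= 100_000:
        return "macro"
    if followers >= 10_000:
        return "mid"
    if followers >= 1_000:
        return "micro"
    return "nano"
```

With this rule the sample profile (180,000,000 followers) lands in mega, matching its influencer_tier field.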

File Outputs

  • Queue files: data/queue/{location}_{category}_{timestamp}.json
  • Scraped data: data/output/{username}.json
  • Thumbnails: thumbnails/{username}/profile_*.jpg, thumbnails/{username}/tweet_media_*.jpg
  • Export files: data/export_{timestamp}.json, data/export_{timestamp}.csv

Configuration

Edit config/scraper_config.json:

    {
      "proxy": {
        "enabled": false,
        "provider": "brightdata",
        "country": "",
        "sticky": true,
        "sticky_ttl_minutes": 10
      },
      "google_search": {
        "enabled": true,
        "api_key": "",
        "search_engine_id": "",
        "queries_per_location": 3
      },
      "scraper": {
        "headless": false,
        "min_followers": 500,
        "max_tweets": 20,
        "download_thumbnails": true,
        "max_thumbnails": 6,
        "delay_between_profiles": [4, 8],
        "timeout": 60000
      },
      "cities": ["New York", "Los Angeles", "Miami", "Chicago"],
      "categories": ["tech", "politics", "sports", "entertainment", "news", "crypto"]
    }
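To show how the delay_between_profiles pair might be consumed, here is a small sketch that loads the config and draws a random pause between the two bounds. It assumes the pair means [minimum, maximum] in seconds; load_config and profile_delay are hypothetical names, not the skill's own functions:

```python
import json
import random
from pathlib import Path


def load_config(path="config/scraper_config.json"):
    """Read the scraper configuration file into a dictionary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def profile_delay(config, rng=random):
    """Pick a random pause (seconds) within delay_between_profiles."""
    low, high = config["scraper"]["delay_between_profiles"]
    if not 0 <= low <= high:
        raise ValueError("delay_between_profiles must be [min, max] with 0 <= min <= max")
    return rng.uniform(low, high)
```

Passing a seeded random.Random instance as rng makes the delays reproducible in tests.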

Filters Applied

The scraper automatically filters out:

  • ❌ Suspended or deactivated accounts
  • ❌ Protected (private) accounts
  • ❌ Profiles with < 500 followers (configurable)
  • ❌ Non-existent usernames
  • ❌ Already scraped entries (deduplication)

Anti-Detection

The scraper uses multiple anti-detection techniques:

  • Browser fingerprinting: 4 rotating fingerprint profiles (viewport, user agent, timezone, WebGL, etc.)
  • Stealth JavaScript: hides navigator.webdriver, spoofs plugins/languages/hardware, adds canvas noise, and injects a fake chrome object
  • Human behavior simulation: random delays, mouse movements, scrolling patterns
  • Network randomization: variable timing between requests
  • Login wall handling: automatically dismisses Twitter's login prompts and overlays

No Profiles Discovered

  • Check the Google API key and quota.
  • Verify the Search Engine ID is configured for x.com and twitter.com.
  • Try different location/category combinations.
  • If Google fails, the DuckDuckGo fallback is used automatically.

Rate Limiting

  • Reduce scraping speed (increase delays in config).
  • Run during off-peak hours.
  • Use a residential proxy (see below).

Login Wall Issues

  • The scraper automatically dismisses login prompts.
  • If content is blocked, try running with --headless disabled to debug visually.

Why Use a Residential Proxy?

Running a scraper at scale without a residential proxy will get your IP blocked fast. Here's why proxies are essential for long-running scrapes:

| Advantage | Description |
|-----------|-------------|
| Avoid IP Bans | Residential IPs look like real household users, not data-center bots. Twitter/X is far less likely to flag them. |
| Automatic IP Rotation | Each request (or session) gets a fresh IP, so rate-limits never stack up on one address. |
| Geo-Targeting | Route traffic through a specific country/city so scraped content matches the target audience's locale. |
| Sticky Sessions | Keep the same IP for a configurable window (e.g. 10 min), which is critical for maintaining a consistent browsing session. |
| Higher Success Rate | Rotating residential IPs deliver 95%+ success rates compared to ~30% with data-center proxies on Twitter/X. |
| Long-Running Scrapes | Scrape thousands of profiles over hours or days without interruption. |
| Concurrent Scraping | Run multiple browser instances across different IPs simultaneously. |

Recommended Proxy Providers

We have affiliate partnerships with top residential proxy providers. Using these links supports continued development of this skill:

| Provider | Best For | Sign Up |
|----------|----------|---------|
| Bright Data | World's largest network, 72M+ IPs, enterprise-grade | Get Bright Data |
| IProyal | Pay-as-you-go, 195+ countries, no traffic expiry | Get IProyal |
| Storm Proxies | Fast & reliable, developer-friendly API, competitive pricing | Get Storm Proxies |
| NetNut | ISP-grade network, 52M+ IPs, direct connectivity | Get NetNut |

Setup Steps

1. Get Your Proxy Credentials

Sign up with any provider above, then grab:

  • Username (from your provider dashboard)
  • Password (from your provider dashboard)
  • Host and Port are pre-configured per provider (or use custom)

2. Configure via Environment Variables

    export PROXY_ENABLED=true
    export PROXY_PROVIDER=brightdata   # brightdata | iproyal | stormproxies | netnut | custom
    export PROXY_USERNAME=your_user
    export PROXY_PASSWORD=your_pass
    export PROXY_COUNTRY=us            # optional: two-letter country code
    export PROXY_STICKY=true           # optional: keep same IP per session

3. Provider-Specific Host/Port Defaults

These are auto-configured when you set the provider name:

| Provider | Host | Port |
|----------|------|------|
| Bright Data | brd.superproxy.io | 22225 |
| IProyal | proxy.iproyal.com | 12321 |
| Storm Proxies | rotating.stormproxies.com | 9999 |
| NetNut | gw-resi.netnut.io | 5959 |

Override with PROXY_HOST / PROXY_PORT env vars if your plan uses a different gateway.

4. Custom Proxy Provider

For any other proxy service, set provider to custom and supply host/port manually:

    {
      "proxy": {
        "enabled": true,
        "provider": "custom",
        "host": "your.proxy.host",
        "port": 8080,
        "username": "user",
        "password": "pass"
      }
    }
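The way the environment variables combine with the per-provider defaults can be sketched as follows. This is an illustration under assumptions, not the skill's ProxyManager: proxy_settings_from_env is a hypothetical function, and the fallback values for unset variables (brightdata as provider, sticky enabled) are guesses that mirror the sample config:

```python
import os

# Host/port defaults taken from the provider table above.
PROVIDER_DEFAULTS = {
    "brightdata": ("brd.superproxy.io", 22225),
    "iproyal": ("proxy.iproyal.com", 12321),
    "stormproxies": ("rotating.stormproxies.com", 9999),
    "netnut": ("gw-resi.netnut.io", 5959),
}


def proxy_settings_from_env(env=None):
    """Resolve proxy settings from PROXY_* variables; None when disabled."""
    env = os.environ if env is None else env
    if env.get("PROXY_ENABLED", "false").lower() != "true":
        return None
    provider = env.get("PROXY_PROVIDER", "brightdata")
    default_host, default_port = PROVIDER_DEFAULTS.get(provider, (None, None))
    # PROXY_HOST / PROXY_PORT override the provider defaults when present.
    host = env.get("PROXY_HOST", default_host)
    port = env.get("PROXY_PORT", default_port)
    if host is None or port is None:
        raise ValueError(f"provider {provider!r} needs PROXY_HOST and PROXY_PORT")
    return {
        "provider": provider,
        "host": host,
        "port": int(port),
        "username": env.get("PROXY_USERNAME", ""),
        "password": env.get("PROXY_PASSWORD", ""),
        "country": env.get("PROXY_COUNTRY", ""),
        "sticky": env.get("PROXY_STICKY", "true").lower() == "true",
    }
```

Accepting an env mapping instead of reading os.environ directly keeps the function easy to test without touching the real environment.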

Running the Scraper with Proxy

Once configured, the scraper picks up the proxy automatically; no extra flags are needed:

    # Discover and scrape as usual; the proxy is applied automatically
    python main.py discover --location "Miami" --category "tech"
    python main.py scrape --username elonmusk

    # The log will confirm proxy is active:
    # INFO - Proxy enabled: <ProxyManager provider=brightdata enabled host=brd.superproxy.io:22225>
    # INFO - Browser using proxy: brightdata → brd.superproxy.io:22225

Using the Proxy Manager Programmatically

    from proxy_manager import ProxyManager

    # From config (auto-reads config/scraper_config.json)
    pm = ProxyManager.from_config()

    # From environment variables
    pm = ProxyManager.from_env()

    # Manual construction
    pm = ProxyManager(
        provider="brightdata",
        username="your_user",
        password="your_pass",
        country="us",
        sticky=True
    )

    # For Playwright browser context
    proxy = pm.get_playwright_proxy()
    # → {"server": "http://brd.superproxy.io:22225", "username": "user-country-us-session-abc123", "password": "pass"}

    # For requests / aiohttp
    proxies = pm.get_requests_proxy()
    # → {"http": "http://user:pass@host:port", "https": "http://user:pass@host:port"}

    # Force new IP (rotates session ID)
    pm.rotate_session()

    # Debug info
    print(pm.info())

Best Practices for Long-Running Scrapes

  • Use sticky sessions: Twitter requires consistent IPs during a browsing session. Set "sticky": true.
  • Target the right country: set "country": "us" (or your target region) so Twitter serves content in the expected locale.
  • Combine with existing anti-detection: this scraper already has fingerprinting, stealth scripts, and human behavior simulation. The proxy is the final layer.
  • Rotate sessions between batches: call pm.rotate_session() between large batches of profiles to get a fresh IP.
  • Use delays: even with proxies, respect delay_between_profiles in config (default 4-8s) to avoid aggressive patterns.
  • Monitor your proxy dashboard: all providers have dashboards showing bandwidth usage and success rates.
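The "rotate sessions between batches" advice could look roughly like the sketch below. The scrape_one and rotate callables are placeholders supplied by the caller (for example, a wrapper around the scraper and pm.rotate_session); the batch size of 25 is an arbitrary assumption, since the source does not say how large a batch should be:

```python
from itertools import islice


def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def scrape_in_batches(usernames, scrape_one, rotate, batch_size=25):
    """Scrape usernames batch by batch, rotating the proxy session in between."""
    results = {}
    for index, batch in enumerate(batched(usernames, batch_size)):
        if index > 0:
            rotate()  # fresh IP before every batch except the first
        for username in batch:
            results[username] = scrape_one(username)
    return results
```

Rotating only between batches keeps each batch on one sticky IP, in line with the first practice in the list.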

Notes

  • No login required: only scrapes publicly visible content.
  • Checkpoint/resume: queue files track progress; interrupted scrapes can be resumed with --resume.
  • Rate limiting: waits 60s on rate limit, stops on daily limit detection.
  • Twitter selectors: uses data-testid attributes (stable across UI changes) with fallbacks to aria-label and structural selectors.
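The rate-limiting note (wait 60s, stop on a daily limit) can be pictured with the sketch below. It is not the skill's code: the RateLimited and DailyLimitReached exceptions, the fetch callable, and the max_retries cap are all hypothetical, introduced only to make the behavior concrete:

```python
import time


class RateLimited(Exception):
    """Hypothetical signal that a temporary rate limit was hit."""


class DailyLimitReached(Exception):
    """Hypothetical signal that the daily limit was hit; never retried."""


def scrape_with_backoff(fetch, username, wait_seconds=60, max_retries=3, sleep=time.sleep):
    """Call fetch(username), pausing wait_seconds after each rate limit."""
    for attempt in range(max_retries + 1):
        try:
            return fetch(username)
        except RateLimited:
            if attempt == max_retries:
                raise  # give up after the final retry
            sleep(wait_seconds)
        # DailyLimitReached is not caught, so it stops the run immediately.
```

The sleep parameter is injectable so the wait can be replaced with a recorder in tests instead of pausing for a real minute.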

Category context

Code helpers, APIs, CLIs, browser automation, testing, and developer operations.

Source: Tencent SkillHub

Largest current source with strong distribution and engagement signals.

Package contents

Included in package
1 Docs
  • SKILL.md Primary doc