Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Discover and scrape public Facebook pages and groups by location and category with browser simulation and export data in JSON or CSV formats.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
Part of ScrapeClaw, a suite of production-ready, agentic social media scrapers for Instagram, YouTube, X/Twitter, and Facebook built with Python & Playwright, no API keys required. A browser-based Facebook page and group discovery and scraping tool.

---
name: facebook-scraper
description: Discover and scrape Facebook pages and public groups from your browser.
version: 1.0.0
author: influenza
tags:
  - facebook
  - scraping
  - social-media
  - page-discovery
  - group-discovery
  - business-pages
metadata:
  clawdbot:
    requires:
      bins:
        - python3
        - chromium
    config:
      stateDirs:
        - data/output
        - data/queue
        - thumbnails
      outputFormats:
        - json
        - csv
---
This skill provides a two-phase Facebook scraping system:
1. Page/Group Discovery
2. Browser Scraping
- Discover Facebook pages and groups by location and category
- Full browser simulation for accurate scraping
- Browser fingerprinting, human behavior simulation, and stealth scripts
- Page/group info, stats, images, and engagement data
- JSON/CSV export with downloaded thumbnails
- Resume interrupted scraping sessions
- Auto-skip private groups, low-like pages, empty profiles
- Supports pages, groups, and public profiles via --type flag

Getting Google API Credentials (Optional)
1. Go to Google Cloud Console
2. Create a new project or select an existing one
3. Enable "Custom Search API"
4. Create API credentials → API Key
5. Go to Programmable Search Engine
6. Create a search engine with facebook.com as the site to search
7. Copy the Search Engine ID
For OpenClaw agent integration, the skill provides JSON output:

    # Discover Facebook pages (returns JSON)
    discover --location "Miami" --category "restaurant" --type page --output json

    # Discover Facebook groups (returns JSON)
    discover --location "New York" --category "fitness" --type group --output json

    # Scrape single page (returns JSON)
    scrape --page-name examplebusiness --output json

    # Scrape single group (returns JSON)
    scrape --page-name examplegroup --type group --output json
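An agent that calls these commands from Python might wrap them as in the minimal sketch below. It uses only the flags shown above. The `python main.py` entry point is borrowed from the proxy usage example later in this document, and the assumption that stdout contains nothing but JSON is not confirmed by the skill docs. `build_discover_cmd` and `run_json` are hypothetical helper names.

```python
import json
import subprocess
import sys


def build_discover_cmd(location: str, category: str, entity_type: str = "page") -> list:
    """Build the argv for a discovery run using only the documented flags."""
    return [
        sys.executable, "main.py", "discover",  # entry point is an assumption
        "--location", location,
        "--category", category,
        "--type", entity_type,
        "--output", "json",
    ]


def run_json(cmd: list):
    """Run a command and parse its stdout as JSON (assumes stdout is pure JSON)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


cmd = build_discover_cmd("Miami", "restaurant")
print(" ".join(cmd[1:]))
```

Passing the arguments as a list avoids shell quoting problems with locations such as "New York".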
    {
      "page_name": "example_business",
      "display_name": "Example Business",
      "entity_type": "page",
      "category": "Restaurant",
      "subcategory": "Italian Restaurant",
      "about": "Family-owned Italian restaurant since 1985",
      "followers": 45000,
      "page_likes": 42000,
      "location": "Miami, FL",
      "address": "123 Main St, Miami, FL 33101",
      "phone": "+1-555-0123",
      "email": "info@example.com",
      "website": "https://example.com",
      "hours": "Mon-Sat 11AM-10PM",
      "is_verified": false,
      "page_tier": "mid",
      "profile_pic_local": "thumbnails/example_business/profile_abc123.jpg",
      "cover_photo_local": "thumbnails/example_business/cover_def456.jpg",
      "recent_posts": [
        {"post_url": "https://facebook.com/example_business/posts/123", "reactions": 320, "comments": 45, "shares": 12}
      ],
      "scrape_timestamp": "2026-02-20T14:30:00"
    }
    {
      "page_name": "example_group",
      "display_name": "Miami Fitness Community",
      "entity_type": "group",
      "about": "A community for fitness enthusiasts in Miami",
      "members": 15000,
      "privacy": "Public",
      "posts_per_day": 25,
      "location": "Miami",
      "page_tier": "mid",
      "profile_pic_local": "thumbnails/example_group/profile_abc123.jpg",
      "cover_photo_local": "thumbnails/example_group/cover_def456.jpg",
      "scrape_timestamp": "2026-02-20T14:30:00"
    }
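Page and group records carry different fields, so anything that turns them into one CSV has to reconcile the columns. The sketch below shows one possible way to do that. The skill's own CSV layout is not documented here, so the summary columns `recent_post_count` and `recent_reactions_total` and the helper names are hypothetical.

```python
import csv
import io


def flatten_record(record: dict) -> dict:
    """Keep scalar fields and summarize nested recent_posts into two columns."""
    row = {k: v for k, v in record.items() if not isinstance(v, (list, dict))}
    posts = record.get("recent_posts", [])
    row["recent_post_count"] = len(posts)
    row["recent_reactions_total"] = sum(p.get("reactions", 0) for p in posts)
    return row


def to_csv(records: list) -> str:
    """Write records to CSV text, using the union of all keys as the header."""
    rows = [flatten_record(r) for r in records]
    fieldnames = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


page = {"page_name": "example_business", "entity_type": "page", "page_likes": 42000,
        "recent_posts": [{"reactions": 320, "comments": 45, "shares": 12}]}
group = {"page_name": "example_group", "entity_type": "group", "members": 15000}
print(to_csv([page, group]))
```

Fields missing from a record (for example `members` on a page) are written as empty cells via `restval`.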
| Tier | Likes/Members Range |
|-------|---------------------|
| nano | < 1,000 |
| micro | 1,000 - 10,000 |
| mid | 10,000 - 100,000 |
| macro | 100,000 - 1M |
| mega | > 1,000,000 |
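The tier ranges can be expressed as a small lookup function. This is a sketch, not the skill's own code: the table's ranges overlap at 10,000 and 100,000, so the version below assumes each lower bound is inclusive, and it treats exactly 1,000,000 as macro because mega is listed as strictly greater.

```python
def page_tier(count: int) -> str:
    """Map a like or member count to a tier name (boundary handling is assumed)."""
    if count < 1_000:
        return "nano"
    if count < 10_000:
        return "micro"
    if count < 100_000:
        return "mid"
    if count <= 1_000_000:
        return "macro"
    return "mega"


# The sample records above (42,000 likes and 15,000 members) both fall in "mid".
print(page_tier(42_000), page_tier(15_000))
```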
- Queue files: data/queue/{location}_{category}_{type}_{timestamp}.json
- Scraped data: data/output/{page_name}.json
- Thumbnails: thumbnails/{page_name}/profile_*.jpg, thumbnails/{page_name}/cover_*.jpg
- Export files: data/export_{timestamp}.json, data/export_{timestamp}.csv
Edit config/scraper_config.json:

    {
      "google_search": {
        "enabled": true,
        "api_key": "",
        "search_engine_id": "",
        "queries_per_location": 3
      },
      "scraper": {
        "headless": false,
        "min_likes": 1000,
        "download_thumbnails": true,
        "max_thumbnails": 6
      },
      "cities": ["New York", "Los Angeles", "Miami", "Chicago"],
      "categories": ["restaurant", "retail", "fitness", "real-estate", "healthcare", "beauty"]
    }
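The sample config enables `google_search` while leaving both credentials blank. Whether the scraper tolerates that is not stated, so a quick pre-flight check like the sketch below can flag it before a run. `check_config` is a hypothetical helper, and the rules it applies are assumptions rather than the skill's own validation.

```python
import json


def check_config(config: dict) -> list:
    """Return human-readable problems found in a scraper config dict."""
    problems = []
    google = config.get("google_search", {})
    if google.get("enabled"):
        for key in ("api_key", "search_engine_id"):
            if not google.get(key):
                problems.append(f"google_search.{key} is empty")
    for key in ("cities", "categories"):
        if not config.get(key):
            problems.append(f"{key} is empty")
    return problems


sample = json.loads("""
{
  "google_search": {"enabled": true, "api_key": "", "search_engine_id": "",
                    "queries_per_location": 3},
  "scraper": {"headless": false, "min_likes": 1000,
              "download_thumbnails": true, "max_thumbnails": 6},
  "cities": ["New York", "Los Angeles", "Miami", "Chicago"],
  "categories": ["restaurant", "retail", "fitness"]
}
""")
for problem in check_config(sample):
    print(problem)
```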
The scraper automatically filters out:
- Private groups
- Pages with < 1,000 likes (configurable)
- Deactivated or removed pages
- Non-existent pages/groups
- Already scraped entries (deduplication)
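Some of these filters can be reasoned about from a scraped record alone. The sketch below illustrates the private-group, low-like, and deduplication rules using the field names from the sample output. It is not the skill's implementation: `skip_reason` is a hypothetical helper, applying `min_likes` only to pages is an assumption, and detecting deactivated or non-existent pages needs the live browser session, so it is left out.

```python
from typing import Optional


def skip_reason(record: dict, seen: set, min_likes: int = 1000) -> Optional[str]:
    """Return why a record would be skipped, or None if it should be kept."""
    name = record.get("page_name")
    if name in seen:
        return "already scraped"
    if record.get("entity_type") == "group":
        if record.get("privacy") != "Public":
            return "private group"
    elif record.get("page_likes", 0) < min_likes:
        return "below min_likes"
    return None


seen = {"example_business"}
candidates = [
    {"page_name": "example_business", "entity_type": "page", "page_likes": 42000},
    {"page_name": "tiny_shop", "entity_type": "page", "page_likes": 120},
    {"page_name": "closed_club", "entity_type": "group", "privacy": "Private"},
    {"page_name": "example_group", "entity_type": "group", "privacy": "Public"},
]
for record in candidates:
    print(record["page_name"], skip_reason(record, seen))
```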
Ensure credentials are correct. Handle verification codes when prompted. Wait if rate limited (the script will auto-retry).
Check Google API key and quota. Verify Search Engine ID is configured for facebook.com. Try different location/category combinations.
Reduce scraping speed (increase delays). Use multiple Facebook accounts. Run during off-peak hours. Use a residential proxy (see below).
Running a scraper at scale without a residential proxy will get your IP blocked fast. Here's why proxies are essential for long-running scrapes:

| Advantage | Description |
|-----------|-------------|
| Avoid IP Bans | Residential IPs look like real household users, not data-center bots. Facebook is far less likely to flag them. |
| Automatic IP Rotation | Each request (or session) gets a fresh IP, so rate-limits never stack up on one address. |
| Geo-Targeting | Route traffic through a specific country/city so scraped content matches the target audience's locale. |
| Sticky Sessions | Keep the same IP for a configurable window (e.g. 10 min), which is critical for maintaining a Facebook login session. |
| Higher Success Rate | Rotating residential IPs deliver 95%+ success rates compared to ~30% with data-center proxies on Facebook. |
| Long-Running Scrapes | Scrape thousands of pages/groups over hours or days without interruption. |
| Concurrent Scraping | Run multiple browser instances across different IPs simultaneously. |
We have affiliate partnerships with top residential proxy providers. Using these links supports continued development of this skill:

| Provider | Best For | Sign Up |
|----------|----------|---------|
| Bright Data | World's largest residential network, 72M+ IPs, enterprise-grade | Sign Up for Bright Data |
| IProyal | Premium residential pool, pay-as-you-go, 195+ countries | Sign Up for IProyal |
| Storm Proxies | Fast & reliable residential IPs, developer-friendly API | Sign Up for Storm Proxies |
| NetNut | ISP-grade residential network, 52M+ IPs, direct connectivity | Sign Up for NetNut |
1. Get Your Proxy Credentials

Sign up with any provider above, then grab:
- Username (from your provider dashboard)
- Password (from your provider dashboard)
- Host and Port are pre-configured per provider (or use custom)

2. Configure Entirely via Environment Variables

    export PROXY_ENABLED=true
    export PROXY_PROVIDER=netnut    # brightdata | iproyal | stormproxies | netnut | custom
    export PROXY_USERNAME=your_user
    export PROXY_PASSWORD=your_pass
    export PROXY_COUNTRY=us         # optional: two-letter country code
    export PROXY_STICKY=true        # optional: keep same IP per session

3. Provider-Specific Host/Port Defaults

These are auto-configured when you set the provider name:

| Provider | Host | Port |
|----------|------|------|
| Bright Data | brd.superproxy.io | 22225 |
| IProyal | proxy.iproyal.com | 12321 |
| Storm Proxies | rotating.stormproxies.com | 9999 |
| NetNut | gw-resi.netnut.io | 5959 |

Override with "host" and "port" in config or PROXY_HOST / PROXY_PORT env vars if your plan uses a different gateway.

4. Custom Proxy Provider

For any other proxy service, set provider to custom and supply host/port manually:

    {
      "proxy": {
        "enabled": true,
        "provider": "custom",
        "host": "your.proxy.host",
        "port": 8080,
        "username": "user",
        "password": "pass"
      }
    }
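The way provider defaults and overrides combine can be sketched as follows. This is an illustration of the resolution order described above, not the skill's actual `ProxyManager` logic, whose internals are not shown. The environment variable names and host/port defaults come from this document; `proxy_settings_from_env` and the shape of the returned dict are hypothetical.

```python
import os
from typing import Mapping, Optional

# Host/port defaults as listed in the provider table above.
PROVIDER_DEFAULTS = {
    "brightdata": ("brd.superproxy.io", 22225),
    "iproyal": ("proxy.iproyal.com", 12321),
    "stormproxies": ("rotating.stormproxies.com", 9999),
    "netnut": ("gw-resi.netnut.io", 5959),
}


def proxy_settings_from_env(env: Optional[Mapping] = None) -> Optional[dict]:
    """Resolve proxy settings from PROXY_* variables; None when disabled."""
    env = os.environ if env is None else env
    if env.get("PROXY_ENABLED", "").lower() != "true":
        return None
    provider = env.get("PROXY_PROVIDER", "custom")
    default_host, default_port = PROVIDER_DEFAULTS.get(provider, (None, None))
    host = env.get("PROXY_HOST") or default_host
    port = env.get("PROXY_PORT") or default_port
    if host is None or port is None:
        raise ValueError("custom provider needs PROXY_HOST and PROXY_PORT")
    return {
        "provider": provider,
        "host": host,
        "port": int(port),
        "username": env.get("PROXY_USERNAME", ""),
        "password": env.get("PROXY_PASSWORD", ""),
        "country": env.get("PROXY_COUNTRY"),
        "sticky": env.get("PROXY_STICKY", "").lower() == "true",
    }


example = proxy_settings_from_env({
    "PROXY_ENABLED": "true", "PROXY_PROVIDER": "netnut",
    "PROXY_USERNAME": "your_user", "PROXY_PASSWORD": "your_pass",
    "PROXY_COUNTRY": "us", "PROXY_STICKY": "true",
})
print(example["host"], example["port"])
```

Accepting a mapping instead of reading `os.environ` directly keeps the function easy to exercise without touching the real environment.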
Once configured, the scraper picks up the proxy automatically, with no extra flags needed:

    # Discover and scrape as usual; the proxy is applied automatically
    python main.py discover --location "Miami" --category "restaurant" --type page
    python main.py scrape --page-name examplebusiness

    # The log will confirm proxy is active:
    # INFO - Proxy enabled: <ProxyManager provider=netnut enabled host=gw-resi.netnut.io:5959>
    # INFO - Browser using proxy: netnut → gw-resi.netnut.io:5959
    from proxy_manager import ProxyManager

    # From config (auto-reads config/scraper_config.json)
    pm = ProxyManager.from_config()

    # From environment variables
    pm = ProxyManager.from_env()

    # Manual construction
    pm = ProxyManager(
        provider="netnut",
        username="your_user",
        password="your_pass",
        country="us",
        sticky=True
    )

    # For Playwright browser context
    proxy = pm.get_playwright_proxy()
    # → {"server": "http://gw-resi.netnut.io:5959", "username": "user-country-us-session-abc123", "password": "pass"}

    # For requests / aiohttp
    proxies = pm.get_requests_proxy()
    # → {"http": "http://user:pass@host:port", "https": "http://user:pass@host:port"}

    # Force new IP (rotates session ID)
    pm.rotate_session()

    # Debug info
    print(pm.info())
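The Playwright proxy dict shown in the comment above encodes the country and a session ID into the username. The sketch below reproduces that shape to make the idea of session rotation concrete. It only mirrors the `user-country-us-session-abc123` pattern from the example; real providers use differing username formats, and `session_username`, `new_session_id`, and `playwright_proxy` are hypothetical helpers rather than part of `proxy_manager`.

```python
import secrets
from typing import Optional


def new_session_id() -> str:
    """Generate a fresh random session ID; a new ID is assumed to mean a new IP."""
    return secrets.token_hex(4)


def session_username(username: str, country: Optional[str] = None,
                     session_id: Optional[str] = None) -> str:
    """Append country and session tags to the base username."""
    parts = [username]
    if country:
        parts += ["country", country]
    if session_id:
        parts += ["session", session_id]
    return "-".join(parts)


def playwright_proxy(host: str, port: int, username: str, password: str) -> dict:
    """Build a dict with the server/username/password keys Playwright accepts."""
    return {"server": f"http://{host}:{port}", "username": username, "password": password}


proxy = playwright_proxy(
    "gw-resi.netnut.io", 5959,
    session_username("user", country="us", session_id="abc123"),
    "pass",
)
print(proxy)
```

Rotating a session under this scheme amounts to rebuilding the username with a new ID from `new_session_id()` and opening a fresh browser context with the resulting dict.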
- Always use sticky sessions: Facebook requires consistent IPs during a login session. Set "sticky": true.
- Target the right country: Set "country": "us" (or your target region) so Facebook serves content in the expected locale.
- Combine with existing anti-detection: This scraper already has fingerprinting, stealth scripts, and human behavior simulation. The proxy is the final layer.
- Rotate sessions between accounts: Call pm.rotate_session() when switching Facebook accounts to get a fresh IP.
- Use delays: Even with proxies, respect delay_between_profiles in config (default 5-10s) to avoid aggressive patterns.
- Monitor your proxy dashboard: All providers (Bright Data, IProyal, Storm Proxies, NetNut) have dashboards showing bandwidth usage and success rates.
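The delay tip can be illustrated with a small loop that pauses a random 5 to 10 seconds between profiles. This is a sketch under assumptions: how the skill itself reads `delay_between_profiles` is not shown, and `profile_delay` and `scrape_all` are hypothetical names. The sleep function is injectable so the loop can be exercised without actually waiting.

```python
import random
import time


def profile_delay(low: float = 5.0, high: float = 10.0) -> float:
    """Pick a random pause length in seconds within the default 5-10s window."""
    return random.uniform(low, high)


def scrape_all(names, scrape_one, sleep=time.sleep):
    """Call scrape_one for each name, pausing between (not before) profiles."""
    results = []
    for index, name in enumerate(names):
        if index > 0:
            sleep(profile_delay())
        results.append(scrape_one(name))
    return results


# Dry run: record the pauses instead of sleeping.
pauses = []
done = scrape_all(["examplebusiness", "examplegroup"], str.upper, sleep=pauses.append)
print(done, len(pauses))
```

A randomized pause avoids the fixed-interval request pattern that a constant delay would produce.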