Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Advanced AI-powered search skill using SearXNG as the universal search backend. Multi-engine dork generation, 90+ search engines, intelligent search strategies…
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
Use search-intelligence-skill to give any AI agent the ability to search the entire internet like an expert OSINT analyst, SEO engineer, and security researcher combined. All searches flow through your SearXNG instance: zero API keys, full privacy, 90+ engines. The skill generates optimized dork queries, selects intelligent multi-step search strategies, translates operators across engines, routes queries to the best SearXNG engines, scores results by multi-signal relevance, and learns from results to refine searches automatically.
Install the package

```bash
# From source (recommended)
git clone https://github.com/mouaad-ops/search-intelligence-skill.git
cd search-intelligence-skill
pip install -e .

# Or direct pip
pip install search-intelligence-skill  # NOT yet working
```

Start a SearXNG instance (if you don't have one)

```bash
# Docker (quickest)
docker run -d \
  --name searxng \
  -p 8888:8080 \
  -e SEARXNG_SECRET=your-secret-key \
  searxng/searxng:latest

# Verify it's running
curl http://localhost:8888/healthz
```

Enable JSON API in SearXNG settings

```yaml
# In searxng/settings.yml, ensure search formats include json
search:
  formats:
    - html
    - json
```

Initialize in code

```python
from search_intelligence_skill import SearchSkill

# Default → localhost:8888
skill = SearchSkill()

# Custom instance
skill = SearchSkill(
    searxng_url="http://localhost:8888",
    timeout=30.0,
    max_retries=2,
    rate_limit=0.5,
    verify_ssl=True,
    auto_refine=True,
    max_refine_rounds=1,
)

# Verify connection
if skill.health_check():
    print("✓ SearXNG is reachable")
else:
    print("✗ Cannot reach SearXNG: check URL and port")
```
Natural language search (the main interface)

```python
from search_intelligence_skill import SearchSkill

skill = SearchSkill(searxng_url="http://localhost:8888")

# Just describe what you want; the skill handles everything:
# intent detection, dork generation, engine selection, scoring
report = skill.search("find exposed .env files on example.com")

# Print LLM-ready formatted output
print(report.to_context())

# Access structured results
for r in report.top(5):
    print(f"[{r.relevance:.1f}] {r.title}")
    print(f"    {r.url}")
    print(f"    {r.snippet[:200]}")
```

Control search depth

```python
from search_intelligence_skill import Depth

# Quick: 1-2 queries, single step, fast lookup
report = skill.search("what is CORS", depth="quick")

# Standard: 3-6 queries, multi-engine, good default
report = skill.search("python async frameworks comparison", depth="standard")

# Deep: 6-12 queries, multi-step strategies, thorough research
report = skill.search("security audit of target.com", depth="deep")

# Exhaustive: 12+ queries, full OSINT chains, complete sweep
report = skill.search("full recon on suspect-domain.com", depth="exhaustive")
```

Security scanning: exposed files and panels

```python
report = skill.search(
    "find exposed .env files, admin panels, and directory listings on example.com",
    depth="deep",
)

print(f"Intent: {report.intent.category.value}/{report.intent.subcategory}")
# → Intent: security/exposed_files
print(f"Strategy: {report.strategy.name}")
# → Strategy: multi_angle
print(f"Results: {len(report.results)}")

for r in report.top(10):
    print(f"  [{r.relevance:.1f}] {r.title} → {r.url}")
```

Security scanning: vulnerability research

```python
# CVE research
report = skill.search("CVE-2024-3094 xz backdoor exploit details", depth="deep")

# Technology-specific vulnerabilities
report = skill.search(
    "Apache Struts remote code execution vulnerabilities 2024",
    depth="standard",
)

# Exposed API endpoints
report = skill.search(
    "find exposed swagger API docs on target.com",
    depth="deep",
)

# Git repository exposure
report = skill.search(
    "exposed .git directories on example.com",
    depth="deep",
)
```

OSINT investigation: people

```python
# By name
report = skill.search(
    'OSINT investigation on "John Doe" - social media, email, profiles',
    depth="deep",
)

# By email
report = skill.search(
    "investigate john.doe@example.com - find all accounts and mentions",
    depth="exhaustive",
)

# By username
report = skill.search(
    "find all accounts for username @johndoe42",
    depth="deep",
)

# By phone number
report = skill.search(
    "lookup phone number +1-555-123-4567",
    depth="standard",
)
```

OSINT investigation: domains and companies

```python
# Domain reconnaissance
report = skill.search(
    "full domain recon on target.com - subdomains, DNS, certificates, technology stack",
    depth="exhaustive",
)

# Company investigation
report = skill.search(
    'investigate company "Acme Corp" - employees, filings, data breaches',
    depth="deep",
)

# IP address lookup
report = skill.search(
    "investigate IP 192.168.1.1 - open ports, services, abuse reports",
    depth="standard",
)
```

SEO analysis

```python
# Site indexation check
report = skill.search(
    "SEO indexation analysis of example.com",
    depth="standard",
)

# Backlink research
report = skill.search(
    "find backlinks pointing to example.com",
    depth="deep",
)

# Competitor analysis
report = skill.search(
    "SEO competitor analysis for example.com - related sites, ranking keywords",
    depth="deep",
)

# Technical SEO audit
report = skill.search(
    "technical SEO check on example.com - sitemap, robots.txt, canonical, hreflang",
    depth="deep",
)
```

Academic research

```python
# Find papers
report = skill.search(
    "latest research papers on transformer architecture scaling laws 2024",
    depth="standard",
)

# Find datasets
report = skill.search(
    "download dataset for sentiment analysis benchmark CSV",
    depth="standard",
)

# Find authors and their work
report = skill.search(
    'research publications by author "Yann LeCun" on deep learning',
    depth="deep",
)
```

Code and developer search

```python
# Find repositories
report = skill.search(
    "python library for PDF text extraction with OCR support",
    depth="standard",
)

# Find packages
report = skill.search(
    "npm package for real-time WebSocket pub/sub",
    depth="standard",
)

# Debug errors
report = skill.search(
    "RuntimeError: CUDA out of memory pytorch solution",
    depth="standard",
)

# Find documentation
report = skill.search(
    "FastAPI dependency injection documentation examples",
    depth="quick",
)
```

File hunting

```python
# Find specific file types
report = skill.search(
    "machine learning cheat sheet filetype:pdf",
    depth="standard",
)

# Find datasets
report = skill.search(
    "US census data 2023 download CSV",
    depth="standard",
)

# Find configuration files
report = skill.search(
    "docker-compose example microservices filetype:yaml",
    depth="standard",
)
```

News search

```python
# Recent news
report = skill.search(
    "latest news on AI regulation this week",
    depth="standard",
)

# Breaking news
report = skill.search(
    "breaking news today cybersecurity",
    depth="quick",
)

# News analysis
report = skill.search(
    "analysis of EU AI Act implications for startups",
    depth="standard",
)
```

Image and video search

```python
# Images
report = skill.search(
    "high resolution photos of Mars surface NASA",
    depth="standard",
)

# Videos
report = skill.search(
    "video tutorial on Kubernetes deployment strategies",
    depth="standard",
)
```

Social media search

```python
# Reddit discussions
report = skill.search(
    "reddit discussion about best self-hosted alternatives to Google Photos",
    depth="standard",
)

# Forum threads
report = skill.search(
    "forum thread comparing Proxmox vs ESXi for home lab",
    depth="standard",
)
```

Direct dork query (no intent parsing)

```python
# Execute a raw dork you've written yourself
report = skill.search_dork(
    'site:github.com "API_KEY" filetype:env',
    engines=["google", "bing"],
)
print(report.to_context())
```

Preview queries without executing them

```python
# See what dork queries would be generated
dorks = skill.suggest_queries(
    "find SQL injection vulnerabilities on target.com"
)
for d in dorks:
    print(f"  Query: {d.query}")
    print(f"  Operators: {d.operators_used}")
    print(f"  Purpose: {d.purpose}")
    print()
```

Build a custom dork from parameters

```python
dork = skill.build_dork(
    keyword="confidential",
    domain="example.com",
    filetype="pdf",
    intitle="report",
    exclude=["public", "template"],
    exact_match=True,
)
print(f"Generated: {dork.query}")
# → site:example.com filetype:pdf intitle:"report" -public -template "confidential"

# Execute it
report = skill.search_dork(dork.query)
```

Execute a named strategy against a target

```python
# Full OSINT chain
report = skill.execute_strategy(
    strategy_name="osint_chain",
    target="suspect-domain.com",
    depth="exhaustive",
)

# Deep security dive
report = skill.execute_strategy(
    strategy_name="deep_dive",
    target="target.com",
    depth="deep",
)

# File hunting
report = skill.execute_strategy(
    strategy_name="file_hunt",
    target="example.com",
    depth="deep",
)

# Temporal trend analysis
report = skill.execute_strategy(
    strategy_name="temporal",
    target="AI regulation news",
    depth="deep",
)
```

Batch search: multiple queries at once

```python
queries = [
    "python FastAPI vs Flask performance",
    "rust web frameworks comparison 2024",
    "go gin framework documentation",
]
reports = skill.search_batch(queries, depth="quick")

for report in reports:
    print(f"Query: {report.query}")
    print(f"  Results: {len(report.results)}")
    print(f"  Best: {report.top(1)[0].title if report.results else 'None'}")
    print()
```

Override engine and category selection

```python
# Force specific engines
report = skill.search(
    "quantum computing breakthroughs",
    engines=["google_scholar", "arxiv", "semantic_scholar"],
)

# Force specific categories
report = skill.search(
    "kubernetes tutorial",
    categories=["it", "general"],
)

# Force time range
report = skill.search(
    "zero-day vulnerabilities",
    time_range="week",
)

# Force language
report = skill.search(
    "machine learning tutorials",
    language="en",
)
```

Working with the SearchReport object

```python
report = skill.search("advanced persistent threats 2024", depth="standard")

# LLM-ready text (for injecting into AI agent context)
context = report.to_context(max_results=20)

# Top N results sorted by relevance
top5 = report.top(5)

# Full result list
all_results = report.results

# What was detected
print(f"Intent: {report.intent.category.value}")      # e.g. "security"
print(f"Subcategory: {report.intent.subcategory}")    # e.g. "general"
print(f"Entities: {report.intent.entities}")          # e.g. {"year": "2024"}
print(f"Keywords: {report.intent.keywords}")          # e.g. ["advanced", "persistent", "threats"]
print(f"Confidence: {report.intent.confidence:.0%}")  # e.g. "80%"

# What strategy ran
print(f"Strategy: {report.strategy.name}")     # e.g. "multi_angle"
print(f"Steps: {len(report.strategy.steps)}")  # e.g. 2

# Performance metrics
print(f"Total found: {report.total_found}")     # before dedup
print(f"Final results: {len(report.results)}")  # after dedup+scoring
print(f"Time: {report.timing_seconds:.2f}s")
print(f"Engines used: {report.engines_used}")

# Suggested refinements
print(f"Suggestions: {report.suggestions}")

# Errors (if any)
print(f"Errors: {report.errors}")
```

Working with individual SearchResult objects

```python
for r in report.top(10):
    print(f"Title: {r.title}")
    print(f"URL: {r.url}")
    print(f"Snippet: {r.snippet[:300]}")
    print(f"Relevance: {r.relevance:.2f} / 10.0")
    print(f"Engines: {r.engines}")      # which SearXNG engines returned this
    print(f"Score: {r.score}")          # raw SearXNG score
    print(f"Category: {r.category}")    # SearXNG result category
    print(f"Positions: {r.positions}")  # rank positions across engines
    print(f"Metadata: {r.metadata}")    # publishedDate, thumbnail, etc.
    print()
```
Basic tool handler

```python
from search_intelligence_skill import SearchSkill

skill = SearchSkill(searxng_url="http://localhost:8888")

def handle_search_tool(user_query: str) -> str:
    """Called by the AI agent when it needs to search the web."""
    report = skill.search(user_query, depth="standard")
    return report.to_context()
```

With depth control from agent

```python
def handle_search_tool(user_query: str, depth: str = "standard") -> str:
    report = skill.search(user_query, depth=depth)
    return report.to_context()
```

Returning structured data to agent

```python
def handle_search_tool(user_query: str) -> dict:
    report = skill.search(user_query, depth="standard")
    return {
        "query": report.query,
        "intent": f"{report.intent.category.value}/{report.intent.subcategory}",
        "confidence": report.intent.confidence,
        "result_count": len(report.results),
        "results": [
            {
                "title": r.title,
                "url": r.url,
                "snippet": r.snippet[:500],
                "relevance": round(r.relevance, 2),
                "engines": r.engines,
            }
            for r in report.top(10)
        ],
        "suggestions": report.suggestions,
        "engines_used": report.engines_used,
        "time_seconds": round(report.timing_seconds, 2),
    }
```

OpenAI function calling / tool definition

```python
search_tool_schema = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": (
            "Search the internet using advanced dork queries and multi-engine strategies. "
            "Supports security scanning, OSINT, SEO analysis, academic research, "
            "code search, file hunting, and general web search. "
            "Describe what you want to find in natural language."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Natural language search query describing what to find",
                },
                "depth": {
                    "type": "string",
                    "enum": ["quick", "standard", "deep", "exhaustive"],
                    "description": "Search thoroughness: quick (1-2 queries), standard (3-6), deep (6-12), exhaustive (12+)",
                    "default": "standard",
                },
            },
            "required": ["query"],
        },
    },
}
```

LangChain tool wrapper

```python
from langchain.tools import Tool
from search_intelligence_skill import SearchSkill

skill = SearchSkill(searxng_url="http://localhost:8888")

search_tool = Tool(
    name="web_search",
    description=(
        "Advanced web search with dork generation and multi-engine strategies. "
        "Input a natural language query. Supports security, OSINT, SEO, academic, "
        "code, file, and general searches."
    ),
    func=lambda q: skill.search(q, depth="standard").to_context(),
)
```

Context manager for clean resource handling

```python
with SearchSkill(searxng_url="http://localhost:8888") as skill:
    report = skill.search("find open redirects on example.com")
    print(report.to_context())
# HTTP client is automatically closed
```
IntentParser: analyze queries without searching

```python
from search_intelligence_skill import IntentParser

parser = IntentParser()
intent = parser.parse("find exposed .env files on example.com")

print(f"Category: {intent.category.value}")    # security
print(f"Subcategory: {intent.subcategory}")    # exposed_files
print(f"Entities: {intent.entities}")          # {"domain": "example.com"}
print(f"Keywords: {intent.keywords}")          # ["exposed", "env", "files"]
print(f"Depth: {intent.depth.value}")          # standard
print(f"Time range: {intent.time_range}")      # ""
print(f"Confidence: {intent.confidence:.0%}")  # 95%
print(f"Constraints: {intent.constraints}")    # {}
```

DorkGenerator: generate queries without searching

```python
from search_intelligence_skill import DorkGenerator, IntentParser

parser = IntentParser()
gen = DorkGenerator()

intent = parser.parse("OSINT investigation on john@example.com")
dorks = gen.generate(intent)

for d in dorks:
    print(f"  [{', '.join(d.operators_used)}] {d.query}")
    print(f"  Purpose: {d.purpose}")

# Build a custom dork manually
custom = gen.generate_custom(
    keyword="secret",
    domain="example.com",
    filetype="env",
    intitle="config",
    exclude=["test", "demo"],
    exact_match=True,
)
print(f"Custom: {custom.query}")

# Translate a Google dork to Yandex syntax
yandex_dork = gen.translate(custom, target_engine="yandex")
print(f"Yandex: {yandex_dork.query}")

# Translate to Bing
bing_dork = gen.translate(custom, target_engine="bing")
print(f"Bing: {bing_dork.query}")
```

ResultAnalyzer: score and analyze results

```python
from search_intelligence_skill import ResultAnalyzer, IntentParser, SearXNGClient

client = SearXNGClient(base_url="http://localhost:8888")
parser = IntentParser()
analyzer = ResultAnalyzer()

intent = parser.parse("python web frameworks comparison")
raw = client.search("python web frameworks comparison", engines=["google", "bing"])
results = client.parse_results(raw)

# Full analysis pipeline: deduplicate → score → sort
analyzed = analyzer.analyze(results, intent)
for r in analyzed[:5]:
    print(f"[{r.relevance:.2f}] {r.title}")

# Generate refinement suggestions
suggestions = analyzer.generate_refinements(analyzed, intent)
print(f"Suggestions: {suggestions}")

# Get a text summary
summary = analyzer.summarize(analyzed, intent)
print(summary)

client.close()
```

SearXNGClient: direct API access

```python
from search_intelligence_skill import SearXNGClient

client = SearXNGClient(base_url="http://localhost:8888")

# Single search
raw = client.search(
    query='site:github.com "fastapi" filetype:py',
    engines=["google", "bing", "duckduckgo"],
    categories=["general"],
    time_range="month",
    language="en",
    pageno=1,
    safesearch=0,
)

# Parse results into SearchResult objects
results = client.parse_results(raw)

# Get SearXNG suggestions
suggestions = client.get_suggestions(raw)

# Get spelling corrections
corrections = client.get_corrections(raw)

# See which engines failed
unresponsive = client.get_unresponsive(raw)

# Batch search
responses = client.search_batch(
    queries=["query 1", "query 2", "query 3"],
    engines=["google"],
)

# Health check
if client.health_check():
    print("SearXNG is online")

client.close()
```
Search Depths

```python
from search_intelligence_skill import Depth

Depth.QUICK       # 1-2 queries, single step, fast lookups
Depth.STANDARD    # 3-6 queries, multi-engine, general searching
Depth.DEEP        # 6-12 queries, multi-step, thorough research
Depth.EXHAUSTIVE  # 12+ queries, full sweep, complete investigations
```

Intent Categories (auto-detected)

```python
from search_intelligence_skill import IntentCategory

IntentCategory.GENERAL   # General web search
IntentCategory.SECURITY  # Vulnerabilities, exposed files, pentesting
IntentCategory.SEO       # Indexation, backlinks, competitors, technical SEO
IntentCategory.OSINT     # People, emails, usernames, domains, companies
IntentCategory.ACADEMIC  # Papers, datasets, authors, journals
IntentCategory.CODE      # Repositories, packages, docs, bugs
IntentCategory.FILES     # Documents, data files, archives, media
IntentCategory.NEWS      # Breaking news, analysis, trends
IntentCategory.IMAGES    # Image search
IntentCategory.VIDEOS    # Video search
IntentCategory.SOCIAL    # Reddit, forums, discussions
IntentCategory.SHOPPING  # Products, prices, comparisons
IntentCategory.LEGAL     # Law, regulations, patents
IntentCategory.MEDICAL   # Health, diseases, clinical research
```

Search Strategies (auto-selected by depth + intent)

```python
# Strategies are selected automatically, but you can also invoke them directly:
skill.execute_strategy("quick", target="example.com")            # 1 step, top engines
skill.execute_strategy("broad_to_narrow", target="example.com")  # Wide then focused
skill.execute_strategy("multi_angle", target="example.com")      # Same topic, different formulations
skill.execute_strategy("deep_dive", target="example.com")        # Exhaustive dork coverage
skill.execute_strategy("osint_chain", target="example.com")      # Progressive recon
skill.execute_strategy("verify", target="some claim")            # Cross-reference sources
skill.execute_strategy("file_hunt", target="example.com")        # Targeted file search
skill.execute_strategy("temporal", target="AI news")             # Across time periods
```

Supported SearXNG Engines (90+)

```python
# General:  google, bing, duckduckgo, brave, qwant, startpage, mojeek,
#           yandex, yahoo, presearch, wiby, stract, yep, baidu, naver ...
#
# IT/Dev:   github, stackoverflow, gitlab, npm, pypi, dockerhub,
#           arch_linux_wiki, crates_io, packagist, pkg_go_dev ...
#
# Science:  arxiv, google_scholar, semantic_scholar, crossref, pubmed,
#           base, openalex, core, wolfram_alpha ...
#
# News:     google_news, bing_news, yahoo_news, brave_news, wikinews ...
#
# Social:   reddit, lemmy, mastodon, hacker_news, lobsters ...
#
# Images:   google_images, bing_images, flickr, unsplash, openverse ...
#
# Videos:   youtube, google_videos, dailymotion, vimeo, piped, odysee ...
#
# Files:    piratebay, 1337x, annas_archive, z_library ...
#
# Music:    bandcamp, genius, soundcloud, youtube_music ...
#
# Maps:     openstreetmap, photon ...
#
# Wikis:    wikipedia, wikidata, wikimedia_commons ...
```
Dork Operators (auto-translated across engines)

```python
# Google operators:
#   site: filetype: intitle: allintitle: inurl: allinurl:
#   intext: allintext: inanchor: cache: related: info: define:
#   before: after: AROUND(N) "exact" -exclude OR * N..M
#
# Bing operators:
#   site: filetype: intitle: inurl: inbody: contains: ip:
#   language: loc: prefer: feed: "exact" -exclude OR NEAR:N
#
# DuckDuckGo operators:
#   site: filetype: intitle: inurl: "exact" -exclude OR
#
# Yandex operators:
#   site: mime: title: inurl: host: domain: lang: date:
#   "exact" -exclude |
#
# Brave operators:
#   site: filetype: intitle: inurl: "exact" -exclude OR
#
# The skill auto-translates between engines:
#   filetype: → mime:   (Yandex)
#   intitle:  → title:  (Yandex)
#   intext:   → inbody: (Bing)
```
Security dorks available (by subcategory)
- exposed_files → .env, .log, .sql, .bak, .conf, .pem, .key, .json
- directory_listing → "index of", "directory listing", "parent directory"
- admin_panels → /admin, /login, /dashboard, wp-admin, phpmyadmin, cpanel
- sensitive_data → passwords, RSA keys, AWS keys, database URLs, SMTP creds
- exposed_apis → /api/, swagger, api-docs, graphql, openapi
- subdomains → site:*.domain, external references, inurl:domain
- git_exposed → .git, .git/config, .svn, .hg
- technology_stack → "powered by", wp-content, X-Powered-By
- general → CVE, exploit, PoC, security advisory

OSINT dorks available (by subcategory)
- person → LinkedIn, Twitter/X, Facebook, Instagram, GitHub, Medium, resume, CV
- email → email mentions, cross-site, leaks, LinkedIn, GitHub
- username → GitHub, Reddit, Twitter, Instagram, YouTube, Keybase, StackOverflow
- domain → site:, subdomains, whois, Shodan, DNS, SSL, Censys, crt.sh
- company → LinkedIn company, Crunchbase, Glassdoor, SEC filings, employees
- phone → whitepages, truecaller, Facebook, name/address
- ip → Shodan, abuse/blacklist, open ports, whois

SEO dorks available (by subcategory)
- indexation → site:, sitemap, blog, tag/category pages
- backlinks → external mentions, anchor text, link:
- competitors → related:, same-keyword competitors
- content_audit → intitle/inurl/intext keyword matching
- technical_seo → sitemap XML, robots.txt, noindex, canonical, hreflang, schema

Academic dorks available (by subcategory)
- papers → arxiv, ResearchGate, academia.edu, DOI, .edu PDFs
- datasets → CSV, JSON, Kaggle, HuggingFace, Zenodo
- authors → Google Scholar, ORCID, ResearchGate, publication lists

Code dorks available (by subcategory)
- repositories → GitHub, GitLab, Bitbucket, Codeberg, Sourcehut
- packages → npm, PyPI, crates.io, RubyGems, Packagist, pkg.go.dev
- documentation → ReadTheDocs, README, API references
- issues_bugs → GitHub issues, StackOverflow errors
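To see which of these subcategory templates a given query actually triggers, skill.suggest_queries() (introduced earlier) previews the generated dorks without running them. A minimal sketch; the exact queries returned depend on the installed template set:

```python
from search_intelligence_skill import SearchSkill

skill = SearchSkill(searxng_url="http://localhost:8888")

# Preview the security dorks generated for an exposed_files-style query;
# output varies with the installed dork templates.
for d in skill.suggest_queries("find exposed .env files on example.com"):
    print(f"{d.query}  (purpose: {d.purpose})")
```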
Cross-engine dork translation

```python
from search_intelligence_skill import DorkGenerator

gen = DorkGenerator()

# Build a Google dork
dork = gen.generate_custom(
    keyword="secret",
    domain="example.com",
    filetype="env",
    intitle="config",
)
print(f"Google: {dork.query}")
# → site:example.com filetype:env intitle:"config" secret

# Translate to Yandex (filetype → mime, intitle → title)
yandex = gen.translate(dork, "yandex")
print(f"Yandex: {yandex.query}")
# → site:example.com mime:env title:"config" secret

# Translate to Bing
bing = gen.translate(dork, "bing")
print(f"Bing: {bing.query}")

# Translate to DuckDuckGo (drops unsupported operators)
ddg = gen.translate(dork, "duckduckgo")
print(f"DDG: {ddg.query}")

# Translate to an engine without operator support (strips all operators)
plain = gen.translate(dork, "wikipedia")
print(f"Plain: {plain.query}")
```

Result scoring details

```python
# Each result is scored on 7 signals (0-10 scale):
#
# 1. SearXNG base score (normalized)        → weight: 2.0
# 2. Keyword match in title + snippet       → weight: 3.0
# 3. Multi-engine agreement (appeared in N) → weight: 0.5/engine, max 2.0
# 4. Position rank (lower = better)         → weight: 1.5
# 5. Source credibility (.gov +1.5, .edu +1.4, arxiv +1.4, etc.)
# 6. Content quality (snippet length, HTTPS, URL sanity)
# 7. Intent-specific boost (arxiv for academic, github for code, etc.)
#
# Credibility penalties: spam (-0.7), "click here" (-0.5), "free download" (-0.4)
# (A toy sketch combining signals like these follows the logging section below.)
```

Auto-refinement behavior

```python
# When auto_refine=True (default) and results < 5:
#   1. Analyzer generates refined queries (broader, different keywords)
#   2. Skill executes up to 3 refinement queries
#   3. New results are merged with originals
#   4. Full dedup + re-scoring runs
#   5. Process repeats up to max_refine_rounds

skill = SearchSkill(
    searxng_url="http://localhost:8888",
    auto_refine=True,
    max_refine_rounds=2,  # Try refining up to 2 times
)

# Disable auto-refinement for speed-critical paths
skill_fast = SearchSkill(
    searxng_url="http://localhost:8888",
    auto_refine=False,
)
```

Entity extraction capabilities

```python
from search_intelligence_skill import IntentParser

parser = IntentParser()

# Domains
intent = parser.parse("scan example.com for vulnerabilities")
# entities: {"domain": "example.com"}

# Emails
intent = parser.parse("investigate user@company.com")
# entities: {"email": "user@company.com", "email_domain": "company.com"}

# IPs
intent = parser.parse("lookup 192.168.1.1")
# entities: {"ip": "192.168.1.1"}

# CVEs
intent = parser.parse("details on CVE-2024-3094")
# entities: {"cve": "CVE-2024-3094"}

# Phone numbers
intent = parser.parse("find owner of +1-555-123-4567")
# entities: {"phone": "+1-555-123-4567"}

# Usernames
intent = parser.parse("find accounts for @johndoe42")
# entities: {"username": "johndoe42"}

# Names (quoted)
intent = parser.parse('investigate "John Smith"')
# entities: {"name": "John Smith"}

# Names (capitalized pattern)
intent = parser.parse("find information about Jane Doe")
# entities: {"name": "Jane Doe"}

# File types
intent = parser.parse("find documents filetype:pdf")
# entities: {"filetype": "pdf"}

# Years
intent = parser.parse("research papers from 2024")
# entities: {"year": "2024"}

# Multiple entities combined
intent = parser.parse('CVE-2024-3094 on example.com "John Doe"')
# entities: {"cve": "CVE-2024-3094", "domain": "example.com", "name": "John Doe"}
```

Time range detection

```python
from search_intelligence_skill import IntentParser

parser = IntentParser()

parser.parse("news today").time_range                # "day"
parser.parse("what happened this week").time_range   # "week"
parser.parse("articles from last month").time_range  # "month"
parser.parse("publications this year").time_range    # "year"
parser.parse("latest updates on AI").time_range      # "month" (heuristic)
parser.parse("history of computing").time_range      # "" (no time constraint)
```

Constraint extraction

```python
from search_intelligence_skill import IntentParser

parser = IntentParser()

# Language constraints
intent = parser.parse("machine learning tutorials in spanish")
# constraints: {"language": "es"}

# Exhaustive hints
intent = parser.parse("find everything about this vulnerability")
# constraints: {"exhaustive": True}

# Result limits
intent = parser.parse("top 20 python frameworks")
# constraints: {"limit": 20}

# Exclusion hints
intent = parser.parse("web frameworks except Django without Flask")
# constraints: {"exclude": ["django", "flask"]}
```

Pagination

```python
from search_intelligence_skill import SearXNGClient

client = SearXNGClient(base_url="http://localhost:8888")

# Fetch multiple pages
all_results = []
for page in range(1, 4):
    raw = client.search("python frameworks", pageno=page)
    results = client.parse_results(raw)
    all_results.extend(results)
    if not results:
        break

print(f"Total across 3 pages: {len(all_results)}")
client.close()
```

Rate limiting and retries

```python
# Built-in rate limiting between requests
skill = SearchSkill(
    searxng_url="http://localhost:8888",
    rate_limit=1.0,  # 1 second minimum between requests
    max_retries=3,   # Retry failed requests up to 3 times
    timeout=30.0,    # 30 second timeout per request
)

# Rate limiting is automatic; no manual sleep() needed
# Retries use increasing delays on 429 (Too Many Requests)
```

Logging for debugging

```python
import logging

# See everything the skill does
logging.basicConfig(level=logging.DEBUG)

# Or just info-level
logging.basicConfig(level=logging.INFO)

skill = SearchSkill(searxng_url="http://localhost:8888")
report = skill.search("test query", depth="standard")

# Logs will show:
#   INFO  - Intent: security/exposed_files (confidence=0.95), entities: {"domain": "..."}
#   INFO  - Strategy: multi_angle, 2 steps
#   DEBUG - Executing step 1: Search angle 1
#   DEBUG - Search 'site:... filetype:env' returned 12 results
#   DEBUG - Executing step 2: Search angle 2
#   INFO  - Search complete: 23 results, 4.21s, 4 engines
```
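To make the scoring signals listed under "Result scoring details" concrete, here is a toy weighted-sum scorer. It is an illustration only, assuming the documented weights; it is not the library's actual ResultAnalyzer implementation.

```python
# Toy illustration of multi-signal relevance scoring (NOT the package's
# real code). Signal names and weights follow the documentation above.
def toy_relevance(base_score: float, keyword_overlap: float,
                  engine_count: int, best_position: int,
                  credibility_bonus: float) -> float:
    """Combine scoring signals into a rough 0-10 relevance value."""
    score = 0.0
    score += min(base_score, 1.0) * 2.0                # normalized SearXNG score (weight 2.0)
    score += keyword_overlap * 3.0                     # keyword match in title+snippet (weight 3.0)
    score += min(engine_count * 0.5, 2.0)              # multi-engine agreement (0.5/engine, max 2.0)
    score += max(0.0, 1.0 - best_position / 20) * 1.5  # earlier rank positions score higher (weight 1.5)
    score += credibility_bonus                         # e.g. +1.5 for .gov, -0.7 for spam patterns
    return max(0.0, min(score, 10.0))

# A result returned by 3 engines, ranked 2nd, with strong keyword overlap:
print(toy_relevance(0.8, 0.9, engine_count=3, best_position=2,
                    credibility_bonus=1.4))  # ≈ 8.55
```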
| Method | Purpose | Returns |
|---|---|---|
| `skill.search(query, depth, ...)` | Full intelligent search pipeline | `SearchReport` |
| `skill.search_dork(dork, ...)` | Execute raw dork query directly | `SearchReport` |
| `skill.suggest_queries(query)` | Preview dorks without executing | `list[DorkQuery]` |
| `skill.build_dork(keyword, ...)` | Build custom dork from parameters | `DorkQuery` |
| `skill.execute_strategy(name, target)` | Run named strategy against target | `SearchReport` |
| `skill.search_batch(queries, ...)` | Execute multiple searches | `list[SearchReport]` |
| `skill.health_check()` | Check SearXNG connectivity | `bool` |
| `skill.close()` | Close HTTP client | `None` |
| Property | Type | Description |
|---|---|---|
| `.query` | `str` | Original natural language query |
| `.intent` | `SearchIntent` | Parsed intent with category, entities, keywords |
| `.strategy` | `SearchStrategy` | Strategy that was used (name, steps) |
| `.results` | `list[SearchResult]` | Scored and deduplicated results |
| `.total_found` | `int` | Total results before deduplication |
| `.suggestions` | `list[str]` | Refinement suggestions |
| `.refined_queries` | `list[str]` | Auto-refinement queries used |
| `.errors` | `list[str]` | Errors encountered during search |
| `.timing_seconds` | `float` | Total wall-clock time |
| `.engines_used` | `list[str]` | Engines that returned results |
| `.to_context(max_results)` | `str` | LLM-formatted text output |
| `.top(n)` | `list[SearchResult]` | Top N by relevance score |
| Property | Type | Description |
|---|---|---|
| `.title` | `str` | Result title |
| `.url` | `str` | Result URL |
| `.snippet` | `str` | Content snippet / description |
| `.engines` | `list[str]` | Which SearXNG engines returned it |
| `.score` | `float` | Raw SearXNG score |
| `.relevance` | `float` | Computed multi-signal relevance (0-10) |
| `.category` | `str` | SearXNG result category |
| `.positions` | `list[int]` | Rank positions across engines |
| `.metadata` | `dict` | Extra fields: publishedDate, thumbnail, img_src |
SearXNG not reachable

```bash
# Check the instance is running
curl http://localhost:8888/healthz

# Check JSON API is enabled
curl "http://localhost:8888/search?q=test&format=json"

# Common fixes:
# 1. Ensure port mapping is correct (docker: -p 8888:8080)
# 2. Ensure search.formats includes "json" in settings.yml
# 3. Check firewall rules
```

```python
if not skill.health_check():
    print("SearXNG unreachable: check URL, port, and settings")
```

No results returned

```python
report = skill.search("very specific obscure query")

if not report.results:
    print("No results. Try:")
    print("  1. Broader keywords")
    print("  2. Different depth: depth='deep'")
    print("  3. Check suggestions:", report.suggestions)
    print("  4. Check errors:", report.errors)
    print("  5. Try different engines:", report.engines_used)

# Manual broader search
report2 = skill.search("broader version of query", depth="deep")
```

Timeout errors

```python
# Increase timeout for complex queries
skill = SearchSkill(
    searxng_url="http://localhost:8888",
    timeout=60.0,   # 60 seconds
    max_retries=3,  # More retries
)
```

Rate limiting (429 errors)

```python
# Increase delay between requests
skill = SearchSkill(
    searxng_url="http://localhost:8888",
    rate_limit=2.0,  # 2 seconds between requests
)
```

SSL errors (local development only)

```python
skill = SearchSkill(
    searxng_url="https://localhost:8888",
    verify_ssl=False,  # ONLY for local dev, never in production
)
```

Wrong intent detected

```python
# If the auto-detection picks the wrong category, use a direct dork:
report = skill.search_dork(
    'site:example.com filetype:pdf "annual report"',
    engines=["google", "bing"],
)

# Or force engines/categories:
report = skill.search(
    "some ambiguous query",
    engines=["google_scholar", "arxiv"],
    categories=["science"],
)
```

Memory usage with large result sets

```python
# Limit results to control memory
report = skill.search("broad query", depth="exhaustive", max_results=50)

# Process results in a streaming fashion
for r in report.results:
    process(r)  # your own handler, one result at a time
```
```
User Query
    │
    ▼
┌─────────────────┐
│  IntentParser   │──→ category, subcategory, entities, keywords
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  DorkGenerator  │──→ 5-20 optimized dork queries with operators
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ StrategyPlanner │──→ multi-step plan (which dorks, which engines, what order)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  SearXNGClient  │──→ executes queries against your instance (retries, rate limit)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ ResultAnalyzer  │──→ dedup, score, rank, credibility check
└────────┬────────┘
         │ (if results poor)
         ▼
┌─────────────────┐
│   Auto-Refine   │──→ generate new queries, re-search, re-analyze
└────────┬────────┘
         │
         ▼
    SearchReport
      .to_context() → LLM-ready text
      .top(n)       → best results
      .results      → full list
```
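The same flow can be driven by hand using the components documented in the deep-dive section above. A minimal sketch; SearchSkill.search() adds strategy planning and auto-refinement on top of this, and StrategyPlanner is orchestrated internally, so it is omitted here:

```python
# Manually chaining the documented components, mirroring the diagram above.
from search_intelligence_skill import (
    IntentParser, DorkGenerator, SearXNGClient, ResultAnalyzer,
)

parser = IntentParser()
gen = DorkGenerator()
client = SearXNGClient(base_url="http://localhost:8888")
analyzer = ResultAnalyzer()

intent = parser.parse("find exposed .env files on example.com")
dorks = gen.generate(intent)

results = []
for d in dorks[:3]:  # execute the first few generated dorks
    raw = client.search(d.query)
    results.extend(client.parse_results(raw))

analyzed = analyzer.analyze(results, intent)  # dedup, score, rank
for r in analyzed[:5]:
    print(f"[{r.relevance:.1f}] {r.title} - {r.url}")

client.close()
```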
Privacy
- All searches route through YOUR SearXNG instance
- Zero API keys required for any engine
- No data sent to third-party services (except through SearXNG's engine requests)
- SearXNG strips tracking parameters and anonymizes requests

Performance tips
- Reuse the SearchSkill instance across searches (connection pooling)
- Use depth="quick" for simple lookups; reserve "deep" / "exhaustive" for research
- Set auto_refine=False for speed-critical paths
- Use skill.suggest_queries() to preview before executing expensive searches
- Batch independent queries with skill.search_batch()

Accuracy tips
- Include specific entities in your query (domains, emails, CVEs, names)
- Use quoted phrases for exact matching: 'find "exact phrase"'
- Specify time ranges when freshness matters: "latest news this week"
- Use depth="deep" or "exhaustive" for comprehensive coverage
- Check report.suggestions for refinement ideas
- Check report.intent to verify the skill understood your query correctly

Extending the skill (an illustrative sketch follows the safety notes below)
- Add new dork templates in config.py → DORK_TEMPLATES
- Add new intent signals in config.py → INTENT_SIGNALS
- Add new engines in config.py → ENGINE_CATEGORIES
- Add new operator translations in config.py → OPERATOR_SUPPORT
- Add new strategies in config.py → STRATEGY_DEFINITIONS
- Add new subcategory detection in intent.py → SUBCATEGORY_PATTERNS

Confirm before sensitive operations
- Security scanning dorks may trigger alerts on target domains
- OSINT queries may involve personal information; use responsibly
- Always validate that the target domain/entity is authorized for testing
- This tool is for legitimate research, authorized security testing, and SEO analysis
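As a rough illustration of the first extension point, a new entry in DORK_TEMPLATES might look like the sketch below. The actual schema lives in config.py and may differ; every key, the {domain} placeholder, and the subcategory name here are assumptions to verify against the source.

```python
# HYPOTHETICAL sketch of a DORK_TEMPLATES entry. The real schema is
# defined in config.py; keys, nesting, and placeholder syntax below
# are assumptions, not the package's confirmed format.
NEW_SECURITY_TEMPLATE = {
    "category": "security",
    "subcategory": "exposed_backups",  # assumed: a new subcategory name
    "templates": [
        "site:{domain} filetype:bak",  # assumed: {domain} placeholder
        "site:{domain} inurl:backup",
        'site:{domain} "index of" backup',
    ],
    "purpose": "Find exposed backup files on a target domain",
}
```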