Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Design and manage structured, evidence-based interviews, including scorecards, question banks, rubrics, panel coordination, evaluation, and offer decision support.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
Complete hiring interview system: from job scorecard design through structured question banks, live evaluation rubrics, panel coordination, and offer decisions. Eliminates gut-feel hiring with evidence-based frameworks that predict on-the-job performance.
Tell me what you need:
- "Design interviews for [role]" → Full interview plan (scorecard + questions + rubrics)
- "Create a scorecard for [role]" → A-Player definition with measurable outcomes
- "Generate questions for [skill/competency]" → Targeted question bank
- "Build a take-home assignment for [role]" → Technical assessment with rubric
- "Evaluate this candidate" → Structured debrief with scoring
- "Audit our interview process" → Bias check + effectiveness review
Rule: Never look at a resume before defining what "great" looks like.
```yaml
scorecard:
  role: "[Title]"
  level: "[Junior/Mid/Senior/Staff/Principal/Director/VP]"
  team: "[Team name]"
  hiring_manager: "[Name]"
  created: "YYYY-MM-DD"

  mission:
    statement: "[One sentence: why does this role exist?]"
    success_metric: "[How we'll know this hire was successful in 12 months]"

  outcomes:  # 3-5 specific, measurable results expected in first 12 months
    - outcome: "[e.g., Reduce deployment time from 45min to <10min]"
      measure: "[Metric: deployment duration, measured via CI/CD logs]"
      timeline: "Q1-Q2"
      priority: "critical"
    - outcome: "[e.g., Ship v2 API with 99.9% uptime]"
      measure: "[Uptime %, error rate, customer adoption]"
      timeline: "Q2-Q3"
      priority: "critical"
    - outcome: "[e.g., Mentor 2 junior engineers to mid-level competency]"
      measure: "[Promotion readiness assessment, PR quality metrics]"
      timeline: "Q3-Q4"
      priority: "important"

  competencies:
    technical:
      must_have:
        - name: "[e.g., System design]"
          level: "[Novice/Competent/Proficient/Expert]"
          evidence: "[What demonstrates this: e.g., designed systems handling 10K+ RPS]"
        - name: "[e.g., TypeScript/React]"
          level: "Proficient"
          evidence: "[Shipped production TS/React apps, not just tutorials]"
      nice_to_have:
        - name: "[e.g., Kubernetes]"
          level: "Competent"
    behavioral:
      must_have:
        - name: "Ownership"
          definition: "Takes responsibility for outcomes, not just tasks. Doesn't wait to be told."
          anti_pattern: "Says 'that's not my job' or 'I was told to do X'"
        - name: "Communication"
          definition: "Explains complex ideas simply. Writes clear docs. Raises issues early."
          anti_pattern: "Surprises stakeholders. Can't explain their own work."
        - name: "Growth mindset"
          definition: "Seeks feedback. Admits mistakes. Improves from failure."
          anti_pattern: "Defensive about criticism. Repeats same mistakes."
      nice_to_have:
        - name: "[e.g., Cross-functional leadership]"
    cultural:
      values_alignment:
        - "[Company value 1: what this looks like in practice]"
        - "[Company value 2: what this looks like in practice]"
      anti_signals:
        - "[Red flag behavior 1]"
        - "[Red flag behavior 2]"

  compensation:
    band: "[min - max]"
    equity: "[range if applicable]"
    flexibility: "[What's negotiable]"

  deal_breakers:  # Hard no's, instant disqualification
    - "[e.g., Cannot start within 4 weeks]"
    - "[e.g., No experience with production systems at scale]"
    - "[e.g., Requires >30% above band]"
```
Before proceeding, verify:
- [ ] Mission statement is one sentence (not a paragraph)
- [ ] Each outcome has a specific number or metric (not "improve" or "help with")
- [ ] Competencies distinguish must-have from nice-to-have
- [ ] Anti-patterns defined for each behavioral competency
- [ ] Deal breakers are objective (not subjective feelings)
- [ ] Band is realistic for the market (check levels.fyi, Glassdoor)
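The first few checks are mechanical enough to script. Below is a minimal Python sketch that lints a scorecard file against this checklist; it assumes the YAML layout of the template above, a PyYAML install, and a hypothetical file name. The heuristics are illustrative, not part of the skill.

```python
import re
import yaml  # assumes PyYAML is available

def lint_scorecard(path: str) -> list[str]:
    """Flag mechanical checklist failures in a scorecard file."""
    with open(path) as f:
        card = yaml.safe_load(f)["scorecard"]
    issues = []

    # Mission statement should be one sentence, not a paragraph.
    if card["mission"]["statement"].count(".") > 1:
        issues.append("mission.statement looks like more than one sentence")

    # Every outcome needs a concrete number or metric.
    for i, item in enumerate(card.get("outcomes", [])):
        if not re.search(r"\d", item["outcome"] + " " + item["measure"]):
            issues.append(f"outcomes[{i}] has no number or measurable metric")

    # Deal breakers must exist and must not be unedited placeholders.
    breakers = card.get("deal_breakers", [])
    if not breakers:
        issues.append("no deal_breakers defined")
    elif any("[e.g.," in b for b in breakers):
        issues.append("deal_breakers still contain template placeholders")

    return issues

if __name__ == "__main__":
    for problem in lint_scorecard("scorecard.yaml"):  # hypothetical path
        print("WARN:", problem)
```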
```yaml
interview_loop:
  role: "[from scorecard]"
  total_duration: "[X hours across Y sessions]"

  stages:
    - stage: "Resume Screen"
      duration: "5-10 min"
      who: "Recruiter or hiring manager"
      evaluates: ["deal_breakers", "basic_qualification"]
      pass_rate_target: "30-40%"
    - stage: "Phone Screen"
      duration: "30 min"
      who: "Hiring manager"
      evaluates: ["communication", "motivation", "outcome_1_capability"]
      format: "Structured conversation"
      pass_rate_target: "50%"
    - stage: "Technical Assessment"
      duration: "60-90 min"
      who: "Senior engineer"
      evaluates: ["technical_competencies"]
      format: "Live coding OR take-home (see Phase 4)"
      pass_rate_target: "40-50%"
    - stage: "System Design"
      duration: "45-60 min"
      who: "Staff+ engineer"
      evaluates: ["system_design", "trade_off_thinking", "communication"]
      format: "Whiteboard/collaborative design"
      pass_rate_target: "50%"
      applies_to: "Senior+ only"
    - stage: "Behavioral Deep-Dive"
      duration: "45-60 min"
      who: "Hiring manager + cross-functional partner"
      evaluates: ["behavioral_competencies", "cultural_values"]
      format: "STAR-based structured interview"
      pass_rate_target: "60%"
    - stage: "Team Fit / Reverse Interview"
      duration: "30 min"
      who: "2-3 potential teammates"
      evaluates: ["collaboration_style", "candidate_questions"]
      format: "Informal but structured"
      pass_rate_target: "80%"
    - stage: "Hiring Manager Final"
      duration: "30 min"
      who: "Hiring manager"
      evaluates: ["remaining_concerns", "motivation", "offer_readiness"]
      format: "Conversation"

  timeline:
    screen_to_onsite: "< 5 business days"
    onsite_to_decision: "< 2 business days"
    decision_to_offer: "< 1 business day"
    total_process: "< 3 weeks"
```
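Those pass-rate targets also tell you how wide the top of the funnel must be. A rough, illustrative calculation in Python, using the midpoints of the target ranges above (the numbers are not part of the template):

```python
# Cumulative funnel math from the pass-rate targets above.
# Midpoints are used where the template gives a range; illustrative only.
stages = [
    ("Resume Screen", 0.35),
    ("Phone Screen", 0.50),
    ("Technical Assessment", 0.45),
    ("System Design", 0.50),
    ("Behavioral Deep-Dive", 0.60),
    ("Team Fit / Reverse Interview", 0.80),
]

survival = 1.0
for name, pass_rate in stages:
    survival *= pass_rate
    print(f"{name:30s} cumulative pass rate: {survival:.1%}")

# Roughly 1.9% of applicants clear the full loop, i.e. plan on
# about 1 / 0.019 ~ 53 applicants per candidate who reaches offer stage.
print(f"Applicants per finalist: {1 / survival:.0f}")
```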
| Level | Skip | Add | Emphasis |
|---|---|---|---|
| Junior (0-2 yr) | System design | Pair programming, learning ability | Potential > experience |
| Mid (2-5 yr) | - | - | Balanced: execution + growth |
| Senior (5-8 yr) | - | Architecture discussion | Impact, ownership, mentoring |
| Staff (8+ yr) | Basic coding | Design doc review, strategy | Influence, technical vision |
| Principal | Basic coding | Vision presentation, exec interview | Org-wide impact |
| Manager | Live coding | Skip-level, cross-functional | People outcomes, strategy |
| Director+ | All IC technical | Board/exec presentation | Business impact, org building |
For each question below:
1. Ask the main question
2. Then probe with: "Walk me through specifically what YOU did" (not the team)
3. Then probe with: "What was the measurable result?"

Watch for: vague answers, "we" without "I", unable to recall specifics

Ownership & Initiative

Q: "Tell me about a time you identified a problem no one asked you to fix, and you fixed it anyway."
- Probe: "How did you discover it? What did you do first? What was the outcome?"
- Green signal: Specific problem, proactive action, measurable impact
- Red flag: Can't recall an example, or problem was trivial

Q: "Describe a project that failed or didn't meet expectations. What was your role?"
- Probe: "What would you do differently? What did you learn?"
- Green signal: Owns their part, specific lessons, changed behavior afterward
- Red flag: Blames others, no learning, defensive

Q: "Tell me about the last time you disagreed with your manager's technical decision."
- Probe: "How did you raise it? What happened? Would you do it differently?"
- Green signal: Respectful pushback with data, compromise or acceptance
- Red flag: Never disagrees, or went around manager, or still bitter

Communication & Collaboration

Q: "Describe the most complex technical concept you had to explain to a non-technical audience."
- Probe: "How did you know they understood? What would you change?"
- Green signal: Adapts language, checks understanding, uses analogies
- Red flag: Talks down, uses jargon anyway, frustrated by the need

Q: "Tell me about a cross-team project that had conflicting priorities."
- Probe: "How did you align the teams? What trade-offs were made?"
- Green signal: Proactive communication, documented agreements, escalated appropriately
- Red flag: Waited for someone else to resolve, or steamrolled

Q: "Give me an example of written communication that had significant impact."
- Probe: "What was the context? Who was the audience? What resulted?"
- Green signal: Design doc, RFC, post-mortem that changed decisions
- Red flag: Can't think of one, or only Slack messages

Technical Excellence

Q: "What's the best piece of code or system you've built? Walk me through it."
- Probe: "What trade-offs did you make? What would you change now?"
- Green signal: Deep understanding, clear trade-off reasoning, honest about flaws
- Red flag: Can't go deep, no awareness of trade-offs

Q: "Tell me about a production incident you were involved in resolving."
- Probe: "How did you diagnose it? What was root cause? What prevented recurrence?"
- Green signal: Systematic debugging, root cause fix (not band-aid), prevention measures
- Red flag: Only applied quick fix, blamed infrastructure, no follow-up

Q: "Describe a time you had to make a technical decision with incomplete information."
- Probe: "What did you know? What didn't you know? How did you decide?"
- Green signal: Explicit about unknowns, gathered what they could, made reversible decision
- Red flag: Paralyzed, or overconfident without data

Leadership & Mentoring (Senior+)

Q: "Tell me about someone you helped grow significantly in their career."
- Probe: "What did you specifically do? How did you know it was working?"
- Green signal: Specific actions (pair programming, stretch assignments, feedback), measurable growth
- Red flag: "I told them what to do" or can't name anyone

Q: "Describe a technical strategy or vision you set for your team."
- Probe: "How did you get buy-in? How did you measure progress?"
- Green signal: Clear rationale, stakeholder alignment, adapted based on feedback
- Red flag: Top-down mandate, or never set direction

Q: "Tell me about a time you had to say no to a stakeholder or product request."
- Probe: "How did you explain it? What was the outcome?"
- Green signal: Data-driven reasoning, offered alternatives, maintained relationship
- Red flag: Just said no, or always says yes
For each resume highlight, design verification questions:

Pattern: "[Impressive claim on resume]"
- "Walk me through [specific project]. What was the state when you joined?"
- "What was YOUR specific contribution vs the team's?"
- "What was the hardest technical problem YOU solved?"
- "If I called your manager from that time, what would they say was your biggest weakness?"

Pattern: "Led team of X"
- "How many people reported to you directly?"
- "Name someone you had to give tough feedback to. What happened?"
- "Who was the weakest performer? What did you do about it?"

Pattern: "Improved X by Y%"
- "What was the baseline measurement? How did you measure it?"
- "What was it before you started? After? How long did it take?"
- "What else changed during that period that could explain the improvement?"

Pattern: Short tenure (< 1 year)
- "Walk me through your decision to leave [company]."
- "What would your manager there say about your departure?"
- "What did you learn from that experience?"

Pattern: Gap in employment
- Ask once, move on. Don't dwell. Valid reasons: health, family, travel, learning, job market.
- Red flag only if: the story keeps changing, or they're evasive about a very long gap
Design scenario questions based on the actual role's outcomes:

Template: "In this role, one of your first challenges will be [outcome from scorecard]. The current situation is [honest context]. Walk me through how you'd approach this in your first [timeframe]."

Example (Senior Backend): "Our API currently handles 2K RPS but we need to scale to 50K by Q3. The codebase is a 3-year-old Node.js monolith with PostgreSQL. Budget for infrastructure is $10K/mo. Team is 4 engineers including you. How would you approach this?"

Probe sequence:
1. "What would you do in week 1?" (Information gathering)
2. "What data would you need?" (Analytical thinking)
3. "What are the biggest risks?" (Risk awareness)
4. "If [constraint changes], how does your approach change?" (Adaptability)
5. "How would you communicate progress to stakeholders?" (Communication)

Scoring:
- 5: Structured approach, asks clarifying questions, identifies trade-offs, realistic timeline
- 4: Good approach with minor gaps
- 3: Reasonable but generic, doesn't probe assumptions
- 2: Jumps to solution without understanding problem
- 1: No coherent approach, or unrealistic
```yaml
coding_assessment:
  duration: "60 min"
  structure:
    warm_up: "5 min - environment setup, introduce the problem"
    problem_1: "20 min - core implementation"
    problem_2: "25 min - extension or new problem"
    debrief: "10 min - trade-offs discussion"

  problem_design_rules:
    - Solvable in the time limit (test it yourself first, then halve your time)
    - Multiple valid approaches (no single "right answer")
    - Extension points for stronger candidates
    - Relevant to actual work (not algorithm puzzles unless the role requires it)
    - Candidate chooses their language
    - Provide starter code / boilerplate to reduce setup time

  evaluation_rubric:
    problem_solving:
      5: "Breaks down problem, considers edge cases upfront, efficient approach"
      3: "Gets to solution but misses edge cases or takes indirect path"
      1: "Struggles to break down problem, no clear approach"
    code_quality:
      5: "Clean, readable, well-named, handles errors, testable"
      3: "Works but messy, some error handling, reasonable naming"
      1: "Barely works, no error handling, unclear naming"
    communication:
      5: "Thinks aloud, explains trade-offs, asks clarifying questions"
      3: "Some explanation, responds to prompts"
      1: "Silent, defensive about suggestions, doesn't explain reasoning"
    testing_awareness:
      5: "Writes tests unprompted, considers edge cases, talks about test strategy"
      3: "Writes tests when prompted, covers happy path"
      1: "No testing consideration"
    speed_and_fluency:
      5: "Fast, clearly experienced, language/tooling fluent"
      3: "Reasonable pace, occasional lookups"
      1: "Very slow, struggles with syntax/tooling"

  do_not:
    - Ask trick questions or gotchas
    - Apply time pressure beyond reasonable
    - Penalize for looking things up
    - Judge IDE/editor choice
    - Ask questions that require proprietary knowledge
```
```yaml
take_home:
  time_limit: "3-4 hours (honor system, state clearly)"
  deadline: "5-7 days from send"

  problem_design:
    - Real-world scenario (not academic)
    - Clear requirements with defined scope
    - Extension section for candidates who want to show more
    - Starter repo with CI, linting, test framework pre-configured

  deliverables:
    required:
      - Working solution
      - "Tests (at minimum: happy path + 2 edge cases)"
      - README explaining approach, trade-offs, what you'd improve
    optional:
      - Architecture diagram
      - Performance analysis
      - Additional features from extension section

  evaluation_rubric:
    functionality: "30% - Does it work? Edge cases handled?"
    code_quality: "25% - Clean, readable, maintainable, well-structured"
    testing: "20% - Coverage, meaningful tests, edge cases"
    documentation: "15% - README quality, trade-off explanations"
    extras: "10% - Extension features, thoughtful additions"

  anti_gaming:
    - Check git history (single mega-commit = suspicious)
    - Ask about implementation details in a follow-up interview
    - Vary the problem slightly across candidates
    - "Time the follow-up discussion: over-engineered solution + can't explain = red flag"
```
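The rubric weights reduce to a single weighted average. A minimal Python sketch, with the weights copied from the rubric above (the helper name and example ratings are hypothetical):

```python
# Weights from the take-home rubric above; must sum to 1.0.
WEIGHTS = {
    "functionality": 0.30,
    "code_quality": 0.25,
    "testing": 0.20,
    "documentation": 0.15,
    "extras": 0.10,
}

def take_home_score(ratings: dict[str, float]) -> float:
    """Combine per-dimension ratings (1-5) into one weighted 1-5 score."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return sum(WEIGHTS[dim] * ratings[dim] for dim in WEIGHTS)

# Example: strong solution and tests, thin README.
print(take_home_score({
    "functionality": 5, "code_quality": 4, "testing": 4,
    "documentation": 2, "extras": 3,
}))  # -> 3.9
```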
```yaml
system_design:
  duration: "45-60 min"
  structure:
    requirements: "10 min - clarify scope, constraints, scale"
    high_level: "15 min - components, data flow, API design"
    deep_dive: "15 min - pick 1-2 areas to go deep"
    trade_offs: "10 min - discuss alternatives, failure modes"
    extensions: "5 min - how would this evolve?"

  evaluation:
    requirements_gathering:
      5: "Asks about scale, users, latency requirements, budget before designing"
      3: "Some clarifying questions but misses key constraints"
      1: "Jumps straight to drawing boxes"
    high_level_design:
      5: "Clear components with well-defined boundaries, data flows make sense"
      3: "Reasonable architecture but some unclear responsibilities"
      1: "Vague boxes with arrows, can't explain data flow"
    depth:
      5: "Deep knowledge in chosen area, considers failure modes, cites real experience"
      3: "Good knowledge but stays surface level"
      1: "Can't go deep on any component"
    trade_off_awareness:
      5: "Explicitly names trade-offs, compares alternatives, knows when each fits"
      3: "Acknowledges trade-offs when prompted"
      1: "Presents one approach as the only option"
    scalability:
      5: "Considers growth path, bottleneck identification, realistic scaling strategy"
      3: "Basic scaling awareness"
      1: "No consideration of scale or unrealistic assumptions"
```
```yaml
interviewer_scorecard:
  candidate: "[name]"
  interviewer: "[name]"
  stage: "[which interview]"
  date: "YYYY-MM-DD"

  # Score BEFORE reading other interviewers' feedback
  overall: 1-5  # 1=Strong No, 2=Lean No, 3=Neutral, 4=Lean Yes, 5=Strong Yes

  competency_scores:
    - competency: "[from scorecard]"
      score: 1-5
      evidence: "[Specific quote or behavior observed]"
    - competency: "[from scorecard]"
      score: 1-5
      evidence: "[Specific quote or behavior observed]"

  green_signals:
    - "[Specific positive indicator with evidence]"

  red_flags:
    - "[Specific concern with evidence]"

  questions_for_next_interviewer:
    - "[What to probe further]"

  # IMPORTANT: Submit before debrief. Do not change after discussion.
```
1. BEFORE debrief:
   - All interviewers submit scorecards independently
   - Hiring manager collects but does NOT share scores

2. DEBRIEF structure (30-45 min):
   a. Each interviewer states their overall vote FIRST (no explanation yet)
      - This prevents anchoring bias from persuasive speakers
   b. Lowest scorer goes first (explain concerns)
      - Prevents positive bias from drowning out concerns
   c. Highest scorer responds
   d. Open discussion, focused on EVIDENCE not feelings
      - "They seemed smart" is not evidence
      - "They designed a cache invalidation strategy that handled..." IS evidence
   e. Address conflicting signals:
      - If strong yes + strong no on the same competency, that's the discussion
      - Resolve with: "What specific behavior did you observe?"
   f. Final vote (all interviewers):
      - Strong Hire / Hire / No Hire / Strong No Hire
      - Any "Strong No Hire" triggers discussion but NOT automatic rejection
      - Hiring manager makes the final call but must document reasoning

3. AFTER debrief:
   - Decision recorded with reasoning
   - Feedback compiled for the candidate (regardless of outcome)
   - Action items assigned (offer prep or rejection with feedback)
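The speaking order and the submit-before-debrief rule are easy to enforce mechanically. A small Python sketch, assuming scorecards are dicts shaped like the interviewer_scorecard template above (everything else here is illustrative):

```python
def debrief_order(scorecards: list[dict]) -> list[dict]:
    """Lowest overall score speaks first, per the debrief protocol above."""
    missing = [c for c in scorecards if c.get("overall") is None]
    if missing:
        # Enforce independence: no debrief until every scorecard is submitted.
        names = ", ".join(c["interviewer"] for c in missing)
        raise ValueError(f"scorecards not yet submitted: {names}")
    return sorted(scorecards, key=lambda c: c["overall"])

# Hypothetical panel: Ben (2) explains concerns first, then Ana, then Cho.
panel = [
    {"interviewer": "Ana", "overall": 4},
    {"interviewer": "Ben", "overall": 2},
    {"interviewer": "Cho", "overall": 5},
]
for card in debrief_order(panel):
    print(card["interviewer"], card["overall"])
```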
Strong Hire (all 4-5):
- Make offer within 24 hours
- Expedite the process: strong candidates have multiple offers

Hire (mix of 3-5, no 1s):
- Make offer within 48 hours
- Address any 3-scores with a targeted onboarding plan

Borderline (mix of 2-4):
- Additional data needed: one more focused interview on weak areas
- Set a deadline: if still borderline after additional data → No Hire
- "When in doubt, don't hire": the cost of a bad hire > the cost of continuing the search

No Hire (any 1, or multiple 2s):
- Decline with specific, constructive feedback
- Document clearly for future reference (candidate may reapply)

Strong No Hire (multiple 1s or a deal breaker):
- Immediate decline
- Review: did we miss this in screening? Fix the funnel.
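The matrix is a pure function of the score distribution, so it can be encoded directly. A minimal Python sketch of the rules above (the function and the deal_breaker flag are illustrative):

```python
def offer_decision(scores: list[int], deal_breaker: bool = False) -> str:
    """Map 1-5 interviewer scores onto the decision matrix above."""
    ones, twos = scores.count(1), scores.count(2)
    if deal_breaker or ones >= 2:
        return "Strong No Hire"   # immediate decline; review the funnel
    if ones == 1 or twos >= 2:
        return "No Hire"          # decline with specific feedback
    if all(s >= 4 for s in scores):
        return "Strong Hire"      # offer within 24 hours
    if min(scores) >= 3:
        return "Hire"             # offer within 48 hours
    return "Borderline"           # one more focused interview, then decide

print(offer_decision([5, 4, 4, 5]))  # Strong Hire
print(offer_decision([4, 3, 2, 4]))  # Borderline
```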
Before each interview, remind yourself:
- [ ] I will evaluate against the SCORECARD, not my "gut feeling"
- [ ] I will give the same weight to disconfirming evidence as confirming
- [ ] I will not let one great/terrible answer color the entire evaluation
- [ ] I will not compare this candidate to the last one; I will compare to the scorecard
- [ ] I will note specific behaviors, not general impressions
- [ ] I will not evaluate "culture fit" as "would I have a beer with them"
| Bias | What It Looks Like | Mitigation |
|---|---|---|
| Halo effect | Great at coding → assume great at everything | Score each competency independently |
| Horn effect | Weak communication → assume weak technically | Same: score independently |
| Similarity bias | "Reminds me of me" → favorable rating | Evaluate against scorecard, not self |
| Anchoring | First impression sets the tone | Score after all questions, not during |
| Confirmation bias | Early positive → only notice positives | Actively look for counter-evidence |
| Contrast effect | Looks great after a weak candidate | Compare to scorecard, not other candidates |
| Recency bias | Remember last answer, forget first | Take notes during interview |
| Attribution error | Success = skill, failure = circumstances | Probe both: "What went wrong? What helped?" |
| Leniency bias | Avoid conflict, rate everyone 3-4 | Force yourself to use the full 1-5 scale |
| Urgency bias | "We need someone NOW" → lower bar | Never lower scorecard standards; extend the timeline instead |
- Same questions for same role: every candidate gets the same core questions
- Score immediately after: before discussing with anyone
- Evidence-based only: every score needs a specific observation
- Diverse panel: at least one interviewer from a different team/background
- Blind resume screen: remove name, school, company names for the initial screen (if possible)
- No leading questions: "You're probably great at X, right?" → "Tell me about your experience with X"
- Time-boxed: same duration for every candidate (don't cut short or extend based on vibes)
After each stage, within 24 hours:

ADVANCING:
"Hi [name], thank you for your time today. We enjoyed our conversation about [specific topic]. We'd like to move forward with [next stage]. [Interviewer name] will be speaking with you about [topic]. Available times: [options]. Any questions before then? - [recruiter name]"

REJECTION (after phone screen):
"Hi [name], thank you for taking the time to speak with us about [role]. After careful consideration, we've decided not to move forward at this stage. [One specific, constructive piece of feedback if appropriate]. We'll keep your information on file and may reach out for future opportunities that align more closely. Wishing you the best in your search. - [name]"

REJECTION (after onsite):
"Hi [name], thank you for investing [X hours] in our interview process. We were impressed by [specific positive], but ultimately decided to move forward with a candidate whose [specific competency] more closely matches our current needs. Feedback: [1-2 specific, actionable items]. We genuinely appreciated your time and would welcome a future conversation if circumstances change. - [hiring manager name]"

OFFER (verbal, then written within 24h):
"Hi [name], I'm excited to share that we'd like to offer you the [role] position. We were particularly impressed by [specific evidence from interviews]. Here's what we're proposing: [comp summary]. I'll send the formal offer letter within 24 hours. Do you have any initial questions? - [hiring manager]"
After every hire (and quarterly for all candidates):

| Dimension | Target | How to Measure |
|---|---|---|
| Time to schedule | < 48h between stages | Track in ATS |
| Interviewer preparedness | 100% read scorecard beforehand | Post-interview survey |
| Communication timeliness | < 24h response | Track in ATS |
| Feedback quality | Specific + actionable | Candidate survey |
| Overall experience | 4+/5 | Candidate survey (all candidates, not just hires) |
| Offer acceptance rate | > 80% | Track in ATS |
```yaml
quarterly_review:
  period: "Q[N] YYYY"

  funnel_metrics:
    applications: N
    screens_passed: N   # -> screen pass rate
    onsites: N          # -> onsite conversion rate
    offers: N           # -> offer rate
    accepts: N          # -> acceptance rate

  quality_metrics:
    ninety_day_retention: "X%"
    manager_satisfaction_90d: "X/5"
    time_to_productivity: "X weeks"
    regretted_attrition_1yr: "X%"

  process_metrics:
    time_to_fill: "X days (target: <30)"
    time_in_stage:
      screen: "X days"
      onsite: "X days"
      decision: "X days"
      offer: "X days"
    interviewer_calibration: "score variance across interviewers"

  actions:
    - "[Improvement 1 based on metrics]"
    - "[Improvement 2]"
```
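Most of these numbers are simple ratios over adjacent funnel stages, and interviewer calibration is just the spread of per-interviewer averages. A Python sketch with made-up values, keyed like the template above:

```python
from statistics import pstdev

# Hypothetical quarter, keyed like the quarterly_review template above.
funnel = {"applications": 240, "screens_passed": 80,
          "onsites": 24, "offers": 8, "accepts": 7}

names = list(funnel)
for prev, cur in zip(names, names[1:]):
    print(f"{prev} -> {cur}: {funnel[cur] / funnel[prev]:.0%}")

# Interviewer calibration: per-interviewer mean scores should cluster.
# A large spread suggests the panel needs a calibration session.
mean_scores = {"Ana": 3.4, "Ben": 2.1, "Cho": 4.6}
print("calibration spread:", round(pstdev(mean_scores.values()), 2))
```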
For each question in your bank, track:

```yaml
question_effectiveness:
  question: "[question text]"
  times_asked: N
  signal_quality:
    strong_differentiator: N  # times this question clearly separated strong/weak
    no_signal: N              # times everyone answered similarly
    confusing: N              # times candidates misunderstood

# If no_signal > 50% -> replace the question
# If confusing > 20% -> reword the question
# If strong_differentiator > 70% -> keep and promote
```
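The three threshold comments translate directly into a triage rule. A minimal Python sketch (thresholds from the template above; the function name is hypothetical):

```python
def triage_question(times_asked: int, strong: int, none: int, confusing: int) -> str:
    """Apply the keep/reword/replace thresholds from the template above."""
    if times_asked == 0:
        return "no data yet"
    if none / times_asked > 0.50:
        return "replace"            # everyone answers similarly: no signal
    if confusing / times_asked > 0.20:
        return "reword"             # candidates misunderstand the question
    if strong / times_asked > 0.70:
        return "keep and promote"
    return "keep, keep watching"

print(triage_question(times_asked=20, strong=15, none=3, confusing=1))
# -> keep and promote
```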
Internal candidates:
- Use the SAME scorecard as for external candidates (fairness)
- Different question strategy: focus on the future role, not the past (you already know their past)
- If not selected: the manager delivers feedback personally, with a development plan and a timeline for re-candidacy
- Never promise the internal candidate special treatment
Executive hiring:
- Add: reference checks (5+ structured, including back-channel)
- Add: board/exec team dinner (culture, not evaluation)
- Add: 90-day plan presentation as the final stage
- Extended scorecard: strategic thinking, board management, talent magnetism
- Use an executive search firm for sourcing, but own the evaluation internally
High-volume hiring:
- Standardize EVERYTHING: same questions, same rubric, same order
- Use structured scoring sheets, not free-form notes
- Batch calibration sessions weekly
- Consider: group assessment centers for initial stages
- Track: quality variance across hiring managers (it should be low)
Remote interviews:
- Test the tech setup before the interview (not during)
- Camera on (both sides): non-verbal cues matter
- Record (with consent) for calibration purposes
- Take-home > live coding for timezone-challenged candidates
- Bias alert: don't penalize for background noise, accent, or non-native English fluency
Returning (boomerang) candidates:
- Treat as a new candidate (things change)
- Skip: basic company knowledge questions
- Focus: why they left, what changed, what they learned outside
- Check: has the team/role changed since they left? Do current team members want them back?
If the candidate receives a counteroffer:
- Don't panic-increase. Your offer should already be fair.
- "We made our best offer based on the value of the role. We'd love to have you, but understand if you decide to stay."
- Statistics: 80% of people who accept counteroffers leave within 18 months anyway
- If they stay: respect it, keep the door open
| Say | I Do |
|---|---|
| "Design interviews for [role]" | Full loop: scorecard + structure + questions + rubrics |
| "Create a scorecard for [role]" | A-Player definition with outcomes and competencies |
| "Generate behavioral questions for [competency]" | STAR questions with probes and scoring |
| "Build a take-home for [role]" | Assessment with rubric and anti-gaming measures |
| "Design a system design interview for [level]" | Structure + evaluation rubric |
| "Evaluate candidate [name]" | Structured debrief template with scoring |
| "Create a phone screen for [role]" | 30-min structured screen with pass/fail criteria |
| "Write rejection feedback for [candidate]" | Specific, constructive rejection message |
| "Audit our interview process" | Full process review with metrics and recommendations |
| "Calibrate interviewers" | Calibration session plan with scoring alignment |
| "Design interview for [role] at [company stage]" | Adjusted for startup/growth/enterprise context |
| "Generate reference check questions for [role]" | Structured reference interview guide |