Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Analyze DNA, RNA, and protein sequences with alignment, variant calling, and expression analysis pipelines.
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Summarize what changed and any follow-up checks I should run.
On first use, read setup.md for integration guidelines. With the user's consent, create ~/bioinformatics/ to store project context and preferences.
Use this skill when the user needs to analyze biological sequences, run genomic pipelines, or interpret sequencing data. The agent handles sequence alignment, variant calling, expression analysis, and format conversions.
Memory lives in ~/bioinformatics/. See memory-template.md for structure.

~/bioinformatics/
├── memory.md     # Projects, preferences, reference genomes
├── pipelines/    # Saved pipeline configurations
└── results/      # Analysis outputs and logs
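A minimal sketch of creating that layout, assuming bash. The function name `init_bio_home` and the optional `BIO_HOME` override are hypothetical, not part of the skill; call it only after the user has agreed to local storage. It creates an empty memory.md only when none exists, so existing notes are never overwritten; fill it in following memory-template.md.

```bash
# Hypothetical helper: create the ~/bioinformatics/ layout.
# Base directory: first argument, else $BIO_HOME (assumed), else ~/bioinformatics.
init_bio_home() {
  local base="${1:-${BIO_HOME:-$HOME/bioinformatics}}"
  mkdir -p "$base/pipelines" "$base/results" || return 1
  # Create memory.md only if it is missing; existing notes are left untouched.
  [ -e "$base/memory.md" ] || : > "$base/memory.md"
  printf '%s\n' "$base"
}
```

Usage: `init_bio_home` creates ~/bioinformatics/; running it again is harmless because `mkdir -p` and the existence check make it idempotent.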
Topic files:
- Setup process: setup.md
- Memory template: memory-template.md
- File formats: formats.md
- Tool commands: tools.md
- RNA-seq pipeline: rnaseq.md
- Variant calling: variants.md
Before any analysis, check input data quality:
- FASTQ: run FastQC; check per-base quality and adapter content
- BAM: verify the file is intact (samtools quickcheck), coordinate-sorted, and indexed
- VCF: validate the format (bcftools view -h)
Bad input → garbage output. Always QC first.
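samtools quickcheck confirms that a BAM is intact but does not report whether it is sorted or indexed. The sketch below adds those two checks in bash. The helper names are hypothetical, and the sort check trusts the `SO:coordinate` tag on the `@HD` header line rather than scanning every record.

```bash
# Reads a SAM header on stdin (e.g. from `samtools view -H`) and succeeds
# if the @HD line declares coordinate sorting.
header_is_coordinate_sorted() {
  grep -qE '^@HD.*[[:space:]]SO:coordinate'
}

# Succeeds if an index sits next to the BAM (x.bam.bai, x.bai, or x.bam.csi).
bam_has_index() {
  local bam="$1"
  [ -e "${bam}.bai" ] || [ -e "${bam%.bam}.bai" ] || [ -e "${bam}.csi" ]
}

# Full preflight: integrity, declared sort order, index presence.
preflight_bam() {
  local bam="$1"
  samtools quickcheck "$bam" \
    || { echo "corrupt or truncated: $bam" >&2; return 1; }
  samtools view -H "$bam" | header_is_coordinate_sorted \
    || { echo "not coordinate-sorted: $bam" >&2; return 1; }
  bam_has_index "$bam" \
    || { echo "missing index: $bam" >&2; return 1; }
}
```

Usage: `preflight_bam aligned.bam && echo ok`. A header can claim a sort order the records do not follow, so treat this as a cheap first check, not proof.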
Track which reference is used per project:
- Human: GRCh38/hg38 (preferred) or GRCh37/hg19
- Mouse: GRCm39/mm39 or GRCm38/mm10
Mixing reference builds invalidates results, because coordinates differ between builds. Store reference info per project in ~/bioinformatics/memory.md.
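One way to keep reference builds straight, sketched in bash. The `record_reference` and `check_reference` helpers and the one-line `project=NAME build=BUILD path=PATH sha256=HASH` record format are assumptions for illustration; if memory-template.md defines a different layout, follow it instead. `sha256sum` is GNU coreutils; on macOS substitute `shasum -a 256`.

```bash
# Hypothetical helper: append a reference record for a project to memory.md.
record_reference() {
  local memory="$1" project="$2" build="$3" fasta="$4" sum
  [ -r "$fasta" ] || { echo "cannot read $fasta" >&2; return 1; }
  sum=$(sha256sum "$fasta" | cut -d' ' -f1)
  printf -- '- reference: project=%s build=%s path=%s sha256=%s\n' \
    "$project" "$build" "$fasta" "$sum" >> "$memory"
}

# Fails if the project already has a different build recorded,
# guarding against mixing references within one project.
check_reference() {
  local memory="$1" project="$2" build="$3" recorded
  recorded=$(grep -o "project=${project} build=[^ ]*" "$memory" 2> /dev/null \
    | head -n 1 | sed 's/.*build=//')
  if [ -n "$recorded" ] && [ "$recorded" != "$build" ]; then
    echo "project $project uses $recorded, not $build" >&2
    return 1
  fi
}
```

Usage: call `check_reference ~/bioinformatics/memory.md myproject GRCh38` before aligning or calling variants for that project.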
NEVER modify original FASTQ/BAM files:
- Work on copies
- Keep originals read-only
- Log every transformation step
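A minimal bash sketch of the copy-and-protect rule. `stage_input` and its tab-separated log format are hypothetical. `chmod a-w` guards against accidental edits but not against root, and copying large BAM/FASTQ files doubles their disk footprint, so check free space first.

```bash
# Hypothetical helper: make the original read-only, copy it into a working
# directory, log the step, and print the path of the writable copy.
stage_input() {
  local src="$1" workdir="$2" log="${3:-transformations.log}" dest
  [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
  mkdir -p "$workdir" || return 1
  chmod a-w "$src"                     # original stays read-only
  dest="$workdir/$(basename "$src")"
  cp "$src" "$dest" || return 1
  chmod u+w "$dest"                    # cp copied the read-only mode; the copy is ours to modify
  printf '%s\tcopy\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$src" "$dest" >> "$log"
  printf '%s\n' "$dest"
}
```

Usage: `work=$(stage_input sample.fastq.gz work/ steps.log)`, then run every later step on `$work`.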
Bioinformatics commands can consume massive resources:
- Check file sizes before operations
- Use streaming when possible (samtools view | ...)
- Estimate memory needs (BWA: ~6GB for human genome)
- Warn the user before operations expected to take more than 10 minutes
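A sketch of the size check, assuming bash and a POSIX `wc`. The helper names and the 10 GB default threshold are illustrative assumptions, not values from the skill; tune them to the machine.

```bash
# Size of a file in whole MiB (integer division rounds down).
size_mb() {
  echo $(( $(wc -c < "$1") / 1024 / 1024 ))
}

# Hypothetical guard: warn and fail if a file exceeds limit_mb.
warn_if_large() {
  local file="$1" limit_mb="${2:-10240}" mb   # default 10 GB is an assumption
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  mb=$(size_mb "$file") || return 1
  if [ "$mb" -gt "$limit_mb" ]; then
    echo "warning: $file is ${mb} MB (limit ${limit_mb} MB); confirm before proceeding" >&2
    return 1
  fi
}
```

Usage: `warn_if_large aligned.bam 5120 || ask_user` (where `ask_user` stands for whatever confirmation step the agent uses).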
Every analysis must be reproducible:
- Log exact tool versions (samtools --version)
- Save command parameters
- Record input file checksums for critical analyses
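A minimal bash sketch of a run log covering the three items above. The helper names and the tab-separated log layout are hypothetical. `tool_version` assumes the tool accepts `--version` and prints its version on the first line, as samtools and bcftools do; tools that report versions differently need special handling. `sha256sum` is GNU coreutils (use `shasum -a 256` on macOS).

```bash
# First line of a tool's --version output.
tool_version() {
  "$1" --version 2>&1 | head -n 1
}

# Append SHA-256 checksums of input files to the log.
log_checksums() {
  local log="$1"; shift
  sha256sum "$@" >> "$log"
}

# Log date, tool version, and exact command; run it; log the exit status.
run_logged() {
  local log="$1" status; shift
  {
    printf 'date\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf 'version\t%s\n' "$(tool_version "$1")"
    printf 'command\t%s\n' "$(printf '%q ' "$@")"
  } >> "$log"
  "$@"
  status=$?
  printf 'exit\t%s\n' "$status" >> "$log"
  return "$status"
}
```

Usage: `log_checksums run.log R1.fq.gz R2.fq.gz` followed by `run_logged run.log samtools flagstat aligned.bam`. Quoting the command with `%q` keeps the logged line replayable in bash.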
- Wrong chromosome naming: chr1 vs 1 causes silent failures. Check and convert with sed 's/^chr//'
- Unsorted BAM: most tools expect coordinate-sorted input. Symptoms range from errors to wrong results with no warning
- Missing index: BAM needs .bai, VCF needs .tbi. Commands fail cryptically without them
- Memory exhaustion: large BAM operations can kill the session. Stream, and choose --threads with available memory in mind
- Stale indices: after modifying a BAM/VCF, regenerate its index. An old index points to the wrong offsets, so region queries return wrong or corrupt reads
- 0-based vs 1-based coordinates: BED is 0-based and half-open; VCF and GFF are 1-based. Off-by-one bugs are common
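The 0-based vs 1-based pitfall in concrete form: a VCF record at POS with reference allele REF covers the BED interval [POS-1, POS-1+len(REF)). A sketch in bash and awk (the `vcf_to_bed` name is hypothetical) that reads uncompressed VCF text on stdin:

```bash
# Convert VCF records (1-based POS) to BED (0-based, half-open).
# Output columns: chrom, start, end, ID.
vcf_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { next }   # skip meta-information and header lines
       {
         start = $2 - 1
         print $1, start, start + length($4), $3
       }'
}
```

Usage: `bcftools view -H variants.vcf.gz | vcf_to_bed > variants.bed`. A SNP at POS 100 becomes start 99, end 100; a two-base REF at POS 5 becomes start 4, end 6.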
Format: purpose (key tool)
- FASTA: reference sequences (samtools faidx)
- FASTQ: raw reads + quality (seqtk, fastp)
- SAM/BAM: aligned reads (samtools)
- VCF/BCF: variants (bcftools)
- BED: genomic intervals (bedtools)
- GFF/GTF: gene annotations (gffread)
- BigWig: coverage tracks (deepTools)
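When an input has an ambiguous or missing extension, its first line usually identifies the text formats listed above. A heuristic sketch (the `sniff_format` helper is hypothetical) for uncompressed files only: binary BAM/BCF/BigWig and gzipped files are not handled, BED has no header to sniff, and a headerless SAM file will not be recognized.

```bash
# Guess a plain-text format from the first line of a file.
# Order matters: SAM header tags (@HD, @SQ) are checked before FASTQ's "@".
sniff_format() {
  local first=""
  IFS= read -r first < "$1" || [ -n "$first" ] || { echo unknown; return 1; }
  case "$first" in
    '##fileformat=VCF'*) echo VCF ;;
    '##gff-version'*)    echo GFF ;;
    '@HD'*|'@SQ'*)       echo SAM ;;
    '>'*)                echo FASTA ;;
    '@'*)                echo FASTQ ;;
    *)                   echo unknown ;;
  esac
}
```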
# FASTQ quality report (FastQC does not create the output directory)
mkdir -p qc_reports
fastqc sample.fastq.gz -o qc_reports/
# Trim adapters + low-quality bases
fastp -i R1.fq.gz -I R2.fq.gz -o R1.clean.fq.gz -O R2.clean.fq.gz
# BAM statistics
samtools flagstat aligned.bam
samtools stats aligned.bam > stats.txt
# Index reference (once)
bwa index reference.fa
# Align paired-end reads and coordinate-sort in one stream
bwa mem -t 8 reference.fa R1.fq.gz R2.fq.gz | \
  samtools sort -o aligned.bam -
# Index BAM
samtools index aligned.bam
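# Call variants
bcftools mpileup -Ou -f reference.fa aligned.bam | \
  bcftools call -mv -Oz -o variants.vcf.gz
# Index VCF (-t writes a .tbi; the default is .csi)
bcftools index -t variants.vcf.gz
# Flag low-quality variants (FILTER=LowQual) and write a compressed VCF
bcftools filter -s LowQual -e 'QUAL<20' -Oz -o variants.filtered.vcf.gz variants.vcf.gz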
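# Extract region (requires the BAM index)
samtools view -b aligned.bam chr1:1000000-2000000 > region.bam
# Convert BAM to FASTQ (collate by read name first so mates stay paired)
samtools collate -u -O aligned.bam | \
  samtools fastq -1 R1.fq.gz -2 R2.fq.gz -0 /dev/null -s /dev/null -n
# Merge BAMs
samtools merge merged.bam sample1.bam sample2.bam
# Subset VCF by region (requires the VCF index)
bcftools view -r chr1:1000-2000 variants.vcf.gz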
Data access:
- Only reads files the user explicitly provides as input
- Writes outputs to directories the user specifies
- Stores preferences in ~/bioinformatics/ (with consent)

Data that stays local:
- All sequence data is processed locally
- No external API calls for analysis
- Pipeline configs stay in ~/bioinformatics/

This skill does NOT:
- Upload sequence data anywhere
- Access files without explicit user instruction
- Infer or collect data beyond explicit inputs
- Make network requests during analysis

Note: Installing tools (conda, brew) and downloading reference genomes require internet access. These are user-initiated actions.
Install with clawhub install <slug> if the user confirms:
- data-analysis: statistical interpretation
- statistics: hypothesis testing
- science: research methodology
If useful: clawhub star bioinformatics
Stay updated: clawhub sync