Requirements
- Target platform: OpenClaw
- Install method: Manual import
- Extraction: Extract archive
- Prerequisites: OpenClaw
- Primary doc: SKILL.md
Convert voice notes, humming, and melodic audio recordings to quantized MIDI files using ML-based pitch detection and intelligent post-processing
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
Transform your voice memos, humming, and melodic recordings into clean, quantized MIDI files ready for your DAW.
This skill provides a complete audio-to-MIDI conversion pipeline:

- Stem Separation - Uses HPSS (Harmonic-Percussive Source Separation) to isolate melodic content from drums, noise, and background sounds
- ML-Powered Pitch Detection - Leverages Spotify's Basic Pitch model for accurate fundamental frequency extraction
- Key Detection - Automatically detects the musical key of your recording using Krumhansl-Kessler key profiles
- Intelligent Quantization - Snaps notes to a configurable timing grid with optional key-aware pitch correction
- Post-Processing - Applies octave pruning, overlap-based harmonic removal, and legato note merging for clean output
```
Audio Input (WAV/M4A/MP3)
                   │
┌──────────────────────────────────────┐
│ Step 1: Stem Separation (HPSS)       │
│  - Isolate harmonic content          │
│  - Remove drums/percussion           │
│  - Noise gating                      │
└──────────────────────────────────────┘
                   │
┌──────────────────────────────────────┐
│ Step 2: Pitch Detection              │
│  - Basic Pitch ML model (Spotify)    │
│  - Polyphonic note detection         │
│  - Onset/offset estimation           │
└──────────────────────────────────────┘
                   │
┌──────────────────────────────────────┐
│ Step 3: Analysis                     │
│  - Pitch class distribution          │
│  - Key detection                     │
│  - Dominant note identification      │
└──────────────────────────────────────┘
                   │
┌──────────────────────────────────────┐
│ Step 4: Quantization & Cleanup       │
│  - Timing grid snap                  │
│  - Key-aware pitch correction        │
│  - Octave pruning (harmonic removal) │
│  - Overlap-based pruning             │
│  - Note merging (legato)             │
│  - Velocity normalization            │
└──────────────────────────────────────┘
                   │
MIDI Output (Standard MIDI File)
```
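Under the hood, the first two stages map onto a few library calls. Here is a minimal sketch of stem separation plus Basic Pitch inference (not the actual hum2midi script), assuming basic-pitch, librosa, and soundfile are installed; the file names are placeholders:

```python
# Minimal sketch of Steps 1-2 (not the actual hum2midi script).
# Assumes: pip install basic-pitch librosa soundfile
# File names are placeholders.
import librosa
import soundfile as sf
from basic_pitch import ICASSP_2022_MODEL_PATH
from basic_pitch.inference import predict

# Step 1: HPSS -- keep the harmonic (melodic) component, drop percussion
y, sr = librosa.load("my_humming.wav", sr=None, mono=True)
y_harmonic, _ = librosa.effects.hpss(y)
sf.write("melody_stem.wav", y_harmonic, sr)

# Step 2: Basic Pitch -- ML pitch detection on the isolated stem
model_output, midi_data, note_events = predict(
    "melody_stem.wav", ICASSP_2022_MODEL_PATH
)
midi_data.write("raw_output.mid")  # midi_data is a pretty_midi.PrettyMIDI
```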
- Python 3.11+ (Python 3.14+ recommended)
- FFmpeg (for audio format support)
- pip
Quick Install (Recommended):

```bash
cd /path/to/voice-note-to-midi
./setup.sh
```

This automated script will:

- Check Python 3.11+ is installed
- Create the ~/melody-pipeline directory
- Set up the virtual environment
- Install all dependencies (basic-pitch, librosa, music21, etc.)
- Download and configure the hum2midi script
- Add melody-pipeline to your PATH

Manual Install:

If you prefer manual setup:

```bash
mkdir -p ~/melody-pipeline
cd ~/melody-pipeline
python3 -m venv venv-bp
source venv-bp/bin/activate
pip install basic-pitch librosa soundfile mido music21
chmod +x ~/melody-pipeline/hum2midi
```

Add to your PATH (optional):

```bash
echo 'export PATH="$HOME/melody-pipeline:$PATH"' >> ~/.bashrc
source ~/.bashrc
```
```bash
cd ~/melody-pipeline
./hum2midi --help
```
Convert a voice memo to MIDI:

```bash
./hum2midi my_humming.wav
```

This creates `my_humming.mid` with 16th-note quantization.
```bash
./hum2midi input.wav output.mid
```
| Option | Description | Default |
| --- | --- | --- |
| `--grid <value>` | Quantization grid: 1/4, 1/8, 1/16, 1/32 | 1/16 |
| `--min-note <ms>` | Minimum note duration in milliseconds | 50 |
| `--no-quantize` | Skip quantization (output raw Basic Pitch MIDI) | disabled |
| `--key-aware` | Enable key-aware pitch correction | disabled |
| `--no-analysis` | Skip pitch analysis and key detection | disabled |
```bash
# Quantize to eighth notes
./hum2midi melody.wav --grid 1/8

# Key-aware quantization (recommended for tonal music)
./hum2midi song.wav --key-aware

# Require longer minimum notes
./hum2midi humming.wav --min-note 100

# Skip analysis for faster processing
./hum2midi quick.wav --no-analysis

# Combine options
./hum2midi recording.wav output.mid --grid 1/8 --key-aware --min-note 80
```
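For intuition, key-aware correction amounts to snapping each note's pitch class onto the detected scale. A hedged sketch using pretty_midi (installed as a dependency of basic-pitch); the hard-coded G major scale and file names are illustrative assumptions, not the script's internals:

```python
# Hedged sketch of key-aware correction: snap each pitch to the nearest
# in-scale tone. The G major scale and file names are illustrative
# assumptions; the real script uses its own detected key.
import pretty_midi

G_MAJOR = {7, 9, 11, 0, 2, 4, 6}  # pitch classes: G A B C D E F#

def snap_to_scale(pitch: int, scale: set) -> int:
    """Shift a MIDI pitch by the smallest amount that lands in the scale."""
    for offset in (0, -1, 1, -2, 2):
        if (pitch + offset) % 12 in scale:
            return pitch + offset
    return pitch

pm = pretty_midi.PrettyMIDI("output.mid")
for inst in pm.instruments:
    for note in inst.notes:
        note.pitch = snap_to_scale(note.pitch, G_MAJOR)
pm.write("output_in_key.mid")
```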
You can also process existing MIDI files through the quantization pipeline:

```bash
./hum2midi input.mid output.mid --grid 1/16 --key-aware
```

This skips the audio processing steps and goes directly to analysis and quantization.
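Grid quantization itself is simple to sketch. A minimal illustration with pretty_midi, assuming the fixed 120 BPM tempo the pipeline writes, so a 1/16 grid cell is 0.125 s; the real script works in MIDI ticks and also runs its pruning and merging passes:

```python
# Minimal sketch of timing-grid quantization with pretty_midi (installed
# alongside basic-pitch). Assumes the pipeline's fixed 120 BPM, so a
# 1/16 grid cell is 0.125 s.
import pretty_midi

GRID = 60.0 / 120 / 4  # one sixteenth note at 120 BPM = 0.125 s

pm = pretty_midi.PrettyMIDI("input.mid")
for inst in pm.instruments:
    for note in inst.notes:
        note.start = round(note.start / GRID) * GRID
        # keep at least one grid cell of length after snapping
        note.end = max(note.start + GRID, round(note.end / GRID) * GRID)
pm.write("quantized.mid")
```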
```
───────────────────────────────────────────────────────────────
 hum2midi - Melody-to-MIDI Pipeline (Basic Pitch Edition)
 [Key-Aware Mode Enabled]
───────────────────────────────────────────────────────────────
 Input:  my_humming.wav
 Output: my_humming.mid

▶ Step 1: Stem Separation (HPSS)
  Isolating melodic content...
  Loaded: 5.23s @ 44100Hz
  ✓ Melody stem extracted → 5.23s

▶ Step 2: Audio-to-MIDI Conversion (Basic Pitch)
  Running Spotify's Basic Pitch ML model on melody stem...
  ✓ Raw MIDI generated (Basic Pitch)

▶ Step 3: Pitch Analysis & Key Detection
  Notes detected: 42 total, 7 unique
  Note range: C3 - G4
  Pitch classes: C3, E3, G3, A3, C4, D4, G4
  Dominant note: G3 (23.8% of notes)
  Detected key: G major

▶ Step 4: Quantization & Cleanup
  Octave pruning: removed 3 harmonic notes above 67 (median+12)
  Overlap pruning: removed 2 harmonic notes at overlapping positions
  Note merging: merged 5 staccato chunks into legato notes (gap<=60 ticks)
  Grid: 240 ticks (1/16)
  Notes: 38 notes
  Key: G major
  Key-aware: 2 notes corrected to scale
  Tempo: 120 BPM
  ✓ Quantized MIDI saved
───────────────────────────────────────────────────────────────
 ✓ Done! Output: my_humming.mid
───────────────────────────────────────────────────────────────

📊 ANALYSIS SUMMARY
───────────────────────────────────────────────────────────────
  Detected Notes: C3, E3, G3, A3, C4, D4, G4
  Detected Key:   G major
  Quantization:   Key-aware mode (notes snapped to scale)
  MIDI Info:      38 notes, 7 unique pitches, 120 BPM
  Pitches:        C3, E3, G3, A3, C4, D4, G4
```
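The octave-pruning line in the log above reflects a simple heuristic: notes far above the median pitch are usually harmonics detected alongside the fundamental. A sketch of that idea (an approximation, not the script's exact code; file names are placeholders):

```python
# Sketch of the octave-pruning heuristic from the log above: drop notes
# more than an octave above the median pitch, which are usually
# harmonics detected alongside the fundamental. An approximation of the
# script's rule, not its exact code; file names are placeholders.
import statistics
import pretty_midi

pm = pretty_midi.PrettyMIDI("raw_output.mid")
for inst in pm.instruments:
    if not inst.notes:
        continue
    median = statistics.median(n.pitch for n in inst.notes)
    inst.notes = [n for n in inst.notes if n.pitch <= median + 12]
pm.write("pruned.mid")
```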
- Clear, loud melody produces the best results
- Background noise can cause false note detection
- Reverb and effects may confuse pitch detection
- Close-mic'd vocals work significantly better than room recordings
- Monophonic sources work best (single melody line)
- Polyphonic audio (chords, multiple instruments) will produce messy results
- Vibrato and pitch bends may be quantized to stepped pitches
- Rapid note passages may be missed or merged
- Tempo is fixed at 120 BPM in the output (time positions are preserved, but tempo may need adjustment in your DAW)
- Note velocities are normalized but may need manual adjustment
- Very short notes (<50ms) may be filtered out by default
- Extreme pitch ranges may cause octave detection issues
After generating MIDI, you may want to:

- Import into your DAW and adjust tempo to match your original recording (a script-based alternative is sketched below)
- Quantize further if stricter timing is needed
- Adjust note velocities for dynamics
- Apply swing/groove templates if the rigid grid sounds too mechanical
- Edit individual notes that were misdetected (common with fast runs)
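If you'd rather correct the tempo in the file itself, mido (one of the installed dependencies) can rewrite the tempo meta message. A sketch; the 95 BPM value and file names are examples. Because note times are stored in ticks, this keeps notes on the grid while rescaling playback speed:

```python
# Rewrite the tempo meta message with mido (an installed dependency).
# 95 BPM and the file names are example values.
import mido

mid = mido.MidiFile("my_humming.mid")
for track in mid.tracks:
    for i, msg in enumerate(track):
        if msg.type == "set_tempo":
            track[i] = mido.MetaMessage(
                "set_tempo", tempo=mido.bpm2tempo(95), time=msg.time
            )
mid.save("my_humming_95bpm.mid")
```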
Input formats supported via FFmpeg:

- WAV, AIFF, FLAC (uncompressed, best quality)
- MP3, M4A, AAC (compressed, acceptable)
- OGG, OPUS (open source formats)
- Most other formats FFmpeg supports
- Check that the input file isn't silent or corrupted
- Try increasing the `--min-note` threshold
- Verify the audio has clear melodic content (not just noise)
- Enable octave pruning and overlap pruning (on by default)
- Use `--key-aware` to constrain notes to the musical scale
- Check for background noise in the source audio
- Key detection works best with at least 8-10 measures of music
- Chromatic passages may confuse the detector
- Manually review and adjust in your DAW if needed
- Basic Pitch sometimes detects harmonics instead of fundamentals
- The pipeline includes pruning, but some harmonics may slip through
- Use your DAW's transpose function for simple octave shifts
- Basic Pitch - Spotify's polyphonic pitch detection model
- librosa HPSS - Harmonic-Percussive Source Separation
- Krumhansl-Kessler Key Profiles - Key detection algorithm
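For reference, Krumhansl-Kessler key detection correlates a pitch-class histogram of the detected notes against the 24 rotated major/minor profiles and picks the best match. A self-contained sketch (duration weighting is an assumption; the script may count notes instead):

```python
# Sketch of Krumhansl-Kessler key detection: correlate a pitch-class
# histogram against the 24 rotated K-K profiles. Duration weighting is
# an assumption; the script may weight by note counts instead.
import numpy as np
import pretty_midi

MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

pm = pretty_midi.PrettyMIDI("raw_output.mid")
hist = np.zeros(12)
for inst in pm.instruments:
    for note in inst.notes:
        hist[note.pitch % 12] += note.end - note.start

# np.roll(profile, t) aligns the profile's tonic with pitch class t
best = max(
    ((np.corrcoef(hist, np.roll(profile, tonic))[0, 1], NAMES[tonic], mode)
     for profile, mode in ((MAJOR, "major"), (MINOR, "minor"))
     for tonic in range(12)),
    key=lambda t: t[0],
)
print(f"Detected key: {best[1]} {best[2]} (r={best[0]:.2f})")
```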
This skill integrates Basic Pitch by Spotify, which is licensed under Apache 2.0. The pipeline script and documentation are provided under MIT license.