# Send Voice Note To Midi to your agent
Hand the extracted package to your coding agent with a concrete install brief instead of figuring it out manually.
## Fast path
- Download the package from Yavira.
- Extract it into a folder your agent can access.
- Paste one of the prompts below and point your agent at the extracted folder.
## Suggested prompts
### New install

```text
I downloaded a skill package from Yavira. Read SKILL.md from the extracted folder and install it by following the included instructions. Then review README.md for any prerequisites, environment setup, or post-install checks. Tell me what you changed and call out any manual steps you could not complete.
```
### Upgrade existing

```text
I downloaded an updated skill package from Yavira. Read SKILL.md from the extracted folder, compare it with my current installation, and upgrade it while preserving any custom configuration unless the package docs explicitly say otherwise. Then review README.md for any prerequisites, environment setup, or post-install checks. Summarize what changed and any follow-up checks I should run.
```
## Machine-readable fields
```json
{
  "schemaVersion": "1.0",
  "item": {
    "slug": "voice-note-to-midi",
    "name": "Voice Note To Midi",
    "source": "tencent",
    "type": "skill",
    "category": "AI 智能",
    "sourceUrl": "https://clawhub.ai/DanBennettUK/voice-note-to-midi",
    "canonicalUrl": "https://clawhub.ai/DanBennettUK/voice-note-to-midi",
    "targetPlatform": "OpenClaw"
  },
  "install": {
    "downloadUrl": "/downloads/voice-note-to-midi",
    "sourceDownloadUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=voice-note-to-midi",
    "sourcePlatform": "tencent",
    "targetPlatform": "OpenClaw",
    "packageFormat": "ZIP package",
    "primaryDoc": "SKILL.md",
    "includedAssets": [
      "QUICKSTART.md",
      "README.md",
      "SKILL.md",
      "setup.sh"
    ],
    "downloadMode": "redirect",
    "sourceHealth": {
      "source": "tencent",
      "slug": "voice-note-to-midi",
      "status": "healthy",
      "reason": "direct_download_ok",
      "recommendedAction": "download",
      "checkedAt": "2026-05-04T10:33:54.540Z",
      "expiresAt": "2026-05-11T10:33:54.540Z",
      "httpStatus": 200,
      "finalUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=voice-note-to-midi",
      "contentType": "application/zip",
      "probeMethod": "head",
      "details": {
        "probeUrl": "https://wry-manatee-359.convex.site/api/v1/download?slug=voice-note-to-midi",
        "contentDisposition": "attachment; filename=\"voice-note-to-midi-0.1.0.zip\"",
        "redirectLocation": null,
        "bodySnippet": null,
        "slug": "voice-note-to-midi"
      },
      "scope": "item",
      "summary": "Item download looks usable.",
      "detail": "Yavira can redirect you to the upstream package for this item.",
      "primaryActionLabel": "Download for OpenClaw",
      "primaryActionHref": "/downloads/voice-note-to-midi"
    },
    "validation": {
      "installChecklist": [
        "Use the Yavira download entry.",
        "Review SKILL.md after the package is downloaded.",
        "Confirm the extracted package contains the expected setup assets."
      ],
      "postInstallChecks": [
        "Confirm the extracted package includes the expected docs or setup files.",
        "Validate the skill or prompts are available in your target agent workspace.",
        "Capture any manual follow-up steps the agent could not complete."
      ]
    }
  },
  "links": {
    "detailUrl": "https://openagent3.xyz/skills/voice-note-to-midi",
    "downloadUrl": "https://openagent3.xyz/downloads/voice-note-to-midi",
    "agentUrl": "https://openagent3.xyz/skills/voice-note-to-midi/agent",
    "manifestUrl": "https://openagent3.xyz/skills/voice-note-to-midi/agent.json",
    "briefUrl": "https://openagent3.xyz/skills/voice-note-to-midi/agent.md"
  }
}
```
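The `installChecklist` entry "Confirm the extracted package contains the expected setup assets" can be automated with a few lines of Python. This is a minimal sketch: the asset names are copied from `includedAssets` above, while the function name `missing_assets` and the assumption that the files sit at the top level of the extracted folder are illustrative, not part of the package.

```python
from pathlib import Path

# Asset names copied from the "includedAssets" field of the manifest above.
EXPECTED_ASSETS = ["QUICKSTART.md", "README.md", "SKILL.md", "setup.sh"]


def missing_assets(folder, expected=EXPECTED_ASSETS):
    """Return the expected file names that are not present in `folder`.

    Assumption: assets live at the top level of the extracted folder.
    """
    root = Path(folder)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Hypothetical path; replace with wherever you extracted the ZIP.
    gaps = missing_assets("/path/to/voice-note-to-midi")
    print("all assets present" if not gaps else f"missing: {gaps}")
```

An empty list means every listed asset was found; anything else is worth reporting back before the agent runs `setup.sh`.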
## Documentation

### 🎵 Voice Note to MIDI

Transform your voice memos, humming, and melodic recordings into clean, quantized MIDI files ready for your DAW.

### What It Does

This skill provides a complete audio-to-MIDI conversion pipeline that:

- **Stem Separation**: Uses HPSS (Harmonic-Percussive Source Separation) to isolate melodic content from drums, noise, and background sounds
- **ML-Powered Pitch Detection**: Leverages Spotify's Basic Pitch model for accurate fundamental frequency extraction
- **Key Detection**: Automatically detects the musical key of your recording using Krumhansl-Kessler key profiles
- **Intelligent Quantization**: Snaps notes to a configurable timing grid with optional key-aware pitch correction
- **Post-Processing**: Applies octave pruning, overlap-based harmonic removal, and legato note merging for clean output

### Pipeline Architecture

```text
Audio Input (WAV/M4A/MP3)
    ↓
┌─────────────────────────────────────┐
│ Step 1: Stem Separation (HPSS)      │
│ - Isolate harmonic content          │
│ - Remove drums/percussion           │
│ - Noise gating                      │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ Step 2: Pitch Detection             │
│ - Basic Pitch ML model (Spotify)    │
│ - Polyphonic note detection         │
│ - Onset/offset estimation           │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ Step 3: Analysis                    │
│ - Pitch class distribution          │
│ - Key detection                     │
│ - Dominant note identification      │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ Step 4: Quantization & Cleanup      │
│ - Timing grid snap                  │
│ - Key-aware pitch correction        │
│ - Octave pruning (harmonic removal) │
│ - Overlap-based pruning             │
│ - Note merging (legato)             │
│ - Velocity normalization            │
└─────────────────────────────────────┘
    ↓
MIDI Output (Standard MIDI File)
```
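To make Step 3 more concrete, here is a minimal sketch of key detection with Krumhansl-Kessler profiles: build a pitch class histogram, correlate it against the major and minor profiles rotated to each of the 12 tonics, and keep the best match. The profile values are the published Krumhansl-Kessler ratings; the function names and the choice to weight by note count rather than duration are assumptions, since the source does not show how `hum2midi` implements this step.

```python
from math import sqrt

# Krumhansl-Kessler probe-tone profiles; index 0 is the tonic.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class_histogram(midi_pitches):
    """Count how often each of the 12 pitch classes occurs."""
    hist = [0] * 12
    for pitch in midi_pitches:
        hist[pitch % 12] += 1
    return hist


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def detect_key(midi_pitches):
    """Return (key name, correlation) for the best-matching key."""
    hist = pitch_class_histogram(midi_pitches)
    best = ("", -2.0)
    for tonic in range(12):
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            # Rotate the profile so its tonic lines up with this pitch class.
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            score = pearson(hist, rotated)
            if score > best[1]:
                best = (f"{NAMES[tonic]} {mode}", score)
    return best


if __name__ == "__main__":
    melody = [60, 60, 60, 62, 64, 64, 65, 67, 67, 69, 71]
    print(detect_key(melody)[0])
```

Short or chromatic melodies give flat histograms, so several keys score similarly; a duration-weighted histogram is a common refinement.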

### Prerequisites

- Python 3.11+ (Python 3.14+ recommended)
- FFmpeg (for audio format support)
- pip
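Before running the installer, an agent can check these prerequisites programmatically. The sketch below uses only the standard library; the function name `check_prerequisites` and the decision to accept either `pip` or `pip3` on `PATH` are illustrative assumptions, not behavior taken from `setup.sh`.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 11)):
    """Report whether the interpreter and required tools look usable."""
    return {
        # Compare only (major, minor) against the required minimum.
        "python_ok": sys.version_info[:2] >= min_python,
        # shutil.which returns None when the executable is not on PATH.
        "ffmpeg_found": shutil.which("ffmpeg") is not None,
        "pip_found": shutil.which("pip") is not None
        or shutil.which("pip3") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```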

### Installation

**Quick Install (Recommended):**

```bash
cd /path/to/voice-note-to-midi
./setup.sh
```

This automated script will:

- Check that Python 3.11+ is installed
- Create the `~/melody-pipeline` directory
- Set up the virtual environment
- Install all dependencies (basic-pitch, librosa, music21, etc.)
- Download and configure the `hum2midi` script
- Add `melody-pipeline` to your PATH

**Manual Install:**

If you prefer manual setup, run the commands below. They assume the `hum2midi` script is already in `~/melody-pipeline`; the manual steps do not download it.

```bash
mkdir -p ~/melody-pipeline
cd ~/melody-pipeline
python3 -m venv venv-bp
source venv-bp/bin/activate
pip install basic-pitch librosa soundfile mido music21
chmod +x ~/melody-pipeline/hum2midi
```

Add to your PATH (optional):

```bash
echo 'export PATH="$HOME/melody-pipeline:$PATH"' >> ~/.bashrc
source ~/.bashrc
```

### Verify Installation

```bash
cd ~/melody-pipeline
./hum2midi --help
```

### Basic Usage

Convert a voice memo to MIDI:

```bash
./hum2midi my_humming.wav
```

This creates my_humming.mid with 16th-note quantization.

### Specify Output File

```bash
./hum2midi input.wav output.mid
```

### Command-Line Options

| Option | Description | Default |
| --- | --- | --- |
| `--grid <value>` | Quantization grid: 1/4, 1/8, 1/16, 1/32 | 1/16 |
| `--min-note <ms>` | Minimum note duration in milliseconds | 50 |
| `--no-quantize` | Skip quantization (output raw Basic Pitch MIDI) | disabled |
| `--key-aware` | Enable key-aware pitch correction | disabled |
| `--no-analysis` | Skip pitch analysis and key detection | disabled |
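To illustrate what a `--grid` value means in MIDI terms, the sketch below converts a grid string to ticks and snaps note boundaries to that grid. It assumes 960 ticks per quarter note, which is consistent with the "240 ticks (1/16)" line in the sample output on this page; the function names, the `(start, end, pitch)` tuple layout, and the rounding rule are illustrative and may differ from what `hum2midi` does internally.

```python
from fractions import Fraction

PPQ = 960  # assumed ticks per quarter note (gives 240 ticks for a 1/16 grid)


def grid_ticks(grid, ppq=PPQ):
    """Convert a grid string such as "1/16" into a tick count."""
    fraction = Fraction(grid)       # "1/16" -> 1/16 of a whole note
    return int(fraction * 4 * ppq)  # a whole note spans four quarter notes


def snap(tick, step):
    """Round a tick position to the nearest grid line (halves round up)."""
    return (tick + step // 2) // step * step


def quantize(notes, grid="1/16", ppq=PPQ):
    """Snap (start, end, pitch) tuples to the grid, keeping notes non-empty."""
    step = grid_ticks(grid, ppq)
    quantized = []
    for start, end, pitch in notes:
        q_start = snap(start, step)
        # Guarantee at least one grid step so a note never collapses to zero.
        q_end = max(snap(end, step), q_start + step)
        quantized.append((q_start, q_end, pitch))
    return quantized


if __name__ == "__main__":
    print(grid_ticks("1/16"))
    print(quantize([(250, 700, 60), (950, 1000, 62)], grid="1/8"))
```

Integer arithmetic is used in `snap` to avoid the round-half-to-even behavior of Python's built-in `round`.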

### Usage Examples

Quantize to eighth notes:

```bash
./hum2midi melody.wav --grid 1/8
```

Key-aware quantization (recommended for tonal music):

```bash
./hum2midi song.wav --key-aware
```

Require longer minimum notes:

```bash
./hum2midi humming.wav --min-note 100
```

Skip analysis for faster processing:

```bash
./hum2midi quick.wav --no-analysis
```

Combine options:

```bash
./hum2midi recording.wav output.mid --grid 1/8 --key-aware --min-note 80
```
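The idea behind `--key-aware` can be sketched as snapping each out-of-scale pitch to its nearest scale tone. In a major or natural minor scale every out-of-scale note sits exactly one semitone from a scale tone on both sides, so a tie-break rule is required; this sketch prefers the lower neighbor, which is an assumption. The source does not document how `hum2midi` resolves ties or which minor scale form it uses.

```python
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)
MINOR_STEPS = (0, 2, 3, 5, 7, 8, 10)  # natural minor (assumed scale form)


def scale_pitch_classes(tonic_pc, mode="major"):
    """Return the set of pitch classes (0-11) belonging to the key."""
    steps = MAJOR_STEPS if mode == "major" else MINOR_STEPS
    return {(tonic_pc + step) % 12 for step in steps}


def snap_to_scale(pitch, tonic_pc, mode="major"):
    """Move a MIDI pitch to the nearest in-scale pitch.

    Ties are resolved downward (assumed policy).
    """
    allowed = scale_pitch_classes(tonic_pc, mode)
    for offset in (0, -1, 1):
        candidate = pitch + offset
        if candidate % 12 in allowed:
            return candidate
    return pitch  # unreachable for these scales; kept as a safe fallback


if __name__ == "__main__":
    g_major_tonic = 7  # pitch class of G
    print([snap_to_scale(p, g_major_tonic) for p in (64, 65, 66, 67)])
```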

### Processing MIDI Input

You can also process existing MIDI files through the quantization pipeline:

```bash
./hum2midi input.mid output.mid --grid 1/16 --key-aware
```

This skips the audio processing steps and goes directly to analysis and quantization.

### Sample Output

```text
═══════════════════════════════════════════════════════════════
  hum2midi - Melody-to-MIDI Pipeline (Basic Pitch Edition)
  [Key-Aware Mode Enabled]
═══════════════════════════════════════════════════════════════

Input:  my_humming.wav
Output: my_humming.mid

→ Step 1: Stem Separation (HPSS)
  Isolating melodic content...
  Loaded: 5.23s @ 44100Hz
  ✓ Melody stem extracted → 5.23s

→ Step 2: Audio-to-MIDI Conversion (Basic Pitch)
  Running Spotify's Basic Pitch ML model on melody stem...
  ✓ Raw MIDI generated (Basic Pitch)

→ Step 3: Pitch Analysis & Key Detection
  Notes detected: 42 total, 7 unique
  Note range: C3 - G4
  Pitch classes: C3, E3, G3, A3, C4, D4, G4
  Dominant note: G3 (23.8% of notes)
  Detected key: G major

→ Step 4: Quantization & Cleanup
  Octave pruning: removed 3 harmonic notes above 67 (median+12)
  Overlap pruning: removed 2 harmonic notes at overlapping positions
  Note merging: merged 5 staccato chunks into legato notes (gap<=60 ticks)
  Grid:   240 ticks (1/16)
  Notes:  38 notes
  Key:    G major
  Key-aware: 2 notes corrected to scale
  Tempo:  120 BPM
  ✓ Quantized MIDI saved

═══════════════════════════════════════════════════════════════
  ✓ Done! Output: my_humming.mid
═══════════════════════════════════════════════════════════════

📊 ANALYSIS SUMMARY
─────────────────────────────────────────────────────────────
  Detected Notes: C3, E3, G3, A3, C4, D4, G4
  Detected Key:   G major
  Quantization:   Key-aware mode (notes snapped to scale)

MIDI Info: 38 notes, 7 unique pitches, 120 BPM
Pitches: C3, E3, G3, A3, C4, D4, G4
```
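Two of the Step 4 log lines hint at simple rules: octave pruning drops notes above "median+12", and note merging joins chunks when "gap<=60 ticks". The sketch below implements those two rules as read from the log. The function names, the `(start, end, pitch)` tuple layout, and the restriction of merging to consecutive same-pitch notes are assumptions; the actual `hum2midi` logic may be more involved.

```python
from statistics import median


def prune_octaves(notes, span=12):
    """Drop notes more than `span` semitones above the median pitch."""
    if not notes:
        return []
    limit = median(pitch for _, _, pitch in notes) + span
    return [note for note in notes if note[2] <= limit]


def merge_legato(notes, max_gap=60):
    """Join consecutive same-pitch notes separated by at most `max_gap` ticks."""
    merged = []
    for start, end, pitch in sorted(notes):
        previous = merged[-1] if merged else None
        if previous and previous[2] == pitch and start - previous[1] <= max_gap:
            # Extend the previous note instead of starting a new one.
            merged[-1] = (previous[0], max(previous[1], end), pitch)
        else:
            merged.append((start, end, pitch))
    return merged


if __name__ == "__main__":
    # Three G3 chunks plus one stray G5 that looks like a detected harmonic.
    raw = [(0, 200, 55), (240, 480, 55), (600, 700, 55), (700, 900, 79)]
    print(merge_legato(prune_octaves(raw)))
```

With a median of 55 (G3) the cutoff is 67, matching the threshold shown in the sample log.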

### Audio Quality Matters

- A clear, loud melody produces the best results
- Background noise can cause false note detection
- Reverb and effects may confuse pitch detection
- Close-mic'd vocals work significantly better than room recordings

### Musical Considerations

- Monophonic sources work best (single melody line)
- Polyphonic audio (chords, multiple instruments) will produce messy results
- Vibrato and pitch bends may be quantized to stepped pitches
- Rapid note passages may be missed or merged

### Technical Limitations

- Tempo is fixed at 120 BPM in the output (time positions are preserved, but tempo may need adjustment in your DAW)
- Note velocities are normalized but may need manual adjustment
- Very short notes (<50 ms) are filtered out by default (see `--min-note`)
- Extreme pitch ranges may cause octave detection issues
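Because the output tempo is fixed at 120 BPM, a millisecond threshold such as `--min-note 50` maps to a fixed number of ticks. The sketch below shows that conversion and a matching filter. The 960 ticks-per-quarter-note resolution is an assumption inferred from the "240 ticks (1/16)" line in the sample output, and the function names are illustrative.

```python
def ticks_to_ms(ticks, bpm=120, ppq=960):
    """Convert a tick count to milliseconds at a constant tempo."""
    # One quarter note lasts 60000 / bpm milliseconds and spans ppq ticks.
    return ticks * 60_000 / (bpm * ppq)


def drop_short_notes(notes, min_ms=50, bpm=120, ppq=960):
    """Keep only (start, end, pitch) notes lasting at least `min_ms`."""
    return [
        (start, end, pitch)
        for start, end, pitch in notes
        if ticks_to_ms(end - start, bpm, ppq) >= min_ms
    ]


if __name__ == "__main__":
    notes = [(0, 90, 60), (100, 200, 62)]  # 90 ticks is under 50 ms here
    print(drop_short_notes(notes))
```

Under these assumptions 50 ms corresponds to 96 ticks, so a 90-tick note is dropped and a 100-tick note is kept.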

### Post-Processing Recommendations

After generating MIDI, you may want to:

- Import into your DAW and adjust tempo to match your original recording
- Quantize further if stricter timing is needed
- Adjust note velocities for dynamics
- Apply swing/groove templates if the rigid grid sounds too mechanical
- Edit individual notes that were misdetected (common with fast runs)

### Supported Audio Formats

Input formats supported via FFmpeg:

- WAV, AIFF, FLAC (lossless, best quality)
- MP3, M4A, AAC (lossy compression, acceptable)
- OGG, OPUS (open formats)
- Most other formats FFmpeg supports
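`hum2midi` is documented as accepting these formats directly, so pre-converting is not required. When a particular file misbehaves, though, it can help to transcode it to mono WAV first and rule out decoding problems. The sketch below builds a standard FFmpeg command for that; the helper names and the 44100 Hz default are illustrative choices, not settings taken from the pipeline.

```python
import shutil
import subprocess
from pathlib import Path


def ffmpeg_to_wav_command(src, dst=None, sample_rate=44100):
    """Build an ffmpeg command that converts `src` to a mono WAV file."""
    src = Path(src)
    dst = Path(dst) if dst else src.with_suffix(".wav")
    return [
        "ffmpeg", "-y",            # overwrite the output without prompting
        "-i", str(src),            # input file
        "-ac", "1",                # downmix to one channel
        "-ar", str(sample_rate),   # output sample rate in Hz
        str(dst),
    ]


def convert_to_wav(src, dst=None):
    """Run the conversion and return the output path as a string."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    command = ffmpeg_to_wav_command(src, dst)
    subprocess.run(command, check=True)
    return command[-1]


if __name__ == "__main__":
    print(" ".join(ffmpeg_to_wav_command("memo.m4a")))
```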

### No notes detected

- Check that the input file isn't silent or corrupted
- Try lowering the `--min-note` threshold so short notes are not filtered out
- Verify the audio has clear melodic content (not just noise)

### Too many notes / messy output

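- Octave pruning and overlap pruning are on by default; confirm you are not using `--no-quantize`, which outputs the raw Basic Pitch MIDI
- Use `--key-aware` to constrain notes to the detected scale
- Check for background noise in the source audio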

### Wrong key detected

- Key detection works best with at least 8-10 measures of music
- Chromatic passages may confuse the detector
- Manually review and adjust in your DAW if needed

### Notes in wrong octave

- Basic Pitch sometimes detects harmonics instead of fundamentals
- The pipeline includes pruning, but some harmonics may slip through
- Use your DAW's transpose function for simple octave shifts

### References

- Basic Pitch: Spotify's polyphonic pitch detection model
- librosa HPSS: Harmonic-Percussive Source Separation
- Krumhansl-Kessler Key Profiles: key detection algorithm

### License

This skill integrates Basic Pitch by Spotify, which is licensed under Apache 2.0. The pipeline script and documentation are provided under MIT license.
## Trust
- Source: tencent
- Verification: Indexed source record
- Publisher: DanBennettUK
- Version: 0.1.0
## Source health
- Status: healthy
- Item download looks usable.
- Yavira can redirect you to the upstream package for this item.
- Health scope: item
- Reason: direct_download_ok
- Checked at: 2026-05-04T10:33:54.540Z
- Expires at: 2026-05-11T10:33:54.540Z
- Recommended action: Download for OpenClaw
## Links
- [Detail page](https://openagent3.xyz/skills/voice-note-to-midi)
- [Send to Agent page](https://openagent3.xyz/skills/voice-note-to-midi/agent)
- [JSON manifest](https://openagent3.xyz/skills/voice-note-to-midi/agent.json)
- [Markdown brief](https://openagent3.xyz/skills/voice-note-to-midi/agent.md)
- [Download page](https://openagent3.xyz/downloads/voice-note-to-midi)