# Whisper Voice Memo Transcription for Obsidian

## Overview

A simple, free, local transcription setup that:

- Uses OpenAI Whisper (large-v3 model) for high-quality transcription
- Handles French-Canadian accents and English seamlessly
- Auto-detects language switches mid-sentence
- Outputs formatted Markdown notes to Obsidian
- Runs in the conda environment `test_env`

---

## Configuration

| Setting | Value |
|---------|-------|
| Conda Environment | `test_env` |
| Output Directory | `C:\Users\antoi\antoine\My Libraries\Antoine Brain Extension\+\Transcripts` |
| Model | `openai/whisper-large-v3` |
| Supported Formats | mp3, m4a, wav, ogg, flac, webm |
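
The supported-format list can be checked before a file is handed to Whisper. A minimal sketch, assuming a hypothetical helper named `is_supported` that is not part of the scripts in this document:

```python
from pathlib import Path

# Extensions taken from the "Supported Formats" row of the table
SUPPORTED_FORMATS = {".mp3", ".m4a", ".wav", ".ogg", ".flac", ".webm"}


def is_supported(audio_path: str) -> bool:
    """Return True if the file extension is one of the supported formats."""
    return Path(audio_path).suffix.lower() in SUPPORTED_FORMATS


print(is_supported("memo.M4A"))    # True (the comparison ignores case)
print(is_supported("notes.docx"))  # False
```

The check looks only at the extension, so a mislabeled file would still pass it.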

---

## Installation

### Step 1: Activate conda environment

```bash
conda activate test_env
```

### Step 2: Install insanely-fast-whisper

```bash
pip install insanely-fast-whisper
```

### Step 3: Verify installation

```bash
insanely-fast-whisper --help
```
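
The same check can be made from Python, which is handy before launching a long transcription. A minimal sketch using the standard library's `shutil.which`; the function name `find_cli` is an assumption made for illustration:

```python
import shutil


def find_cli(name: str = "insanely-fast-whisper"):
    """Return the full path of the command if it is on PATH, else None."""
    return shutil.which(name)


path = find_cli()
if path:
    print(f"Found: {path}")
else:
    print("insanely-fast-whisper is not on PATH; is test_env activated?")
```

If the command is reported missing, the most likely cause is that the script was started outside the activated environment.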

---

## Batch Script: Transcribe.bat

Save this file to your Desktop or a convenient location.

**File:** `Transcribe.bat`

```batch
@echo off
:: Delayed expansion stays off so "!" in echoed text and paths is kept as-is
setlocal DisableDelayedExpansion

:: ============================================
:: CONFIGURATION - Edit these paths as needed
:: ============================================
set "OUTPUT_DIR=C:\Users\antoi\antoine\My Libraries\Antoine Brain Extension\+\Transcripts"
set "CONDA_ENV=test_env"
set "CONDA_PATH=C:\Users\antoi\anaconda3\Scripts\activate.bat"

:: ============================================
:: MAIN SCRIPT - No edits needed below
:: ============================================

:: Use the dragged file if there is one; otherwise ask for a path
set "AUDIO_FILE=%~1"
if not defined AUDIO_FILE (
    echo.
    echo ========================================
    echo   Voice Memo Transcriber
    echo ========================================
    echo.
    echo Drag an audio file onto this script!
    echo Or paste the full path below:
    echo.
    set /p "AUDIO_FILE=File path: "
)
if not defined AUDIO_FILE (
    echo ERROR: No audio file given.
    pause
    exit /b 1
)

:: Strip the quotes that "Copy as path" adds, then keep the bare file name
set "AUDIO_FILE=%AUDIO_FILE:"=%"
for %%F in ("%AUDIO_FILE%") do set "SOURCE_NAME=%%~nxF"

:: Generate a locale-independent timestamp (the built-in date and time
:: variables change format with the Windows region settings)
for /f %%i in ('powershell -NoProfile -Command "Get-Date -Format yyyy-MM-dd_HH-mm"') do set "STAMP=%%i"
set "NOTE_DATE=%STAMP:~0,10%"
set "NOTE_TIME=%STAMP:~11,2%:%STAMP:~14,2%"
set "NOTE_NAME=Voice Note %NOTE_DATE% %STAMP:~11,5%.md"
set "NOTE_PATH=%OUTPUT_DIR%\%NOTE_NAME%"
set "TEMP_FILE=%TEMP%\whisper_output.json"

echo.
echo ========================================
echo   Transcribing: %AUDIO_FILE%
echo   Output: %NOTE_NAME%
echo ========================================
echo.
echo This may take a few minutes for long recordings...
echo.

:: Activate conda environment and run whisper
:: (any transcript left over from an earlier run is removed first)
call "%CONDA_PATH%" %CONDA_ENV%
del "%TEMP_FILE%" 2>nul
insanely-fast-whisper --file-name "%AUDIO_FILE%" --transcript-path "%TEMP_FILE%" --model-name openai/whisper-large-v3

:: Check if transcription succeeded
if not exist "%TEMP_FILE%" (
    echo.
    echo ERROR: Transcription failed!
    echo Check that the audio file exists and is valid.
    echo.
    pause
    exit /b 1
)

:: Create markdown note with YAML frontmatter
:: (redirection comes first so echo writes no trailing space)
if not exist "%OUTPUT_DIR%" mkdir "%OUTPUT_DIR%"
> "%NOTE_PATH%" echo ---
>>"%NOTE_PATH%" echo created: %NOTE_DATE% %NOTE_TIME%
>>"%NOTE_PATH%" echo type: voice-note
>>"%NOTE_PATH%" echo status: raw
>>"%NOTE_PATH%" echo tags:
>>"%NOTE_PATH%" echo   - transcript
>>"%NOTE_PATH%" echo   - voice-memo
>>"%NOTE_PATH%" echo ---
>>"%NOTE_PATH%" echo.
>>"%NOTE_PATH%" echo # Voice Note - %NOTE_DATE% at %NOTE_TIME%
>>"%NOTE_PATH%" echo.
>>"%NOTE_PATH%" echo ## Metadata
>>"%NOTE_PATH%" echo.
>>"%NOTE_PATH%" echo - **Source file:** `%SOURCE_NAME%`
>>"%NOTE_PATH%" echo - **Transcribed:** %NOTE_DATE% %NOTE_TIME%
>>"%NOTE_PATH%" echo.
>>"%NOTE_PATH%" echo ---
>>"%NOTE_PATH%" echo.
>>"%NOTE_PATH%" echo ## Raw Transcript
>>"%NOTE_PATH%" echo.

:: insanely-fast-whisper saves JSON; append only its "text" field, as UTF-8
powershell -NoProfile -Command "$j = [IO.File]::ReadAllText($env:TEMP_FILE) | ConvertFrom-Json; [IO.File]::AppendAllText($env:NOTE_PATH, $j.text.Trim() + [Environment]::NewLine)"

>>"%NOTE_PATH%" echo.
>>"%NOTE_PATH%" echo ---
>>"%NOTE_PATH%" echo.
>>"%NOTE_PATH%" echo ## Distilled Notes
>>"%NOTE_PATH%" echo.
>>"%NOTE_PATH%" echo ^<!-- Paste the transcript into Claude to organize and distill --^>
>>"%NOTE_PATH%" echo.

:: Cleanup temp file
del "%TEMP_FILE%" 2>nul

echo.
echo ========================================
echo   DONE!
echo   Created: %NOTE_NAME%
echo   Location: %OUTPUT_DIR%
echo ========================================
echo.
pause
```
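
The file passed to `--transcript-path` appears to be written as JSON by insanely-fast-whisper, with the full transcript under a `text` key, rather than as plain text. A minimal Python sketch of pulling out only the spoken text, assuming that layout; the helper name `extract_text` is hypothetical, and it falls back to the raw contents when the input is not the expected JSON:

```python
import json


def extract_text(raw: str) -> str:
    """Return the transcript text from the raw contents of a transcript file."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return raw.strip()  # not JSON: treat the contents as plain text
    if isinstance(data, dict) and isinstance(data.get("text"), str):
        return data["text"].strip()
    return raw.strip()  # JSON, but not the layout assumed here


sample = '{"speakers": [], "chunks": [], "text": " Bonjour, this is a test. "}'
print(extract_text(sample))
```

In practice the argument would come from reading the transcript file with `Path(...).read_text(encoding="utf-8")`.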

---

## Usage

### Method 1: Drag and Drop
1. Record your voice memo (any app)
2. Drag the audio file onto `Transcribe.bat`
3. Wait for the transcription to finish (a few minutes for 30 minutes of audio)
4. Find your note in Obsidian

### Method 2: Double-click and Paste Path
1. Double-click `Transcribe.bat`
2. Paste the full path to your audio file
3. Press Enter
4. Wait for the transcription to finish

---

## Output Format

Each transcription creates a Markdown file like this:

```markdown
---
created: 2026-01-15 14:30
type: voice-note
status: raw
tags:
  - transcript
  - voice-memo
---

# Voice Note - 2026-01-15 at 14:30

## Metadata

- **Source file:** `recording.m4a`
- **Transcribed:** 2026-01-15 14:30

---

## Raw Transcript

[Your transcribed text appears here...]

---

## Distilled Notes

<!-- Paste the transcript into Claude to organize and distill -->

```
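
The upper part of this layout can be produced from a single template string. A minimal sketch, assuming hypothetical names `NOTE_TEMPLATE` and `render_note`; it covers the frontmatter through the raw transcript and leaves out the trailing notes section for brevity:

```python
from datetime import datetime

NOTE_TEMPLATE = """---
created: {created}
type: voice-note
status: raw
tags:
  - transcript
  - voice-memo
---

# Voice Note - {date} at {time}

## Metadata

- **Source file:** `{source}`
- **Transcribed:** {created}

---

## Raw Transcript

{transcript}
"""


def render_note(transcript: str, source: str, now: datetime) -> str:
    """Fill the template; `now` is passed in so the output is reproducible."""
    return NOTE_TEMPLATE.format(
        created=now.strftime("%Y-%m-%d %H:%M"),
        date=now.strftime("%Y-%m-%d"),
        time=now.strftime("%H:%M"),
        source=source,
        transcript=transcript.strip(),
    )


note = render_note("Bonjour, this is a test.", "recording.m4a",
                   datetime(2026, 1, 15, 14, 30))
print(note)
```

Taking the timestamp as a parameter keeps every date in the note identical, even if rendering straddles a minute boundary.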

---

## Processing with Claude

After transcription, use this prompt template to organize your notes:

```
Here is a transcript of voice notes in French/English.
Can you:

1. Fix obvious transcription errors
2. Organize by theme/topic
3. Extract the key points and action items
4. Reformat into structured notes

Keep the original content but make it more readable.

---

[PASTE THE TRANSCRIPT HERE]
```
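
Copying the transcript into the template by hand works for occasional notes; the step can also be scripted. A minimal sketch, assuming the note layout used in this document (a `## Raw Transcript` heading closed by a `---` rule). The function names and the `[TRANSCRIPT]` marker are hypothetical, so pass whatever marker your template actually uses:

```python
def extract_raw_transcript(note_text: str) -> str:
    """Return the text between '## Raw Transcript' and the next '---' rule."""
    lines = note_text.splitlines()
    start = None
    for i, line in enumerate(lines):
        if line.strip() == "## Raw Transcript":
            start = i + 1
            break
    if start is None:
        return ""  # the note has no transcript section
    body = []
    for line in lines[start:]:
        if line.strip() == "---":  # the rule that closes the section
            break
        body.append(line)
    return "\n".join(body).strip()


def fill_prompt(template: str, transcript: str, marker: str = "[TRANSCRIPT]") -> str:
    """Replace the marker in the prompt template with the transcript."""
    return template.replace(marker, transcript)


note_text = "# Title\n\n## Raw Transcript\n\nHello world.\n\n---\n\n## Notes\n"
print(fill_prompt("Please tidy this up:\n[TRANSCRIPT]", extract_raw_transcript(note_text)))
```

A transcript that itself contains a line consisting only of `---` would be cut short at that line.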

---

## Troubleshooting

### "conda is not recognized"
- Verify conda path: `where conda`
- Update `CONDA_PATH` in the script to match your installation

### Transcription takes too long
- The `large-v3` model is accurate but slow on CPU
- For faster (less accurate) results, change model to:
  ```
  --model-name openai/whisper-medium
  ```
  or
  ```
  --model-name openai/whisper-small
  ```

### GPU acceleration
If you have an NVIDIA GPU, install a CUDA-enabled PyTorch build (this example targets CUDA 11.8):
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
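
After installing, it helps to confirm that PyTorch can actually see the GPU. A minimal sketch using `torch.cuda.is_available()`; the function name `describe_device` is an assumption, and the import is guarded so the snippet also runs where PyTorch is missing:

```python
def describe_device() -> str:
    """Report which device PyTorch would use, without failing if it is absent."""
    try:
        import torch  # only importable once PyTorch is installed
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        return f"CUDA GPU: {torch.cuda.get_device_name(0)}"
    return "CPU only (no CUDA device visible to PyTorch)"


print(describe_device())
```

Run it inside `test_env`; a "CPU only" result after the install above usually points to a driver or CUDA version mismatch.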

### Wrong language detected
Add a language hint to the whisper command:
```bash
insanely-fast-whisper --file-name "audio.mp3" --transcript-path "output.txt" --model-name openai/whisper-large-v3 --language fr
```
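
When the command is assembled in code, the hint can be made optional. A minimal sketch, assuming a hypothetical helper `build_command` that returns the argument list for `subprocess.run` and uses only the flags shown in this document:

```python
def build_command(audio_path, transcript_path,
                  model="openai/whisper-large-v3", language=None):
    """Build the insanely-fast-whisper argument list, with an optional language."""
    cmd = [
        "insanely-fast-whisper",
        "--file-name", audio_path,
        "--transcript-path", transcript_path,
        "--model-name", model,
    ]
    if language:  # omit the flag entirely to keep auto-detection
        cmd += ["--language", language]
    return cmd


print(build_command("audio.mp3", "output.txt", language="fr"))
```

Leaving `language` as `None` reproduces the default command, so recordings that mix French and English are not forced into one language.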

---

## Alternative: Python Script Version

For more control or integration with other tools:

**File:** `transcribe.py`

```python
import json
import subprocess
import sys
import tempfile
from datetime import datetime
from pathlib import Path

# Configuration
OUTPUT_DIR = Path(r"C:\Users\antoi\antoine\My Libraries\Antoine Brain Extension\+\Transcripts")
MODEL = "openai/whisper-large-v3"


def transcribe(audio_path: str) -> None:
    audio_file = Path(audio_path)
    now = datetime.now()  # captured once so every timestamp in the note agrees
    date_str = now.strftime("%Y-%m-%d")
    time_str = now.strftime("%H:%M")
    note_name = f"Voice Note {now.strftime('%Y-%m-%d %H-%M')}.md"
    temp_file = Path(tempfile.gettempdir()) / "whisper_output.json"

    print(f"\n🎙️ Transcribing: {audio_file.name}")
    print(f"📝 Output: {note_name}\n")

    # Run whisper; check=True raises CalledProcessError if the command fails
    subprocess.run(
        [
            "insanely-fast-whisper",
            "--file-name", str(audio_file),
            "--transcript-path", str(temp_file),
            "--model-name", MODEL,
        ],
        check=True,
    )

    # Read transcript: insanely-fast-whisper saves JSON with the text under "text"
    raw = temp_file.read_text(encoding="utf-8")
    try:
        transcript = json.loads(raw)["text"].strip()
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        transcript = raw.strip()  # fall back to the raw file contents

    # Create markdown note
    note_content = f"""---
created: {date_str} {time_str}
type: voice-note
status: raw
tags:
  - transcript
  - voice-memo
---

# Voice Note - {date_str} at {time_str}

## Metadata

- **Source file:** `{audio_file.name}`
- **Transcribed:** {date_str} {time_str}

---

## Raw Transcript

{transcript}

---

## Distilled Notes

<!-- Paste the transcript into Claude to organize and distill -->

"""

    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    output_path = OUTPUT_DIR / note_name
    output_path.write_text(note_content, encoding="utf-8")
    temp_file.unlink(missing_ok=True)  # clean up the temporary transcript

    print(f"\n✅ Done! Created: {note_name}")
    print(f"📁 Location: {OUTPUT_DIR}")


if __name__ == "__main__":
    if len(sys.argv) > 1:
        transcribe(sys.argv[1])
    else:
        # Strip whitespace and the quotes Windows adds to copied paths
        audio = input("Enter audio file path: ").strip().strip('"')
        transcribe(audio)
```

Run with:
```bash
conda activate test_env
python transcribe.py "path/to/audio.mp3"
```

---

## Next Steps

- [ ] Install `insanely-fast-whisper` in `test_env`
- [ ] Save `Transcribe.bat` to Desktop
- [ ] Test with a short audio clip
- [ ] Pin to taskbar for quick access
- [ ] Set up Claude prompt template for processing

---

## Resources

- [insanely-fast-whisper GitHub](https://github.com/Vaibhavs10/insanely-fast-whisper)
- [OpenAI Whisper](https://github.com/openai/whisper)
- [Whisper model comparison](https://github.com/openai/whisper#available-models-and-languages)