257 lines
5.7 KiB
Markdown
257 lines
5.7 KiB
Markdown
|
|
# Voice Recorder Android Port - Brainstorm
|
||
|
|
|
||
|
|
## Current App Features to Port
|
||
|
|
|
||
|
|
1. **Audio Recording** - Record with pause/resume
|
||
|
|
2. **Whisper Transcription** - Local AI transcription
|
||
|
|
3. **Note Type Selection** - Meeting, Todo, Idea, Review, Journal
|
||
|
|
4. **Obsidian Export** - Markdown files with YAML frontmatter
|
||
|
|
5. **Claude Processing** - AI-powered note organization
|
||
|
|
6. **Folder Organization** - Auto-sort by note type
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Approach Options
|
||
|
|
|
||
|
|
### Option 1: React Native + Expo (Recommended)
|
||
|
|
|
||
|
|
**Pros:**
|
||
|
|
- Cross-platform (iOS + Android)
|
||
|
|
- Large ecosystem, good documentation
|
||
|
|
- Hot reload for fast development
|
||
|
|
- Can use `expo-av` for audio recording
|
||
|
|
- Good integration with file systems
|
||
|
|
|
||
|
|
**Cons:**
|
||
|
|
- Whisper would need cloud API or native module
|
||
|
|
- Claude CLI not available, would need API
|
||
|
|
|
||
|
|
**Stack:**
|
||
|
|
```
|
||
|
|
- React Native / Expo
|
||
|
|
- expo-av (recording)
|
||
|
|
- expo-file-system (file management)
|
||
|
|
- OpenAI Whisper API or whisper.cpp native module
|
||
|
|
- Claude API (not CLI)
|
||
|
|
- Obsidian sync via shared folder or plugin
|
||
|
|
```
|
||
|
|
|
||
|
|
**Effort:** Medium (2-3 weeks for MVP)
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
### Option 2: Flutter
|
||
|
|
|
||
|
|
**Pros:**
|
||
|
|
- Single codebase for Android/iOS
|
||
|
|
- Fast performance with native compilation
|
||
|
|
- Good audio packages (record, just_audio)
|
||
|
|
- Material Design 3 built-in
|
||
|
|
|
||
|
|
**Cons:**
|
||
|
|
- Dart learning curve
|
||
|
|
- Whisper integration more complex
|
||
|
|
- Smaller ecosystem than React Native
|
||
|
|
|
||
|
|
**Stack:**
|
||
|
|
```
|
||
|
|
- Flutter / Dart
|
||
|
|
- record package (audio)
|
||
|
|
- whisper_flutter or cloud API
|
||
|
|
- Claude API
|
||
|
|
- path_provider for file storage
|
||
|
|
```
|
||
|
|
|
||
|
|
**Effort:** Medium-High (3-4 weeks for MVP)
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
### Option 3: Native Kotlin (Android Only)
|
||
|
|
|
||
|
|
**Pros:**
|
||
|
|
- Best performance
|
||
|
|
- Full Android API access
|
||
|
|
- Can integrate whisper.cpp directly
|
||
|
|
- Better battery optimization
|
||
|
|
- Works offline
|
||
|
|
|
||
|
|
**Cons:**
|
||
|
|
- Android only (no iOS)
|
||
|
|
- More code to maintain
|
||
|
|
- Longer development time
|
||
|
|
|
||
|
|
**Stack:**
|
||
|
|
```
|
||
|
|
- Kotlin + Jetpack Compose
|
||
|
|
- MediaRecorder API
|
||
|
|
- whisper.cpp via JNI (local transcription)
|
||
|
|
- Claude API
|
||
|
|
- Storage Access Framework for Obsidian folder
|
||
|
|
```
|
||
|
|
|
||
|
|
**Effort:** High (4-6 weeks for MVP)
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
### Option 4: PWA (Progressive Web App)
|
||
|
|
|
||
|
|
**Pros:**
|
||
|
|
- Works on any device with browser
|
||
|
|
- No app store needed
|
||
|
|
- Shared codebase with potential web app
|
||
|
|
- Easy updates
|
||
|
|
|
||
|
|
**Cons:**
|
||
|
|
- Limited audio recording capabilities
|
||
|
|
- No background processing
|
||
|
|
- Can't access file system directly
|
||
|
|
- Requires internet for Whisper
|
||
|
|
|
||
|
|
**Stack:**
|
||
|
|
```
|
||
|
|
- Vue.js or React
|
||
|
|
- MediaRecorder Web API
|
||
|
|
- Whisper API (cloud)
|
||
|
|
- Claude API
|
||
|
|
- Download files or sync via Obsidian plugin
|
||
|
|
```
|
||
|
|
|
||
|
|
**Effort:** Low-Medium (1-2 weeks for MVP)
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Whisper Integration Options
|
||
|
|
|
||
|
|
### Cloud-based (Easier)
|
||
|
|
1. **OpenAI Whisper API** - $0.006/min, reliable
|
||
|
|
2. **Replicate** - Pay per use, hosted models
|
||
|
|
3. **Self-hosted** - Run whisper on home server/NAS
|
||
|
|
|
||
|
|
### On-device (Harder but offline)
|
||
|
|
1. **whisper.cpp** - C++ port, works on Android via JNI
|
||
|
|
2. **whisper-android** - Pre-built Android bindings
|
||
|
|
3. **ONNX Runtime** - Run whisper.onnx model
|
||
|
|
|
||
|
|
**Recommendation:** Start with OpenAI API, add offline later
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Obsidian Sync Options
|
||
|
|
|
||
|
|
### Option A: Direct File Access
|
||
|
|
- Use Android's Storage Access Framework
|
||
|
|
- User grants access to Obsidian vault folder
|
||
|
|
- Write markdown files directly
|
||
|
|
- Works with Obsidian Sync, Syncthing, etc.
|
||
|
|
|
||
|
|
### Option B: Obsidian Plugin
|
||
|
|
- Create companion plugin for Obsidian
|
||
|
|
- App sends notes via local HTTP server
|
||
|
|
- Plugin receives and saves notes
|
||
|
|
- More complex but cleaner UX
|
||
|
|
|
||
|
|
### Option C: Share Intent
|
||
|
|
- Use Android share functionality
|
||
|
|
- Share transcribed note to Obsidian
|
||
|
|
- User manually saves
|
||
|
|
- Simplest but requires user action
|
||
|
|
|
||
|
|
**Recommendation:** Option A (direct file access)
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Recommended MVP Approach
|
||
|
|
|
||
|
|
### Phase 1: Core Recording (Week 1)
|
||
|
|
- React Native + Expo setup
|
||
|
|
- Basic UI matching desktop app style
|
||
|
|
- Audio recording with pause/resume
|
||
|
|
- Timer display
|
||
|
|
- Note type selection
|
||
|
|
|
||
|
|
### Phase 2: Transcription (Week 2)
|
||
|
|
- OpenAI Whisper API integration
|
||
|
|
- Loading states and error handling
|
||
|
|
- Transcript preview
|
||
|
|
|
||
|
|
### Phase 3: Export & Processing (Week 3)
|
||
|
|
- File system access setup
|
||
|
|
- Markdown generation
|
||
|
|
- Claude API integration
|
||
|
|
- Folder organization
|
||
|
|
|
||
|
|
### Phase 4: Polish (Week 4)
|
||
|
|
- Offline queue for transcription
|
||
|
|
- Settings screen
|
||
|
|
- Obsidian folder picker
|
||
|
|
- Widget for quick recording
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Technical Considerations for Pixel 7
|
||
|
|
|
||
|
|
### Hardware Advantages
|
||
|
|
- Tensor G2 chip - could run small whisper models
|
||
|
|
- Good microphone array
|
||
|
|
- Large battery
|
||
|
|
|
||
|
|
### Android-Specific Features
|
||
|
|
- Material You theming
|
||
|
|
- Quick Settings tile
|
||
|
|
- Home screen widget
|
||
|
|
- Voice Assistant integration potential
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Alternative: Termux + Python
|
||
|
|
|
||
|
|
For a quick hack without building a full app:
|
||
|
|
|
||
|
|
```bash
|
||
|
|
# Install Termux from F-Droid
|
||
|
|
pkg install python
|
||
|
|
pip install openai-whisper sounddevice
|
||
|
|
|
||
|
|
# Run existing Python script (modified)
|
||
|
|
python voice_recorder_android.py
|
||
|
|
```
|
||
|
|
|
||
|
|
**Pros:** Reuse existing code, fast to test
|
||
|
|
**Cons:** Requires Termux, not user-friendly
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Decision Matrix
|
||
|
|
|
||
|
|
| Criteria | React Native | Flutter | Kotlin | PWA |
|
||
|
|
|----------|-------------|---------|--------|-----|
|
||
|
|
| Dev Speed | Fast | Medium | Slow | Fastest |
|
||
|
|
| Performance | Good | Great | Best | OK |
|
||
|
|
| Offline | Possible | Possible | Yes | No |
|
||
|
|
| iOS Support | Yes | Yes | No | Yes |
|
||
|
|
| Learning Curve | Low | Medium | Medium | Low |
|
||
|
|
| Maintenance | Easy | Easy | More | Easy |
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Recommended Path
|
||
|
|
|
||
|
|
1. **Start with React Native + Expo** for fastest MVP
|
||
|
|
2. **Use OpenAI Whisper API** initially
|
||
|
|
3. **Direct file access** to Obsidian vault
|
||
|
|
4. **Claude API** (not CLI) for processing
|
||
|
|
5. **Add offline whisper.cpp** later if needed
|
||
|
|
|
||
|
|
This approach gets a working app fastest while leaving room for optimization.
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Next Steps
|
||
|
|
|
||
|
|
- [ ] Set up React Native + Expo project
|
||
|
|
- [ ] Design mobile UI mockups
|
||
|
|
- [ ] Get OpenAI API key for Whisper
|
||
|
|
- [ ] Get Claude API key
|
||
|
|
- [ ] Test file system access on Pixel 7
|
||
|
|
- [ ] Create basic recording prototype
|