Initial project scaffold - Phase 1 MVP structure

Core modules:
- cli.py: Command-line interface with Click
- pipeline.py: Main orchestrator
- video_processor.py: Frame extraction with ffmpeg
- audio_analyzer.py: Whisper transcription
- vision_analyzer.py: Component detection (placeholder)
- doc_generator.py: Markdown + PDF output

Also includes:
- pyproject.toml with uv/hatch config
- Prompts for AI analysis
- Basic tests
- ROADMAP.md with 4-week plan
Author: Mario Lavoie
Date: 2026-01-27 20:05:34 +00:00
Parent: 621234cbdf
Commit: 1e94a98e5b

16 changed files with 1062 additions and 1 deletion

.gitignore (new file)
@@ -0,0 +1,47 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
.venv/
venv/
ENV/

# IDE
.idea/
.vscode/
*.swp
*.swo

# Testing
.pytest_cache/
.coverage
htmlcov/

# Output
*_docs/
output/
frames/
*.wav

# OS
.DS_Store
Thumbs.db

README.md
@@ -1,3 +1,93 @@
# CAD-Documenter

**One video → Complete engineering documentation.**

Transform video walkthroughs of CAD models into comprehensive, structured documentation — ready for CDRs, FEA setups, and integration with the Atomaste engineering ecosystem.
## The Problem

- **Documentation is tedious** — engineers spend hours documenting CAD models manually
- **Knowledge lives in heads** — verbal explanations during reviews aren't captured
- **CDR prep is painful** — gathering images, writing descriptions, creating BOMs
- **FEA setup requires context** — Atomizer needs model understanding that is often only verbal
## The Solution

### Input

- 📹 Video of an engineer explaining a CAD model
- Optional: CAD file references, existing P/N databases

### Output

- 📄 **Markdown documentation** — Structured, version-controlled
- 📊 **Bill of Materials** — With standardized P/N
- 🔧 **Component registry** — Parts, functions, materials, specs
- 🎯 **Atomizer hints** — Parameters, constraints, objectives for FEA
- 📑 **CDR-ready PDF** — Via Atomaste Report Standard
## Installation

```bash
# Clone the repo
git clone http://100.80.199.40:3000/Antoine/CAD-Documenter.git
cd CAD-Documenter

# Install dependencies (using uv)
uv sync
```

### Requirements

- Python 3.12+
- ffmpeg (for video/audio processing)
- Whisper (for transcription)
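A quick way to confirm the external tools are available before running the pipeline (a sketch; install commands vary by platform — `ffprobe` ships with ffmpeg and is used for duration probing):

```shell
# Check that the external prerequisites are on PATH.
for tool in ffmpeg ffprobe; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing (install ffmpeg via your package manager)"
  fi
done
```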
## Usage

```bash
# Basic documentation
cad-doc video.mp4

# Full pipeline with all integrations
cad-doc video.mp4 \
  --output docs/my_assembly/ \
  --atomizer-hints \
  --bom \
  --pdf

# Just extract frames
cad-doc video.mp4 --frames-only --output frames/
```
## Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│ CAD-DOCUMENTER │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Video │───►│ Frame │───►│ Vision │───►│ Struct │ │
│ │ Input │ │ Extract │ │ Analysis │ │ Output │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Audio │───►│ Whisper │───►│ Correlate│───►│ Generate │ │
│ │ Track │ │Transcribe│ │ Timeline │ │ Docs │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────────────┘
```
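The correlation stage in the lower track pairs each extracted frame with the transcript spoken near its timestamp. A minimal, self-contained sketch of that windowed lookup (illustrative names; the shipped implementation lives in `audio_analyzer.Transcript.get_text_at`):

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float
    text: str


def text_near(segments: list[Segment], timestamp: float, window: float = 5.0) -> str:
    """Join all segments overlapping [timestamp - window, timestamp + window]."""
    return " ".join(
        s.text for s in segments
        if s.start <= timestamp + window and s.end >= timestamp - window
    )


segments = [
    Segment(0.0, 4.0, "This is the main bracket."),
    Segment(4.0, 9.0, "It holds the motor."),
]
print(text_near(segments, 6.0, window=1.0))  # prints: It holds the motor.
```

Each frame extracted at timestamp *t* gets the text from `text_near(segments, t)` as context for the vision analysis step.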
## Integrations

- **Atomizer** → FEA setup instructions from verbal explanations
- **Part Manager** → Standardized P/N lookup
- **Atomaste Report Standard** → Professional PDF generation

## Project Status

🚧 **Phase 1: Core Pipeline (MVP)** — In Progress

See [ROADMAP.md](ROADMAP.md) for the full implementation plan.

## License

MIT

ROADMAP.md (new file)
@@ -0,0 +1,49 @@
# CAD-Documenter Roadmap

## Phase 1: Core Pipeline (MVP) — Week 1

- [ ] Video frame extraction (ffmpeg)
- [ ] Audio transcription (Whisper)
- [ ] Basic vision analysis (component identification)
- [ ] Markdown generation (simple template)
- [ ] CLI skeleton

**Goal:** `cad-doc video.mp4` produces basic markdown output

## Phase 2: Smart Analysis — Week 2

- [ ] Scene change detection
- [ ] Timestamp correlation (frames ↔ transcript)
- [ ] Feature extraction (holes, threads, etc.)
- [ ] Material estimation
- [ ] Improved prompts for vision

**Goal:** Output correlates visual and verbal information intelligently

## Phase 3: Integrations — Week 3

- [ ] Part Manager API integration
- [ ] BOM generation with P/N lookup
- [ ] Atomizer hints generation
- [ ] Atomaste Report Standard PDF

**Goal:** Full integration with Atomaste ecosystem

## Phase 4: Polish — Week 4

- [ ] Interactive mode
- [ ] Gitea auto-publish
- [ ] Error handling and recovery
- [ ] Documentation and examples
- [ ] Tests

**Goal:** Production-ready tool

---

## Success Metrics

- ⏱️ **Time saved:** 4+ hours per assembly documentation
- 📊 **Completeness:** 90%+ of components captured
- 🎯 **Accuracy:** P/N matching >95% with Part Manager
- 📄 **CDR ready:** PDF passes quality review without edits

@@ -0,0 +1,47 @@
You are analyzing a CAD model video walkthrough. Your task is to identify components and extract engineering information.

## Context

The engineer is walking through a CAD model, explaining the design. You will receive:

1. A frame from the video showing the CAD model
2. The transcript text around this timestamp
3. Any previously identified components

## Your Task

Analyze the frame and identify:

1. **Components visible** - What parts/assemblies can you see?
2. **Features** - Holes, threads, fillets, chamfers, ribs, etc.
3. **Materials** - Based on appearance or transcript mentions
4. **Functions** - What does each component do? (from transcript)
5. **Relationships** - How do components connect/interface?

## Output Format

Return a JSON object:

```json
{
  "components": [
    {
      "name": "Component Name",
      "confidence": 0.95,
      "description": "Brief description",
      "function": "What it does (from transcript)",
      "material": "Material if mentioned or estimated",
      "features": ["feature1", "feature2"],
      "bounding_box": [x1, y1, x2, y2]  // optional
    }
  ],
  "assembly_relationships": [
    {"from": "Component A", "to": "Component B", "type": "bolted"}
  ],
  "transcript_matches": [
    {"component": "Component Name", "excerpt": "relevant quote from transcript"}
  ]
}
```

## Guidelines

- Be specific with component names (not just "part 1")
- Note standard parts (fasteners, bearings) with specifications if visible
- Extract material from transcript mentions ("aluminum", "steel", etc.)
- Identify function from verbal explanations
- Note manufacturing features (machined, cast, printed, etc.)

@@ -0,0 +1,33 @@
You are generating an executive summary for CAD model documentation.

## Input

You will receive:

1. List of identified components with their functions and materials
2. Full transcript of the engineer's walkthrough
3. Assembly name (if identified)

## Your Task

Write a professional engineering summary that:

1. **Opens with purpose** - What is this assembly for?
2. **Describes key components** - Main functional elements
3. **Notes design decisions** - Why certain choices were made (from transcript)
4. **Mentions critical interfaces** - Key connection points
5. **Highlights constraints** - Size, weight, performance requirements mentioned

## Style Guidelines

- Professional but concise
- Use engineering terminology appropriately
- 2-4 paragraphs maximum
- Focus on information useful for:
  - Design reviews (CDR)
  - FEA setup (Atomizer)
  - Manufacturing handoff
  - Future maintenance

## Example Output

"This assembly provides structural support for a NEMA 23 stepper motor in a CNC gantry application. The design prioritizes vibration damping and thermal management while maintaining a compact envelope under 200mm.

The primary load path transfers motor torque reaction through the main bracket (6061 aluminum) into the gantry frame via 4x M6 fasteners. Neoprene isolators between the motor and bracket provide vibration decoupling at motor operating frequencies.

Key design decisions include the cantilevered configuration for belt access and the use of aluminum over steel to reduce gantry moving mass for improved acceleration performance."

pyproject.toml (new file)
@@ -0,0 +1,38 @@
[project]
name = "cad-documenter"
version = "0.1.0"
description = "Video walkthrough → Complete engineering documentation"
readme = "README.md"
requires-python = ">=3.12"
license = {text = "MIT"}
authors = [
    {name = "Antoine Letarte", email = "antoine.letarte@gmail.com"},
]
dependencies = [
    "click>=8.1.0",
    "rich>=13.0.0",
    "pydantic>=2.0.0",
    "jinja2>=3.1.0",
    "openai-whisper>=20231117",
    "pillow>=10.0.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=8.0.0",
    "ruff>=0.1.0",
]

[project.scripts]
cad-doc = "cad_documenter.cli:main"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src/cad_documenter"]

[tool.ruff]
line-length = 100
target-version = "py312"

@@ -0,0 +1,3 @@
"""CAD-Documenter: Video walkthrough → Complete engineering documentation."""
__version__ = "0.1.0"

@@ -0,0 +1,104 @@
"""Audio analysis module - transcription via Whisper."""
from pathlib import Path
from dataclasses import dataclass
import subprocess
import tempfile


@dataclass
class TranscriptSegment:
    """A segment of transcribed audio."""
    start: float  # seconds
    end: float
    text: str


@dataclass
class Transcript:
    """Full transcript with segments."""
    segments: list[TranscriptSegment]
    full_text: str

    def get_text_at(self, timestamp: float, window: float = 5.0) -> str:
        """Get transcript text around a specific timestamp."""
        relevant = []
        for seg in self.segments:
            if seg.start <= timestamp + window and seg.end >= timestamp - window:
                relevant.append(seg.text)
        return " ".join(relevant)


class AudioAnalyzer:
    """Handles audio transcription using Whisper."""

    def __init__(self, video_path: Path, model: str = "base"):
        self.video_path = video_path
        self.model = model

    def transcribe(self) -> Transcript:
        """
        Transcribe audio from video using Whisper.

        Returns:
            Transcript object with segments and full text
        """
        # Extract audio to temp file
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
            audio_path = Path(f.name)
        try:
            # Extract audio using ffmpeg (16 kHz mono PCM, as Whisper expects)
            cmd = [
                "ffmpeg", "-y",
                "-i", str(self.video_path),
                "-vn", "-acodec", "pcm_s16le",
                "-ar", "16000", "-ac", "1",
                str(audio_path)
            ]
            subprocess.run(cmd, capture_output=True, check=True)
            # Run Whisper
            import whisper

            model = whisper.load_model(self.model)
            result = model.transcribe(str(audio_path), word_timestamps=True)
            segments = []
            for seg in result.get("segments", []):
                segments.append(TranscriptSegment(
                    start=seg["start"],
                    end=seg["end"],
                    text=seg["text"].strip()
                ))
            return Transcript(
                segments=segments,
                full_text=result.get("text", "").strip()
            )
        finally:
            # Cleanup temp file
            audio_path.unlink(missing_ok=True)

    def extract_keywords(self, transcript: Transcript) -> list[str]:
        """Extract likely component names and technical terms."""
        # Simple keyword extraction - can be enhanced with NLP
        keywords = []
        indicator_phrases = [
            "this is the", "this is a", "here we have",
            "the main", "called the", "known as",
            "this part", "this component", "this assembly"
        ]
        text_lower = transcript.full_text.lower()
        for phrase in indicator_phrases:
            if phrase in text_lower:
                # Find what comes after the phrase
                idx = text_lower.find(phrase)
                after = transcript.full_text[idx + len(phrase):idx + len(phrase) + 50]
                # Take first few words
                words = after.strip().split()[:3]
                if words:
                    keywords.append(" ".join(words).strip(",.;:"))
        return list(set(keywords))

src/cad_documenter/cli.py (new file)
@@ -0,0 +1,86 @@
"""CAD-Documenter CLI - Main entry point."""
import click
from pathlib import Path
from rich.console import Console

from .pipeline import DocumentationPipeline

console = Console()


@click.command()
@click.argument("video", type=click.Path(exists=True, path_type=Path))
@click.option("-o", "--output", type=click.Path(path_type=Path), help="Output directory")
@click.option("--frames-only", is_flag=True, help="Only extract frames, skip documentation")
@click.option("--atomizer-hints", is_flag=True, help="Generate Atomizer FEA hints")
@click.option("--bom", is_flag=True, help="Generate Bill of Materials")
@click.option("--pdf", is_flag=True, help="Generate PDF via Atomaste Report Standard")
@click.option("--frame-interval", default=2.0, help="Seconds between frame extractions")
@click.option("--whisper-model", default="base", help="Whisper model size (tiny/base/small/medium/large)")
@click.version_option()
def main(
    video: Path,
    output: Path | None,
    frames_only: bool,
    atomizer_hints: bool,
    bom: bool,
    pdf: bool,
    frame_interval: float,
    whisper_model: str,
):
    """
    Generate engineering documentation from a CAD walkthrough video.

    VIDEO: Path to the video file (.mp4, .mov, .avi, etc.)
    """
    console.print("[bold blue]CAD-Documenter[/bold blue] v0.1.0")
    console.print(f"Processing: [cyan]{video}[/cyan]")
    # Default output directory
    if output is None:
        output = video.parent / f"{video.stem}_docs"
    output.mkdir(parents=True, exist_ok=True)
    # Run pipeline
    pipeline = DocumentationPipeline(
        video_path=video,
        output_dir=output,
        frame_interval=frame_interval,
        whisper_model=whisper_model,
    )
    if frames_only:
        console.print("[yellow]Extracting frames only...[/yellow]")
        pipeline.extract_frames()
        console.print(f"[green]✓[/green] Frames saved to {output / 'frames'}")
        return
    # Full pipeline
    console.print("[yellow]Step 1/4:[/yellow] Extracting frames...")
    frames = pipeline.extract_frames()
    console.print(f"  [green]✓[/green] Extracted {len(frames)} frames")
    console.print("[yellow]Step 2/4:[/yellow] Transcribing audio...")
    transcript = pipeline.transcribe_audio()
    console.print(f"  [green]✓[/green] Transcribed {len(transcript.segments)} segments")
    console.print("[yellow]Step 3/4:[/yellow] Analyzing components...")
    analysis = pipeline.analyze_components(frames, transcript)
    console.print(f"  [green]✓[/green] Identified {len(analysis.components)} components")
    console.print("[yellow]Step 4/4:[/yellow] Generating documentation...")
    doc_path = pipeline.generate_documentation(analysis, atomizer_hints=atomizer_hints, bom=bom)
    console.print(f"  [green]✓[/green] Documentation saved to {doc_path}")
    if pdf:
        console.print("[yellow]Generating PDF...[/yellow]")
        pdf_path = pipeline.generate_pdf(doc_path)
        console.print(f"  [green]✓[/green] PDF saved to {pdf_path}")
    console.print(f"\n[bold green]Done![/bold green] Output: {output}")


if __name__ == "__main__":
    main()

@@ -0,0 +1,218 @@
"""Documentation generator - produces markdown and PDF output."""
from pathlib import Path
from datetime import datetime

from jinja2 import Environment, FileSystemLoader, BaseLoader

from .vision_analyzer import ComponentAnalysis, Component

# Default template embedded in code (can be overridden by files)
DEFAULT_TEMPLATE = '''# {{ analysis.assembly_name }} - Technical Documentation

**Generated:** {{ timestamp }}
**Source:** Video walkthrough documentation

---

## Executive Summary

{{ analysis.summary }}

---

## Components

{% for component in analysis.components %}
### {{ loop.index }}. {{ component.name }}

{% if component.description %}
{{ component.description }}
{% endif %}
{% if component.function %}
- **Function:** {{ component.function }}
{% endif %}
{% if component.material %}
- **Material:** {{ component.material }}
{% endif %}
{% if component.part_number %}
- **Part Number:** {{ component.part_number }}
{% endif %}
{% if component.features %}
**Key Features:**
{% for feature in component.features %}
- {{ feature }}
{% endfor %}
{% endif %}
{% if component.best_frame %}
![{{ component.name }}](frames/{{ component.best_frame.path.name }})
{% endif %}
{% if component.transcript_excerpt %}
> *From walkthrough:* "{{ component.transcript_excerpt }}"
{% endif %}

---
{% endfor %}
{% if bom %}
## Bill of Materials

| Item | P/N | Name | Qty | Material | Notes |
|------|-----|------|-----|----------|-------|
{% for component in analysis.components %}
| {{ loop.index }} | {{ component.part_number or 'TBD' }} | {{ component.name }} | 1 | {{ component.material or 'TBD' }} | {{ component.function }} |
{% endfor %}
{% endif %}
{% if analysis.assembly_notes %}
## Assembly Notes

{{ analysis.assembly_notes }}
{% endif %}
{% if atomizer_hints %}
## Atomizer FEA Hints

Based on the video walkthrough, the following optimization parameters are suggested:

```json
{
  "model_understanding": {
    "components": {{ component_names | tojson }},
    "materials_mentioned": {{ materials | tojson }}
  },
  "suggested_study": {
    "objectives": [
      {"name": "mass", "direction": "minimize"}
    ],
    "constraints_likely": []
  }
}
```
{% endif %}

---

## Raw Transcript

<details>
<summary>Click to expand full transcript</summary>

{{ analysis.raw_transcript }}

</details>

---

*Documentation generated by CAD-Documenter*
'''


class DocGenerator:
    """Generates documentation from analysis results."""

    def __init__(self, output_dir: Path, template_dir: Path | None = None):
        self.output_dir = output_dir
        self.output_dir.mkdir(parents=True, exist_ok=True)
        # Setup Jinja environment
        if template_dir and template_dir.exists():
            self.env = Environment(loader=FileSystemLoader(template_dir))
        else:
            self.env = Environment(loader=BaseLoader())

    def generate(
        self,
        analysis: ComponentAnalysis,
        atomizer_hints: bool = False,
        bom: bool = False,
        template_name: str | None = None,
    ) -> Path:
        """Generate markdown documentation."""
        # Load template
        if template_name:
            template = self.env.get_template(f"{template_name}.md.j2")
        else:
            template = self.env.from_string(DEFAULT_TEMPLATE)
        # Prepare template context
        context = {
            "analysis": analysis,
            "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M"),
            "atomizer_hints": atomizer_hints,
            "bom": bom,
            "component_names": [c.name for c in analysis.components],
            "materials": list({c.material for c in analysis.components if c.material}),
        }
        # Render
        content = template.render(**context)
        # Write output
        output_path = self.output_dir / "documentation.md"
        output_path.write_text(content)
        return output_path

    def generate_pdf(self, markdown_path: Path) -> Path:
        """
        Generate PDF from markdown using Atomaste Report Standard.

        Requires the atomaste-reports skill/Typst to be available.
        """
        import subprocess

        pdf_path = markdown_path.with_suffix(".pdf")
        # Try to use Atomaste Report Standard if available,
        # otherwise fall back to pandoc
        try:
            # Check if atomaste build script exists
            build_script = Path("/home/papa/Atomaste/Templates/Atomaste_Report_Standard/scripts/build-report.py")
            if build_script.exists():
                cmd = ["python3", str(build_script), str(markdown_path), "-o", str(pdf_path)]
            else:
                # Fallback to pandoc
                cmd = ["pandoc", str(markdown_path), "-o", str(pdf_path)]
            subprocess.run(cmd, capture_output=True, check=True)
        except subprocess.CalledProcessError as e:
            stderr = e.stderr.decode(errors="replace") if e.stderr else str(e)
            raise RuntimeError(f"PDF generation failed: {stderr}") from e
        return pdf_path

    def generate_atomizer_hints(self, analysis: ComponentAnalysis) -> Path:
        """Generate standalone Atomizer hints JSON file."""
        import json

        hints = {
            "model_understanding": {
                "assembly_name": analysis.assembly_name,
                "components": [c.name for c in analysis.components],
                "materials_mentioned": list({c.material for c in analysis.components if c.material}),
                "functions": {c.name: c.function for c in analysis.components if c.function},
            },
            "suggested_spec": {
                "objectives": [
                    {"name": "mass", "direction": "minimize"}
                ],
                "parameters_likely": [],
                "constraints_likely": [],
            },
            "transcript_highlights": [],
        }
        output_path = self.output_dir / "atomizer_hints.json"
        output_path.write_text(json.dumps(hints, indent=2))
        return output_path

@@ -0,0 +1,64 @@
"""Main documentation pipeline orchestrator."""
from pathlib import Path
from dataclasses import dataclass

from .video_processor import VideoProcessor, FrameInfo
from .audio_analyzer import AudioAnalyzer, Transcript
from .vision_analyzer import VisionAnalyzer, ComponentAnalysis
from .doc_generator import DocGenerator


@dataclass
class PipelineConfig:
    """Pipeline configuration."""
    frame_interval: float = 2.0
    whisper_model: str = "base"
    vision_model: str = "gpt-4o"  # or local model


@dataclass
class DocumentationPipeline:
    """Orchestrates the full documentation pipeline."""
    video_path: Path
    output_dir: Path
    frame_interval: float = 2.0
    whisper_model: str = "base"

    def __post_init__(self):
        self.video_processor = VideoProcessor(self.video_path, self.output_dir / "frames")
        self.audio_analyzer = AudioAnalyzer(self.video_path, self.whisper_model)
        self.vision_analyzer = VisionAnalyzer()
        self.doc_generator = DocGenerator(self.output_dir)

    def extract_frames(self) -> list[FrameInfo]:
        """Extract key frames from video."""
        return self.video_processor.extract_frames(interval=self.frame_interval)

    def transcribe_audio(self) -> Transcript:
        """Transcribe audio track."""
        return self.audio_analyzer.transcribe()

    def analyze_components(
        self, frames: list[FrameInfo], transcript: Transcript
    ) -> ComponentAnalysis:
        """Analyze frames + transcript to identify components."""
        return self.vision_analyzer.analyze(frames, transcript)

    def generate_documentation(
        self,
        analysis: ComponentAnalysis,
        atomizer_hints: bool = False,
        bom: bool = False,
    ) -> Path:
        """Generate markdown documentation."""
        return self.doc_generator.generate(
            analysis,
            atomizer_hints=atomizer_hints,
            bom=bom,
        )

    def generate_pdf(self, markdown_path: Path) -> Path:
        """Generate PDF from markdown using Atomaste Report Standard."""
        return self.doc_generator.generate_pdf(markdown_path)

@@ -0,0 +1,112 @@
"""Video processing module - frame extraction and scene detection."""
import subprocess
import json
from pathlib import Path
from dataclasses import dataclass


@dataclass
class FrameInfo:
    """Information about an extracted frame."""
    path: Path
    timestamp: float  # seconds
    frame_number: int


class VideoProcessor:
    """Handles video frame extraction using ffmpeg."""

    def __init__(self, video_path: Path, output_dir: Path):
        self.video_path = video_path
        self.output_dir = output_dir
        self.output_dir.mkdir(parents=True, exist_ok=True)

    def get_duration(self) -> float:
        """Get video duration in seconds."""
        cmd = [
            "ffprobe", "-v", "quiet",
            "-print_format", "json",
            "-show_format",
            str(self.video_path)
        ]
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        data = json.loads(result.stdout)
        return float(data["format"]["duration"])

    def extract_frames(self, interval: float = 2.0) -> list[FrameInfo]:
        """
        Extract frames at regular intervals.

        Args:
            interval: Seconds between frame extractions

        Returns:
            List of FrameInfo objects for extracted frames
        """
        frames = []
        # Use ffmpeg to extract frames at interval
        output_pattern = self.output_dir / "frame_%04d.jpg"
        cmd = [
            "ffmpeg", "-y",
            "-i", str(self.video_path),
            "-vf", f"fps=1/{interval}",
            "-q:v", "2",  # High quality JPEG
            str(output_pattern)
        ]
        subprocess.run(cmd, capture_output=True, check=True)
        # Collect extracted frames
        for i, frame_path in enumerate(sorted(self.output_dir.glob("frame_*.jpg"))):
            timestamp = i * interval
            frames.append(FrameInfo(
                path=frame_path,
                timestamp=timestamp,
                frame_number=i
            ))
        return frames

    def extract_audio(self, output_path: Path | None = None) -> Path:
        """Extract audio track from video."""
        if output_path is None:
            output_path = self.output_dir.parent / "audio.wav"
        cmd = [
            "ffmpeg", "-y",
            "-i", str(self.video_path),
            "-vn",  # No video
            "-acodec", "pcm_s16le",
            "-ar", "16000",  # 16kHz for Whisper
            "-ac", "1",  # Mono
            str(output_path)
        ]
        subprocess.run(cmd, capture_output=True, check=True)
        return output_path

    def detect_scene_changes(self, threshold: float = 0.3) -> list[float]:
        """
        Detect scene changes in video.

        Returns list of timestamps where significant visual changes occur.
        """
        cmd = [
            "ffmpeg", "-i", str(self.video_path),
            "-vf", f"select='gt(scene,{threshold})',showinfo",
            "-f", "null", "-"
        ]
        result = subprocess.run(cmd, capture_output=True, text=True)
        # Parse scene change timestamps from ffmpeg's showinfo log (on stderr)
        timestamps = []
        for line in result.stderr.split("\n"):
            if "pts_time:" in line:
                # Extract timestamp
                parts = line.split("pts_time:")
                if len(parts) > 1:
                    ts = float(parts[1].split()[0])
                    timestamps.append(ts)
        return timestamps

@@ -0,0 +1,111 @@
"""Vision analysis module - component detection and feature extraction."""
from pathlib import Path
from dataclasses import dataclass, field

from .video_processor import FrameInfo
from .audio_analyzer import Transcript


@dataclass
class Component:
    """A detected component from the CAD model."""
    name: str
    description: str
    function: str = ""
    material: str = ""
    features: list[str] = field(default_factory=list)
    best_frame: FrameInfo | None = None
    transcript_excerpt: str = ""
    part_number: str = ""  # For Part Manager integration


@dataclass
class ComponentAnalysis:
    """Complete analysis results."""
    assembly_name: str
    summary: str
    components: list[Component]
    assembly_notes: str = ""
    raw_transcript: str = ""


class VisionAnalyzer:
    """Analyzes frames to identify components and features."""

    def __init__(self, model: str = "gpt-4o"):
        self.model = model

    def analyze(
        self, frames: list[FrameInfo], transcript: Transcript
    ) -> ComponentAnalysis:
        """
        Analyze frames and transcript to identify components.

        This is where the AI magic happens - correlating visual and verbal info.
        """
        # For MVP, we'll use a multi-modal approach:
        # 1. Send key frames to vision model with transcript context
        # 2. Ask it to identify components and correlate with verbal descriptions
        # Placeholder implementation - will be enhanced with actual AI calls
        components = self._identify_components(frames, transcript)
        summary = self._generate_summary(components, transcript)
        return ComponentAnalysis(
            assembly_name=self._extract_assembly_name(transcript),
            summary=summary,
            components=components,
            assembly_notes=self._extract_assembly_notes(transcript),
            raw_transcript=transcript.full_text,
        )

    def _identify_components(
        self, frames: list[FrameInfo], transcript: Transcript
    ) -> list[Component]:
        """Identify individual components from frames + transcript."""
        # TODO: Implement vision API calls
        # For now, return empty list - will be implemented in Phase 1
        return []

    def _generate_summary(
        self, components: list[Component], transcript: Transcript
    ) -> str:
        """Generate executive summary of the assembly."""
        # TODO: Implement with LLM
        return f"Assembly documentation generated from video walkthrough. {len(components)} components identified."

    def _extract_assembly_name(self, transcript: Transcript) -> str:
        """Try to extract assembly name from transcript."""
        # Look for common patterns
        text = transcript.full_text.lower()
        patterns = ["this is the", "presenting the", "looking at the", "reviewing the"]
        for pattern in patterns:
            if pattern in text:
                idx = text.find(pattern) + len(pattern)
                name = transcript.full_text[idx:idx + 50].strip().split(".")[0]
                return name.strip()
        return "Untitled Assembly"

    def _extract_assembly_notes(self, transcript: Transcript) -> str:
        """Extract assembly-related notes from transcript."""
        # Look for assembly instructions in transcript
        keywords = ["assemble", "install", "mount", "attach", "connect"]
        notes = []
        for seg in transcript.segments:
            if any(kw in seg.text.lower() for kw in keywords):
                notes.append(seg.text)
        return " ".join(notes) if notes else ""

    def analyze_single_frame(self, frame: FrameInfo, context: str = "") -> dict:
        """
        Analyze a single frame for components and features.

        Returns dict with detected components, features, and confidence.
        """
        # TODO: Implement with vision API
        return {
            "components": [],
            "features": [],
            "confidence": 0.0
        }

templates/.gitkeep (new file)
@@ -0,0 +1,4 @@
# Template files go here
# - component.md.j2
# - assembly.md.j2
# - bom.md.j2

tests/__init__.py (new file)
@@ -0,0 +1 @@
# CAD-Documenter tests

tests/test_pipeline.py (new file)
@@ -0,0 +1,54 @@
"""Basic tests for CAD-Documenter pipeline."""
import pytest
from pathlib import Path


def test_imports():
    """Test that all modules can be imported."""
    from cad_documenter import __version__
    from cad_documenter.cli import main
    from cad_documenter.pipeline import DocumentationPipeline
    from cad_documenter.video_processor import VideoProcessor
    from cad_documenter.audio_analyzer import AudioAnalyzer
    from cad_documenter.vision_analyzer import VisionAnalyzer
    from cad_documenter.doc_generator import DocGenerator

    assert __version__ == "0.1.0"


def test_transcript_dataclass():
    """Test Transcript dataclass functionality."""
    from cad_documenter.audio_analyzer import Transcript, TranscriptSegment

    segments = [
        TranscriptSegment(start=0.0, end=5.0, text="This is the main bracket"),
        TranscriptSegment(start=5.0, end=10.0, text="It holds the motor"),
        TranscriptSegment(start=10.0, end=15.0, text="Made of aluminum"),
    ]
    transcript = Transcript(
        segments=segments,
        full_text="This is the main bracket. It holds the motor. Made of aluminum.",
    )
    # Test get_text_at
    text = transcript.get_text_at(7.0, window=3.0)
    assert "holds the motor" in text
    assert "main bracket" in text


def test_component_dataclass():
    """Test Component dataclass."""
    from cad_documenter.vision_analyzer import Component

    component = Component(
        name="Main Bracket",
        description="Primary structural member",
        function="Holds the motor",
        material="Aluminum 6061-T6",
        features=["4x M6 holes", "Fillet radii"],
    )
    assert component.name == "Main Bracket"
    assert len(component.features) == 2


# TODO: Add integration tests with sample videos