# Phase 1.3: Error Handling & Logging - Implementation Plan

**Goal**: Implement a production-ready logging and error-handling system for MVP stability.

**Status**: MVP Complete (2025-11-24)

## Overview

Phase 1.3 establishes a consistent, professional logging system across all Atomizer optimization studies. It replaces ad-hoc `print()` statements with structured logging that provides:

- File and console output
- Color-coded log levels (Windows 10+ and Unix)
- Trial-specific logging methods
- Automatic log rotation
- Zero external dependencies (stdlib only)

## Problem Analysis

### Current State (Before Phase 1.3)

An analysis of the codebase found:

- **1416 occurrences** of logging or `print()` calls across 79 files (mostly ad-hoc `print()` statements)
- **411 occurrences** of `try`/`except`/`raise` across 59 files
- Mixed error handling approaches:
  - Some studies use `traceback.print_exc()`
  - Some use simple `print()` for errors
- No consistent logging format
- No file logging in most studies
- Some studies have `--resume` capability, but implementation varies

### Requirements

1. **Drop-in Replacement**: Minimal code changes to adopt
2. **Production-Ready**: File logging with rotation, timestamps, proper levels
3. **Dashboard-Friendly**: Structured trial logging for future integration
4. **Windows-Compatible**: ANSI color support on Windows 10+
5. **No Dependencies**: Use only Python stdlib

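
Requirement 4 usually comes down to switching the Windows console into virtual terminal mode before writing ANSI escape codes. The following is a minimal sketch of one common stdlib-only way to do that; the function name `enable_ansi_colors` is hypothetical, and this plan does not state whether `logger.py` takes the same approach.

```python
import os
import sys


def enable_ansi_colors() -> bool:
    """Best-effort check: return True if ANSI escape codes should render as colors."""
    if os.name != "nt":
        # Unix terminals interpret ANSI codes natively; only colorize a real TTY.
        isatty = getattr(sys.stdout, "isatty", None)
        return bool(isatty()) if callable(isatty) else False
    try:
        import ctypes
        from ctypes import wintypes

        kernel32 = ctypes.windll.kernel32
        kernel32.GetStdHandle.restype = wintypes.HANDLE
        kernel32.GetConsoleMode.argtypes = [wintypes.HANDLE, ctypes.POINTER(wintypes.DWORD)]
        kernel32.SetConsoleMode.argtypes = [wintypes.HANDLE, wintypes.DWORD]

        handle = kernel32.GetStdHandle(-11)  # STD_OUTPUT_HANDLE
        mode = wintypes.DWORD()
        if not kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
            return False  # not a real console (for example, redirected output)
        # 0x0004 is ENABLE_VIRTUAL_TERMINAL_PROCESSING (Windows 10+).
        return bool(kernel32.SetConsoleMode(handle, mode.value | 0x0004))
    except (AttributeError, OSError):
        return False
```

A console handler could call this once at setup and fall back to plain, uncolored output when it returns `False`.
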
---

## ✅ Phase 1.3 MVP - Completed (2025-11-24)

### Task 1: Structured Logging System ✅ DONE

**File Created**: `optimization_engine/logger.py` (330 lines)

**Features Implemented**:

1. **AtomizerLogger Class** - Extended logger with trial-specific methods:
   ```python
   logger.trial_start(trial_number=5, design_vars={"thickness": 2.5})
   logger.trial_complete(trial_number=5, objectives={"mass": 120})
   logger.trial_failed(trial_number=5, error="Simulation failed")
   logger.study_start(study_name="test", n_trials=30, sampler="TPESampler")
   logger.study_complete(study_name="test", n_trials=30, n_successful=28)
   ```

2. **Color-Coded Console Output** - ANSI colors for Windows and Unix:
   - DEBUG: Cyan
   - INFO: Green
   - WARNING: Yellow
   - ERROR: Red
   - CRITICAL: Magenta

3. **File Logging with Rotation**:
   - Automatically creates `{study_dir}/optimization.log`
   - 50 MB max file size
   - 3 backup files (`optimization.log.1`, `.2`, `.3`)
   - UTF-8 encoding
   - Detailed format: `timestamp | level | module | message`

4. **Simple API**:
   ```python
   from pathlib import Path

   # Basic logger
   from optimization_engine.logger import get_logger
   logger = get_logger(__name__)
   logger.info("Starting optimization...")

   # Study logger with file output
   logger = get_logger(
       "drone_gimbal_arm",
       study_dir=Path("studies/drone_gimbal_arm/2_results")
   )
   ```

**Testing**: Successfully tested on Windows with color output and file logging.

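
The color table and rotation settings listed above map closely onto the stdlib `logging` module. The sketch below shows one way they could be wired together. `ColorFormatter` and `build_study_logger` are hypothetical names and not the contents of `logger.py`, and reading "50 MB" as 50 * 1024 * 1024 bytes is an assumption.

```python
import logging
import sys
from logging.handlers import RotatingFileHandler
from pathlib import Path

# ANSI escape codes matching the level/color list above.
LEVEL_COLORS = {
    logging.DEBUG: "\033[36m",     # cyan
    logging.INFO: "\033[32m",      # green
    logging.WARNING: "\033[33m",   # yellow
    logging.ERROR: "\033[31m",     # red
    logging.CRITICAL: "\033[35m",  # magenta
}
RESET = "\033[0m"


class ColorFormatter(logging.Formatter):
    """Wrap each formatted console line in the color of its level."""

    def format(self, record: logging.LogRecord) -> str:
        color = LEVEL_COLORS.get(record.levelno, "")
        text = super().format(record)
        return f"{color}{text}{RESET}" if color else text


def build_study_logger(name: str, study_dir: Path) -> logging.Logger:
    """Hypothetical helper: console plus rotating file output for one study."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        return logger

    console = logging.StreamHandler(sys.stdout)
    console.setFormatter(ColorFormatter("%(levelname)-8s | %(message)s"))
    logger.addHandler(console)

    study_dir.mkdir(parents=True, exist_ok=True)
    file_handler = RotatingFileHandler(
        study_dir / "optimization.log",
        maxBytes=50 * 1024 * 1024,  # assumed meaning of "50 MB"
        backupCount=3,              # optimization.log.1, .2, .3
        encoding="utf-8",
    )
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s | %(levelname)s | %(module)s | %(message)s")
    )
    logger.addHandler(file_handler)
    return logger
```

Keeping color codes out of the file formatter means `optimization.log` stays plain text, which is easier to search and to parse later.
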
### Task 2: Documentation ✅ DONE

**File Created**: This implementation plan

**Docstrings**: Comprehensive docstrings in `logger.py` with usage examples

---

## 🔨 Remaining Tasks (Phase 1.3.1+)

### Phase 1.3.1: Integration with Existing Studies

**Priority**: HIGH | **Effort**: 1-2 days

1. **Update drone_gimbal_arm_optimization study** (Reference implementation)
   - Replace `print()` statements with logger calls
   - Add file logging to `2_results/`
   - Use trial-specific logging methods
   - Test that colors work and logs rotate

2. **Create Migration Guide**
   - Document how to convert existing studies
   - Provide before/after examples
   - Add to DEVELOPMENT.md

3. **Update create-study Claude Skill**
   - Include logger setup in generated `run_optimization.py`
   - Add logging best practices

### Phase 1.3.2: Enhanced Error Recovery

**Priority**: MEDIUM | **Effort**: 2-3 days

1. **Study Checkpoint Manager**
   - Automatic checkpointing every N trials
   - Save study state to `2_results/checkpoint.json`
   - Resume from last checkpoint on crash
   - Clean up old checkpoints

2. **Enhanced Error Context**
   - Capture design variables on failure
   - Log simulation command that failed
   - Include FEA solver output in error log
   - Structured error reporting for dashboard

3. **Graceful Degradation**
   - Fallback when file logging fails
   - Handle disk full scenarios
   - Continue optimization if dashboard unreachable

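
As a starting point for the checkpoint manager item, the sketch below saves study state to `checkpoint.json` every N trials and reads it back on resume. The class name, method names, and JSON layout are all assumptions made for illustration, since this plan does not define them.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


class CheckpointManager:
    """Hypothetical sketch: persist study progress every N trials."""

    def __init__(self, results_dir: Path, every_n_trials: int = 5) -> None:
        self.path = Path(results_dir) / "checkpoint.json"
        self.every_n_trials = every_n_trials

    def maybe_save(self, trial_number: int, state: Dict[str, Any]) -> bool:
        """Write a checkpoint when trial_number is a multiple of N."""
        if trial_number % self.every_n_trials != 0:
            return False
        self.path.parent.mkdir(parents=True, exist_ok=True)
        payload = {"last_trial": trial_number, "state": state}
        # Write to a temp file and rename it into place, so a crash
        # mid-write never leaves a truncated checkpoint.json behind.
        fd, tmp_name = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as fh:
                json.dump(payload, fh, indent=2)
            os.replace(tmp_name, self.path)
        except BaseException:
            if os.path.exists(tmp_name):
                os.unlink(tmp_name)
            raise
        return True

    def load(self) -> Optional[Dict[str, Any]]:
        """Return the last saved checkpoint, or None if there is none."""
        if not self.path.exists():
            return None
        with self.path.open(encoding="utf-8") as fh:
            return json.load(fh)
```

Because each save overwrites a single file, no old checkpoints accumulate in this variant. A design that keeps several numbered checkpoints would need an explicit cleanup step.
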
### Phase 1.3.3: Notification System (Future)

**Priority**: LOW | **Effort**: 1-2 days

1. **Study Completion Notifications**
   - Optional email notification when study completes
   - Configurable via environment variables
   - Include summary (best trial, success rate, etc.)

2. **Error Alerts**
   - Optional notifications on critical failures
   - Threshold-based (e.g., >50% trials failing)

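
The threshold-based alert can be reduced to a small failure-rate check. The sketch below is illustrative only: the environment variable name `ATOMIZER_ALERT_FAILURE_RATE`, the function names, and the `min_trials` guard are assumptions, not part of the plan.

```python
import os


def alert_threshold_from_env(default: float = 0.5) -> float:
    """Read the failure-rate threshold from a hypothetical environment variable."""
    raw = os.environ.get("ATOMIZER_ALERT_FAILURE_RATE")  # assumed variable name
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        return default
    # Reject values outside [0, 1], including NaN, and use the default instead.
    return value if 0.0 <= value <= 1.0 else default


def should_alert(
    n_trials: int,
    n_failed: int,
    threshold: float = 0.5,
    min_trials: int = 10,
) -> bool:
    """Return True when strictly more than `threshold` of trials have failed."""
    if n_trials < min_trials:
        # Too few trials for the failure rate to be meaningful.
        return False
    return n_failed / n_trials > threshold
```

The `min_trials` guard keeps a study from raising an alert just because its first one or two trials failed.
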
---

## Migration Strategy

### Priority 1: New Studies (Immediate)

All new studies created via the create-study skill should use the new logging system by default.

**Action**: Update `.claude/skills/create-study.md` to generate `run_optimization.py` with the logger.

### Priority 2: Reference Study (Phase 1.3.1)

Update `drone_gimbal_arm_optimization` as the reference implementation.

**Before**:
```python
print(f"Trial #{trial.number}")
print(f"Design Variables:")
for name, value in design_vars.items():
    print(f" {name}: {value:.3f}")
```

**After**:
```python
logger.trial_start(trial.number, design_vars)
```

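
To make the before/after mapping concrete, here is a sketch of how a `trial_start` method could reproduce the old `print()` output through the logging system. `TrialLogger` is a hypothetical stand-in; the real `AtomizerLogger` in `logger.py` may format its messages differently.

```python
import logging
from typing import Mapping


class TrialLogger(logging.Logger):
    """Hypothetical stand-in for AtomizerLogger; the real class may differ."""

    def trial_start(self, trial_number: int, design_vars: Mapping[str, float]) -> None:
        # Same three kinds of lines as the "Before" snippet, emitted at INFO level.
        self.info("Trial #%d", trial_number)
        self.info("Design Variables:")
        for name, value in design_vars.items():
            self.info(" %s: %.3f", name, value)
```

Passing the values as logging arguments rather than pre-formatted f-strings defers the string formatting until a handler actually emits the record.
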
### Priority 3: Other Studies (Phase 1.3.2)

Migrate remaining studies (bracket_stiffness, simple_beam, etc.) gradually.

**Timeline**: After drone_gimbal reference implementation is validated.

---

## API Reference

### Basic Usage

```python
from optimization_engine.logger import get_logger

# Module logger
logger = get_logger(__name__)
logger.info("Starting optimization")
logger.warning("Design variable out of range")
logger.error("Simulation failed", exc_info=True)
```

### Study Logger

```python
from optimization_engine.logger import get_logger
from pathlib import Path

# Create study logger with file logging
logger = get_logger(
    name="drone_gimbal_arm",
    study_dir=Path("studies/drone_gimbal_arm/2_results")
)

# Study lifecycle
logger.study_start("drone_gimbal_arm", n_trials=30, sampler="NSGAIISampler")

# Trial logging
logger.trial_start(1, {"thickness": 2.5, "width": 10.0})
logger.info("Running FEA simulation...")
logger.trial_complete(
    1,
    objectives={"mass": 120, "stiffness": 1500},
    constraints={"max_stress": 85},
    feasible=True
)

# Error handling
try:
    result = run_simulation()
except Exception as e:
    logger.trial_failed(trial_number=2, error=str(e))
    logger.error("Full traceback:", exc_info=True)
    raise

logger.study_complete("drone_gimbal_arm", n_trials=30, n_successful=28)
```

### Log Levels

```python
import logging

from optimization_engine.logger import get_logger

# Set logger level
logger = get_logger(__name__, level=logging.DEBUG)

logger.debug("Detailed debugging information")
logger.info("General information")
logger.warning("Warning message")
logger.error("Error occurred")
logger.critical("Critical failure")
```

---

## File Structure

```
optimization_engine/
├── logger.py              # ✅ NEW - Structured logging system
└── config_manager.py      # Phase 1.2

docs/07_DEVELOPMENT/
├── Phase_1_2_Implementation_Plan.md   # Phase 1.2
└── Phase_1_3_Implementation_Plan.md   # ✅ NEW - This file
```

---

## Testing Checklist

- [x] Logger creates file at correct location
- [x] Color output works on Windows 10
- [x] Log rotation works (max 50 MB, 3 backups)
- [x] Trial-specific methods format correctly
- [x] UTF-8 encoding handles special characters
- [ ] Integration test with real optimization study
- [ ] Verify dashboard can parse structured logs
- [ ] Test error scenarios (disk full, permission denied)

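
The unchecked error-scenario item needs a fallback behavior to test against. One possible behavior, sketched below under assumptions, is to keep console logging alive when the log file cannot be created. The helper name `add_file_handler_or_fallback` is hypothetical and does not describe what `logger.py` currently does.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def add_file_handler_or_fallback(logger: logging.Logger, study_dir: Path) -> bool:
    """Attach a rotating file handler; on failure keep console-only logging."""
    try:
        study_dir.mkdir(parents=True, exist_ok=True)
        handler = RotatingFileHandler(
            study_dir / "optimization.log",
            maxBytes=50 * 1024 * 1024,
            backupCount=3,
            encoding="utf-8",
        )
    except OSError as exc:
        # Covers permission denied, an invalid path, and a full disk at open time.
        logger.warning("File logging disabled (%s); continuing with console only", exc)
        return False
    logger.addHandler(handler)
    return True
```

A test can trigger the failure path without special permissions by passing a `study_dir` whose parent is a regular file, which makes `mkdir` raise an `OSError`.
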
---

## Success Metrics

**Phase 1.3 MVP** (Complete):
- [x] Structured logging system implemented
- [x] Zero external dependencies
- [x] Works on Windows and Unix
- [x] File + console logging
- [x] Trial-specific methods

**Phase 1.3.1** (Next):
- [ ] At least one study uses new logging
- [ ] Migration guide written
- [ ] create-study skill updated

**Phase 1.3.2** (Later):
- [ ] Checkpoint/resume system
- [ ] Enhanced error reporting
- [ ] All studies migrated

---

## References

- **Phase 1.2**: [Configuration Management](./Phase_1_2_Implementation_Plan.md)
- **MVP Plan**: [12-Week Development Plan](./Today_Todo.md)
- **Python Logging**: https://docs.python.org/3/library/logging.html
- **Log Rotation**: https://docs.python.org/3/library/logging.handlers.html#rotatingfilehandler

---

## Questions?

For MVP development questions, refer to [DEVELOPMENT.md](../../DEVELOPMENT.md) or the main plan in `docs/07_DEVELOPMENT/Today_Todo.md`.