
# Phase 3.3: Visualization & Model Cleanup System

**Status**: ✅ Complete
**Date**: 2025-11-17

## Overview

Phase 3.3 adds automated post-processing capabilities to Atomizer, including publication-quality visualization and intelligent model cleanup to manage disk space.

---

## Features Implemented

### 1. Automated Visualization System

**File**: `optimization_engine/visualizer.py`

**Capabilities**:
- **Convergence Plots**: Objective value vs trial number with running best
- **Design Space Exploration**: Parameter evolution colored by performance
- **Parallel Coordinate Plots**: High-dimensional visualization
- **Sensitivity Heatmaps**: Parameter correlation analysis
- **Constraint Violations**: Track constraint satisfaction over trials
- **Multi-Objective Breakdown**: Individual objective contributions

**Output Formats**:
- PNG (high-resolution, 300 DPI)
- PDF (vector graphics, publication-ready)
- Customizable via configuration

**Example Usage**:
```bash
# Standalone visualization
python optimization_engine/visualizer.py studies/beam/substudies/opt1 png pdf

# Runs automatically after optimization when "generate_plots": true is set in the JSON config
```

### 2. Model Cleanup System

**File**: `optimization_engine/model_cleanup.py`

**Purpose**: Reduce disk usage by deleting large CAD/FEM files from non-optimal trials

**Strategy**:
- Keep top-N best trials (configurable)
- Delete large files: `.prt`, `.sim`, `.fem`, `.op2`, `.f06`
- Preserve ALL `results.json` (small, critical data)
- Dry-run mode for safety

**Example Usage**:
```bash
# Standalone cleanup
python optimization_engine/model_cleanup.py studies/beam/substudies/opt1 --keep-top-n 10

# Dry run (preview without deleting)
python optimization_engine/model_cleanup.py studies/beam/substudies/opt1 --dry-run

# Runs automatically after optimization when "cleanup_models": true is set in the JSON config
```
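
The keep-top-N strategy can be sketched in a few lines of Python. This is a minimal illustration, not the code in `model_cleanup.py`: it assumes each `trial_*/results.json` stores a numeric value under a hypothetical `objective` key, that lower values are better, and the function names `plan_cleanup` and `run_cleanup` are made up for this example.

```python
import json
from pathlib import Path

# File types treated as "large" in the strategy above.
LARGE_SUFFIXES = {".prt", ".sim", ".fem", ".op2", ".f06"}


def plan_cleanup(substudy_dir, keep_top_n=10, objective_key="objective"):
    """Return (kept_trial_dirs, files_to_delete) without touching the disk.

    Assumption: results.json holds a number under `objective_key`
    and the optimization minimizes it.
    """
    scored = []
    for trial_dir in sorted(Path(substudy_dir).glob("trial_*")):
        results_file = trial_dir / "results.json"
        if not results_file.is_file():
            continue  # No results recorded: leave this trial untouched.
        objective = json.loads(results_file.read_text())[objective_key]
        scored.append((objective, trial_dir))

    scored.sort(key=lambda item: item[0])  # Best (lowest) objective first.
    kept = [trial_dir for _, trial_dir in scored[:keep_top_n]]
    to_delete = [
        path
        for _, trial_dir in scored[keep_top_n:]
        for path in sorted(trial_dir.iterdir())
        if path.is_file() and path.suffix.lower() in LARGE_SUFFIXES
    ]
    return kept, to_delete


def run_cleanup(substudy_dir, keep_top_n=10, dry_run=True):
    """Delete large files from non-top trials; with dry_run, only report."""
    kept, to_delete = plan_cleanup(substudy_dir, keep_top_n)
    freed_bytes = sum(path.stat().st_size for path in to_delete)
    if not dry_run:
        for path in to_delete:
            path.unlink()  # results.json is never in the list, so it survives.
    return {
        "dry_run": dry_run,
        "kept_trials": [trial_dir.name for trial_dir in kept],
        "target_files": [str(path) for path in to_delete],
        "freed_bytes": freed_bytes,
    }
```

Separating planning from deletion keeps the dry run and the real run on the same code path, so the preview lists exactly the files a real run would remove.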

### 3. Optuna Dashboard Integration

**File**: `docs/OPTUNA_DASHBOARD.md`

**Capabilities**:
- Real-time monitoring during optimization
- Interactive parallel coordinate plots
- Parameter importance analysis (fANOVA)
- Multi-study comparison

**Usage**:
```bash
# Launch the dashboard against the study's SQLite storage
cd studies/beam/substudies/opt1
optuna-dashboard sqlite:///optuna_study.db

# Then open http://localhost:8080 (the default port) in a browser
```

---

## Configuration

### JSON Configuration Format

Add a `post_processing` section to the optimization config:

```json
{
  "study_name": "my_optimization",
  "design_variables": { ... },
  "objectives": [ ... ],
  "optimization_settings": {
    "n_trials": 50,
    ...
  },
  "post_processing": {
    "generate_plots": true,
    "plot_formats": ["png", "pdf"],
    "cleanup_models": true,
    "keep_top_n_models": 10,
    "cleanup_dry_run": false
  }
}
```

### Configuration Options

#### Visualization Settings

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `generate_plots` | boolean | `false` | Enable automatic plot generation |
| `plot_formats` | list | `["png", "pdf"]` | Output formats for plots |

#### Cleanup Settings

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `cleanup_models` | boolean | `false` | Enable model cleanup |
| `keep_top_n_models` | integer | `10` | Number of best trials to keep models for |
| `cleanup_dry_run` | boolean | `false` | Preview cleanup without deleting |
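
As a rough sketch of how these options and their defaults could be read from a parsed config, the snippet below uses a dataclass whose defaults mirror the tables above. The class `PostProcessingConfig` and the function `load_post_processing` are hypothetical names for illustration; Atomizer's actual loader may look quite different.

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class PostProcessingConfig:
    # Defaults mirror the configuration tables above.
    generate_plots: bool = False
    plot_formats: list = field(default_factory=lambda: ["png", "pdf"])
    cleanup_models: bool = False
    keep_top_n_models: int = 10
    cleanup_dry_run: bool = False


def load_post_processing(config: dict) -> PostProcessingConfig:
    """Build settings from a parsed config, falling back to defaults."""
    section = config.get("post_processing", {})
    known = {f.name for f in fields(PostProcessingConfig)}
    unknown = set(section) - known
    if unknown:
        # Fail loudly on typos such as "keep_top_n_model".
        raise ValueError(f"Unknown post_processing keys: {sorted(unknown)}")
    settings = PostProcessingConfig(**section)
    if settings.keep_top_n_models < 0:
        raise ValueError("keep_top_n_models must be non-negative")
    return settings


# Example: only two options are set, the rest fall back to defaults.
config = json.loads(
    '{"post_processing": {"generate_plots": true, "keep_top_n_models": 5}}'
)
settings = load_post_processing(config)
print(settings)
```

Rejecting unknown keys is a design choice in this sketch: a misspelled option would otherwise silently leave cleanup or plotting disabled.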

---

## Workflow Integration

### Automatic Post-Processing

When configured, post-processing runs automatically after optimization completes:

```
OPTIMIZATION COMPLETE
===========================================================
...

POST-PROCESSING
===========================================================

Generating visualization plots...
  - Generating convergence plot...
  - Generating design space exploration...
  - Generating parallel coordinate plot...
  - Generating sensitivity heatmap...
  Plots generated: 2 format(s)
  Improvement: 23.1%
  Location: studies/beam/substudies/opt1/plots

Cleaning up trial models...
  Deleted 320 files from 40 trials
  Space freed: 1542.3 MB
  Kept top 10 trial models
===========================================================
```

### Directory Structure After Post-Processing

```
studies/my_optimization/
├── substudies/
│   └── opt1/
│       ├── trial_000/             # Top performer - KEPT
│       │   ├── Beam.prt          # CAD files kept
│       │   ├── Beam_sim1.sim
│       │   └── results.json
│       ├── trial_001/             # Poor performer - CLEANED
│       │   └── results.json      # Only results kept
│       ├── ...
│       ├── plots/                 # NEW: Auto-generated
│       │   ├── convergence.png
│       │   ├── convergence.pdf
│       │   ├── design_space_evolution.png
│       │   ├── design_space_evolution.pdf
│       │   ├── parallel_coordinates.png
│       │   ├── parallel_coordinates.pdf
│       │   └── plot_summary.json
│       ├── history.json
│       ├── best_trial.json
│       ├── cleanup_log.json       # NEW: Cleanup statistics
│       └── optuna_study.pkl
```

---

## Plot Types

### 1. Convergence Plot

**File**: `convergence.png/pdf`

**Shows**:
- Individual trial objectives (scatter)
- Running best (line)
- Best trial highlighted (gold star)
- Improvement percentage annotation

**Use Case**: Assess optimization convergence and identify best trial
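
The running-best line and the improvement annotation can be computed as in the sketch below. It assumes a minimization problem and defines improvement relative to the first trial; the document does not state how `visualizer.py` defines the percentage, so treat that formula as an assumption.

```python
from itertools import accumulate


def running_best(objectives):
    """Best (lowest) objective seen up to and including each trial."""
    return list(accumulate(objectives, min))


def improvement_percent(objectives):
    """Improvement of the best trial over the first trial, in percent.

    Assumption: improvement is measured against trial 0, for minimization.
    """
    first = objectives[0]
    best = min(objectives)
    if first == 0:
        raise ValueError("Relative improvement is undefined when trial 0 is 0")
    return 100.0 * (first - best) / abs(first)


history = [5.0, 7.0, 4.0, 6.0, 3.0]
print(running_best(history))
print(improvement_percent(history))
```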

### 2. Design Space Exploration

**File**: `design_space_evolution.png/pdf`

**Shows**:
- Each design variable evolution over trials
- Color-coded by objective value (darker = better)
- Best trial highlighted
- Units displayed on y-axis

**Use Case**: Understand how parameters changed during optimization

### 3. Parallel Coordinate Plot

**File**: `parallel_coordinates.png/pdf`

**Shows**:
- High-dimensional view of design space
- Each line = one trial
- Color-coded by objective
- Best trial highlighted

**Use Case**: Visualize relationships between multiple design variables
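
Parallel coordinate plots usually need every variable scaled to a common range so that axes with different units can share one figure. The sketch below shows min-max scaling to [0, 1] as one common way to do that; whether `visualizer.py` uses this exact scaling is an assumption, and the variable names in the example are hypothetical.

```python
def normalize_columns(trials):
    """Scale each design variable to [0, 1] across all trials.

    `trials` is a list of dicts mapping variable name -> value.
    A variable with no spread maps to 0.5 so its line stays centered.
    """
    names = list(trials[0])
    lows = {name: min(trial[name] for trial in trials) for name in names}
    highs = {name: max(trial[name] for trial in trials) for name in names}
    normalized = []
    for trial in trials:
        row = {}
        for name in names:
            span = highs[name] - lows[name]
            row[name] = 0.5 if span == 0 else (trial[name] - lows[name]) / span
        normalized.append(row)
    return normalized


trials = [
    {"thickness": 2.0, "width": 10.0},
    {"thickness": 4.0, "width": 10.0},
    {"thickness": 3.0, "width": 10.0},
]
scaled = normalize_columns(trials)
```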

### 4. Sensitivity Heatmap

**File**: `sensitivity_heatmap.png/pdf`

**Shows**:
- Correlation matrix: design variables vs objectives
- Values: -1 (negative correlation) to +1 (positive)
- Color-coded: red (negative), blue (positive)

**Use Case**: Identify which parameters most influence objectives

### 5. Constraint Violations

**File**: `constraint_violations.png/pdf` (if constraints exist)

**Shows**:
- Constraint values over trials
- Feasibility threshold (red line at y=0)
- Trend of constraint satisfaction

**Use Case**: Verify constraint satisfaction throughout optimization
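
The feasibility check behind this plot can be sketched as follows. It assumes constraints are written so that a value at or below zero means "satisfied", which matches the threshold at y=0 but is not stated explicitly in this document.

```python
def feasibility_summary(constraint_values):
    """Flag each trial and report the share of feasible trials.

    Assumption: g <= 0 means the constraint is satisfied.
    """
    feasible = [g <= 0.0 for g in constraint_values]
    return {
        "feasible": feasible,
        "feasible_fraction": sum(feasible) / len(feasible),
        # Largest positive excursion above the threshold (0.0 if none).
        "worst_violation": max(max(constraint_values), 0.0),
    }


summary = feasibility_summary([-0.5, 0.2, 0.0, 1.5])
```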

### 6. Objective Breakdown

**File**: `objective_breakdown.png/pdf` (if multi-objective)

**Shows**:
- Stacked area plot of individual objectives
- Total objective overlay
- Contribution of each objective over trials

**Use Case**: Understand multi-objective trade-offs

---

## Benefits

### Visualization

✅ **Publication-Ready**: High-DPI PNG and vector PDF exports
✅ **Automated**: No manual post-processing required
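✅ **Comprehensive**: Six plot types covering convergence, design space exploration, parallel coordinates, sensitivity, constraints, and objective breakdown
✅ **Customizable**: Output formats configurable via `plot_formats`
✅ **Portable**: Plots can be dropped into reports, papers, and presentations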

### Model Cleanup

✅ **Disk Space Savings**: 50-90% reduction typical (depends on model size)
✅ **Selective**: Keeps best trials for validation/reproduction
✅ **Safe**: Preserves all critical data (results.json)
✅ **Traceable**: Cleanup log documents what was deleted
✅ **Previewable**: Dry-run mode lists what would be deleted before anything is removed

### Optuna Dashboard

✅ **Real-Time**: Monitor optimization while it runs
✅ **Interactive**: Zoom, filter, explore data dynamically
✅ **Advanced**: Parameter importance, contour plots
✅ **Comparative**: Multi-study comparison support

---

## Example: Beam Optimization

**Configuration**:
```json
{
  "study_name": "simple_beam_optimization",
  "optimization_settings": {
    "n_trials": 50
  },
  "post_processing": {
    "generate_plots": true,
    "plot_formats": ["png", "pdf"],
    "cleanup_models": true,
    "keep_top_n_models": 10
  }
}
```

**Results**:
- 50 trials completed
- 6 plots generated (× 2 formats = 12 files)
- 40 trials cleaned up
- 1.2 GB disk space freed
- Top 10 trial models retained for validation

**Files Generated**:
- `plots/convergence.{png,pdf}`
- `plots/design_space_evolution.{png,pdf}`
- `plots/parallel_coordinates.{png,pdf}`
- `plots/plot_summary.json`
- `cleanup_log.json`

---

## Future Enhancements

### Potential Additions

1. **Interactive HTML Plots**: Plotly-based interactive visualizations
2. **Automated Report Generation**: Markdown → PDF with embedded plots
3. **Video Animation**: Design evolution as animated GIF/MP4
4. **3D Scatter Plots**: For exploring three design variables at a time
5. **Statistical Analysis**: Confidence intervals, significance tests
6. **Comparison Reports**: Side-by-side substudy comparison

### Configuration Expansion

```jsonc
"post_processing": {
  "generate_plots": true,
  "plot_formats": ["png", "pdf", "html"],  // Add interactive
  "plot_style": "publication",              // Predefined styles
  "generate_report": true,                  // Auto-generate PDF report
  "report_template": "default",             // Custom templates
  "cleanup_models": true,
  "keep_top_n_models": 10,
  "archive_cleaned_trials": false           // Compress instead of delete
}
```

---

## Troubleshooting

### Matplotlib Import Error

**Problem**: `ModuleNotFoundError: No module named 'matplotlib'`

**Solution**: Install visualization dependencies
```bash
conda install -n atomizer matplotlib pandas "numpy<2" -y
```

### Unicode Display Error

**Problem**: Checkmark character displays incorrectly in Windows console

**Status**: Fixed (replaced Unicode with "SUCCESS:")

### Missing history.json

**Problem**: Older substudies don't have `history.json`

**Solution**: Generate from trial results
```bash
python optimization_engine/generate_history_from_trials.py studies/beam/substudies/opt1
```
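
For orientation, rebuilding a history file from per-trial results might look like the sketch below. This is not the code in `generate_history_from_trials.py`: it assumes each `results.json` is a JSON object, that `history.json` is a plain list of those objects, and the `trial_dir` field is a hypothetical addition for this example.

```python
import json
from pathlib import Path


def rebuild_history(substudy_dir):
    """Collect trial_*/results.json into a single history.json list."""
    substudy = Path(substudy_dir)
    history = []
    for trial_dir in sorted(substudy.glob("trial_*")):
        results_file = trial_dir / "results.json"
        if not results_file.is_file():
            continue  # Skip trials that never wrote results.
        entry = json.loads(results_file.read_text())
        # Record which trial the entry came from (hypothetical field name).
        entry.setdefault("trial_dir", trial_dir.name)
        history.append(entry)
    (substudy / "history.json").write_text(json.dumps(history, indent=2))
    return history
```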

### Cleanup Deleted Wrong Files

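**Prevention**: Cleanup deletes files rather than archiving them, so always run with `--dry-run` first and review the output before deleting anything.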
```bash
python optimization_engine/model_cleanup.py <substudy> --dry-run
```

---

## Technical Details

### Dependencies

**Required**:
- `matplotlib >= 3.10`
- `numpy < 2.0` (pyNastran compatibility)
- `pandas >= 2.3`
- `optuna >= 3.0` (for dashboard)

**Optional**:
- `optuna-dashboard` (for real-time monitoring)

### Performance

**Visualization**:
- 50 trials: ~5-10 seconds
- 100 trials: ~10-15 seconds
- 500 trials: ~30-40 seconds

**Cleanup**:
- Depends on file count and sizes
- Typically < 1 minute for 100 trials

---

## Summary

Phase 3.3 completes Atomizer's post-processing capabilities with:

✅ Automated publication-quality visualization
✅ Intelligent model cleanup for disk space management
✅ Optuna dashboard integration for real-time monitoring
✅ Comprehensive configuration options
✅ Full integration with optimization workflow

**Next Phase**: Phase 3.4 - Report Generation & Statistical Analysis