
# Study Organization Guide

**Date**: 2025-11-17
**Purpose**: Document recommended study directory structure and organization principles

---

## Current Organization Analysis

### Study Directory: `studies/simple_beam_optimization/`

**Current Structure**:
```
studies/simple_beam_optimization/
├── model/                           # Base CAD/FEM model (reference)
│   ├── Beam.prt
│   ├── Beam_sim1.sim
│   ├── beam_sim1-solution_1.op2
│   ├── beam_sim1-solution_1.f06
│   └── comprehensive_results_analysis.json
│
├── substudies/                      # All optimization runs
│   ├── benchmarking/
│   │   ├── benchmark_results.json
│   │   └── BENCHMARK_REPORT.md
│   ├── initial_exploration/
│   │   ├── config.json
│   │   └── optimization_config.json
│   ├── validation_3trials/
│   │   ├── trial_000/
│   │   ├── trial_001/
│   │   ├── trial_002/
│   │   ├── best_trial.json
│   │   └── optuna_study.pkl
│   ├── validation_4d_3trials/
│   │   └── [similar structure]
│   └── full_optimization_50trials/
│       ├── trial_000/
│       ├── ... trial_049/
│       ├── plots/                   # NEW: Auto-generated plots
│       ├── history.json
│       ├── best_trial.json
│       └── optuna_study.pkl
│
├── README.md                        # Study overview
├── study_metadata.json              # Study metadata
├── beam_optimization_config.json    # Main configuration
├── baseline_validation.json         # Baseline results
├── COMPREHENSIVE_BENCHMARK_RESULTS.md
├── OPTIMIZATION_RESULTS_50TRIALS.md
└── run_optimization.py              # Study-specific runner

```

---

## Assessment

### ✅ What's Working Well

1. **Substudy Isolation**: Each optimization run (substudy) is self-contained with its own trial directories, making it easy to compare different optimization strategies.

2. **Centralized Model**: The `model/` directory holds the reference CAD/FEM model that every substudy copies from.

3. **Configuration at Study Level**: `beam_optimization_config.json` provides the main configuration that substudies inherit from.

4. **Study-Level Documentation**: `README.md` and results markdown files at the study level provide high-level overviews.

5. **Clear Hierarchy**:
   - Study = Overall project (e.g., "optimize this beam")
   - Substudy = Specific optimization run (e.g., "50 trials with TPE sampler")
   - Trial = Individual design evaluation

### ⚠️ Issues Found

1. **Documentation Scattered**: Results documentation is at the study level (`OPTIMIZATION_RESULTS_50TRIALS.md`) but describes a specific substudy (`full_optimization_50trials`).

2. **Benchmarking Placement**: `substudies/benchmarking/` is not really a "substudy" - it's a validation step that should happen before optimization.

3. **Missing Substudy Metadata**: Some substudies lack their own README or summary files to explain what they tested.

4. **Inconsistent Naming**: `validation_3trials` vs `validation_4d_3trials` - the names do not reveal that the first run varies three design variables (no `hole_count`) while the second varies four.

5. **Study Metadata Incomplete**: `study_metadata.json` lists only the `initial_exploration` substudy, but five substudy directories are present.

---

## Recommended Organization

### Proposed Structure

```
studies/simple_beam_optimization/
│
├── 1_setup/                         # NEW: Pre-optimization setup
│   ├── model/                       # Reference CAD/FEM model
│   │   ├── Beam.prt
│   │   ├── Beam_sim1.sim
│   │   └── ...
│   ├── benchmarking/                # Baseline validation
│   │   ├── benchmark_results.json
│   │   └── BENCHMARK_REPORT.md
│   └── baseline_validation.json
│
├── 2_substudies/                    # Optimization runs
│   ├── 01_initial_exploration/
│   │   ├── README.md                # What was tested, why
│   │   ├── config.json
│   │   ├── trial_000/
│   │   ├── ...
│   │   └── OPTIMIZATION_RESULTS.md  # Substudy-specific results
│   ├── 02_validation_3d_3trials/
│   │   └── [similar structure]
│   ├── 03_validation_4d_3trials/
│   │   └── [similar structure]
│   └── 04_full_optimization_50trials/
│       ├── README.md
│       ├── trial_000/
│       ├── ... trial_049/
│       ├── plots/
│       ├── history.json
│       ├── best_trial.json
│       ├── OPTIMIZATION_RESULTS.md  # Moved from study level
│       └── cleanup_log.json
│
├── 3_reports/                       # NEW: Study-level analysis
│   ├── COMPREHENSIVE_BENCHMARK_RESULTS.md
│   ├── COMPARISON_ALL_SUBSTUDIES.md # NEW: Compare substudies
│   └── final_recommendations.md     # NEW: Engineering insights
│
├── README.md                        # Study overview
├── study_metadata.json              # Updated with all substudies
├── beam_optimization_config.json    # Main configuration
└── run_optimization.py              # Study-specific runner
```

### Key Changes

1. **Numbered Directories**: Indicate workflow sequence (setup → substudies → reports)

2. **Numbered Substudies**: Chronological naming (01_, 02_, 03_) makes progression clear

3. **Moved Benchmarking**: From `substudies/` to `1_setup/` (it's pre-optimization)

4. **Substudy-Level Documentation**: Each substudy has:
   - `README.md` - What was tested, parameters, hypothesis
   - `OPTIMIZATION_RESULTS.md` - Results and analysis (optional; written when the results warrant a detailed write-up)

5. **Centralized Reports**: All comparative analysis and final recommendations in `3_reports/`

6. **Updated Metadata**: `study_metadata.json` tracks all substudies with status

---

## Comparison: Current vs Proposed

| Aspect | Current | Proposed | Benefit |
|--------|---------|----------|---------|
| **Substudy naming** | Descriptive only | Numbered + descriptive | Chronological clarity |
| **Documentation** | Mixed levels | Clear hierarchy | Easier to find results |
| **Benchmarking** | In substudies/ | In 1_setup/ | Reflects true purpose |
| **Model location** | Study root | 1_setup/model/ | Grouped with setup |
| **Reports** | Study root | 3_reports/ | Centralized analysis |
| **Substudy docs** | Minimal | README + results | Self-documenting |
| **Metadata** | Incomplete | All substudies tracked | Accurate status |

---

## Migration Guide

### Option 1: Reorganize Existing Study (Recommended)

**Steps**:
1. Create new directory structure
2. Move files to new locations
3. Update `study_metadata.json`
4. Update file references in documentation
5. Create missing substudy READMEs

**Commands**:
```bash
# Create new structure
mkdir -p studies/simple_beam_optimization/1_setup/model
mkdir -p studies/simple_beam_optimization/1_setup/benchmarking
mkdir -p studies/simple_beam_optimization/2_substudies
mkdir -p studies/simple_beam_optimization/3_reports

# Move model
mv studies/simple_beam_optimization/model/* studies/simple_beam_optimization/1_setup/model/

# Move benchmarking
mv studies/simple_beam_optimization/substudies/benchmarking/* studies/simple_beam_optimization/1_setup/benchmarking/

# Rename and move substudies
mv studies/simple_beam_optimization/substudies/initial_exploration studies/simple_beam_optimization/2_substudies/01_initial_exploration
mv studies/simple_beam_optimization/substudies/validation_3trials studies/simple_beam_optimization/2_substudies/02_validation_3d_3trials
mv studies/simple_beam_optimization/substudies/validation_4d_3trials studies/simple_beam_optimization/2_substudies/03_validation_4d_3trials
mv studies/simple_beam_optimization/substudies/full_optimization_50trials studies/simple_beam_optimization/2_substudies/04_full_optimization_50trials

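# Move reports and baseline results
mv studies/simple_beam_optimization/COMPREHENSIVE_BENCHMARK_RESULTS.md studies/simple_beam_optimization/3_reports/
mv studies/simple_beam_optimization/OPTIMIZATION_RESULTS_50TRIALS.md studies/simple_beam_optimization/2_substudies/04_full_optimization_50trials/OPTIMIZATION_RESULTS.md
mv studies/simple_beam_optimization/baseline_validation.json studies/simple_beam_optimization/1_setup/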

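# Clean up the now-empty directories (rmdir fails if anything was left behind)
rmdir studies/simple_beam_optimization/substudies/benchmarking
rmdir studies/simple_beam_optimization/substudies
rmdir studies/simple_beam_optimization/model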
```
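
After the move, a quick check can confirm that the study matches the proposed layout. The sketch below is a hypothetical helper (`check_layout` is not part of the existing tooling); it assumes only the directory names shown in the Proposed Structure section.

```python
from pathlib import Path

# Directory names taken from the proposed structure in this guide.
EXPECTED_DIRS = ["1_setup/model", "1_setup/benchmarking", "2_substudies", "3_reports"]
# Directories that should be gone once the migration is complete.
LEGACY_DIRS = ["model", "substudies"]


def check_layout(study_dir):
    """Return a list of problems; an empty list means the layout looks migrated."""
    study = Path(study_dir)
    problems = []
    for rel in EXPECTED_DIRS:
        if not (study / rel).is_dir():
            problems.append(f"missing directory: {rel}")
    for rel in LEGACY_DIRS:
        if (study / rel).exists():
            problems.append(f"legacy directory still present: {rel}")
    return problems
```

Calling `check_layout("studies/simple_beam_optimization")` from the repository root returns an empty list when every expected directory exists and the old `model/` and `substudies/` directories are gone.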

### Option 2: Apply to Future Studies Only

Keep the existing study as-is and apply the new organization to future studies only.

**When to Use**:
- Current study is complete and well-understood
- Reorganization would break existing scripts/references
- You want to test the new organization before migrating

---

## Best Practices

### Study-Level Files

**Required**:
- `README.md` - High-level overview, purpose, design variables, objectives
- `study_metadata.json` - Metadata, status, substudy registry
- `beam_optimization_config.json` - Main configuration (inheritable)
- `run_optimization.py` - Study-specific runner script

**Optional**:
- `CHANGELOG.md` - Track configuration changes across substudies
- `LESSONS_LEARNED.md` - Engineering insights, dead ends avoided
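
This guide describes the main configuration as inheritable but does not define the merge rules. As one possible reading, the sketch below assumes a recursive merge in which substudy values override study-level values. `merge_configs` is a hypothetical helper, and the keys in the example are illustrative rather than the real schema.

```python
import copy


def merge_configs(base, override):
    """Return a new dict: `override` values win, nested dicts are merged key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative values only; the real config schema may differ.
study_config = {"post_processing": {"generate_plots": True, "cleanup_models": False}}
substudy_config = {"post_processing": {"cleanup_models": True}}
effective = merge_configs(study_config, substudy_config)
print(effective)
# {'post_processing': {'generate_plots': True, 'cleanup_models': True}}
```

Neither input is modified, so the study-level configuration can be reused for the next substudy.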

### Substudy-Level Files

**Required** (Generated by Runner):
- `trial_XXX/` - Trial directories with CAD/FEM files and results.json
- `history.json` - Full optimization history
- `best_trial.json` - Best trial metadata
- `optuna_study.pkl` - Optuna study object
- `config.json` - Substudy-specific configuration

**Required** (User-Created):
- `README.md` - Purpose, hypothesis, parameter choices

**Optional** (Auto-Generated):
- `plots/` - Visualization plots (if `post_processing.generate_plots = true`)
- `cleanup_log.json` - Model cleanup statistics (if `post_processing.cleanup_models = true`)

**Optional** (User-Created):
- `OPTIMIZATION_RESULTS.md` - Detailed analysis and interpretation

### Trial-Level Files

**Always Kept** (Small, Critical):
- `results.json` - Extracted objectives, constraints, design variables

**Kept for Top-N Trials** (Large, Useful):
- `Beam.prt` - CAD model
- `Beam_sim1.sim` - Simulation setup
- `beam_sim1-solution_1.op2` - FEA results (binary)
- `beam_sim1-solution_1.f06` - FEA results (text)

**Cleaned for Poor Trials** (Large, Less Useful):
- All `.prt`, `.sim`, `.fem`, `.op2`, `.f06` files deleted
- Only `results.json` preserved
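
The retention policy above can be sketched as a two-step plan-then-delete routine. This is not the runner's actual cleanup code. It assumes that each `results.json` holds a numeric field named `objective` (a hypothetical key) and that lower values are better.

```python
import json
from pathlib import Path

# Extensions listed under "Cleaned for Poor Trials" above.
LARGE_EXTENSIONS = {".prt", ".sim", ".fem", ".op2", ".f06"}


def plan_cleanup(substudy_dir, keep_top_n, objective_key="objective"):
    """List large files in trials outside the top N (lower objective = better).

    `objective_key` is a hypothetical field name; adjust it to the real schema.
    """
    scored = []
    for trial_dir in sorted(Path(substudy_dir).glob("trial_*")):
        results_file = trial_dir / "results.json"
        if not results_file.is_file():
            continue  # leave trials without results untouched
        value = json.loads(results_file.read_text())[objective_key]
        scored.append((value, trial_dir))
    scored.sort(key=lambda pair: pair[0])
    to_delete = []
    for _, trial_dir in scored[keep_top_n:]:
        for path in sorted(trial_dir.iterdir()):
            if path.is_file() and path.suffix.lower() in LARGE_EXTENSIONS:
                to_delete.append(path)
    return to_delete


def apply_cleanup(paths):
    """Delete the planned files; results.json is never part of the plan."""
    for path in paths:
        path.unlink()
```

Separating the plan from the deletion makes a dry run possible: review the list returned by `plan_cleanup` before passing it to `apply_cleanup`.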

---

## Naming Conventions

### Substudy Names

**Format**: `NN_descriptive_name`

**Examples**:
- `01_initial_exploration` - First exploration of design space
- `02_validation_3d_3trials` - Validate 3 design variables work
- `03_validation_4d_3trials` - Validate 4 design variables work
- `04_full_optimization_50trials` - Full optimization run
- `05_refined_search_30trials` - Refined search in promising region
- `06_sensitivity_analysis` - Parameter sensitivity study

**Guidelines**:
- Start with a two-digit number (01, 02, ..., 99)
- Use underscores for spaces
- Be concise but descriptive
- Include trial count if relevant
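
The naming guidelines can be checked mechanically. The pattern below is a sketch and is stricter than the guidelines in one respect: it assumes lowercase letters and digits only, which matches every example above but is not stated as a rule. `is_valid_substudy_name` is a hypothetical helper.

```python
import re

# NN_descriptive_name: prefix 01-99, then lowercase words joined by underscores.
SUBSTUDY_NAME = re.compile(r"^(0[1-9]|[1-9][0-9])_[a-z0-9]+(_[a-z0-9]+)*$")


def is_valid_substudy_name(name):
    """Check a directory name against the NN_descriptive_name convention."""
    return SUBSTUDY_NAME.fullmatch(name) is not None


for candidate in ["04_full_optimization_50trials", "validation_3trials", "5_refined search"]:
    print(candidate, is_valid_substudy_name(candidate))
# 04_full_optimization_50trials True
# validation_3trials False
# 5_refined search False
```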

### Study Names

**Format**: `descriptive_name` (no numbering)

**Examples**:
- `simple_beam_optimization` - Optimize simple beam
- `bracket_displacement_maximizing` - Maximize bracket displacement
- `engine_mount_fatigue` - Engine mount fatigue optimization

**Guidelines**:
- Use underscores for spaces
- Include part name and optimization goal
- Avoid dates (use substudy numbering for chronology)

---

## Metadata Format

### study_metadata.json

**Recommended Format**:
```json
{
  "study_name": "simple_beam_optimization",
  "description": "Minimize displacement and weight of beam with existing loadcases",
  "created": "2025-11-17T10:24:09.613688",
  "status": "active",
  "design_variables": ["beam_half_core_thickness", "beam_face_thickness", "holes_diameter", "hole_count"],
  "objectives": ["minimize_displacement", "minimize_stress", "minimize_mass"],
  "constraints": ["displacement_limit"],
  "substudies": [
    {
      "name": "01_initial_exploration",
      "created": "2025-11-17T10:30:00",
      "status": "completed",
      "trials": 10,
      "purpose": "Explore design space boundaries"
    },
    {
      "name": "02_validation_3d_3trials",
      "created": "2025-11-17T11:00:00",
      "status": "completed",
      "trials": 3,
      "purpose": "Validate 3D parameter updates (without hole_count)"
    },
    {
      "name": "03_validation_4d_3trials",
      "created": "2025-11-17T12:00:00",
      "status": "completed",
      "trials": 3,
      "purpose": "Validate 4D parameter updates (with hole_count)"
    },
    {
      "name": "04_full_optimization_50trials",
      "created": "2025-11-17T13:00:00",
      "status": "completed",
      "trials": 50,
      "purpose": "Full optimization with all 4 design variables"
    }
  ],
  "last_modified": "2025-11-17T15:30:00"
}
```
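
Because the registry is easy to forget, a small consistency check can catch drift between `study_metadata.json` and the directories on disk. The sketch below assumes the metadata format shown above and the `2_substudies/` directory from the proposed structure. `compare_registry` is a hypothetical helper, not an existing function.

```python
import json
from pathlib import Path


def compare_registry(study_dir, substudies_dirname="2_substudies"):
    """Compare substudy names in study_metadata.json with directories on disk.

    Returns (unregistered, missing): directories absent from the metadata,
    and metadata entries that have no directory.
    """
    study = Path(study_dir)
    metadata = json.loads((study / "study_metadata.json").read_text())
    registered = {entry["name"] for entry in metadata.get("substudies", [])}
    on_disk = {p.name for p in (study / substudies_dirname).iterdir() if p.is_dir()}
    return sorted(on_disk - registered), sorted(registered - on_disk)
```

For a study that has not been migrated, pass `substudies_dirname="substudies"` instead.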

### Substudy README.md Template

```markdown
# [Substudy Name]

**Date**: YYYY-MM-DD
**Status**: [planned | running | completed | failed]
**Trials**: N

## Purpose

[Why this substudy was created, what hypothesis is being tested]

## Configuration Changes

[Compared to previous substudy or baseline config, what changed?]

- Design variable bounds: [if changed]
- Objective weights: [if changed]
- Sampler settings: [if changed]

## Expected Outcome

[What do you hope to learn or achieve?]

## Actual Results

[Fill in after completion]

- Best objective: X.XX
- Feasible designs: N / N_total
- Key findings: [summary]

## Next Steps

[What substudy should follow based on these results?]
```

---

## Workflow Integration

### Creating a New Substudy

**Steps**:
1. Determine substudy number (next in sequence)
2. Create substudy README.md with purpose and changes
3. Update configuration if needed
4. Run optimization:
   ```bash
   python run_optimization.py --substudy-name "05_refined_search_30trials"
   ```
5. After completion:
   - Review results
   - Update substudy README.md with findings
   - Create OPTIMIZATION_RESULTS.md if significant
   - Update study_metadata.json
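
Steps 1 and 2 can be scripted. The sketch below picks the next free two-digit prefix and writes a README stub with the header fields from the template above. Both function names are hypothetical. It also assumes that the runner accepts a substudy directory that already exists, which this guide does not confirm.

```python
import re
from datetime import date
from pathlib import Path


def next_substudy_name(substudies_dir, description):
    """Return 'NN_description' using the next free two-digit prefix."""
    highest = 0
    for path in Path(substudies_dir).iterdir():
        match = re.match(r"(\d{2})_", path.name)
        if path.is_dir() and match:
            highest = max(highest, int(match.group(1)))
    return f"{highest + 1:02d}_{description}"


def create_substudy_stub(substudies_dir, description, trials):
    """Create the substudy directory with a README.md header based on the template."""
    name = next_substudy_name(substudies_dir, description)
    target = Path(substudies_dir) / name
    target.mkdir()
    header = (
        f"# {name}\n\n"
        f"**Date**: {date.today().isoformat()}\n"
        f"**Status**: planned\n"
        f"**Trials**: {trials}\n\n"
        "## Purpose\n\n"
    )
    (target / "README.md").write_text(header)
    return target
```

With substudies `01` through `04` present, `next_substudy_name("2_substudies", "refined_search_30trials")` returns `05_refined_search_30trials`, which can then be passed to `--substudy-name`.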

### Comparing Substudies

**Create Comparison Report**:
```markdown
# Substudy Comparison

| Substudy | Trials | Best Obj | Feasible | Key Finding |
|----------|--------|----------|----------|-------------|
| 01_initial_exploration | 10 | 1250.3 | 0/10 | Design space too large |
| 02_validation_3d_3trials | 3 | 1180.5 | 0/3 | 3D updates work |
| 03_validation_4d_3trials | 3 | 1120.2 | 0/3 | hole_count updates work |
| 04_full_optimization_50trials | 50 | 842.6 | 0/50 | No feasible designs found |

**Conclusion**: Constraint appears infeasible. Recommend relaxing displacement limit.
```
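
The first columns of such a report can be seeded from the `substudies` registry in `study_metadata.json`. The sketch below uses only the fields shown in the metadata format above. The best objective and feasibility columns still have to be filled in from each substudy's results, whose schema this guide does not specify. `render_substudy_table` is a hypothetical helper.

```python
def render_substudy_table(metadata):
    """Render the `substudies` registry as a Markdown table."""
    lines = [
        "| Substudy | Trials | Status | Purpose |",
        "|----------|--------|--------|---------|",
    ]
    for entry in metadata.get("substudies", []):
        lines.append(
            f"| {entry['name']} | {entry['trials']} | {entry['status']} | {entry['purpose']} |"
        )
    return "\n".join(lines)


# Example registry entry copied from the metadata format in this guide.
example = {
    "substudies": [
        {
            "name": "01_initial_exploration",
            "status": "completed",
            "trials": 10,
            "purpose": "Explore design space boundaries",
        }
    ]
}
print(render_substudy_table(example))
```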

---

## Benefits of Proposed Organization

### For Users

1. **Clarity**: Numbered substudies show chronological progression
2. **Self-Documenting**: Each substudy explains its purpose
3. **Easy Comparison**: Cross-substudy comparisons and recommendations live in one place (`3_reports/`)
4. **Less Clutter**: Study root only has essential files

### For Developers

1. **Predictable Structure**: Scripts can rely on consistent paths
2. **Automated Discovery**: Easy to find all substudies programmatically
3. **Version Control**: Clear history through numbered substudies
4. **Scalability**: Works for 5 substudies or 50

### For Collaboration

1. **Onboarding**: New team members can understand study progression quickly
2. **Documentation**: Substudy READMEs explain decisions made
3. **Reproducibility**: Clear configuration history
4. **Communication**: Easy to reference specific substudies in discussions

---

## FAQ

### Q: Should I reorganize my existing study?

**A**: Only if:
- Study is still active (more substudies planned)
- Current organization is causing confusion
- You have time to update documentation references

Otherwise, apply to future studies only.

### Q: What if my substudy doesn't have a fixed trial count?

**A**: Use a descriptive name instead:
- `05_refined_search_until_feasible`
- `06_sensitivity_sweep`
- `07_validation_run`

### Q: Can I delete old substudies?

**A**: Generally no. Keep for:
- Historical record
- Lessons learned
- Reproducibility

If disk space is critical:
- Use model cleanup to delete CAD/FEM files
- Archive old substudies to external storage
- Keep metadata and results.json files

### Q: Should benchmarking be a substudy?

**A**: No. Benchmarking validates the baseline model before optimization. It belongs in `1_setup/benchmarking/`.

### Q: How do I handle multi-stage optimizations?

**A**: Create separate substudies:
- `05_stage1_meet_constraint_20trials`
- `06_stage2_minimize_mass_30trials`

Document the relationship in substudy READMEs.

---

## Summary

**Current Organization**: Functional but has room for improvement
- ✅ Substudy isolation works well
- ⚠️ Documentation scattered across levels
- ⚠️ Chronology unclear from names alone

**Proposed Organization**: Clearer hierarchy and progression
- 📁 `1_setup/` - Pre-optimization (model, benchmarking)
- 📁 `2_substudies/` - Numbered optimization runs
- 📁 `3_reports/` - Comparative analysis

**Next Steps**:
1. Decide: Reorganize existing study or apply to future only
2. If reorganizing: Follow migration guide
3. Update `study_metadata.json` with all substudies
4. Create substudy README templates
5. Document lessons learned in study-level docs

**Bottom Line**: The proposed organization makes it easier to understand what was done, why it was done, and what was learned.