# Study Disk Optimization Module
## Atomizer Disk Space Management System
**Version:** 1.0
**Created:** 2025-12-29
**Status:** PRODUCTION READY
**Impact:** Reduced M1_Mirror from 194 GB → 114 GB (80 GB freed, 41% reduction)
---
## Executive Summary
FEA optimization studies consume massive disk space due to per-trial file copying. This module provides:
1. **Local Cleanup** - Remove regenerable files from completed studies (50%+ savings)
2. **Remote Archival** - Archive to dalidou server (14TB available)
3. **On-Demand Restore** - Pull archived studies when needed
### Key Insight
Each trial folder contains ~150 MB, but only **~70 MB is essential** (OP2 results + metadata). The rest are copies of master files that can be regenerated.
---
## Part 1: File Classification
### Essential Files (KEEP)
| Extension | Purpose | Typical Size |
|-----------|---------|--------------|
| `.op2` | Nastran binary results | 68 MB |
| `.json` | Parameters, results, metadata | <1 MB |
| `.npz` | Pre-computed Zernike coefficients | <1 MB |
| `.html` | Generated reports | <1 MB |
| `.png` | Visualization images | <1 MB |
| `.csv` | Exported data tables | <1 MB |
### Deletable Files (REGENERABLE)
| Extension | Purpose | Why Deletable |
|-----------|---------|---------------|
| `.prt` | NX part files | Copy of master in `1_setup/` |
| `.fem` | FEM mesh files | Copy of master |
| `.sim` | Simulation files | Copy of master |
| `.afm` | Assembly FEM | Regenerable |
| `.dat` | Solver input deck | Regenerable from params |
| `.f04` | Nastran output log | Diagnostic only |
| `.f06` | Nastran printed output | Diagnostic only |
| `.log` | Generic logs | Diagnostic only |
| `.diag` | Diagnostic files | Diagnostic only |
| `.txt` | Temp text files | Intermediate data |
| `.exp` | Expression files | Regenerable |
| `.bak` | Backup files | Not needed |
### Protected Folders (NEVER TOUCH)
| Folder | Reason |
|--------|--------|
| `1_setup/` | Master model files (source of truth) |
| `3_results/` | Final database, reports, best designs |
| `best_design_archive/` | Archived optimal configurations |
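The three tables above amount to a small classification rule. Here is an illustrative sketch of that rule, assuming the extension sets and a `classify` helper as hypothetical names (the actual `study_archiver.py` may organize this differently):

```python
from pathlib import Path

# Hypothetical constants mirroring the tables above (not the real module API).
ESSENTIAL_EXTS = {".op2", ".json", ".npz", ".html", ".png", ".csv"}
DELETABLE_EXTS = {".prt", ".fem", ".sim", ".afm", ".dat",
                  ".f04", ".f06", ".log", ".diag", ".txt", ".exp", ".bak"}
PROTECTED_DIRS = {"1_setup", "3_results", "best_design_archive"}

def classify(path: Path, study_root: Path) -> str:
    """Return 'keep' or 'delete' for a file inside a study folder."""
    rel = path.relative_to(study_root)
    # Anything under a protected folder is never touched.
    if PROTECTED_DIRS & set(rel.parts[:-1]):
        return "keep"
    if path.suffix.lower() in DELETABLE_EXTS:
        return "delete"
    return "keep"  # unknown extensions default to keep (safe side)
```

Note that unknown extensions fall through to `keep`: when in doubt, the rule errs on the side of preserving data.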
---
## Part 2: Disk Usage Analysis
### M1_Mirror Project Baseline (Dec 2025)
```
Total: 194 GB across 28 studies, 2000+ trials

By File Type:
  .op2    94 GB  (48.5%)  - Nastran results  [ESSENTIAL]
  .prt    41 GB  (21.4%)  - NX parts         [DELETABLE]
  .fem    22 GB  (11.5%)  - FEM mesh         [DELETABLE]
  .dat    22 GB  (11.3%)  - Solver input     [DELETABLE]
  .sim     9 GB   (4.5%)  - Simulation       [DELETABLE]
  .afm     5 GB   (2.5%)  - Assembly FEM     [DELETABLE]
  Other   <1 GB    (<1%)  - Logs, configs    [MIXED]

By Folder:
  2_iterations/  168 GB  (87%)  - Per-trial data
  3_results/      22 GB  (11%)  - Final results
  1_setup/         4 GB   (2%)  - Master models
```
### Per-Trial Breakdown (Typical V11+ Structure)
```
iter1/
  assy_m1_assyfem1_sim1-solution_1.op2    68.15 MB  [KEEP]
  M1_Blank.prt                            29.94 MB  [DELETE]
  assy_m1_assyfem1_sim1-solution_1.dat    15.86 MB  [DELETE]
  M1_Blank_fem1.fem                       14.07 MB  [DELETE]
  ASSY_M1_assyfem1_sim1.sim                7.47 MB  [DELETE]
  M1_Blank_fem1_i.prt                      5.20 MB  [DELETE]
  ASSY_M1_assyfem1.afm                     4.13 MB  [DELETE]
  M1_Vertical_Support_Skeleton_fem1.fem    3.76 MB  [DELETE]
  ... (logs, temps)                       <1.00 MB  [DELETE]
  _temp_part_properties.json               0.00 MB  [KEEP]
  ---------------------------------------------------
  TOTAL:                                 149.67 MB
  Essential only:                         68.15 MB
  Savings:                                 54.5%
```
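The breakdowns above come from summing file sizes per extension across a study tree. A minimal sketch of that bucketing (the `usage_by_extension` helper is a hypothetical name; the real `analyze_study` may compute more):

```python
import os
from collections import Counter
from pathlib import Path

def usage_by_extension(root: Path) -> Counter:
    """Sum file sizes per extension, as in the 'By File Type' table above."""
    sizes = Counter()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            p = Path(dirpath, name)
            sizes[p.suffix.lower()] += p.stat().st_size
    return sizes
```

Sorting the resulting counter by size (`sizes.most_common()`) reproduces the largest-first ordering shown in the baseline analysis.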
---
## Part 3: Implementation
### Core Utility
**Location:** `optimization_engine/utils/study_archiver.py`
```python
from optimization_engine.utils.study_archiver import (
    analyze_study,         # Get disk usage analysis
    cleanup_study,         # Remove deletable files
    archive_to_remote,     # Archive to dalidou
    restore_from_remote,   # Restore from dalidou
    list_remote_archives,  # List server archives
)
```
### Command Line Interface
**Batch Script:** `tools/archive_study.bat`
```bash
# Analyze disk usage
archive_study.bat analyze studies\M1_Mirror
archive_study.bat analyze studies\M1_Mirror\m1_mirror_V12
# Cleanup completed study (dry run by default)
archive_study.bat cleanup studies\M1_Mirror\m1_mirror_V12
archive_study.bat cleanup studies\M1_Mirror\m1_mirror_V12 --execute
# Archive to remote server
archive_study.bat archive studies\M1_Mirror\m1_mirror_V12 --execute
archive_study.bat archive studies\M1_Mirror\m1_mirror_V12 --execute --tailscale
# List remote archives
archive_study.bat list
archive_study.bat list --tailscale
# Restore from remote
archive_study.bat restore m1_mirror_V12
archive_study.bat restore m1_mirror_V12 --tailscale
```
### Python API
```python
from pathlib import Path
from optimization_engine.utils.study_archiver import (
    analyze_study,
    cleanup_study,
    archive_to_remote,
)
# Analyze
study_path = Path("studies/M1_Mirror/m1_mirror_V12")
analysis = analyze_study(study_path)
print(f"Total: {analysis['total_size_bytes']/1e9:.2f} GB")
print(f"Essential: {analysis['essential_size']/1e9:.2f} GB")
print(f"Deletable: {analysis['deletable_size']/1e9:.2f} GB")
# Cleanup (dry_run=False to execute)
deleted, freed = cleanup_study(study_path, dry_run=False)
print(f"Freed {freed/1e9:.2f} GB")
# Archive to server
success = archive_to_remote(study_path, use_tailscale=False, dry_run=False)
```
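The `cleanup_study(study_path, dry_run=...)` call above returns a `(deleted, freed)` pair. The sketch below illustrates that contract with a hypothetical `cleanup_sketch` that takes a deletability predicate; it is not the real implementation, just the dry-run accounting pattern:

```python
from pathlib import Path
from typing import Callable, Tuple

def cleanup_sketch(study: Path, is_deletable: Callable[[Path], bool],
                   dry_run: bool = True) -> Tuple[int, int]:
    """Return (files_deleted, bytes_freed). In a dry run, sizes are
    tallied but nothing is unlinked, so the call is side-effect free."""
    deleted, freed = 0, 0
    for p in study.rglob("*"):
        if p.is_file() and is_deletable(p):
            deleted += 1
            freed += p.stat().st_size
            if not dry_run:
                p.unlink()
    return deleted, freed
```

Because the dry run and the real run walk the same files with the same predicate, the numbers reported by the dry run are exactly what `--execute` will free.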
---
## Part 4: Remote Server Configuration
### dalidou Server Specs
| Property | Value |
|----------|-------|
| Hostname | dalidou |
| Local IP | 192.168.86.50 |
| Tailscale IP | 100.80.199.40 |
| SSH User | papa |
| Archive Path | /srv/storage/atomizer-archive/ |
| Available Storage | 3.6 TB (SSD) + 12.7 TB (HDD) |
### First-Time Setup
```bash
# 1. SSH into server and create archive directory
ssh papa@192.168.86.50
mkdir -p /srv/storage/atomizer-archive
# 2. Set up passwordless SSH (on Windows)
ssh-keygen -t ed25519 # If you don't have a key
ssh-copy-id papa@192.168.86.50
# 3. Test connection
ssh papa@192.168.86.50 "echo 'Connection OK'"
```
### Archive Structure on Server
```
/srv/storage/atomizer-archive/
├── m1_mirror_V11_20251229.tar.gz # Compressed study archive
├── m1_mirror_V12_20251229.tar.gz
├── m1_mirror_flat_back_V3_20251229.tar.gz
└── manifest.json # Index of all archives
```
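The archive names above follow a `<study>_<YYYYMMDD>.tar.gz` pattern. A sketch of building such an archive locally (the `make_archive` helper is hypothetical; the real `archive_to_remote` then transfers the file to dalidou, e.g. via rsync):

```python
import tarfile
from datetime import date
from pathlib import Path

def make_archive(study: Path, out_dir: Path) -> Path:
    """Create <study_name>_<YYYYMMDD>.tar.gz in out_dir and return its path."""
    stamp = date.today().strftime("%Y%m%d")
    out = out_dir / f"{study.name}_{stamp}.tar.gz"
    with tarfile.open(out, "w:gz") as tar:
        # arcname keeps the study folder name as the archive's top level
        tar.add(study, arcname=study.name)
    return out
```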
---
## Part 5: Recommended Workflows
### During Active Optimization
**Keep all files** - You may need to:
- Re-run specific failed trials
- Debug mesh issues
- Analyze intermediate results
### After Study Completion
1. **Generate final report** (STUDY_REPORT.md)
2. **Archive best design** to `3_results/best_design_archive/`
3. **Run cleanup:**
   ```bash
   archive_study.bat cleanup studies\M1_Mirror\m1_mirror_V12 --execute
   ```
4. **Verify results still accessible:**
   - Database queries work
   - Best design files intact
   - OP2 files for Zernike extraction present
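The verification steps above can be scripted. A sketch, using the folder layout from Part 1 (`verify_after_cleanup` is a hypothetical helper; the "database queries work" check is reduced here to the presence of `3_results/`):

```python
from pathlib import Path

def verify_after_cleanup(study: Path) -> dict:
    """Spot-check that cleanup left the essential artifacts in place."""
    return {
        "results_present": (study / "3_results").is_dir(),
        "best_design_present": (study / "3_results" / "best_design_archive").is_dir(),
        "op2_present": any(study.rglob("*.op2")),
    }
```

If any value comes back `False`, stop before archiving or deleting anything further.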
### For Long-Term Storage
1. **After cleanup**, archive to server:
   ```bash
   archive_study.bat archive studies\M1_Mirror\m1_mirror_V12 --execute
   ```
2. **Optionally delete local** study folder
3. **Keep only** `3_results/best_design_archive/` locally if needed
### When Revisiting Old Study
1. **Check if archived:**
   ```bash
   archive_study.bat list
   ```
2. **Restore:**
   ```bash
   archive_study.bat restore m1_mirror_V12
   ```
3. **If re-running trials needed**, master files in `1_setup/` allow full regeneration
---
## Part 6: Disk Space Targets
### Per-Project Guidelines
| Stage | Expected Size | Notes |
|-------|---------------|-------|
| Active (full) | 100% | All files present |
| Completed (cleaned) | ~50% | Deletables removed |
| Archived (minimal) | ~3% | Best design only locally |
### M1_Mirror Specific
| Stage | Size | Notes |
|-------|------|-------|
| Full | 194 GB | 28 studies, 2000+ trials |
| After cleanup | 114 GB | OP2 + metadata only |
| Minimal local | 5-10 GB | Best designs + database |
| Server archive | ~50 GB | Compressed |
---
## Part 7: Safety Features
### Built-in Protections
1. **Dry run by default** - Must explicitly add `--execute`
2. **Master files untouched** - `1_setup/` is never modified
3. **Results preserved** - `3_results/` is never touched
4. **Essential files preserved** - OP2, JSON, NPZ always kept
5. **Archive verification** - rsync checks integrity
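Protection 1 (dry run by default) hinges on how the CLI maps flags to the API. A sketch of that mapping, assuming an argparse-style parser (`parse_args` and its argument names are illustrative, not the actual `archive_study.bat` internals):

```python
import argparse

def parse_args(argv):
    """Map CLI flags to the API: dry_run is True unless --execute is given."""
    p = argparse.ArgumentParser(prog="archive_study")
    p.add_argument("command",
                   choices=["analyze", "cleanup", "archive", "list", "restore"])
    p.add_argument("target", nargs="?")          # study path or archive name
    p.add_argument("--execute", action="store_true")
    p.add_argument("--tailscale", action="store_true")
    args = p.parse_args(argv)
    args.dry_run = not args.execute              # destructive only on opt-in
    return args
```

The key design choice: the destructive path requires an explicit opt-in flag, so forgetting an argument can never delete data.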
### What Is Removed and How to Regenerate It
| File Type | Recovery Method |
|-----------|-----------------|
| `.prt` | Copy from `1_setup/` + update params |
| `.fem` | Regenerate from `.prt` in NX |
| `.sim` | Recreate simulation setup |
| `.dat` | Regenerate from params.json + model |
| `.f04/.f06` | Re-run solver (if needed) |
**Note:** With `1_setup/` master files and `params.json`, ANY trial can be fully reconstructed. The only irreplaceable data is the OP2 results (which we keep).
---
## Part 8: Troubleshooting
### SSH Connection Failed
```bash
# Test connectivity
ping 192.168.86.50
# Test SSH
ssh papa@192.168.86.50 "echo connected"
# If on different network, use Tailscale
ssh papa@100.80.199.40 "echo connected"
```
### Archive Upload Slow
Large studies (50+ GB) take time. Options:
- Run overnight
- Use wired LAN connection
- Pre-cleanup to reduce size
### Out of Disk Space During Archive
The archive is created locally before upload, so you need roughly 1.5x the study size in free space:
- A 20 GB study requires ~30 GB of temporary space
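This 1.5x rule is easy to check before starting. A pre-flight sketch using the standard library (`enough_temp_space` is a hypothetical helper, not part of `study_archiver`):

```python
import shutil
from pathlib import Path

def enough_temp_space(study_size_bytes: int, temp_dir: Path,
                      factor: float = 1.5) -> bool:
    """True if temp_dir's filesystem has >= factor * study size free."""
    free = shutil.disk_usage(temp_dir).free
    return free >= study_size_bytes * factor
```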
### Cleanup Removed Wrong Files
If accidentally executed without dry run:
- OP2 files preserved (can still extract results)
- Master files in `1_setup/` intact
- Regenerate other files by re-running trial
---
## Part 9: Integration with Atomizer
### Protocol Reference
**Related Protocol:** `docs/protocols/operations/OP_07_DISK_OPTIMIZATION.md`
### Claude Commands
When user says:
- "analyze disk usage" → Run `analyze_study()`
- "clean up study" → Run `cleanup_study()` with confirmation
- "archive to server" → Run `archive_to_remote()`
- "restore study" → Run `restore_from_remote()`
### Automatic Suggestions
After optimization completion, suggest:
```
Optimization complete! The study is using X GB.
Would you like me to clean up regenerable files to save Y GB?
(This keeps all results but removes intermediate model copies)
```
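The suggestion template above can be filled directly from an `analyze_study()`-style result. A sketch, assuming the dictionary keys shown in the Part 3 examples (`cleanup_suggestion` is a hypothetical helper):

```python
def cleanup_suggestion(analysis: dict) -> str:
    """Render the post-optimization cleanup prompt from an analysis dict."""
    total_gb = analysis["total_size_bytes"] / 1e9
    save_gb = analysis["deletable_size"] / 1e9
    return (
        f"Optimization complete! The study is using {total_gb:.1f} GB.\n"
        f"Would you like me to clean up regenerable files to save {save_gb:.1f} GB?\n"
        "(This keeps all results but removes intermediate model copies)"
    )
```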
---
## Part 10: File Inventory
### Files Created
| File | Purpose |
|------|---------|
| `optimization_engine/utils/study_archiver.py` | Core utility module |
| `tools/archive_study.bat` | Windows batch script |
| `docs/protocols/operations/OP_07_DISK_OPTIMIZATION.md` | Full protocol |
| `.claude/skills/modules/study-disk-optimization.md` | This document |
### Dependencies
- Python 3.8+
- rsync (for remote operations, usually pre-installed)
- SSH client (for remote operations)
- Tailscale (optional, for remote access outside LAN)
---
## Appendix A: Cleanup Results Log (Dec 2025)
### Initial Cleanup Run
| Study | Before | After | Freed | Files Deleted |
|-------|--------|-------|-------|---------------|
| m1_mirror_cost_reduction_V11 | 32.24 GB | 15.94 GB | 16.30 GB | 3,403 |
| m1_mirror_cost_reduction_flat_back_V3 | 52.50 GB | 26.87 GB | 25.63 GB | 5,084 |
| m1_mirror_cost_reduction_flat_back_V6 | 33.71 GB | 16.64 GB | 17.08 GB | 3,391 |
| m1_mirror_cost_reduction_V12 | 22.68 GB | 10.60 GB | 12.08 GB | 2,508 |
| m1_mirror_cost_reduction_flat_back_V1 | 8.76 GB | 4.54 GB | 4.22 GB | 813 |
| m1_mirror_cost_reduction_flat_back_V5 | 8.01 GB | 4.09 GB | 3.92 GB | 765 |
| m1_mirror_cost_reduction | 3.58 GB | 3.08 GB | 0.50 GB | 267 |
| **TOTAL** | **161.48 GB** | **81.76 GB** | **79.73 GB** | **16,231** |
### Project-Wide Summary
```
Before cleanup: 193.75 GB
After cleanup: 114.03 GB
Total freed: 79.72 GB (41% reduction)
```
---
## Appendix B: Quick Reference Card
### Commands
```bash
# Analyze
archive_study.bat analyze <path>
# Cleanup (always dry-run first!)
archive_study.bat cleanup <study> # Dry run
archive_study.bat cleanup <study> --execute # Execute
# Archive
archive_study.bat archive <study> --execute
archive_study.bat archive <study> --execute --tailscale
# Remote
archive_study.bat list
archive_study.bat restore <name>
```
### Python
```python
from pathlib import Path
from optimization_engine.utils.study_archiver import analyze_study, cleanup_study
# Quick analysis
analysis = analyze_study(Path("studies/M1_Mirror"))
print(f"Deletable: {analysis['deletable_size']/1e9:.2f} GB")
# Cleanup
cleanup_study(Path("studies/M1_Mirror/m1_mirror_V12"), dry_run=False)
```
### Server Access
```bash
# Local
ssh papa@192.168.86.50
# Remote (Tailscale)
ssh papa@100.80.199.40
# Archive location
/srv/storage/atomizer-archive/
```
---
*This module enables efficient disk space management for large-scale FEA optimization studies.*