# Study Disk Optimization Module

## Atomizer Disk Space Management System

**Version:** 1.0
**Created:** 2025-12-29
**Status:** PRODUCTION READY
**Impact:** Reduced M1_Mirror from 194 GB → 114 GB (80 GB freed, 41% reduction)

---

## Executive Summary

FEA optimization studies consume massive disk space due to per-trial file copying. This module provides:

1. **Local Cleanup** - Remove regenerable files from completed studies (50%+ savings)
2. **Remote Archival** - Archive to the dalidou server (14 TB available)
3. **On-Demand Restore** - Pull archived studies back when needed

### Key Insight

Each trial folder contains ~150 MB, but only **~70 MB is essential** (OP2 results + metadata). The rest are copies of master files that can be regenerated.

---

## Part 1: File Classification

### Essential Files (KEEP)

| Extension | Purpose | Typical Size |
|-----------|---------|--------------|
| `.op2` | Nastran binary results | 68 MB |
| `.json` | Parameters, results, metadata | <1 MB |
| `.npz` | Pre-computed Zernike coefficients | <1 MB |
| `.html` | Generated reports | <1 MB |
| `.png` | Visualization images | <1 MB |
| `.csv` | Exported data tables | <1 MB |

### Deletable Files (REGENERABLE)

| Extension | Purpose | Why Deletable |
|-----------|---------|---------------|
| `.prt` | NX part files | Copy of master in `1_setup/` |
| `.fem` | FEM mesh files | Copy of master |
| `.sim` | Simulation files | Copy of master |
| `.afm` | Assembly FEM | Regenerable |
| `.dat` | Solver input deck | Regenerable from params |
| `.f04` | Nastran output log | Diagnostic only |
| `.f06` | Nastran printed output | Diagnostic only |
| `.log` | Generic logs | Diagnostic only |
| `.diag` | Diagnostic files | Diagnostic only |
| `.txt` | Temp text files | Intermediate data |
| `.exp` | Expression files | Regenerable |
| `.bak` | Backup files | Not needed |

### Protected Folders (NEVER TOUCH)

| Folder | Reason |
|--------|--------|
| `1_setup/` | Master model files (source of truth) |
| `3_results/` | Final database, reports, best designs |
| `best_design_archive/` | Archived optimal configurations |

---

## Part 2: Disk Usage Analysis

### M1_Mirror Project Baseline (Dec 2025)

```
Total: 194 GB across 28 studies, 2000+ trials

By File Type:
  .op2    94 GB (48.5%) - Nastran results  [ESSENTIAL]
  .prt    41 GB (21.4%) - NX parts         [DELETABLE]
  .fem    22 GB (11.5%) - FEM mesh         [DELETABLE]
  .dat    22 GB (11.3%) - Solver input     [DELETABLE]
  .sim     9 GB  (4.5%) - Simulation       [DELETABLE]
  .afm     5 GB  (2.5%) - Assembly FEM     [DELETABLE]
  Other   <1 GB  (<1%)  - Logs, configs    [MIXED]

By Folder:
  2_iterations/  168 GB (87%) - Per-trial data
  3_results/      22 GB (11%) - Final results
  1_setup/         4 GB  (2%) - Master models
```

### Per-Trial Breakdown (Typical V11+ Structure)

```
iter1/
  assy_m1_assyfem1_sim1-solution_1.op2     68.15 MB  [KEEP]
  M1_Blank.prt                             29.94 MB  [DELETE]
  assy_m1_assyfem1_sim1-solution_1.dat     15.86 MB  [DELETE]
  M1_Blank_fem1.fem                        14.07 MB  [DELETE]
  ASSY_M1_assyfem1_sim1.sim                 7.47 MB  [DELETE]
  M1_Blank_fem1_i.prt                       5.20 MB  [DELETE]
  ASSY_M1_assyfem1.afm                      4.13 MB  [DELETE]
  M1_Vertical_Support_Skeleton_fem1.fem     3.76 MB  [DELETE]
  ... (logs, temps)                        <1.00 MB  [DELETE]
  _temp_part_properties.json                0.00 MB  [KEEP]
  -------------------------------------------------------
  TOTAL:           149.67 MB
  Essential only:   68.15 MB
  Savings:          54.5%
```

---

## Part 3: Implementation

### Core Utility

**Location:** `optimization_engine/utils/study_archiver.py`

```python
from optimization_engine.utils.study_archiver import (
    analyze_study,         # Get disk usage analysis
    cleanup_study,         # Remove deletable files
    archive_to_remote,     # Archive to dalidou
    restore_from_remote,   # Restore from dalidou
    list_remote_archives,  # List server archives
)
```

### Command Line Interface

**Batch Script:** `tools/archive_study.bat`

```bash
# Analyze disk usage
archive_study.bat analyze studies\M1_Mirror
archive_study.bat analyze studies\M1_Mirror\m1_mirror_V12

# Clean up a completed study (dry run by default)
archive_study.bat cleanup studies\M1_Mirror\m1_mirror_V12
archive_study.bat cleanup studies\M1_Mirror\m1_mirror_V12 --execute

# Archive to remote server
archive_study.bat archive studies\M1_Mirror\m1_mirror_V12 --execute
archive_study.bat archive studies\M1_Mirror\m1_mirror_V12 --execute --tailscale

# List remote archives
archive_study.bat list
archive_study.bat list --tailscale

# Restore from remote
archive_study.bat restore m1_mirror_V12
archive_study.bat restore m1_mirror_V12 --tailscale
```

### Python API

```python
from pathlib import Path
from optimization_engine.utils.study_archiver import (
    analyze_study,
    cleanup_study,
    archive_to_remote,
)

# Analyze
study_path = Path("studies/M1_Mirror/m1_mirror_V12")
analysis = analyze_study(study_path)
print(f"Total:     {analysis['total_size_bytes']/1e9:.2f} GB")
print(f"Essential: {analysis['essential_size']/1e9:.2f} GB")
print(f"Deletable: {analysis['deletable_size']/1e9:.2f} GB")

# Cleanup (dry_run=False to execute)
deleted, freed = cleanup_study(study_path, dry_run=False)
print(f"Freed {freed/1e9:.2f} GB")

# Archive to server
success = archive_to_remote(study_path, use_tailscale=False, dry_run=False)
```

---
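The classification rules from Part 1 can be sketched as a small standalone helper. This is illustrative only: the extension sets mirror the tables above, but the function and variable names (`classify`, `analyze`, `ESSENTIAL_EXTS`, etc.) are hypothetical and may differ from the real `study_archiver.py` implementation.

```python
from pathlib import Path

# Extension sets mirroring the Part 1 classification tables (illustrative).
ESSENTIAL_EXTS = {".op2", ".json", ".npz", ".html", ".png", ".csv"}
DELETABLE_EXTS = {".prt", ".fem", ".sim", ".afm", ".dat", ".f04",
                  ".f06", ".log", ".diag", ".txt", ".exp", ".bak"}
PROTECTED_DIRS = {"1_setup", "3_results", "best_design_archive"}

def classify(rel_path: Path) -> str:
    """Classify a study-relative path as 'protected', 'keep', 'delete', or 'unknown'."""
    if any(part in PROTECTED_DIRS for part in rel_path.parts):
        return "protected"  # never touch master files or final results
    ext = rel_path.suffix.lower()
    if ext in ESSENTIAL_EXTS:
        return "keep"
    if ext in DELETABLE_EXTS:
        return "delete"
    return "unknown"        # unrecognized files are left alone

def analyze(study_root: Path) -> dict:
    """Walk a study tree and total bytes per classification (dry-run only)."""
    totals = {"protected": 0, "keep": 0, "delete": 0, "unknown": 0}
    for f in study_root.rglob("*"):
        if f.is_file():
            totals[classify(f.relative_to(study_root))] += f.stat().st_size
    return totals
```

Classifying on study-relative paths means the protected-folder check works regardless of where the study lives on disk.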
## Part 4: Remote Server Configuration

### dalidou Server Specs

| Property | Value |
|----------|-------|
| Hostname | dalidou |
| Local IP | 192.168.86.50 |
| Tailscale IP | 100.80.199.40 |
| SSH User | papa |
| Archive Path | /srv/storage/atomizer-archive/ |
| Available Storage | 3.6 TB (SSD) + 12.7 TB (HDD) |

### First-Time Setup

```bash
# 1. SSH into the server and create the archive directory
ssh papa@192.168.86.50
mkdir -p /srv/storage/atomizer-archive

# 2. Set up passwordless SSH (on Windows)
ssh-keygen -t ed25519            # If you don't have a key
ssh-copy-id papa@192.168.86.50

# 3. Test the connection
ssh papa@192.168.86.50 "echo 'Connection OK'"
```

### Archive Structure on Server

```
/srv/storage/atomizer-archive/
├── m1_mirror_V11_20251229.tar.gz            # Compressed study archive
├── m1_mirror_V12_20251229.tar.gz
├── m1_mirror_flat_back_V3_20251229.tar.gz
└── manifest.json                            # Index of all archives
```

---

## Part 5: Recommended Workflows

### During Active Optimization

**Keep all files** - You may need to:

- Re-run specific failed trials
- Debug mesh issues
- Analyze intermediate results

### After Study Completion

1. **Generate the final report** (STUDY_REPORT.md)
2. **Archive the best design** to `3_results/best_design_archive/`
3. **Run cleanup:**
   ```bash
   archive_study.bat cleanup studies\M1_Mirror\m1_mirror_V12 --execute
   ```
4. **Verify results are still accessible:**
   - Database queries work
   - Best design files intact
   - OP2 files for Zernike extraction present

### For Long-Term Storage

1. **After cleanup**, archive to the server:
   ```bash
   archive_study.bat archive studies\M1_Mirror\m1_mirror_V12 --execute
   ```
2. **Optionally delete the local** study folder
3. **Keep only** `3_results/best_design_archive/` locally if needed

### When Revisiting an Old Study

1. **Check whether it is archived:**
   ```bash
   archive_study.bat list
   ```
2. **Restore it:**
   ```bash
   archive_study.bat restore m1_mirror_V12
   ```
3. **If trials need re-running**, the master files in `1_setup/` allow full regeneration

---

## Part 6: Disk Space Targets

### Per-Project Guidelines

| Stage | Expected Size | Notes |
|-------|---------------|-------|
| Active (full) | 100% | All files present |
| Completed (cleaned) | ~50% | Deletables removed |
| Archived (minimal) | ~3% | Best design only locally |

### M1_Mirror Specific

| Stage | Size | Notes |
|-------|------|-------|
| Full | 194 GB | 28 studies, 2000+ trials |
| After cleanup | 114 GB | OP2 + metadata only |
| Minimal local | 5-10 GB | Best designs + database |
| Server archive | ~50 GB | Compressed |

---

## Part 7: Safety Features

### Built-in Protections

1. **Dry run by default** - You must explicitly add `--execute`
2. **Master files untouched** - `1_setup/` is never modified
3. **Results preserved** - `3_results/` is never touched
4. **Essential files preserved** - OP2, JSON, and NPZ files are always kept
5. **Archive verification** - rsync checks integrity

### Regenerating Files After Cleanup

Deleted files cannot be restored directly, but each type can be regenerated:

| File Type | Recovery Method |
|-----------|-----------------|
| `.prt` | Copy from `1_setup/` + update params |
| `.fem` | Regenerate from `.prt` in NX |
| `.sim` | Recreate the simulation setup |
| `.dat` | Regenerate from params.json + model |
| `.f04` / `.f06` | Re-run the solver (if needed) |

**Note:** With the `1_setup/` master files and `params.json`, ANY trial can be fully reconstructed. The only irreplaceable data is the OP2 results (which we keep).

---

## Part 8: Troubleshooting

### SSH Connection Failed

```bash
# Test connectivity
ping 192.168.86.50

# Test SSH
ssh papa@192.168.86.50 "echo connected"

# If on a different network, use Tailscale
ssh papa@100.80.199.40 "echo connected"
```

### Archive Upload Slow

Large studies (50+ GB) take time. Options:

- Run overnight
- Use a wired LAN connection
- Run cleanup first to reduce the size

### Out of Disk Space During Archive

The archive is created locally first, so you need roughly 1.5x the study size free:

- 20 GB study = ~30 GB temp space required

### Cleanup Removed the Wrong Files

If cleanup was accidentally executed without a dry run:

- OP2 files are preserved (results can still be extracted)
- Master files in `1_setup/` are intact
- Other files can be regenerated by re-running the trial

---

## Part 9: Integration with Atomizer

### Protocol Reference

**Related Protocol:** `docs/protocols/operations/OP_07_DISK_OPTIMIZATION.md`

### Claude Commands

When the user says:

- "analyze disk usage" → Run `analyze_study()`
- "clean up study" → Run `cleanup_study()` with confirmation
- "archive to server" → Run `archive_to_remote()`
- "restore study" → Run `restore_from_remote()`

### Automatic Suggestions

After optimization completes, suggest:

```
Optimization complete! The study is using X GB.
Would you like me to clean up regenerable files to save Y GB?
(This keeps all results but removes intermediate model copies)
```

---

## Part 10: File Inventory

### Files Created

| File | Purpose |
|------|---------|
| `optimization_engine/utils/study_archiver.py` | Core utility module |
| `tools/archive_study.bat` | Windows batch script |
| `docs/protocols/operations/OP_07_DISK_OPTIMIZATION.md` | Full protocol |
| `.claude/skills/modules/study-disk-optimization.md` | This document |

### Dependencies

- Python 3.8+
- rsync (for remote operations, usually pre-installed)
- SSH client (for remote operations)
- Tailscale (optional, for remote access outside the LAN)

---

## Appendix A: Cleanup Results Log (Dec 2025)

### Initial Cleanup Run

| Study | Before | After | Freed | Files Deleted |
|-------|--------|-------|-------|---------------|
| m1_mirror_cost_reduction_V11 | 32.24 GB | 15.94 GB | 16.30 GB | 3,403 |
| m1_mirror_cost_reduction_flat_back_V3 | 52.50 GB | 26.87 GB | 25.63 GB | 5,084 |
| m1_mirror_cost_reduction_flat_back_V6 | 33.71 GB | 16.64 GB | 17.08 GB | 3,391 |
| m1_mirror_cost_reduction_V12 | 22.68 GB | 10.60 GB | 12.08 GB | 2,508 |
| m1_mirror_cost_reduction_flat_back_V1 | 8.76 GB | 4.54 GB | 4.22 GB | 813 |
| m1_mirror_cost_reduction_flat_back_V5 | 8.01 GB | 4.09 GB | 3.92 GB | 765 |
| m1_mirror_cost_reduction | 3.58 GB | 3.08 GB | 0.50 GB | 267 |
| **TOTAL** | **161.48 GB** | **81.76 GB** | **79.73 GB** | **16,231** |

### Project-Wide Summary

```
Before cleanup: 193.75 GB
After cleanup:  114.03 GB
Total freed:     79.72 GB (41% reduction)
```

---

## Appendix B: Quick Reference Card

### Commands

```bash
# Analyze
archive_study.bat analyze

# Cleanup (always dry-run first!)
archive_study.bat cleanup             # Dry run
archive_study.bat cleanup --execute   # Execute

# Archive
archive_study.bat archive --execute
archive_study.bat archive --execute --tailscale

# Remote
archive_study.bat list
archive_study.bat restore
```

### Python

```python
from pathlib import Path
from optimization_engine.utils.study_archiver import *

# Quick analysis
analysis = analyze_study(Path("studies/M1_Mirror"))
print(f"Deletable: {analysis['deletable_size']/1e9:.2f} GB")

# Cleanup
cleanup_study(Path("studies/M1_Mirror/m1_mirror_V12"), dry_run=False)
```

### Server Access

```bash
# Local
ssh papa@192.168.86.50

# Remote (Tailscale)
ssh papa@100.80.199.40

# Archive location
/srv/storage/atomizer-archive/
```

---

*This module enables efficient disk space management for large-scale FEA optimization studies.*
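
## Appendix C: Archive Naming and Manifest Sketch

The dated `*.tar.gz` naming and `manifest.json` index shown in Part 4 can be sketched as follows. This is a local illustration only: the real `archive_to_remote()` transfers to dalidou via rsync/SSH, and the function name `archive_study_local` and the manifest fields used here are hypothetical.

```python
import json
import tarfile
from datetime import date
from pathlib import Path

def archive_study_local(study_path: Path, out_dir: Path) -> Path:
    """Create a dated .tar.gz of a study and update a manifest.json index.

    Sketch only: shows the {study}_{YYYYMMDD}.tar.gz naming convention and
    manifest bookkeeping, not the actual rsync/SSH transfer to the server.
    """
    stamp = date.today().strftime("%Y%m%d")
    archive = out_dir / f"{study_path.name}_{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store the study under its own name so extraction recreates the folder
        tar.add(study_path, arcname=study_path.name)

    # Update (or create) the manifest indexing all archives in out_dir
    manifest_path = out_dir / "manifest.json"
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    manifest[study_path.name] = {
        "archive": archive.name,
        "date": stamp,
        "size_bytes": archive.stat().st_size,
    }
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return archive
```

Keeping the manifest beside the archives means `list` can answer "what is archived?" with a single small file read instead of scanning the tarballs.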