Study Disk Optimization Module

Atomizer Disk Space Management System

Version: 1.0
Created: 2025-12-29
Status: PRODUCTION READY
Impact: Reduced M1_Mirror from 194 GB → 114 GB (80 GB freed, 41% reduction)


Executive Summary

FEA optimization studies consume massive disk space due to per-trial file copying. This module provides:

  1. Local Cleanup - Remove regenerable files from completed studies (50%+ savings)
  2. Remote Archival - Archive to dalidou server (14TB available)
  3. On-Demand Restore - Pull archived studies when needed

Key Insight

Each trial folder contains ~150 MB, but only ~70 MB is essential (OP2 results + metadata). The rest are copies of master files that can be regenerated.


Part 1: File Classification

Essential Files (KEEP)

| Extension | Purpose | Typical Size |
|-----------|---------|--------------|
| .op2 | Nastran binary results | 68 MB |
| .json | Parameters, results, metadata | <1 MB |
| .npz | Pre-computed Zernike coefficients | <1 MB |
| .html | Generated reports | <1 MB |
| .png | Visualization images | <1 MB |
| .csv | Exported data tables | <1 MB |

Deletable Files (REGENERABLE)

| Extension | Purpose | Why Deletable |
|-----------|---------|---------------|
| .prt | NX part files | Copy of master in 1_setup/ |
| .fem | FEM mesh files | Copy of master |
| .sim | Simulation files | Copy of master |
| .afm | Assembly FEM | Regenerable |
| .dat | Solver input deck | Regenerable from params |
| .f04 | Nastran output log | Diagnostic only |
| .f06 | Nastran printed output | Diagnostic only |
| .log | Generic logs | Diagnostic only |
| .diag | Diagnostic files | Diagnostic only |
| .txt | Temp text files | Intermediate data |
| .exp | Expression files | Regenerable |
| .bak | Backup files | Not needed |

Protected Folders (NEVER TOUCH)

| Folder | Reason |
|--------|--------|
| 1_setup/ | Master model files (source of truth) |
| 3_results/ | Final database, reports, best designs |
| best_design_archive/ | Archived optimal configurations |
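The classification rules above can be sketched as a small helper. The extension sets and the `classify` function here are illustrative only; the authoritative logic lives in `optimization_engine/utils/study_archiver.py`:

```python
from pathlib import Path

# Illustrative extension sets mirroring the tables above -- the real
# lists live in optimization_engine/utils/study_archiver.py.
ESSENTIAL_EXTS = {".op2", ".json", ".npz", ".html", ".png", ".csv"}
DELETABLE_EXTS = {".prt", ".fem", ".sim", ".afm", ".dat", ".f04", ".f06",
                  ".log", ".diag", ".txt", ".exp", ".bak"}
PROTECTED_FOLDERS = {"1_setup", "3_results", "best_design_archive"}

def classify(path: Path) -> str:
    """Return 'protected', 'keep', or 'delete' for a file in a study tree."""
    # Anything under a protected folder is never touched.
    if any(part in PROTECTED_FOLDERS for part in path.parts):
        return "protected"
    ext = path.suffix.lower()
    if ext in ESSENTIAL_EXTS:
        return "keep"
    if ext in DELETABLE_EXTS:
        return "delete"
    return "keep"  # Unknown types are kept by default (fail safe)
```

Keeping unknown extensions by default errs on the side of safety: a new file type is never deleted until it is explicitly added to the deletable list.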

Part 2: Disk Usage Analysis

M1_Mirror Project Baseline (Dec 2025)

Total: 194 GB across 28 studies, 2000+ trials

By File Type:
  .op2    94 GB (48.5%) - Nastran results [ESSENTIAL]
  .prt    41 GB (21.4%) - NX parts [DELETABLE]
  .fem    22 GB (11.5%) - FEM mesh [DELETABLE]
  .dat    22 GB (11.3%) - Solver input [DELETABLE]
  .sim     9 GB (4.5%)  - Simulation [DELETABLE]
  .afm     5 GB (2.5%)  - Assembly FEM [DELETABLE]
  Other   <1 GB (<1%)   - Logs, configs [MIXED]

By Folder:
  2_iterations/    168 GB (87%) - Per-trial data
  3_results/        22 GB (11%) - Final results
  1_setup/           4 GB (2%)  - Master models

Per-Trial Breakdown (Typical V11+ Structure)

iter1/
  assy_m1_assyfem1_sim1-solution_1.op2    68.15 MB  [KEEP]
  M1_Blank.prt                            29.94 MB  [DELETE]
  assy_m1_assyfem1_sim1-solution_1.dat    15.86 MB  [DELETE]
  M1_Blank_fem1.fem                       14.07 MB  [DELETE]
  ASSY_M1_assyfem1_sim1.sim                7.47 MB  [DELETE]
  M1_Blank_fem1_i.prt                      5.20 MB  [DELETE]
  ASSY_M1_assyfem1.afm                     4.13 MB  [DELETE]
  M1_Vertical_Support_Skeleton_fem1.fem    3.76 MB  [DELETE]
  ... (logs, temps)                       <1.00 MB  [DELETE]
  _temp_part_properties.json               0.00 MB  [KEEP]
  -------------------------------------------------------
  TOTAL:                                 149.67 MB
  Essential only:                         68.15 MB
  Savings:                                54.5%
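The 54.5% figure follows directly from the listing: savings = (total − essential) / total. A quick check:

```python
# Per-trial sizes from the breakdown above
total_mb = 149.67
essential_mb = 68.15  # the .op2 file (metadata files are ~0 MB)

savings = (total_mb - essential_mb) / total_mb
print(f"{savings:.1%}")  # → 54.5%
```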

Part 3: Implementation

Core Utility

Location: optimization_engine/utils/study_archiver.py

from optimization_engine.utils.study_archiver import (
    analyze_study,        # Get disk usage analysis
    cleanup_study,        # Remove deletable files
    archive_to_remote,    # Archive to dalidou
    restore_from_remote,  # Restore from dalidou
    list_remote_archives, # List server archives
)

Command Line Interface

Batch Script: tools/archive_study.bat

# Analyze disk usage
archive_study.bat analyze studies\M1_Mirror
archive_study.bat analyze studies\M1_Mirror\m1_mirror_V12

# Cleanup completed study (dry run by default)
archive_study.bat cleanup studies\M1_Mirror\m1_mirror_V12
archive_study.bat cleanup studies\M1_Mirror\m1_mirror_V12 --execute

# Archive to remote server
archive_study.bat archive studies\M1_Mirror\m1_mirror_V12 --execute
archive_study.bat archive studies\M1_Mirror\m1_mirror_V12 --execute --tailscale

# List remote archives
archive_study.bat list
archive_study.bat list --tailscale

# Restore from remote
archive_study.bat restore m1_mirror_V12
archive_study.bat restore m1_mirror_V12 --tailscale

Python API

from pathlib import Path
from optimization_engine.utils.study_archiver import (
    analyze_study,
    cleanup_study,
    archive_to_remote,
)

# Analyze
study_path = Path("studies/M1_Mirror/m1_mirror_V12")
analysis = analyze_study(study_path)
print(f"Total: {analysis['total_size_bytes']/1e9:.2f} GB")
print(f"Essential: {analysis['essential_size']/1e9:.2f} GB")
print(f"Deletable: {analysis['deletable_size']/1e9:.2f} GB")

# Cleanup (dry_run=False to execute)
deleted, freed = cleanup_study(study_path, dry_run=False)
print(f"Freed {freed/1e9:.2f} GB")

# Archive to server
success = archive_to_remote(study_path, use_tailscale=False, dry_run=False)

Part 4: Remote Server Configuration

dalidou Server Specs

| Property | Value |
|----------|-------|
| Hostname | dalidou |
| Local IP | 192.168.86.50 |
| Tailscale IP | 100.80.199.40 |
| SSH User | papa |
| Archive Path | /srv/storage/atomizer-archive/ |
| Available Storage | 3.6 TB (SSD) + 12.7 TB (HDD) |

First-Time Setup

# 1. SSH into server and create archive directory
ssh papa@192.168.86.50
mkdir -p /srv/storage/atomizer-archive

# 2. Set up passwordless SSH (from the Windows workstation)
ssh-keygen -t ed25519  # If you don't already have a key
ssh-copy-id papa@192.168.86.50
# If ssh-copy-id is unavailable on Windows, append the contents of
# ~/.ssh/id_ed25519.pub to ~/.ssh/authorized_keys on the server instead

# 3. Test connection
ssh papa@192.168.86.50 "echo 'Connection OK'"

Archive Structure on Server

/srv/storage/atomizer-archive/
├── m1_mirror_V11_20251229.tar.gz    # Compressed study archive
├── m1_mirror_V12_20251229.tar.gz
├── m1_mirror_flat_back_V3_20251229.tar.gz
└── manifest.json                     # Index of all archives
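A manifest entry might look like the following. The field names here are hypothetical; check manifest.json on the server for the actual schema:

```python
import json

# Hypothetical manifest entry -- the real manifest.json on dalidou
# may use different field names.
entry = {
    "archive": "m1_mirror_V12_20251229.tar.gz",
    "study": "m1_mirror_V12",
    "archived_on": "2025-12-29",
    "size_bytes": 11_400_000_000,
    "source_path": "studies/M1_Mirror/m1_mirror_V12",
}
print(json.dumps(entry, indent=2))
```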

Part 5: Study Lifecycle

During Active Optimization

Keep all files - You may need to:

  • Re-run specific failed trials
  • Debug mesh issues
  • Analyze intermediate results

After Study Completion

  1. Generate final report (STUDY_REPORT.md)
  2. Archive best design to 3_results/best_design_archive/
  3. Run cleanup:
    archive_study.bat cleanup studies\M1_Mirror\m1_mirror_V12 --execute
    
  4. Verify results still accessible:
    • Database queries work
    • Best design files intact
    • OP2 files for Zernike extraction present
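The verification in step 4 can be partly automated. The `verify_trials` helper below is a hypothetical sketch (not part of study_archiver.py) that flags any trial folder missing its OP2 results:

```python
from pathlib import Path
from typing import List

def verify_trials(study_path: Path) -> List[str]:
    """Return names of trial folders under 2_iterations/ missing .op2 results."""
    missing = []
    iterations = study_path / "2_iterations"
    if not iterations.is_dir():
        return missing
    for trial in sorted(p for p in iterations.iterdir() if p.is_dir()):
        # Every trial should retain at least one OP2 results file after cleanup.
        if not any(trial.glob("*.op2")):
            missing.append(trial.name)
    return missing
```

Run it after cleanup; an empty list means every trial still has its OP2 results.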

For Long-Term Storage

  1. After cleanup, archive to server:
    archive_study.bat archive studies\M1_Mirror\m1_mirror_V12 --execute
    
  2. Optionally delete local study folder
  3. Keep only 3_results/best_design_archive/ locally if needed

When Revisiting Old Study

  1. Check if archived:
    archive_study.bat list
    
  2. Restore:
    archive_study.bat restore m1_mirror_V12
    
  3. If re-running trials needed, master files in 1_setup/ allow full regeneration

Part 6: Disk Space Targets

Per-Project Guidelines

| Stage | Expected Size | Notes |
|-------|---------------|-------|
| Active (full) | 100% | All files present |
| Completed (cleaned) | ~50% | Deletables removed |
| Archived (minimal) | ~3% | Best design only locally |

M1_Mirror Specific

| Stage | Size | Notes |
|-------|------|-------|
| Full | 194 GB | 28 studies, 2000+ trials |
| After cleanup | 114 GB | OP2 + metadata only |
| Minimal local | 5-10 GB | Best designs + database |
| Server archive | ~50 GB | Compressed |

Part 7: Safety Features

Built-in Protections

  1. Dry run by default - Must explicitly add --execute
  2. Master files untouched - 1_setup/ is never modified
  3. Results preserved - 3_results/ is never touched
  4. Essential files preserved - OP2, JSON, NPZ always kept
  5. Archive verification - rsync checks integrity

What Cleanup Removes (and How to Recover It)

| File Type | Recovery Method |
|-----------|-----------------|
| .prt | Copy from 1_setup/ + update params |
| .fem | Regenerate from .prt in NX |
| .sim | Recreate simulation setup |
| .dat | Regenerate from params.json + model |
| .f04/.f06 | Re-run solver (if needed) |

Note: With 1_setup/ master files and params.json, ANY trial can be fully reconstructed. The only irreplaceable data is the OP2 results (which we keep).


Part 8: Troubleshooting

SSH Connection Failed

# Test connectivity
ping 192.168.86.50

# Test SSH
ssh papa@192.168.86.50 "echo connected"

# If on different network, use Tailscale
ssh papa@100.80.199.40 "echo connected"

Archive Upload Slow

Large studies (50+ GB) take time. Options:

  • Run overnight
  • Use wired LAN connection
  • Pre-cleanup to reduce size

Out of Disk Space During Archive

The archive is staged locally before upload, so you need roughly 1.5× the study size in free space:

  • 20 GB study = ~30 GB temp space required
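A pre-flight check along these lines avoids failing mid-archive. The 1.5× factor comes from the guideline above; `has_temp_space` is an illustrative helper, not part of the shipped tooling:

```python
import shutil
from pathlib import Path

SAFETY_FACTOR = 1.5  # archive is staged locally before upload

def required_temp_bytes(study_size_bytes: int) -> int:
    """Free space needed to stage the archive locally."""
    return int(study_size_bytes * SAFETY_FACTOR)

def has_temp_space(study_size_bytes: int, staging_dir: Path = Path(".")) -> bool:
    """Check whether staging_dir has room to build the archive."""
    free = shutil.disk_usage(staging_dir).free
    return free >= required_temp_bytes(study_size_bytes)

# 20 GB study -> ~30 GB of temp space required
print(required_temp_bytes(20 * 10**9) / 1e9)  # → 30.0
```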

Cleanup Removed Wrong Files

If accidentally executed without dry run:

  • OP2 files preserved (can still extract results)
  • Master files in 1_setup/ intact
  • Regenerate other files by re-running trial

Part 9: Integration with Atomizer

Protocol Reference

Related Protocol: docs/protocols/operations/OP_07_DISK_OPTIMIZATION.md

Claude Commands

When user says:

  • "analyze disk usage" → Run analyze_study()
  • "clean up study" → Run cleanup_study() with confirmation
  • "archive to server" → Run archive_to_remote()
  • "restore study" → Run restore_from_remote()

Automatic Suggestions

After optimization completion, suggest:

Optimization complete! The study is using X GB.
Would you like me to clean up regenerable files to save Y GB?
(This keeps all results but removes intermediate model copies)
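The placeholders X and Y can be filled in from analyze_study() output. This formatting helper is hypothetical, shown only to make the wiring concrete:

```python
def cleanup_suggestion(total_bytes: int, deletable_bytes: int) -> str:
    """Render the post-optimization cleanup prompt shown above."""
    return (
        f"Optimization complete! The study is using {total_bytes/1e9:.1f} GB.\n"
        f"Would you like me to clean up regenerable files to save "
        f"{deletable_bytes/1e9:.1f} GB?\n"
        "(This keeps all results but removes intermediate model copies)"
    )

# Example with the V11 numbers from Appendix A
print(cleanup_suggestion(32_240_000_000, 16_300_000_000))
```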

Part 10: File Inventory

Files Created

| File | Purpose |
|------|---------|
| optimization_engine/utils/study_archiver.py | Core utility module |
| tools/archive_study.bat | Windows batch script |
| docs/protocols/operations/OP_07_DISK_OPTIMIZATION.md | Full protocol |
| .claude/skills/modules/study-disk-optimization.md | This document |

Dependencies

  • Python 3.8+
  • rsync (for remote operations, usually pre-installed)
  • SSH client (for remote operations)
  • Tailscale (optional, for remote access outside LAN)
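Before first remote use it is worth confirming the external tools are actually on PATH. This checker is illustrative:

```python
import shutil

def missing_tools(tools=("rsync", "ssh")) -> list:
    """Return the subset of required command-line tools not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

missing = missing_tools()
if missing:
    print("Missing required tools:", ", ".join(missing))
else:
    print("All remote-operation dependencies found.")
```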

Appendix A: Cleanup Results Log (Dec 2025)

Initial Cleanup Run

| Study | Before | After | Freed | Files Deleted |
|-------|--------|-------|-------|---------------|
| m1_mirror_cost_reduction_V11 | 32.24 GB | 15.94 GB | 16.30 GB | 3,403 |
| m1_mirror_cost_reduction_flat_back_V3 | 52.50 GB | 26.87 GB | 25.63 GB | 5,084 |
| m1_mirror_cost_reduction_flat_back_V6 | 33.71 GB | 16.64 GB | 17.08 GB | 3,391 |
| m1_mirror_cost_reduction_V12 | 22.68 GB | 10.60 GB | 12.08 GB | 2,508 |
| m1_mirror_cost_reduction_flat_back_V1 | 8.76 GB | 4.54 GB | 4.22 GB | 813 |
| m1_mirror_cost_reduction_flat_back_V5 | 8.01 GB | 4.09 GB | 3.92 GB | 765 |
| m1_mirror_cost_reduction | 3.58 GB | 3.08 GB | 0.50 GB | 267 |
| TOTAL | 161.48 GB | 81.76 GB | 79.73 GB | 16,231 |

Project-Wide Summary

Before cleanup: 193.75 GB
After cleanup:  114.03 GB
Total freed:     79.72 GB (41% reduction)

Appendix B: Quick Reference Card

Commands

# Analyze
archive_study.bat analyze <path>

# Cleanup (always dry-run first!)
archive_study.bat cleanup <study>           # Dry run
archive_study.bat cleanup <study> --execute # Execute

# Archive
archive_study.bat archive <study> --execute
archive_study.bat archive <study> --execute --tailscale

# Remote
archive_study.bat list
archive_study.bat restore <name>

Python

from optimization_engine.utils.study_archiver import *

# Quick analysis
analysis = analyze_study(Path("studies/M1_Mirror"))
print(f"Deletable: {analysis['deletable_size']/1e9:.2f} GB")

# Cleanup
cleanup_study(Path("studies/M1_Mirror/m1_mirror_V12"), dry_run=False)

Server Access

# Local
ssh papa@192.168.86.50

# Remote (Tailscale)
ssh papa@100.80.199.40

# Archive location
/srv/storage/atomizer-archive/

This module enables efficient disk space management for large-scale FEA optimization studies.