ATOCore/src/atocore/config.py

180 lines
5.7 KiB
Python

"""AtoCore configuration via environment variables."""

from pathlib import Path

from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    env: str = "development"
    debug: bool = False
    log_level: str = "INFO"
    data_dir: Path = Path("./data")
    db_dir: Path | None = None
    chroma_dir: Path | None = None
    cache_dir: Path | None = None
    tmp_dir: Path | None = None
    vault_source_dir: Path = Path("./sources/vault")
    drive_source_dir: Path = Path("./sources/drive")
    source_vault_enabled: bool = True
    source_drive_enabled: bool = True
    log_dir: Path = Path("./logs")
    backup_dir: Path = Path("./backups")
    run_dir: Path = Path("./run")
    project_registry_path: Path = Path("./config/project-registry.json")
feat(assets): binary asset store + artifact entity + wiki evidence (Issue F)

Wires visual evidence into the knowledge graph. Images, PDFs, and CAD exports can now be uploaded, deduped by SHA-256, thumbnailed, linked to entities via EVIDENCED_BY, and rendered inline on wiki pages. Unblocks AKC uploading voice-session screenshots alongside extracted entities.

- assets/ module: store_asset (hash dedup + MIME allowlist + 20 MB cap), get_asset_binary, get_thumbnail (Pillow, on-disk cache under .thumbnails/<size>/), list_orphan_assets, invalidate_asset
- models/database.py: new `assets` table + indexes
- engineering/service.py: `artifact` added to ENTITY_TYPES
- api/routes.py: POST /assets (multipart), GET /assets/{id}, /assets/{id}/thumbnail, /assets/{id}/meta, /admin/assets/orphans, DELETE /assets/{id} (409 if still referenced), GET /entities/{id}/evidence (EVIDENCED_BY artifacts with asset meta)
- main.py: all new paths aliased under /v1
- engineering/wiki.py: entity pages render EVIDENCED_BY → artifact as a "Visual evidence" thumbnail strip; artifact pages render the full image + caption + capture_context
- deploy/dalidou/docker-compose.yml: bind-mount ${ATOCORE_ASSETS_DIR}
- config.py: assets_dir + assets_max_upload_bytes settings
- requirements.txt + pyproject.toml: python-multipart, Pillow>=10.0.0
- tests/test_assets.py: 16 tests (dedup, cap, thumbnail cache, orphan detection, invalidate gating, API upload/fetch, evidence, v1 aliases, wiki rendering)
- DEV-LEDGER.md: session log + cleanup note + test_count 478 -> 494

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 21:46:52 -04:00
    assets_dir: Path | None = None
    assets_max_upload_bytes: int = 20 * 1024 * 1024  # 20 MB per upload
    host: str = "127.0.0.1"
    port: int = 8100
    db_busy_timeout_ms: int = 5000

    # Embedding
    embedding_model: str = (
        "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
    )

    # Chunking
    chunk_max_size: int = 800
    chunk_overlap: int = 100
    chunk_min_size: int = 50

    # Context
    context_budget: int = 3000
    context_top_k: int = 15
feat: tunable ranking, refresh status, chroma backup + admin endpoints

Three small improvements that move the operational baseline forward without changing the existing trust model.

1. Tunable retrieval ranking weights
   - rank_project_match_boost, rank_query_token_step, rank_query_token_cap, rank_path_high_signal_boost, rank_path_low_signal_penalty are now Settings fields
   - all overridable via ATOCORE_* env vars
   - retriever no longer hard-codes 2.0 / 1.18 / 0.72 / 0.08 / 1.32
   - lets ranking be tuned per environment as Wave 1 is exercised without code changes

2. /projects/{name}/refresh status
   - refresh_registered_project now returns an overall status field ("ingested", "partial", "nothing_to_ingest") plus roots_ingested and roots_skipped counters
   - ProjectRefreshResponse advertises the new fields so callers can rely on them
   - covers the case where every configured root is missing on disk

3. Chroma cold snapshot + admin backup endpoints
   - create_runtime_backup now accepts include_chroma and writes a cold directory copy of the chroma persistence path
   - new list_runtime_backups() and validate_backup() helpers
   - new endpoints:
     - POST /admin/backup create snapshot (optional chroma)
     - GET /admin/backup list snapshots
     - GET /admin/backup/{stamp}/validate structural validation
   - chroma snapshots are taken under exclusive_ingestion() so a refresh or ingest cannot race with the cold copy
   - backup metadata records what was actually included and how big

Tests:
- 8 new tests covering tunable weights, refresh status branches (ingested / partial / nothing_to_ingest), chroma snapshot, list, validate, and the API endpoints (including the lock-acquisition path)
- existing fake refresh stubs in test_api_storage.py updated for the expanded ProjectRefreshResponse model
- full suite: 105 passing (was 97)

next-steps doc updated to reflect that the chroma snapshot + restore validation gap from current-state.md is now closed in code; only the operational retention policy remains.
2026-04-06 18:42:19 -04:00
    # Retrieval ranking weights (tunable per environment).
    # All multipliers default to the values used since Wave 1; tighten or
    # loosen them via ATOCORE_* env vars without touching code.
    rank_project_match_boost: float = 2.0
    rank_query_token_step: float = 0.08
    rank_query_token_cap: float = 1.32
    rank_path_high_signal_boost: float = 1.18
    rank_path_low_signal_penalty: float = 0.72

    model_config = {"env_prefix": "ATOCORE_"}

    @property
    def db_path(self) -> Path:
        legacy_path = self.resolved_data_dir / "atocore.db"
        if self.db_dir is not None:
            return self.resolved_db_dir / "atocore.db"
        if legacy_path.exists():
            return legacy_path
        return self.resolved_db_dir / "atocore.db"

    @property
    def chroma_path(self) -> Path:
        return self._resolve_path(self.chroma_dir or (self.resolved_data_dir / "chroma"))

    @property
    def cache_path(self) -> Path:
        return self._resolve_path(self.cache_dir or (self.resolved_data_dir / "cache"))

    @property
    def tmp_path(self) -> Path:
        return self._resolve_path(self.tmp_dir or (self.resolved_data_dir / "tmp"))

    @property
    def resolved_data_dir(self) -> Path:
        return self._resolve_path(self.data_dir)

    @property
    def resolved_assets_dir(self) -> Path:
        return self._resolve_path(self.assets_dir or (self.resolved_data_dir / "assets"))

    @property
    def resolved_db_dir(self) -> Path:
        return self._resolve_path(self.db_dir or (self.resolved_data_dir / "db"))

    @property
    def resolved_vault_source_dir(self) -> Path:
        return self._resolve_path(self.vault_source_dir)

    @property
    def resolved_drive_source_dir(self) -> Path:
        return self._resolve_path(self.drive_source_dir)

    @property
    def resolved_log_dir(self) -> Path:
        return self._resolve_path(self.log_dir)

    @property
    def resolved_backup_dir(self) -> Path:
        return self._resolve_path(self.backup_dir)

    @property
    def resolved_run_dir(self) -> Path:
        if self.run_dir == Path("./run"):
            return self._resolve_path(self.resolved_data_dir.parent / "run")
        return self._resolve_path(self.run_dir)

    @property
    def resolved_project_registry_path(self) -> Path:
        """Path to the project registry JSON file.

        If ``ATOCORE_PROJECT_REGISTRY_DIR`` env var is set, the registry
        lives at ``<that dir>/project-registry.json``. Otherwise falls
        back to the configured ``project_registry_path`` field.

        This lets Docker deployments point at a mounted volume via env
        var without the ephemeral in-image ``/app/config/`` getting
        wiped on every rebuild.
        """
        import os

        registry_dir = os.environ.get("ATOCORE_PROJECT_REGISTRY_DIR", "").strip()
        if registry_dir:
            return Path(registry_dir) / "project-registry.json"
        return self._resolve_path(self.project_registry_path)

    @property
    def machine_dirs(self) -> list[Path]:
        return [
            self.db_path.parent,
            self.chroma_path,
            self.cache_path,
            self.tmp_path,
            self.resolved_log_dir,
            self.resolved_backup_dir,
            self.resolved_run_dir,
            self.resolved_project_registry_path.parent,
            self.resolved_assets_dir,
        ]

    @property
    def source_specs(self) -> list[dict[str, object]]:
        return [
            {
                "name": "vault",
                "enabled": self.source_vault_enabled,
                "path": self.resolved_vault_source_dir,
                "read_only": True,
            },
            {
                "name": "drive",
                "enabled": self.source_drive_enabled,
                "path": self.resolved_drive_source_dir,
                "read_only": True,
            },
        ]

    @property
    def source_dirs(self) -> list[Path]:
        return [spec["path"] for spec in self.source_specs if spec["enabled"]]

    def _resolve_path(self, path: Path) -> Path:
        return path.expanduser().resolve(strict=False)


settings = Settings()


def ensure_runtime_dirs() -> None:
    """Create writable runtime directories for machine state and logs.

    Source directories are intentionally excluded because they are treated as
    read-only ingestion inputs by convention.
    """
    for directory in settings.machine_dirs:
        directory.mkdir(parents=True, exist_ok=True)
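
The branch order in `Settings.db_path` is easy to misread, so it may help to exercise it in isolation. The following is a minimal stdlib-only sketch, not the project's code: `resolve_db_path` and `demo` are hypothetical helpers that mirror the property's three branches (an explicit `db_dir` wins, then an existing `atocore.db` directly under the data directory, then `<data_dir>/db/atocore.db`), so the behavior can be checked without installing `pydantic-settings`.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def resolve_db_path(data_dir: Path, db_dir: Path | None = None) -> Path:
    """Hypothetical stand-in mirroring the branch order of Settings.db_path."""
    data_dir = data_dir.expanduser().resolve(strict=False)
    if db_dir is not None:
        # An explicitly configured db_dir always wins.
        return db_dir.expanduser().resolve(strict=False) / "atocore.db"
    legacy_path = data_dir / "atocore.db"
    if legacy_path.exists():
        # The "legacy" location (name taken from the source): a database
        # file sitting directly under data_dir is reused if it exists.
        return legacy_path
    # Default layout: <data_dir>/db/atocore.db
    return data_dir / "db" / "atocore.db"


def demo() -> dict[str, Path]:
    """Exercise all three branches inside a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp).resolve()
        # No db_dir and no legacy file yet: default layout.
        fresh = resolve_db_path(root / "data")
        # Explicit db_dir overrides everything else.
        explicit = resolve_db_path(root / "data", root / "elsewhere")
        # Create a legacy database file, then resolve again.
        (root / "data").mkdir()
        (root / "data" / "atocore.db").touch()
        legacy = resolve_db_path(root / "data")
        return {"root": root, "fresh": fresh, "explicit": explicit, "legacy": legacy}


if __name__ == "__main__":
    for name, path in demo().items():
        print(f"{name}: {path}")
```

One consequence worth noting under these assumptions: setting `ATOCORE_DB_DIR` bypasses the legacy check entirely, so an existing `atocore.db` under the data directory would no longer be picked up once that variable is set.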