feat: nightly batch extraction in cron-backup.sh (Day 2)

Step 4 added to the daily cron: POST /admin/extract-batch with
mode=llm, persist=true, limit=50. Runs after backup + cleanup +
rsync. Fail-open: extraction failure never blocks the backup.

Gated on ATOCORE_EXTRACT_BATCH=true (defaults to true). The
endpoint uses the last_extract_batch_run timestamp from project
state to auto-resume, so the cron doesn't need to track state.

curl --max-time 600 caps the whole batch request at 10 minutes.
The worst case (50 interactions × ~20s each ≈ 17 min) exceeds that
cap, but most interactions will be no-ops if already extracted.
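
The timeout arithmetic above can be checked with a short sketch. The
helper names worst_case_seconds and fits_in_timeout are hypothetical
and not part of cron-backup.sh; the numbers (limit 50, ~20s per
interaction, 600s timeout) are the ones quoted in this message.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, for illustration only.

# Worst-case batch duration in seconds: item count times per-item cost.
worst_case_seconds() {
    local limit="$1" per_item="$2"
    echo $(( limit * per_item ))
}

# Succeeds (exit 0) when the worst case fits inside the curl timeout.
fits_in_timeout() {
    local worst="$1" max_time="$2"
    (( worst <= max_time ))
}

worst=$(worst_case_seconds 50 20)   # 50 * 20 = 1000 seconds
if fits_in_timeout "$worst" 600; then
    echo "worst case ${worst}s fits within 600s"
else
    echo "worst case ${worst}s exceeds 600s"
fi
```

With the values from this commit the second branch runs: 1000s is
about 17 minutes, which is longer than the 600s cap.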

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 10:51:13 -04:00
parent bcb7675a0d
commit c67bec095c

@@ -82,4 +82,26 @@ else
log "Step 3: ATOCORE_BACKUP_RSYNC not set, skipping off-host copy"
fi
# Step 4: Batch LLM extraction on recent interactions (optional).
# Runs the LLM extractor (claude -p sonnet) against interactions
# captured since the last batch run. Candidates land as
# status=candidate for human or auto-triage review.
# Fail-open: extraction failure never blocks backup.
# The endpoint tracks its own last-run timestamp in project state.
EXTRACT="${ATOCORE_EXTRACT_BATCH:-true}"
if [[ "$EXTRACT" == "true" ]]; then
    log "Step 4: running batch LLM extraction"
    EXTRACT_RESULT=$(curl -sf -X POST \
        -H "Content-Type: application/json" \
        -d '{"mode": "llm", "persist": true, "limit": 50}' \
        --max-time 600 \
        "$ATOCORE_URL/admin/extract-batch" 2>&1) && {
        log "Extraction result: $EXTRACT_RESULT"
    } || {
        log "WARN: batch extraction failed (this is non-blocking): $EXTRACT_RESULT"
    }
else
    log "Step 4: ATOCORE_EXTRACT_BATCH not set to true, skipping extraction"
fi
log "=== AtoCore daily backup complete ==="
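
The fail-open behaviour of Step 4 can be sketched in isolation as
follows. This is a minimal illustration, not the script's actual code:
run_fail_open is a hypothetical helper, and the log function here is a
stand-in for the one cron-backup.sh defines. It uses if/else instead of
the `&& { ... } || { ... }` chain, which also avoids running the
failure branch when the success branch's own log call fails.

```shell
#!/usr/bin/env bash
# Stand-in for the script's log helper (assumption: it prints one line).
log() { printf '%s\n' "$*"; }

# Hypothetical helper: run a command, capture stdout and stderr,
# log the outcome, and always return 0 so the caller is never blocked.
run_fail_open() {
    local label="$1"; shift
    local output
    # Declaring 'output' on its own line keeps the exit status of the
    # command substitution visible to 'if'.
    if output=$("$@" 2>&1); then
        log "$label result: $output"
    else
        log "WARN: $label failed (non-blocking): $output"
    fi
    return 0
}

run_fail_open "demo-ok" echo "extracted 3"
run_fail_open "demo-fail" false
log "backup continues"
```

Both calls return 0, so a surrounding script running under `set -e`
would still reach the final log line.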