fix: critical bugs and hardening from validation audit

- Fix infinite loop in chunker _hard_split when overlap >= max_size
- Fix tag filter false positives by quoting tag values in ChromaDB query
- Fix score boost semantics (additive → multiplicative) so boosted scores stay within the 0-1 range
- Add error handling and type hints to all API routes
- Update README with proper project documentation
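The diff below shows only the new guard at the top of `_hard_split`. The rest of the loop is cut off, so the sketch here rests on an assumption: each window is `max_size` characters long and starts `max_size - overlap` characters after the previous one. Under that assumption, `overlap >= max_size` makes the step zero or negative and `start` never advances, which explains the infinite loop. The name `hard_split` and the early `break` are illustrative and not taken from the repository.

```python
def hard_split(text: str, max_size: int, overlap: int) -> list[str]:
    """Split text into windows of at most max_size chars, each sharing
    `overlap` chars with the previous window.

    Hypothetical reconstruction: only the overlap guard comes from the
    commit; the loop body is an assumed sliding-window implementation.
    """
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    # Guard from the commit: with overlap >= max_size the step below would
    # be <= 0 and `start` would never move forward.
    if overlap >= max_size:
        overlap = max_size // 4
    step = max_size - overlap  # >= 1 once the guard has run
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_size])
        if start + max_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks
```

For example, `hard_split("abcdefghij", 4, 4)` clamps the overlap to `4 // 4 == 1` and returns `["abcd", "defg", "ghij"]` instead of looping forever. The `break` is a design choice in this sketch: it avoids emitting a trailing window made up entirely of overlap.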

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 09:35:37 -04:00
parent b4afbbb53a
commit 6081462058
5 changed files with 117 additions and 25 deletions

@@ -137,6 +137,10 @@ def _split_by_paragraphs(
def _hard_split(text: str, max_size: int, overlap: int) -> list[str]:
    """Hard split text at max_size with overlap."""
    # Prevent infinite loop: overlap must be less than max_size
    if overlap >= max_size:
        overlap = max_size // 4
    chunks = []
    start = 0
    while start < len(text):