Hybrid stack: local Ollama for day-to-day work, a cloud model for evaluation and occasional high-quality runs. The goal is to keep costs low while steadily improving the local model's behavior.
All AI calls go through src/lib/ai.ts, which routes each request to either the primary cloud provider or local Ollama based on the requested mode (default, cheap, local, or highQuality).
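A minimal sketch of what that mode-based routing could look like. The mode names come from the description above; the helper names (callCloud, callOllama), the model names, and the generate signature are illustrative assumptions, not the actual contents of src/lib/ai.ts.

```ts
// Sketch of mode-based routing. callCloud/callOllama, the model names,
// and the generate() signature are assumptions for illustration.
export type Mode = "default" | "cheap" | "local" | "highQuality";

export interface GenerateOptions {
  prompt: string;
  mode?: Mode;
}

async function callOllama(prompt: string, model: string): Promise<string> {
  // POST to the local Ollama server (default port 11434).
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  const data = await res.json();
  return data.response;
}

async function callCloud(prompt: string, model: string): Promise<string> {
  // Placeholder: swap in the real cloud provider SDK call here.
  return `[cloud:${model}] ${prompt.slice(0, 40)}...`;
}

export async function generate({ prompt, mode = "default" }: GenerateOptions): Promise<string> {
  switch (mode) {
    case "local":
      return callOllama(prompt, "llama3.1");           // assumed local model
    case "cheap":
      return callCloud(prompt, "cheap-cloud-model");   // assumed cheaper tier
    case "highQuality":
      return callCloud(prompt, "premium-cloud-model"); // assumed premium tier
    default:
      return callCloud(prompt, "standard-cloud-model");
  }
}
```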
Offline scripts call evaluateOllamaAnswer to have the cloud model score Ollama's answers. Results are written as JSON so strengths and gaps are easy to inspect.
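A rough sketch of how such an evaluation helper might be shaped. Only the function name comes from this doc; the EvalResult fields, the judge prompt, the import path, and the reuse of the generate helper from the routing sketch above are assumptions.

```ts
// Hypothetical offline evaluation helper; fields and prompt wording are
// assumptions, not the real script.
import { generate } from "../lib/ai"; // assumed import path

interface EvalResult {
  question: string;
  ollamaAnswer: string;
  score: number;       // e.g. 1-10, assigned by the cloud judge
  strengths: string[];
  gaps: string[];
}

async function evaluateOllamaAnswer(question: string, ollamaAnswer: string): Promise<EvalResult> {
  const judgePrompt =
    "Score the answer from 1 to 10 and return JSON with fields " +
    '{"score": number, "strengths": string[], "gaps": string[]}.\n' +
    `Question: ${question}\nAnswer: ${ollamaAnswer}`;

  // Use the high-quality cloud mode so the judge is stronger than the model under test.
  const raw = await generate({ prompt: judgePrompt, mode: "highQuality" });
  return { question, ollamaAnswer, ...JSON.parse(raw) };
}
```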
The evaluation logs feed into improved prompts, examples, or finetuning datasets for the local models. Over time, local quality should approach cloud quality while remaining cheap and private.
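One way those logs could be turned into a finetuning dataset: keep the answers the cloud judge rated highly and write them out as JSONL prompt/completion pairs. The file names, the score threshold, and the field names are assumptions, and EvalResult refers to the type in the evaluation sketch above.

```ts
// Hypothetical log-to-dataset step; file names, threshold, and JSONL
// field names are assumptions.
import { readFileSync, writeFileSync } from "node:fs";

const results: EvalResult[] = JSON.parse(readFileSync("eval-results.json", "utf8"));

const jsonl = results
  .filter((r) => r.score >= 8) // keep only answers the judge rated highly
  .map((r) => JSON.stringify({ prompt: r.question, completion: r.ollamaAnswer }))
  .join("\n");

writeFileSync("finetune-dataset.jsonl", jsonl + "\n");
```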
Finetuning happens outside this app with specialized tools. This site focuses on routing, evaluation, and safe usage. Set FORCE_OLLAMA_ONLY=true to keep everything local when needed.
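The FORCE_OLLAMA_ONLY flag could be honored with a small override ahead of the mode switch. This reuses the Mode type from the routing sketch above; where the check actually lives in src/lib/ai.ts is an assumption.

```ts
// Hypothetical override: when FORCE_OLLAMA_ONLY=true, every request is
// forced onto the local mode regardless of what the caller asked for.
function resolveMode(requested: Mode): Mode {
  return process.env.FORCE_OLLAMA_ONLY === "true" ? "local" : requested;
}
```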