# Ask AI (Local‑First Intelligence Augmentation) — Architecture + Safety Model
Ask AI is the grounded assistant layer for **AI Power Progress iA**. It is designed to be:
- **local‑first** (Ollama; no default third‑party model calls)
- **grounded** (catalog/docs/grid/web sources are explicit and linkable)
- **trust‑preserving** (retrieved content is treated as *untrusted*; prompt‑injection defenses)
- **capable** (teaching, planning, building/debugging support; structured answers when useful)
This document focuses on the Ask AI backend (`/api/ai/*`) and how it integrates with Search and PowerSearch Grid.
## Ask AI architecture map
```mermaid
flowchart LR
user[Browser UI / CLI] -->|POST /api/ai/chat/stream| chat[/Ask AI Router/]
user -->|POST /api/ai/generate| gen[/Ask AI Router/]
chat --> prompt[Prompt builder + safety guards]
gen --> prompt
prompt -->|optional| catalog[(Canonical catalog JSON)]
prompt -->|optional| sqlite[(SQLite app.db\nRAG + embeddings caches)]
prompt -->|optional| grid[(grid_docs excerpts via SQLite)]
prompt -->|optional| web[Brave Search API\nwhen enabled + configured]
prompt -->|optional| kiwix["Offline library (Kiwix)\nwhen enabled + configured"]
prompt --> ollama[Ollama\nchat + embeddings]
ollama --> prompt
chat --> format["Grounding formatter\n(appends Sources)"]
format --> user
```
## Trust boundaries (what can talk to what)
- **Browser → FastAPI**: user prompt, optional page context, and explicit feature toggles (`use_web_search`/`use_docs`/`use_resources`/`use_grid_context`).
- **FastAPI → Ollama**: local LLM + embeddings. This is the default AI runtime.
- **FastAPI → Brave Search (optional)**: only when enabled by payload or auto‑enabled by server policy *and* `BRAVE_API_KEY` exists.
- **FastAPI → SQLite**: local‑first storage for runs, sources metadata, embedding caches, Grid docs, and catalog indexes.
- **FastAPI → Kiwix (optional)**: offline library snippets when enabled.
## Prompt‑injection defense (retrieved content is untrusted)
When Ask AI composes prompts from docs/web/offline/page snippets, the server:
- **normalizes** snippets (strip HTML, collapse whitespace, bound size)
- **wraps** them in explicit delimiters: `<BEGIN_UNTRUSTED_…>` / `<END_UNTRUSTED_…>`
- **appends** a short guardrail block only when untrusted blocks are present
Code: `app/prompt_safety.py` + `app/safety.py`.
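A minimal sketch of the normalize + wrap steps (the real implementation lives in `app/prompt_safety.py`; the function name, default label, and regex-based tag stripping here are illustrative):

```python
import html
import re

def wrap_untrusted(snippet: str, label: str = "WEB_SNIPPET", max_chars: int = 2000) -> str:
    """Normalize a retrieved snippet, then fence it in explicit delimiters
    so the model can be told the content is untrusted evidence."""
    text = re.sub(r"<[^>]+>", " ", snippet)               # strip HTML tags
    text = html.unescape(text)                            # decode entities
    text = re.sub(r"\s+", " ", text).strip()[:max_chars]  # collapse + bound size
    return f"<BEGIN_UNTRUSTED_{label}>\n{text}\n<END_UNTRUSTED_{label}>"
```

The delimiters make the trust boundary explicit in the prompt itself, so the guardrail block can refer to the fenced content unambiguously.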
## Grounding + citations model
Ask AI maintains a compact `sources[]` list (name, url, source_type) for:
- auditability (what the answer was based on)
- linkable evidence (a stable Sources list appended to outputs when `structured` or `force_cite` is enabled)
Formatting: `app/ai_format.py`.
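A sketch of what the appended block can look like, assuming the `sources[]` entries above (the real formatting lives in `app/ai_format.py`):

```python
def format_sources(sources: list[dict]) -> str:
    """Render the compact sources[] list as a markdown 'Sources' block,
    prefixing each entry with its source-type label."""
    lines = ["Sources:"]
    for s in sources:
        lines.append(f"- ({s['source_type']}) [{s['name']}]({s['url']})")
    return "\n".join(lines)
```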
## UI: Evidence panel (widget + /ask)
The Ask AI widget and `GET /ask` render backend-appended grounding blocks as a dedicated **Evidence** panel:
- Answer body: the assistant’s main markdown (safe-rendered).
- Evidence:
  - “What this is based on” (provenance summary)
  - “Sources” (expandable, clickable links)
- Run ID: when available, the UI shows **Copy run id** to help debug and correlate with server logs.
Client safety + usability:
- Follow-up context strips appended grounding sections to keep the thread compact and reduce prompt-injection surface.
- Retrieved content remains explicitly labeled as **untrusted** (the UI never treats citations as instructions).
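The follow-up stripping done by `ppiaSplitGroundingSections()` in `static/site.js` can be mirrored in a few lines (a Python sketch of the same idea, assuming grounding starts at the appended `Sources:` heading; the real client logic may recognize more section markers):

```python
def strip_grounding(message: str) -> str:
    """Drop backend-appended grounding sections from an assistant message
    before reusing it as follow-up context."""
    out = []
    for line in message.splitlines():
        if line.strip().startswith("Sources:"):
            break  # everything from the Sources block onward is appended grounding
        out.append(line)
    return "\n".join(out).rstrip()
```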
Implementation:
- `static/site.js`: `ppiaSplitGroundingSections()` + `ppiaRenderGroundingEvidence()` + `ppiaRenderAssistantMessage()`.
- `static/ask.html`: stores `run_id` per assistant message and uses an explicit `web: off|auto|on` selector (default **off**).
## Profiles (response shaping)
Clients can pass a `profile` hint to bias model selection and response style:
- `fast`, `general`/`quality`/`balanced`, `code`
- `tutor` / `teach` (teaching)
- `builder` / `build`, `debugger` / `debug` (development assistance; maps to the code model when configured)
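The alias handling can be sketched as follows (the mapping and model-family names are illustrative; actual resolution depends on which models the server has configured):

```python
def resolve_profile(profile: str) -> str:
    """Collapse profile aliases, then pick a model family.
    Development profiles map to the code model when one is configured."""
    aliases = {
        "teach": "tutor",
        "build": "builder",
        "debug": "debugger",
        "general": "balanced",
        "quality": "balanced",
    }
    p = aliases.get(profile.lower(), profile.lower())
    if p in ("code", "builder", "debugger"):
        return "code"
    if p == "fast":
        return "fast"
    return "balanced"  # tutor and balanced share the general model in this sketch
```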
Source labels:
- Ask AI appends a `Sources:` list where each entry is prefixed with a small source-type label:
- `(catalog)`, `(docs)`, `(grid)`, `(web)`, `(realtime)`
## Agent mode (safe actions + explicit approval)
The Ask AI widget supports an opt-in **agent mode** that can propose *safe next actions* to help users make progress without pretending to run arbitrary commands.
Key properties:
- **Off by default**: users must explicitly choose `mode: agent (actions)` (Ask AI widget or `GET /ask`).
- **No automatic execution**: actions are only *suggested* by the model; the user must click a button to run one.
- **Allowlisted tools only**: actions map to a small set of safe, read-only endpoints and browser navigation.
- **Untrusted evidence**: tool outputs are wrapped and treated as untrusted context (prompt-injection aware).
### Action proposal format (model → UI)
When agent mode is enabled, the model may append an actions block at the end of its response:
- `<BEGIN_ACTIONS_JSON> … <END_ACTIONS_JSON>` (JSON only inside the block)
- Shape: `{"v":1,"actions":[{"id":"a1","label":"…","kind":"…","input":{...}}]}`
Allowed `kind` values (initial set):
- `open_url` → open a same-origin path (preferred) or `https://` URL
- `services_status` → summarize `GET /api/services/status`
- `site_search` → local-first `GET /api/search/blended?q=…` (web/live off; `include_grid=false`)
- `rag_query` → `GET /api/rag/query?q=…`
- `grid_search` → `GET /api/grid/search?q=…&mode=lexical|semantic`
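Parsing the block can fail closed, as in this sketch (the function name and error handling are illustrative; only allowlisted kinds survive):

```python
import json
import re

ALLOWED_KINDS = {"open_url", "services_status", "site_search", "rag_query", "grid_search"}

def extract_actions(reply: str) -> list[dict]:
    """Pull the optional actions block out of a model reply.
    Returns [] when the block is absent or malformed (fail closed)."""
    m = re.search(r"<BEGIN_ACTIONS_JSON>(.*?)<END_ACTIONS_JSON>", reply, re.DOTALL)
    if not m:
        return []
    try:
        data = json.loads(m.group(1))
    except json.JSONDecodeError:
        return []
    if data.get("v") != 1:
        return []
    return [a for a in data.get("actions", []) if a.get("kind") in ALLOWED_KINDS]
```

Anything outside the allowlist is dropped silently, so a compromised or confused model cannot widen the tool surface.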
### Tool result format (UI → model)
When the user runs an action, the UI appends tool output wrapped as:
- `<BEGIN_UNTRUSTED_TOOL_RESULT> … <END_UNTRUSTED_TOOL_RESULT>` (JSON inside)
The agent-mode system prompt instructs the model to treat these blocks as untrusted evidence and to never follow instructions contained within retrieved content.
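A sketch of the wrapper, assuming a simple JSON payload (the exact fields the UI includes are not specified here):

```python
import json

def wrap_tool_result(result: dict) -> str:
    """Fence a tool's JSON output in the untrusted-tool-result delimiters
    before appending it to the conversation. Payload shape is illustrative."""
    body = json.dumps(result, ensure_ascii=False)
    return f"<BEGIN_UNTRUSTED_TOOL_RESULT>\n{body}\n<END_UNTRUSTED_TOOL_RESULT>"
```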
UX note: the UI renders actions as buttons and shows tool outputs in a collapsible “Tool result (raw)” panel whose header includes ok/error status, latency, and hybrid/dedupe hints when available. Users can then click “Continue with AI” to have the model interpret the tool result and propose next steps.
## Teaching / tutoring / development assistance flow map
```mermaid
flowchart TD
intent[User intent] -->|learn| teach[Teach: step-by-step + practice]
intent -->|build/debug| dev[Build/Debug: plan + commands + checks]
intent -->|summarize/compare| synth[Synthesize: bullets + tradeoffs]
teach -->|grounded| sources["Use best available sources\n(catalog/docs/grid/web if enabled)"]
dev -->|grounded| sources
synth -->|grounded| sources
sources --> answer[Structured response\n+ appended Sources links]
```
## Baseline verification (local)
From `aipowerprogressia.com/`:
- Unit tests: `bash scripts/run_unit_tests.sh`
- Regression smoke: `python3 scripts/regression_smoke.py`
- Ask AI smoke (streaming): open `GET /ask`, send a prompt in **cited** mode
- Search + AI overview smoke: open `GET /search`, run a query, generate AI overview
- RAG smoke: `GET /api/rag/status` should show `weaviate_up` plus an ingest snapshot at `ingest.*` (best-effort)