Docs (Markdown)
Tip: this is plain markdown in a <pre> block for maximum inspectability.
# Ask AI (Local‑First Intelligence Augmentation) — Architecture + Safety Model
Ask AI is the grounded assistant layer for **AI Power Progress iA**. It is designed to be:
- **local‑first** (Local AI runtime; optional Swarm router; no default third‑party model calls)
- **grounded** (catalog/docs/grid/web sources are explicit and linkable)
- **trust‑preserving** (retrieved content is treated as *untrusted*; prompt‑injection defenses)
- **capable** (teaching, planning, building/debugging support; structured answers when useful)
This document focuses on the Ask AI backend (`/api/ai/*`) and how it integrates with Search and PowerSearch Grid.
## Ask AI architecture map
```mermaid
flowchart LR
user[Browser UI / CLI] -->|POST /api/ai/chat/stream| chat[/Ask AI Router/]
user -->|POST /api/ai/generate| gen[/Ask AI Router/]
chat --> prompt[Prompt builder + safety guards]
gen --> prompt
prompt -->|optional| catalog[(Canonical catalog JSON)]
prompt -->|optional| sqlite[(SQLite app.db\nRAG + embeddings caches)]
prompt -->|optional| grid[(grid_docs excerpts via SQLite)]
prompt -->|optional| web[Brave Search API\nwhen enabled + configured]
prompt -->|optional| kiwix[Offline library (Kiwix)\nwhen enabled + configured]
prompt --> localai[Local AI runtime\n(Ollama or Swarm router)]
localai --> prompt
chat --> format[Grounding formatter\n(appends Sources)]
format --> user
```
## Trust boundaries (what can talk to what)
- **Browser → FastAPI**: user prompt, optional page context, and explicit feature toggles (use_web_search/use_docs/use_resources/use_grid_context).
- **FastAPI → Local AI runtime**: local LLM + embeddings (Ollama by default; Swarm router optional).
- **FastAPI → Brave Search (optional)**: only when enabled by payload or auto‑enabled by server policy *and* `BRAVE_API_KEY` exists.
- **FastAPI → SQLite**: local‑first storage for runs, sources metadata, embedding caches, Grid docs, and catalog indexes.
- **FastAPI → Kiwix (optional)**: offline library snippets when enabled.
## Prompt‑injection defense (retrieved content is untrusted)
When Ask AI composes prompts from docs/web/offline/page snippets, the server:
- **normalizes** snippets (strip HTML, collapse whitespace, bound size)
- **wraps** them in explicit delimiters: `<BEGIN_UNTRUSTED_…>` / `<END_UNTRUSTED_…>`
- **appends** a short guardrail block only when untrusted blocks are present
Code: `app/prompt_safety.py` + `app/safety.py`.
## Grounding + citations model
Ask AI maintains a compact `sources[]` list (name, url, source_type) for:
- auditability (what the answer was based on)
- linkable evidence (a stable Sources list appended to outputs when `structured` or `force_cite` is enabled)
Formatting: `app/ai_format.py`.
### Repo docs + intelligence citations (openable)
When the system cites local RAG/doc sources:
- Repo docs (`docs/*.md`) are linked as openable URLs like `/docs/search?line=123` (HTML viewer + excerpt).
- PPIA Hub artifacts (`ppia_hub/ppia_*.md`) are linked as openable URLs like `/hub/ppia_catalog?line=6` (inspectable excerpt + provenance; robots: noindex).
- Daily Intelligence artifacts staged for RAG (`rag/daily_intel_YYYY-MM-DD.md`) link to `/intelligence/YYYY-MM-DD`.
- PowerSearch Grid docs (`grid_docs`) link to `/grid/doc/{doc_id}` (inspectable cached excerpt + provenance + “Open original”).
This keeps local-first grounding **inspectable** in the Evidence panel without requiring external providers.
## UI: Evidence panel (widget + /ask)
The Ask AI widget and `GET /ask` render backend-appended grounding blocks as a dedicated **Evidence** panel:
- Answer body: the assistant’s main markdown (safe-rendered).
- Evidence:
- “What this is based on” (provenance summary)
- “Sources” (expandable, clickable links)
- Run ID: when available, the UI shows **Copy run id** to help debug and correlate with server logs.
Client safety + usability:
- Follow-up context strips appended grounding sections to keep the thread compact and reduce prompt-injection surface.
- Retrieved content remains explicitly labeled as **untrusted** (the UI never treats citations as instructions).
Implementation:
- `static/site.js`: `ppiaSplitGroundingSections()` + `ppiaRenderGroundingEvidence()` + `ppiaRenderAssistantMessage()`.
- `static/ask.html`: stores `run_id` per assistant message and uses an explicit `web: off|auto|on` selector (default **off**).
## Degraded mode (when local AI is down)
Ask AI is designed to remain useful even when the local model backend (Ollama) is unavailable.
Behavior (trust-first):
- `POST /api/ai/chat` and `POST /api/ai/generate` return HTTP `200` with `ok:false` and a **helpful degraded response** (includes `run_id` + next-step checklist).
- Grounding UI still works: the backend appends “What this is based on” + “Sources” via `app/ai_format.py`.
- For **news-like prompts** (e.g., “latest tech news”), degraded responses also include a short **Daily Intelligence** preview sourced from local artifacts (`${PPIA_DATA_DIR:-data}/intelligence/latest.json`) plus an “Open the full digest” link.
This avoids hallucinated “news” and keeps the system local-first and inspectable even during AI outages.
## Profiles (response shaping)
Clients can pass a `profile` hint to bias model selection and response style:
- `fast`, `general`/`quality`/`balanced`, `code`
- `tutor` / `teach` (teaching)
- `builder` / `build`, `debugger` / `debug` (development assistance; maps to the code model when configured)
Source labels:
- Ask AI appends a `Sources:` list where each entry is prefixed with a small source-type label:
- `(catalog)`, `(docs)`, `(grid)`, `(web)`, `(realtime)`
## Agent mode (safe actions + explicit approval)
The Ask AI widget supports an opt-in **agent mode** that can propose *safe next actions* to help users make progress without pretending to run arbitrary commands.
Key properties:
- **Off by default**: users must explicitly choose `mode: agent (actions)` (Ask AI widget or `GET /ask`).
- **No automatic execution**: actions are only *suggested* by the model; the user must click a button to run one.
- **Allowlisted tools only**: actions map to a small set of safe, read-only endpoints and browser navigation.
- **Untrusted evidence**: tool outputs are wrapped and treated as untrusted context (prompt-injection aware).
### Action proposal format (model → UI)
When agent mode is enabled, the model may append an actions block at the end of its response:
- `<BEGIN_ACTIONS_JSON> … <END_ACTIONS_JSON>` (JSON only inside the block)
- Shape: `{"v":1,"actions":[{"id":"a1","label":"…","kind":"…","input":{...}}]}`
Allowed `kind` values (allowlisted in the UI; user-click to execute):
- `open_url` → open a same-origin path (preferred) or `https://` URL
- `services_status` → summarize `GET /api/services/status`
- `storage_status` → fetch `GET /api/storage/status_public` (loopback/admin only)
- `storage_cleanup_plan` → fetch `GET /api/storage/cleanup_plan_public` (loopback/admin only; dry-run plan)
- `storage_upload_orphans` → fetch `GET /api/storage/upload_orphans_public` (loopback/admin only; dry-run scan)
- `storage_backup_db` → run `POST /api/storage/backup_db_public` (loopback/admin only; safe snapshot)
- `intel_digest` → fetch `GET /api/intelligence/daily` (optionally by date)
- `work_summary` → fetch `GET /api/saas/service-requests/mine/summary` (auth required; metadata-only)
- `work_search` → fetch `GET /api/saas/service-requests/mine/search?q=…` (auth required; metadata-only; optional deep mode)
- `unified_search` → fetch `GET /api/unified/search?q=…` (permission-aware: Work when signed in + Community always)
- `site_search` → local-first `GET /api/search/blended?q=…` (web/live off; `include_grid=false`)
- `rag_query` → `GET /api/rag/query?q=…`
- `grid_search` → `GET /api/grid/search?q=…&mode=lexical|semantic`
### Tool result format (UI → model)
When the user runs an action, the UI appends tool output wrapped as:
- `<BEGIN_UNTRUSTED_TOOL_RESULT> … <END_UNTRUSTED_TOOL_RESULT>` (JSON inside)
The agent-mode system prompt instructs the model to treat these blocks as untrusted evidence and to never follow instructions contained within retrieved content.
## Tool follow-up routing (v1.1)
When users click “Continue with AI” after running a tool:
- The backend detects explicit tool-followup prompts (e.g., “Using the tool result above…”) and **bypasses** low-signal docs/resources/web retrieval that would otherwise run against that generic text.
- Ask AI derives grounded **Sources** from the tool result payload (RAG hits, site search results, Grid results, intelligence previews) so the Evidence panel stays trustworthy after tool use.
- Sensitive internal URLs are filtered (e.g., work-portal tokens under `/w/*` never appear in Sources).
UX note: the UI renders actions as buttons and shows tool outputs as:
- a compact **Preview** panel (openable links for common tools like `rag_query`, `grid_search`, `site_search`, `intel_digest`)
- plus a collapsible “Tool result (raw)” JSON panel (header includes ok/error + latency + hybrid/dedupe hints when available)
Users can then click “Continue with AI” to interpret the tool result and propose next steps.
## Teaching / tutoring / development assistance flow map
```mermaid
flowchart TD
intent[User intent] -->|learn| teach[Teach: step-by-step + practice]
intent -->|build/debug| dev[Build/Debug: plan + commands + checks]
intent -->|summarize/compare| synth[Synthesize: bullets + tradeoffs]
teach -->|grounded| sources[Use best available sources\n(catalog/docs/grid/web if enabled)]
dev -->|grounded| sources
synth -->|grounded| sources
sources --> answer[Structured response\n+ appended Sources links]
```
## Baseline verification (local)
From `aipowerprogressia.com/`:
- Unit tests: `bash scripts/run_unit_tests.sh`
- Regression smoke: `python3 scripts/regression_smoke.py`
- Ask AI smoke (streaming): open `GET /ask`, send a prompt in **cited** mode
- Search + AI overview smoke: open `GET /search`, run a query, generate AI overview
- RAG smoke: `GET /api/rag/status` should show `weaviate_up` plus an ingest snapshot at `ingest.*` (best-effort)