Docs (Markdown)
Tip: this is plain markdown in a <pre> block for maximum inspectability.
# Search System (Keyword · Semantic · Docs · Grid · Web · Live)
This document describes how the on-site search experience works end-to-end, and where to tune relevance/UX.
## Surfaces
- Website page: `GET /search` → `static/search.html`
- Tabs: **All**, **Keyword**, **Semantic**, **Docs**, **Grid**, **Web**, **Live**, **Work**, **Community**
- **Docs** searches first-party, local-only corpora (repo docs + Hub artifacts + Daily Intel RAG markdown) via `GET /api/docs/search`.
- Results are trust-first and openable via inspectable viewers: `/docs/{slug}?line=…`, `/hub/{ppia_*}?line=…`, `/intelligence/{day}`.
- **Work** searches your signed-in Services Desk requests (local-first).
- Default is **metadata-first** via `/api/saas/service-requests/mine/summary` (avoids loading freeform request text into a navigation surface).
- Optional “deep search” is explicit opt-in and uses `/api/saas/service-requests/mine` (may include request details on the page).
- Portal links are rendered as buttons to avoid leaking portal tokens into page-aware citations.
- **Community** searches local-first creators + posts without mixing UGC into the canonical blended ranking.
- **Grid** results include a “View excerpt” link (`/grid/doc/{doc_id}`) for trust-first, inspectable grounding.
- Typing mode: **All** runs a cheap local-only blend while you type; press Enter (or click Search) for the full blended search (semantic/grid/web/live).
- Explainability: **All** includes an optional Explain toggle (calls `/api/search/blended?explain=1`) and renders “why matched” summaries per card (tokens/fields + fusion sources).
- Dedupe/diversity: **All** applies UI-first near-duplicate clustering + lightweight host caps for external sources (web/grid/live-only) and shows a “Show all / Hide duplicates” control when items are hidden.
- “AI overview” renders an inline grounded summary (streaming) via `POST /api/ai/chat/stream`.
- “Ask AI” opens the embedded widget and prefills an overview prompt.
- After generating an overview, the page shows a dedicated **Sources** panel plus an in-card **follow-up thread** (streaming), and also provides quick follow-up pills that open Ask AI with grounded, result-aware prompts (explain simply, compare, next actions, etc.).
- Grid page: `GET /grid` → `static/grid.html` (PowerSearch Grid; opt-in edge compute)
- Command palette (Ctrl+K): provides quick keyword search + navigation (`static/site.js`)
- Includes first-party docs hits (`GET /api/docs/search`) ahead of catalog hits when available.
## Backend endpoints
Primary endpoints used by `static/search.html`:
- Blended “All”: `GET /api/search/blended`
- Optional: `explain=1` adds additive per-result explain blocks (`result.explain`) for UI/debugging (does not change ranking).
- Keyword: `GET /api/catalog`
- Semantic: `GET /api/catalog/semantic`
- Docs: `GET /api/docs/search` (local-only; safe roots only)
- Grid: `GET /api/grid/search`
- Work (default): `GET /api/saas/service-requests/mine/summary` (auth required; metadata-only)
- Work (deep, explicit opt-in): `GET /api/saas/service-requests/mine` (auth required; includes freeform request text)
- Community: `GET /api/social/search`
- Web (Brave; optional): `GET /api/web/search` (`source=web|news|auto`)
- Live feeds (local): `GET /api/feeds/items`
- Includes `source=news` (RSS/Atom metadata-only) when `scripts/update_realtime_feeds.py` has been run.
Ask AI grounding over results:
- Widget + CLI: `POST /api/ai/chat/stream` (streaming, cited mode)
- Appends structured grounding sections (`What this is based on`, `Sources`) and includes **clickable source links** when URLs are available.
- Local repo-doc citations (`docs/*.md`) are openable via `/docs/{slug}?line=…` to keep citations inspectable (trust-first).
- PPIA Hub artifact citations (`ppia_hub/ppia_*.md`) are openable via `/hub/{ppia_*}?line=…` (robots: noindex; inspectable excerpt + provenance).
- Grid doc citations are openable via `/grid/doc/{doc_id}` (inspectable excerpt + provenance + “Open original”).
### Prompt-injection defense (sources are untrusted)
When Ask AI composes prompts from docs/web/offline snippets, the server:
- wraps snippets inside explicit `<BEGIN_UNTRUSTED_…>` / `<END_UNTRUSTED_…>` blocks
- appends a short “Safety (untrusted context)” guardrail section when any such blocks are present
Helpers live in `app/prompt_safety.py` and are used by `/api/ai/chat` and `/api/ai/chat/stream`.
## Request/response flow map
```mermaid
flowchart LR
browser[Browser /search] -->|JS fetch| blended[/api/search/blended]
browser --> keyword[/api/catalog]
browser --> semantic[/api/catalog/semantic]
browser --> docs[/api/docs/search]
browser --> grid[/api/grid/search]
browser --> web[/api/web/search]
browser --> live[/api/feeds/items]
blended --> catalog[(canonical catalog JSON)]
blended --> sqlite[(SQLite app.db\nresource_embeddings + grid_doc_embeddings)]
semantic --> sqlite
grid --> sqlite
web --> brave[Brave Search API\noptional]
live --> feeds[(local feeds JSONL)]
browser -->|Ask AI| ai[/api/ai/chat/stream]
ai --> ollama[Ollama chat + embeddings\n(local-first; multi-endpoint failover)]
ai --> brave
ai --> sqlite
```
## Relevance pipeline map (current)
### Keyword (local-first)
1. `canonical_items_filtered()` loads the canonical catalog and applies facet filters.
2. `lexical_rank_items()` ranks by token matches with small typo tolerance.
3. Results render via `resourceCard()` with badges.
### Semantic (rerank local candidates)
1. Collect candidates from local keyword hits (bounded set).
2. Embed query and candidate texts via Ollama embeddings.
3. Cache vectors in SQLite (`resource_embeddings` table).
4. Rank by cosine similarity.
### Grid
1. Use SQLite FTS (`grid_docs_fts`) when available.
2. Optional semantic rerank (embeddings cached in SQLite `grid_doc_embeddings`).
### Blended “All”
1. Always include keyword-ranked local items.
2. Optionally include semantic rerank + grid results.
3. Optionally include web + live when explicitly enabled (or auto-enabled for time-sensitive queries).
- Web can use either Brave Web or Brave News depending on query intent.
4. Fuse lists via reciprocal-rank-fusion (`rrf_fuse`) with source weights.
- `rrf_fuse` dedupes **within** each source list (prevents PDF page fragments from accumulating fusion score).
5. For confident navigational queries, pin top local hits above fusion results.
## Storage/index/cache
- Canonical catalog: `data/canonical/canonical_catalog.json` (built by `scripts/build_canonical_catalog.py`)
- Link health: `data/canonical/link_health.json` (advisory; only **dead** links are suppressed by default)
- Search index snapshot: `data/canonical/search-index.json` (used for local AI context and some UI flows)
- Embeddings cache (SQLite): `${PPIA_DATA_DIR:-data}/app.db`
- `resource_embeddings` (semantic cache for catalog items)
- `grid_doc_embeddings` (semantic cache for grid docs)
- Web search disk cache (when configured): `${BRAVE_CACHE_DIR:-${PPIA_DATA_DIR:-data}/web_search_cache}`
- Live feeds: `${PPIA_DATA_DIR:-data}/feeds/*.jsonl`
- RSS news: `${PPIA_DATA_DIR:-data}/feeds/news.jsonl`
- Daily digest (built from feeds): `${PPIA_DATA_DIR:-data}/intelligence/daily/YYYY-MM-DD.json`
## Measurement
Current lightweight baseline:
- `python3 scripts/run_search_evals.py --in-process --limit 40`
- Measures keyword + blended `recall@10`, `mrr@10`, and `p50_latency_s`
- Also prints `keyword_extra:` / `blended_extra:` JSON with:
- `no_results_rate`, `dup_rate_at_k`
- blended local-first indicators (`trusted_local_top1_rate`, `trusted_local_top3_rate`)
- blended stage p50 latency from the server (`p50_server_local_ms`, `p50_server_semantic_ms`, `p50_server_grid_ms`, `p50_server_total_ms`)
- semantic embedding build signal (`avg_semantic_built_this_request`)
Recommended local dataset:
- `python3 scripts/run_search_evals.py --in-process --dataset evals/search_variants.jsonl --verbose`
- Uses a small set of typo/spacing/navigational variants and prints per-case ranks.
Recommended next eval expansions:
- Add a small JSONL dataset of queries with expected top URLs/IDs (docs, datasets, “help”, comparisons, freshness).
- Expand coverage beyond the canonical catalog (grid/web/live) with explicit assertions and safe degraded-mode behavior.
- Add a “blend quality” dataset that checks local-first behavior for in-domain queries and controlled external augmentation.
## Tuning knobs (high leverage)
- `app/search_rank.py`:
- lexical scoring weights, typo tolerance, URL canonicalization, fusion ordering
- `/api/search/blended`:
- source weights and inclusion rules (time-sensitive, navigational pinning)
- `static/search.html` + `static/site.js`:
- cancellation, debouncing, URL state, result rendering, “Ask AI over results” UX