# Search System (Keyword · Semantic · Docs · Grid · Web · Live)

This document describes how the on-site search experience works end-to-end, and where to tune relevance/UX.

## Surfaces

- Website page: `GET /search` → `static/search.html`
  - Tabs: **All**, **Keyword**, **Semantic**, **Docs**, **Grid**, **Web**, **Live**, **Work**, **Community**
    - **Docs** searches first-party, local-only corpora (repo docs + Hub artifacts + Daily Intel RAG markdown) via `GET /api/docs/search`.
      - Results are trust-first and openable via inspectable viewers: `/docs/{slug}?line=…`, `/hub/{ppia_*}?line=…`, `/intelligence/{day}`.
    - **Work** searches your signed-in Services Desk requests (local-first).
      - Default is **metadata-first** via `/api/saas/service-requests/mine/summary` (avoids loading freeform request text into a navigation surface).
      - Optional “deep search” is explicit opt-in and uses `/api/saas/service-requests/mine` (may include request details on the page).
      - Portal links are rendered as buttons to avoid leaking portal tokens into page-aware citations.
    - **Community** searches local-first creators + posts without mixing UGC into the canonical blended ranking.
    - **Grid** results include a “View excerpt” link (`/grid/doc/{doc_id}`) for trust-first, inspectable grounding.
  - Typing mode: **All** runs a cheap local-only blend while you type; press Enter (or click Search) for the full blended search (semantic/grid/web/live).
  - Explainability: **All** includes an optional Explain toggle (calls `/api/search/blended?explain=1`) and renders “why matched” summaries per card (tokens/fields + fusion sources).
  - Dedupe/diversity: **All** applies UI-first near-duplicate clustering + lightweight host caps for external sources (web/grid/live-only) and shows a “Show all / Hide duplicates” control when items are hidden.
  - “AI overview” renders an inline grounded summary (streaming) via `POST /api/ai/chat/stream`.
  - “Ask AI” opens the embedded widget and prefills an overview prompt.
  - After generating an overview, the page shows a dedicated **Sources** panel plus an in-card **follow-up thread** (streaming), and also provides quick follow-up pills that open Ask AI with grounded, result-aware prompts (explain simply, compare, next actions, etc.).
- Grid page: `GET /grid` → `static/grid.html` (PowerSearch Grid; opt-in edge compute)
- Command palette (Ctrl+K): provides quick keyword search + navigation (`static/site.js`)
  - Includes first-party docs hits (`GET /api/docs/search`) ahead of catalog hits when available.

## Backend endpoints

Primary endpoints used by `static/search.html`:

- Blended “All”: `GET /api/search/blended`
  - Optional: `explain=1` adds additive per-result explain blocks (`result.explain`) for UI/debugging (does not change ranking).
- Keyword: `GET /api/catalog`
- Semantic: `GET /api/catalog/semantic`
- Docs: `GET /api/docs/search` (local-only; safe roots only)
- Grid: `GET /api/grid/search`
- Work (default): `GET /api/saas/service-requests/mine/summary` (auth required; metadata-only)
- Work (deep, explicit opt-in): `GET /api/saas/service-requests/mine` (auth required; includes freeform request text)
- Community: `GET /api/social/search`
- Web (Brave; optional): `GET /api/web/search` (`source=web|news|auto`)
- Live feeds (local): `GET /api/feeds/items`
  - Includes `source=news` (RSS/Atom metadata-only) when `scripts/update_realtime_feeds.py` has been run.

Ask AI grounding over results:

- Widget + CLI: `POST /api/ai/chat/stream` (streaming, cited mode)
  - Appends structured grounding sections (`What this is based on`, `Sources`) and includes **clickable source links** when URLs are available.
  - Local repo-doc citations (`docs/*.md`) are openable via `/docs/{slug}?line=…` to keep citations inspectable (trust-first).
  - PPIA Hub artifact citations (`ppia_hub/ppia_*.md`) are openable via `/hub/{ppia_*}?line=…` (robots: noindex; inspectable excerpt + provenance).
  - Grid doc citations are openable via `/grid/doc/{doc_id}` (inspectable excerpt + provenance + “Open original”).

### Prompt-injection defense (sources are untrusted)

When Ask AI composes prompts from docs/web/offline snippets, the server:

- wraps snippets inside explicit `<BEGIN_UNTRUSTED_…>` / `<END_UNTRUSTED_…>` blocks
- appends a short “Safety (untrusted context)” guardrail section when any such blocks are present

Helpers live in `app/prompt_safety.py` and are used by `/api/ai/chat` and `/api/ai/chat/stream`.

## Request/response flow map

```mermaid
flowchart LR
  browser[Browser /search] -->|JS fetch| blended[/api/search/blended]
  browser --> keyword[/api/catalog]
  browser --> semantic[/api/catalog/semantic]
  browser --> docs[/api/docs/search]
  browser --> grid[/api/grid/search]
  browser --> web[/api/web/search]
  browser --> live[/api/feeds/items]

  blended --> catalog[(canonical catalog JSON)]
  blended --> sqlite[(SQLite app.db\nresource_embeddings + grid_doc_embeddings)]
  semantic --> sqlite
  grid --> sqlite
  web --> brave[Brave Search API\noptional]
  live --> feeds[(local feeds JSONL)]

  browser -->|Ask AI| ai[/api/ai/chat/stream]
  ai --> ollama[Ollama chat + embeddings\n(local-first; multi-endpoint failover)]
  ai --> brave
  ai --> sqlite
```

## Relevance pipeline map (current)

### Keyword (local-first)

1. `canonical_items_filtered()` loads the canonical catalog and applies facet filters.
2. `lexical_rank_items()` ranks by token matches with small typo tolerance.
3. Results render via `resourceCard()` with badges.

### Semantic (rerank local candidates)

1. Collect candidates from local keyword hits (bounded set).
2. Embed query and candidate texts via Ollama embeddings.
3. Cache vectors in SQLite (`resource_embeddings` table).
4. Rank by cosine similarity.

### Grid

1. Use SQLite FTS (`grid_docs_fts`) when available.
2. Optional semantic rerank (embeddings cached in SQLite `grid_doc_embeddings`).

### Blended “All”

1. Always include keyword-ranked local items.
2. Optionally include semantic rerank + grid results.
3. Optionally include web + live when explicitly enabled (or auto-enabled for time-sensitive queries).
   - Web can use either Brave Web or Brave News depending on query intent.
4. Fuse lists via reciprocal-rank-fusion (`rrf_fuse`) with source weights.
   - `rrf_fuse` dedupes **within** each source list (prevents PDF page fragments from accumulating fusion score).
5. For confident navigational queries, pin top local hits above fusion results.

## Storage/index/cache

- Canonical catalog: `data/canonical/canonical_catalog.json` (built by `scripts/build_canonical_catalog.py`)
- Link health: `data/canonical/link_health.json` (advisory; only **dead** links are suppressed by default)
- Search index snapshot: `data/canonical/search-index.json` (used for local AI context and some UI flows)
- Embeddings cache (SQLite): `${PPIA_DATA_DIR:-data}/app.db`
  - `resource_embeddings` (semantic cache for catalog items)
  - `grid_doc_embeddings` (semantic cache for grid docs)
- Web search disk cache (when configured): `${BRAVE_CACHE_DIR:-${PPIA_DATA_DIR:-data}/web_search_cache}`
- Live feeds: `${PPIA_DATA_DIR:-data}/feeds/*.jsonl`
  - RSS news: `${PPIA_DATA_DIR:-data}/feeds/news.jsonl`
  - Daily digest (built from feeds): `${PPIA_DATA_DIR:-data}/intelligence/daily/YYYY-MM-DD.json`

## Measurement

Current lightweight baseline:

- `python3 scripts/run_search_evals.py --in-process --limit 40`
  - Measures keyword + blended `recall@10`, `mrr@10`, and `p50_latency_s`
  - Also prints `keyword_extra:` / `blended_extra:` JSON with:
    - `no_results_rate`, `dup_rate_at_k`
    - blended local-first indicators (`trusted_local_top1_rate`, `trusted_local_top3_rate`)
    - blended stage p50 latency from the server (`p50_server_local_ms`, `p50_server_semantic_ms`, `p50_server_grid_ms`, `p50_server_total_ms`)
    - semantic embedding build signal (`avg_semantic_built_this_request`)

Recommended local dataset:

- `python3 scripts/run_search_evals.py --in-process --dataset evals/search_variants.jsonl --verbose`
  - Uses a small set of typo/spacing/navigational variants and prints per-case ranks.

Recommended next eval expansions:

- Add a small JSONL dataset of queries with expected top URLs/IDs (docs, datasets, “help”, comparisons, freshness).
- Expand coverage beyond the canonical catalog (grid/web/live) with explicit assertions and safe degraded-mode behavior.
- Add a “blend quality” dataset that checks local-first behavior for in-domain queries and controlled external augmentation.

## Tuning knobs (high leverage)

- `app/search_rank.py`:
  - lexical scoring weights, typo tolerance, URL canonicalization, fusion ordering
- `/api/search/blended`:
  - source weights and inclusion rules (time-sensitive, navigational pinning)
- `static/search.html` + `static/site.js`:
  - cancellation, debouncing, URL state, result rendering, “Ask AI over results” UX