AI Applications

Reference · Quick lookup

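Chapter organization

This chapter is organized into three layers, by position in the LLM application stack:

Layer 1 · Application patterns: RAG / MCP / Agent Patterns / Structured Output / Agents on Lakehouse. What kind of AI application you are building.
Layer 2 · Application-runtime bridge: LLM Inference / LLM Gateway / Semantic Cache. How the LLM is served to the application.
Layer 3 · Engineering discipline: Prompt Management / Evaluation / LLM Observability / Guardrails / Authorization / Conversation Lifecycle. The quality, security, and operations bar that production AI must clear.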

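External authorities: docs/references/ai-workloads/ (RAG / Agent papers · LangChain / LlamaIndex / Anthropic official docs · MCP spec). The LlamaIndex Component Guides (Indexing / Loading / Querying / Evaluating / Observability) cover similar topics; grouping by layer, as this chapter does, highlights the path to production engineering better than grouping by component.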

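Positioning in one line

The lakehouse plus multimodal retrieval is the foundation; AI applications are the workload. This chapter covers how the LLM application layer is engineered. Pages are grouped into three layers (application patterns / application-runtime bridge / engineering discipline) so that readers know when to enter each layer. It does not cover the underlying ML platform (Model Serving / Feature Store / GPU scheduling → see ml-infra), nor retrieval mechanisms (Embedding / ANN / Rerank → see retrieval).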

TL;DR

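Layer structure:
- Layer 1 · Application patterns (RAG · MCP · Agent Patterns · Structured Output · Agents on Lakehouse): what kind of AI application you are building
- Layer 2 · Application-runtime bridge (LLM Inference · LLM Gateway · Semantic Cache): how the LLM is served to the application
- Layer 3 · Engineering discipline (Prompt Management · Evaluation · Observability · Guardrails · Authorization · Conversation Lifecycle): the quality, security, and operations bar that production AI must clear
Division of labor with ml-infra: this chapter covers the LLM application layer; ml-infra covers the underlying ML platform (Feature Store / Model Registry / training / general deployment). llm-inference is the exception: it lives in this chapter because it is an LLM-specific runtime.
Division of labor with retrieval: this chapter covers the RAG application layer (chunking / prompt / evaluation); retrieval covers retrieval mechanisms (ANN / Hybrid / Rerank algorithms).
Key protocol: MCP (Anthropic, 2024) is the de facto standard for cross-host tools as of 2025-2026.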

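Scope boundaries · Avoid landing in the wrong chapter

This chapter does not cover the topics below. Go to the corresponding chapter instead:

Retrieval mechanisms (Embedding / ANN / Rerank / vector databases) → retrieval/
ML platform (Feature Store / Model Registry / Model Serving / training orchestration / GPU scheduling) → ml-infra/
BI × LLM (Text-to-SQL / semantic-layer LLM integration / Auto-Viz) → bi-workloads/bi-plus-llm.md
End-to-end business scenarios (RAG on Lake / Agentic Workflows / recommender systems) → scenarios/
Regulations / governance standards (EU AI Act / NIST AI RMF / China's generative AI measures) → ops/compliance §4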

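The three layers

Layer 1 · Application patterns (what kind of AI application you are building)

RAG ⭐: retrieval-augmented generation · the most widely adopted pattern in industrial LLM applications · includes GraphRAG
MCP ⭐: cross-host tool protocol · the de facto standard for agents since 2025
Agent Patterns ⭐: ReAct / Reflexion / Multi-agent / Memory / HITL / execution contract, plus the framework ecosystem (LangGraph / Microsoft Agent Framework / OpenAI Agents SDK / Claude Agent SDK / Google ADK / CrewAI)
Structured Output: JSON mode / Function Calling / schema constraints / Instructor / Outlines / SGLang CFSM
Agents on Lakehouse: lakehouse-specific agent tool design + MCP × lakehouse architecture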
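To make the Structured Output entry concrete, here is a minimal sketch of the application-side half of schema constraints: parsing a model reply that is expected to be JSON and rejecting anything that does not match. The `Ticket` type, its fields, and the 1-5 priority range are hypothetical and chosen only for illustration; a real application would more likely lean on JSON mode, function calling, or a library such as Instructor or Outlines.

```python
import json
from dataclasses import dataclass


@dataclass
class Ticket:
    """Hypothetical target schema for a model reply."""
    title: str
    priority: int


def parse_ticket(raw: str) -> Ticket:
    """Parse and validate a model reply that should be a JSON object."""
    # json.loads raises json.JSONDecodeError (a ValueError subclass)
    # when the reply is not well-formed JSON.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    title = data.get("title")
    priority = data.get("priority")
    if not isinstance(title, str) or not title:
        raise ValueError("title must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(priority, bool) or not isinstance(priority, int):
        raise ValueError("priority must be an integer")
    if not 1 <= priority <= 5:
        raise ValueError("priority must be between 1 and 5")
    return Ticket(title=title, priority=priority)
```

A caller would typically catch `ValueError` and either retry the model call or fall back to a default, so that malformed output never reaches downstream code.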

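Layer 2 · Application-runtime bridge (how the LLM is served to the application)

This layer is the LLM-specific runtime. It is split from the general model serving in ml-infra: this chapter covers what is LLM-specific, ml-infra covers general ML.

LLM Inference ⭐: vLLM / SGLang / TensorRT-LLM / Dynamo · KV cache / continuous batching / PagedAttention · comparison with serverless hosted offerings
LLM Gateway: unified proxy · rate limiting / retries / caching / cost monitoring / routing / canary rollout
Semantic Cache: semantic caching + comparison of system-level Prompt Caching mechanisms (Anthropic / OpenAI / Gemini)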
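As a rough illustration of the Semantic Cache idea, the sketch below returns a stored answer when a new query's embedding is close enough to a previously seen one. Everything here is an assumption made for illustration: the `SemanticCache` class, the 0.9 threshold, the linear scan, and especially `toy_embed`, a bag-of-letters stand-in for a real embedding model.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical in-memory cache keyed by embedding similarity."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.9):
        self.embed = embed
        self.threshold = threshold
        self.entries: List[Tuple[Vector, str]] = []

    def put(self, query: str, answer: str) -> None:
        self.entries.append((self.embed(query), answer))

    def get(self, query: str) -> Optional[str]:
        """Return the best cached answer at or above the threshold, else None."""
        q = self.embed(query)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:  # linear scan; fine for a sketch
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


def toy_embed(text: str) -> Vector:
    """Bag-of-letters vector; a stand-in for a real embedding model."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts
```

The threshold is the main design lever: set too low, the cache serves answers to questions that only look similar; set too high, it rarely hits. Provider-side Prompt Caching is a separate mechanism that reuses work on identical prompt prefixes rather than on similar meanings.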

Layer 3 · Engineering discipline (the quality, security, and operations bar that production AI must clear)

Prompt Management: versioning · DSPy · Prompt Caching · system prompt engineering
LLM / RAG / Agent Evaluation ⭐: RAGAS · TruLens · DeepEval · SWE-bench · τ-bench · five production dimensions of agent evaluation · Eval vs Obs division of labor
LLM Observability ⭐: four-dimension contract (Trace / Cost / Latency / Version) · OTel GenAI · Langfuse / LangSmith / Phoenix / Helicone · incident attribution template
Guardrails ⭐: Llama Guard 3 · NeMo Guardrails 2026 IORails · Guardrails AI · Lakera · bidirectional input/output checks · Defense-in-Depth
AI App Authorization ⭐: Tool ACL / Data ACL / cache isolation / log isolation / identity propagation / multi-tenant
Conversation Lifecycle ⭐: six components of session state · three history-trimming strategies · Letta / ReMe / Mem0 · multi-turn engineering
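This index does not name the three history-trimming strategies, so the sketch below shows just one plausible approach, a sliding window under a token budget, and should not be read as one of the strategies the Conversation Lifecycle page defines. `trim_history`, the message shape, and the whitespace-based `count_tokens` are all assumptions; a real system would count tokens with the model's own tokenizer.

```python
from typing import Dict, List

Message = Dict[str, str]  # assumed shape: {"role": ..., "content": ...}


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def trim_history(messages: List[Message], budget: int) -> List[Message]:
    """Keep all system messages plus the newest turns that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    used = sum(count_tokens(m["content"]) for m in system)
    kept: List[Message] = []
    # Walk from newest to oldest and stop at the first turn that does not fit,
    # so the kept window is always a contiguous suffix of the conversation.
    for m in reversed(rest):
        cost = count_tokens(m["content"])
        if used + cost > budget:
            break
        kept.append(m)
        used += cost
    kept.reverse()  # restore chronological order
    return system + kept
```

Stopping at the first turn that does not fit, rather than skipping it and continuing, keeps the remaining history contiguous, which avoids handing the model a conversation with holes in it.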

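Learning paths · By role

Beginner (any role, starting out):
- In Layer 1, start with rag → mcp → structured-output
- Build a minimal working application (Gateway + simple RAG + basic prompt)
- Add evaluation (RAG is the simplest place to start · rag-evaluation in Layer 3)
- Then add observability · guardrails · authorization (the full Layer 3 set)

Senior LLM application engineer:
- Read the whole chapter, focusing on the combined optimization of layered Prompt Caching × Semantic Cache
- On the agent side: patterns in agent-patterns → execution contract (§8) → agents-on-lakehouse for lakehouse specifics
- End-to-end scenarios: agentic-workflows · rag-on-lake · text-to-sql-platform

AI platform engineer:
- Layer 2 in full: llm-inference → llm-gateway → semantic-cache
- Layer 3: llm-observability → guardrails → authorization, the three pillars of production
- Underlying layer: ml-infra/

Data / BI engineer coming to LLMs:
- Layer 1: rag + mcp + structured-output
- Adjacent: BI × LLM + Text-to-SQL

SRE / Security:
- Layer 3: guardrails + authorization + llm-observability (read all three)
- llm-gateway (operations perspective)
- Adjacent: AI governance (regulatory layer)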

Role recommendations at a glance

Role	First reading path
LLM application engineer (beginner)	rag → mcp → prompt-management → llm-gateway
LLM application engineer (senior)	all of Layer 1 + agent execution contract (agent-patterns §8) + conversation-lifecycle · plus evaluation + observability
AI platform engineer	llm-inference → llm-gateway → semantic-cache → llm-observability → guardrails → authorization
Data / BI engineer	rag + mcp + structured-output → BI × LLM
SRE / Security	guardrails + authorization + llm-observability · llm-gateway

Side-by-side comparisons (links to compare)

Feature Store comparison (ML platform side)
Embedding model comparison · Rerank model comparison (retrieval side)

New directions, 2024-2026

RAG §4 advanced paradigms: Contextual Retrieval / CRAG / Self-RAG / Agentic RAG / GraphRAG / Multi-Query / HyDE / ColBERT v2 / LLMLingua
Embedding · Quantization · Sparse Retrieval: Matryoshka / Binary / SPLADE / BM42 / ColBERT
AI compliance (regulatory layer) · Guardrails §7 Red Teaming (engineering layer + adversarial testing)

Note: see ADR 0010. The former frontier/ chapter has been redistributed by topic into the mechanism chapters.

Underlying dependencies

ML infrastructure: Model Registry / Serving / Training / GPU / Feature Store
Multimodal retrieval: vector / Hybrid / Rerank / multimodal

Scenarios (end to end)

RAG on Lake · Agentic Workflows · Text-to-SQL · multimodal retrieval pipeline
Recommender systems · fraud detection · CDP