Ollama
Run local models for enrichment, routing, and intent fallback.
Ollama backs the local steps in AgentFlow: enrichment passes, the intent resolver when intent.resolver.use_ollama_fallback is on, and cost-aware routing that targets work.default_enricher. It is not a separate product integration. It is the HTTP endpoint and model tags you wire under agents.ollama, plus matching models entries.
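To make "HTTP endpoint and model tag" concrete, here is a minimal sketch of a direct call to the Ollama daemon, outside AgentFlow. The /api/generate route and its model, prompt, and stream fields come from Ollama's own HTTP API. The helper names build_generate_request and generate are hypothetical, and nothing here describes how AgentFlow itself builds its requests or prompts.

```python
import json
import urllib.request


def build_generate_request(endpoint: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate route.

    Hypothetical helper for illustration; not part of AgentFlow.
    """
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(endpoint: str, model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the request and return the generated text (needs a running daemon)."""
    req = build_generate_request(endpoint, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # With stream=False the daemon replies with one JSON object whose
        # "response" field holds the generated text.
        return json.loads(resp.read())["response"]
```

With the daemon running, generate("http://localhost:11434", "qwen2.5-coder:14b", "Say hello") returns the model's reply as a string. The endpoint, tag, and 300-second timeout mirror the example values used on this page.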
Configure
Point endpoint at your daemon, name the chat and embedding tags you pulled, and mirror the model id in a models profile with provider: ollama and the usage classes you expect locally.
agents:
  ollama:
    endpoint: http://localhost:11434
    model: qwen2.5-coder:14b
    embedding_model: nomic-embed-text
    timeout: 300
models:
  ollama_local_qwen:
    provider: ollama
    class: local
    model: qwen2.5-coder:14b
    usage: [summarize, classify, pre_review, context_selection]
Prerequisites
Start the daemon, pull the weights you referenced, then let doctor confirm the tree and config agree.
ollama serve
ollama pull qwen2.5-coder:14b
agentflow doctor
Usage
enrich calls Ollama directly when you pass --agent ollama; work --prefer-local keeps eligible steps on the local profile your routing describes.
agentflow enrich billing-v2 --agent ollama
agentflow work "refactor utils" --prefer-local
Embeddings / RAG
embedding_model is reserved for retrieval work the project has not fully shipped yet. Today, agentflow index persists text chunks in SQLite; do not expect vector similarity search from a stock build.
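Since a stock build does not exercise embedding_model, a tag that was named in the config but never pulled might not surface as an error. A minimal sketch, assuming Ollama's GET /api/tags listing of local models, reports which configured tags are missing. The helper names normalize_tag, missing_tags, and fetch_tags are hypothetical, and this is independent of whatever agentflow doctor checks.

```python
import json
import urllib.request


def normalize_tag(tag: str) -> str:
    """Ollama lists a pull made without an explicit tag as name:latest."""
    return tag if ":" in tag else tag + ":latest"


def missing_tags(tags_response: dict, wanted: list[str]) -> list[str]:
    """Return the wanted tags that are absent from a /api/tags response body."""
    available = {m["name"] for m in tags_response.get("models", [])}
    return [t for t in wanted if normalize_tag(t) not in available]


def fetch_tags(endpoint: str, timeout: float = 10.0) -> dict:
    """GET /api/tags from a running daemon and decode the JSON body."""
    url = endpoint.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read())
```

With the daemon running, missing_tags(fetch_tags("http://localhost:11434"), ["qwen2.5-coder:14b", "nomic-embed-text"]) returns an empty list when both tags are present. The Prerequisites commands pull only the chat tag, so nomic-embed-text may show up as missing until you run ollama pull nomic-embed-text.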