Local models
When to prefer local agents and how routing selects them.
Local models
“Local” here means models you serve on the same machine or LAN, usually via Ollama. In models, local profiles typically carry zero marginal rates (input_cost_per_1m_tokens: 0 / output_cost_per_1m_tokens: 0) so cost-aware routing can prefer them when the step class allows.
When to use
Local models shine when latency to a cloud API is undesirable or when you want bounded spend: intent classification, spec enrichment before a cloud dev pass, log triage, and lightweight pre-review are typical. They are also the default answer for air-gapped trees or repos where every cent and every byte out of the network matters.
- Classifying intents and tasks
- Enriching specs before cloud
dev - Log analysis and pre-review
- Air-gapped or cost-sensitive repos
Enable locally
Wire agents.ollama (or another local protocol you configure), add a models profile with class: local, extend routing.prefer_local_for with the step classes you want to keep home, and use --prefer-local or --no-cloud on work when you need a one-off constraint without editing YAML.
- Configure
agents.ollamaand amodelsprofile withclass: local - Set routing
prefer_local_forstep classes - Run with
--prefer-localor--no-cloudonwork
agentflow work "add logging" --prefer-local --estimate-onlyLimits
Local weights are smaller and context windows shorter; large refactors or security-sensitive reviews may still need a cloud profile. When use_cloud_heavy_for matches and you pass --allow-cloud, routing is allowed to escalate after the configured failure thresholds — check your strategy numbers before assuming a command stayed entirely local.