Models

Models define connections to LLM providers. Cloud providers (Anthropic, OpenAI, Gemini) have built-in model lists — just add your API key and all supported models are automatically available. Local providers (Ollama) use aliases to map HCL-friendly keys to model names.

Cloud Providers

    model "anthropic" {
      provider = "anthropic"
      api_key  = vars.anthropic_api_key
    }

    model "openai" {
      provider = "openai"
      api_key  = vars.openai_api_key
    }

    model "gemini" {
      provider = "gemini"
      api_key  = vars.gemini_api_key
    }

All supported models for each provider are available automatically — no need to list them.

Custom Endpoints

Every provider accepts an optional base_url to redirect API calls to a compatible proxy or gateway (LiteLLM, OpenRouter, a corporate gateway, etc.). Leave it unset to use each SDK’s default endpoint.

    model "anthropic" {
      provider = "anthropic"
      api_key  = vars.anthropic_api_key
      base_url = "https://litellm.internal.example.com"
    }

Local Models (Ollama)

The ollama provider connects to any OpenAI-compatible local inference server. Use aliases to define which models are available and map HCL-safe keys to the actual model names.

Requires Ollama 0.13.3 or newer (or any other server that implements /v1/responses). Squadron speaks to OpenAI-compatible servers via the Responses API, not the older Chat Completions API. vLLM (recent versions) and LiteLLM also support this.

    model "local" {
      provider = "ollama"
      base_url = "http://localhost:11434/v1"
      aliases = {
        gemma4     = "gemma4"
        gemma4_26b = "gemma4:26b"
        nemotron   = "nemotron-cascade-2:30b"
      }
    }

The alias key (left side) becomes the HCL reference name. The value (right side) is the exact model name sent to the server. This handles models with colons or hyphens that aren’t valid in HCL identifiers.

    agent "researcher" {
      model = models.local.gemma4_26b  # sends "gemma4:26b" to Ollama
    }

    agent "writer" {
      model = models.local.nemotron  # sends "nemotron-cascade-2:30b" to Ollama
    }

The base_url should point to the OpenAI-compatible API endpoint. Common values:

| Server | Base URL |
| --- | --- |
| Ollama | http://localhost:11434/v1 |
| vLLM | http://localhost:8000/v1 |
| llama.cpp | http://localhost:8080/v1 |
| LM Studio | http://localhost:1234/v1 |
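As a minimal sketch of pointing the ollama provider at a different server from the table above, the config below targets a vLLM instance. The config name `vllm`, the alias key `my_model`, and the model name `my-org/my-model` are hypothetical placeholders; substitute whatever name your server actually serves. The server still needs to implement /v1/responses, as noted earlier.

```hcl
# Hypothetical example: names below are placeholders, not built-in values.
model "vllm" {
  provider = "ollama"                    # OpenAI-compatible local provider
  base_url = "http://localhost:8000/v1"  # vLLM default from the table above
  aliases = {
    my_model = "my-org/my-model"         # exact model name sent to the server
  }
}
```

An agent would then reference it as models.vllm.my_model, following the same pattern as the Ollama example.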

Attributes

| Attribute | Type | Required | Description |
| --- | --- | --- | --- |
| provider | string | yes | Provider name: anthropic, openai, gemini, or ollama |
| api_key | string | cloud providers | API key (required for anthropic, openai, gemini) |
| base_url | string | ollama only | Override the provider’s API endpoint (required for ollama; optional for cloud providers to route through a compatible proxy) |
| aliases | map | ollama only | Map of HCL key → API model name (optional for cloud providers; see Custom Aliases for Cloud Providers) |
| prompt_caching | bool | no | Enable prompt caching (default: true) |
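Since prompt_caching defaults to true, the only time it needs to appear in a config is to turn it off. A minimal sketch, assuming a config named `anthropic_nocache` (a hypothetical name chosen for illustration):

```hcl
# Hypothetical config name; the attributes are the ones documented above.
model "anthropic_nocache" {
  provider       = "anthropic"
  api_key        = vars.anthropic_api_key
  prompt_caching = false  # overrides the default of true
}
```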

Supported Models

See the full list of built-in models with pricing for every provider on the Supported Models page.

Referencing Models

Use models.<config_name>.<model_key> to reference a model:

    agent "assistant" {
      model = models.anthropic.claude_sonnet_4
    }

    agent "local_researcher" {
      model = models.local.gemma4_26b
    }

    mission "pipeline" {
      commander {
        model = models.anthropic.claude_sonnet_4
      }
    }

Multiple Configs Per Provider

You can have multiple model configs for the same provider with different API keys:

    model "anthropic_prod" {
      provider = "anthropic"
      api_key  = vars.anthropic_prod_key
    }

    model "anthropic_dev" {
      provider = "anthropic"
      api_key  = vars.anthropic_dev_key
    }
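Each config is then addressed by its own name in the models.<config_name>.<model_key> reference. The sketch below assumes the built-in key claude_sonnet_4 (used elsewhere on this page) is available under both configs; the agent names are hypothetical, and any other agent attributes are omitted.

```hcl
# Hypothetical agents; each one is billed against a different API key.
agent "assistant" {
  model = models.anthropic_prod.claude_sonnet_4  # uses vars.anthropic_prod_key
}

agent "experimental" {
  model = models.anthropic_dev.claude_sonnet_4   # uses vars.anthropic_dev_key
}
```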

Custom Aliases for Cloud Providers

Cloud providers can also use aliases to add custom model name mappings or override the built-in ones:

    model "anthropic" {
      provider = "anthropic"
      api_key  = vars.anthropic_api_key
      aliases = {
        sonnet = "claude-sonnet-4-20250514"
      }
    }

    agent "assistant" {
      model = models.anthropic.sonnet  # custom alias
    }

Pricing Overrides

Squadron includes built-in pricing for all supported models to estimate costs per turn. Override with custom pricing using pricing blocks:

    model "anthropic" {
      provider = "anthropic"
      api_key  = vars.anthropic_api_key

      pricing "claude_sonnet_4_6" {
        input       = 2.50  # per 1M tokens
        output      = 12.00
        cache_read  = 0.25
        cache_write = 3.00
      }
    }

| Attribute | Type | Description |
| --- | --- | --- |
| input | number | Cost per 1M input tokens (required) |
| output | number | Cost per 1M output tokens (required) |
| cache_read | number | Cost per 1M cached input tokens (optional, default: 0) |
| cache_write | number | Cost per 1M cache write tokens (optional, default: 0) |

The pricing block label must match a model key. Costs are shown in the command center’s Costs tab.
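To make the per-1M-token units concrete, here is a worked estimate for a single turn using the override values from the example above (input 2.50, output 12.00, cache_read 0.25). The token counts (10,000 input, 2,000 output, 40,000 cached input) are hypothetical, and the simple linear formula is an assumption about how the estimate is formed, not a statement of Squadron's exact accounting.

```latex
\begin{align*}
\text{cost} &= \frac{10{,}000}{10^{6}} \times 2.50
             + \frac{2{,}000}{10^{6}} \times 12.00
             + \frac{40{,}000}{10^{6}} \times 0.25 \\
            &= 0.025 + 0.024 + 0.010 \\
            &= 0.059 \text{ dollars}
\end{align*}
```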

Local Models and Cost Tracking

Local models (Ollama provider) have no built-in pricing since they run on your own hardware. Squadron still tracks token usage for every turn, so you can monitor how many tokens your local models consume even though the dollar cost is $0.

Native Reasoning

Squadron supports native reasoning (“extended thinking” on Anthropic, reasoning summaries on OpenAI Responses, thinking_config on Gemini). Agents and commanders enable it via the reasoning attribute ("low", "medium", or "high"); see Agents → Reasoning.

Capability is read from Squadron’s built-in model registry: each model that supports native reasoning is flagged at registration. Claude 4.x, OpenAI o3/o4/gpt-5, and Gemini 2.5+/3.x are flagged today. Setting reasoning on an agent or commander whose model isn’t flagged is a no-op and logs a warning at startup; the agent runs as if the attribute weren’t set. Models that come in through Ollama via user aliases aren’t in the registry — reasoning on those is also a no-op.
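The two cases described above can be sketched as follows. The agent names are hypothetical and other agent attributes are omitted; the model references reuse the configs shown earlier on this page.

```hcl
# Flagged model (Claude 4.x): the reasoning attribute takes effect.
agent "analyst" {
  model     = models.anthropic.claude_sonnet_4
  reasoning = "medium"  # one of "low", "medium", "high"
}

# Ollama alias: not in the built-in registry, so this setting is a no-op
# and the agent runs as if the attribute weren't set.
agent "local_analyst" {
  model     = models.local.gemma4_26b
  reasoning = "high"
}
```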
