Models
Models define connections to LLM providers. Cloud providers (Anthropic, OpenAI, Gemini) have built-in model lists — just add your API key and all supported models are automatically available. Local providers (Ollama) use aliases to map HCL-friendly keys to model names.
Cloud Providers
model "anthropic" {
  provider = "anthropic"
  api_key  = vars.anthropic_api_key
}
model "openai" {
  provider = "openai"
  api_key  = vars.openai_api_key
}
model "gemini" {
  provider = "gemini"
  api_key  = vars.gemini_api_key
}

All supported models for each provider are available automatically; there is no need to list them.
Custom Endpoints
Every provider accepts an optional base_url to redirect API calls to a compatible proxy or gateway (LiteLLM, OpenRouter, a corporate gateway, etc.). Leave it unset to use each SDK’s default endpoint.
model "anthropic" {
  provider = "anthropic"
  api_key  = vars.anthropic_api_key
  base_url = "https://litellm.internal.example.com"
}

Local Models (Ollama)
The ollama provider connects to any OpenAI-compatible local inference server. Use aliases to define which models are available and map HCL-safe keys to the actual model names.
Requires Ollama 0.13.3 or newer (or any other server that implements /v1/responses). Squadron speaks to OpenAI-compatible servers via the Responses API, not the older Chat Completions API. vLLM (recent versions) and LiteLLM also support this.
model "local" {
  provider = "ollama"
  base_url = "http://localhost:11434/v1"
  aliases = {
    gemma4     = "gemma4"
    gemma4_26b = "gemma4:26b"
    nemotron   = "nemotron-cascade-2:30b"
  }
}

The alias key (left side) becomes the HCL reference name. The value (right side) is the exact model name sent to the server. This handles models with colons or hyphens that aren’t valid in HCL identifiers.
agent "researcher" {
  model = models.local.gemma4_26b # sends "gemma4:26b" to Ollama
}
agent "writer" {
  model = models.local.nemotron # sends "nemotron-cascade-2:30b" to Ollama
}

The base_url should point to the OpenAI-compatible API endpoint. Common values:
| Server | Base URL |
|---|---|
| Ollama | http://localhost:11434/v1 |
| vLLM | http://localhost:8000/v1 |
| llama.cpp | http://localhost:8080/v1 |
| LM Studio | http://localhost:1234/v1 |
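As an illustration of the table above, here is a minimal sketch of a config that points the ollama provider at a vLLM server instead of Ollama. The config name `vllm`, the alias key `qwen_7b`, the agent name, and the model name are placeholders chosen for this example; substitute whatever your server actually serves. It also assumes the vLLM version in use implements /v1/responses, as noted earlier.

```hcl
# Sketch only: names below are placeholders, not built-in values.
model "vllm" {
  provider = "ollama"                   # the ollama provider handles any OpenAI-compatible server
  base_url = "http://localhost:8000/v1" # vLLM row from the table above
  aliases = {
    # Left: HCL-safe key. Right: exact model name sent to the server.
    qwen_7b = "Qwen/Qwen2.5-7B-Instruct"
  }
}

agent "summarizer" {
  model = models.vllm.qwen_7b
}
```

The alias is doing the same job as in the Ollama example: the served model name contains characters (slash, dot, hyphen) that would not work as a reference key.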
Attributes
| Attribute | Type | Required | Description |
|---|---|---|---|
| provider | string | yes | Provider name: anthropic, openai, gemini, or ollama |
| api_key | string | cloud providers | API key (required for anthropic, openai, gemini) |
| base_url | string | ollama | Override the provider’s API endpoint (required for ollama; optional for cloud providers to route through a compatible proxy) |
| aliases | map | ollama | Map of HCL key → API model name (optional for cloud providers) |
| prompt_caching | bool | no | Enable prompt caching (default: true) |
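The examples on this page leave prompt_caching at its default. As a small sketch, a config that turns it off might look like the following; the config name `anthropic_nocache` is hypothetical, and the attribute placement (top level of the model block) is assumed from the table above.

```hcl
# Sketch only: "anthropic_nocache" is a made-up config name.
model "anthropic_nocache" {
  provider       = "anthropic"
  api_key        = vars.anthropic_api_key
  prompt_caching = false # defaults to true when omitted
}
```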
Supported Models
See the full list of built-in models with pricing for every provider on the Supported Models page.
Referencing Models
Use models.<config_name>.<model_key> to reference a model:
agent "assistant" {
  model = models.anthropic.claude_sonnet_4
}
agent "local_researcher" {
  model = models.local.gemma4_26b
}
mission "pipeline" {
  commander {
    model = models.anthropic.claude_sonnet_4
  }
}

Multiple Configs Per Provider
You can have multiple model configs for the same provider with different API keys:
model "anthropic_prod" {
  provider = "anthropic"
  api_key  = vars.anthropic_prod_key
}
model "anthropic_dev" {
  provider = "anthropic"
  api_key  = vars.anthropic_dev_key
}

Custom Aliases for Cloud Providers
Cloud providers can also use aliases to add custom model name mappings or override the built-in ones:
model "anthropic" {
  provider = "anthropic"
  api_key  = vars.anthropic_api_key
  aliases = {
    sonnet = "claude-sonnet-4-20250514"
  }
}
agent "assistant" {
  model = models.anthropic.sonnet # custom alias
}

Pricing Overrides
Squadron includes built-in pricing for all supported models and uses it to estimate costs per turn. To override the built-in rates, add pricing blocks:
model "anthropic" {
  provider = "anthropic"
  api_key  = vars.anthropic_api_key
  pricing "claude_sonnet_4_6" {
    input       = 2.50 # per 1M tokens
    output      = 12.00
    cache_read  = 0.25
    cache_write = 3.00
  }
}

| Attribute | Type | Description |
|---|---|---|
| input | number | Cost per 1M input tokens (required) |
| output | number | Cost per 1M output tokens (required) |
| cache_read | number | Cost per 1M cached input tokens (optional, default: 0) |
| cache_write | number | Cost per 1M cache write tokens (optional, default: 0) |
The pricing block label must match a model key. Costs are shown in the command center’s Costs tab.
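To make the units concrete, here is a sketch of a pricing block that sets only the two required attributes, with a worked estimate in the comments. The arithmetic assumes cost scales linearly as tokens / 1,000,000 × rate, which the per-1M units suggest but this page does not spell out; the token counts are invented for illustration.

```hcl
model "anthropic" {
  provider = "anthropic"
  api_key  = vars.anthropic_api_key

  pricing "claude_sonnet_4_6" {
    input  = 2.50  # per 1M input tokens
    output = 12.00 # per 1M output tokens
    # cache_read and cache_write are omitted, so they default to 0
  }
}

# Hypothetical turn: 40,000 input tokens and 2,000 output tokens
#   input:  40,000 / 1,000,000 * 2.50  = 0.10
#   output:  2,000 / 1,000,000 * 12.00 = 0.024
#   estimated cost for the turn        = 0.124
```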
Local Models and Cost Tracking
Local models (Ollama provider) have no built-in pricing since they run on your own hardware. Squadron still tracks token usage for every turn, so you can monitor how many tokens your local models consume even though the dollar cost is $0.
Native Reasoning
Squadron supports native reasoning (“extended thinking” on Anthropic, reasoning summaries on OpenAI Responses, thinking_config on Gemini). Agents and commanders enable it via the reasoning attribute ("low", "medium", or "high"); see Agents → Reasoning.
Capability is read from Squadron’s built-in model registry: each model that supports native reasoning is flagged at registration. Claude 4.x, OpenAI o3/o4/gpt-5, and Gemini 2.5+/3.x are flagged today. Setting reasoning on an agent or commander whose model isn’t flagged is a no-op and logs a warning at startup; the agent runs as if the attribute weren’t set. Models that come in through Ollama via user aliases aren’t in the registry — reasoning on those is also a no-op.
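The cases described above can be sketched together as follows. The agent and mission names are illustrative, and placing reasoning inside the commander block is an assumption based on the statement that commanders accept the attribute; see Agents → Reasoning for the authoritative syntax.

```hcl
# Flagged model (Claude 4.x): reasoning takes effect.
agent "analyst" {
  model     = models.anthropic.claude_sonnet_4
  reasoning = "medium" # one of "low", "medium", "high"
}

# Commander using the same attribute (placement assumed).
mission "pipeline" {
  commander {
    model     = models.anthropic.claude_sonnet_4
    reasoning = "high"
  }
}

# Model reached through an Ollama alias: not in the registry,
# so this setting is a no-op.
agent "local_researcher" {
  model     = models.local.gemma4_26b
  reasoning = "high"
}
```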