# Models

Models define connections to LLM providers. Cloud providers (Anthropic, OpenAI, Gemini) have built-in model lists — just add your API key and all supported models are automatically available. Local providers (Ollama) use `aliases` to map HCL-friendly keys to model names.

## Cloud Providers

```hcl
model "anthropic" {
  provider = "anthropic"
  api_key  = vars.anthropic_api_key
}

model "openai" {
  provider = "openai"
  api_key  = vars.openai_api_key
}

model "gemini" {
  provider = "gemini"
  api_key  = vars.gemini_api_key
}
```

Once a provider block is defined, every supported model for that provider can be referenced by its built-in key; you don't need to list models individually.

### Custom Endpoints

Every provider accepts an optional `base_url` to redirect API calls to a compatible proxy or gateway (LiteLLM, OpenRouter, a corporate gateway, etc.). Leave it unset to use each SDK's default endpoint.

```hcl
model "anthropic" {
  provider = "anthropic"
  api_key  = vars.anthropic_api_key
  base_url = "https://litellm.internal.example.com"
}
```

## Local Models (Ollama)

The `ollama` provider connects to any OpenAI-compatible local inference server. Use `aliases` to define which models are available and map HCL-safe keys to the actual model names.

> **Requires Ollama 0.13.3 or newer** (or any other server that implements `/v1/responses`). Squadron speaks to OpenAI-compatible servers via the Responses API, not the older Chat Completions API. vLLM (recent versions) and LiteLLM also support this.

```hcl
model "local" {
  provider = "ollama"
  base_url = "http://localhost:11434/v1"
  aliases = {
    gemma4     = "gemma4"
    gemma4_26b = "gemma4:26b"
    nemotron   = "nemotron-cascade-2:30b"
  }
}
```

The alias key (left side) becomes the HCL reference name. The value (right side) is the exact model name sent to the server. This lets you reference models whose names contain characters, such as colons, that can't be used in an HCL reference.

```hcl
agent "researcher" {
  model = models.local.gemma4_26b   # sends "gemma4:26b" to Ollama
}

agent "writer" {
  model = models.local.nemotron     # sends "nemotron-cascade-2:30b" to Ollama
}
```

The `base_url` should point to the OpenAI-compatible API endpoint. Common values:

| Server | Base URL |
|--------|----------|
| Ollama | `http://localhost:11434/v1` |
| vLLM | `http://localhost:8000/v1` |
| llama.cpp | `http://localhost:8080/v1` |
| LM Studio | `http://localhost:1234/v1` |
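
As a sketch of how the table above maps onto a config, the block below points the `ollama` provider at a vLLM server instead of Ollama. The block name `vllm`, the alias key `llama3_8b`, and the model name are placeholders chosen for illustration; substitute whatever model your server actually serves, and check that the server implements `/v1/responses`.

```hcl
# Hypothetical example: the "ollama" provider pointed at a local vLLM server.
model "vllm" {
  provider = "ollama"
  base_url = "http://localhost:8000/v1"   # vLLM default from the table above
  aliases = {
    # Left: HCL-safe key. Right: placeholder model name sent to the server.
    llama3_8b = "meta-llama/Llama-3.1-8B-Instruct"
  }
}

agent "summarizer" {
  model = models.vllm.llama3_8b
}
```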

## Attributes

| Attribute | Type | Required | Description |
|-----------|------|----------|-------------|
| `provider` | string | yes | Provider name: `anthropic`, `openai`, `gemini`, or `ollama` |
| `api_key` | string | cloud providers | API key (required for `anthropic`, `openai`, `gemini`) |
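| `base_url` | string | `ollama` only | Overrides the provider's API endpoint. Required for `ollama`; optional for cloud providers, where it routes calls through a compatible proxy |
| `aliases` | map | `ollama` only | Map of HCL key → API model name. Required for `ollama`; optional for cloud providers, where it adds to or overrides the built-in keys |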
| `prompt_caching` | bool | no | Enable prompt caching (default: `true`) |
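
For illustration, a cloud config that sets every optional attribute from the table might look like the sketch below. The gateway URL and the alias key `sonnet` are examples rather than required values, and `prompt_caching` is set to `false` only to show how to turn off the default.

```hcl
model "anthropic" {
  provider       = "anthropic"
  api_key        = vars.anthropic_api_key
  base_url       = "https://litellm.internal.example.com"  # optional proxy (example URL)
  prompt_caching = false                                    # defaults to true when omitted
  aliases = {
    sonnet = "claude-sonnet-4-20250514"                     # optional custom key
  }
}
```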

## Supported Models

See the full list of built-in models with pricing for every provider on the **[Supported Models](/config/supported-models)** page.

## Referencing Models

Use `models.<config_name>.<model_key>` to reference a model:

```hcl
agent "assistant" {
  model = models.anthropic.claude_sonnet_4
}

agent "local_researcher" {
  model = models.local.gemma4_26b
}

mission "pipeline" {
  commander {
    model = models.anthropic.claude_sonnet_4
  }
}
```

## Multiple Configs Per Provider

You can have multiple model configs for the same provider with different API keys:

```hcl
model "anthropic_prod" {
  provider = "anthropic"
  api_key  = vars.anthropic_prod_key
}

model "anthropic_dev" {
  provider = "anthropic"
  api_key  = vars.anthropic_dev_key
}
```
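
Each config is then addressed by its own name in the `models.<config_name>.<model_key>` reference. A minimal sketch, assuming the `claude_sonnet_4` key used elsewhere on this page and two hypothetical agent names:

```hcl
agent "prod_assistant" {
  model = models.anthropic_prod.claude_sonnet_4   # uses vars.anthropic_prod_key
}

agent "dev_assistant" {
  model = models.anthropic_dev.claude_sonnet_4    # uses vars.anthropic_dev_key
}
```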

## Custom Aliases for Cloud Providers

Cloud providers can also use `aliases` to add custom model name mappings or override the built-in ones:

```hcl
model "anthropic" {
  provider = "anthropic"
  api_key  = vars.anthropic_api_key
  aliases = {
    sonnet = "claude-sonnet-4-20250514"
  }
}

agent "assistant" {
  model = models.anthropic.sonnet  # custom alias
}
```

## Pricing Overrides

Squadron includes built-in pricing for all supported models and uses it to estimate the cost of each turn. To override the built-in rates for a model, add a `pricing` block labeled with that model's key:

```hcl
model "anthropic" {
  provider = "anthropic"
  api_key  = vars.anthropic_api_key

  pricing "claude_sonnet_4_6" {
    input       = 2.50   # per 1M tokens
    output      = 12.00
    cache_read  = 0.25
    cache_write = 3.00
  }
}
```

| Attribute | Type | Description |
|-----------|------|-------------|
| `input` | number | Cost per 1M input tokens (required) |
| `output` | number | Cost per 1M output tokens (required) |
| `cache_read` | number | Cost per 1M cached input tokens (optional, default: 0) |
| `cache_write` | number | Cost per 1M cache write tokens (optional, default: 0) |

The pricing block label must match a model key. Costs are shown in the command center's Costs tab.
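
Because `cache_read` and `cache_write` are optional, a minimal override only needs the two required rates. The sketch below reuses the `claude_sonnet_4_6` key from the example above; the dollar amounts are illustrative, not real prices.

```hcl
model "anthropic" {
  provider = "anthropic"
  api_key  = vars.anthropic_api_key

  # Label must match a model key.
  pricing "claude_sonnet_4_6" {
    input  = 2.50    # USD per 1M input tokens (illustrative)
    output = 12.00   # USD per 1M output tokens (illustrative)
    # cache_read and cache_write are omitted, so both default to 0.
  }
}
```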

### Local Models and Cost Tracking

Local models (Ollama provider) have no built-in pricing since they run on your own hardware. Squadron still tracks **token usage** for every turn, so you can monitor how many tokens your local models consume even though the dollar cost is $0.

## Native Reasoning

Squadron supports native reasoning ("extended thinking" on Anthropic, reasoning summaries on OpenAI Responses, `thinking_config` on Gemini). Agents and commanders enable it via the `reasoning` attribute (`"low"`, `"medium"`, or `"high"`); see [Agents → Reasoning](/config/agents#reasoning).

Capability is read from Squadron's built-in model registry: each model that supports native reasoning is flagged at registration. Claude 4.x, OpenAI o3/o4/gpt-5, and Gemini 2.5+/3.x are flagged today. Setting `reasoning` on an agent or commander whose model isn't flagged is a no-op and logs a warning at startup; the agent runs as if the attribute weren't set. Models that come in through Ollama via user `aliases` aren't in the registry — `reasoning` on those is also a no-op.
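
As a rough sketch of the behavior described above, the config below sets `reasoning` on an agent and a commander that use a flagged model, and on an agent that uses an Ollama alias. The agent names are hypothetical, and the `local` config and `claude_sonnet_4` key are taken from the earlier examples on this page.

```hcl
agent "analyst" {
  model     = models.anthropic.claude_sonnet_4
  reasoning = "medium"   # Claude 4.x is flagged, so native reasoning is enabled
}

mission "pipeline" {
  commander {
    model     = models.anthropic.claude_sonnet_4
    reasoning = "high"
  }
}

agent "local_researcher" {
  model     = models.local.gemma4_26b
  reasoning = "low"      # no-op: Ollama aliases are not in the model registry
}
```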
