
Datasets

Name: Squadron
Author: Squadron

Datasets are collections of items that tasks can iterate over.

Defining Datasets


mission "process_cities" {
  dataset "city_list" {
    description = "Cities to process"
 
    schema {
      field "name" {
        type     = "string"
        required = true
      }
      field "state" {
        type = "string"
      }
    }
  }
 
  # Tasks can iterate over this dataset
}

Attributes

Attribute	Type	Description
`description`	string	Documentation for the dataset
`schema`	block	Optional schema for validating items
`items`	list	Optional inline list of items
`bind_to`	expression	Optional input binding (e.g., `inputs.cities`)

Schema Definition

Define expected fields:


schema {
  field "id" {
    type     = "integer"
    required = true
  }
 
  field "name" {
    type     = "string"
    required = true
  }
 
  field "metadata" {
    type = "object"
  }
}

Shorthand Schema Syntax

Instead of field blocks, you can set a schema = { ... } attribute built from schema helper functions:


dataset "city_list" {
  description = "Cities to process"
  schema = {
    id       = integer("City ID", true)
    name     = string("City name", true)
    metadata = map(string, "Additional metadata")   # free-form key-value
  }
}

Use object({...}, "desc") when items have a known nested structure:


dataset "order_list" {
  schema = {
    id      = integer("Order ID", true)
    status  = string("Order status", true)
    address = object({
      city    = string("City", true)
      country = string("Country code", true)
    }, "Shipping address")
  }
}

The block form and the shorthand form are fully equivalent. See Functions for the complete reference on helper functions and type references.

Field Types

string
number
integer
boolean
list (aliased as array) — element type via list(type)
map — key-value pairs with typed values via map(type)
object — structured data with named properties via object({...})

Populating Datasets

Datasets can be populated in three ways:

1. Bind to Mission Input


mission "process" {
  input "items" {
    type = "list"
  }
 
  dataset "item_list" {
    bind_to = inputs.items
  }
}

2. Inline Items


dataset "regions" {
  items = [
    { name = "us-east-1" },
    { name = "us-west-2" },
    { name = "eu-west-1" }
  ]
}

3. Dynamic Population

Agents can populate datasets at runtime using the set_dataset tool:


task "load_data" {
  objective = "Read cities from data.json and populate the city_list dataset"
}
 
task "process_cities" {
  depends_on = [tasks.load_data]
  iterator {
    dataset = datasets.city_list
  }
}

Dataset Tools

When running in a mission, agents automatically have access to:

set_dataset - Populate a dataset with items
dataset_sample - Get sample items from a dataset
dataset_count - Get the number of items in a dataset

set_dataset


{
  "name": "city_list",
  "items": [
    {"name": "Chicago", "state": "IL"},
    {"name": "Detroit", "state": "MI"}
  ]
}

dataset_sample


{
  "name": "city_list",
  "count": 3
}

dataset_count


{
  "name": "city_list"
}
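
To make the three payloads above concrete, here is a minimal in-memory Python sketch of how the tools relate to each other. It is not Squadron's implementation: the function bodies are hypothetical, and two behaviours are assumptions the text does not confirm (that set_dataset replaces existing items, and that dataset_sample returns the first items rather than a random draw).

```python
from typing import Any

# Hypothetical in-memory stand-in for the mission's dataset storage.
datasets: dict[str, list[dict[str, Any]]] = {}


def set_dataset(name: str, items: list[dict[str, Any]]) -> None:
    # Assumption: a call replaces the dataset's contents rather than appending.
    datasets[name] = list(items)


def dataset_sample(name: str, count: int) -> list[dict[str, Any]]:
    # Assumption: the sample is the first `count` items, not a random draw.
    return datasets[name][:count]


def dataset_count(name: str) -> int:
    return len(datasets[name])


set_dataset("city_list", [
    {"name": "Chicago", "state": "IL"},
    {"name": "Detroit", "state": "MI"},
])
print(dataset_count("city_list"))      # 2
print(dataset_sample("city_list", 3))  # both items, since only two exist
```

The arguments mirror the JSON payloads: name and items for set_dataset, name and count for dataset_sample, and name alone for dataset_count.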

Schema Validation

When a schema is defined, every item is validated against it:


dataset "users" {
  schema {
    field "email" {
      type     = "string"
      required = true
    }
  }
}

Setting an item without an email field fails validation.
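
The check can be pictured with a short Python sketch. This is an illustration under stated assumptions, not Squadron's validator: the schema is modelled as a plain dict, the helper names are hypothetical, and fields not listed in the schema are allowed through, which the text does not specify.

```python
from typing import Any


def _is_type(value: Any, type_name: str) -> bool:
    """Hypothetical mapping from schema type names to Python checks."""
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name in ("list", "array"):
        return isinstance(value, list)
    if type_name in ("map", "object"):
        return isinstance(value, dict)
    return False  # unknown type names never match


def validate_item(schema: dict[str, dict[str, Any]], item: dict[str, Any]) -> list[str]:
    """Return a list of error messages; an empty list means the item is valid."""
    errors = []
    for name, spec in schema.items():
        if name not in item:
            if spec.get("required", False):
                errors.append(f"missing required field '{name}'")
            continue
        if not _is_type(item[name], spec["type"]):
            errors.append(f"field '{name}' is not of type {spec['type']}")
    return errors


users_schema = {"email": {"type": "string", "required": True}}
print(validate_item(users_schema, {"email": "a@example.com"}))  # []
print(validate_item(users_schema, {"name": "Ada"}))  # ["missing required field 'email'"]
```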

Large Result Handling

When a tool returns a large result (over 8 KB), Squadron automatically protects the LLM context by:

Storing the full result outside LLM context
Returning a sample/summary to the LLM
Providing tools for the LLM to access more data as needed

This prevents context overflow while preserving full access to the data.

How It Works


Tool returns large JSON array (500 items)
         │
         ▼
   Intercepted & stored
         │
         ▼
LLM sees:
  <OBSERVATION>
  [{...}, {...}, ...]  (sample)
  </OBSERVATION>
  <OBSERVATION_METADATA>
  type: array
  id: _result_http_get_1
  partial: true
  total_items: 500
  shown_items: 5
  </OBSERVATION_METADATA>
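
The interception step in the diagram can be sketched in Python as follows. This is a simplified model, not Squadron's code: measuring size as the length of the JSON encoding, the `_result_<tool>_<n>` ID pattern, and the fixed sample size of 5 are assumptions inferred from the example above, and the function and variable names are hypothetical.

```python
import json
from typing import Any

THRESHOLD_BYTES = 8 * 1024  # the 8 KB limit mentioned above
SAMPLE_SIZE = 5             # matches shown_items in the diagram


def intercept(tool_name: str, result: Any, store: dict[str, Any]) -> dict[str, Any]:
    """Store oversized results and return a sample plus metadata."""
    # Assumption: size is measured on the JSON encoding of the result.
    size = len(json.dumps(result).encode("utf-8"))
    if size <= THRESHOLD_BYTES:
        return {"observation": result, "metadata": None}

    # Assumption: IDs follow the _result_<tool>_<n> pattern seen in the diagram.
    result_id = f"_result_{tool_name}_{len(store) + 1}"
    store[result_id] = result

    if isinstance(result, list):
        return {
            "observation": result[:SAMPLE_SIZE],
            "metadata": {
                "type": "array",
                "id": result_id,
                "partial": True,
                "total_items": len(result),
                "shown_items": min(SAMPLE_SIZE, len(result)),
            },
        }
    # Non-array results: only the metadata is modelled in this sketch.
    return {
        "observation": None,
        "metadata": {"type": type(result).__name__, "id": result_id, "partial": True},
    }


store: dict[str, Any] = {}
users = [{"id": i, "name": f"user-{i}", "bio": "x" * 40} for i in range(500)]
seen = intercept("http_get", users, store)
print(seen["metadata"])
```

The full 500-item array stays in store, while the caller receives only the first five items and the metadata needed to fetch more.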

Result Tools

When a large result is intercepted, the LLM can use these tools:

Tool	Purpose
`result_info`	Get type and size of stored result
`result_items`	Get items from array by offset/count
`result_get`	Navigate object with dot path (e.g., `users.0.name`)
`result_keys`	Get keys of an object
`result_chunk`	Get text by offset/length
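
As a rough illustration of the two most common access patterns in the table, the Python sketch below shows dot-path navigation and offset/count slicing. The function names reuse the tool names for readability, but the bodies and signatures are hypothetical and only model the behaviour described above.

```python
from typing import Any


def result_get(data: Any, path: str) -> Any:
    """Follow a dot path such as 'users.0.name' through dicts and lists."""
    current = data
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]  # numeric segments index into arrays
        else:
            current = current[part]       # other segments are object keys
    return current


def result_items(data: list[Any], offset: int, count: int) -> list[Any]:
    """Return up to `count` items starting at `offset`."""
    return data[offset:offset + count]


stored = {"users": [{"name": "Ada"}, {"name": "Grace"}], "total": 2}
print(result_get(stored, "users.0.name"))   # Ada
print(result_items(stored["users"], 1, 5))  # [{'name': 'Grace'}]
```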

Promoting to Datasets

Use result_to_dataset to convert a large array result into a dataset for iteration:


{
  "id": "_result_http_get_1",
  "dataset_name": "users"
}

After promotion, the data is available via standard dataset tools and can be iterated using an iterator block.

Example Flow

Agent calls http_get → returns 500 users
Interceptor stores full array, LLM sees sample of 5
LLM examines sample, decides this is useful
LLM calls result_to_dataset("_result_http_get_1", "users")
Subsequent task iterates: iterator { dataset = datasets.users }
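
The promotion step in this flow can be modelled with a few lines of Python. This is a sketch under assumptions, not Squadron's implementation: the two dicts stand in for the result store and the mission's datasets, the parameter result_id corresponds to the "id" field of the payload, and rejecting non-array results is an assumption based on the tool being described for array results.

```python
from typing import Any


def result_to_dataset(results: dict[str, Any], datasets: dict[str, list[Any]],
                      result_id: str, dataset_name: str) -> int:
    """Copy a stored array result into a named dataset and return its size."""
    stored = results[result_id]
    if not isinstance(stored, list):
        # Assumption: only array results can be promoted.
        raise TypeError(f"{result_id} is not an array result")
    datasets[dataset_name] = list(stored)
    return len(stored)


results = {"_result_http_get_1": [{"id": i} for i in range(500)]}
datasets: dict[str, list[Any]] = {}

promoted = result_to_dataset(results, datasets, "_result_http_get_1", "users")
print(promoted)  # 500

# A later task would then visit each item, as an iterator block does.
for user in datasets["users"][:2]:
    print(user)
```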