Datasets

Datasets are collections of items that tasks can iterate over.

Defining Datasets

    mission "process_cities" {
      dataset "city_list" {
        description = "Cities to process"

        schema {
          field "name" {
            type     = "string"
            required = true
          }
          field "state" {
            type = "string"
          }
        }
      }

      # Tasks can iterate over this dataset
    }

Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| description | string | Documentation for the dataset |
| schema | block | Optional schema for validating items |
| items | list | Optional inline list of items |
| bind_to | expression | Optional input binding (e.g., inputs.cities) |

Schema Definition

Define expected fields:

    schema {
      field "id" {
        type     = "integer"
        required = true
      }
      field "name" {
        type     = "string"
        required = true
      }
      field "metadata" {
        type = "object"
      }
    }

Shorthand Schema Syntax

Instead of field blocks you can use a schema = { ... } attribute with schema helper functions:

    dataset "city_list" {
      description = "Cities to process"

      schema = {
        id       = integer("City ID", true)
        name     = string("City name", true)
        metadata = map(string, "Additional metadata")  # free-form key-value
      }
    }

Use object({...}, "desc") when items have a known nested structure:

    dataset "order_list" {
      schema = {
        id      = integer("Order ID", true)
        status  = string("Order status", true)
        address = object({
          city    = string("City", true)
          country = string("Country code", true)
        }, "Shipping address")
      }
    }

See Functions for the complete reference on all helper functions and type references. Both the block form and shorthand are fully equivalent.

Field Types

  • string
  • number
  • integer
  • boolean
  • list (aliased as array) — element type via list(type)
  • map — key-value pairs with typed values via map(type)
  • object — structured data with named properties via object({...})

Populating Datasets

Datasets can be populated in three ways:

1. Bind to Mission Input

    mission "process" {
      input "items" {
        type = "list"
      }

      dataset "item_list" {
        bind_to = inputs.items
      }
    }

2. Inline Items

    dataset "regions" {
      items = [
        { name = "us-east-1" },
        { name = "us-west-2" },
        { name = "eu-west-1" }
      ]
    }

3. Dynamic Population

Agents can populate datasets at runtime using the set_dataset tool:

    task "load_data" {
      objective = "Read cities from data.json and populate the city_list dataset"
    }

    task "process_cities" {
      depends_on = [tasks.load_data]

      iterator {
        dataset = datasets.city_list
      }
    }

Dataset Tools

When running in a mission, agents automatically have access to:

  • set_dataset - Populate a dataset with items
  • dataset_sample - Get sample items from a dataset
  • dataset_count - Get the number of items in a dataset

set_dataset

    {
      "name": "city_list",
      "items": [
        {"name": "Chicago", "state": "IL"},
        {"name": "Detroit", "state": "MI"}
      ]
    }

dataset_sample

{ "name": "city_list", "count": 3 }

dataset_count

{ "name": "city_list" }

Schema Validation

When a schema is defined, all items are validated:

    dataset "users" {
      schema {
        field "email" {
          type     = "string"
          required = true
        }
      }
    }

Setting an item that lacks the required email field fails validation.
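The required-field check described above can be modeled in a few lines of Python. This is only an illustrative sketch, not Squadron's validator: the `validate_item` function, the dictionary form of the schema, and the mapping from field type names to Python types are all assumptions made for the example.

```python
from typing import Any

# Assumed mapping from the documented field type names to Python checks.
# bool is a subclass of int in Python, so it is excluded explicitly.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "list": lambda v: isinstance(v, list),
    "array": lambda v: isinstance(v, list),
    "map": lambda v: isinstance(v, dict),
    "object": lambda v: isinstance(v, dict),
}


def validate_item(item: dict[str, Any], schema: dict[str, dict[str, Any]]) -> list[str]:
    """Return a list of validation errors; an empty list means the item is valid."""
    errors = []
    for name, spec in schema.items():
        if name not in item:
            # Only fields marked required must be present.
            if spec.get("required", False):
                errors.append(f"missing required field '{name}'")
            continue
        if not TYPE_CHECKS[spec["type"]](item[name]):
            errors.append(f"field '{name}' is not of type {spec['type']}")
    return errors


# Hypothetical dictionary form of the "users" schema shown above.
users_schema = {"email": {"type": "string", "required": True}}

print(validate_item({"email": "a@example.com"}, users_schema))  # []
print(validate_item({"name": "Ada"}, users_schema))  # ["missing required field 'email'"]
```

Whether Squadron rejects the whole `set_dataset` call or only the offending item is not stated here, so the sketch simply reports errors per item.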

Large Result Handling

When tools return large results (>8KB), Squadron automatically protects context by:

  1. Storing the full result outside LLM context
  2. Returning a sample/summary to the LLM
  3. Providing tools for the LLM to access more data as needed

This prevents context overflow while preserving full access to the data.

How It Works

    Tool returns large JSON array (500 items)

    Intercepted & stored

    LLM sees:
    <OBSERVATION>
    [{...}, {...}, ...] (sample)
    </OBSERVATION>
    <OBSERVATION_METADATA>
    type: array
    id: _result_http_get_1
    partial: true
    total_items: 500
    shown_items: 5
    </OBSERVATION_METADATA>
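To make the interception step concrete, here is a rough Python model of it. It is a sketch under stated assumptions rather than Squadron's implementation: the `intercept` function, the in-memory `stored_results` dictionary, the ID numbering scheme, and the handling of non-array results are hypothetical. The 8KB threshold and the sample size of 5 are taken from the text and the example above.

```python
import json
from typing import Any

THRESHOLD_BYTES = 8 * 1024  # the ">8KB" limit from the text
SAMPLE_SIZE = 5             # matches shown_items in the example above

stored_results: dict[str, Any] = {}  # full results, kept outside LLM context
_counters: dict[str, int] = {}       # per-tool counter used to build IDs (assumed)


def intercept(tool_name: str, result: Any) -> dict[str, Any]:
    """Return what the LLM would see for a tool result."""
    serialized = json.dumps(result)
    if len(serialized.encode("utf-8")) <= THRESHOLD_BYTES:
        # Small results pass through untouched.
        return {"observation": result, "metadata": {"partial": False}}

    # Large result: store it in full and hand back only a sample.
    _counters[tool_name] = _counters.get(tool_name, 0) + 1
    result_id = f"_result_{tool_name}_{_counters[tool_name]}"
    stored_results[result_id] = result

    if isinstance(result, list):
        return {
            "observation": result[:SAMPLE_SIZE],
            "metadata": {
                "type": "array",
                "id": result_id,
                "partial": True,
                "total_items": len(result),
                "shown_items": min(SAMPLE_SIZE, len(result)),
            },
        }
    # Assumed fallback for non-array results: show a short text prefix.
    return {
        "observation": serialized[:200],
        "metadata": {"type": type(result).__name__, "id": result_id, "partial": True},
    }


users = [{"id": i, "name": f"user{i}", "email": f"user{i}@example.com"} for i in range(500)]
seen = intercept("http_get", users)
print(seen["metadata"])
```

The full 500-item list remains in `stored_results`, while the caller receives five items plus the metadata needed to request more.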

Result Tools

When a large result is intercepted, the LLM can use these tools:

| Tool | Purpose |
| --- | --- |
| result_info | Get type and size of stored result |
| result_items | Get items from array by offset/count |
| result_get | Navigate object with dot path (e.g., users.0.name) |
| result_keys | Get keys of an object |
| result_chunk | Get text by offset/length |
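The dot-path and offset/count access patterns in the table can be illustrated with a small Python sketch. The functions below borrow the tool names for readability but are hypothetical stand-ins, not the tools themselves. The default values for `offset` and `count` are assumptions, as is the rule that a path segment is treated as an index whenever the current value is an array.

```python
from typing import Any


def result_get(data: Any, path: str) -> Any:
    """Walk a nested structure using a dot path such as 'users.0.name'."""
    current = data
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]  # segments index into arrays
        else:
            current = current[part]       # segments are object keys
    return current


def result_items(data: list, offset: int = 0, count: int = 10) -> list:
    """Return up to `count` items starting at `offset`."""
    return data[offset:offset + count]


stored = {"users": [{"name": "Ada"}, {"name": "Grace"}, {"name": "Linus"}]}

print(result_get(stored, "users.0.name"))                # Ada
print(result_items(stored["users"], offset=1, count=2))  # [{'name': 'Grace'}, {'name': 'Linus'}]
```

Paging with `result_items` lets an agent walk a large array in fixed-size slices instead of loading all of it into context.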

Promoting to Datasets

Use result_to_dataset to convert a large array result into a dataset for iteration:

{ "id": "_result_http_get_1", "dataset_name": "users" }

After promotion, the data is available via standard dataset tools and can be iterated using an iterator block.

Example Flow

  1. Agent calls http_get → returns 500 users
  2. Interceptor stores full array, LLM sees sample of 5
  3. LLM examines sample, decides this is useful
  4. LLM calls result_to_dataset("_result_http_get_1", "users")
  5. Subsequent task iterates: iterator { dataset = datasets.users }
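Steps 4 and 5 of this flow can be sketched in Python as follows. This is a hypothetical model, not Squadron's code: the two in-memory dictionaries, the returned item count, and the error raised for non-array results are assumptions. The argument names mirror the `id` and `dataset_name` keys of the `result_to_dataset` call shown earlier.

```python
from typing import Any

# Assumed state after step 2: the full array is stored under its result ID.
stored_results: dict[str, Any] = {
    "_result_http_get_1": [{"id": i, "name": f"user{i}"} for i in range(500)]
}
datasets: dict[str, list] = {}


def result_to_dataset(result_id: str, dataset_name: str) -> int:
    """Copy a stored array result into a named dataset; return the item count."""
    result = stored_results[result_id]
    if not isinstance(result, list):
        raise TypeError(f"result {result_id!r} is not an array and cannot be promoted")
    datasets[dataset_name] = list(result)
    return len(result)


# Step 4: promote the stored result to a dataset named "users".
count = result_to_dataset("_result_http_get_1", "users")

# Step 5: a later task iterates the dataset one item at a time.
for user in datasets["users"][:3]:
    print(user["name"])
```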
