Development Guide
Environment Setup
Requirements
- Python 3.10+ (developed on 3.11)
- `openai` >= 1.50
- `pydantic` >= 1.10, < 2 (v1 API)
- `pyyaml` >= 6.0
- `fastapi` >= 0.99, < 0.100 (API control plane)
- `uvicorn` >= 0.20 (API server)
- `pytest` and `pytest-asyncio` for tests
Installation
# Clone
git clone git@gitlab.benac.dev:toys/singing-bird.git
cd singing-bird
# Install dependencies (globally or in a venv)
pip install openai pydantic pyyaml "fastapi>=0.99,<0.100" "uvicorn[standard]" pytest pytest-asyncio
# Set your API key
export OPENAI_API_KEY=sk-...
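If you want to confirm the install before running anything, a quick sanity check can help. This is a convenience sketch, not part of the repo:

```python
# Check that the required third-party packages are importable and that the
# API key is set (a convenience sketch, not part of the project).
import importlib.util
import os

def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

REQUIRED = ("openai", "pydantic", "yaml", "fastapi", "uvicorn")

if __name__ == "__main__":
    gaps = missing_modules(REQUIRED)
    if gaps:
        print("missing:", ", ".join(gaps))
    elif not os.environ.get("OPENAI_API_KEY"):
        print("dependencies OK, but OPENAI_API_KEY is not set")
    else:
        print("environment OK")
```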
Project Layout
singing-bird/
src/
singing_bird/ # all source code
models/ # data models (world, synth, director, payloads, ruleset)
engine/ # kernel (director, perception, resolver, processes, invariants)
llm/ # OpenAI integration (client, schemas, prompts)
synth/ # synth agent (cognitive cycle, theory of mind)
content/ # scenario loader
persistence/ # snapshots, audit, migration
api/ # HTTP/JSON control plane
app.py # FastAPI application
session_manager.py
auth.py # bearer token + scopes
refs.py # name → ref resolution
diff.py # change computation
streaming.py # SSE event stream
services/ # observation + intervention logic
cli.py # CLI entry point
scenarios/ # YAML content packs
tests/ # test suite
docs/ # documentation
The src/ layout requires PYTHONPATH=src for all commands.
Running the API Server
# Start the control plane
PYTHONPATH=src uvicorn singing_bird.api.app:app --host 0.0.0.0 --port 8420
# With auth enabled
SINGING_BIRD_API_TOKEN=my-secret-token PYTHONPATH=src uvicorn singing_bird.api.app:app --port 8420
# OpenAPI docs at http://localhost:8420/docs
See the Observer Guide for full API usage including session management, observation, intervention, and LLM agent integration.
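For scripting against the control plane, a tiny client helper can attach the bearer token automatically. This sketch assumes the server above is running on `localhost:8420`; the only endpoint assumed here is FastAPI's default `/openapi.json` — see the Observer Guide for the real session and observation routes:

```python
# Minimal helper for calling the control plane. Reads the bearer token from
# SINGING_BIRD_API_TOKEN (same variable used to start the server with auth).
import json
import os
import urllib.request

BASE = "http://localhost:8420"

def make_request(path: str) -> urllib.request.Request:
    """Build a request, attaching the bearer token if auth is enabled."""
    req = urllib.request.Request(BASE + path)
    token = os.environ.get("SINGING_BIRD_API_TOKEN")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req

def list_endpoints():
    """Fetch the OpenAPI spec and return the declared paths."""
    with urllib.request.urlopen(make_request("/openapi.json")) as resp:
        return sorted(json.load(resp).get("paths", {}))
```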
Running the Simulation (CLI)
# Run a scenario
PYTHONPATH=src python -m singing_bird.cli run --scenario scenarios/lighthouse.yaml --cycles 5
# Verbose logging
PYTHONPATH=src python -m singing_bird.cli run --scenario scenarios/workshop_blackout.yaml --cycles 3 -v
Output includes per-cycle summaries showing synth moods, speech, actions, and a token usage summary at the end.
Output Structure
Each run creates a directory under data/runs/<run_id>/:
data/runs/abc12345/
snapshots/
snapshot_000000.json # initial state
snapshot_000001.json # after cycle 0
...
audit/
events.jsonl # cycle events, resolutions, communications
llm_calls.jsonl # full LLM prompts + responses (for replay)
Running Tests
# All unit tests (no API key needed, fast)
PYTHONPATH=src pytest tests/ -m "not llm" -v
# Full suite including LLM integration (needs API key, ~10 min)
PYTHONPATH=src pytest tests/ --run-llm -v
# Single test file
PYTHONPATH=src pytest tests/test_kernel.py -v
# Scenario matrix only
PYTHONPATH=src pytest tests/test_scenario_matrix.py -v -m "not llm"
Test Files
| File | Tests | LLM? | What it proves |
|---|---|---|---|
| test_models.py | 9 | No | Serialization, components, provenance, theory of mind |
| test_statechart.py | 6 | No | HSM: events, timers, guards, history |
| test_perception.py | 8 | No | Channel filtering, speech delivery, action surface |
| test_content_loader.py | 8 | No | YAML loading, referential integrity, ruleset validation |
| test_invariants.py | 5 | No | Invariants, rollback isolation |
| test_kernel.py | 12 | No | Component validation, grounding, anonymization, replay |
| test_hardening.py | 7 | 1 | Snapshot roundtrip, forced rollback, containers, tools |
| test_preplay.py | 7 | No | Migration, conflict, cascades |
| test_scenario_matrix.py | 12 | 6 | All 6 packs load + stable IDs + one LLM cycle each |
| test_engine.py | 4 | 4 | LLM integration: single cycle, blind synth, intents |
| test_api.py | 21 | No | API control plane: sessions, observation, intervention, stimuli, idempotency, auth |
The --run-llm flag enables tests marked with @pytest.mark.llm. Without it, they are skipped.
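One common way to implement such a flag is via `conftest.py` hooks; the project's actual conftest may differ in detail, but the pattern looks like this:

```python
# conftest.py — sketch of a --run-llm flag that skips @pytest.mark.llm
# tests unless the flag is given (a common pytest pattern, not necessarily
# the project's exact implementation).
import pytest

def pytest_addoption(parser):
    parser.addoption("--run-llm", action="store_true", default=False,
                     help="run tests marked with @pytest.mark.llm")

def pytest_configure(config):
    config.addinivalue_line("markers", "llm: test calls the OpenAI API")

def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-llm"):
        return  # flag given: run everything
    skip_llm = pytest.mark.skip(reason="needs --run-llm")
    for item in items:
        if "llm" in item.keywords:
            item.add_marker(skip_llm)
```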
Adding a New Scenario Pack
- Read the Scenario Authoring Contract
- Create `scenarios/your_scenario.yaml` - use an existing pack as a template
- Required fields: `scenario_id`, `schema_version`, `ruleset_ref`, `random_seed`
- Define topology (locations + adjacency), entities (with registered components), and synths (with embodiment profiles)
- Validate: `PYTHONPATH=src python -c "from singing_bird.content.loader import load_scenario; load_scenario('scenarios/your_scenario.yaml')"`
- The scenario matrix test will pick it up automatically
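To make the required shape concrete, here is a hypothetical skeleton. Every field value below is invented for illustration; the authoring contract and the existing packs are the authority on the exact structure and key names:

```yaml
# Invented example — check an existing pack for the real shape.
scenario_id: your_scenario
schema_version: "1.0"          # use whatever version the contract specifies
ruleset_ref: physical_social_v1
random_seed: 42
topology:
  locations:
    - {id: yard, name: Yard, adjacent: [shed]}
    - {id: shed, name: Shed, adjacent: [yard]}
entities:
  - id: ladder
    location: yard
    components:
      climbable: {height: 3.0}
synths:
  - id: ada
    location: yard
    embodiment_profile: default
```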
Rules
- Use only registered component types (see authoring contract)
- All authored IDs must be unique within their category
- All entity locations must reference valid location IDs
- All synth entities must reference valid entity IDs
- Latent facts must have `allowed_values` (no open-ended invention)
- Do not depend on mechanics marked "partially implemented" or "not implemented"
Adding a New Component Type
- Add the schema to `PHYSICAL_SOCIAL_V1` in `src/singing_bird/models/ruleset.py`:
"climbable": ComponentSchema(
component_type="climbable",
description="Can be climbed",
required_properties=["height"],
optional_properties=["difficulty", "description"],
defaults={"difficulty": 0.5},
),
- Add affordance generation in `compose_action_surface()` in `src/singing_bird/engine/perception.py`:
elif ct == "climbable":
actions.append(AvailableAction(
action_id=f"climb_{ent_ref}",
action_type=ActionType.manipulate,
description=f"Climb {ent.name}",
target_entity_ref=ent_ref,
target_entity_name=ent.name,
affordance="climb",
))
- Add resolution in `_resolve_manipulate()` in `src/singing_bird/engine/resolver.py`:
elif aid.startswith("climb_"):
return _resolve_climb(intent, synth, target, world), events
- Write the resolver function and a test.
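As a shape reference, a resolver's core logic might look like the sketch below. The real signatures, models, and event types live in the engine; the dataclass here is an illustrative stand-in, not the project's actual API:

```python
# Illustrative sketch of climb-resolution logic. ClimbOutcome and the
# parameter list are invented stand-ins for the engine's real types.
from dataclasses import dataclass

@dataclass
class ClimbOutcome:
    success: bool
    narration: str

def resolve_climb(height: float, difficulty: float, agility: float) -> ClimbOutcome:
    # Succeed when the synth's agility beats the composed difficulty;
    # taller targets add up to 0.4 extra difficulty.
    effective = difficulty + min(height / 10.0, 0.4)
    if agility >= effective:
        return ClimbOutcome(True, "reaches the top")
    return ClimbOutcome(False, "slips back down")
```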
Note: the current architecture hardcodes affordance/resolution dispatch in the engine. A future improvement (ruleset-driven handler registries) would make this fully declarative.
Adding a Perception Policy
- Subclass `PerceptionPolicy` in `src/singing_bird/engine/perception_policy.py`:
class FogPolicy(PerceptionPolicy):
def filter(self, observations, synth, world):
if world.environment.weather.condition != "foggy":
return observations
return [obs for obs in observations if obs.channel.value != "visual" or obs.salience > 0.7]
- Pass it to `compose_sensory_bundle()` via the `perception_policy` parameter.
Snapshot Format and Migration
Snapshots are JSON files containing the full serialized state:
{
"snapshot_ref": "snapshot_000003",
"created_at": "2026-04-16T18:13:42",
"schema_version": "0.3.0",
"world": { ... },
"director": { ... },
"synths": { "uuid": { ... }, ... }
}
Adding a Migration
When you change the data model, register a migration in src/singing_bird/persistence/migration.py:
@register_migration("0.3.0", "0.4.0")
def _migrate_030_to_040(data):
# Transform the snapshot dict
for eid, entity in data["world"]["entities"].items():
entity["new_field"] = "default_value"
return data
Update CURRENT_SCHEMA_VERSION in the same file. The snapshot loader auto-migrates when it encounters older versions.
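One way such a registry typically works (a sketch — the real implementation lives in `src/singing_bird/persistence/migration.py` and may differ in detail): the decorator populates a `(from, to)` table, and the loader chains hops until it reaches the current version:

```python
# Sketch of a version-chaining migration registry.
_MIGRATIONS = {}  # maps source version -> (target version, migration fn)

def register_migration(src, dst):
    def deco(fn):
        _MIGRATIONS[src] = (dst, fn)
        return fn
    return deco

def migrate(data, current_version):
    """Apply registered migrations until data reaches current_version."""
    version = data["schema_version"]
    while version != current_version:
        if version not in _MIGRATIONS:
            raise ValueError(f"no migration path from {version}")
        version, fn = _MIGRATIONS[version]
        data = fn(data)
        data["schema_version"] = version
    return data
```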
Debugging
Audit Log
Every cycle writes structured events to audit/events.jsonl:
# Follow events as they happen
tail -f data/runs/<run_id>/audit/events.jsonl
# Pretty-print the log (one JSON document per line)
python -m json.tool --json-lines --no-ensure-ascii data/runs/<run_id>/audit/events.jsonl
# Find all speech
grep '"communication"' data/runs/<run_id>/audit/events.jsonl
# Find all failed intents
grep '"failure"' data/runs/<run_id>/audit/events.jsonl
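For anything beyond grep, a few lines of Python slice the log offline. This is a convenience sketch; the `event_type` field name is an assumption — adjust it to whatever your events actually contain:

```python
# Filter a JSONL event log by event type (field name assumed, not
# guaranteed by the project).
import json

def events_of_type(jsonl_lines, event_type):
    out = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        evt = json.loads(line)
        if evt.get("event_type") == event_type:
            out.append(evt)
    return out

# Usage:
# with open("data/runs/<run_id>/audit/events.jsonl") as f:
#     speech = events_of_type(f, "communication")
```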
LLM Call Log
Every LLM call is logged with full prompt and response:
# Count calls per run
wc -l data/runs/<run_id>/audit/llm_calls.jsonl
# Inspect a specific call
head -1 data/runs/<run_id>/audit/llm_calls.jsonl | python -m json.tool
Snapshot Inspection
# Read a snapshot
python -c "
import json
with open('data/runs/<run_id>/snapshots/snapshot_000003.json') as f:
d = json.load(f)
print(f'Revision: {d[\"world\"][\"revision\"]}')
print(f'Time: {d[\"world\"][\"simulation_time\"]}')
for sid, s in d['synths'].items():
print(f' {s[\"identity\"][\"name\"]}: mood={s[\"affect\"][\"mood\"]}')
"
Cost Management
The CLI prints token usage at the end of each run:
--- Token Usage ---
LLM calls: 79
Prompt tokens: 219,809
Completion tokens: 43,786
Total tokens: 263,595
Estimated cost: $0.9874
gpt-4o: 79 calls, 263,595 tokens
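You can reproduce the estimate by hand. The per-million-token rates below are assumptions that happen to reproduce the $0.9874 figure above ($2.50/M prompt, $10.00/M completion for gpt-4o); check current pricing before relying on them:

```python
# Estimate run cost from token counts. Rates are assumed, not read from
# the project; update them to match current OpenAI pricing.
def estimate_cost(prompt_tokens, completion_tokens,
                  prompt_rate=2.50, completion_rate=10.00):
    """Cost in dollars, with rates given per million tokens."""
    return (prompt_tokens * prompt_rate
            + completion_tokens * completion_rate) / 1_000_000

print(f"${estimate_cost(219_809, 43_786):.4f}")  # → $0.9874
```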
Reducing Costs
- Use fewer cycles: `--cycles 2` instead of `--cycles 5`
- Use `gpt-4o-mini` in the synth cognition config (less nuanced but much cheaper):

  ```yaml
  cognition:
    model: gpt-4o-mini
    temperature: 0.9
  ```

- Scenarios with fewer synths cost proportionally less per cycle
- Microcycles multiply LLM calls: scenarios with heavy speech generate more microcycles
Typical Costs (gpt-4o, 3 synths)
| Cycles | LLM Calls | Tokens | Estimated Cost |
|---|---|---|---|
| 1 | 3-9 | ~20K-60K | $0.05-0.20 |
| 3 | 9-30 | ~60K-150K | $0.20-0.50 |
| 5 | 15-80 | ~100K-270K | $0.35-1.00 |
Costs vary with microcycle count, which depends on how much the synths talk.