Development Guide
Environment Setup
Requirements
- Python 3.10+ (developed on 3.11)
- `openai` >= 1.50
- `pydantic` >= 1.10, < 2 (v1 API)
- `pyyaml` >= 6.0
- `fastapi` >= 0.99, < 0.100 (API control plane)
- `uvicorn` >= 0.20 (API server)
- `pytest` and `pytest-asyncio` for tests
Installation
# Clone
git clone git@gitlab.benac.dev:toys/singing-bird.git
cd singing-bird
# Install dependencies (globally or in a venv)
pip install openai pydantic pyyaml "fastapi>=0.99,<0.100" "uvicorn[standard]" pytest pytest-asyncio
# Set your API key
export OPENAI_API_KEY=sk-...
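If you want to confirm the install before running anything, a quick sanity check can help. This is a convenience sketch, not part of the repo:

```python
# Check that the required third-party packages are importable and that the
# API key is set (a convenience sketch, not part of the project).
import importlib.util
import os

def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

REQUIRED = ("openai", "pydantic", "yaml", "fastapi", "uvicorn")

if __name__ == "__main__":
    gaps = missing_modules(REQUIRED)
    if gaps:
        print("missing:", ", ".join(gaps))
    elif not os.environ.get("OPENAI_API_KEY"):
        print("dependencies OK, but OPENAI_API_KEY is not set")
    else:
        print("environment OK")
```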
Project Layout
singing-bird/
src/
singing_bird/ # all source code
models/ # data models (world, synth, director, payloads, ruleset)
engine/ # kernel (director, perception, resolver, processes, invariants)
llm/ # OpenAI integration (client, schemas, prompts)
synth/ # synth agent (cognitive cycle, theory of mind)
content/ # scenario loader
persistence/ # snapshots, audit, migration
api/ # HTTP/JSON control plane
app.py # FastAPI application
session_manager.py
auth.py # bearer token + scopes
refs.py # name → ref resolution
diff.py # change computation
streaming.py # SSE event stream
services/ # observation + intervention logic
cli.py # CLI entry point
scenarios/ # YAML content packs
tests/ # test suite
docs/ # documentation
The src/ layout requires PYTHONPATH=src for all commands.
Running the API Server
# Start the control plane
PYTHONPATH=src uvicorn singing_bird.api.app:app --host 0.0.0.0 --port 8420
# With auth enabled
SINGING_BIRD_API_TOKEN=my-secret-token PYTHONPATH=src uvicorn singing_bird.api.app:app --port 8420
# OpenAPI docs at http://localhost:8420/docs
See the Observer Guide for full API usage including session management, observation, intervention, and LLM agent integration.
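For scripting against the control plane, a tiny client helper can attach the bearer token automatically. This sketch assumes the server above is running on `localhost:8420`; the only endpoint assumed here is FastAPI's default `/openapi.json` — see the Observer Guide for the real session and observation routes:

```python
# Minimal helper for calling the control plane. Reads the bearer token from
# SINGING_BIRD_API_TOKEN (same variable used to start the server with auth).
import json
import os
import urllib.request

BASE = "http://localhost:8420"

def make_request(path: str) -> urllib.request.Request:
    """Build a request, attaching the bearer token if auth is enabled."""
    req = urllib.request.Request(BASE + path)
    token = os.environ.get("SINGING_BIRD_API_TOKEN")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req

def list_endpoints():
    """Fetch the OpenAPI spec and return the declared paths."""
    with urllib.request.urlopen(make_request("/openapi.json")) as resp:
        return sorted(json.load(resp).get("paths", {}))
```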
Running the Simulation (CLI)
# Run a scenario
PYTHONPATH=src python -m singing_bird.cli run --scenario scenarios/lighthouse.yaml --cycles 5
# Verbose logging
PYTHONPATH=src python -m singing_bird.cli run --scenario scenarios/workshop_blackout.yaml --cycles 3 -v
Output includes per-cycle summaries showing synth moods, speech, actions, and a token usage summary at the end.
Output Structure
Each run creates a directory under data/runs/<run_id>/:
data/runs/abc12345/
snapshots/
snapshot_000000.json # initial state
snapshot_000001.json # after cycle 0
...
audit/
events.jsonl # cycle events, resolutions, communications
llm_calls.jsonl # full LLM prompts + responses (for replay)
Running Tests
# All unit tests (no API key needed, fast)
PYTHONPATH=src pytest tests/ -m "not llm" -v
# Full suite including LLM integration (needs API key, ~10 min)
PYTHONPATH=src pytest tests/ --run-llm -v
# Single test file
PYTHONPATH=src pytest tests/test_kernel.py -v
# Scenario matrix only
PYTHONPATH=src pytest tests/test_scenario_matrix.py -v -m "not llm"
Test Files
| File | Tests | LLM? | What it proves |
|---|---|---|---|
| test_models.py | 9 | No | Serialization, components, provenance, theory of mind |
| test_statechart.py | 6 | No | HSM: events, timers, guards, history |
| test_perception.py | 8 | No | Channel filtering, speech delivery, action surface |
| test_content_loader.py | 8 | No | YAML loading, referential integrity, ruleset validation |
| test_invariants.py | 5 | No | Invariants, rollback isolation |
| test_kernel.py | 12 | No | Component validation, grounding, anonymization, replay |
| test_hardening.py | 7 | 1 | Snapshot roundtrip, forced rollback, containers, tools |
| test_preplay.py | 7 | No | Migration, conflict, cascades |
| test_scenario_matrix.py | 12 | 6 | All 6 packs load + stable IDs + one LLM cycle each |
| test_engine.py | 4 | 4 | LLM integration: single cycle, blind synth, intents |
| test_api.py | 21 | No | API control plane: sessions, observation, intervention, stimuli, idempotency, auth |
The --run-llm flag enables tests marked with @pytest.mark.llm. Without it, they are skipped.
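One common way to implement such a flag is via `conftest.py` hooks; the project's actual conftest may differ in detail, but the pattern looks like this:

```python
# conftest.py — sketch of a --run-llm flag that skips @pytest.mark.llm
# tests unless the flag is given (a common pytest pattern, not necessarily
# the project's exact implementation).
import pytest

def pytest_addoption(parser):
    parser.addoption("--run-llm", action="store_true", default=False,
                     help="run tests marked with @pytest.mark.llm")

def pytest_configure(config):
    config.addinivalue_line("markers", "llm: test calls the OpenAI API")

def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-llm"):
        return  # flag given: run everything
    skip_llm = pytest.mark.skip(reason="needs --run-llm")
    for item in items:
        if "llm" in item.keywords:
            item.add_marker(skip_llm)
```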
Adding a New Scenario Pack
- Read the Scenario Authoring Contract
- Create `scenarios/your_scenario.yaml` - use an existing pack as a template
- Required fields: `scenario_id`, `schema_version`, `ruleset_ref`, `random_seed`
- Define topology (locations + adjacency), entities (with registered components), and synths (with embodiment profiles)
- Validate: `PYTHONPATH=src python -c "from singing_bird.content.loader import load_scenario; load_scenario('scenarios/your_scenario.yaml')"`
- The scenario matrix test will pick it up automatically
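To make the required shape concrete, here is a hypothetical skeleton. Every field value below is invented for illustration; the authoring contract and the existing packs are the authority on the exact structure and key names:

```yaml
# Invented example — check an existing pack for the real shape.
scenario_id: your_scenario
schema_version: "1.0"          # use whatever version the contract specifies
ruleset_ref: physical_social_v1
random_seed: 42
topology:
  locations:
    - {id: yard, name: Yard, adjacent: [shed]}
    - {id: shed, name: Shed, adjacent: [yard]}
entities:
  - id: ladder
    location: yard
    components:
      climbable: {height: 3.0}
synths:
  - id: ada
    location: yard
    embodiment_profile: default
```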
Rules
- Use only registered component types (see authoring contract)
- All authored IDs must be unique within their category
- All entity locations must reference valid location IDs
- All synth entities must reference valid entity IDs
- Latent facts must have `allowed_values` (no open-ended invention)
- Do not depend on mechanics marked "partially implemented" or "not implemented"
Adding a New Component Type
- Add the schema to `PHYSICAL_SOCIAL_V1` in `src/singing_bird/models/ruleset.py`:
"climbable": ComponentSchema(
component_type="climbable",
description="Can be climbed",
required_properties=["height"],
optional_properties=["difficulty", "description"],
defaults={"difficulty": 0.5},
),
- Add affordance generation in `compose_action_surface()` in `src/singing_bird/engine/perception.py`:
elif ct == "climbable":
actions.append(AvailableAction(
action_id=f"climb_{ent_ref}",
action_type=ActionType.manipulate,
description=f"Climb {ent.name}",
target_entity_ref=ent_ref,
target_entity_name=ent.name,
affordance="climb",
))
- Add resolution in `_resolve_manipulate()` in `src/singing_bird/engine/resolver.py`:
elif aid.startswith("climb_"):
return _resolve_climb(intent, synth, target, world), events
- Write the resolver function and a test.
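As a shape reference, a resolver's core logic might look like the sketch below. The real signatures, models, and event types live in the engine; the dataclass here is an illustrative stand-in, not the project's actual API:

```python
# Illustrative sketch of climb-resolution logic. ClimbOutcome and the
# parameter list are invented stand-ins for the engine's real types.
from dataclasses import dataclass

@dataclass
class ClimbOutcome:
    success: bool
    narration: str

def resolve_climb(height: float, difficulty: float, agility: float) -> ClimbOutcome:
    # Succeed when the synth's agility beats the composed difficulty;
    # taller targets add up to 0.4 extra difficulty.
    effective = difficulty + min(height / 10.0, 0.4)
    if agility >= effective:
        return ClimbOutcome(True, "reaches the top")
    return ClimbOutcome(False, "slips back down")
```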
Note: the current architecture hardcodes affordance/resolution dispatch in the engine. A future improvement (ruleset-driven handler registries) would make this fully declarative.
Adding a Perception Policy
- Subclass `PerceptionPolicy` in `src/singing_bird/engine/perception_policy.py`:
class FogPolicy(PerceptionPolicy):
def filter(self, observations, synth, world):
if world.environment.weather.condition != "foggy":
return observations
return [obs for obs in observations if obs.channel.value != "visual" or obs.salience > 0.7]
- Pass it to `compose_sensory_bundle()` via the `perception_policy` parameter.
Snapshot Format and Migration
Snapshots are JSON files containing the full serialized state:
{
"snapshot_ref": "snapshot_000003",
"created_at": "2026-04-16T18:13:42",
"schema_version": "0.3.0",
"world": { ... },
"director": { ... },
"synths": { "uuid": { ... }, ... }
}
Adding a Migration
When you change the data model, register a migration in src/singing_bird/persistence/migration.py:
@register_migration("0.3.0", "0.4.0")
def _migrate_030_to_040(data):
# Transform the snapshot dict
for eid, entity in data["world"]["entities"].items():
entity["new_field"] = "default_value"
return data
Update CURRENT_SCHEMA_VERSION in the same file. The snapshot loader auto-migrates when it encounters older versions.
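One way such a registry typically works (a sketch — the real implementation lives in `src/singing_bird/persistence/migration.py` and may differ in detail): the decorator populates a `(from, to)` table, and the loader chains hops until it reaches the current version:

```python
# Sketch of a version-chaining migration registry.
_MIGRATIONS = {}  # maps source version -> (target version, migration fn)

def register_migration(src, dst):
    def deco(fn):
        _MIGRATIONS[src] = (dst, fn)
        return fn
    return deco

def migrate(data, current_version):
    """Apply registered migrations until data reaches current_version."""
    version = data["schema_version"]
    while version != current_version:
        if version not in _MIGRATIONS:
            raise ValueError(f"no migration path from {version}")
        version, fn = _MIGRATIONS[version]
        data = fn(data)
        data["schema_version"] = version
    return data
```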
Debugging
Audit Log
Every cycle writes structured events to audit/events.jsonl:
# Follow events as they happen
tail -f data/runs/<run_id>/audit/events.jsonl
# Pretty-print the log (one JSON document per line)
python -m json.tool --json-lines --no-ensure-ascii data/runs/<run_id>/audit/events.jsonl
# Find all speech
grep '"communication"' data/runs/<run_id>/audit/events.jsonl
# Find all failed intents
grep '"failure"' data/runs/<run_id>/audit/events.jsonl
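For anything beyond grep, a few lines of Python slice the log offline. This is a convenience sketch; the `event_type` field name is an assumption — adjust it to whatever your events actually contain:

```python
# Filter a JSONL event log by event type (field name assumed, not
# guaranteed by the project).
import json

def events_of_type(jsonl_lines, event_type):
    out = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        evt = json.loads(line)
        if evt.get("event_type") == event_type:
            out.append(evt)
    return out

# Usage:
# with open("data/runs/<run_id>/audit/events.jsonl") as f:
#     speech = events_of_type(f, "communication")
```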
LLM Call Log
Every LLM call is logged with full prompt and response:
# Count calls per run
wc -l data/runs/<run_id>/audit/llm_calls.jsonl
# Inspect a specific call
head -1 data/runs/<run_id>/audit/llm_calls.jsonl | python -m json.tool
Snapshot Inspection
# Read a snapshot
python -c "
import json
with open('data/runs/<run_id>/snapshots/snapshot_000003.json') as f:
d = json.load(f)
print(f'Revision: {d[\"world\"][\"revision\"]}')
print(f'Time: {d[\"world\"][\"simulation_time\"]}')
for sid, s in d['synths'].items():
print(f' {s[\"identity\"][\"name\"]}: mood={s[\"affect\"][\"mood\"]}')
"
Cost Management
The CLI prints token usage at the end of each run:
--- Token Usage ---
LLM calls: 79
Prompt tokens: 219,809
Completion tokens: 43,786
Total tokens: 263,595
Estimated cost: $0.9874
gpt-4o: 79 calls, 263,595 tokens
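You can reproduce the estimate by hand. The per-million-token rates below are assumptions that happen to reproduce the $0.9874 figure above ($2.50/M prompt, $10.00/M completion for gpt-4o); check current pricing before relying on them:

```python
# Estimate run cost from token counts. Rates are assumed, not read from
# the project; update them to match current OpenAI pricing.
def estimate_cost(prompt_tokens, completion_tokens,
                  prompt_rate=2.50, completion_rate=10.00):
    """Cost in dollars, with rates given per million tokens."""
    return (prompt_tokens * prompt_rate
            + completion_tokens * completion_rate) / 1_000_000

print(f"${estimate_cost(219_809, 43_786):.4f}")  # → $0.9874
```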
Reducing Costs
- Use fewer cycles: `--cycles 2` instead of `--cycles 5`
- Use `gpt-4o-mini` in the synth cognition config (less nuanced but much cheaper):

  ```yaml
  cognition:
    model: gpt-4o-mini
    temperature: 0.9
  ```

- Scenarios with fewer synths cost proportionally less per cycle
- Microcycles multiply LLM calls: scenarios with heavy speech generate more microcycles
Typical Costs (gpt-4o, 3 synths)
| Cycles | LLM Calls | Tokens | Estimated Cost |
|---|---|---|---|
| 1 | 3-9 | ~20K-60K | $0.05-0.20 |
| 3 | 9-30 | ~60K-150K | $0.20-0.50 |
| 5 | 15-80 | ~100K-270K | $0.35-1.00 |
Costs vary with microcycle count, which depends on how much the synths talk.