Spec-Driven Development (SDD) + BDD + BMAD
Atomic Claim
Overview
In PUMA, SDD ensures every agent component is defined by an executable specification before any code is written, making the artefact auditable, testable, and reproducible — directly addressing the reproducibility gap identified in the SLR.
💡 SDD — Spec-Driven Development
SDD (also called Spec-First; closely related to Specification by Example) is an engineering paradigm in which:
- Specification precedes implementation — write the spec, then generate/write code
- Specs are executable — they double as tests (BDD scenarios, unit tests)
- AI assists spec-to-code translation — human validates, AI generates scaffolding
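The "spec first, spec as test" idea in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PUMA code: `SPEC` is a hypothetical Python dict mirroring only the `behaviour` and `acceptance_criteria` fields of the OpenSpec format (a real spec would be a YAML file, parsed for example with PyYAML's `yaml.safe_load`), `behaviour_pass_rate` is an invented metric, and `triage_stub` is a hypothetical keyword rule standing in for the agent.

```python
"""Minimal sketch: a spec written first and then executed as a test suite.

Hypothetical throughout: SPEC mirrors a subset of the OpenSpec fields
(behaviour, acceptance_criteria) as a Python dict, and triage_stub stands in
for a real agent component written after the spec.
"""
from typing import Callable, Dict, List

SPEC: Dict = {
    "spec_id": "SP-Triage-Agent-v1",
    "behaviour": [
        # given = precondition, when = input, then = set of allowed outputs
        {"given": "a Jira issue", "when": "System crash in production, all users affected",
         "then": {"Critical"}},
        {"given": "a Jira issue", "when": "Button colour is slightly off",
         "then": {"Low", "Medium"}},
    ],
    # Hypothetical metric: fraction of behaviour cases that must pass.
    "acceptance_criteria": [{"metric": "behaviour_pass_rate", "threshold": 1.0}],
}


def run_spec(spec: Dict, component: Callable[[str], str]) -> Dict:
    """Run each behaviour case through `component` and check it against the spec."""
    passed: List[bool] = [component(case["when"]) in case["then"] for case in spec["behaviour"]]
    pass_rate = sum(passed) / len(passed)
    threshold = spec["acceptance_criteria"][0]["threshold"]
    return {"pass_rate": pass_rate, "accepted": pass_rate >= threshold}


def triage_stub(title: str) -> str:
    """Hypothetical keyword rule, written only after the spec existed."""
    lowered = title.lower()
    return "Critical" if "crash" in lowered or "all users" in lowered else "Low"


print(run_spec(SPEC, triage_stub))  # {'pass_rate': 1.0, 'accepted': True}
```

Because the component is passed in as a plain callable, the same spec can later be run unchanged against the real LLM-backed agent; only the implementation changes, never the definition of "correct".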
OpenSpec Format (used in PUMA)
```yaml
# OpenSpec template
spec_id: "SP-[Component]-v[N]"
component: "[Agent name]"
version: "1.0"
status: draft | review | approved | deprecated
purpose: |
  [What this component does and why it exists]
inputs:
  - name: [input_field]
    type: [str | int | list | dict]
    description: [what it contains]
    example: [concrete example]
outputs:
  - name: [output_field]
    type: [str | int | dict]
    description: [what it contains]
    example: [concrete example]
behaviour:
  - given: [precondition]
    when: [trigger/input]
    then: [expected output/behaviour]
constraints:
  - [Technical constraint 1, e.g., "Must run on CPU with 16GB RAM"]
  - [Constraint 2]
acceptance_criteria:
  - metric: F1-macro
    threshold: ">= 0.55"
    dataset: jira-sr
  - metric: latency
    threshold: "< 60s per query"
```

💡 BDD — Behaviour-Driven Development
BDD translates specs into human-readable test scenarios written in Gherkin's Given/When/Then syntax:
```gherkin
Feature: Issue Triage Agent
  As a project manager
  I want the agent to classify issue priority automatically
  So that I can focus on high-value decisions

  Scenario: Classify a critical production bug
    Given a Jira issue with title "System crash in production — all users affected"
    And the model is llama3.2-8b with strategy few-shot-3
    When the triage agent processes the issue
    Then the predicted priority should be "Critical"
    And the latency should be less than 60 seconds
    And the response should include a reasoning trace

  Scenario: Handle ambiguous priority
    Given a Jira issue with title "Button colour is slightly off"
    When the triage agent processes the issue
    Then the predicted priority should be "Low" or "Medium"
    And confidence should be flagged as "low"
```

💡 BMAD — Brainstorm · Map · Architect · Develop
BMAD is a workflow for AI-assisted system design:
| Phase | Activity | AI Role | Human Role |
|---|---|---|---|
| Brainstorm | Generate component ideas | Propose options | Select + filter |
| Map | Create architecture diagram | Draft relationships | Validate logic |
| Architect | Write OpenSpec for each component | Generate spec template | Review + approve |
| Develop | Generate code from spec | Scaffold implementation | Review + test |
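The table implies a strict ordering in which each phase closes with a human action (select, validate, approve, test). A minimal sketch of that ordering rule as a gated state machine follows; `Phase`, `BmadWorkflow`, and `approve` are hypothetical names for illustration, not part of any BMAD tooling.

```python
"""Hypothetical sketch: BMAD phases as an ordered, human-gated workflow.

A phase can only be approved once every earlier phase has been approved,
so Develop cannot start before Architect is signed off.
"""
from enum import IntEnum
from typing import Set


class Phase(IntEnum):
    BRAINSTORM = 1
    MAP = 2
    ARCHITECT = 3
    DEVELOP = 4


class BmadWorkflow:
    def __init__(self) -> None:
        self.approved: Set[Phase] = set()

    def can_start(self, phase: Phase) -> bool:
        """A phase may start only if all earlier phases carry human approval."""
        return all(p in self.approved for p in Phase if p < phase)

    def approve(self, phase: Phase) -> None:
        """Record the human sign-off for `phase`; refuse if earlier phases are pending."""
        if not self.can_start(phase):
            raise RuntimeError(f"Cannot approve {phase.name}: earlier phases pending")
        self.approved.add(phase)
```

For example, after approving `BRAINSTORM` and `MAP`, `can_start(Phase.DEVELOP)` is still `False`; only once `ARCHITECT` is approved does it become `True`. This encodes the "do not jump to D before A" discipline as a check rather than a habit.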
BMAD applied to PUMA Triage Agent
```text
B: Brainstorm — What components does the triage agent need?
   AI:    "Possible components: prompt builder, Ollama client,
           output parser, metric calculator, carbon tracker, logger"
   Human: selects prompt builder + Ollama client + output parser
          + metric calculator + CodeCarbon
M: Map — How do they connect?
   [Architecture diagram → SP-Architecture-v1]
A: Architect — Write spec for each component
   [→ SP-Triage-Agent-v1, SP-Dataset-Preparation, etc.]
D: Develop — Generate from specs
   [Claude Code / Cursor AI scaffold, human review]
```
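As a rough picture of what the Develop step might scaffold from the selected components, the sketch below wires a prompt builder, an output parser, and a macro-F1 metric calculator together. It is a hypothetical illustration, not PUMA's code: all function names and the four-label priority set are assumptions, the Ollama client is replaced by an injected callable (`fake_llm`, a stub) so the sketch runs offline, and CodeCarbon tracking and logging are omitted.

```python
"""Hypothetical sketch of the triage components selected in the Brainstorm step.

Names and the label set are illustrative. The model call is injected as a
plain callable so a stub can replace the Ollama client; CodeCarbon tracking
and logging are omitted.
"""
import re
from typing import Callable, List, Optional, Tuple

LABELS = ["Critical", "High", "Medium", "Low"]  # assumed priority scheme


def build_prompt(title: str, examples: List[Tuple[str, str]]) -> str:
    """Prompt builder: few-shot prompt from (title, priority) example pairs."""
    shots = "\n".join(f"Issue: {t}\nPriority: {p}" for t, p in examples)
    return (
        f"Classify the Jira issue priority as one of: {', '.join(LABELS)}.\n"
        f"{shots}\nIssue: {title}\nPriority:"
    )


def parse_priority(raw: str) -> Optional[str]:
    """Output parser: first known label in the reply (case-insensitive), else None."""
    match = re.search(r"\b(" + "|".join(LABELS) + r")\b", raw, re.IGNORECASE)
    return match.group(1).capitalize() if match else None


def macro_f1(y_true: List[str], y_pred: List[Optional[str]]) -> float:
    """Metric calculator: unweighted mean of per-label F1 over labels seen in the data."""
    labels = sorted(set(y_true) | {p for p in y_pred if p is not None})
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def triage(title: str, llm: Callable[[str], str],
           examples: List[Tuple[str, str]]) -> Optional[str]:
    """Wire the components: build prompt, call the model, parse the reply."""
    return parse_priority(llm(build_prompt(title, examples)))


def fake_llm(prompt: str) -> str:
    """Stub standing in for the Ollama client; inspects only the final issue."""
    issue = prompt.rsplit("Issue:", 1)[1].lower()
    return "Priority: Critical" if "crash" in issue else "Priority: low"
```

A parse failure returns `None`, which `macro_f1` counts as a miss for the true label rather than silently dropping the item. The resulting score is what would be compared against the spec's `F1-macro >= 0.55` acceptance threshold; in practice `sklearn.metrics.f1_score` with `average="macro"` is the usual library route for the metric.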
🔗 Connected Ideas
- Implements: PN-DSR-SLR-Methods (Design step)
- Produces: SP-Architecture
- Validated by: PN-DSR-SLR-Methods (SLR artefact evaluation)
- AI-assisted via: PN-RCOIF-Framework
- Agent generation: PN-KeyConcepts-Agents-Reproducibility-RedTeam (Agent Prompt Engineering)
- Constitution: SP-PUMA-Constitution
- Project context: PR-PUMA-Ch3-Methods
⚠️ Caveats
- BDD scenarios must be written BEFORE the code — writing them after defeats the purpose
- AI-generated specs need careful human review: the AI optimises for plausibility, not correctness
- BMAD requires resisting the urge to jump to D (Develop) before completing A (Architect)