The Evolution of Agentic Software Engineering: An Analytical Study of Spec-Driven Development (SDD) and Its Impact on ICT Project Governance
The software development industry is at a historic inflection point, transitioning from a phase of enthusiastic experimentation with generative artificial intelligence (often referred to as "vibe coding") toward a paradigm of technical and methodological rigor known as Spec-Driven Development (SDD).
This transition is not merely superficial; it represents the maturation of Agentic Software Engineering (SE 3.0), where the center of gravity shifts from the manual writing of lines of code to the orchestration of complex systems of autonomous agents operating under strict specifications.
In this new scenario, the specification ceases to be a passive post-development documentation artifact and becomes the primary artifact, the single source of truth, and the executable contract that guides the logic of the agents.
This study analyzes the depth of this change. It evaluates the dialectical tensions between upfront design and agility, assesses the effectiveness of leading frameworks and emerging tools, and integrates these findings into the empirical context of the PUMA Project (PUMA Understanding & Management with Agents).
The Decision Architecture Debate: Upfront Design vs. Agentic Agility
The adoption of Spec-Driven Development has revived a classic controversy in software engineering: the value of detailed upfront design versus iterative flexibility. The fundamental difference in 2026 is that this debate no longer involves only humans; it centers on how machines consume context and intent.
Marmelab's Critical Perspective: The Specter of the Waterfall Model
From Marmelab's point of view, SDD runs the risk of being a regression toward the bureaucratic practices of the Waterfall model, disguised under modern terminology. François Zaninotto's critique is based on the observation that tools such as Spec Kit or Kiro force developers to generate a massive amount of Markdown documentation before allowing code generation. This "Markdown madness" can result in developers spending 80% of their time reading and reviewing AI-generated prose instead of thinking critically about the architecture.
Marmelab identifies a phenomenon of "context blindness" where, despite detailed specifications, AI agents often overlook existing functions in the codebase because their search is limited to the text provided in the specification. Marmelab also argues that SDD introduces a "false sense of security", since in practice agents do not always follow the specification to the letter and may mark tasks as completed without having implemented the corresponding unit tests.
In response, Marmelab proposes "Natural Language Development", an evolution of the Agile paradigm in which simple, incremental instructions are used to converge organically toward the product.
Marc Brooker's (AWS) Defense: Elevating the Level of Abstraction
Marc Brooker, Distinguished Engineer at AWS, offers a robust counterargument, maintaining that SDD is not Waterfall because it does not seek to fix all requirements "up-front", but rather to elevate the design ("pulling designs up") to a higher level of abstraction. For Brooker, specifications are living, versioned, and explicit artifacts that act as a "map" for AI agents.
The key distinction is autonomy. While "vibe coding" is like giving step-by-step instructions (turn-by-turn directions), which forces the human to remain constantly in the feedback loop, SDD provides the agent with the ability to operate autonomously for longer periods by understanding the "big picture" and the final objectives. Brooker argues that this separation between requirements and implementation allows the human to focus on managing internal requirement conflicts and strategic decision-making, leaving the AI to derive the technical implementation more efficiently.
Implementation Levels According to Birgitta Böckeler
To disambiguate the term SDD, Birgitta Böckeler, writing on Martin Fowler's site, proposes a taxonomy of three implementation levels that is essential for assessing an engineering team's maturity:
| SDD Level | Main Characteristic | Human Role | Code State |
|---|---|---|---|
| Spec-first | The specification guides the initial task and is then discarded | Designs the initial prompt | Source of truth post-task |
| Spec-anchored | The specification is maintained throughout the functionality lifecycle | Edits the specification for any change | Synchronized with the spec |
| Spec-as-source | The specification is the only source file the human edits | Does not touch the code directly | Artifact generated by AI |
Böckeler warns that the "Spec-as-source" level is a form of Model-Driven Development (MDD) which, while eliminating the need for complex domain-specific languages (DSLs), introduces the challenge of language model non-determinism, which can hinder reproducibility.
Technical Evaluation of Frameworks: OpenSpec, BMAD, and GitHub Spec Kit
The SDD framework ecosystem in 2026 shows remarkable diversification, with tools optimized for different project scales and types of codebases.
OpenSpec: The Efficiency of the Brownfield Approach and Delta Specs
OpenSpec has positioned itself as the fastest-growing framework, with an 863% increase in its user base over the last six months. Its competitive advantage lies in its "brownfield-first" approach, designed specifically for existing codebases where full respecification is not feasible.
The technical innovation of OpenSpec is the use of "delta specifications". Instead of processing massive documents, the system isolates proposed changes in independent folders (openspec/changes/), using ADDED, MODIFIED, and REMOVED markers. This strategy drastically reduces token consumption and the cognitive load on the human reviewer, while keeping artifacts lightweight (~250 lines versus 800+ in competing frameworks). It is also broadly interoperable, supporting more than 20 coding assistants through universal slash commands and the AGENTS.md protocol.
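The delta idea can be illustrated with a minimal Python sketch. It does not reproduce OpenSpec's actual file format or tooling: the `Delta` record, the `apply_deltas` function, and the representation of a specification as a dictionary of named requirements are all assumptions made for illustration. Only the ADDED, MODIFIED, and REMOVED markers come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Delta:
    """One proposed change to a named requirement (hypothetical structure)."""
    marker: str                 # "ADDED", "MODIFIED", or "REMOVED"
    requirement: str            # requirement name, used as the lookup key
    text: Optional[str] = None  # new requirement text; unused for REMOVED


def apply_deltas(spec: dict[str, str], deltas: list[Delta]) -> dict[str, str]:
    """Return a new spec with the deltas applied; the input spec is not mutated."""
    merged = dict(spec)
    for d in deltas:
        if d.marker == "ADDED":
            if d.requirement in merged:
                raise ValueError(f"{d.requirement!r} already exists")
            if d.text is None:
                raise ValueError("ADDED requires text")
            merged[d.requirement] = d.text
        elif d.marker == "MODIFIED":
            if d.requirement not in merged:
                raise KeyError(f"{d.requirement!r} not found")
            if d.text is None:
                raise ValueError("MODIFIED requires text")
            merged[d.requirement] = d.text
        elif d.marker == "REMOVED":
            if d.requirement not in merged:
                raise KeyError(f"{d.requirement!r} not found")
            del merged[d.requirement]
        else:
            raise ValueError(f"unknown marker {d.marker!r}")
    return merged
```

Under these assumptions, a reviewer reads only the list of `Delta` records for a change rather than the whole specification, which is the source of the reduced review load the text describes.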
GitHub Spec Kit: The Constitution and Corporate Standard
Spec Kit, developed by GitHub, represents the de facto standard for projects seeking strict governance from the start. Its architecture is based on the Constitution (constitution.md), a document that defines non-negotiable principles such as test coverage standards, naming conventions, and architectural constraints that the AI must respect in every iteration.
The Spec Kit flow is rigorous and closed, based on a Plan-Act-Review cycle. Although it produces high-quality artifacts, such as OpenAPI specifications and structured user journeys, it has been criticized for its administrative heaviness. Teams working on small tasks have reported that a simple feature can generate up to 8 files and 1,300 lines of text, confirming Marmelab's concerns about excessive verbosity.
BMAD (Breakthrough Method of Agile AI-Driven Development): Simulation of Agile Teams
The BMAD method stands out as the most architecturally complex framework, designed to simulate a complete agile development team through more than 12 specialized agent roles (Analyst, Architect, QA, etc.). BMAD introduces the concept of "Agile Intelligence", where agents not only execute tasks but also participate in elicitation flows and mid-process course correction.
Its "Quick Flow" version attempts to solve the entry friction of the full version, allowing a faster transition from technical requirements to immediate coding, although with less planning depth. A critical differentiator of BMAD is its adversarial code review (/bmad-bmm-code-review), where a control agent actively looks for failures that a standard review would miss before authorizing the cycle closure.
Emerging Tools and Agentic Discipline: Superpowers, GSD, and Beads
To complement governance frameworks, specialized tools have emerged to inject discipline, memory, and context isolation into the agents' workflow.
Superpowers (Jesse Vincent): TDD and Psychological Persuasion
Superpowers, created by Jesse Vincent, is a framework that enforces a strict development methodology based on Test-Driven Development (TDD). The premise of Superpowers is radical: the agent is forbidden from writing production code unless a failing unit test that validates it already exists. If an agent attempts to generate implementation without tests, the framework automatically deletes the written code.
Additionally, Superpowers applies social psychology principles drawn from Robert Cialdini's research on persuasion (authority, commitment, social proof) within the system prompts. The agent is forced to declare explicitly which skill it is using and to commit to compliance with the rules before each response. With this approach, a significant reduction in hallucinations and an increase in the precision of generated code have been reported in projects such as the chardet library.
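The test-first rule can be sketched as a simple gate in Python. This is not how Superpowers is implemented: the `Outcome` record, the idea that each test declares which production files it covers, and the function names are all hypothetical. The sketch only shows the decision rule described above, in which production code is kept only when a failing test for it already exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Result of one test from the latest run (hypothetical record)."""
    test_name: str
    covers: frozenset[str]  # production file paths the test is meant to validate
    passed: bool


def has_failing_test(path: str, outcomes: list[Outcome]) -> bool:
    """True if at least one test covering `path` is currently failing."""
    return any(path in o.covers and not o.passed for o in outcomes)


def enforce_test_first(
    proposed: dict[str, str], outcomes: list[Outcome]
) -> tuple[dict[str, str], list[str]]:
    """Split proposed production writes into (kept, discarded paths).

    A write is kept only when a failing test already covers its path;
    everything else is discarded, mirroring the deletion rule in the text.
    """
    kept = {p: src for p, src in proposed.items() if has_failing_test(p, outcomes)}
    discarded = sorted(set(proposed) - set(kept))
    return kept, discarded
```

Note that a passing test does not unlock a write in this sketch: in the red-green cycle, code is written to make a failing test pass, so a file whose tests are all green has no pending justification for new implementation.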
GSD (Get Shit Done): Isolation Against Context Rot
GSD, developed by TÂCHES, addresses the technical problem of "context rot". It has been empirically demonstrated that the quality of language models drops drastically when the context window exceeds 50-70% of its capacity, causing the agent to forget initial instructions or introduce inconsistencies.
GSD's solution is total context isolation. The system fragments the project into atomic phases and tasks, providing each task with a fresh agent instance with a clean 200,000-token context window. All project information resides in persistent Markdown files on disk (PROJECT.md, ROADMAP.md, STATE.md), eliminating the need to maintain a long and error-prone conversation history.
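A rough Python sketch of this isolation pattern follows. It is not GSD's code: the function names are hypothetical, the four-characters-per-token estimate is a crude assumption standing in for a real tokenizer, and the 50% budget is simply the lower bound of the range quoted above. The file names and the 200,000-token window come from the text.

```python
from pathlib import Path

CONTEXT_WINDOW = 200_000  # tokens per fresh agent instance, as stated in the text
SAFE_FRACTION = 0.5       # lower bound of the 50-70% range quoted in the text
CHARS_PER_TOKEN = 4       # crude assumption; a real tokenizer would differ
STATE_FILES = ("PROJECT.md", "ROADMAP.md", "STATE.md")


def estimate_tokens(text: str) -> int:
    """Rough token estimate: character count divided by CHARS_PER_TOKEN, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def load_state(project_dir: Path) -> dict[str, str]:
    """Read whichever persistent state files exist in the project directory."""
    return {
        name: (project_dir / name).read_text(encoding="utf-8")
        for name in STATE_FILES
        if (project_dir / name).is_file()
    }


def build_task_context(state: dict[str, str], task: str) -> str:
    """Assemble a fresh prompt from persisted state plus one atomic task.

    No conversation history is carried over; the state files are the only memory.
    """
    sections = [f"## {name}\n{state[name]}" for name in STATE_FILES if name in state]
    sections.append(f"## TASK\n{task}")
    context = "\n\n".join(sections)
    budget = int(CONTEXT_WINDOW * SAFE_FRACTION)
    used = estimate_tokens(context)
    if used > budget:
        raise ValueError(f"context uses ~{used} tokens, over the {budget}-token budget")
    return context
```

With these numbers the budget is 100,000 tokens, so a task whose state files alone approach that size is a signal to split the phase further rather than to let the agent run in a degraded window.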
Beads (Steve Yegge): Structured Graph Memory
Beads represents the effort to provide agents with long-term memory that does not depend on flat text files. Designed by Steve Yegge, Beads uses an issue tracker based on a dependency graph, powered by Dolt technology (a SQL database with native version control).
This structure allows agents to navigate relationships between tasks (blockers, duplicates, epic hierarchies) in a deterministic way. Beads implements "semantic compaction" that automatically summarizes closed tasks, preserving the agent's attention for current operations without losing the historical context of architectural decisions.
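The deterministic navigation of blockers can be sketched in plain Python. This does not use Dolt or the Beads data model: the `Issue` record and the `ready_issues` function are hypothetical, and the graph is held in an in-memory dictionary. The point is only that "what can be worked on next" becomes a query over the graph rather than something the agent must infer from prose.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    """A task node in the dependency graph (hypothetical structure)."""
    id: str
    title: str
    closed: bool = False
    blocked_by: set[str] = field(default_factory=set)  # IDs of blocking issues


def ready_issues(issues: dict[str, Issue]) -> list[str]:
    """IDs of open issues whose blockers are all closed, in sorted order.

    Sorting makes the result independent of insertion order, so the same
    graph always yields the same answer.
    """
    return sorted(
        issue.id
        for issue in issues.values()
        if not issue.closed and all(issues[b].closed for b in issue.blocked_by)
    )
```

For example, if issue `b` is blocked by a closed issue `a`, and issue `c` is blocked by `b`, then `b` is ready and `c` is not until `b` is closed.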
| Tool | Discipline Approach | Base Technology | Adoption Level (Stars) |
|---|---|---|---|
| Superpowers | Strict process (TDD) | Claude/Amplifier plugins | 149,000 |
| GSD | Environment quality (Isolation) | TypeScript / Filesystem | 35,000 |
| Beads | Structural memory (Graphs) | Go / Dolt (SQL+Git) | ~91,000 |
Native Capabilities and Boundaries of Claude Code in 2026
Claude Code has consolidated its position as the leading agentic orchestration environment by natively integrating functions that previously required complex external frameworks.
The Dichotomy Between Plan Mode and Ultraplan
Claude Code's architecture imposes a deliberate separation between the "thinking" and "doing" phases through Plan Mode. In this read-only mode, Claude analyzes the codebase, performs grep searches, and proposes a detailed approach without modifying a single file, allowing the developer to audit the strategy before committing changes.
The qualitative leap occurs with Ultraplan (Research Preview), a cloud service that allows delegating massive planning tasks to Anthropic's infrastructure. Ultraplan runs the Opus 4.6 model for periods of up to 30 minutes, generating plans that include automatic Mermaid diagrams to visualize structural dependencies.
Distributed Governance: CLAUDE.md and Agent Skills
Standard configuration in Claude Code is centralized in the CLAUDE.md file, which acts as the repository's persistent memory. Best practices suggest keeping this file under 200 lines to optimize token consumption in each conversation turn.
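The 200-line guideline is easy to check mechanically, for instance in a pre-commit hook. The following Python sketch is an illustration only: the function name is hypothetical and it is not part of Claude Code. It interprets "under 200 lines" strictly, so a file of exactly 200 lines fails.

```python
MAX_LINES = 200  # guideline quoted in the text


def check_memory_file(text: str, max_lines: int = MAX_LINES) -> tuple[int, bool]:
    """Return (line_count, within_limit) for the contents of a memory file.

    `within_limit` is True only when the file has strictly fewer than `max_lines` lines.
    """
    count = len(text.splitlines())
    return count, count < max_lines
```

A caller would read CLAUDE.md from the repository root, pass its contents to `check_memory_file`, and fail the check when the second element is False.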
Agentic Context Engineering: The Academic Foundation of Reliability
Recent research has identified that the limit of agent performance does not reside in model size, but in the quality of the injected information. Context Engineering (CE) has been formalized as the discipline responsible for designing the information environment surrounding the model at inference time.
The Challenge of "Brevity Bias" and Context Collapse
Studies published in 2025 document that prompt optimization systems exhibit brevity bias, a tendency to generate excessively short instructions that sacrifice critical domain details, leading to systematic errors in complex tasks. Likewise, the phenomenon of context collapse has been observed, where iterative rewriting of conversation summaries erodes technical nuances.
The ACE (Agentic Context Engineering) framework proposes mitigating these failures by treating contexts as evolving "playbooks" that are updated through incremental delta inputs, rather than monolithic substitutions.
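The difference between monolithic substitution and incremental updates can be shown with a toy Python contrast. This is not the ACE implementation: the function names are hypothetical, and `summarize` is a deliberately crude stand-in for a lossy rewrite that simply keeps the most recent characters. The result therefore illustrates the mechanism of context collapse rather than measuring it.

```python
def summarize(text: str, limit: int = 60) -> str:
    """Crude stand-in for a lossy rewrite: keep only the last `limit` characters."""
    return text[-limit:]


def monolithic_update(context: str, lesson: str) -> str:
    """Rewrite the whole context each time a lesson is added."""
    return summarize(context + " " + lesson)


def delta_update(playbook: list[str], lesson: str) -> list[str]:
    """Append the lesson as a new entry; existing entries are never rewritten."""
    return playbook if lesson in playbook else playbook + [lesson]


lessons = [
    "Retry API calls with backoff.",
    "Dates are stored in UTC.",
    "Never log access tokens.",
    "Invoices use integer cents.",
]

context = ""
playbook: list[str] = []
for lesson in lessons:
    context = monolithic_update(context, lesson)
    playbook = delta_update(playbook, lesson)
```

After the loop, the first lesson survives intact in `playbook` but has been truncated out of `context`, while the most recent lesson is present in both. A real playbook would also need pruning or deduplication to stay within budget, which this sketch reduces to a simple membership check.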
Tokenomics: The Economy of the Cognitive Resource
"Tokenomics" analyzes how resources are distributed in multi-agent systems. Evidence suggests that the main cost of agentic software engineering does not lie in initial code generation, but in the refinement and automatic verification phases.
Conclusion: Toward Mature and Sustainable Agentic Governance
The exhaustive analysis of the Spec-Driven Development ecosystem in 2026 allows us to conclude that software engineering has moved beyond the "vibe coding" stage and entered an era of procedural rigor.
The dialectical tension between upfront design and agility is not resolved by eliminating one of the poles, but through the elevation of abstraction. Specifications, far from being bureaucratic walls, are the connective tissue that enables swarms of agents to collaborate autonomously under effective human governance.
Frameworks such as OpenSpec and GSD offer the necessary technical solutions to manage the inherent limitations of language models (context and memory), while methodologies such as Superpowers and initiatives such as the PUMA Project establish the quality standards and empirical evaluation essential for organizational adoption.
The future of ICT project management lies in the engineer's ability to evolve toward the role of system designer and "agent trainer", where the detailed and auditable specification becomes the new source code of the intelligent organization.