The Evolution of Agentic Software Engineering: An Analytical Study of Spec-Driven Development (SDD) and Its Impact on ICT Project Governance

The software development industry is at an inflection point, moving from a phase of enthusiastic experimentation with generative artificial intelligence (often referred to as “vibe coding”) toward a paradigm of technical and methodological rigor known as Spec-Driven Development (SDD).

This transition is not merely superficial; it represents the maturation of Agentic Software Engineering (SE 3.0), where the center of gravity shifts from the manual writing of lines of code to the orchestration of complex systems of autonomous agents operating under strict specifications.

In this new scenario, the specification ceases to be a passive post-development documentation artifact and becomes the primary artifact, the single source of truth, and the executable contract that guides the logic of the agents.

This study analyzes the depth of this change. It evaluates the dialectical tension between upfront design and agility, assesses the effectiveness of leading frameworks and emerging tools, and integrates these findings into the empirical context of the PUMA Project (PUMA Understanding & Management with Agents).

The Decision Architecture Debate: Upfront Design vs. Agentic Agility

The adoption of Spec-Driven Development has revived a classic controversy in software engineering: the value of detailed upfront design versus iterative flexibility. The fundamental difference in 2026 is that the debate no longer involves only humans; it also centers on how machines consume context and intent.

Marmelab’s Critical Perspective: The Specter of the Waterfall Model

From Marmelab’s point of view, SDD runs the risk of being a regression toward the bureaucratic practices of the Waterfall model, disguised under modern terminology. François Zaninotto’s critique is based on the observation that tools such as Spec-kit or Kiro force developers to generate a massive amount of Markdown documentation before allowing code generation. This “Markdown madness” can result in developers spending 80% of their time reading and reviewing AI-generated prose instead of thinking critically about the architecture.

Marmelab identifies a phenomenon of “context blindness” where, despite detailed specifications, AI agents often overlook existing functions in the codebase because their search is limited to the text provided in the specification. Marmelab also argues that SDD introduces a “false sense of security”: in practice, agents do not always follow the specification to the letter and may mark tasks as completed without having implemented the corresponding unit tests.

In response to this, Marmelab proposes “Natural Language Development”, an evolution of the Agile paradigm where incremental and simple instructions are used to converge toward the product in an organic way.

Marc Brooker’s (AWS) Defense: Elevating the Level of Abstraction

Marc Brooker, Distinguished Engineer at AWS, offers a robust counterargument, maintaining that SDD is not Waterfall because it does not seek to fix all requirements “up-front”, but rather to elevate the design (“pulling designs up”) to a higher level of abstraction. For Brooker, specifications are living, versioned, and explicit artifacts that act as a “map” for AI agents.

The key distinction is autonomy. While “vibe coding” is like giving step-by-step instructions (turn-by-turn directions), which forces the human to remain constantly in the feedback loop, SDD provides the agent with the ability to operate autonomously for longer periods by understanding the “big picture” and the final objectives. Brooker argues that this separation between requirements and implementation allows the human to focus on managing internal requirement conflicts and strategic decision-making, leaving the AI to derive the technical implementation more efficiently.

Implementation Levels According to Birgitta Böckeler

To disambiguate the term SDD, Birgitta Böckeler proposes on Martin Fowler’s site a taxonomy of three implementation levels that is essential for assessing an engineering team’s maturity:

| SDD Level | Main Characteristic | Human Role | Code State |
|---|---|---|---|
| Spec-first | The specification guides the initial task and is then discarded | Designs the initial prompt | Source of truth post-task |
| Spec-anchored | The specification is maintained throughout the functionality lifecycle | Edits the specification for any change | Synchronized with the spec |
| Spec-as-source | The specification is the only source file the human edits | Does not touch the code directly | Artifact generated by AI |

Böckeler warns that the “Spec-as-source” level is a form of Model-Driven Development (MDD) which, while eliminating the need for complex domain-specific languages (DSLs), introduces the challenge of language model non-determinism, which can hinder reproducibility.

Technical Evaluation of Frameworks: OpenSpec, BMAD, and GitHub Spec Kit

The SDD framework ecosystem in 2026 shows remarkable diversification, with tools optimized for different project scales and types of codebases.

OpenSpec: The Efficiency of the Brownfield Approach and Delta Specs

OpenSpec has positioned itself as the fastest-growing framework, with an 863% increase in its user base over the last six months. Its competitive advantage lies in its “brownfield-first” approach, specifically designed to work on existing codebases where full respecification is not feasible.

The technical innovation of OpenSpec is the use of “delta specifications”. Instead of processing massive documents, the system isolates each proposed change in its own folder under openspec/changes/, using ADDED, MODIFIED, and REMOVED markers. This strategy reduces token consumption and the cognitive load on the human reviewer, while keeping artifacts lightweight (~250 lines versus 800+ in competing frameworks). It is also broadly interoperable, supporting more than 20 coding assistants through universal slash commands and the AGENTS.md protocol.
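The delta idea can be illustrated with a minimal sketch. The marker syntax used here (a `## ADDED` style header followed by `id: text` lines), the requirement identifiers, and the functions `parse_delta` and `apply_delta` are assumptions made for illustration; they are not OpenSpec's actual file format or API.

```python
from typing import Dict

# Hypothetical delta format: a header names the operation, and each
# following "id: text" line names one requirement it applies to.
MARKERS = ("ADDED", "MODIFIED", "REMOVED")


def parse_delta(text: str) -> Dict[str, Dict[str, str]]:
    """Group requirement lines under the most recent marker header."""
    delta: Dict[str, Dict[str, str]] = {m: {} for m in MARKERS}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("## ") and line[3:].strip() in MARKERS:
            current = line[3:].strip()
            continue
        if current is None:
            continue  # ignore anything before the first marker
        req_id, _, body = line.partition(":")
        delta[current][req_id.strip()] = body.strip()
    return delta


def apply_delta(spec: Dict[str, str], delta: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Return a new spec with the delta applied; the input is not mutated."""
    result = dict(spec)
    for req_id, body in delta["ADDED"].items():
        if req_id in result:
            raise ValueError(f"{req_id} already exists")
        result[req_id] = body
    for req_id, body in delta["MODIFIED"].items():
        if req_id not in result:
            raise KeyError(f"{req_id} not found")
        result[req_id] = body
    for req_id in delta["REMOVED"]:
        result.pop(req_id, None)
    return result


base_spec = {
    "AUTH-1": "Users log in with email and password.",
    "AUTH-2": "Sessions expire after 24 hours.",
}
change = """
## ADDED
AUTH-3: Users can enable two-factor authentication.
## MODIFIED
AUTH-2: Sessions expire after 8 hours.
## REMOVED
AUTH-1:
"""
updated = apply_delta(base_spec, parse_delta(change))
```

The reviewer and the agent only need to read the few lines in `change`, not the whole specification, which is the source of the token and review savings described above.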

GitHub Spec Kit: The Constitution and Corporate Standard

Spec Kit, developed by GitHub, represents the de facto standard for projects seeking strict governance from the start. Its architecture is based on the Constitution (constitution.md), a document that defines non-negotiable principles such as test coverage standards, naming conventions, and architectural constraints that the AI must respect in every iteration.

The Spec Kit flow is rigorous and closed, based on a Plan-Act-Review cycle. Although it produces high-quality artifacts, such as OpenAPI specifications and structured user journeys, it has been criticized for its administrative heaviness. Teams working on small tasks have reported that a simple feature can generate up to 8 files and 1,300 lines of text, confirming Marmelab’s concerns about excessive verbosity.

BMAD (Breakthrough Method of Agile AI-Driven Development): Simulation of Agile Teams

The BMAD method stands out as the most architecturally complex framework, designed to simulate a complete agile development team through more than 12 specialized agent roles (Analyst, Architect, QA, etc.). BMAD introduces the concept of “Agile Intelligence”, where agents not only execute tasks but also participate in elicitation flows and mid-process course correction.

Its “Quick Flow” version attempts to solve the entry friction of the full version, allowing a faster transition from technical requirements to immediate coding, although with less planning depth. A critical differentiator of BMAD is its adversarial code review (/bmad-bmm-code-review), where a control agent actively looks for failures that a standard review would miss before authorizing the cycle closure.

Emerging Tools and Agentic Discipline: Superpowers, GSD, and Beads

To complement governance frameworks, specialized tools have emerged to inject discipline, memory, and context isolation into the agents’ workflow.

Superpowers (Jesse Vincent): TDD and Psychological Persuasion

Superpowers, created by Jesse Vincent, is a framework that enforces a strict development methodology based on Test-Driven Development (TDD). The premise of Superpowers is radical: the agent is forbidden from writing production code unless a failing unit test that validates it already exists. If an agent attempts to generate implementation without tests, the framework automatically deletes the written code.

Additionally, Superpowers applies social psychology principles drawn from Robert Cialdini’s research on persuasion (authority, commitment, social proof) within the system prompts. The agent is required to declare explicitly which skill it is using and to commit to following the rules before each response; with this approach, a significant reduction in hallucinations and an increase in the precision of generated code have been reported in projects such as the chardet library.

GSD (Get Shit Done): Isolation Against Context Rot

GSD, developed by TÂCHES, addresses the technical problem of “context rot”. Language model output quality is reported to degrade sharply once the context window fills beyond roughly 50-70% of its capacity, causing the agent to forget initial instructions or introduce inconsistencies.

GSD’s solution is total context isolation. The system fragments the project into atomic phases and tasks, providing each task with a fresh agent instance with a clean 200,000-token context window. All project information resides in persistent Markdown files on disk (PROJECT.md, ROADMAP.md, STATE.md), eliminating the need to maintain a long and error-prone conversation history.
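A minimal sketch of the isolation idea follows. It assumes a rough heuristic of about four characters per token and uses the lower bound of the 50-70% range as a safety budget; the function names and the budget check are illustrative assumptions, not GSD's actual implementation.

```python
import tempfile
from pathlib import Path

CONTEXT_WINDOW = 200_000   # tokens, the figure quoted in the text
SAFE_FRACTION = 0.5        # lower bound of the 50-70% range (assumption)
STATE_FILES = ("PROJECT.md", "ROADMAP.md", "STATE.md")


def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (assumption)."""
    return len(text) // 4 + 1


def build_task_context(project_dir: Path, task: str) -> str:
    """Assemble a fresh prompt from on-disk state only, with no chat history."""
    parts = []
    for name in STATE_FILES:
        path = project_dir / name
        if path.exists():
            parts.append(f"# {name}\n{path.read_text(encoding='utf-8')}")
    parts.append(f"# TASK\n{task}")
    context = "\n\n".join(parts)
    if estimate_tokens(context) > int(CONTEXT_WINDOW * SAFE_FRACTION):
        raise ValueError("context exceeds the safe budget; split the task")
    return context


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "PROJECT.md").write_text("Goal: invoice export service", encoding="utf-8")
    (root / "STATE.md").write_text("Phase 1 done", encoding="utf-8")
    # Each task gets its own context; nothing leaks from one to the next.
    ctx_a = build_task_context(root, "Add CSV export")
    ctx_b = build_task_context(root, "Add PDF export")
```

Because every context is rebuilt from the files on disk, the second task sees the shared project state but none of the first task's conversation.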

Beads (Steve Yegge): Structured Graph Memory

Beads represents the effort to provide agents with long-term memory that does not depend on flat text files. Designed by Steve Yegge, Beads uses an issue tracker based on a dependency graph, powered by Dolt technology (a SQL database with native version control).

This structure allows agents to navigate relationships between tasks (blockers, duplicates, epic hierarchies) in a deterministic way. Beads implements “semantic compaction” that automatically summarizes closed tasks, preserving the agent’s attention for current operations without losing the historical context of architectural decisions.
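The deterministic navigation described above can be sketched with an in-memory graph. The `Issue` class and `ready_issues` function are hypothetical names for illustration; the sketch does not use Dolt and does not reflect the actual Beads schema or command-line interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Issue:
    id: str
    title: str
    closed: bool = False
    blocked_by: Set[str] = field(default_factory=set)


def ready_issues(issues: Dict[str, Issue]) -> List[str]:
    """Open issues whose blockers are all closed, in sorted order.

    Sorting makes the answer identical on every call, so an agent
    asking "what can I work on next?" always gets the same list.
    """
    ready = []
    for issue in issues.values():
        if issue.closed:
            continue
        if all(issues[blocker].closed for blocker in issue.blocked_by):
            ready.append(issue.id)
    return sorted(ready)


tracker = {
    "schema": Issue("schema", "Design database schema", closed=True),
    "api": Issue("api", "Implement REST API", blocked_by={"schema"}),
    "ui": Issue("ui", "Build settings page", blocked_by={"api"}),
    "docs": Issue("docs", "Write onboarding guide"),
}
before = ready_issues(tracker)   # "ui" is still blocked by "api"
tracker["api"].closed = True
after = ready_issues(tracker)    # closing "api" unblocks "ui"
```

The point of the structure is that the list of workable tasks is computed from explicit edges rather than inferred by the model from free-form notes.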

| Tool | Discipline Approach | Base Technology | Adoption Level (Stars) |
|---|---|---|---|
| Superpowers | Strict process (TDD) | Claude/Amplifier plugins | 149,000 |
| GSD | Environment quality (Isolation) | TypeScript / Filesystem | 35,000 |
| Beads | Structural memory (Graphs) | Go / Dolt (SQL+Git) | ~91,000 |

Native Capabilities and Boundaries of Claude Code in 2026

Claude Code has consolidated its position as the leading agentic orchestration environment by natively integrating functions that previously required complex external frameworks.

The Dichotomy Between Plan Mode and Ultraplan

Claude Code’s architecture imposes a deliberate separation between the “thinking” and “doing” phases through Plan Mode. In this read-only mode, Claude analyzes the codebase, performs grep searches, and proposes a detailed approach without modifying a single file, allowing the developer to audit the strategy before committing changes.

The qualitative leap occurs with Ultraplan (Research Preview), a cloud service that allows delegating massive planning tasks to Anthropic’s infrastructure. Ultraplan runs the Opus 4.6 model for periods of up to 30 minutes, generating plans that include automatic Mermaid diagrams to visualize structural dependencies.

Distributed Governance: CLAUDE.md and Agent Skills

Standard configuration in Claude Code is centralized in the CLAUDE.md file, which acts as the repository’s persistent memory. Best practices suggest keeping this file under 200 lines to optimize token consumption in each conversation turn.
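A team could enforce this guideline with a small check in continuous integration. The sketch below is an assumption about how such a check might look; the function name `check_line_budget` is hypothetical and this is not a Claude Code feature.

```python
import tempfile
from pathlib import Path
from typing import Tuple

MAX_LINES = 200  # guideline quoted in the text


def check_line_budget(path: Path, max_lines: int = MAX_LINES) -> Tuple[int, bool]:
    """Return the line count and whether it stays under the budget."""
    count = len(path.read_text(encoding="utf-8").splitlines())
    return count, count < max_lines


with tempfile.TemporaryDirectory() as tmp:
    memory_file = Path(tmp) / "CLAUDE.md"
    # Simulate a memory file that has grown to 250 rules.
    memory_file.write_text(
        "\n".join(f"- rule {i}" for i in range(250)), encoding="utf-8"
    )
    line_count, within_budget = check_line_budget(memory_file)
```

A failing check is a prompt to move rarely needed guidance out of the file rather than a hard technical limit.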

Agentic Context Engineering: The Academic Foundation of Reliability

Recent research has identified that the limit of agent performance does not reside in model size, but in the quality of the injected information. Context Engineering (CE) has been formalized as the discipline responsible for designing the information environment surrounding the model at inference time.

The Challenge of “Brevity Bias” and Context Collapse

Studies published in 2025 document that prompt optimization systems tend toward Brevity Bias, a tendency to generate excessively short instructions that sacrifice critical domain details, leading to systematic errors in complex tasks. Likewise, the phenomenon of context collapse has been observed, where iterative rewriting of conversation summaries erodes technical nuances.

The ACE (Agentic Context Engineering) framework proposes mitigating these failures by treating contexts as evolutionary “playbooks” that are updated through incremental delta inputs, rather than monolithic substitutions.
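The contrast between incremental deltas and monolithic rewriting can be sketched as follows. The `Delta` record, the entry keys, and `apply_deltas` are illustrative assumptions rather than the ACE framework's actual data model or code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Delta:
    op: str    # "add" or "revise"
    key: str   # identifier of the playbook entry
    text: str


def apply_deltas(playbook: Dict[str, str], deltas: List[Delta]) -> Dict[str, str]:
    """Apply small targeted edits; untouched entries are carried over verbatim."""
    updated = dict(playbook)
    for delta in deltas:
        if delta.op == "add" and delta.key not in updated:
            updated[delta.key] = delta.text
        elif delta.op == "revise" and delta.key in updated:
            updated[delta.key] = delta.text
        else:
            raise ValueError(f"cannot {delta.op} entry {delta.key!r}")
    return updated


playbook = {
    "sql-1": "Always parameterize queries; never interpolate user input.",
    "api-1": "Retry idempotent calls up to 3 times.",
}
deltas = [
    Delta("add", "api-2", "Send an idempotency key on POST retries."),
    Delta("revise", "api-1", "Retry idempotent calls up to 3 times with backoff."),
]
evolved = apply_deltas(playbook, deltas)
```

Since the `sql-1` entry is never regenerated, its domain detail cannot be shortened or lost, which is the failure mode that repeated full rewrites risk.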

Tokenomics: The Economy of the Cognitive Resource

“Tokenomics” analyzes how resources are distributed in multi-agent systems. Evidence suggests that the main cost of agentic software engineering does not lie in initial code generation, but in the refinement and automatic verification phases.

Conclusion: Toward Mature and Sustainable Agentic Governance

The exhaustive analysis of the Spec-Driven Development ecosystem in 2026 allows us to conclude that software engineering has moved beyond the “vibe coding” stage and entered an era of procedural rigor.

The dialectical tension between upfront design and agility is not resolved by eliminating one of the poles, but through the elevation of abstraction. Specifications, far from being bureaucratic walls, are the connective tissue that enables swarms of agents to collaborate autonomously under effective human governance.

Frameworks such as OpenSpec and GSD offer the necessary technical solutions to manage the inherent limitations of language models (context and memory), while methodologies such as Superpowers and initiatives such as the PUMA Project establish the quality standards and empirical evaluation essential for organizational adoption.

The future of ICT project management lies in the engineer’s ability to evolve toward the role of system designer and “agent trainer”, where the detailed and auditable specification becomes the new source code of the intelligent organization.
