Test-Driven and Spec-Driven Development: Building Lossless Feedback Loops for Agentic Coding

The rise of AI coding agents has fundamentally changed how we write software. But with this power comes a new set of challenges: code that works but nobody understands, specifications that drift from implementation, and the gradual erosion of architectural intent. The solution lies in adapting two proven methodologies—Test-Driven Development (TDD) and Spec-Driven Development (SDD)—to create lossless feedback loops that preserve intent, enforce correctness, and maintain coherence across AI-human collaboration.

This post explores how TDD and SDD form the foundation of sustainable agentic coding workflows, with practical examples using Claude Code.


The Feedback Loop Problem in Agentic Coding

Traditional software development operates on a simple feedback loop:

  1. Write code → 2. Test/Run → 3. Debug → 4. Refine → Repeat

This loop is lossy by nature. Each iteration loses context: why a decision was made, what alternatives were considered, what constraints guided the implementation. Over time, the original intent becomes archaeological—buried in commit messages and faded memories.

AI agents amplify this problem dramatically. Consider a typical interaction:

You: "Add authentication to the user dashboard" AI: Generates 500 lines of code across 8 files You: "Looks good, ship it"

Three months later, you discover the implementation uses session tokens stored in localStorage—a security vulnerability. But why was this approach chosen? What were the trade-offs? The AI's reasoning is ephemeral, lost the moment the chat window closed.

This is lossy agentic coding—rapid output, zero institutional memory.


Test-Driven Development: Executable Specifications

TDD inverts the traditional development sequence:

  1. Write failing test → 2. Write minimal code to pass → 3. Refactor → Repeat

For human developers, TDD enforces discipline and design thinking. For AI agents, TDD becomes something more powerful: executable intent preservation.

Why TDD Works for AI Agents

AI models are excellent at pattern matching and code generation but lack intrinsic understanding of business requirements. Tests act as grounding artifacts—unambiguous contracts that survive across sessions, developers, and even model versions.

Consider this traditional prompt:

Create a function that calculates shipping costs based on weight and destination.

The AI might generate code that works for the happy path but fails on edge cases. Now consider the TDD approach:

# tests/test_shipping.py
import pytest

from shipping import calculate_shipping  # assumed location of the implementation
def test_domestic_shipping_under_5kg():
    assert calculate_shipping(weight=3.5, destination="domestic") == 8.50

def test_international_shipping_over_10kg():
    assert calculate_shipping(weight=12.0, destination="international") == 45.00

def test_zero_weight_raises_error():
    with pytest.raises(ValueError):
        calculate_shipping(weight=0, destination="domestic")

def test_negative_weight_raises_error():
    with pytest.raises(ValueError):
        calculate_shipping(weight=-1, destination="domestic")

def test_invalid_destination_raises_error():
    with pytest.raises(ValueError):
        calculate_shipping(weight=5, destination="moon")

When you prompt an AI agent with these tests, the requirement space is precisely defined. The agent cannot generate code that "looks good" but fails silently on edge cases. The tests are both specification and validation.

TDD in Claude Code: A Practical Example

Claude Code can run test frameworks directly from the terminal, which makes this loop natural. Here's a workflow:

Step 1: Define Test Suite First

// tests/auth.test.ts
import jwt from 'jsonwebtoken'
// Assumed implementation module path:
import { hashPassword, verifyPassword, generateToken, login } from '../src/auth'
describe('Authentication System', () => {
  test('should hash passwords with bcrypt', async () => {
    const password = 'SecureP@ss123'
    const hashed = await hashPassword(password)
    expect(hashed).not.toBe(password)
    expect(await verifyPassword(password, hashed)).toBe(true)
  })

  test('should reject weak passwords', async () => {
    await expect(hashPassword('123')).rejects.toThrow('Password too weak')
  })

  test('should generate JWT tokens with 1h expiry', async () => {
    const token = await generateToken({ userId: '123' })
    const decoded = jwt.decode(token) as jwt.JwtPayload
    expect(decoded.exp! - decoded.iat!).toBe(3600)
  })

  test('should store tokens in httpOnly cookies, not localStorage', async () => {
    const response = await login({ username: 'test', password: 'test' })
    expect(response.headers['set-cookie']).toContain('HttpOnly')
  })
})

Step 2: Prompt Claude Code

@claude Implement the authentication system to pass all tests in tests/auth.test.ts.
Use bcrypt for password hashing, JWT for tokens, and httpOnly cookies for storage.
Run the tests after implementation and fix any failures.

Claude Code will:

  1. Read the test file to understand requirements
  2. Generate implementation code
  3. Run the test suite
  4. Iterate on failures until all tests pass
  5. Report back with test results

The critical difference: the test suite is the specification. Six months later, when you ask Claude Code to refactor the auth system, those tests ensure the security properties (httpOnly cookies, bcrypt hashing) are preserved.
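
For illustration, here is one shape the resulting implementation might take. This is a minimal sketch, not Claude Code's actual output: it assumes the bcrypt and jsonwebtoken packages and an environment-provided secret, and it elides the full password-complexity rules and the login handler.

// src/auth.ts -- a minimal sketch, assuming bcrypt and jsonwebtoken.
import bcrypt from 'bcrypt'
import jwt from 'jsonwebtoken'

// Assumption: the signing secret comes from the environment.
const JWT_SECRET = process.env.JWT_SECRET ?? 'dev-only-secret'

export async function hashPassword(password: string): Promise<string> {
  // Full complexity rules elided; the length check covers the weak-password test.
  if (password.length < 8) throw new Error('Password too weak')
  return bcrypt.hash(password, 12) // cost factor 12, per the spec
}

export async function verifyPassword(password: string, hash: string): Promise<boolean> {
  return bcrypt.compare(password, hash)
}

export async function generateToken(payload: { userId: string }): Promise<string> {
  // '1h' expiry makes decoded.exp - decoded.iat === 3600, as the test demands.
  return jwt.sign(payload, JWT_SECRET, { expiresIn: '1h' })
}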


Spec-Driven Development: Architectural Intent as Code

While TDD captures what code should do, SDD captures why and how at the architectural level. In agentic workflows, specifications become the primary artifact—code is merely the compiled output.

The Specification Hierarchy

Effective SDD operates on three levels:

Level 1: Functional Specifications

What the system does from a user perspective.

# User Authentication Specification

## Requirements
- Users must authenticate with email + password
- Passwords must meet complexity requirements (8+ chars, 1 uppercase, 1 number, 1 special)
- Failed login attempts are rate-limited (5 attempts per 15 minutes)
- Sessions expire after 1 hour of inactivity
- Logout invalidates the session server-side

## Security Constraints
- Passwords must be hashed with bcrypt (cost factor 12)
- Session tokens must be stored in httpOnly cookies
- CSRF protection must be implemented
- Passwords must never appear in logs

Level 2: Technical Specifications

How the system achieves requirements architecturally.

# Authentication Architecture

## Data Model
- User table: id (UUID), email (unique), password_hash, created_at, updated_at
- Session table: id (UUID), user_id (FK), token_hash, expires_at, created_at

## API Endpoints
- POST /auth/login → Returns Set-Cookie header
- POST /auth/logout → Deletes session from DB
- GET /auth/me → Returns current user (requires valid session cookie)

## Dependencies
- bcrypt (password hashing)
- jsonwebtoken (token generation)
- express-rate-limit (rate limiting)
- csurf (CSRF protection)
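
To make the endpoint list concrete, here is a sketch of the route wiring in Express. The handlers are placeholder stubs standing in for the controllers and middleware defined at Level 3 below; the logout route's use of authMiddleware is an assumption.

// A sketch of the Level 2 endpoints wired in Express; handlers are stubs.
import express, { Request, Response, NextFunction } from 'express'

const app = express()
app.use(express.json())

const authMiddleware = (_req: Request, _res: Response, next: NextFunction) => next()
const loginController = (_req: Request, res: Response) => res.sendStatus(501)
const logoutController = (_req: Request, res: Response) => res.sendStatus(501)
const meController = (_req: Request, res: Response) => res.sendStatus(501)

app.post('/auth/login', loginController)                   // → returns Set-Cookie header
app.post('/auth/logout', authMiddleware, logoutController) // → deletes session from DB
app.get('/auth/me', authMiddleware, meController)          // requires valid session cookie

app.listen(3000)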

Level 3: Implementation Specifications

How the code is structured.

# Implementation Guidelines

## File Structure
/src
  /auth
    /controllers
      - loginController.ts
      - logoutController.ts
    /services
      - passwordService.ts (hashing, verification)
      - sessionService.ts (CRUD operations)
    /middleware
      - authMiddleware.ts (session validation)
      - rateLimitMiddleware.ts
    /validators
      - passwordValidator.ts
    /models
      - User.ts
      - Session.ts

## Coding Standards
- All async operations must include try-catch error handling
- Database queries must use parameterized statements (no string interpolation)
- All endpoints must include OpenAPI/Swagger documentation
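
As a sketch of these standards in practice, a session lookup in sessionService.ts might look like the following; it assumes node-postgres (pg) and the Session table from the Level 2 data model.

// src/auth/services/sessionService.ts -- illustrative sketch only.
import { Pool } from 'pg'

const pool = new Pool() // connection settings come from the PG* environment variables

export async function findSessionByTokenHash(tokenHash: string) {
  try {
    // Parameterized statement -- no string interpolation in the query text.
    const { rows } = await pool.query(
      'SELECT id, user_id, expires_at FROM sessions WHERE token_hash = $1',
      [tokenHash]
    )
    return rows[0] ?? null
  } catch (err) {
    // Async operations wrapped in try-catch, per the coding standards.
    console.error('Session lookup failed', err)
    throw err
  }
}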

Why SDD Creates Lossless Loops

Unlike chat-based prompts, specifications are versioned, diffable, and reviewable. When an AI agent generates code from a spec, you can:

  1. Validate against the spec during code review
  2. Update the spec when requirements change
  3. Regenerate code from the updated spec
  4. Detect drift when code diverges from specification

This is the "lossless" property: intent is never lost because it's explicitly encoded.

SDD in Claude Code: OpenSpec Integration

Claude Code can work with spec-driven frameworks like OpenSpec. Here's a workflow:

Step 1: Create Specification

# Initialize OpenSpec in your project
openspec init

# Create a change proposal
openspec proposal add-rate-limiting

This creates openspec/changes/add-rate-limiting/proposal.md:

# Add Rate Limiting to Authentication

## Motivation
Prevent brute-force attacks on login endpoint.

## Requirements
- Limit login attempts to 5 per 15 minutes per IP address
- Return 429 status code when limit exceeded
- Include Retry-After header in 429 responses

## Technical Approach
- Use express-rate-limit middleware
- Configure Redis as backing store for distributed systems
- Apply middleware only to POST /auth/login

## Tasks
- [ ] Install express-rate-limit and redis dependencies
- [ ] Create rateLimitMiddleware.ts
- [ ] Configure Redis connection
- [ ] Apply middleware to login route
- [ ] Add tests for rate limiting behavior
- [ ] Update API documentation
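
For reference, the middleware task might land as something like the sketch below. It assumes express-rate-limit; the Redis backing store from the proposal is elided for brevity.

// src/auth/middleware/rateLimitMiddleware.ts -- a sketch, Redis store elided.
import rateLimit from 'express-rate-limit'

export const loginRateLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 5,                   // 5 attempts per window per IP
  standardHeaders: true,    // 429 responses include a Retry-After header
  message: { error: 'Too many login attempts, please try again later' },
})

// Applied only to the login route, per the proposal:
// app.post('/auth/login', loginRateLimiter, loginController)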

Step 2: Prompt Claude Code

@claude Implement the rate limiting feature according to the proposal in
openspec/changes/add-rate-limiting/proposal.md. Mark tasks as completed
in the proposal as you finish them.

Claude Code will:

  1. Read the proposal file
  2. Understand the context and constraints
  3. Implement each task sequentially
  4. Update the proposal with completion status
  5. Run tests to validate

Step 3: Archive the Change

openspec archive add-rate-limiting

This merges the delta into openspec/specs/, creating a permanent record. Future AI sessions can reference this specification to understand why rate limiting was implemented this way.


Lossless Feedback Loops: The Convergence

When TDD and SDD are combined, they create a lossless feedback loop:

┌─────────────────────────────────────────────────────────────┐
│  SPECIFICATION LAYER                                         │
│  ┌────────────────────────────────────────────────────┐     │
│  │ Functional Specs + Technical Specs + Tests         │     │
│  │ (Intent, Architecture, Validation)                 │     │
│  └─────────────────┬──────────────────────────────────┘     │
└────────────────────┼────────────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────────────────┐
│  AI AGENT (Claude Code)                                      │
│  ┌────────────────────────────────────────────────────┐     │
│  │ 1. Read Spec     4. Run Tests                      │     │
│  │ 2. Generate Code 5. Fix Failures                   │     │
│  │ 3. Implement     6. Update Spec if needed          │     │
│  └─────────────────┬──────────────────────────────────┘     │
└────────────────────┼────────────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────────────────┐
│  CODE + TEST RESULTS                                         │
│  ┌────────────────────────────────────────────────────┐     │
│  │ Implementation + Test Pass/Fail + Coverage         │     │
│  └─────────────────┬──────────────────────────────────┘     │
└────────────────────┼────────────────────────────────────────┘
                     ▼
               Human Review
          ┌───────────┴───────────┐
          │                       │
          ▼                       ▼
      Approve                 Refine Spec
          │                       │
          └───────────┬───────────┘
                      ▼
                  Git Commit
             (Spec + Code + Tests)
What Makes This "Lossless"?

  1. Intent Preservation: Specifications document why decisions were made
  2. Validation Automation: Tests ensure behavior matches intent
  3. Auditability: Git history tracks spec evolution alongside code
  4. Reproducibility: Any AI agent can regenerate code from specs + tests
  5. Knowledge Transfer: New developers (human or AI) read specs, not code

Compare this to traditional "lossy" development:

Property                  | Traditional AI Coding          | TDD + SDD Approach
--------------------------|--------------------------------|-----------------------------------
Intent documentation      | Chat logs (ephemeral)          | Versioned specs (permanent)
Correctness validation    | Manual testing                 | Automated test suites
Architectural consistency | Implicit (in AI's training)    | Explicit (in spec)
Onboarding new devs       | Read code + guess              | Read spec + run tests
Refactoring safety        | High risk (may break unknowns) | Low risk (tests catch regressions)

Practical Workflow: Building a Feature with Claude Code

Let's walk through a complete example: adding dark mode to a web application.

Phase 1: Write the Specification

Create docs/specs/dark-mode.md:

# Dark Mode Feature Specification

## Functional Requirements
- Users can toggle between light and dark themes
- Theme preference persists across sessions
- System theme (OS-level) is detected and applied by default
- Smooth transitions between themes (no flash of unstyled content)

## Technical Requirements
- Theme state managed in React Context
- Preference stored in localStorage
- CSS variables for theming
- Support for system theme via prefers-color-scheme media query

## Design Constraints
- Toggle must be accessible (keyboard navigable, ARIA labels)
- Color contrast must meet WCAG AA standards in both themes
- Theme switch must not cause layout shift

## Implementation Files
- src/contexts/ThemeContext.tsx (state management)
- src/components/ThemeToggle.tsx (UI control)
- src/styles/themes.css (CSS variables)
- src/hooks/useTheme.ts (custom hook)

Phase 2: Write Tests First

Create tests/theme.test.tsx:

import { render, screen, fireEvent } from '@testing-library/react'
import { ThemeProvider, useTheme } from '@/contexts/ThemeContext'
import ThemeToggle from '@/components/ThemeToggle'

describe('Theme System', () => {
  beforeEach(() => {
    localStorage.clear()
    // Mock matchMedia
    Object.defineProperty(window, 'matchMedia', {
      value: jest.fn().mockImplementation(query => ({
        matches: query === '(prefers-color-scheme: dark)',
        addEventListener: jest.fn(),
        removeEventListener: jest.fn(),
      })),
    })
  })

  test('defaults to system theme when no preference stored', () => {
    const TestComponent = () => {
      const { theme } = useTheme()
      return <div data-testid="theme">{theme}</div>
    }

    render(
      <ThemeProvider>
        <TestComponent />
      </ThemeProvider>
    )

    expect(screen.getByTestId('theme')).toHaveTextContent('dark')
  })

  test('persists theme preference to localStorage', () => {
    render(
      <ThemeProvider>
        <ThemeToggle />
      </ThemeProvider>
    )

    fireEvent.click(screen.getByRole('button'))
    expect(localStorage.getItem('theme')).toBe('light')
  })

  test('applies correct CSS class to document root', () => {
    render(
      <ThemeProvider>
        <ThemeToggle />
      </ThemeProvider>
    )

    expect(document.documentElement.classList.contains('dark')).toBe(true)
    fireEvent.click(screen.getByRole('button'))
    expect(document.documentElement.classList.contains('light')).toBe(true)
  })

  test('toggle button has proper ARIA labels', () => {
    render(
      <ThemeProvider>
        <ThemeToggle />
      </ThemeProvider>
    )

    const button = screen.getByRole('button')
    expect(button).toHaveAttribute('aria-label', 'Toggle theme')
  })
})

Phase 3: Prompt Claude Code

@claude Implement dark mode according to docs/specs/dark-mode.md.
All tests in tests/theme.test.tsx must pass. Use the existing
Tailwind CSS setup. Ensure WCAG AA contrast ratios.

Phase 4: Claude Code Executes

Claude Code's internal process:

  1. Reads spec to understand requirements
  2. Reads tests to understand validation criteria
  3. Scans existing codebase to find Tailwind config
  4. Generates implementation:
    • Creates ThemeContext with localStorage persistence
    • Implements system theme detection
    • Creates ThemeToggle component
    • Adds CSS variables for light/dark themes
  5. Runs test suite: npm test
  6. Iterates on failures until all tests pass
  7. Reports results
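
A condensed sketch of the ThemeContext this process might produce is shown below. Anything beyond the spec (the storage key, the class names) is an assumption, and the ThemeToggle component is elided.

// src/contexts/ThemeContext.tsx -- condensed sketch; ThemeToggle elided.
import { createContext, useContext, useEffect, useState, ReactNode } from 'react'

type Theme = 'light' | 'dark'
const ThemeContext = createContext<{ theme: Theme; toggle: () => void } | null>(null)

export function ThemeProvider({ children }: { children: ReactNode }) {
  const [theme, setTheme] = useState<Theme>(() => {
    const stored = localStorage.getItem('theme') as Theme | null
    if (stored) return stored // persisted preference wins
    // Otherwise fall back to the system theme, per the spec.
    return window.matchMedia('(prefers-color-scheme: dark)').matches ? 'dark' : 'light'
  })

  useEffect(() => {
    localStorage.setItem('theme', theme)
    document.documentElement.classList.remove('light', 'dark')
    document.documentElement.classList.add(theme)
  }, [theme])

  const toggle = () => setTheme(t => (t === 'dark' ? 'light' : 'dark'))

  return <ThemeContext.Provider value={{ theme, toggle }}>{children}</ThemeContext.Provider>
}

export function useTheme() {
  const ctx = useContext(ThemeContext)
  if (!ctx) throw new Error('useTheme must be used within ThemeProvider')
  return ctx
}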

Phase 5: Human Review

You review:

  • Spec was followed (check generated code against docs/specs/dark-mode.md)
  • Tests pass (automated validation)
  • Visual testing (manual check for smooth transitions)

If issues found:

  • Update spec if requirements were unclear
  • Add tests if edge case discovered
  • Refine prompt for Claude Code to regenerate

Phase 6: Commit Everything Together

git add docs/specs/dark-mode.md tests/theme.test.tsx src/contexts/ src/components/
git commit -m "feat: implement dark mode

- Add theme context with localStorage persistence
- Implement system theme detection
- Create accessible theme toggle component
- All tests passing (12/12)

Implements: docs/specs/dark-mode.md
Tests: tests/theme.test.tsx"

The commit captures spec + tests + implementation as an atomic unit. Future developers (or AI agents) can trace implementation back to intent.


Advanced Pattern: Specification-as-Contract

For complex features, combine SDD with contract testing:

# specs/api/user-auth.contract.yaml
openapi: 3.0.0
info:
  title: User Authentication API
  version: 1.0.0
paths:
  /auth/login:
    post:
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                email:
                  type: string
                  format: email
                password:
                  type: string
                  minLength: 8
              required: [email, password]
      responses:
        '200':
          description: Successful login
          headers:
            Set-Cookie:
              schema:
                type: string
                pattern: 'auth_token=.*; HttpOnly; Secure'
        '401':
          description: Invalid credentials
        '429':
          description: Rate limit exceeded
          headers:
            Retry-After:
              schema:
                type: integer

Claude Code can generate both server implementation AND client SDK from this contract:

@claude Generate Express.js implementation and TypeScript client
SDK from specs/api/user-auth.contract.yaml. Ensure contract tests
pass for both.

This enables contract-first development: frontend and backend teams (or AI agents) work in parallel against the same specification.
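
To give a sense of the client side, here is a hand-written sketch of a fetch-based call against this contract; the function and type names are assumptions, not actual generator output.

// A hand-written sketch of a client for POST /auth/login; names are assumed.
export interface LoginRequest {
  email: string    // format: email
  password: string // minLength: 8
}

export async function login(body: LoginRequest): Promise<void> {
  const res = await fetch('/auth/login', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    credentials: 'include', // the auth_token cookie is HttpOnly; the browser stores it
    body: JSON.stringify(body),
  })
  if (res.status === 429) {
    const retryAfter = res.headers.get('Retry-After') // seconds, per the contract
    throw new Error(`Rate limited; retry after ${retryAfter}s`)
  }
  if (res.status === 401) throw new Error('Invalid credentials')
  if (!res.ok) throw new Error(`Unexpected status ${res.status}`)
}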


Best Practices for TDD + SDD with AI Agents

1. Specs Before Code, Always

Never prompt "build X" without a specification. Even simple features benefit from explicit requirements.

Bad: @claude add user profile page

Good:

@claude Implement user profile page per docs/specs/user-profile.md.
Tests must pass in tests/profile.test.tsx.

2. Tests as Acceptance Criteria

Write tests that encode business logic, not implementation details.

Bad: expect(UserService.query).toHaveBeenCalledWith('SELECT * FROM users')

Good: expect(await getUser('123')).toEqual({ id: '123', name: 'Alice' })

3. Version Specs Alongside Code

Store specifications in docs/specs/ under version control. Update specs when requirements change, then regenerate code.

4. Use Spec Frameworks

Tools like OpenSpec, GitHub Spec Kit, or even simple Markdown + YAML provide structure. Unstructured prompts lead to unstructured results.

5. Automate Spec Validation

Use tools like Spectral (OpenAPI linting), JSON Schema validators, or custom scripts to ensure specs are well-formed before handing to AI.
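
As one example of the "custom scripts" option, a CI step could reject malformed specs with a JSON Schema check via Ajv; the schema shape and file path below are illustrative assumptions.

// scripts/validate-spec.ts -- illustrative Ajv check; schema and path are assumed.
import Ajv from 'ajv'
import { readFileSync } from 'fs'

const proposalSchema = {
  type: 'object',
  properties: {
    motivation: { type: 'string' },
    requirements: { type: 'array', items: { type: 'string' }, minItems: 1 },
  },
  required: ['motivation', 'requirements'],
}

const ajv = new Ajv()
const validate = ajv.compile(proposalSchema)
const spec = JSON.parse(readFileSync('specs/add-rate-limiting.json', 'utf8'))

if (!validate(spec)) {
  console.error('Spec is malformed:', validate.errors)
  process.exit(1) // fail CI before the spec ever reaches the AI agent
}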

6. Iterate on Specs, Not Code

When AI-generated code is wrong, ask: "Is the spec unclear?" Update the spec and regenerate rather than manually fixing code.

7. Treat AI Agent as Junior Developer

Would you tell a junior dev "just build auth"? No—you'd provide requirements, examples, and validation criteria. Do the same for AI.


The Future: Specification-Only Development

The trajectory is clear: code becomes an implementation detail.

In the near future, developers will:

  1. Write specifications (functional + technical + tests)
  2. AI agents generate code automatically
  3. Tests validate correctness
  4. Humans review only the spec + test results, not code
  5. Code is treated like compiled binaries—never directly edited

This is already emerging in domains like Infrastructure-as-Code, where AI assistants generate Terraform configurations from natural-language descriptions.

Claude Code is positioned at the forefront of this shift. When you write:

  • docs/specs/payment-processing.md (the WHAT and WHY)
  • tests/payment.test.ts (the validation)

And prompt:

@claude implement docs/specs/payment-processing.md with all tests passing

You're operating in specification-only mode. The code that emerges is validated (the tests pass) and aligned (it implements the spec).

This is the "lossless" endgame: intent → specification → validated implementation, with zero information loss across the chain.


Conclusion

TDD and SDD aren't new—they're battle-tested methodologies from decades of software engineering. What's new is their necessity in agentic coding.

AI agents are powerful but ephemeral. They generate code rapidly but forget context instantly. Without TDD and SDD, every AI interaction is a one-shot gamble—code that works today but becomes unmaintainable tomorrow.

By contrast, when you combine:

  • Specifications (intent preservation)
  • Tests (correctness validation)
  • AI agents (implementation automation)

You create a lossless feedback loop where:

  • Intent is explicit and versioned
  • Correctness is automated and continuous
  • Knowledge is transferable across developers and AI sessions
  • Refactoring is safe and predictable

The future of software development isn't "AI writes all the code." It's "humans write intent, AI compiles it to code, tests validate it." TDD and SDD are the bridge between human intent and machine execution.

Start small:

  1. Next feature request → write a spec first
  2. Before prompting AI → write failing tests
  3. After AI generates code → verify against spec and tests
  4. When you commit → commit spec + tests + code together

Your future self (and future AI agents) will thank you.

