Building an Agentic Workflow for Automated Data Quality Debugging

How a ReACT-based AI agent autonomously investigates dbt test failures, traces data lineage, implements fixes, and generates documentation.

ReACT Pattern dbt Data Quality AI Agents

Contents

  1. The Problem: Manual Data Quality Debugging
  2. The Agentic Solution
  3. The ReACT Pattern at Work
  4. Architecture and Tools
  5. The Investigation Flow
  6. Worked Example
  7. What Makes It Truly Agentic
  8. Extension: Automated Merge Requests
  9. Results and Impact
  10. Limitations and Honest Assessment
  11. Lessons Learned
  12. Future Directions

The Problem: Manual Data Quality Debugging

Data quality failures are inevitable in modern data warehouses. A uniqueness test fails at 3 AM. A null constraint is violated in production. A relationship test breaks downstream pipelines. The traditional response? A data engineer wakes up, starts investigating, and begins a manual process that can take hours.

Tools like dbt have made data testing straightforward. You define tests in YAML, run them, and get pass/fail results. But when a test fails, the real work begins:

  1. Understanding the failure — Which table? Which column? What constraint was violated?
  2. Finding the data — Query the warehouse to see the actual failing records.
  3. Tracing lineage — Where did the bad data come from? Which upstream table?
  4. Multi-level investigation — Often the root cause isn't in the immediate upstream table, but 2–3 hops back.
  5. Root cause analysis — Logic bug? Source data issue? Join explosion?
  6. Solution design — Deduplication? Null handling? Join refinement?
  7. Implementation — Modify SQL, test locally, verify the fix.
  8. Documentation — Record findings so knowledge isn't lost.

This process is time-consuming (several hours per investigation), context-heavy (requires understanding data models, business logic, and SQL), error-prone (easy to miss upstream causes), and knowledge-siloed (investigation insights live in someone's head or scattered across Slack threads).

The question:

What if an AI agent could handle the entire investigation autonomously—tracing data lineage through multiple tables, identifying root causes, implementing fixes, and generating comprehensive documentation?

The Agentic Solution

We built a multi-tool agentic workflow, grounded in the ReACT (Reasoning + Acting) pattern, that handles dbt test failure investigation end to end. Given a failing test name, the agent autonomously investigates the failure, traces causation through the data lineage, designs a context-appropriate fix, implements it, and generates a comprehensive investigation report.

The key insight is that debugging data quality failures follows a reasoning loop, not a fixed script. Each finding informs what to investigate next. The depth of investigation varies. The solution depends on what the agent discovers along the way. This is a natural fit for an agentic approach.

The ReACT Pattern at Work

The workflow follows the ReACT framework (Yao et al., 2022): the agent iterates through cycles of Thought, Action, and Observation until it reaches a resolution.

The ReACT Loop
THOUGHT Reason about the current situation. Decide next step. ACTION Use a tool: parse manifest, query data, trace lineage, read SQL. OBSERVATION Process tool output. Update understanding. Enough info? Loop or act. Repeat until root cause is found and fix is implemented

Each iteration looks like this:

Thought

"This is a uniqueness test failure on a surrogate key. I need to find which records are duplicated and why."

Action

Query the warehouse for duplicate values of the surrogate key column.

Observation

"Found 2 complete row duplicates. The rows are identical. Need to understand how they were generated."

The agent doesn't execute a predetermined script. It reasons about findings, decides what to do next, and terminates when it has enough evidence—whether that takes 3 iterations or 12.

Architecture and Tools

The system comprises three custom Python tools and an AI agent orchestrator:

System Architecture
AI Agent (LLM + Skill Prompt) Orchestrates the ReACT loop Manifest Parser Extracts test metadata: type, model, column, deps Warehouse Query Executes SQL against the data warehouse SQL Lineage Tracer Traces column origins through CTEs & refs IDE File Operations Read SQL models | Edit code | Write reports Code Fix Implemented HTML Investigation Report Human Review

Tool Details

Tool Input Output ReACT Role
Manifest Parser dbt manifest.json + test name Test type, target model, column, SQL file path, dependencies Perception
Warehouse Query SQL query string Structured query results (JSON/table) Investigation
SQL Lineage Tracer SQL file path + column name Upstream sources, CTE chain, column transformations Analysis
File Read/Write File path File contents / modified file Perception + Modification

The AI agent doesn't need specialized orchestration code. It operates within an AI coding assistant (like Cursor with Claude) guided by a detailed skill prompt—a structured document that teaches the agent the investigation methodology, tool usage patterns, solution design heuristics, and reporting format.

The Investigation Flow

When given a failing test name, here is the typical autonomous workflow. Note that the flow is not rigid—the agent may skip steps, repeat steps, or investigate different branches depending on what it finds.

1

Context Gathering

Action: Parse dbt manifest for the failing test.

Learns: Test type (unique/not_null/relationship), target model and column, SQL file location, upstream dependencies.

2

Data Investigation

Action: Query the warehouse for failing records.

Learns: Specific records that violate the constraint—duplicate keys, null values, orphaned foreign keys. Patterns in the data.

3

Code Analysis

Action: Read the SQL model file.

Learns: Transformation logic, join patterns, aggregation logic, existing deduplication attempts.

4

Lineage Tracing (Recursive)

Action: Trace the failing column through CTEs and upstream models.

Decision point: If the root cause isn't in this model, the agent follows the lineage upstream and repeats steps 2–4. It can trace 3–4 levels deep.

5

Root Cause Determination

Action: Synthesize findings across all investigated layers.

Determines: Duplicate source records? Join explosion? Missing aggregation? Logic bug?

6

Solution Design & Implementation

Action: Design a context-appropriate fix considering business logic, performance, and correctness. Implement the change in the SQL model file.

7

Documentation

Action: Generate a comprehensive HTML investigation report with executive summary, data flow diagrams, investigation queries with results, code changes, and verification queries.

Worked Example

To make this concrete, here's a simplified walkthrough of a real investigation.

Failing test: unique_order_items_item_sk

A uniqueness test on a surrogate key column in an order items model.

Iteration 1: Understand the test

Thought

A uniqueness test is failing on a surrogate key. I need to parse the manifest to find out which model and column.

Action

Run manifest parser → target model is order_items, column is item_sk, SQL file is located at models/mart/order_items.sql.

Observation

This is a uniqueness test on a surrogate key. Surrogate key duplicates usually mean rows that should be distinct are generating the same hash.

Iteration 2: Find the duplicates

Thought

Query the table for duplicated surrogate keys to see how many and which ones.

Action

Query: SELECT item_sk, COUNT(*) FROM order_items GROUP BY 1 HAVING COUNT(*) > 1 → 2 duplicate records found for key a3f8c1...

Observation

Only 2 duplicates, and the rows are completely identical—not just key collisions. This means the same source record is appearing twice.

Iteration 3: Analyze the SQL

Thought

Read the model SQL to understand how duplicates could emerge.

Action

Read SQL file → model uses UNION ALL combining 4 CTEs (one per data source), each with its own GROUP BY.

Observation

UNION ALL preserves duplicates. If any source CTE produces duplicates, they'll persist. I see a commented-out QUALIFY clause. Need to check upstream.

Iteration 4: Trace upstream

Thought

Trace the lineage for item_sk to find which source CTE the duplicate originates from.

Action

Lineage tracer → column composed from item_id, source_type, event_timestamp. One branch comes from the web_events CTE. Query that source → source has multiple records per item_id.

Observation

Root cause found: source table has duplicate records per item that produce identical surrogate keys. The GROUP BY in the CTE wasn't preventing this because it grouped on different columns.

Iteration 5: Design and implement fix

qualify
    row_number() over (
        partition by
            item_id,
            source_type,
            event_type,
            event_timestamp
        order by
            priority_rank,  -- Business logic: prefer higher priority
            event_id        -- Deterministic tie-breaker
    ) = 1

The agent chose QUALIFY ROW_NUMBER() over DISTINCT or GROUP BY because:

What Makes It Truly Agentic

This workflow goes beyond simple automation. Here are the characteristics that make it genuinely agentic:

1. Autonomous Goal Pursuit

The agent receives a high-level goal ("fix this test") and autonomously determines the investigation strategy. It doesn't follow a fixed script—it decides when to query upstream, how deep to trace, and when it has enough evidence to design a solution.

2. Dynamic Decision Making

Every investigation is different. The agent adapts its approach based on the test type:

3. Variable Investigation Depth

This is the critical agentic feature. Sometimes the root cause is in the immediate model. Sometimes it's 3–4 tables upstream. The agent doesn't stop at a fixed depth—it continues the ReACT loop until it finds the actual root cause.

Variable Depth Investigation
Level 0 Level 1 Level 2 Failing Model Upstream A Upstream B Source Table Root cause here! Source Table Agent traces

4. Context-Aware Solution Design

The agent doesn't apply template fixes. It synthesizes everything it has learned to design a solution that:

5. Tool Orchestration

Each tool invocation is informed by the previous one. The agent chains tools dynamically:

Manifest"Uniqueness test on item_sk"Warehouse"2 duplicates found, here are the IDs"File Read"Model has UNION ALL with GROUP BYs"Lineage"Column comes from web_events CTE"Warehouse"Source has multiple records per ID"Reasoning"Need dedup based on SK components"Implement"QUALIFY ROW_NUMBER() applied"

Extension: Automated Merge Requests

The base workflow ends with a human review checkpoint: the agent implements the fix and creates documentation, then waits for approval. A natural extension is automating the merge request creation.

Extended Workflow with Merge Request Automation
Agent (Autonomous) Investigate ReACT loop Implement Fix Edit SQL model Document HTML report Create MR Branch, commit, push, MR with description + report Human (Review Only) Review MR: code + investigation + verification Approve or request changes Merge to Production

With this extension, the merge request becomes a self-contained work package: code changes, investigation rationale, data evidence, and verification queries all in one reviewable artifact.

The human-in-the-loop sweet spot:

The agent automates the tedious, investigative, mechanical work. Humans retain control of the critical decision: whether to merge to production. Not too early (human doesn't need to investigate), not too late (human doesn't blindly trust the agent).

Results and Impact

~10x
Investigation Speed
Hours → minutes per failure
100%
Documentation Rate
Every investigation produces a report
24/7
Coverage
Agent investigates at any hour
Searchable
Knowledge Base
Reports become institutional memory

Qualitative Improvements

Limitations and Honest Assessment

This workflow is not a silver bullet. Being clear about limitations is important for setting expectations.

What it handles well

What it struggles with

Cost considerations

Challenges Overcome in Development

ChallengeSymptomSolution
Infinite recursion Agent traces lineage endlessly Max depth limits + "found source table" termination
Query costs Full table scans on large tables Smart sampling, partition filters, LIMIT clauses
Solution overreach Agent refactors entire models Explicit "minimal targeted fix" guidance in prompt
False root causes Stops at surface symptoms Require verification queries that prove the fix

Lessons Learned

Tool quality is the foundation

The agent is only as good as its tools. We invested heavily in the manifest parser, lineage tracer, and query executor before building the agent logic. Tools that return well-structured JSON/tables enable much better agent reasoning than free-text outputs.

Prompt engineering scales with complexity

The skill prompt is several hundred lines because the agent needs detailed guidance on when to stop investigating, how to structure reports, what makes a good solution, and how to handle edge cases. This isn't a limitation—it's how you encode domain expertise into an agentic system.

Documentation is a first-class output

The HTML investigation reports aren't afterthoughts. They're how humans understand agent decisions, how knowledge gets preserved, and how auditing happens. Treating documentation as a core deliverable (not a nice-to-have) changed how we designed the entire workflow.

Sandbox-first development

Early in development, the agent occasionally proposed aggressive fixes. Having it work in a dev environment first—and requiring verification queries before declaring success—was essential for building confidence.

The right human checkpoint

Full autonomy isn't the goal. The agent automates investigation, analysis, and implementation. Humans retain judgment over production changes. This division of labor plays to each party's strengths.

Future Directions

Self-Correction Loop

The agent implements a fix, runs the test in a sandbox, and if it still fails, analyzes why and tries an alternative approach. This moves from single-shot fixes to iterative problem-solving.

Learning from History

When a new failure occurs, the agent searches past investigation reports for similar patterns: "This looks like investigation #47: a duplicate UNION ALL issue." It applies proven solution patterns rather than investigating from scratch.

Proactive Detection

Instead of reacting to failures, the agent monitors test runs for flakiness trends—tests that intermittently fail or show degrading data quality—and investigates before a full failure occurs.

Multi-Agent Collaboration

Specialized sub-agents handle different aspects of investigation: a SQL analysis agent reviews query patterns, a data profiling agent analyzes distributions, a testing agent generates comprehensive test cases. A coordinator agent delegates and synthesizes their findings.

· · ·

Conclusion

Building an agentic workflow for dbt test debugging transformed data quality incidents from multi-hour firefighting into autonomous, well-documented resolutions. The workflow exhibits key agentic characteristics:

The future isn't about replacing data engineers. It's about elevating them from reactive debugging to proactive system design. Agentic workflows make that shift possible today.

References

Yao, S. et al. (2022). "ReAct: Synergizing Reasoning and Acting in Language Models." arXiv:2210.03629.

This article describes a real agentic workflow built for automated data quality debugging.
The architecture and tools described are production-tested.