Data Quality Is a Product, Not a Script
Treating data quality as a technical checklist misses the point. Quality is a product feature that requires product thinking—design, iteration, and user focus.
In my experience, data quality initiatives most often fail because they're approached as technical exercises rather than as product development.
I've seen teams build comprehensive data quality scripts, run them daily, and generate detailed reports—only to discover that nobody acts on the results. The scripts work perfectly. The quality problems persist.
This happens because data quality is treated as a compliance checkbox instead of a product that serves users.
The Script Mentality
The script mentality looks like this:
- Write SQL checks for common issues (nulls, duplicates, outliers)
- Schedule them to run daily
- Email reports to data engineers
- Assume problems get fixed
This approach creates technical artifacts (scripts, reports) without addressing the human and organizational factors that determine whether quality actually improves.
Product Thinking for Data Quality
Data quality as a product means:
- Users: Who needs quality signals, and what do they need to do with them?
- Value: What decisions depend on quality, and what's the cost of poor quality?
- Experience: How do people discover, understand, and act on quality issues?
- Iteration: How do we improve quality over time based on user feedback?
This shifts the focus from "did our script run?" to "did we prevent a bad decision?"
Designing Quality as a Product
Here's the framework I use:
1. Define Quality Dimensions
Not all quality issues matter equally. Define what matters for your use case:
- Completeness: Are required fields populated?
- Accuracy: Does data match reality?
- Timeliness: Is data available when needed?
- Consistency: Do values align across systems?
- Validity: Do values conform to expected formats/ranges?
Prioritize based on business impact, not technical elegance.
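As a minimal sketch of how dimensions and priorities might be expressed, the example below scores three of the dimensions above on a small in-memory dataset and combines them with business-impact weights. The field names, sample rows, 24-hour freshness window, and weights are all assumptions made for illustration. Accuracy and consistency are left out because they need a second source to compare against.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sample data: field names and values are illustrative only.
NOW = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
ORDERS = [
    {"customer_id": "c1", "amount": 40.0, "loaded_at": NOW - timedelta(hours=1)},
    {"customer_id": None, "amount": 15.5, "loaded_at": NOW - timedelta(hours=2)},
    {"customer_id": "c3", "amount": -5.0, "loaded_at": NOW - timedelta(hours=30)},
    {"customer_id": "c4", "amount": 99.0, "loaded_at": NOW - timedelta(hours=48)},
]


def completeness(rows, field):
    """Share of rows where a required field is populated."""
    return sum(r[field] is not None for r in rows) / len(rows)


def validity(rows, field, predicate):
    """Share of rows whose value passes a format or range predicate."""
    return sum(r[field] is not None and predicate(r[field]) for r in rows) / len(rows)


def timeliness(rows, field, now, max_age):
    """Share of rows that arrived within the allowed age."""
    return sum(now - r[field] <= max_age for r in rows) / len(rows)


# Assumed weights: in this made-up use case a missing customer hurts the
# business more than a stale row, so completeness counts the most.
WEIGHTS = {"completeness": 3, "validity": 2, "timeliness": 1}

scores = {
    "completeness": completeness(ORDERS, "customer_id"),
    "validity": validity(ORDERS, "amount", lambda amount: amount >= 0),
    "timeliness": timeliness(ORDERS, "loaded_at", NOW, timedelta(hours=24)),
}

# Weighted average across dimensions, so priorities are explicit in code.
weighted_score = sum(WEIGHTS[d] * scores[d] for d in scores) / sum(WEIGHTS.values())

if __name__ == "__main__":
    print(scores, weighted_score)
```

The weights are the part worth arguing about with stakeholders; the scoring functions themselves are deliberately trivial.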
2. Build for Action
Every quality check should have a clear action path:
- Alert: Who needs to know, and how urgent is it?
- Triage: What's the severity and impact?
- Remediation: What's the process for fixing it?
- Prevention: How do we stop it from recurring?
If a quality check doesn't lead to action, it's noise, not signal.
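One way to enforce this is to make the action path part of the check definition itself, so a check without an owner and a runbook is visibly incomplete. The sketch below is an assumption about how that could look; the class names, severity levels, channels, and runbook paths are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    LOW = 1
    HIGH = 2
    CRITICAL = 3


@dataclass(frozen=True)
class ActionPath:
    owner: str          # Alert: who needs to know
    severity: Severity  # Triage: how urgent and how impactful
    runbook: str        # Remediation: where the fix process lives
    prevention: str     # Prevention: how to stop it from recurring


@dataclass(frozen=True)
class QualityCheck:
    name: str
    action: Optional[ActionPath] = None


def split_signal_from_noise(checks):
    """Separate checks that have an action path from those that do not."""
    signal = [c for c in checks if c.action is not None]
    noise = [c for c in checks if c.action is None]
    return signal, noise


def route(check):
    """Build an alert payload; critical checks page, the rest open a ticket."""
    if check.action is None:
        raise ValueError(f"{check.name} has no action path; treat it as noise")
    channel = "page" if check.action.severity is Severity.CRITICAL else "ticket"
    return {
        "check": check.name,
        "notify": check.action.owner,
        "channel": channel,
        "runbook": check.action.runbook,
    }


# Hypothetical checks for illustration.
CHECKS = [
    QualityCheck(
        "orders_missing_customer_id",
        ActionPath(
            owner="orders-team",
            severity=Severity.CRITICAL,
            runbook="runbooks/orders-missing-customer.md",
            prevention="add NOT NULL constraint upstream",
        ),
    ),
    QualityCheck("legacy_table_row_count"),  # nobody owns this one
]

signal, noise = split_signal_from_noise(CHECKS)
```

Reviewing the `noise` list periodically is a cheap way to decide which checks to assign an owner and which to delete.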
3. Surface Quality Where Decisions Happen
Quality signals need to be visible where people make decisions:
- Dashboards: Show quality metrics alongside business metrics
- BI Tools: Flag reports when underlying data quality is degraded
- Data Catalogs: Display quality scores at the dataset level
- APIs: Return quality metadata with data responses
This moves quality from "separate reports" to "integrated context."
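For the API case, a rough sketch is a response envelope that carries quality metadata next to the payload, so consumers see the caveat in the same place they read the data. The envelope shape, field names, and the simple pass-rate score below are assumptions, not a standard.

```python
import json


def with_quality(data, failed_checks, total_checks, checked_at):
    """Wrap a payload with quality metadata (hypothetical envelope shape)."""
    if total_checks <= 0:
        raise ValueError("total_checks must be positive")
    # Assumed scoring rule: share of checks that passed.
    score = round(1 - len(failed_checks) / total_checks, 2)
    return {
        "data": data,
        "quality": {
            "score": score,
            "status": "ok" if not failed_checks else "degraded",
            "failed_checks": list(failed_checks),
            "checked_at": checked_at,
        },
    }


# Illustrative values only.
response = with_quality(
    data=[{"region": "EU", "revenue": 1200}],
    failed_checks=["freshness_24h"],
    total_checks=4,
    checked_at="2024-01-02T12:00:00Z",
)

if __name__ == "__main__":
    print(json.dumps(response, indent=2))
```

A dashboard or BI tool could read the same `quality.status` field to decide whether to show a warning banner on a report.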
4. Iterate Based on Outcomes
Measure the effectiveness of your quality product by its outcomes:
- Are fewer bad decisions being made?
- Are issues caught earlier in the pipeline?
- Is remediation faster?
- Are stakeholders more confident in data?
Use these metrics to refine your quality checks, alerts, and processes.
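Some of these outcomes can be computed from an incident log, assuming you keep one. The sketch below derives median time to detect, median time to remediate, and the share of issues caught in the pipeline rather than by a consumer. The log format and the sample incidents are made up for illustration; stakeholder confidence would need a survey or similar and is not covered here.

```python
from datetime import datetime
from statistics import median

# Hypothetical incident log: timestamps and categories are illustrative.
INCIDENTS = [
    {
        "occurred": datetime(2024, 1, 1, 8, 0),
        "detected": datetime(2024, 1, 1, 9, 0),
        "resolved": datetime(2024, 1, 1, 13, 0),
        "caught_by": "pipeline",
    },
    {
        "occurred": datetime(2024, 1, 3, 10, 0),
        "detected": datetime(2024, 1, 3, 16, 0),
        "resolved": datetime(2024, 1, 4, 16, 0),
        "caught_by": "consumer",
    },
    {
        "occurred": datetime(2024, 1, 5, 7, 0),
        "detected": datetime(2024, 1, 5, 7, 30),
        "resolved": datetime(2024, 1, 5, 9, 30),
        "caught_by": "pipeline",
    },
]


def hours(delta):
    """Convert a timedelta to hours."""
    return delta.total_seconds() / 3600


def outcome_metrics(incidents):
    """Summarize detection speed, remediation speed, and where issues surface."""
    return {
        "median_hours_to_detect": median(
            hours(i["detected"] - i["occurred"]) for i in incidents
        ),
        "median_hours_to_remediate": median(
            hours(i["resolved"] - i["detected"]) for i in incidents
        ),
        "share_caught_in_pipeline": sum(
            i["caught_by"] == "pipeline" for i in incidents
        ) / len(incidents),
    }


metrics = outcome_metrics(INCIDENTS)

if __name__ == "__main__":
    print(metrics)
```

Tracking these numbers per quarter shows whether changes to checks and alerts are moving outcomes, not just check counts.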
The Quality Product Roadmap
Start simple, then iterate based on feedback:
V1.0: Critical Issues Only
- Focus on quality problems that cause immediate business impact
- Basic alerts to data owners
- Manual remediation processes
V2.0: Visibility
- Quality dashboards for stakeholders
- Integration with BI tools
- Automated triage and routing
V3.0: Prevention
- Quality gates in pipelines
- Automated remediation where possible
- Predictive quality monitoring
Each version delivers value while gathering feedback for the next iteration.
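To make the V3.0 idea of a quality gate concrete, here is a minimal sketch: a batch is published only if every blocking check passes, otherwise the pipeline step fails before bad data reaches consumers. The function names, the check signatures, and the in-memory list standing in for a sink are assumptions for illustration.

```python
class QualityGateError(Exception):
    """Raised when a batch fails one or more blocking quality checks."""


def quality_gate(rows, checks):
    """Return rows unchanged if all checks pass; raise otherwise."""
    failures = [name for name, check in checks.items() if not check(rows)]
    if failures:
        raise QualityGateError("blocked by: " + ", ".join(failures))
    return rows


def publish(rows, checks, sink):
    """Append a batch to the sink only after it clears the gate."""
    sink.extend(quality_gate(rows, checks))


# Hypothetical blocking checks for an orders batch.
BLOCKING_CHECKS = {
    "non_empty": lambda rows: len(rows) > 0,
    "customer_id_present": lambda rows: all(
        r.get("customer_id") is not None for r in rows
    ),
}

warehouse = []  # stand-in for a real destination table

good_batch = [{"customer_id": "c1"}, {"customer_id": "c2"}]
bad_batch = [{"customer_id": "c3"}, {"customer_id": None}]

publish(good_batch, BLOCKING_CHECKS, warehouse)

try:
    publish(bad_batch, BLOCKING_CHECKS, warehouse)
    blocked = None
except QualityGateError as exc:
    blocked = str(exc)
```

Only checks with a clear business impact should block a pipeline; lower-severity checks are usually better routed as alerts so the gate does not become a source of outages itself.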
Common Pitfalls
Avoid these product anti-patterns:
- Over-engineering: Building comprehensive quality frameworks before understanding what matters
- Siloed quality: Keeping quality checks separate from data consumption
- One-size-fits-all: Applying the same quality standards to all data regardless of use case
- Set-and-forget: Building quality checks and never revisiting them
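As a sketch of how to avoid the one-size-fits-all pitfall, quality standards can be declared per use case and the same measurements evaluated against each. The use case names and thresholds below are invented for illustration.

```python
# Hypothetical standards: the same dataset can be fit for one use and not another.
STANDARDS = {
    "regulatory_reporting": {"min_completeness": 0.99, "max_staleness_hours": 24},
    "exploratory_analysis": {"min_completeness": 0.90, "max_staleness_hours": 168},
}


def meets_standard(measured, use_case):
    """Check measured quality against the thresholds for one use case."""
    standard = STANDARDS[use_case]
    return (
        measured["completeness"] >= standard["min_completeness"]
        and measured["staleness_hours"] <= standard["max_staleness_hours"]
    )


# Illustrative measurements for a single dataset.
measured = {"completeness": 0.95, "staleness_hours": 36}

fitness = {use_case: meets_standard(measured, use_case) for use_case in STANDARDS}

if __name__ == "__main__":
    print(fitness)
```

Keeping the thresholds in one reviewed place also helps with set-and-forget: revisiting the table is a natural prompt to ask whether each standard still matches how the data is used.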
The Bottom Line
Data quality isn't a technical problem to solve with scripts. It's a product challenge: understanding user needs, designing for action, and iterating based on outcomes.
Treat it like a product, and you'll build quality systems that actually improve decisions. Treat it like a script, and you'll have impressive technical artifacts that nobody uses.
The choice determines whether your data quality initiative succeeds or fails.