- Karina Datascientist's Newsletter
- Posts
- When Your SQL Query "Works" But Everything Is Wrong
When Your SQL Query "Works" But Everything Is Wrong
This week, I've been debugging a SQL query that runs perfectly. No errors. Executes in seconds. Returns results.
There's just one problem: it returns 148 rows.
January this year, the same query returned 1.5 million rows.
Something is very, very wrong.
And this is the part of being a data analyst that no bootcamp, course, or tutorial prepares you for.
The Story: When "Working" Code Is Actually Broken
Let me set the scene: I'm working with a long, complex query that creates a specific table. This query depends on multiple intermediate tables, each built from other queries.
It's one of those queries where you inherit someone else's work, or your own work from months ago, and you're just running it again with updated data.
The query executes. No syntax errors. No obvious issues. Everything looks fine.
Until you look at the output.
148 rows.
I stare at the number. Run the table for January 2025. 1.5 million rows.
That's not a small difference. That's not "oh, the business slowed down a bit." That's "something is fundamentally broken."
But the query works. The code is fine. So what's wrong?
The Investigation Begins
When I was just starting to work with SQL, my ultimate goal was to write a query that executes. No errors, no syntax issues — just get it to run.
What I wasn't prepared for is what to do when your technically correct query returns completely incorrect results.
This is where the real work begins.
My debugging process:
Step 1: Confirm the problem
Pull historical results to compare
Check if business reality actually changed this dramatically
Step 2: Break down the complex query
This isn't a simple SELECT statement
It's a chain of intermediate tables, each depending on the previous one
Multiple CTEs, multiple JOINs, filters, aggregations
Step 3: Debug one table at a time
When debugging complex queries, you have two approaches:
Option 1: Work backwards
Start with the final output table: 148 rows (wrong)
Go back one step: Check the table that feeds into it
Go back another step: Check that table's input
Keep going backwards until you find where the numbers diverge
Option 2: Work forwards
Start from the very beginning
Run the first query/subquery and compare to previous period
Then the next one, comparing each step
Keep going forward until you spot where it breaks
What I found: The culprit is an input table. One of the source tables feeding into this complex query.
And I'm still investigating exactly what changed and why, wish me luck!
This is the reality of being a data analyst:
Understanding your data:
Where does it come from?
How is it structured?
What does each field actually mean?
What are the business rules?
Validating your results:
Does this number make business sense?
How does it compare to previous periods?
Should this metric be going up or down?
What's normal vs. what's a red flag?
Detective work:
When something's wrong, tracing back through the pipeline
Understanding dependencies between tables
Figuring out what changed and why
Business logic:
Understanding what the number SHOULD roughly be (Feel for numbers comes with time. You will roughly know how many customers you have, what monthly revenue is etc)
Asking "does this make sense?" constantly
This is the job. The SQL is just the tool.
How to Develop Business Logic Intuition
1. Always know the rough order of magnitude
Before running a query, ask yourself:
What do I expect to see? (Roughly)
Hundreds? Thousands? Millions? - check previous periods, old files/presentations, ask colleagues
You don't need to know the exact number. But you should know if you're expecting thousands vs. millions.
2. Compare to previous periods
Always. Every time. Without exception.
Same query last month?
Same query last quarter?
Same query last year?
Another story from this week, I have a report that in Jan-Oct’24 was showing ~ 24k users, Nov’24 - 950 users, Dec’24 - 60 users. The business hasn’t changed, so….
3. Understand the business context
Did a promotion end? (Expect sales to drop)
Did we launch in a new country? (Expect new customers)
Is this our busy season? (Expect higher numbers)
Did we stop offering a product? (Expect that product's sales to be zero)
Changes should map to business reality.
4. Sanity check aggregations
Does total revenue = sum of all transactions?
Do regional sales add up to national sales?
Is average between min and max?
Are percentages between 0 and 100?
These seem obvious, but you'd be surprised how often they catch errors.
For example, the other day I did sum(sales_jan) as jan, sum(sales_jan) as oct - as you can guess my Jan and Oct were suspiciously the same. Sometimes it is important to step back and check your numbers once again.
5. Check for silent data loss
This is what happened to me this week.
Places where data can disappear without errors:
JOINs that exclude unmatched rows
WHERE clauses that filter too aggressively
Data type mismatches
Changed table structures
Incomplete data loads
Always compare row counts at each step.
Here's what I wish someone had told me when I started:
You will spend more time debugging than writing new code.
You will spend more time understanding data than analysing it.
You will question your results more than you trust them.
And this is normal. This is the job.
As I write this, I'm still debugging that query. My query isn't "broken" in the traditional sense. It runs. It returns results.
It's just returning the wrong results.
The next time your query runs successfully, don't stop there.
Ask yourself:
Does this make sense?
How does this compare to last time?
What story is this data telling, and is it a believable story?
Because "working" code isn't the goal.
Correct code is the goal.
And sometimes, you're the only one who can tell the difference.
Keep pushing 💪
Karina
SQL Tip
If you're copy-pasting the same query 4 times just to change one value, it works... but there's a better way.
SELECT * FROM Orders WHERE Region = 'West';
SELECT * FROM Orders WHERE Region = 'East';
SELECT * FROM Orders WHERE Region = 'South';
SELECT * FROM Orders WHERE Region = 'Central';Copy. Paste. Change value. Repeat. It works. But it's painful.
The better way: Use a variable
Change the region? Change one line. Done.
DECLARE @Region VARCHAR(50) = 'West';
SELECT *
FROM Orders
WHERE Region = @Region;No more find-and-replace across 50 lines
Change values in one place
Easier to test different scenarios
Cleaner stored procedures
Reusable scripts
Grab your freebies if you haven’t done already:
Data Playbook (CV template, Books on Data Analytics and Data Science, Examples of portfolio projects)
Need more help?
Just starting with Python? Wondering if programming is for you?
Master key data analysis tasks like cleaning, filtering, pivot and grouping data using Pandas, and learn how to present your insights visually with Matplotlib with ‘Data Analysis with Python’ masterclass.
Building your portfolio?
Grab the Complete EDA Portfolio Project — a full e-commerce analysis (ShopTrend 2024) with Python notebook, realistic dataset, portfolio template, and step-by-step workflow. See exactly how to structure professional portfolio projects.
Grab your Pandas CheatSheet here. Everything you need to know about Pandas - from file operations to visualisations in one place.
![]() | Data Analyst & Data Scientist |
