No matter where I work or what data project I've worked on - there's always SQL or some variation of it.

BigQuery. PostgreSQL. MySQL. Metabase.

One way or another, you need to type SELECT * FROM to extract data.

Why? Because data speaks SQL-ish.

And you need to master it.

Today, let's talk about where to start, what to focus on, and what to skip.

How I Actually Learnt SQL

I learnt SQL at work. I used to work with SSRS reports.

But every time I needed some tiny change - an extra column to add, extra grouping, or a calculation - I had to raise a request and then wait for it to be developed. Just to realise I forgot to mention that breakdown should be daily not monthly.

That bottleneck drove me mad.

So I decided to learn SQL myself. I started by recreating a report I already knew, just to make sure I could get the data correctly.

My colleague suggested W3Schools. I opened it on one screen, opened SQL on another, and typed SELECT * FROM table_name LIMIT 1000.

Then came a lot of frustration.

I couldn't understand how to aggregate data. What "ambiguous column" meant. Or why "at least one column must appear in GROUP BY."

I thought I'd never master this.

But years passed. Now it feels very natural.

Where to Start: The Essentials

Here's what you'll actually use constantly:

1. SELECT, FROM, WHERE, GROUP, ORDER, COUNT, DISTINCT

You'll use these in every single query.

SELECT DISTINCT customer_id
FROM orders

This is your foundation. Master this before moving on.

Try count(*) and then count(distinct column_name) compare the results, it helps to understand the difference.

2. CASE WHEN

Very useful for adding logic to your queries.

SELECT 
    customer_id,
    CASE 
        WHEN total_purchases > 1000 THEN 'Premium'
        WHEN total_purchases > 500 THEN 'Standard'
        ELSE 'Basic'
    END as customer_segment
FROM customers

3. Date functions

Working with dates is unavoidable in analytics.

Learn DATE_TRUNC, DATEADD, DATEDIFF (syntax varies by SQL flavour, but concepts are the same).

I use datediff all the time to calculated customers age - it is datediff between current date/snapshot date and DOB.

4. JOINs - Essential

Data lives in separate tables. You need to combine them.

SELECT 
    c.customer_name,
    o.order_amount
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id

Most common: LEFT JOIN (keeps all records from left table), exactly as VLOOKUP in Excel

5. UNION

Not used as often as JOINs, but very handy.

Example: At work, I need to get data for iOS users and Android users, then stack them on top of each other to analyse.

SELECT user_id, platform, activity
FROM ios_users
UNION ALL
SELECT user_id, platform, activity
FROM android_users

Key difference:

  • UNION adds rows (stacks data vertically)

  • JOIN adds columns (like VLOOKUP in Excel)

What I Wish I'd Known Earlier

CTEs (Common Table Expressions)

I didn't know about CTEs for years. Instead, I created temporary tables.

CREATE TABLE temp_data... then DROP TABLE temp_data.

This works, but it's memory-heavy and messy.

CTEs are cleaner:

WITH monthly_sales AS (
    SELECT 
        DATE_TRUNC('month', order_date) as month,
        SUM(order_amount) as total_sales
    FROM orders
    GROUP BY DATE_TRUNC('month', order_date)
)
SELECT * 
FROM monthly_sales
WHERE total_sales > 10000

Why CTEs are better:

  • Easy to read

  • No cleanup needed

  • Can reference earlier CTEs in later ones

  • Perfect for breaking down complex queries

You need to master these.

Window Functions - A Must

These are incredibly powerful for analytics.

Most of the time, I use ROW_NUMBER().

SELECT 
    customer_id,
    order_date,
    order_amount,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) as order_number
FROM orders

This shows which order number each purchase was for each customer (1st order, 2nd order, etc.). I use it to find the latest address for a customer, or very first purchase date. Sometimes you can get away with MAX() or MIN(), but for complex cases you will have to use Window Functions

Window functions are essential for:

  • Ranking

  • Running totals

  • Comparing to previous periods

  • Cohort analysis

What You Can Learn Later

CREATE, DELETE, TRUNCATE, ALTER TABLE

It's good to know these exist. But you'll rarely use them. Also DELETE is very dangerous.

I was told a story, how a person ALTERED table forgetting to use WHERE, so he updated passwords for all the users in the table instead of a specific user.

GRANT, COMMIT, ROLLBACK

I'll be honest: I only learnt GRANT and COMMIT last year, and not knowing this didn’t effect my career.

After 11 years of writing SQL.

So be ready to learn throughout your entire career. There's always something new.

And don't be afraid to ask questions. There's always something to learn from your colleagues.

How to Start Today

Here's the practical path:

1. Download a database system

  • MS SQL Server

  • MySQL

  • PostgreSQL

Pick one. They're all free. The syntax is 95% the same.

2. Get a workbench

You need something to write and run queries.

I use:

  • MS SQL Server Management Studio (for MS SQL)

  • DBeaver (my new favourite - works with everything, and I'm actually enjoying it)

Both are free.

3. Download a dataset

Go to Kaggle and download a database. Look for:

  • E-commerce data

  • Sales data

  • Customer data

Something with multiple tables so you can practice JOINs.

4. Start querying

Don't just read tutorials. Actually write queries.

Start with:

  • SELECT * FROM table_name LIMIT 100 (see what's in the table)

  • Count rows: SELECT COUNT(*) FROM table_name

  • Group by something: SELECT category, COUNT(*) FROM products GROUP BY category

  • Join two tables

  • Add a CASE statement

  • Try a CTE

Make mistakes. Debug them. That's how you learn.

Resources That Actually Helped Me

W3Schools SQL Tutorial - This is what I had open on my screen constantly. Free, practical examples.

Mode Analytics SQL Tutorial - Business-focused, great for analysts.

SQLZoo - Practice problems to test yourself.

Your actual work - Honestly, this is where I learnt most. Real data, real problems, real deadlines.

The Frustration Is Normal

When I started, I couldn't understand why my GROUP BY queries kept failing.

I didn't understand the difference between WHERE and HAVING.

I wrote INNER JOIN when I needed LEFT JOIN and couldn't figure out why data was missing.

This is normal.

SQL has a learning curve. The syntax feels backwards at first (you write SELECT first but it executes last).

But it clicks. I promise.

One day, you'll write a complex query with CTEs and window functions without thinking about it.

And you'll remember when SELECT * felt intimidating.

I've been writing SQL for many years. I still Google syntax and ask ChatGPT to debug my query/optimise it. I still learn new functions. I still ask colleagues questions.

Download PostgreSQL. Get DBeaver. Grab a Kaggle dataset.

Type SELECT * FROM and start exploring.

Keep pushing 💪,

Karina

Python Tip

When I needed to analyse huge CSV files Pandas crash. In that case I use DuckDB.

DuckDB queries files directly.

No loading. No memory issues. Just pure SQL speed.

Works on:

CSV files (any size)

Parquet files

Multiple files at once

Even pandas DataFrames

Grab your freebies if you haven’t done already:

Data Playbook (CV template, Books on Data Analytics and Data Science, Examples of portfolio projects)

Need more help?

Just starting with Python? Wondering if programming is for you?

Master key data analysis tasks like cleaning, filtering, pivot and grouping data using Pandas, and learn how to present your insights visually with Matplotlib with ‘Data Analysis with Python’ masterclass.

Building your portfolio?
Grab the Complete EDA Portfolio Project — a full e-commerce analysis (ShopTrend 2024) with Python notebook, realistic dataset, portfolio template, and step-by-step workflow. See exactly how to structure professional portfolio projects.

Grab your Pandas CheatSheet here. Everything you need to know about Pandas - from file operations to visualisations in one place.

More from me: YouTube | TikTok | Instagram | Threads | LinkedIn

Data Analyst & Data Scientist

Keep Reading

No posts found