The Data Science Roadmap: What to Actually Learn

"I want to become a data scientist. Where do I start?"

I get this question constantly. And I get it—data science sounds exciting. Machine learning! AI! Predictive models!

Here's the roadmap that actually makes sense, based on working as both a data analyst and data scientist for over 11 years.

The Reality Check First

Before we dive into the roadmap, let's be honest about something:

Data science isn't always more advanced than data analytics. It's just different.

Data analysts answer "what happened and why?" with data. Data scientists predict "what will happen?" using models.

Both require:

  • Strong analytical thinking

  • Business understanding

  • Communication skills

  • Technical proficiency

The difference is in the tools and the questions you're answering, not in who's "smarter."

"Can I Skip Data Analytics and Go Straight to Data Science?"

I get asked this often. The answer is: Yes, but...

You can absolutely skip having "Data Analyst" as a job title. You don't need to spend 2-3 years as an analyst before becoming a data scientist.

However—and this is crucial—you CANNOT skip data analytics skills.

Here's why:

Data scientists spend 60-80% of their time doing analyst work:

  • Cleaning messy data

  • Exploring datasets to understand patterns

  • Validating data quality

  • Creating visualisations

  • Communicating findings to stakeholders

The machine learning part? That's maybe 20-40% of the job.

What happens when people skip analyst skills:

I've interviewed data scientists who:

  • Could build complex neural networks

  • Couldn't write a proper SQL JOIN

  • Built models on data they didn't understand

  • Couldn't explain their findings to stakeholders

  • Had no idea if their results made business sense

They didn't get the job.

The skills you absolutely need (regardless of title):

  • Data manipulation - SQL and Python/pandas. Non-negotiable.

  • Exploratory data analysis - Understanding your data before modelling

  • Data visualisation - Communicating insights clearly

  • Business context - Knowing what questions matter and why

  • Stakeholder communication - Explaining technical work to non-technical people

So yes, you can go straight for data science roles. But you'll need to learn all the analyst skills along the way. There's no shortcut around understanding data fundamentally.

My recommendation? Learn data analytics skills FIRST, then build up to machine learning. Not because you need the job title, but because you need the foundation.
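To make the "proper SQL JOIN" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables, their columns, and the values are all made up for illustration. It shows a LEFT JOIN that keeps customers with no orders, which is a classic place where an INNER JOIN would silently drop rows.

```python
import sqlite3

# In-memory database with two tiny, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cara');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0);
""")

# LEFT JOIN keeps every customer, including Cara, who has no orders.
# COUNT(o.order_id) ignores the NULLs the join produces for her row.
rows = conn.execute("""
    SELECT c.name,
           COUNT(o.order_id)          AS n_orders,
           COALESCE(SUM(o.amount), 0) AS total_spent
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.customer_id
""").fetchall()
conn.close()

for name, n_orders, total_spent in rows:
    print(name, n_orders, total_spent)
```

Swap LEFT JOIN for INNER JOIN and Cara disappears from the result. Spotting that kind of silent row loss is exactly the analyst skill interviewers look for.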

Are You Ready for Data Science?

You should have these foundations first:

  • Comfortable with data manipulation (SQL, Excel, or Python/pandas)

  • Understanding of basic statistics (mean, median, distributions, correlation)

  • Experience answering business questions with data

  • Ability to communicate findings clearly

Don't have these yet? Spend 2-3 months building analyst skills first. You can learn them without being employed as an analyst—through projects, courses, and practice.

Then move into machine learning. Your data science work will be noticeably better because of it.

Phase 1: The Maths You Actually Need

Everyone panics about maths. "Do I need a PhD in mathematics?!"

No. But you do need to understand the fundamentals.

Statistics and Probability - Non-Negotiable

  • Hypothesis testing (p-values, confidence intervals)

  • Probability distributions (normal, binomial, Poisson)

  • Regression (linear and logistic)

  • Understanding bias and variance

  • Overfitting vs underfitting

Why this matters: You need to know if your model is actually working or if it's just memorising data.
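Here is a rough sketch of what "memorising data" looks like in practice, assuming NumPy is installed. The data is synthetic (a straight line plus noise), and the polynomial degrees, sample sizes, and seed are arbitrary choices for illustration. The idea is to compare error on the training data with error on data the model has never seen.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable


def make_data(n):
    """Synthetic data: the true relationship is y = 2x + 1, plus noise."""
    x = rng.uniform(-1, 1, n)
    y = 2 * x + 1 + rng.normal(0, 0.3, n)
    return x, y


def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


x_train, y_train = make_data(12)   # small training set
x_test, y_test = make_data(200)    # unseen data from the same process

results = {}
for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = {
        "train": mse(coeffs, x_train, y_train),
        "test": mse(coeffs, x_test, y_test),
    }
    print(f"degree {degree}: train MSE = {results[degree]['train']:.3f}, "
          f"test MSE = {results[degree]['test']:.3f}")
```

The degree-9 model always matches the training points at least as closely as the straight line, because it has more freedom to bend. What matters is the test error: when it is much worse than the training error, the model has learned the noise rather than the pattern.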

Linear Algebra - Just Enough

  • Vectors and matrices

  • Matrix multiplication

  • Understanding what dimensions mean

Why this matters: Machine learning is matrix operations under the hood. You don't need to be an expert, but you need to understand what's happening.
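As a small illustration of "matrix operations under the hood", here is a sketch of how a linear model turns a table of features into predictions, assuming NumPy is installed. The numbers, the weights, and the bias are invented for the example.

```python
import numpy as np

# 3 rows (e.g. customers) x 2 columns (features). Made-up numbers.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

w = np.array([0.5, -1.0])  # one weight per feature (hypothetical values)
b = 2.0                    # bias / intercept

# (3, 2) @ (2,) -> (3,): one prediction per row.
predictions = X @ w + b

print(X.shape, w.shape, predictions.shape)
print(predictions)  # values: 0.5, -0.5, -1.5
```

The dimensions tell the story: the number of columns in X has to match the number of weights, and you get one output per row. Most "shape mismatch" errors you will hit later come down to this rule.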

Calculus - The Bare Minimum

  • Derivatives and gradients

  • Understanding optimisation

Why this matters: Models "learn" by minimising error through gradient descent. You need to understand the concept, not solve equations by hand.
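To show the concept without any library, here is a minimal gradient descent sketch in plain Python. It fits a single weight w in the model y = w * x on four made-up points where the true answer is w = 2. The learning rate and number of steps are arbitrary choices for this toy example.

```python
# Toy data: y is exactly 2 * x, so the best weight is 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def loss(w):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w):
    """Derivative of the loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


w = 0.0               # start from a deliberately bad guess
learning_rate = 0.01  # step size (an assumption; too large and it diverges)

for step in range(200):
    w -= learning_rate * gradient(w)  # step downhill, against the gradient

print(f"learned w = {w:.6f}, loss = {loss(w):.2e}")
```

The gradient says which direction increases the error, so each step moves the opposite way. Libraries do the same thing with millions of weights instead of one.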

How to learn: Khan Academy (free), StatQuest on YouTube, or 3Blue1Brown for visual understanding.

Don't overthink this phase. You need understanding, not expertise. Move on once you grasp the concepts.

Phase 2: Machine Learning Fundamentals

This is where it gets interesting.

Supervised Learning - Start Here

Regression:

  • Linear regression (you probably know this)

  • Polynomial regression

  • Regularisation (Ridge, Lasso)

Classification:

  • Logistic regression

  • Decision trees

  • Random forests

  • Support Vector Machines (SVM)

  • K-Nearest Neighbours (KNN)

Ensemble Methods:

  • Bagging

  • Boosting (XGBoost, LightGBM)

Unsupervised Learning

Clustering:

  • K-means

  • Hierarchical clustering

  • DBSCAN

Dimensionality Reduction:

  • PCA (Principal Component Analysis)

  • t-SNE

The Critical Skills Here:

  • Understanding when to use which algorithm

  • Feature engineering (creating useful inputs)

  • Model evaluation (accuracy, precision, recall, F1-score, ROC-AUC)

  • Cross-validation

  • Hyperparameter tuning
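To see why accuracy alone is not enough, here is a small plain-Python sketch that computes the main classification metrics by hand. The labels and predictions are invented. In real projects you would normally use the equivalents in sklearn.metrics.

```python
# 1 = positive class (e.g. "will churn"), 0 = negative. Made-up values.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(pairs)    # share of all predictions that are right
precision = tp / (tp + fp)           # of predicted positives, how many are real
recall = tp / (tp + fn)              # of real positives, how many we caught
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.3f}")
# accuracy=0.70 precision=0.60 recall=0.75 f1=0.667
```

Which metric matters depends on the business cost of each mistake: a false positive and a false negative are rarely equally expensive.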

What's more important than knowing algorithms? Knowing which algorithm to use and why.

I've seen people throw neural networks at problems that linear regression would solve perfectly.

Phase 3: Python for Machine Learning (Overlaps with Phase 2)

If you're coming from data analytics, you might know Python basics. Now you need the ML libraries.

Essential Libraries:

scikit-learn - Your Main Tool

  • Model training and evaluation

  • Pre-processing and feature scaling

  • Pipeline creation

  • Every common ML algorithm

pandas - Data Manipulation

  • You should already know this from analyst work

  • If not, learn it first

NumPy - Numerical Computing

  • Array operations

  • Mathematical functions

Matplotlib/Seaborn - Visualisation

  • Model performance visualisation

  • Feature importance plots

What to focus on:

  • Loading and preparing data

  • Splitting data (train/test/validation)

  • Training models

  • Evaluating performance

  • Tuning hyperparameters

  • Making predictions

Don't worry about: Building algorithms from scratch. No need to reinvent the wheel. Use the libraries. That's what they're for.
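The focus list above can be sketched end to end in a few lines, assuming scikit-learn is installed. This uses the built-in iris dataset, and the model, the parameter grid, and the split size are arbitrary choices for illustration rather than recommendations. Cross-validation inside GridSearchCV plays the role of the validation set here.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# 1. Load and prepare data.
X, y = load_iris(return_X_y=True)

# 2. Split: hold back a test set that tuning never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 3. Pipeline: scaling is learned from training folds only, avoiding leakage.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# 4. Train and tune with 5-fold cross-validation on the training data.
search = GridSearchCV(pipeline, param_grid={"model__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# 5. Evaluate once on the untouched test set.
test_accuracy = search.score(X_test, y_test)

# 6. Make predictions on new rows (here, the first five test rows).
predictions = search.predict(X_test[:5])

print("best params:", search.best_params_)
print(f"test accuracy: {test_accuracy:.3f}")
print("predictions:", predictions)
```

Putting the scaler inside the pipeline is the design choice worth copying: it stops information from the validation folds leaking into pre-processing.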

Phase 4: The Real-World Skills

Feature Engineering

Creating useful features from raw data. This is often more important than choosing the "perfect" algorithm.

Examples:

  • Creating date features (day of week, month, is_weekend)

  • Combining features (revenue = price × quantity)

  • Encoding categorical variables

  • Handling missing values intelligently
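Here is a short pandas sketch of the four examples above. The orders table, its column names, and its values are invented, and filling missing prices with the median is just one reasonable option among several.

```python
import pandas as pd

# Hypothetical orders table.
orders = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-06", "2024-01-08", "2024-01-14"]),
    "price": [10.0, 20.0, None],
    "quantity": [3, 1, 2],
    "channel": ["web", "store", "web"],
})

# Date features: Monday = 0 ... Sunday = 6.
orders["day_of_week"] = orders["order_date"].dt.dayofweek
orders["month"] = orders["order_date"].dt.month
orders["is_weekend"] = orders["day_of_week"] >= 5

# Missing values: keep a flag, then fill with the median price.
orders["price_missing"] = orders["price"].isna()
orders["price"] = orders["price"].fillna(orders["price"].median())

# Combined feature.
orders["revenue"] = orders["price"] * orders["quantity"]

# Encode the categorical column as one column per category.
features = pd.get_dummies(orders, columns=["channel"])

print(features)
```

Keeping the price_missing flag is deliberate: the fact that a value was missing can itself carry signal, and it stops the filled-in number from hiding the gap.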

Model Deployment

Your model is useless if it only runs on your laptop.

Learn:

  • Saving models (pickle, joblib)

  • Creating simple web apps (Streamlit or Gradio) - easiest way to demo your models

  • Basic API creation (Flask or FastAPI) - for integrating models into applications

  • Version control (Git/GitHub) - tracking your code and models

  • Understanding production constraints (speed, memory)

Optional but valuable:

  • Docker basics - packaging your model and dependencies

  • Cloud deployment (AWS, GCP, or Azure) - getting your model online

You don't need to be a DevOps expert, but you should understand how models get into production.
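The first step on that list, saving a model, can be sketched like this, assuming NumPy and scikit-learn are installed. The tiny dataset and the file name model.pkl are made up for the example. joblib works in much the same way through joblib.dump and joblib.load.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LinearRegression

# Train a tiny model on made-up data where y = 2x + 1.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])
model = LinearRegression().fit(X, y)

# Save to disk, then load it back as a separate object.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.pkl"  # hypothetical file name
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    with open(model_path, "rb") as f:
        loaded_model = pickle.load(f)

# The loaded model should predict exactly like the original.
new_data = np.array([[5.0]])
original_pred = model.predict(new_data)
loaded_pred = loaded_model.predict(new_data)
print(original_pred, loaded_pred)
```

Two practical notes: only unpickle files you trust, because loading a pickle can run arbitrary code, and record the library versions you trained with, since a saved model may not load cleanly under a different scikit-learn version.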

A/B Testing and Experimentation

How do you know your model is actually better than the current process?

Learn:

  • Experimental design

  • Statistical significance

  • Measuring incremental impact
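As a rough sketch of checking statistical significance, here is a two-proportion z-test using only the standard library. The conversion counts are invented, the helper name two_proportion_z_test is my own, and the z-test is just one common choice for comparing conversion rates, not the only valid one.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (a) and treatment (b).

    Returns the absolute lift, the z statistic, and a two-sided p-value.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups are the same.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value


# Made-up experiment: the current process converts 200 of 2000 users,
# the model-driven version converts 250 of 2000.
lift, z, p_value = two_proportion_z_test(200, 2000, 250, 2000)
print(f"lift = {lift:.3f}, z = {z:.2f}, p-value = {p_value:.4f}")
```

The lift of 2.5 percentage points is the incremental impact. The p-value tells you how surprising that gap would be if the model made no real difference. Decide the sample size and the success metric before running the test, not after.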

MLOps Basics

  • Version control for data and models

  • Monitoring model performance over time

  • Retraining strategies

Phase 5: Deep Learning

Here's the controversial bit: You might not need deep learning.

Seriously. Most business problems don't require neural networks. They require good feature engineering and the right algorithm.

When you DO need deep learning:

  • Image recognition

  • Natural language processing (text)

  • Speech recognition

  • Time series with complex patterns

  • Very large, unstructured datasets

When you DON'T need deep learning:

  • Tabular data (use XGBoost instead)

  • Small datasets (neural networks need lots of data)

  • Problems where interpretability matters

  • When simpler models work fine

If you decide to learn it:

Frameworks:

  • TensorFlow or PyTorch (pick one, don't learn both at once)

  • If you pick TensorFlow, start with its Keras API (simpler interface)

Concepts:

  • Neural network architecture

  • Activation functions

  • Backpropagation (conceptually)

  • CNNs for images

  • RNNs/LSTMs for sequences

  • Transfer learning

My honest advice: Unless you're specifically targeting computer vision or NLP roles, spend your time getting really good at Phases 1-4 first.

Your Realistic Timeline

Can you learn data science in 3 months? Enough to apply for junior roles? Maybe, if you're coming from data analytics and focused full-time.

More realistic:

  • 6-9 months: Job-ready for junior data scientist roles (coming from an analyst background)

  • 12-18 months: Comfortable and confident

  • 2+ years: Actually good at this

If you're starting from zero (no programming, no analytics): Add 6-12 months for foundations.

The Skills That Actually Matter

After 11 years, here's what I've learned:

Technical skills get you the interview. These skills get you the job:

  • Problem framing: Understanding what problem you're actually solving

  • Business sense: Knowing when a model is worth building

  • Communication: Explaining models to non-technical stakeholders

  • Critical thinking: Knowing when your model is wrong

  • Experimentation: Proper A/B testing and measurement

Your Action Plan Based on Where You Are

Complete beginner (no tech background): Start with data analyst skills first. Seriously. You need the foundation.

Data analyst wanting to transition: Phase 1 (maths fundamentals) is your priority. You likely have the tools already. What you need is the theory.

Have some Python, want to level up: Phase 2 (ML fundamentals) and Phase 3 (scikit-learn). Build projects while learning.

Know the basics, struggling to get hired: Phase 4 (real-world skills) and build portfolio projects that show business impact, not just model accuracy.

Data science isn't magic. It's not always more advanced or more valuable than data analytics.

It's a different toolkit for different problems.

Some projects need prediction (data science). Some need understanding (data analytics). Many need both.

Keep pushing 💪

Karina

Python Tip

Real-world data has dates like this:

2024-01-05

01/05/2024

05.01.24

Trying to parse them manually? Nightmare.

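Try pd.to_datetime() with these key parameters:

errors='coerce' → Invalid dates become NaT (Not a Time)

dayfirst=True → Interprets 05/01/2024 as Jan 5, not May 1

format='mixed' → Parses each value on its own (pandas 2.0+). Without it, pandas 2.x infers one format from the first value, and with errors='coerce' every date in a different format silently becomes NaT

import pandas as pd

df = pd.DataFrame({
    'date': ['2024-01-05', '01/05/2024', '05.01.24', 'invalid'],
    'sales': [100, 200, 150, 300]
})


df['date'] = pd.to_datetime(
    df['date'],
    format='mixed',       # Infer the format per value (pandas 2.0+)
    errors='coerce',      # Invalid → NaT (not error)
    dayfirst=True         # Handles EU format (DD/MM/YYYY)
)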

df


Data Analyst & Data Scientist