📊
🎲

Statistics Literacy Basics

Core concepts for reading and reasoning about data.

📈
🚫

Correlation is not causation

Front

Two variables moving together does not prove one causes the other.

Back
👥
📏

Sample size

Front

More data reduces random noise, but it does not fix biased sampling or bad measurement.

Back
📉
🎯

Regression to the mean

Front

Extreme results tend to move closer to average on the next measurement.

Back
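
The effect on the back of this card can be simulated in a few lines (a minimal sketch with invented numbers: scores are modeled as stable skill plus random luck):

```python
import random
import statistics

random.seed(7)

# Score = stable skill + luck; a retest keeps the skill but redraws the luck.
skill = [random.gauss(100, 10) for _ in range(10_000)]
test1 = [s + random.gauss(0, 10) for s in skill]
test2 = [s + random.gauss(0, 10) for s in skill]

# Take the top 1% on the first test and look at their retest scores.
top = sorted(range(10_000), key=lambda i: test1[i], reverse=True)[:100]
first = statistics.mean(test1[i] for i in top)
second = statistics.mean(test2[i] for i in top)

print(first > second)  # the extreme first scores fall back toward the mean
```

The top scorers were partly skilled and partly lucky; on the retest the luck redraws, so their average drops, even though nothing about them changed.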
↔️
📍

Median

Front

The middle value; less affected by outliers than the mean.

Back
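
A quick sketch with Python's standard library shows why (the income figures are invented for illustration):

```python
import statistics

# One extreme outlier among otherwise similar incomes (invented numbers).
incomes = [30_000, 32_000, 35_000, 38_000, 1_000_000]

mean = statistics.mean(incomes)      # 227,000: dragged far up by the outlier
median = statistics.median(incomes)  # 35,000: the middle value, barely moved

print(median)  # 35000
```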
⚖️
🧱

Confidence interval

Front

A range computed so that, across many repeated samples, about X% of such intervals would contain the true value (given the model's assumptions).

Back
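
The "repeated samples" idea can be checked by simulation (a minimal sketch assuming normally distributed data and a normal-approximation 95% interval):

```python
import random
import statistics

random.seed(42)

TRUE_MEAN = 10.0
trials, n = 1_000, 50
covered = 0

for _ in range(trials):
    sample = [random.gauss(TRUE_MEAN, 2.0) for _ in range(n)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    # Normal-approximation 95% interval: mean +/- 1.96 standard errors.
    if m - 1.96 * se <= TRUE_MEAN <= m + 1.96 * se:
        covered += 1

print(covered / trials)  # close to 0.95, as the definition promises
```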
🧮
⚖️

Mean

Front

The average value; sensitive to outliers.

Back
↔️
💨

Variance

Front

Average squared distance from the mean; a measure of spread.

Back
📏
〰️

Standard deviation

Front

Square root of variance; typical distance from the mean.

Back
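
The last three cards (mean, variance, standard deviation) fit together in a few lines (a minimal sketch using the population formulas; the data values are invented):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.mean(data)        # the average value
var = statistics.pvariance(data)  # average squared distance from the mean
sd = statistics.pstdev(data)      # square root of the variance

print(mu, var, sd)  # the values are 5, 4, and 2.0
```

Note the `p` prefix: `pvariance`/`pstdev` divide by n (population formulas), while `variance`/`stdev` divide by n-1 for samples.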
👽
📍

Outlier

Front

A value far from others; can strongly affect averages and models.

Back
🧺

Selection bias

Front

Sampling that is not representative.

Fix: check who is missing and why.

Back
🕵️
🔗

Confounder

Front

A hidden variable that influences both cause and effect, creating a false relationship.

Back
🎲
🔀

Randomization

Front

Assigning by chance helps balance confounders across groups.

Back
🛡️
🧪

Control group

Front

A comparison group that does not receive the treatment.

Back
🎲

P-value

Front

Probability of data at least this extreme under the null hypothesis (not the probability the hypothesis is true).

Back
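
The definition can be made concrete with a permutation test (a minimal sketch with invented measurements for two hypothetical groups `a` and `b`):

```python
import random
import statistics

random.seed(0)

a = [5.1, 4.9, 6.2, 5.8, 6.0]  # invented measurements, group A
b = [4.2, 4.8, 4.5, 5.0, 4.4]  # invented measurements, group B
observed = statistics.mean(a) - statistics.mean(b)

# Under the null hypothesis the group labels are arbitrary, so shuffle them
# and count how often the difference is at least as extreme as observed.
pooled = a + b
extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[:5]) - statistics.mean(pooled[5:])
    if diff >= observed:
        extreme += 1

p_value = extreme / trials
print(p_value < 0.05)  # True: such a gap is rare if the labels don't matter
```

The p-value is just the share of label-shuffles at least as extreme as the real split; it says nothing directly about whether the hypothesis itself is true.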
🌟
📊

Statistical significance

Front

A threshold decision about evidence against the null hypothesis; not the same as practical importance.

Back
🐘
📏

Effect size

Front

How big the difference/relationship is (magnitude matters).

Back
🌍

Practical significance

Front

Whether an effect is large enough to matter in real life.

Back
🚨
🤡

False positive

Front

Detecting an effect that is not real (Type I error).

Back
📉
🧠

Base rate fallacy

Front

Ignoring prior probability when interpreting new evidence.

Back
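
A classic numeric illustration of this card (disease-testing numbers invented for the example):

```python
# A disease with a 1% base rate; a test with 99% sensitivity
# and a 5% false-positive rate (all numbers invented).
base_rate = 0.01
sensitivity = 0.99
false_positive_rate = 0.05

# Bayes' rule: P(disease | positive test).
p_positive = sensitivity * base_rate + false_positive_rate * (1 - base_rate)
p_disease_given_positive = sensitivity * base_rate / p_positive

print(round(p_disease_given_positive, 2))  # 0.17, not 0.99
```

Because the disease is rare, most positives come from the large healthy group, so a "99% accurate" test still leaves only about a 1-in-6 chance of disease.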
🧩
🔄

Simpson's paradox

Front

A trend appears in groups but reverses when groups are combined.

Back
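
A numeric sketch of the reversal (success counts loosely based on the classic kidney-stone example; treat them as illustrative):

```python
# (successes, patients) per treatment within each severity group.
groups = {
    "mild":   {"A": (81, 87),   "B": (234, 270)},
    "severe": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, total):
    return successes / total

# Within every group, treatment A has the higher success rate...
print(all(rate(*arms["A"]) > rate(*arms["B"]) for arms in groups.values()))

# ...but pooled across groups, the comparison reverses and B looks better,
# because the hard (severe) cases were funneled disproportionately to A.
a = [sum(arms["A"][i] for arms in groups.values()) for i in (0, 1)]
b = [sum(arms["B"][i] for arms in groups.values()) for i in (0, 1)]
print(rate(*a) < rate(*b))
```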
🔍
☣️

Relative vs absolute risk

Front

A relative change can mislead; always look at the absolute difference too.

Back
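
A tiny arithmetic sketch (risk numbers invented): a scary-sounding "50% higher risk" can mean one extra case per 10,000.

```python
baseline_risk = 0.0002  # 2 cases per 10,000 (invented)
exposed_risk = 0.0003   # 3 cases per 10,000 (invented)

relative_increase = (exposed_risk - baseline_risk) / baseline_risk
absolute_increase = exposed_risk - baseline_risk

print(f"{relative_increase:.0%} relative")  # 50% relative
print(f"{absolute_increase:.4f} absolute")  # 0.0001 absolute: 1 per 10,000
```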
🌊
⚖️

Law of large numbers

Front

As sample size grows, averages tend to stabilize near the expected value.

Back
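
Simulating die rolls shows the stabilization (a minimal sketch; the expected value of a fair die is 3.5):

```python
import random

random.seed(1)

def running_average(n):
    """Average of n fair six-sided die rolls."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

for n in (10, 1_000, 100_000):
    print(n, running_average(n))  # the averages settle near 3.5 as n grows
```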
⛏️
🎣

Data dredging (p-hacking)

Front

Trying many analyses until something is significant.

Fix: pre-register or correct for multiple tests.

Back
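
The danger is easy to simulate (a minimal sketch: every "test" here is pure noise checked at a 5% significance level):

```python
import random

random.seed(3)

# With 20 independent null tests at alpha = 0.05, the chance that at least
# one comes out "significant" is 1 - 0.95**20, about 64%.
def any_significant(num_tests=20, alpha=0.05):
    return any(random.random() < alpha for _ in range(num_tests))

hit_rate = sum(any_significant() for _ in range(10_000)) / 10_000
print(hit_rate)  # roughly 0.64: dredging often produces a false positive
```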
🛤️
🤔

Causal counterfactual

Front

Causation asks a counterfactual question: what would have happened to the same case if one factor had been different?

Back