Correlation is not causation
Two variables moving together does not prove one causes the other.
Sample size
More data reduces random noise, but it does not fix biased sampling or bad measurement.
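A small simulation can illustrate this. It is a sketch under stated assumptions: a hypothetical normally distributed population and a made-up response mechanism in which values above 50 are recorded only half as often. The names `sample_mean`, `keep_all` and `keep_fewer_high` are illustrative, not from any library.

```python
import random
import statistics


def sample_mean(population, n, rng, keep_prob):
    """Mean of n kept draws; each draw (with replacement) is kept with probability keep_prob(value).

    keep_prob is a hypothetical response mechanism: if it depends on the value,
    the resulting sample is biased no matter how large n is.
    """
    kept = []
    while len(kept) < n:
        x = rng.choice(population)
        if rng.random() < keep_prob(x):
            kept.append(x)
    return statistics.fmean(kept)


def keep_all(x):
    return 1.0  # every sampled value is recorded (simple random sampling)


def keep_fewer_high(x):
    return 0.5 if x > 50 else 1.0  # assumed bias: high values respond half as often


rng = random.Random(0)
population = [rng.gauss(50, 10) for _ in range(100_000)]
true_mean = statistics.fmean(population)

for n in (100, 10_000, 100_000):
    fair_error = sample_mean(population, n, rng, keep_all) - true_mean
    biased_error = sample_mean(population, n, rng, keep_fewer_high) - true_mean
    print(f"n={n:>7}: random-sample error {fair_error:+.2f}, biased-sample error {biased_error:+.2f}")
```

Under these assumptions the random-sample error shrinks toward zero as n grows, while the biased-sample error settles near a fixed negative offset (roughly -2.7) instead of disappearing.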
Regression to the mean
Extreme results that are partly due to chance tend to be closer to the average on the next measurement.
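The sketch below simulates the effect under an assumed model in which each score is a stable "skill" plus independent luck on each test. The parameter values and the function name are hypothetical.

```python
import random
import statistics


def regression_to_mean_demo(n=10_000, top_frac=0.05, seed=0):
    """Return the top scorers' mean on test 1 and the same people's mean on test 2."""
    rng = random.Random(seed)
    skill = [rng.gauss(100, 10) for _ in range(n)]
    # Assumed model: observed score = stable skill + independent luck on each test.
    test1 = [s + rng.gauss(0, 10) for s in skill]
    test2 = [s + rng.gauss(0, 10) for s in skill]
    k = int(n * top_frac)
    # Select people only because their first score was extreme.
    top = sorted(range(n), key=lambda i: test1[i], reverse=True)[:k]
    first = statistics.fmean([test1[i] for i in top])
    second = statistics.fmean([test2[i] for i in top])
    return first, second


first, second = regression_to_mean_demo()
print(f"top 5% on test 1: {first:.1f}; same people on test 2: {second:.1f}; population average: 100")
```

The second-test average stays above 100 (skill is real) but is noticeably lower than the first, because part of each extreme first score was luck that does not repeat.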
Median
The middle value; less affected by outliers than the mean.
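A quick comparison with Python's standard `statistics` module, using made-up income figures, shows how a single extreme value moves the mean far more than the median.

```python
import statistics

incomes = [32_000, 35_000, 38_000, 41_000, 45_000]  # hypothetical values
with_outlier = incomes + [2_000_000]  # add one extreme value

print(statistics.mean(incomes), statistics.median(incomes))            # 38200 38000
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # ~365166.67 39500.0
```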
Confidence interval
A range built by a procedure that captures the true value in ~X% of repeated samples (given its assumptions).
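One way to see what the "~X%" refers to is to repeat the sampling many times and count how often the interval captures a known true mean. This sketch assumes normally distributed data, a fixed sample size of 30, and the usual t-based 95% interval for a mean; the critical value 2.045 (97.5th percentile of Student's t with 29 degrees of freedom) is hard-coded to keep the example dependency-free.

```python
import math
import random
import statistics

N = 30
T_CRIT = 2.045  # 97.5th percentile of Student's t with N - 1 = 29 degrees of freedom


def ci_coverage(reps=2_000, true_mean=10.0, sd=3.0, seed=0):
    """Fraction of 95% t-intervals for the mean that contain the known true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(N)]
        m = statistics.fmean(sample)
        half_width = T_CRIT * statistics.stdev(sample) / math.sqrt(N)
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / reps


coverage = ci_coverage()
print(f"{coverage:.1%} of intervals contained the true mean")
```

Any single interval either contains the true value or it does not; the ~95% describes the long-run behavior of the procedure.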
Mean
The arithmetic average (sum divided by count); sensitive to outliers.
Variance
Average squared distance from the mean; a measure of spread. (Sample variance divides by n - 1 instead of n.)
Standard deviation
Square root of variance; typical distance from the mean.
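Worked on a small made-up data set, the two definitions look like this. The block also compares the result with the standard library, where `statistics.pvariance` divides by n and `statistics.variance` divides by n - 1 (the sample version).

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

mean = sum(data) / len(data)                                # 5.0
variance = sum((x - mean) ** 2 for x in data) / len(data)   # average squared distance: 4.0
std_dev = math.sqrt(variance)                               # back in the data's units: 2.0

# Standard-library equivalents: pvariance divides by n, variance by n - 1.
print(variance, statistics.pvariance(data))
print(std_dev, statistics.pstdev(data))
print(statistics.variance(data))  # 32 / 7, about 4.571
```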
Outlier
A value far from others; can strongly affect averages and models.
Selection bias
A sample that is not representative of the population it is meant to describe.
Fix: check who is missing and why.
Confounder
A variable, often unmeasured, that influences both the supposed cause and the effect, creating a misleading association.
Randomization
Assigning by chance helps balance confounders across groups.
Control group
A comparison group that does not receive the treatment.
P-value
Probability of data at least this extreme under the null hypothesis (not the probability the hypothesis is true).
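A permutation test is one way to compute a p-value straight from this definition: assume the null hypothesis of no group difference, enumerate every relabeling of the data, and count how often the difference is at least as extreme as the one observed. The measurements below are hypothetical and the function name is illustrative.

```python
import itertools
import statistics


def permutation_p_value(a, b):
    """Exact two-sided permutation test for a difference in group means.

    Under the null hypothesis the group labels are exchangeable, so every split of the
    pooled data into groups of the original sizes is equally likely.
    """
    pooled = list(a) + list(b)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    extreme = total = 0
    for chosen in itertools.combinations(range(len(pooled)), len(a)):
        chosen_set = set(chosen)
        g1 = [pooled[i] for i in chosen]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        diff = abs(statistics.fmean(g1) - statistics.fmean(g2))
        total += 1
        # "At least as extreme" as observed; the tolerance absorbs floating-point rounding.
        if diff >= observed - 1e-9:
            extreme += 1
    return extreme / total


group_a = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]  # hypothetical measurements
group_b = [10.8, 11.2, 10.5, 11.9, 11.0, 10.7]
print(f"p = {permutation_p_value(group_a, group_b):.4f}")
```

A small p-value says a gap this large would be rare if the labels did not matter; it is still not the probability that the null hypothesis is true.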
Statistical significance
A threshold decision about the strength of evidence against the null hypothesis; not the same as importance.
Effect size
How big the difference/relationship is (magnitude matters).
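One common standardized effect size for two group means is Cohen's d: the difference in means divided by a pooled standard deviation. The sketch below is one way to compute it; the function name and data are illustrative.

```python
import math
import statistics


def cohens_d(a, b):
    """Difference in means divided by the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var)


treated = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]  # hypothetical measurements
control = [10.8, 11.2, 10.5, 11.9, 11.0, 10.7]
print(f"Cohen's d = {cohens_d(treated, control):.2f}")
```

The result is in standard-deviation units, which makes it comparable across studies; whether a given d matters in practice is still a judgment about the context.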
Practical significance
Whether an effect is large enough to matter in real life.
False positive
Detecting an effect that is not real (Type I error).
Base rate fallacy
Ignoring prior probability when interpreting new evidence.
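Bayes' rule shows how much the base rate matters. The numbers below are hypothetical: a condition with 1% prevalence and a test that detects 99% of true cases but also flags 5% of people without the condition.

```python
def posterior_positive(prevalence, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


p = posterior_positive(prevalence=0.01, sensitivity=0.99, false_positive_rate=0.05)
print(f"P(condition | positive) = {p:.3f}")  # 0.167
```

Despite the accurate-sounding test, only about 1 in 6 positives has the condition, because true cases are rare to begin with.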
Simpson's paradox
A trend appears in groups but reverses when groups are combined.
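The reversal is easiest to see with counts. The numbers below are illustrative: two treatments split by case severity, where treatment A has the higher success rate in each group but the lower rate overall, because A was given mostly to the harder cases.

```python
# Illustrative (successes, total) counts per treatment and severity group.
counts = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}


def rate(successes, total):
    return successes / total


for group in ("mild", "severe"):
    a, b = rate(*counts["A"][group]), rate(*counts["B"][group])
    print(f"{group:>7}: A {a:.0%}  B {b:.0%}")

# Pool the groups by summing successes and totals per treatment.
overall = {
    t: rate(sum(s for s, _ in g.values()), sum(n for _, n in g.values()))
    for t, g in counts.items()
}
print(f"overall: A {overall['A']:.0%}  B {overall['B']:.0%}")
```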
Relative vs absolute risk
Relative change can mislead; always check the absolute difference too.
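A worked example with hypothetical numbers: if a side effect occurs in 10 of 100,000 untreated people and 20 of 100,000 treated people, the risk doubles in relative terms while the absolute increase is tiny.

```python
def risk_summary(events_control, n_control, events_treated, n_treated):
    """Return (relative change, absolute difference) in risk between two groups."""
    risk_control = events_control / n_control
    risk_treated = events_treated / n_treated
    relative_change = risk_treated / risk_control - 1
    absolute_difference = risk_treated - risk_control
    return relative_change, absolute_difference


rel, abs_diff = risk_summary(10, 100_000, 20, 100_000)
print(f"relative change: {rel:+.0%}")          # +100%
print(f"absolute difference: {abs_diff:.4%}")  # 0.0100%
```

A "100% increase" here means 1 extra case per 10,000 people.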
Law of large numbers
As sample size grows, averages tend to stabilize near the expected value.
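Simulated rolls of a fair six-sided die illustrate this: the expected value is 3.5, and the running average wanders early on but settles close to it as the number of rolls grows. The seed and checkpoints are arbitrary choices.

```python
import random


def running_means(n_rolls, checkpoints, seed=0):
    """Average of the first i rolls, recorded at each i in checkpoints."""
    rng = random.Random(seed)
    total = 0
    results = {}
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)  # one fair die roll
        if i in checkpoints:
            results[i] = total / i
    return results


for n, avg in running_means(100_000, {10, 100, 1_000, 100_000}).items():
    print(f"{n:>7} rolls: average {avg:.3f}")
```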
Data dredging (p-hacking)
Trying many analyses until something is significant.
Fix: pre-register or correct for multiple tests.
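A short calculation shows why many analyses inflate false positives. When the null hypothesis is true (and the test statistic is continuous), each p-value is uniformly distributed, so 20 independent tests at the 0.05 level produce at least one "significant" result about 64% of the time. The Bonferroni correction (testing each at 0.05 / 20) is one simple way to correct for multiple tests; the simulation below assumes independent tests, and the function name is illustrative.

```python
import random


def family_error_rate(n_tests, alpha, reps=20_000, seed=0):
    """Fraction of simulated studies with at least one p-value below alpha when every null is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Under a true null, each p-value is Uniform(0, 1); tests are assumed independent.
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / reps


n_tests, alpha = 20, 0.05
print(f"exact:      {1 - (1 - alpha) ** n_tests:.3f}")  # 0.642
print(f"simulated:  {family_error_rate(n_tests, alpha):.3f}")
print(f"Bonferroni: {family_error_rate(n_tests, alpha / n_tests):.3f}")
```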
Causal counterfactual
Causation asks: what would happen if the same case were different in one factor?