Core concepts for reading and reasoning about data.
Two variables moving together does not prove one causes the other.
More data reduces random noise, but it does not fix biased sampling or bad measurement.
Extreme results tend to move closer to average on the next measurement.
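A minimal simulation sketch of this effect, assuming each score is a stable skill plus independent luck. The names `skills`, `first`, and `second`, and all the numbers, are illustrative assumptions.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Assumed model: each person has a stable skill; each test adds fresh luck.
skills = [random.gauss(50, 10) for _ in range(10_000)]
first = [s + random.gauss(0, 10) for s in skills]
second = [s + random.gauss(0, 10) for s in skills]

# Select the top 10% using the first test only.
cutoff = sorted(first)[int(0.9 * len(first))]
top = [i for i, score in enumerate(first) if score >= cutoff]

top_first_mean = mean(first[i] for i in top)
top_second_mean = mean(second[i] for i in top)
overall_mean = mean(second)

print(f"top group, first test:  {top_first_mean:.1f}")
print(f"top group, second test: {top_second_mean:.1f}")
print(f"everyone, second test:  {overall_mean:.1f}")
```

The top group still scores above average on the second test, but less extremely, because part of its first score was luck.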
The middle value; less affected by outliers than the mean.
A range built by a method that would capture the true value in ~X% of repeated samples (given assumptions).
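The "repeated samples" reading can be checked with a small simulation. This sketch assumes normally distributed data with a known standard deviation and uses the 1.96 normal critical value for a 95% range. `TRUE_MEAN`, `SIGMA`, `N`, and `TRIALS` are made-up settings.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

TRUE_MEAN, SIGMA, N, TRIALS = 10.0, 2.0, 30, 2_000
Z_95 = 1.96  # normal critical value for a 95% range

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    half_width = Z_95 * SIGMA / N ** 0.5  # sigma is treated as known here
    center = mean(sample)
    if center - half_width <= TRUE_MEAN <= center + half_width:
        covered += 1

coverage = covered / TRIALS
print(f"ranges containing the true mean: {coverage:.1%}")
```

Coverage should land near 95% only while the assumptions hold; skewed data or an unknown standard deviation would call for a different method.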
The average value; sensitive to outliers.
Average squared distance from the mean; a measure of spread.
Square root of variance; typical distance from the mean.
A value far from others; can strongly affect averages and models.
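A short sketch of how one outlier changes these summaries, using Python's standard `statistics` module. The two lists are made-up data, and the population versions of variance and standard deviation are used to match the "average squared distance" definition.

```python
from statistics import mean, median, pvariance, pstdev

values = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]  # same data, last value replaced by an outlier

for label, data in (("no outlier", values), ("with outlier", with_outlier)):
    print(
        f"{label:>12}: mean={mean(data)} median={median(data)} "
        f"variance={pvariance(data)} std dev={pstdev(data):.2f}"
    )
```

The median stays at 3 in both lists, while the mean jumps from 3 to 22 and the variance from 2 to 1522.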
Sampling that is not representative. **Fix:** check who is missing and why.
A hidden variable that influences both the supposed cause and the effect, creating a misleading relationship.
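A simulation sketch of a hidden variable creating a relationship: `z` drives both `x` and `y`, which never influence each other. The variable names, the helper `pearson`, and the model itself are assumptions made for illustration.

```python
import random

random.seed(2)  # fixed seed so the sketch is repeatable

def pearson(xs, ys):
    """Pearson correlation computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# z is the hidden variable; x and y each depend on z but not on each other.
z = [random.gauss(0, 1) for _ in range(20_000)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

raw = pearson(x, y)
# Subtract z's contribution (its coefficient is 1 by construction).
adjusted = pearson(
    [xi - zi for xi, zi in zip(x, z)],
    [yi - zi for yi, zi in zip(y, z)],
)

print(f"correlation of x and y:            {raw:.2f}")
print(f"after removing the hidden variable: {adjusted:.2f}")
```

In real data the hidden variable is usually unmeasured, so this subtraction is not available; the sketch only shows why the raw correlation can mislead.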
Assigning treatment by chance helps balance confounders, known and unknown, across groups.
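A sketch comparing a non-random split with a chance-based one, using age as a stand-in confounder. The enrollment-by-age scenario and all names here are hypothetical.

```python
import random
from statistics import mean

random.seed(3)  # fixed seed so the sketch is repeatable

# Hypothetical scenario: people enroll in order of age.
ages = sorted(random.uniform(18, 80) for _ in range(10_000))
half = len(ages) // 2

# Non-random assignment: the first half enrolled gets the treatment.
naive_gap = mean(ages[half:]) - mean(ages[:half])

# Random assignment: shuffle, then split.
shuffled = ages[:]
random.shuffle(shuffled)
random_gap = abs(mean(shuffled[:half]) - mean(shuffled[half:]))

print(f"age gap, assignment by enrollment order: {naive_gap:.1f} years")
print(f"age gap, assignment by chance:           {random_gap:.1f} years")
```

Chance assignment does not guarantee perfect balance in any single study; it makes large imbalances unlikely, and more so as groups grow.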
A comparison group that does not receive the treatment.
Probability of data at least this extreme under the null hypothesis (not the probability the hypothesis is true).
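A worked sketch of "at least this extreme under the null hypothesis": the one-sided probability of 9 or more heads in 10 flips if the coin is fair. The function name `p_value_at_least` and the coin example are assumptions for illustration.

```python
from math import comb

def p_value_at_least(heads, flips, p_heads=0.5):
    """One-sided probability of `heads` or more, assuming the null p_heads."""
    return sum(
        comb(flips, k) * p_heads ** k * (1 - p_heads) ** (flips - k)
        for k in range(heads, flips + 1)
    )

coin_p_value = p_value_at_least(9, 10)
print(f"P(9 or more heads in 10 flips | fair coin) = {coin_p_value:.4f}")
```

The result is 11/1024, about 0.011. It says how surprising the data would be if the coin were fair, not how likely it is that the coin is fair.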
A threshold decision about evidence against the null hypothesis; not the same as importance.
How big the difference/relationship is (magnitude matters).
Whether an effect is large enough to matter in real life.
Detecting an effect that is not real (Type I error).
Ignoring prior probability when interpreting new evidence.
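A sketch of the calculation that this mistake skips, using Bayes' rule for a screening test. The 1% prior, 90% sensitivity, and 9% false-positive rate are made-up numbers, and `posterior` is a hypothetical helper.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Probability the condition is present given a positive test."""
    true_positives = prior * sensitivity
    false_positives = (1 - prior) * false_positive_rate
    return true_positives / (true_positives + false_positives)

p_condition = posterior(prior=0.01, sensitivity=0.90, false_positive_rate=0.09)
print(f"P(condition | positive test) = {p_condition:.1%}")
```

With these inputs a positive result implies roughly a 9% chance of having the condition, because false positives from the large unaffected group outnumber true positives.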
A trend appears within each group but reverses when the groups are combined.
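A sketch of the reversal with illustrative counts (not taken from this text): treatment A has the higher success rate in each subgroup, yet B looks better once the subgroups are pooled. The labels and the `counts` layout are assumptions.

```python
# (successes, total) per treatment and subgroup; illustrative counts only.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, total):
    return successes / total

for group in ("small", "large"):
    a, b = rate(*counts["A"][group]), rate(*counts["B"][group])
    print(f"{group:>5}: A={a:.1%}  B={b:.1%}")

pooled = {
    name: rate(
        sum(s for s, _ in groups.values()),
        sum(n for _, n in groups.values()),
    )
    for name, groups in counts.items()
}
print(f"pooled: A={pooled['A']:.1%}  B={pooled['B']:.1%}")
```

The reversal arises because A was used mostly on the harder "large" cases, so group sizes differ sharply between treatments.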
Relative change can mislead; always look at absolute difference too.
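A small arithmetic sketch with made-up risks: going from 1 in 10,000 to 2 in 10,000 is a 100% relative increase but only a 0.01 percentage point absolute increase.

```python
baseline_risk = 1 / 10_000  # assumed risk without exposure
exposed_risk = 2 / 10_000   # assumed risk with exposure

absolute_change = exposed_risk - baseline_risk
relative_change = absolute_change / baseline_risk

print(f"relative change: {relative_change:.0%}")
print(f"absolute change: {absolute_change * 100:.2f} percentage points")
```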
As sample size grows, averages tend to stabilize near the expected value.
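A simulation sketch using a fair six-sided die, whose expected value is 3.5. The checkpoints and the name `running_means` are arbitrary choices for illustration.

```python
import random

random.seed(4)  # fixed seed so the sketch is repeatable

EXPECTED = 3.5  # expected value of a fair six-sided die
checkpoints = (10, 100, 1_000, 10_000, 100_000)

running_means = {}
total = 0
for n in range(1, max(checkpoints) + 1):
    total += random.randint(1, 6)
    if n in checkpoints:
        running_means[n] = total / n

for n, avg in running_means.items():
    print(f"n={n:>7}: average={avg:.3f}  (distance from 3.5: {abs(avg - EXPECTED):.3f})")
```

Early averages can wander noticeably; the stabilizing is a long-run tendency, not a promise about any small sample.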
Trying many analyses until something is significant. **Fix:** pre-register or correct for multiple tests.
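A sketch of why many analyses inflate false positives, assuming independent tests where no real effect exists. The Bonferroni correction shown is one common option among several; the function names are hypothetical.

```python
ALPHA = 0.05  # conventional per-test threshold

def family_wise_error(num_tests, alpha=ALPHA):
    """Chance of at least one false positive across independent null tests."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_alpha(num_tests, alpha=ALPHA):
    """Per-test threshold that keeps the overall error rate at or below alpha."""
    return alpha / num_tests

tests = 20
print(f"uncorrected, {tests} tests: {family_wise_error(tests):.1%} chance of a false positive")
corrected = bonferroni_alpha(tests)
print(f"Bonferroni threshold: {corrected:.4f}")
print(f"corrected, {tests} tests: {family_wise_error(tests, corrected):.1%}")
```

With 20 tests at 0.05 each, the chance of at least one false positive is about 64%; the corrected threshold brings it back under 5%.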
Causation asks: what would happen if the same case were different in one factor?