In an ideal world, we would run randomised controlled trials for every important business or policy decision. In reality, many questions are answered using observational data: marketing campaigns are targeted, product features roll out to some users first, and healthcare treatments are chosen based on patient characteristics. This creates a core challenge for causal inference: the “treatment” and “control” groups are different even before the intervention happens. Propensity Score Matching (PSM) is a practical approach to reducing this selection bias: it matches treated and untreated units on their estimated probability of receiving the treatment, so that like is compared with like. If you are studying causal methods through data science classes in Bangalore, PSM is one of the most commonly applied tools for turning messy real-world data into more credible impact estimates.
Why Selection Bias Happens in Observational Data
Selection bias occurs when treatment assignment is not random. For example:
- High-intent customers are more likely to receive a discount.
- Faster learners are more likely to opt into an advanced programme.
- Hospitals may prescribe a treatment to sicker patients.
If you simply compare average outcomes between treated and untreated groups, your estimate mixes two effects:
- the true effect of the treatment, and
- pre-existing differences between groups.
PSM aims to reduce this second component by aligning the groups on observed characteristics (covariates) such as age, baseline activity, income segment, past purchases, severity scores, or prior performance.
What a Propensity Score Is (and the Assumptions Behind It)
A propensity score is the probability that an observation receives the treatment given its observed covariates:
Propensity score = P(Treatment = 1 | Covariates)
The key idea is balancing: if treated and control units have similar propensity scores, they should have similar distributions of observed covariates. After matching on this score, remaining outcome differences are more plausibly attributable to the treatment.
Two assumptions matter in practice:
- Unconfoundedness (no unmeasured confounding): After controlling for observed covariates, treatment assignment is “as good as random.” If an important driver is missing (e.g., motivation, hidden risk factors), PSM cannot fix that.
- Overlap (common support): There must be comparable treated and control units. If treated units always have much higher propensity scores than controls, matching will discard many samples or become unreliable.
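Overlap can be checked numerically before any matching is done. The sketch below is a minimal, hypothetical illustration (the scores are invented, not from a real dataset): it computes the common-support interval and counts treated units that fall outside it.

```python
import numpy as np

# Hypothetical propensity scores, for illustration only.
ps_treated = np.array([0.35, 0.50, 0.72, 0.88])
ps_control = np.array([0.10, 0.25, 0.40, 0.55, 0.70])

# Common support: the interval where both groups have scores.
lo = max(ps_treated.min(), ps_control.min())
hi = min(ps_treated.max(), ps_control.max())

# Treated units outside this interval have no comparable controls;
# matching would either drop them or force poor matches.
off_support = int(np.sum((ps_treated < lo) | (ps_treated > hi)))
print(lo, hi, off_support)
```

Here two of the four treated units sit above the controls' score range, a warning that matching will lose sample or produce weak matches.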
These assumptions are often discussed in applied causal modules within data science classes in Bangalore because they determine whether PSM is appropriate or whether alternative approaches are needed.
The Practical Workflow: From Data to Matched Groups
A disciplined PSM workflow usually follows these steps:
1) Define treatment, outcome, and covariates
Be explicit about:
- Treatment: what counts as receiving the intervention (e.g., “got the offer” or “used the feature”).
- Outcome: what you want to influence (conversion, revenue, churn, time saved).
- Covariates: pre-treatment variables that influence both treatment assignment and outcomes. Avoid using variables that are affected by the treatment (post-treatment variables), because they can introduce bias.
2) Estimate propensity scores
Propensity scores are most often estimated with logistic regression, though machine-learning models (trees, boosting) can also be used when relationships are non-linear. The goal is not perfect prediction; it is covariate balance after matching.
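As a minimal sketch of this step (all data synthetic and variable names invented), propensity scores can be estimated with scikit-learn's logistic regression:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic example: older, higher-income customers are likelier to get the offer.
n = 2000
age = rng.normal(40, 10, n)
income = rng.normal(50, 15, n)
X = np.column_stack([age, income])

# Treatment assignment depends on covariates (selection bias by construction).
true_logit = -8 + 0.1 * age + 0.06 * income
treated = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

# Propensity score = P(Treatment = 1 | Covariates), via logistic regression.
model = LogisticRegression(max_iter=1000).fit(X, treated)
pscore = model.predict_proba(X)[:, 1]  # estimated probability of treatment per unit
```

Note that a high AUC or accuracy here is not the goal; the fitted scores only need to balance the covariates once matching is done.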
3) Match treated and control observations
You then match each treated unit with one or more control units having similar propensity scores. You can choose:
- 1:1 matching (simple, interpretable)
- 1:k matching (more precision, potentially more bias if matches are weaker)
- With or without replacement (trade-off between match quality and sample diversity)
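The matching step itself can be sketched in a few lines. Below is a simplified greedy 1:1 nearest-neighbour matcher without replacement; the function name and scores are illustrative, not from any particular library:

```python
import numpy as np

def one_to_one_match(ps_treated, ps_control):
    """Greedy 1:1 nearest-neighbour matching on propensity scores,
    without replacement: each control is used at most once.
    Assumes at least as many controls as treated units."""
    available = list(range(len(ps_control)))
    pairs = []
    for i, p in enumerate(ps_treated):
        # Pick the still-unmatched control with the closest score.
        j = min(available, key=lambda k: abs(ps_control[k] - p))
        pairs.append((i, j))
        available.remove(j)
    return pairs

# Illustrative scores: three treated units, five candidate controls.
ps_t = np.array([0.62, 0.35, 0.80])
ps_c = np.array([0.30, 0.60, 0.78, 0.45, 0.85])
pairs = one_to_one_match(ps_t, ps_c)
```

Production implementations (e.g. optimal rather than greedy matching) differ in detail, but the core idea is the same: pair each treated unit with the control whose score is closest.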
4) Check balance diagnostics
Balance is the “quality check.” Common diagnostics include:
- Standardised mean differences (SMD) before vs after matching
- Covariate distribution plots
- Overlap plots of propensity scores
If balance is poor, revisit covariates, modelling choices, or matching settings.
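The most common diagnostic, the standardised mean difference, is simple to compute. A minimal sketch with an invented "age" covariate (assumed numbers, for illustration only):

```python
import numpy as np

def smd(x_treated, x_control):
    """Standardised mean difference: (mean_t - mean_c) / pooled SD."""
    pooled_sd = np.sqrt((x_treated.var(ddof=1) + x_control.var(ddof=1)) / 2)
    return (x_treated.mean() - x_control.mean()) / pooled_sd

# Hypothetical 'age' values before matching: the groups clearly differ.
age_treated = np.array([45.0, 50.0, 48.0, 52.0, 47.0])
age_control = np.array([35.0, 38.0, 40.0, 36.0, 39.0])
imbalance = abs(smd(age_treated, age_control))
# A common rule of thumb: |SMD| below 0.1 indicates good balance.
```

Computing this before and after matching, for every covariate, is the standard way to show that matching actually made the groups comparable.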
Matching Choices, Bias–Variance Trade-offs, and Common Pitfalls
Different matching strategies are used depending on data size and overlap:
- Nearest neighbour matching: matches based on the closest propensity score; easy, but can create poor matches if the overlap is weak.
- Calliper matching: matches only within a maximum score distance (the calliper). This often improves match quality and reduces bias, but may drop more observations.
- Stratification/subclassification: groups data into propensity score “bins” and compares outcomes within bins.
- Weighting (IPTW): uses propensity scores to weight observations rather than directly matching.
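Of these, IPTW is the simplest to sketch: instead of pairing units, each observation is weighted by the inverse probability of the treatment state it actually received (the scores below are invented for illustration):

```python
import numpy as np

# Hypothetical propensity scores and treatment indicators.
ps = np.array([0.8, 0.6, 0.3, 0.2])
t = np.array([1, 1, 0, 0])

# Treated units get weight 1/ps; controls get weight 1/(1 - ps).
# Units whose observed treatment status was unlikely given their
# covariates receive larger weights, rebalancing the two groups.
weights = np.where(t == 1, 1.0 / ps, 1.0 / (1.0 - ps))
```

In practice, very small or very large propensity scores produce extreme weights, which is why weight trimming or stabilisation is often applied alongside IPTW.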
Common pitfalls to avoid:
- Including post-treatment variables in the propensity model.
- Overfitting the propensity model and assuming high predictive accuracy means good causal validity.
- Ignoring overlap issues and forcing matches that are not genuinely comparable.
- Reporting results without balance checks, which makes conclusions hard to trust.
In many applied projects from data science classes in Bangalore, the biggest improvements come from careful covariate selection and honest reporting of balance and sample loss.
Estimating the Treatment Effect and Stress-Testing the Result
Once groups are matched, you estimate the effect, often the Average Treatment Effect on the Treated (ATT), by comparing outcomes between treated units and their matched controls. Because matching changes the sample structure, standard errors should be handled carefully (for example, through bootstrapping or methods appropriate to the matching design).
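As a minimal sketch of this step (all outcome values invented), the ATT over matched pairs and a bootstrap standard error might look like:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical outcomes for six matched pairs (treated unit, matched control).
y_treated = np.array([12.0, 15.0, 11.0, 14.0, 13.0, 16.0])
y_control = np.array([10.0, 13.0, 10.5, 12.0, 11.0, 14.0])

# ATT estimate: average outcome difference across matched pairs.
pair_diffs = y_treated - y_control
att = pair_diffs.mean()

# Bootstrap: resample matched pairs (not individual units) to respect
# the matched structure when estimating the standard error.
boot_means = [rng.choice(pair_diffs, size=len(pair_diffs), replace=True).mean()
              for _ in range(2000)]
se = np.std(boot_means, ddof=1)
```

Resampling whole pairs rather than individual observations is one way to acknowledge that the matched sample is not a simple random sample; the right uncertainty method depends on the matching design used.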
You should also stress-test your conclusion:
- Sensitivity analysis: asks how strong an unmeasured confounder would need to be to overturn the result.
- Placebo tests: check outcomes that should not be affected by the treatment.
- Robustness checks: vary calliper sizes, matching ratios, or propensity models to see if conclusions hold.
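A simple robustness check on the calliper can be sketched directly (scores invented for illustration): as the calliper tightens, count how many treated units still have an admissible match, which makes the bias-versus-sample-loss trade-off visible.

```python
import numpy as np

# Hypothetical propensity scores.
ps_treated = np.array([0.22, 0.45, 0.70, 0.90])
ps_control = np.array([0.23, 0.41, 0.64, 0.50])

retained = []
for calliper in (0.10, 0.05, 0.02):
    # A treated unit keeps a match only if some control lies within the calliper.
    n = sum(np.min(np.abs(ps_control - p)) <= calliper for p in ps_treated)
    retained.append(int(n))
print(retained)  # tighter callipers retain fewer treated units
```

If the estimated effect is stable while the calliper shrinks and the sample contracts, the conclusion is more credible; if it swings, the result depends on weak matches.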
Conclusion
Propensity Score Matching is a practical bias-reduction method for causal inference when randomisation is not possible. By estimating the probability of treatment and matching treated and control units with similar propensity scores, PSM helps create more comparable groups and improves the credibility of estimated impacts. It is not a magic fix; its validity depends on observed covariates, overlap, and careful diagnostics, but when applied well, it turns observational data into stronger evidence for decision-making. For learners building applied causal skills through data science classes in Bangalore, mastering PSM is a solid step toward more reliable, real-world analytics.