Fair Binning Cuts Mortgage Bias but Costs 29.4% in Utility

Standard equal-frequency binning bakes a 9.63% racial bias into income data before any machine learning model sees it. That is the result for 103,481 mortgage applications from the Chicago metro area, and the figure comes straight from a three-stage pipeline built on the HMDA 2023 dataset, using PySpark to clean and bin applicant attributes.

Standard Binning Injects 9.63% Bias into Income Discretization

The researchers compared two binning methods: standard equal-frequency binning and the epsilon-biased fair binning algorithm from Asudeh et al. Standard binning produced a 9.63% racial bias in the way income values were discretized, landing squarely in the 8–10% range reported by prior work. That bias is invisible to downstream pattern mining unless you explicitly audit the preprocessing step. Fair binning with seven race groups failed at epsilon=0.03: the algorithm couldn't find a solution that satisfied the fairness constraint. Only at epsilon=0.08 did fair binning succeed, and the price was steep: a Price of Fairness of 29.4%, meaning nearly a third of the information utility in the income discretization was sacrificed.
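To make the binning comparison concrete, here is a minimal sketch of equal-frequency binning and one plausible way to score its bias. The article does not spell out how the 9.63% figure is defined, so the measure below (the largest gap between a group's share inside a bin and its share overall) is an assumption, as are the function names and the toy incomes and group labels. It is not the algorithm from Asudeh et al.

```python
from collections import Counter

def equal_frequency_bins(values, k):
    """Assign each value a bin index 0..k-1 so bins hold (nearly) equal counts.

    Rank-based: tied values may land in different bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

def max_bin_bias(bins, groups):
    """Assumed bias measure: the largest absolute difference, over all bins
    and groups, between a group's share in the bin and its overall share."""
    n = len(groups)
    overall = Counter(groups)
    per_bin = {}
    for b, g in zip(bins, groups):
        per_bin.setdefault(b, Counter())[g] += 1
    worst = 0.0
    for counts in per_bin.values():
        size = sum(counts.values())
        for g, total in overall.items():
            worst = max(worst, abs(counts[g] / size - total / n))
    return worst

def satisfies_epsilon(bins, groups, epsilon):
    """True if no bin deviates from overall group shares by more than epsilon."""
    return max_bin_bias(bins, groups) <= epsilon

# Toy data, invented for illustration only (not HMDA records).
incomes = [20, 25, 30, 35, 60, 70, 80, 90]
groups = ["A", "A", "A", "B", "B", "B", "B", "A"]

bins = equal_frequency_bins(incomes, 2)
bias = max_bin_bias(bins, groups)
print(bins, bias, satisfies_epsilon(bins, groups, 0.08))
```

Under this assumed measure, a fair binning algorithm would move the bin boundaries until the check passes for the chosen epsilon, and a tighter epsilon leaves fewer valid boundary placements. That is consistent with the failure reported at epsilon=0.03 and the success at epsilon=0.08.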

FP-Growth Finds DTI Dominates, But Clustering Exposes Racial Disparities

Running FP-Growth association rule mining on both binned datasets produced a clear top predictor: high debt-to-income ratio drives denial decisions with 67.2% confidence and a lift of 2.81. Explicit racial bias didn't appear as high-support rules, an absence that is often taken as evidence of fairness. But K-Means clustering followed by a disparate impact audit told a different story. Out of 45 cluster-group pairs, 10 were flagged as having significantly higher denial rates for Black applicants even when grouped with financially similar White applicants. The bias isn't in the rules; it's woven into the clusters.
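The two kinds of numbers in this section can be sketched in a few lines. The first function uses the standard definitions of support, confidence, and lift for an association rule. The second compares denial rates within each cluster against a reference group, which is only an assumed stand-in for the disparate impact audit: the article does not describe the audit's test or threshold, and this sketch does no significance testing. All names and records are hypothetical toy data.

```python
from collections import defaultdict

def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    transactions: list of sets of items; antecedent, consequent: sets.
    """
    n = len(transactions)
    a = sum(1 for t in transactions if antecedent <= t)
    c = sum(1 for t in transactions if consequent <= t)
    both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if a == 0 or c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    confidence = both / a
    return both / n, confidence, confidence / (c / n)

def denial_rate_ratios(records, reference_group):
    """For each (cluster, group) pair, divide the group's denial rate by the
    reference group's denial rate in the same cluster.

    records: iterable of (cluster, group, denied) tuples.
    Pairs whose reference rate is zero or missing are skipped.
    """
    totals = defaultdict(int)
    denials = defaultdict(int)
    for cluster, group, denied in records:
        totals[(cluster, group)] += 1
        denials[(cluster, group)] += int(denied)
    ratios = {}
    for (cluster, group), total in totals.items():
        if group == reference_group:
            continue
        ref_total = totals.get((cluster, reference_group), 0)
        ref_denials = denials.get((cluster, reference_group), 0)
        if ref_total == 0 or ref_denials == 0:
            continue  # ratio is undefined for this cluster
        ratios[(cluster, group)] = (denials[(cluster, group)] / total) / (ref_denials / ref_total)
    return ratios

# Toy transactions, invented for illustration only.
transactions = [
    {"dti_high", "denied"}, {"dti_high", "denied"}, {"dti_high", "approved"},
    {"dti_low", "approved"}, {"dti_low", "approved"}, {"dti_low", "denied"},
    {"dti_low", "approved"}, {"dti_low", "approved"},
]
support, confidence, lift = rule_metrics(transactions, {"dti_high"}, {"denied"})

# Toy cluster records: (cluster id, group, denied?).
records = (
    [(0, "White", False)] * 3 + [(0, "White", True)]
    + [(0, "Black", True)] * 2 + [(0, "Black", False)] * 2
)
ratios = denial_rate_ratios(records, reference_group="White")
print(support, confidence, lift, ratios)
```

Because lift is confidence divided by the consequent's base rate, the reported 67.2% confidence and 2.81 lift imply an overall denial rate of roughly 0.672 / 2.81 ≈ 23.9%, assuming the standard definition applies.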

This pipeline shows that fairness audits can't stop at the model output. Preprocessing choices like binning inject measurable bias before any algorithm runs, and that bias persists even when race never appears in a high-confidence rule. Ignore the 9.63% introduced at the binning step and your downstream model starts out compromised.

Source: Auditing Discriminatory Patterns in Mortgage Lending Through Association Rules and Fair Binning
Domain: arxiv.org
