A study on discrimination in mortgage lending in the United States: A revisit by random forest method
Journal of the Korean Data & Information Science Society 2019;30:261-370
Published online March 31, 2019.
© 2019 Korean Data and Information Science Society.

Xiaoting Qiu¹ · Pilsun Choi²

¹,²Department of International Trade, Konkuk University
Correspondence to: Professor, Department of International Trade, Konkuk University, Seoul 05029, Korea. E-mail:
This paper was supported by Konkuk University in 2017. This paper is extracted from Xiaoting Qiu's master's thesis.
Received December 24, 2018; Revised March 11, 2019; Accepted March 11, 2019.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
In the 1990s, the existence of racial discrimination in home mortgage lending in the United States was the subject of many empirical analyses and debates. In particular, the Federal Reserve Bank of Boston (FRB Boston), a financial supervisory authority, analyzed 1990 Home Mortgage Disclosure Act (HMDA) data with a logit model, supplementing the data with credit information it had collected itself, and found that racial discrimination existed in bank lending decisions. This study points out that the traditional regression approach attends only to the coefficient magnitude and statistical significance of the race variable and does not assess how important race is in the loan decision relative to the other factors. It therefore uses the variable importance measures of the random forest technique, one of the core methodologies of machine learning, to measure how large a role the race variable plays in determining whether a loan is approved. The estimation results show that the race variable is statistically significant at the 0.1% level in the logit estimation, but that its importance under the random forest variable importance measures is found to be below the medium level. This contrasts with the racial discrimination in mortgage lending claimed by previous studies.
Keywords: mortgage lending, race discrimination, random forest, variable importance measures.
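The comparison described in the abstract, a logit coefficient on one side and a random forest importance ranking on the other, can be illustrated with a minimal sketch. It is not the authors' code or data. The data are synthetic, the column names (`debt_ratio`, `credit_score`, `loan_to_value`, `minority`) and the data-generating coefficients are hypothetical, and scikit-learn is an assumed tool choice. The abstract does not say which importance measure the paper uses. The sketch uses scikit-learn's impurity-based `feature_importances_`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic applicants: hypothetical stand-ins, NOT the HMDA / FRB Boston variables.
rng = np.random.default_rng(0)
n = 2000
debt_ratio = rng.normal(0.33, 0.10, n)
credit_score = rng.normal(0.0, 1.0, n)
loan_to_value = rng.normal(0.75, 0.15, n)
minority = rng.integers(0, 2, n)  # binary group indicator

# Assumed data-generating process, chosen only for illustration.
index = (1.5
         - 4.0 * (debt_ratio - 0.33)
         + 1.2 * credit_score
         - 2.0 * (loan_to_value - 0.75)
         - 0.4 * minority)
approved = (rng.random(n) < 1.0 / (1.0 + np.exp(-index))).astype(int)

feature_names = ["debt_ratio", "credit_score", "loan_to_value", "minority"]
X = np.column_stack([debt_ratio, credit_score, loan_to_value, minority])

# Logit view: sign and size of each coefficient.
# Note: scikit-learn applies L2 regularization by default and reports no p-values.
logit_model = LogisticRegression(max_iter=1000).fit(X, approved)
coefs = dict(zip(feature_names, logit_model.coef_[0]))

# Random forest view: relative importance of each variable in the fitted forest.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, approved)
importances = dict(zip(feature_names, forest.feature_importances_))

# Rank variables from most to least important.
ranking = sorted(importances, key=importances.get, reverse=True)
for name in ranking:
    print(f"{name:14s} coef={coefs[name]:+.3f}  importance={importances[name]:.3f}")
```

The two columns of output answer different questions: the coefficient describes the direction and size of a variable's association with approval, while the importance describes the variable's share in the forest's splits relative to the other variables. Significance tests like the one cited in the abstract would need a package that reports standard errors. `sklearn.inspection.permutation_importance` is an alternative importance measure.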