Building Credit Models with GSE Data

We explore the usefulness of the loan-level performance data now being published by Fannie Mae and Freddie Mac, the two main US housing GSEs. A recent article by Scott Anderson and Janet Jozwik, “Building a Credit Model Using GSE Loan-Level Data”, published in the Spring 2014 “Journal of Structured Finance”, gives us the opportunity to assess how useful this data is in practice. Their work confirms that the GSE data is suitable for building a credit model, as it provides enough information to model borrower behavior, including the probability of becoming delinquent.

Through a step-by-step approach, the authors build a framework for developing a credit model, based on this newly released loan-level data and on macroeconomic variables, to evaluate GSE mortgage issuance.

First, to measure credit risk, the authors choose the MDR (Monthly Default Rate) as their metric, with “default” defined as the 180-day delinquency credit event. To validate this choice, they compare the two measures on loan performance data for 1999-2013 and show that MDR tracks the net roll rate (which captures the flow of loans shifting from performing to nonperforming status) well, with an average MDR of 40 bps against an average net roll rate of about 45 bps.
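
As a rough illustration of the metric, the sketch below computes a monthly default rate from loan-level records. The record layout, the function name and the choice of denominator (loans still reporting and not yet defaulted in that month) are assumptions made for this example; the article summary only fixes the 180-day delinquency definition of default.

```python
from collections import defaultdict

def monthly_default_rates(records, threshold=6):
    """Return {period: MDR} from (loan_id, period, months_delinquent) tuples.

    Assumptions for this sketch (not taken from the paper):
      * period is a sortable string such as "2009-02";
      * a loan defaults the first time it is `threshold` or more months
        delinquent (6 months is used as a stand-in for 180 days);
      * the denominator is the number of loans reporting in that period
        that had not defaulted in an earlier period.
    """
    defaulted = set()
    at_risk = defaultdict(int)
    new_defaults = defaultdict(int)

    # Process records in chronological order so earlier defaults are seen first.
    for loan_id, period, months_delinquent in sorted(records, key=lambda r: (r[1], r[0])):
        if loan_id in defaulted:
            continue  # already counted as a default in an earlier period
        at_risk[period] += 1
        if months_delinquent >= threshold:
            new_defaults[period] += 1
            defaulted.add(loan_id)

    return {p: new_defaults[p] / at_risk[p] for p in sorted(at_risk)}


records = [
    ("A", "2009-01", 5), ("B", "2009-01", 0),
    ("A", "2009-02", 6), ("B", "2009-02", 0),
    ("A", "2009-03", 7), ("B", "2009-03", 0),
]
mdr = monthly_default_rates(records)
```

In this toy input, loan A crosses the threshold in the second month and drops out of the at-risk pool afterwards, so it is counted as a default only once.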

In the second step, the sample population of loans is chosen. The authors start from Freddie Mac sample datasets, comprising 50,000 loans from each origination vintage from 1999 to 2012. Then, they narrow the dataset by applying the Freddie Mac STACR (Structured Agency Credit Risk) deal criteria, and excluding loans that experience a delinquency or prepayment event within 7 months of origination.

The population of loans is considered appropriate for two main reasons:

  • The loans have a credit performance profile which is similar to the full population of loans on which the model should be applied;
  • The time span (1999-2012) covers a full macroeconomic cycle, comprising all the phases that could affect the modeled loans: the average growth of the early 2000s, the bubble years of 2004-2007, the distressed years of the financial crisis and the recent recovery period.

The authors then conduct a bivariate analysis of the full-sample population, to identify potential explanatory variables for variability in default rates, and obtain the following results:

  • A strong relationship exists between MDR and FICO score at origination: the lower the FICO, the higher the default rate;
  • There is sizeable correlation between the original CLTV (combined loan-to-value ratio) and MDR;
  • Using the FHFA (Federal Housing Finance Agency) all-transactions index, the authors calculate a mark-to-market CLTV (MTMCLTV) for each loan and show that MDR rises dramatically as loans enter negative equity;
  • There is no significant relationship between MDR and loan size. The apparent relationship between these two variables seems, instead, to be explained by the origination year, as the maximum loan limit changed through time;
  • Cash-out refinance loans perform more poorly than purchase and rate refinance, but again the performance is likely related to the higher FICO scores and lower CLTV levels of the first category;
  • Mortgages with two or more borrowers perform better than mortgages with only one borrower;
  • Occupancy status is another strong indicator of performance.
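
To make the mark-to-market CLTV idea concrete, here is a minimal sketch of one common way to update a loan-to-value ratio with a house price index. The formula and the function name are assumptions for illustration, not the authors' documented method: the original CLTV is scaled by the ratio of current to original balance and divided by the index growth since origination.

```python
def mark_to_market_cltv(orig_cltv, orig_upb, current_upb,
                        hpi_at_origination, hpi_current):
    """Estimate a current CLTV (in percent) from origination data and an HPI.

    Assumed formula for this sketch:
        MTMCLTV = CLTV_0 * (UPB_t / UPB_0) / (HPI_t / HPI_0)
    Scaling the combined ratio by the reported balance is a simplification,
    since balances on any subordinate liens may not be observed.
    """
    if min(orig_upb, hpi_at_origination, hpi_current) <= 0:
        raise ValueError("balances and index levels must be positive")
    value_growth = hpi_current / hpi_at_origination
    return orig_cltv * (current_upb / orig_upb) / value_growth


# An 80% CLTV loan after a 25% fall in the index and modest amortization.
mtm = mark_to_market_cltv(80.0, 200_000, 190_000, 200.0, 150.0)
negative_equity = mtm > 100.0
```

A value above 100 means the estimated debt exceeds the estimated property value, which is the negative-equity region where the authors report the sharp rise in MDR.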

Based on such findings, the authors develop a logistic regression with the following variables: FICO, MTMCLTV, loan purpose, number of borrowers, occupancy status, unpaid principal balance (UPB) and loan age.
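
For readers less familiar with the technique, the general form of a logistic regression on these variables can be written as below. The symbols and the linear, untransformed specification are assumptions for illustration; this summary does not give the exact encoding, transformations or interactions used in the paper, and categorical variables such as loan purpose and occupancy status would in practice enter as sets of indicator variables.

```latex
\Pr(\text{default}_{i,t}) = \frac{1}{1 + e^{-z_{i,t}}}, \qquad
z_{i,t} = \beta_0
        + \beta_1\,\mathrm{FICO}_i
        + \beta_2\,\mathrm{MTMCLTV}_{i,t}
        + \beta_3\,\mathrm{Purpose}_i
        + \beta_4\,\mathrm{Borrowers}_i
        + \beta_5\,\mathrm{Occupancy}_i
        + \beta_6\,\mathrm{UPB}_{i,t}
        + \beta_7\,\mathrm{Age}_{i,t}
```

Here $i$ indexes loans and $t$ indexes months, so the fitted probability plays the role of a loan-level monthly default rate.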

Though the model captures the overall behavior of actual MDR over time, the comparison with the historical MDR shows that the fit and predictive power of this model are weak. Consequently, Anderson and Jozwik incorporate two more factors, in order to better capture the impact of macroeconomic conditions:

  • Housing prices, through a 16-month lag of the yearly changes in the FHFA all-transactions HPI (House Price Index) at the state level;
  • Unemployment, through the national unemployment rate.
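
The housing-price factor can be sketched as follows. This is a minimal example that assumes the index is already available as a monthly series for one state (how the authors align the index with monthly loan data is not described here); the function name and arguments are hypothetical.

```python
def lagged_yoy_change(hpi, lag_months=16, yoy_window=12):
    """Yearly HPI change observed `lag_months` before each month.

    hpi: list of monthly index levels, oldest first (assumed monthly).
    Returns a list of the same length; entries are None where there is
    not enough history to compute the lagged year-over-year change.
    """
    out = []
    for t in range(len(hpi)):
        ref = t - lag_months       # month whose yearly change is used
        base = ref - yoy_window    # month one year before the reference
        if base < 0:
            out.append(None)
        else:
            out.append(hpi[ref] / hpi[base] - 1.0)
    return out


# Toy series: the index rises by one point per month from a level of 100.
series = [100.0 + i for i in range(30)]
feature = lagged_yoy_change(series)
```

With a 16-month lag and a 12-month window, the first 28 months of any series have no value, which is worth keeping in mind when matching the feature to early loan observations.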

The authors find that the inclusion of these macroeconomic variables greatly improves both the fit and explanatory power of the model.

Lastly, the authors incorporate a risk-layering metric as a proxy for the effect of tightening underwriting standards on credit performance. The metric covers risk factors such as low FICO, high LTV ratio, single borrower and high debt-to-income ratio. After this last adjustment, when the model is applied to the entire sample, the authors find that it lines up as closely with the full data set as it does with the sample data set.
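
One simple way to build such a metric is to count how many risk factors a loan carries. The sketch below does exactly that; the thresholds, the field names and the idea of a plain count are all hypothetical choices for illustration, since this summary does not state how the authors define or weight the layers.

```python
from dataclasses import dataclass

@dataclass
class Loan:
    fico: int
    ltv: float           # loan-to-value ratio, in percent
    num_borrowers: int
    dti: float           # debt-to-income ratio, in percent

def risk_layers(loan, fico_floor=680, ltv_cap=80.0, dti_cap=40.0):
    """Count the risk factors present on a loan (0 to 4).

    All cutoffs are placeholder values chosen for this example only.
    """
    flags = [
        loan.fico < fico_floor,      # low FICO
        loan.ltv > ltv_cap,          # high LTV
        loan.num_borrowers == 1,     # single borrower
        loan.dti > dti_cap,          # high debt-to-income
    ]
    return sum(flags)


layered = risk_layers(Loan(fico=660, ltv=90.0, num_borrowers=1, dti=45.0))
clean = risk_layers(Loan(fico=760, ltv=70.0, num_borrowers=2, dti=30.0))
```

A count like this could then enter the regression as an additional variable, so that loans combining several weaknesses are treated as riskier than the individual factors alone would suggest.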

Sein’s only concern with the model is the assumption that the yearly change in the FHFA HPI is a consistent stationary process for all MSAs. In a recent article we discussed why this assumption is faulty. Nevertheless, the model proposed by Anderson and Jozwik is a great illustration of the power and usefulness of the loan-level data from the GSEs.