Prepayment involves borrowers paying off their loans before the scheduled maturity date, either partially or in full. This article focuses specifically on full prepayment. In the realm of student loans, there are generally two types: borrower payoff and consolidation. Borrower payoff occurs when individuals have surplus funds and opt to repay their loans ahead of schedule. This repayment rate tends to remain stable over time and is not the primary focus of the model. Consolidation, on the other hand, involves repaying an existing loan by obtaining a new loan at a lower interest rate. This form of prepayment essentially functions as a call option embedded within the loan. When prevailing market interest rates fall below the loan’s rate, borrowers find themselves “in the money” and are more likely to exercise this option by repaying the loan. In the mortgage industry, this process is usually called refinancing.
During model development, two primary challenges emerged. First, unlike the mortgage industry, there is no universally recognized market rate for student loans. Consequently, we could not directly compare the existing loan rate with a market rate to determine whether borrowers were “in the money” or “out of the money.” To address this, we used the 10-year Treasury note rate as a proxy for the market rate. The incentive was then calculated as the difference between the loan rate and the 10-year Treasury note rate. Because the Treasury rate is only a proxy and not a rate at which borrowers can actually refinance, the sign of the incentive alone does not indicate whether the borrower is in or out of the money. This incentive works for both fixed-rate and variable-rate loans, so we integrated both into one model. For variable-rate loans, the existing loan rate typically comprises an index rate plus a margin, and the index is usually a short-term interest rate. The incentive is therefore essentially the difference between the short-term rate and the long-term rate, which is one of the most important drivers of prepayment for variable-rate loans. This approach enabled us to construct a coherent, monotonic response curve for the incentive, which flattens as the incentive grows sufficiently large.
The second challenge is the significant increase in observed prepayment rates between 2014 and 2021, a period during which the incentive remained relatively stable. The existing model attributed this phenomenon to house price appreciation, suggesting that borrowers tapped into their home equity to repay student loans. However, this reasoning is questionable: student loan borrowers are typically just starting their careers, so it is improbable that they have accumulated substantial home equity. Following discussions with clients, it was suggested that the expansion of the consolidation market, exemplified by companies like SoFi, which partnered with Fannie Mae to allow homeowners to pay down their student debts with cash-out refinancing, could be the driving force behind this trend. If borrowers’ desire to consolidate is viewed as a demand-side factor, this expansion represents an external shift on the supply side. Lacking pertinent market data, we were unable to control for this directly within the model. Fortunately, internal data since 2015 allowed us to distinguish between borrower payoff and consolidation. Using this information, we constructed a consolidation index over time and incorporated it into the model by interacting it with the incentive, as expressed below \[ \beta \times \text{consolidationIndex} \times \text{incentive}, \] where $\beta$ is the coefficient. This expression can be interpreted in two ways. First, $\text{consolidationIndex}$ and $\text{incentive}$ can be treated as components of a new composite incentive. Under this reading, the presence of the consolidation market enhances the overall incentive, so the same level of incentive within a larger consolidation market implies a greater real incentive. Alternatively, $\beta$ and $\text{consolidationIndex}$ can be combined into a new coefficient. Under this reading, the same incentive yields a larger marginal effect when the consolidation market is larger.
In practice, it’s assumed that the consolidation market reached maturity around 2019 and has remained stable since then. By incorporating this interaction, the model effectively captures the increasing trend over the observed period.
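The incentive and its interaction with the consolidation index can be sketched as follows. This is only an illustration under stated assumptions: the index values, the coefficient `beta`, and the function names are hypothetical and are not taken from the fitted model. The only points carried over from the text are that the incentive is the loan rate minus the 10-year Treasury rate and that the index is held flat after 2019.

```python
# Hypothetical consolidation index by year (illustrative values only).
INDEX_BY_YEAR = {2015: 0.2, 2016: 0.4, 2017: 0.6, 2018: 0.8, 2019: 1.0}


def incentive(loan_rate, treasury_10y):
    """Incentive = loan rate minus the 10-year Treasury proxy (in percentage points)."""
    return loan_rate - treasury_10y


def consolidation_index_for(year, index_by_year, maturity_year=2019):
    """Look up the index, holding it flat at the maturity-year level afterward."""
    return index_by_year[min(year, maturity_year)]


def interaction_term(beta, index, inc):
    """beta * consolidationIndex * incentive, the term added to the model."""
    return beta * index * inc


beta = 0.5  # hypothetical coefficient
inc = incentive(loan_rate=6.5, treasury_10y=2.5)
for year in (2016, 2019, 2021):
    idx = consolidation_index_for(year, INDEX_BY_YEAR)
    # Second interpretation from the text: beta * index acts as the effective coefficient.
    print(year, "effective coefficient:", beta * idx, "term:", interaction_term(beta, idx, inc))
```

With the same incentive, the term grows with the index until 2019 and stays constant afterward, which is how the interaction picks up the rising trend.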
In our model, we calculated the incentive as the difference between the loan rate and the 10-year Treasury note rate. To capture the impact of the expansion of the consolidation market, we constructed an index and interacted it with the incentive to estimate a more accurate marginal effect of the incentive.
Credit risk refers to the potential risk of loss that arises from a borrower’s failure to repay a debt obligation as agreed. It is the risk that a borrower or counterparty may default on their financial obligations, leading to a loss for the lender or creditor.
Credit risk is commonly encountered in various financial transactions, such as loans, bonds, mortgages, credit cards, and trade credit. When extending credit, lenders assess the creditworthiness of borrowers by considering factors such as their credit history, income, assets, and financial stability. Based on this assessment, lenders assign a credit rating or score to determine the level of risk associated with lending to a particular individual or entity.
A credit risk model is a quantitative tool or framework used by financial institutions and lenders to assess and quantify the credit risk associated with lending to borrowers or counterparties. These models help institutions make informed decisions regarding credit extension, pricing, and risk management.
Credit risk models aim to estimate the likelihood of default by borrowers and the potential loss severity in the event of default. They typically incorporate a variety of factors and data points to evaluate the creditworthiness of borrowers. Some common components of credit risk models include:
Credit scoring: Credit scoring models assign numerical scores to borrowers based on their credit history, payment patterns, outstanding debt, income, and other relevant factors. These scores help lenders categorize borrowers into different risk tiers and determine their likelihood of default.
Financial ratios and indicators: Credit risk models often consider financial ratios and indicators, such as debt-to-income ratio, leverage ratio, profitability measures, and liquidity measures. These indicators provide insights into the financial health and stability of borrowers.
Collateral valuation: In cases where loans are secured by collateral, credit risk models may incorporate the value and quality of the collateral to assess potential loss severity in the event of default. This helps lenders determine the loan-to-value ratio and mitigate risk.
Macroeconomic factors: Some credit risk models consider macroeconomic indicators and trends, such as GDP growth, interest rates, unemployment rates, and industry-specific factors. These factors provide a broader context for evaluating credit risk, as economic conditions can impact the ability of borrowers to repay their obligations.
Behavioral scoring: In addition to credit history, credit risk models may also incorporate behavioral scoring, which analyzes patterns of borrower behavior, such as payment behavior, credit utilization, and past delinquencies. This provides insight into the borrower’s credit management practices and potential future behavior.
Credit risk models are commonly used in two areas: underwriting and future loss projection. In the underwriting case, the model is used to determine borrowers’ creditworthiness, which in turn helps lenders decide whether or not to issue a loan or credit card. It usually generates the probability of default within a certain future period (e.g., 6 months). In the loss projection case, the model is used to predict the probability of default in every future month. In this article we will discuss the model used in the loss projection case. We will have a separate article for the underwriting model.
The occurrence of a default event can be regarded as a termination event, signifying the conclusion of the lending relationship between borrowers and lenders. Thus, predicting the timing of default essentially involves forecasting the duration of the relationship’s survival. In econometrics, this field is referred to as duration analysis or survival analysis. The fundamental concept behind such models is to define the functional form of either the cumulative distribution function or the hazard function for the duration. The factors that influence credit risks, as discussed earlier, are incorporated into these functions in various ways. The estimation process typically involves maximum likelihood estimation (MLE).
Before delving into the general form of the likelihood function, it is important to discuss the structure of the data and available information. The sample consists of $n$ loans, with monthly status (default or non-default) observed between $t_{1}$ and $t_{2}$. In addition to loan status, we also observe some covariates, $x$, including loan characteristics, denoted as $x_i$, and time-specific characteristics, denoted as $x_t$. The loan status at a specific month $t$ for loan $i$ is denoted as $y_{it}$, where $y_{it}$ equals 1 if the loan defaults in that month and 0 otherwise. It’s worth noting that since loan or credit card payments are typically made on a monthly basis, the duration discussed in this article is discrete.
In survival analysis, the primary challenge lies in handling censored data. For loans that have not defaulted within the observation window, we lack information regarding their eventual default and the exact timing of it. We can only infer that these loans have survived up to $t_2$. This type of data is referred to as right-censored. It is incorrect to simply assign the duration as $t_2$ for these loans. Instead, the appropriate approach is to acknowledge that the durations of these loans extend beyond $t_2$ and incorporate this understanding into the likelihood function used for maximum likelihood estimation (MLE).
The likelihood function for a non-censored loan represents the probability that the loan survives through time $t_i-1$ and defaults at time $t_i$, denoted as $Pr(T=t_i|x_i)$. On the other hand, the likelihood for a censored loan captures the probability that the loan survives beyond time $t_2$, denoted as $Pr(T>t_2|x_i)$. To distinguish between censored and non-censored loans, we use $c_i=\mathbb{1}(t_i=t_2)$, where $c_i$ equals 1 if loan $i$ is censored and 0 otherwise. Then the general form of the likelihood function for loan $i$ can be written as \[ L_i=Pr(T=t_i|x_i)^{(1-c_i)}Pr(T>t_2|x_i)^{c_i} \] and the likelihood function for the whole sample is then \[ L = \prod_{i=1}^{n}L_i, \] assuming loans are independent of each other.
In this approach, we specify the cumulative distribution function of duration, denoted as $F(t|x; \beta)$, parameterized with $\beta$, along with the corresponding probability density function $f(t|x; \beta)$. Since our focus is on discrete duration, $f(t|x; \beta)$ represents the probability of the duration being equal to $t$. The likelihood function for loan $i$ can now be expressed as: \[ L_i=f(t_i|x_i; \beta)^{(1-c_i)} (1-F(t_2|x_i;\beta))^{c_i}. \] It is important to note that this estimation method relies on a single observation from each loan, making it challenging to incorporate time-varying covariates into the analysis.
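As a minimal sketch of how censoring enters this likelihood, consider the simplest possible case: no covariates and a geometric duration, where a loan defaults each month with a constant probability `p`. This distribution is an assumption chosen for illustration, not the one used in the article. Then $f(t)=(1-p)^{t-1}p$ and $1-F(t)=(1-p)^{t}$, and the maximum likelihood estimate has the closed form "defaults divided by loan-months at risk".

```python
import math


def log_likelihood(p, durations, censored):
    """Censored log-likelihood for a geometric duration with monthly default probability p."""
    total = 0.0
    for t, c in zip(durations, censored):
        if c:
            # Censored loan: contributes 1 - F(t) = (1 - p)^t
            total += t * math.log(1.0 - p)
        else:
            # Defaulted loan: contributes f(t) = (1 - p)^(t - 1) * p
            total += (t - 1) * math.log(1.0 - p) + math.log(p)
    return total


def mle_geometric(durations, censored):
    """Closed-form maximizer: number of defaults / total loan-months observed."""
    defaults = sum(1 for c in censored if not c)
    return defaults / sum(durations)


# Two loans default in months 3 and 5; two are still alive at month 12 (censored).
durations = [3, 5, 12, 12]
censored = [False, False, True, True]

p_hat = mle_geometric(durations, censored)
# Wrongly treating the censored loans as defaults at month 12 doubles the estimate.
p_naive = len(durations) / sum(durations)
print(p_hat, p_naive)
```

The comparison with `p_naive` shows why it is incorrect to assign $t_2$ as the duration of a censored loan: doing so overstates the default probability.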
When the distribution function $F(t|x;\beta)$ is assumed to follow the form $F_0(e^{-x\beta}t)$, where $F_0$ represents the distribution function of a baseline survival time $T_0$ that is independent of covariates, the specific model is referred to as an accelerated failure time model. The assumption essentially implies that actual survival time $T$ can be expressed as $T=e^{x\beta}T_0$, as one can see from the equation: \[F(t|x;\beta) = Pr(T<t|x;\beta)=F_0(e^{-x\beta}t)=Pr(T_0<e^{-x\beta}t)=Pr(e^{x\beta}T_0<t).\] Hence, it is evident that the model can be represented as a log-linear model, given by $\log T=x\beta + \epsilon$, where $\epsilon$ corresponds to $\log T_0$. The distribution of $\log T_0$ determines the distribution of $\log T$. Estimating this model is relatively straightforward.
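The log-linear form makes the coefficients easy to read. As a short worked illustration (the value $\beta_k=0.1$ is made up), raising a single covariate $x_k$ by one unit while holding the others fixed rescales the survival time by a constant factor: \[ \frac{T(x_k+1)}{T(x_k)}=\frac{e^{(x\beta+\beta_k)}T_0}{e^{x\beta}T_0}=e^{\beta_k}, \qquad e^{0.1}\approx 1.105, \] so a coefficient of 0.1 stretches the expected time to default by roughly 10.5 percent. The distributional assumption on $\epsilon$ then pins down the model: a normal $\epsilon$ gives a log-normal duration, and an extreme-value $\epsilon$ gives a Weibull duration.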
This approach handles time-varying covariates by examining the conditional survival probability on a monthly basis, specifically focusing on the probability of default in month $t$ given that the loan has survived through month $t-1$. The discrete version of the hazard function is defined as $h(t|x) = \frac{Pr(T= t|x)}{Pr(T> t-1|x)}$.
By using the properties of conditional probabilities, we can express the probability of the loan defaulting at time $t_i$ as: \[Pr(T=t_i|x)=\prod_{j=1}^{t_i-1}Pr(T>j|T>j-1,x)\,h(t_i|x)=\prod_{j=1}^{t_i-1}(1-h(j|x))\,h(t_i|x).\] Similarly, the probability of the loan surviving beyond $t_2$ can be represented as: \[ Pr(T>t_2|x)=\prod_{j=1}^{t_{2}}(1-h(j|x)). \] For loans that defaulted during the observation window, the sub-sample for that loan includes all observations until time $t_i$. The likelihood function for this loan can then be written as: \[ L_{i}=\prod_{t=1}^{t_i-1}(1-h(t|x_i, x_{t}))\,h(t_i|x_i,x_{t_i}). \] For loans that were censored, the likelihood function is simply \[ L_i=\prod_{t=1}^{t_i}(1-h(t|x_i,x_{t})) \] and the likelihood function of the whole sample is therefore \[ L =\prod_{i=1}^{n}\left(\prod_{t=1}^{t_i-1}(1-h(t|x_i, x_{t}))\,h(t_i|x_i, x_{t_i})\right)^{(1-c_i)}\left(\prod_{t=1}^{t_i}(1-h(t|x_i, x_{t}))\right)^{c_i}. \] It is evident that this method utilizes a total of $\sum_{i=1}^{n}t_i$ loan-month observations.
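The two product formulas above translate directly into code. The sketch below assumes a list `hazards` in which `hazards[j - 1]` holds $h(j|x)$ for month $j$; the numeric hazards are made up for illustration.

```python
def default_probability(hazards, t):
    """Pr(T = t): survive months 1..t-1, then default in month t."""
    survival = 1.0
    for h in hazards[: t - 1]:
        survival *= 1.0 - h
    return survival * hazards[t - 1]


def survival_probability(hazards, t):
    """Pr(T > t): survive every month from 1 through t."""
    survival = 1.0
    for h in hazards[:t]:
        survival *= 1.0 - h
    return survival


hazards = [0.01, 0.02, 0.03]  # hypothetical monthly hazards h(1), h(2), h(3)
p_default_by_month = [default_probability(hazards, t) for t in (1, 2, 3)]
p_survive = survival_probability(hazards, 3)

# Defaulting in some month or surviving past the last one exhausts all outcomes.
print(sum(p_default_by_month) + p_survive)
```

The final check, that the default probabilities and the survival probability sum to one, is a quick way to confirm the hazard decomposition is implemented consistently.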
Recall that we use $y_{it}$ to denote whether loan $i$ is terminated at time $t$. Therefore, for censored loans, $y_{it}=0$ for all $t$, while for loans that were not censored, $y_{it}=0$ for $t<t_i$ and $y_{it}=1$ when $t=t_i$. The likelihood function of both types of loans can be simplified as \[ L_{i}=\prod_{t=1}^{t_i}(1-h(t|x_i,x_{t}))^{1-y_{it}}\,h(t|x_i, x_{t})^{y_{it}}, \] and the likelihood of the whole sample then becomes \[ L =\prod_{i=1}^{n}\prod_{t=1}^{t_i}(1-h(t|x_i, x_{t}))^{1-y_{it}}\,h(t|x_i, x_{t})^{y_{it}}, \] which can be further expressed as the log-likelihood function: \[ l = \sum_{i=1}^{n}\sum_{t=1}^{t_i}\left[(1-y_{it})\log(1-h(t|x_i, x_{t})) + y_{it}\log h(t|x_i, x_{t})\right]. \] Assuming the hazard function follows the cumulative distribution function of the logistic distribution, $h(t|x_i,x_t)=\frac{1}{1+\exp(-x_t\alpha-x_i\beta)}$, we arrive at a familiar logistic regression framework.
From this perspective, we observe that the widely used logistic credit risk model handles the issue of right censoring.
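The equivalence can be made concrete by building the loan-month ("person-period") data set that the logistic regression is fitted on. The sketch below is illustrative: the field names (`id`, `duration`, `censored`) and the constant linear index are hypothetical, and covariates are omitted to keep it short. Each loan contributes one row per observed month, with $y_{it}=1$ only in the default month of a non-censored loan.

```python
import math


def expand_to_loan_months(loans):
    """Turn one record per loan into one row per loan-month with the y_it flag."""
    rows = []
    for loan in loans:
        for t in range(1, loan["duration"] + 1):
            defaulted_now = t == loan["duration"] and not loan["censored"]
            rows.append({"id": loan["id"], "t": t, "y": 1 if defaulted_now else 0})
    return rows


def logistic_hazard(linear_index):
    """h = 1 / (1 + exp(-index)), the logistic CDF evaluated at the linear index."""
    return 1.0 / (1.0 + math.exp(-linear_index))


def log_likelihood(rows, linear_index_fn):
    """Sum of y*log(h) + (1 - y)*log(1 - h) over all loan-month rows."""
    total = 0.0
    for row in rows:
        h = logistic_hazard(linear_index_fn(row))
        total += row["y"] * math.log(h) + (1 - row["y"]) * math.log(1.0 - h)
    return total


loans = [
    {"id": "A", "duration": 3, "censored": False},  # defaults in month 3
    {"id": "B", "duration": 4, "censored": True},   # still alive at month 4
]
rows = expand_to_loan_months(loans)
print(len(rows), [r["y"] for r in rows])
print(log_likelihood(rows, lambda row: -2.0))  # constant index, for illustration only
```

The censored loan B contributes only survival terms, which is exactly the $Pr(T>t_2|x)$ factor in the general likelihood. No special treatment is needed beyond stopping its rows at the last observed month.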
In this article we discussed several credit risk models. We started from the general form of the likelihood function and then moved to two specific methods: the accelerated failure time model and the logistic regression model. We also showed that the logistic model handles right censoring.
All models are wrong, but some are useful. — George Box
According to OCC Bulletin 2011-12^{1}, a model is defined as a “quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates.” From the definition, one can see that model risk can arise from unreasonable assumptions, wrong or inappropriate theories or techniques, and low-quality input data. Different types of models are widely used in financial institutions for various purposes, such as underwriting, investment, and stress testing. No model is perfect, but it is important to keep model risk within an acceptable range.
Model validation is the process of evaluating and testing models to ensure that they are accurate, reliable, and perform as expected. This process involves a thorough examination of the model’s assumptions, data inputs, calculations, and outputs. The goal of model validation is to identify any weaknesses or potential sources of error in the model, and to ensure that the model meets the business needs and is compliant with regulatory requirements. Model validation is an essential step for any business that relies on models to make important decisions.
We believe that a good model validation process should follow a structured and systematic approach to ensure that all aspects of the model are thoroughly evaluated. Here are some key steps that should be included in a good model validation process:
Define the scope and objectives of the model: Before beginning the validation process, it’s important to clearly define the scope and objectives of the model. This includes identifying the intended use of the model, the data inputs and outputs, and any assumptions or limitations of the model.
Evaluate conceptual soundness: This involves assessing the model design, the main method employed for model estimation and its core assumptions.
Evaluate model inputs: The data inputs used in the model should be carefully evaluated to ensure that they are accurate, complete, and appropriate for the model’s intended use. Any data quality issues should be identified and addressed.
Assess the impact of model limitations: No model is perfect, but it is important to understand the impacts of the limitations.
Validate the model implementation: The implementation should be perfectly aligned with the model estimation. This requires comprehensive tie-out between the model implementation and model outputs.
In this article we discussed the importance of the model validation and our understanding of a good model validation.
The Dodd-Frank Act Stress Test (DFAST, aka stress test) and Comprehensive Capital Analysis and Review (CCAR) are two primary components of the Federal Reserve’s capital assessment of large banks. DFAST, or the stress test, is a forward-looking quantitative evaluation of banks’ capital adequacy under a range of stress scenarios. CCAR included both qualitative and quantitative assessments during the first several years. The quantitative assessment in CCAR is conducted using DFAST. The qualitative part has been replaced with the stress capital buffer.
The stress test scenarios are designed by the Federal Reserve. They include a baseline scenario and a severely adverse scenario.
As seen in the 2022 Federal Reserve Stress Test Results, the final package submitted by the banks includes four parts: capital, pre-tax net income, losses, and pre-provision net revenue under the severely adverse scenario.
Knowledge Sharing
- The risk-weighted assets are calculated based on standardized approach described in 12 C.F.R. part 217 Subpart D.
- Risk-free assets such as government bonds contribute nothing to the risk-weighted assets because their risk weight is 0.
Knowledge Sharing
- Amortized cost: initial amount of the loan - principal repayments + unamortized premium - unamortized discount
- FVO: fair market value. It is adjusted to reflect changes in market conditions or other factors that may affect its value.
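The amortized cost definition above is simple arithmetic, shown here as a small helper. The function name and the dollar amounts are made up for illustration.

```python
def amortized_cost(initial_amount, principal_repayments,
                   unamortized_premium=0.0, unamortized_discount=0.0):
    """Initial amount - principal repayments + unamortized premium - unamortized discount."""
    return (initial_amount - principal_repayments
            + unamortized_premium - unamortized_discount)


# A 100,000 loan with 20,000 repaid and a 500 unamortized premium.
print(amortized_cost(100_000, 20_000, unamortized_premium=500))
```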
As one can see, the final submission involves many forecasts and calculations. Let’s go through them one by one.
Capital ratios
Since the calculation of risk-weighted assets is rule based as discussed above, there are no prediction models involved in this part.
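The rule-based nature of this step can be sketched as a weighted sum followed by a division. The asset classes, balances, and risk weights below are placeholders chosen for illustration, except for the zero weight on government bonds mentioned above. The actual weights are those prescribed in 12 C.F.R. part 217 Subpart D.

```python
def risk_weighted_assets(exposures, risk_weights):
    """Sum of exposure amount times the risk weight of its asset class."""
    return sum(amount * risk_weights[asset_class]
               for asset_class, amount in exposures.items())


def capital_ratio(capital, rwa):
    """Capital divided by risk-weighted assets."""
    return capital / rwa


# Hypothetical balance sheet (in millions) and illustrative risk weights.
exposures = {"government_bond": 200.0, "mortgage": 400.0, "corporate_loan": 300.0}
risk_weights = {"government_bond": 0.0, "mortgage": 0.5, "corporate_loan": 1.0}

rwa = risk_weighted_assets(exposures, risk_weights)
print(rwa, capital_ratio(60.0, rwa))
```

The modeling effort in the capital projection sits in the numerator: projected losses and revenues change capital over the scenario horizon, while the denominator follows from the prescribed weights.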
Loan losses
In essence, the projected losses can be computed as the product of the probability of default and the loss given default, which is determined by multiplying the exposure at default with the severity rate (1-recovery rate). These components necessitate the use of statistical models to generate accurate predictions. For instance, consider mortgage loans. The probability of default is influenced by various factors, such as borrower characteristics (credit history, income, etc.), loan features (type, interest rate, purpose, etc.), and macroeconomic factors (housing price, unemployment rate, interest rate, etc.). The exposure at default, on the other hand, depends on the timing of default, with a significant difference in exposure between early-stage and late-stage defaults. Finally, the recovery rate is influenced by factors such as the condition of the collateral, availability of mortgage insurance, and housing market conditions at the time of default.
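The loss decomposition can be written out as a short calculation. This is a sketch with made-up numbers: in practice each input would come from its own statistical model, and `monthly_pd` is assumed here to be the unconditional probability of defaulting in each projection month.

```python
def expected_loss(pd, ead, recovery_rate):
    """Probability of default x exposure at default x severity (1 - recovery rate)."""
    severity = 1.0 - recovery_rate
    return pd * ead * severity


def projected_losses(monthly_pd, monthly_ead, recovery_rate):
    """Sum monthly expected losses; exposure depends on when default occurs."""
    return sum(expected_loss(p, e, recovery_rate)
               for p, e in zip(monthly_pd, monthly_ead))


# Single-period example: 2% default probability, 200,000 exposure, 60% recovery.
print(expected_loss(0.02, 200_000, 0.6))

# Two-month example: the balance amortizes, so a later default exposes less.
print(projected_losses([0.01, 0.01], [100.0, 90.0], 0.5))
```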
Pre-provision net revenue
Forecasting future income and expenses involves taking into account numerous factors. For instance, when predicting income from mortgage loans, there are two components: income from existing loans and income from future loans. The income from existing loans depends on factors such as prepayment speed and probability of default, while income from future loans depends on predictions of loan volumes as well as prepayment and credit risks. There are various methods for predicting future loan volumes, which are dependent on the type of asset. A common approach is to use a time series model that incorporates economic conditions.
The Federal Reserve regulates the DFAST stress test, which is aimed at evaluating the performance of large banks in hypothetical severely adverse scenarios. To accurately assess sensitivities to economic conditions, a set of robust statistical models is required. These models encompass credit risk models for various loan types, prepayment risk models, and time series models for volume forecasting.
Note: all the data are downloaded from Federal Reserve’s website.