What Is a Variance Inflation Factor (VIF)? A variance inflation factor (VIF) is a measure of the amount of multicollinearity
in regression analysis. Multicollinearity exists when there is a correlation between multiple independent variables in a multiple regression model.
Small VIF values (VIF < 3) indicate low correlation among variables. Some tools apply a default VIF cutoff of 5, so that only variables with a VIF less than 5 are included in the model. However, note that many sources say that a VIF of less than 10 is acceptable.
A VIF of 1.5 means that the variance is 50% higher than what could be expected if there were no multicollinearity between the independent variables. As a general rule of thumb, if the VIF is more than 5, the independent variables are said to be highly correlated.
A VIF less than 5 indicates a low correlation of that predictor with other predictors. A value between 5 and 10 indicates a moderate correlation, while VIF values larger than 10 are a sign of high, intolerable correlation among model predictors (James et al. 2013).
A VIF of four means that the variance (a measure of imprecision) of the estimated coefficients is four times higher because of correlation between the two independent variables.
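The arithmetic behind these readings can be sketched in a few lines of Python. The function names below are illustrative, not taken from any library. The sketch relies on the standard fact that a coefficient's standard error scales with the square root of its variance, so a VIF of 4 corresponds to a standard error twice as large.

```python
from math import sqrt


def variance_increase_pct(vif: float) -> float:
    """Percentage by which the coefficient variance is inflated (hypothetical helper)."""
    return (vif - 1.0) * 100.0


def std_error_multiplier(vif: float) -> float:
    """Factor by which the coefficient's standard error grows (hypothetical helper)."""
    return sqrt(vif)


# VIF of 1.5: variance is 50% higher than with uncorrelated predictors.
print(variance_increase_pct(1.5))
# VIF of 4: variance is four times as large, so the standard error doubles.
print(std_error_multiplier(4.0))
```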
In general terms, VIF equal to 1 = variables are not correlated. VIF between 1 and 5 = variables are moderately correlated. VIF greater than 5 = variables are highly correlated.
VIF = 1, no correlation between the independent variable and the other variables. VIF exceeding 5 or 10 indicates high multicollinearity between this independent variable and the others.
Generally, a VIF above 4 or tolerance below 0.25 indicates that multicollinearity might exist, and further investigation is required. When VIF is higher than 10 or tolerance is lower than 0.1, there is significant multicollinearity that needs to be corrected.
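Tolerance is the reciprocal of the VIF, which is why a VIF of 4 pairs with a tolerance of 0.25 and a VIF of 10 with a tolerance of 0.1. A minimal sketch of this rule of thumb follows. The function names and the label strings are illustrative assumptions, and the cutoffs are just the ones quoted above, not universal constants.

```python
def tolerance(vif: float) -> float:
    """Tolerance is defined as 1 / VIF; a VIF can never be below 1."""
    if vif < 1.0:
        raise ValueError("VIF cannot be smaller than 1")
    return 1.0 / vif


def screen_vif(vif: float) -> str:
    """Apply the 4 / 10 rule of thumb (labels are hypothetical)."""
    if vif > 10.0:
        return "significant multicollinearity"
    if vif > 4.0:
        return "investigate further"
    return "acceptable"


for value in (1.2, 4.5, 12.0):
    print(value, round(tolerance(value), 3), screen_vif(value))
```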
A VIF around 1 is very good.
In factor analysis, principal component analysis is used to derive a common score from multicollinear variables. A rule of thumb to detect multicollinearity is that when the VIF is greater than 10, there is a problem of multicollinearity.
The higher the value, the greater the correlation of the variable with other variables. Values of more than 4 or 5 are sometimes regarded as being moderate to high, with values of 10 or more being regarded as very high.
The higher the value of VIF, the higher the correlation between this variable and the rest. If the VIF value is higher than 10, the variable is usually considered to have a high correlation with other independent variables. However, the acceptable range depends on the requirements and constraints of the analysis.
Multicollinearity is a statistical concept where several independent variables in a model are correlated. Two variables are considered to be perfectly collinear if their correlation coefficient is +/- 1.0. Multicollinearity among independent variables will result in less reliable statistical inferences.
In general, a VIF above 10 indicates high correlation and is cause for concern. Some authors suggest a more conservative level of 2.5 or above. Sometimes a high VIF is no cause for concern at all. For example, you can get a high VIF by including products or powers of other variables in your regression, like x and x².
The general rule of thumb is that VIFs exceeding 4 warrant further investigation, while VIFs exceeding 10 are signs of serious multicollinearity requiring correction.
If the R-squared obtained by regressing a particular variable on the other predictor variables is close to 1, that variable can be largely explained by the other predictors, and keeping it as one of the predictor variables can cause a multicollinearity problem.
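The link between this R-squared and the VIF is the standard definition VIF = 1 / (1 - R²), where R² comes from regressing the variable in question on all the other predictors. A minimal sketch follows. The function name is an illustrative assumption, and the R² values are made-up inputs rather than results from real data.

```python
def vif_from_r_squared(r_squared: float) -> float:
    """VIF = 1 / (1 - R^2), with R^2 from the auxiliary regression of one
    predictor on all the others (hypothetical helper name)."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R-squared must be in [0, 1); at 1 the VIF is infinite")
    return 1.0 / (1.0 - r_squared)


# Illustrative R^2 values: the VIF grows quickly as R^2 approaches 1.
for r2 in (0.0, 0.5, 0.75, 0.8, 0.9):
    print(f"R^2 = {r2:.2f} -> VIF = {vif_from_r_squared(r2):.2f}")
```

An R² of 0.75 corresponds to a VIF of 4, 0.8 to a VIF of 5, and 0.9 to a VIF of 10, which is where the commonly quoted cutoffs sit on the R² scale.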
Among all these tests, Pearson’s coefficient and VIF are the most used tests for examining the presence of multicollinearity.
Multicollinearity makes it hard to interpret your coefficients, and it reduces the power of your model to identify independent variables that are statistically significant. These are definitely serious problems.
In general, the higher the R-squared, the better the model fits your data.
Collinearity is a linear association between two predictors. Multicollinearity is a situation where two or more predictors are highly linearly related. In general, an absolute correlation coefficient of >0.7 among two or more predictors indicates the presence of multicollinearity.
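A pairwise screen along these lines might look like the following sketch, which uses only the Python standard library. The data, the variable names x1 to x3, and the helper functions are all made up for illustration, and the 0.7 threshold is simply the rule of thumb quoted above. Note that a pairwise check can miss multicollinearity that involves three or more predictors jointly, which is one reason VIF is used alongside it.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def flag_correlated_pairs(predictors, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(predictors.items(), 2):
        r = pearson(xa, xb)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Made-up data: x2 is almost a multiple of x1, x3 is unrelated to both.
predictors = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10.5],
    "x3": [1, -1, 0, -1, 1],
}
for name_a, name_b, r in flag_correlated_pairs(predictors):
    print(f"{name_a} and {name_b}: r = {r:.3f}")
```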
Multicollinearity does not increase bias, but it increases variance (overfitting).