Interpreting Coefficients

Need to Know

Remember practical vs statistical significance!

Interpreting Coefficients

Interpreting regression coefficients substantively involves understanding the relationship between the independent variable and the dependent variable in terms of their real-world meaning.

As with our other tests, the relationship can be characterized by its significance, direction, and magnitude. 

Significance: How confident are we that this relationship can be generalized to the population?

Direction: Do changes to the independent variable lead to an increase or a decrease in the dependent variable?

Magnitude: To what extent do we expect Y to change, given a change in X?

Interpreting Significance (and insignificance)

As with our other tests, you can assess the significance of a coefficient by looking at its p-value. While our significance threshold (alpha) of .05 is somewhat arbitrary, output in R will mark coefficients that are statistically significant at the .05 level with at least one *.

Remember: tests with p-values greater than .05 do not provide enough evidence to conclude that there is or is not a relationship or effect.

Note: Be careful when interpreting the magnitude and direction for insignificant coefficients. 

Note: We often designate significant effects using parentheticals at the end of the sentence (e.g., p < .05).
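The star markers can be sketched as a small lookup. This is a minimal Python sketch (the class itself uses R); it follows the default legend that R prints under regression output ("Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1"). The function name is hypothetical, and the p-values in the loop are made up for illustration.

```python
def significance_stars(p_value):
    """Return the marker R's default legend assigns to a p-value."""
    if p_value <= 0.001:
        return "***"
    if p_value <= 0.01:
        return "**"
    if p_value <= 0.05:
        return "*"   # at least one star: significant at the .05 level
    if p_value <= 0.1:
        return "."   # a dot is NOT significant at the .05 level
    return ""

# Hypothetical p-values, for illustration only
for p in (0.0004, 0.03, 0.07, 0.42):
    print(p, significance_stars(p) or "(no marker)")
```

Note that a lone "." flags p-values between .05 and .1, which do not meet the .05 threshold used in this class.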

Interpreting the Intercept

The intercept can be understood as the expected or predicted value of Y given that X is 0. Note that the substantive interpretation will depend on whether the IV is quantitative or qualitative.

Reminder: We are always predicting change in the dependent variable!

The interpretation of the coefficient depends upon the variable type

Consider the example from a dig site showing the relationship between dinosaurs' femur length and body mass.

For a one cm increase in femur length, we expect body mass to increase by 49 kg, holding all else constant (p < .05).

The expected weight for a dinosaur whose femur is 0 cm long is 255 kg.
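The two sentences above correspond to a simple prediction equation. A minimal Python sketch, using the intercept (255 kg) and slope (49 kg per cm) from the example; the function name and the femur lengths plugged in are hypothetical.

```python
INTERCEPT_KG = 255       # expected body mass when femur length is 0 cm
SLOPE_KG_PER_CM = 49     # expected change in body mass per 1 cm of femur

def expected_mass_kg(femur_cm):
    """Predicted body mass (kg) for a given femur length (cm)."""
    return INTERCEPT_KG + SLOPE_KG_PER_CM * femur_cm

print(expected_mass_kg(0))                          # 255 (the intercept)
print(expected_mass_kg(10))                         # 745
print(expected_mass_kg(11) - expected_mass_kg(10))  # 49 (the slope)
```

The last line shows why the coefficient is read as a change: any one cm step in femur length moves the prediction by the same 49 kg.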

Interpreting coefficients for Quantitative IVs

The coefficient of a quantitative independent variable in a regression model represents the change in the dependent variable that is expected given a one-unit change in the independent variable, while holding all other variables constant.

Take note of this formulaic explanation: For a one-unit increase in X, we can expect Y to increase/decrease by BETA units, holding all else constant (p<.05). 

The y-intercept can be interpreted as the expected value of Y when X is 0. 


The boxplot shows the distribution of SAT scores for two types of high schools - public and Catholic. The x-axis shows the type of high school, and the y-axis shows the SAT score. The box for each school type shows the median, interquartile range, and range of SAT scores. The red line in the middle of each box represents the median SAT score. The whiskers extend to the minimum and maximum non-outlier values, and any points outside the whiskers are considered outliers. The text annotation at the top right shows the unstandardized regression coefficient (b), which measures the difference in SAT score between Catholic and public high schools.

Interpreting coefficients for Qualitative IVs

Because a 1-unit increase in the qualitative variable doesn't make any substantive sense, we instead interpret the direction relative to a reference category. The reference category is the qualitative category left out of the regression against which we compare our regression coefficient.

The significance test is between the specified group and the reference group. In this case, we cannot be confident that public and Catholic school seniors score differently on the SAT (p > .05).

Because the reference category is not included in the regression, the y-intercept is the expected value of Y for the reference category. 

For binary qualitative variables, the reference category is often implicit or unstated. For example, the regression output on the left doesn't show the reference category "Catholic."

Relative to Catholic school seniors, the model predicts that public school seniors score 35.75 points lower on the SAT.

Catholic school seniors are estimated to score 1215. 

Changing the reference category to "public school seniors" would give the same substantive conclusion. Relative to public school seniors, the model predicts that Catholic school seniors score 35.75 points higher on the SAT.

Public school seniors are expected to score 1179. 
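To see why the choice of reference category does not change the predictions, here is a minimal Python sketch of both codings, using the values from the example (1215 and 35.75). The function names are hypothetical, and 1179.25 is simply 1215 - 35.75, which the text reports rounded to 1179.

```python
def sat_with_catholic_reference(is_public):
    """Catholic is the reference: intercept is the Catholic mean."""
    return 1215 - 35.75 * is_public

def sat_with_public_reference(is_catholic):
    """Public is the reference: intercept is the public mean."""
    return 1179.25 + 35.75 * is_catholic

# Same predicted score for each group under either coding
print(sat_with_catholic_reference(0), sat_with_public_reference(1))  # Catholic seniors
print(sat_with_catholic_reference(1), sat_with_public_reference(0))  # public seniors
```

Only the sign of the coefficient and the meaning of the intercept change; the predicted group scores are identical.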

Should Know

Interpret units

It's important to note that the substantive interpretation of regression coefficients depends on the units of measurement of the independent and dependent variables. 

For example, suppose the independent variable "time spent studying" is measured in hours and the dependent variable "score on a test" is measured in points. In this case, the coefficient represents the change in points associated with a one-hour change in time spent studying. 

E.g., for every hour you spend studying, your score on a 200-point test will increase by 20 points.

However, if the independent variable is measured in minutes and the dependent variable is measured in percent, the interpretation of the coefficient would be entirely different.

E.g., for every minute you spend studying, your score on a math test measured out of 100 percent will increase by .17 percent.
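The two statements describe the same relationship in different units. A minimal Python sketch of the conversion, assuming (as in the example) a 200-point test and 60 minutes per hour; the variable names are hypothetical.

```python
points_per_hour = 20     # coefficient when X is in hours and Y is in points
test_max_points = 200

# Rescale Y from points to percent of the test
percent_per_hour = points_per_hour * 100 / test_max_points

# Rescale X from hours to minutes
percent_per_minute = percent_per_hour / 60

print(percent_per_hour)               # 10.0
print(round(percent_per_minute, 2))   # 0.17
```

The coefficient shrinks from 20 to about .17 purely because of the units, not because the effect of studying is any weaker.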


Which do you think is more substantive?

Interpreting Qualitative IVs with more than two categories

Specifying the reference is especially important if the independent, qualitative variable has more than 2 categories.

Relative to those whose major is in the business school, the employer-rated people skills of those whose major is in the engineering school are rated 2.25 points lower.

Relative to those whose major is in the business school, the employer-rated people skills of those whose major is in the humanities school are rated 1.56 points higher.

Employers rate business school majors' people skills as 5.41.
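The expected rating for every category follows from the intercept plus that category's coefficient. A minimal Python sketch using the values above, with business as the reference category (coefficient 0); the names are hypothetical.

```python
INTERCEPT = 5.41  # expected rating for the reference category (business)

# Coefficients are differences from the reference category
COEFFICIENTS = {"business": 0.0, "engineering": -2.25, "humanities": 1.56}

def expected_rating(school):
    """Expected employer-rated people skills for a given school."""
    return INTERCEPT + COEFFICIENTS[school]

for school in COEFFICIENTS:
    print(school, round(expected_rating(school), 2))
```

This prints 5.41 for business, 3.16 for engineering, and 6.97 for humanities.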

Interpreting Standardized Quantitative Independent Variables

When quantitative independent variables are standardized, they are transformed to have a mean of 0 and a standard deviation of 1, which makes their coefficients easier to compare with the coefficients of other standardized variables.

Standardized regression coefficients represent the expected change in the dependent variable when the independent variable increases by one standard deviation, while holding all other independent variables constant.
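Standardizing can be sketched in a few lines. This minimal Python example uses made-up study hours and borrows the 20-points-per-hour slope from the studying example purely for illustration; it standardizes only the independent variable, so the rescaled slope is still in points of Y.

```python
import statistics

hours = [2, 4, 4, 5, 7, 8]   # hypothetical data, for illustration only

mean_x = statistics.mean(hours)
sd_x = statistics.stdev(hours)   # sample standard deviation

# z-scores: mean 0, standard deviation 1
z_hours = [(x - mean_x) / sd_x for x in hours]

# If the unstandardized slope is 20 points per hour, the slope for the
# standardized variable is the change in Y per one-SD increase in X.
unstandardized_slope = 20
slope_per_sd = unstandardized_slope * sd_x

print(round(statistics.mean(z_hours), 6), round(statistics.stdev(z_hours), 6))
print(round(slope_per_sd, 2))
```

A one-unit change in the standardized variable is a one-standard-deviation change in the original variable, which is why the slope is multiplied by the standard deviation.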

Calculating estimated differences between two categories that aren't the reference category

If engineering majors are rated 2.25 points lower than business majors, and humanities majors are rated 1.56 points higher, then those whose major was in the engineering school are rated 3.81 points lower than those who majored in the humanities (2.25 + 1.56).

However, note that we don't have a significance test for this relationship. This is one reason among many to think carefully about the reference category! 
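The arithmetic can be sketched as the difference between the two coefficients, since the intercept cancels out. A minimal Python sketch with hypothetical variable names; it yields only the point estimate, not a significance test.

```python
b_engineering = -2.25   # engineering relative to business (reference)
b_humanities = 1.56     # humanities relative to business (reference)

# Engineering relative to humanities: the shared intercept cancels
engineering_vs_humanities = b_engineering - b_humanities

print(round(engineering_vs_humanities, 2))   # -3.81
```

To get a p-value for this comparison, one option is to refit the model with humanities (or engineering) as the reference category.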

Interpreting the Intercept

Whether or not the intercept needs to or can be substantively interpreted depends upon the model at hand. There are many perspectives on intercepts, but suffice it to say that you should take care when interpreting intercepts, especially when writing for non-specialists.

Think carefully when substantively interpreting intercepts.  

Interpreting significance using confidence intervals

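Does the confidence interval include the possibility that the population parameter for the slope is 0? If it does, the coefficient is not statistically significant at the corresponding level (for example, a 95% interval corresponds to the .05 level).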
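The check itself is simple enough to sketch. A minimal Python example; the function name and both intervals are hypothetical, chosen only to show one interval that excludes 0 and one that contains it.

```python
def interval_excludes_zero(ci_lower, ci_upper):
    """True when 0 is not a plausible value for the population slope."""
    return not (ci_lower <= 0 <= ci_upper)

# Hypothetical 95% confidence intervals for a slope
print(interval_excludes_zero(29.4, 68.6))    # True: significant at .05
print(interval_excludes_zero(-12.0, 83.5))   # False: not significant at .05
```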

Could Know

Inferences in Regression

We use statistics to generalize from the sample regression line (our predicted equation line) to a theoretical population regression line.

Calculating Confidence Intervals of the slope

The confidence interval of the estimate represents a range of potential regression lines that we are to some degree confident includes the population regression line.
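A confidence interval for a slope is usually computed as the estimate plus or minus a critical t-value times the standard error. A minimal Python sketch under stated assumptions: the slope of 49 comes from the dinosaur example, the standard error of 10 is made up, and 1.96 is the large-sample approximation for a 95% interval (the exact critical value depends on the degrees of freedom).

```python
def slope_confidence_interval(b, se, t_crit=1.96):
    """Return (lower, upper) bounds: b plus or minus t_crit * se."""
    margin = t_crit * se
    return b - margin, b + margin

# Slope from the dinosaur example; the standard error is hypothetical
lower, upper = slope_confidence_interval(49, 10)
print(round(lower, 1), round(upper, 1))   # 29.4 68.6
```

Because this interval does not contain 0, it would be consistent with a coefficient that is significant at the .05 level.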

Calculating t statistic

R uses this t-value to determine the p value. 
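The formula itself is not reproduced in these notes. The standard t statistic for a coefficient, when testing against a population slope of 0, is the estimate divided by its standard error. A minimal Python sketch; the function name and the standard error of 10 are hypothetical.

```python
def t_statistic(b, se, null_value=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (b - null_value) / se

# Slope from the dinosaur example with a hypothetical standard error
print(t_statistic(49, 10))   # 4.9
```

The larger the t-value is in absolute terms, the smaller the resulting p-value.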

Standardizing with Qualitative Independent Variables

Generally, most scholars will agree that it isn't best practice to standardize qualitative or categorical independent variables. 

Fitting Non-linear Relationships 

When the relationship between X and Y is not linear, you can transform the Y or X variable to better match the relationship.  

How do we determine which transformation will best fit a non-linear relationship? Look at the residuals. 
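Looking at the residuals can be sketched with a tiny made-up dataset. This minimal Python example fits a straight line by ordinary least squares to data that are actually curved (Y equals X squared) and prints the residuals; the data and the helper name are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

xs = list(range(7))           # 0, 1, ..., 6
ys = [x ** 2 for x in xs]     # a curved (quadratic) relationship

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print([round(r, 2) for r in residuals])
# [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
```

The residuals are positive at both ends and negative in the middle. A systematic pattern like this U shape, rather than random scatter around 0, suggests that a straight line is the wrong functional form.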


Interpreting non-linear regressions

Mathematically, models with non-linear relationships are interpreted the same way as linear ones. Substantively, however, interpretation is more complicated, because whether an increase in X leads to an increase or decrease in Y (and to what extent) depends on much more context, such as the starting value of X.

For this class, it is best to examine a plot, create hypothetical examples and calculate the expected values using observed values that make sense, and use deductive reasoning.

Note: Some non-linear transformations change the interpretation of the intercept, but that change is beyond the scope of this class. 

Polynomials

In the social sciences, it is common for a relationship to be best represented by a polynomial function. Simply put, a polynomial model includes the same variable in more than one term, each raised to a different power (for example, age and age squared).

expected income = 6867+1135(age)-11(age*age)

Polynomials can help to model many curvilinear relationships - not just the inverse U shape shown on the left. 
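Plugging hypothetical ages into the income equation above shows how the expected change depends on where you start. A minimal Python sketch; the coefficients come from the equation, while the function name and the ages chosen are assumptions for illustration.

```python
def expected_income(age):
    """Expected income from the quadratic model in the text."""
    return 6867 + 1135 * age - 11 * age * age

# Expected income, and the change from one additional year, at several ages
for age in (20, 40, 60):
    change = expected_income(age + 1) - expected_income(age)
    print(age, expected_income(age), change)

# For a quadratic a + b*x + c*x^2, the turning point is at x = -b / (2c)
peak_age = 1135 / (2 * 11)
print(round(peak_age, 1))   # 51.6
```

One more year of age is associated with an increase of 684 at age 20, an increase of 244 at age 40, and a decrease of 196 at age 60, so there is no single "effect of age" to report.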

Log-Linear Model

In the log-linear model, we predict the natural log of the dependent variable. 

When the dependent variable is logged, we are predicting an approximate percent change in Y for a one-unit change in X.

If the independent variable is logged, then we are predicting the expected change in Y in its native unit given a 1 percent increase in X.

If both the independent and dependent variables are logged, then we are predicting a percent change in Y given a percent change in X. 

A fully logged model (often called a log-log model) is one in which all variables are logged.
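The three interpretations can be sketched numerically. This minimal Python example assumes natural logs throughout; the function names and the coefficient values (0.05, 200, and 0.8) are made up for illustration. It computes the exact changes implied by each model, which are close to the rule-of-thumb readings above when coefficients and percent changes are small.

```python
import math

def pct_change_y_when_y_logged(b, delta_x=1.0):
    """Y logged: percent change in Y for a delta_x-unit increase in X."""
    return (math.exp(b * delta_x) - 1) * 100

def change_y_when_x_logged(b, pct_increase_x=1.0):
    """X logged: change in Y (native units) for a given percent increase in X."""
    return b * math.log(1 + pct_increase_x / 100)

def pct_change_y_when_both_logged(b, pct_increase_x=1.0):
    """Both logged: percent change in Y for a given percent increase in X."""
    return ((1 + pct_increase_x / 100) ** b - 1) * 100

# Hypothetical coefficients, for illustration only
print(round(pct_change_y_when_y_logged(0.05), 2))     # about 5 percent
print(round(change_y_when_x_logged(200), 2))          # about 200 / 100 = 2 units
print(round(pct_change_y_when_both_logged(0.8), 2))   # about 0.8 percent
```

The shortcuts (100 times b, b divided by 100, and b itself) are approximations; the exact values drift further from them as the coefficient or the percent change grows.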