In this article I will explain how to interpret regression coefficients when dealing with variables that have been logged.
Why would we work with logged variables? Firstly, we might take the log of a non-linear model to make it linear in parameters and so satisfy the Gauss-Markov assumptions (these are required for the OLS estimator to be BLUE, the best linear unbiased estimator). Secondly, as we will see below, coefficients on logged variables can sometimes be easier to interpret, because we can talk about percentage changes, which in some contexts make more sense than unit changes.
Level-Level Regression
Suppose we have a normal linear regression:
Y = a + bX + e
Y is our dependent variable, a is the constant (intercept) term, b is the coefficient of interest and X is the independent variable. The error term is represented by e. This is known as a level-level regression.
In the current set-up, we would estimate this equation and find a value for a and b. We could then say that if X=0, Y=a (this is why a is known as the intercept term). We can also say that a unit increase in X is associated with a b unit increase in Y.
This makes sense: the change in Y for a unit change in X is clearly equal to b (differentiate Y with respect to X and this is obvious: dY/dX = b).
To make this concrete, let us assume that Y is wage, in thousands of pounds (£), and X is years of experience in the labour market. We get some data and estimate a and b, finding that a=10 and b=2 (this is quite a simple model, and note that we are only talking about “associations” or correlations and not implying causation here). Therefore, an individual with zero years of experience would, on average, earn £10,000. For every additional year of experience, their wage would be higher, on average, by £2,000.
So a unit increase in X (years of experience) is associated with a b*unit increase in Y (wage).
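As a minimal sketch of the level-level case, the Python snippet below fits a one-variable OLS line by hand. The data are made up for illustration and lie exactly on Y = 10 + 2X, and the helper name `ols_fit` is hypothetical, not something from a library.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) of a one-variable least-squares fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the sample covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Illustrative (made-up) data: years of experience, wage in thousands of pounds.
experience = [0, 1, 2, 5, 10]
wage = [10 + 2 * x for x in experience]

a, b = ols_fit(experience, wage)
print(a, b)  # intercept and slope, up to floating-point rounding
```

Because the made-up data contain no noise, the fit recovers a=10 and b=2 up to rounding: one more year of experience goes with £2,000 more in wage.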
Level-Log Regression
Now, let us suppose that we had the (natural) log of X:
Y = a + bln(X) + e
Now we interpret the coefficient as follows: a 1% increase in X is associated with a (b/100) unit increase in Y. This is known as a semi-elasticity or a level-log model.
In our example, this would mean that a 1% increase in years of experience is associated with a (b/100) unit increase in wage. As wage is measured in thousands of pounds, that is £10b.
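A quick numerical check of the level-log rule, as a sketch: the function below computes the exact change in Y when X rises by a given percentage and compares it with the b/100 rule of thumb. The coefficient value b = 5 and the function name are assumptions chosen purely for illustration.

```python
import math

def level_log_change(b, pct_increase):
    """Exact change in Y = a + b*ln(X) when X rises by pct_increase percent."""
    # ln(X * (1 + p)) - ln(X) = ln(1 + p), so the starting X drops out.
    return b * math.log(1 + pct_increase / 100)

b = 5.0  # hypothetical coefficient, for illustration only
exact = level_log_change(b, 1)
rule_of_thumb = b / 100
print(exact, rule_of_thumb)
```

With wage in thousands of pounds, the rule of thumb here gives 0.05 units, i.e. about £50, and the exact figure is very slightly smaller.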
Log-Level Regression
Next, let us turn to a model where the dependent variable is logged but the independent variable is not:
ln(Y) = a + bX + e
This is known as a log-level model and the interpretation is that a unit increase in X is associated with a 100*b% increase in Y (we multiply by 100 to convert b, which is a proportional change, into a percentage).
This is a rough approximation, assuming that b is small (approximately less than 0.15 in absolute value). More formally, we should exponentiate the coefficient, subtract one and multiply by 100: (exp(b)-1)*100.
This would mean that a one-year increase in experience is associated with a roughly 100*b% increase in wage.
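To see how quickly the 100*b% shortcut drifts from the exact formula (exp(b)-1)*100, here is a small sketch. The coefficient values in the loop are arbitrary examples, and `exact_pct_change` is a hypothetical helper name.

```python
import math

def exact_pct_change(b):
    """Exact % change in Y for a unit increase in X when ln(Y) = a + b*X."""
    return (math.exp(b) - 1) * 100

for b in (0.02, 0.10, 0.15, 0.50):
    approx = 100 * b
    exact = exact_pct_change(b)
    print(f"b={b:.2f}  approx={approx:.2f}%  exact={exact:.2f}%")
```

For b = 0.10 the exact effect is about 10.5% rather than 10%, which is usually close enough. For b = 0.50 it is about 64.9% rather than 50%, so the exact formula is the safer choice for large coefficients.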
Log-Log Regression
Our final model is a log-log model, with both dependent and independent variable appearing as (natural) logs:
ln(Y) = a + bln(X) + e
This is interpreted as a 1% increase in X results in a b% increase in Y.
Therefore, for a 1% increase in experience we would expect wages to rise by b%.
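The log-log rule can be checked the same way. In this sketch, b = 0.8 is a made-up elasticity and `log_log_pct_change` is a hypothetical helper. It computes the exact percentage change in Y implied by the model for a given percentage change in X.

```python
def log_log_pct_change(b, pct_increase_x):
    """Exact % change in Y for ln(Y) = a + b*ln(X) when X rises by pct_increase_x percent."""
    # Y is proportional to X**b, so the ratio of new Y to old Y is (1 + p)**b.
    return ((1 + pct_increase_x / 100) ** b - 1) * 100

b = 0.8  # hypothetical elasticity, for illustration only
print(log_log_pct_change(b, 1))   # close to b for a 1% change in X
print(log_log_pct_change(b, 50))  # drifts away from 50*b for a large change in X
```

The "1% in X gives b% in Y" reading is accurate for small changes. For a 50% increase in X the exact answer is roughly 38% rather than 40%.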
Maths
The above tells you how the interpretation works; this section is the boring bit which explains why it works, so skip it if you’re not keen on maths!
First, we have our model which is (1) Y = a + b(X) + e and we then deal with an increase in X: (2) Y = a + b(X+ΔX) + e.
Let us take the difference of (1) and (2) (i.e. subtract (1) from (2)) so that we can find ΔY:
ΔY = (a + b(X+ΔX) + e) – (a + b(X) + e)
ΔY = (a + bX + bΔX + e) – (a + bX + e)
ΔY = bΔX
We want to find out the change in Y with respect to X, ΔY/ΔX:
ΔY/ΔX = b
Et voila! This is what we saw earlier, if we were to differentiate the simple level-level model.
Secondly, we have a model of the form (1) Y = a + bln(X) + e. Again we deal with a change in X: (2) Y + ΔY = a + bln(X+ΔX) + e. Again we subtract these:
(2) – (1) => ΔY = (a + bln(X+ΔX) + e) – (a + bln(X) + e)
ΔY = bln(X+ΔX) – bln(X)
ΔY = b[ln(X+ΔX) – ln(X)]
ΔY = b[ln((X+ΔX)/X)]
The last line came from the fact that ln(a) – ln(b) = ln(a/b).
ΔY = b[ln(1 + (ΔX/X))]
This is approximately equal to
ΔY = (bΔX)/X
This comes from the fact that ln(1 + z) approximately equals z when z is small (here z = ΔX/X). This can be shown by taking a Taylor series expansion about z=0.
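For reference, the expansion looks like this, writing z for the small quantity (here ΔX/X) to avoid confusion with the regressor X:

```latex
\ln(1+z) = z - \frac{z^{2}}{2} + \frac{z^{3}}{3} - \cdots \qquad (|z| < 1)
\quad\Longrightarrow\quad
\ln(1+z) \approx z \quad \text{for small } z,\ \text{with an error of roughly } \frac{z^{2}}{2}.
```

For a 1% change, z = 0.01 and the error is around 0.00005, which is why the approximation works so well for small percentage changes.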
Finally, we can rearrange:
(ΔY/ΔX)*X = b
Note that ΔX/X is the proportional change in X, so a 1% increase in X corresponds to ΔX/X = 0.01. Substituting into ΔY = b(ΔX/X) gives ΔY = b/100: a 1% increase in X is associated with a (b/100) unit change in Y.
Thirdly, we have a model of the form: (1) ln(Y) = a + bX + e, which we exponentiate to get Y = exp^(a + bX + e). We subtract this from (2): Y + ΔY = exp^(a + b(X+ΔX) + e):
(2)-(1) => ΔY = exp^(a + b(X+ΔX) + e) – exp^(a + bX + e)
Note that the second term is obviously Y and the first term is exp^(bΔX)*Y (this comes from the fact that addition within an exponential is the same as multiplying the exponentials).
ΔY = Yexp^(bΔX) – Y
ΔY + Y = Yexp^(bΔX)
(ΔY + Y)/Y = exp^(bΔX)
Now we take logs of both sides:
ln[(ΔY + Y)/Y] = bΔX
Note that the left hand side equals ln(1 + (ΔY/Y)) which, as we saw above, is approximately equal to ΔY/Y:
ΔY/Y = bΔX
ΔY/(ΔX*Y) = b
Hence, b equals the proportional change in Y (ΔY/Y) for a unit change in X. So we multiply by 100 to express it as a percentage.
Finally, we consider the log-log model: (1) ln(Y) = a + bln(X) + e, (2) ln(Y+ΔY) = a + bln(X+ΔX) + e.
Taking the exponential and subtracting:
ΔY = exp^(a + bln(X+ΔX) + e) – exp^(a + bln(X) + e)
Now we are going to add and subtract bln(X) in the first exponential:
ΔY = exp^(a + bln(X+ΔX) + bln(X) – bln(X) + e) – Y
ΔY + Y= exp^(a + bln[(X+ΔX)/X] + bln(X) + e)
ΔY + Y = Y*exp^(bln[(X+ΔX)/X])
(ΔY + Y)/Y = exp^(bln(1+(ΔX/X)))
Take logs of both sides:
ln(1 + (ΔY/Y)) = bln(1 + (ΔX/X))
Applying the approximation ln(1 + z) ≈ z to both sides:
ΔY/Y = bΔX/X
(ΔY/ΔX)*(X/Y) = b
The left hand-side is an elasticity which tells us that for a 1% change in X we expect a b% change in Y.
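The same four results can also be read off with derivatives, treating the changes as infinitesimal. This is only a sketch of that shortcut, using the same symbols as above:

```latex
\begin{align*}
Y &= a + bX + e &&\Rightarrow\ \frac{dY}{dX} = b \\
Y &= a + b\ln X + e &&\Rightarrow\ \frac{dY}{dX} = \frac{b}{X} \ \Rightarrow\ dY = b\,\frac{dX}{X} \\
\ln Y &= a + bX + e &&\Rightarrow\ \frac{1}{Y}\frac{dY}{dX} = b \ \Rightarrow\ \frac{dY}{Y} = b\,dX \\
\ln Y &= a + b\ln X + e &&\Rightarrow\ \frac{d\ln Y}{d\ln X} = \frac{dY}{dX}\cdot\frac{X}{Y} = b
\end{align*}
```

Here dX/X and dY/Y are proportional changes, so multiplying them by 100 gives the percentage interpretations used earlier.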
Percentage Points
As a side note, let us consider what happens when we are dealing with index data. For example, suppose that we want to see the impact of employment rates on GDP:
GDP = a + bEmployment + e
Employment is now a rate, e.g. 80 percent of people are employed. So a unit increase in X is a percentage point increase. However, the model stays the same, so a unit increase in X (employment) is associated with a b unit increase in Y (GDP). We would interpret this as: a one percentage point increase in employment is associated with a £b increase in GDP. Make sure not to get muddled up between percentages and percentage points!
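The difference between the two is easy to see with numbers. In this sketch the employment rates (80 and 81) are made up and the function names are hypothetical:

```python
def pct_point_change(old_rate, new_rate):
    """Change in percentage points between two rates given in percent."""
    return new_rate - old_rate

def pct_change(old_rate, new_rate):
    """Relative (percentage) change between two rates given in percent."""
    return (new_rate - old_rate) / old_rate * 100

old_rate, new_rate = 80, 81  # illustrative employment rates, in percent
print(pct_point_change(old_rate, new_rate))  # change in percentage points
print(pct_change(old_rate, new_rate))        # change in percent
```

Moving from 80% to 81% employment is a rise of one percentage point but of 1.25 percent. The coefficient b in the model above refers to the former.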
Summary
Regression type | Regression | Interpretation |
---|---|---|
Level-Level | Y = a + bX + e | A unit increase in X results in a b*unit increase in Y |
Level-Log | Y = a + bln(X) + e | A 1% increase in X results in a (b/100)*unit increase in Y |
Log-Level | ln(Y) = a + bX + e | A unit increase in X results in a 100*b% increase in Y |
Log-Log | ln(Y) = a + bln(X) + e | A 1% increase in X results in a b% increase in Y |
How do you interpret the intercept in a log-log regression model?
Good question! The answer is that the intercept is the value of log(y) in the case when x=1 [i.e. log(X)=0].
Another way to put this is that when X=1 [i.e. log(X)=0], Y = exp(intercept), ignoring the error term. In practice, I am not sure that this is very useful.
how do you interpret the intercept in a log-level model? thanks
Hi, thank you for this useful post.
What happens if my dependent variable is a natural logarithm and my independent variable is a growth rate?
How would I interpret the coefficient if I use a growth rate??
In my example, the dependent variable is Log(House Price Index) and the independent variable is the M4 Money Supply Growth Rate.
Hi Alyssa,
Thanks for your question. So it seems like you have a log-(growth)level model: in which case, you would interpret a percentage point increase in X (M4 Money Supply Growth Rate) as a 100*b% increase in Y (house price index). Here you need to be careful that the house price measure is an index and not absolute prices. If you are comparing the index with the benchmark year (which usually equals 100) then this itself would be a growth rate.
Best,
Rhys
How do you interpret each of the coefficients in the following regression:
Log GDP = 0.2 + 0.32LP + 0.22S – 0.14 π
You’re going to need to provide a bit of explanation for the other variables. Presumably pi is inflation, what are LP and S?
This was such a helpful, easy to digest article. That said, I do have some questions as there is just so much information online. How do the following fit within the context of this article?
Let’s use employment noted by emp. If a formula has:
log emp
d emp
dlog emp
ln emp
lag emp
lead emp
What do these mean? How are they interpreted? Thank you.
How do you interpret log transformed coefficients that include +1 to include values of zero in the regression analysis? So: log (y +1) = log(x +1)