In this article I will explain how to interpret regression coefficients when dealing with variables that have been logged.
Why would we work with logged variables? Firstly, we might take logs of a non-linear model to make it linear in parameters, which is one of the Gauss-Markov assumptions (these are required for the OLS estimator to be BLUE, the best linear unbiased estimator). Secondly, as we will see below, coefficients on logged variables can sometimes be easier to interpret, as we can talk about percentage changes, which in some contexts make more sense than unit changes.
Suppose we have a normal linear regression:
Y = a + bX + e
Y is our dependent variable, a is the constant (intercept) term, b is the coefficient of interest and X is the independent variable. The error term is represented by e. This is known as a level-level regression.
In the current set-up, we would estimate this equation and find a value for a and b. We could then say that if X=0, Y=a (this is why a is known as the intercept term). We can also say that a unit increase in X is associated with a b unit increase in Y.
This makes sense: if we differentiate Y with respect to X, the change in Y is clearly equal to b (dY/dX = b).
To make this concrete let us assume that Y is wage, in thousands £, and X is years of experience in the labour market. We get some data and estimate a and b (this is quite a simple model, and note that we are only talking about “associations” or correlations and not implying causation here), finding that a=10 and b=2. Therefore, if a given individual had zero years of experience then, on average, they would earn £10,000. For every additional year of experience they had, their wage would increase, on average, by £2,000.
So a unit increase in X (years of experience) is associated with a b*unit increase in Y (wage).
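As a minimal sketch of how a and b might be estimated, the snippet below applies the closed-form OLS formulas for a single regressor. The data are made up and constructed to lie exactly on the line Y = 10 + 2X, and the helper name ols_simple is hypothetical; neither comes from a real wage data set.

```python
def ols_simple(x, y):
    """Closed-form OLS for Y = a + b*X + e with a single regressor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared deviations of X.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Made-up, noise-free data (an assumption for illustration only).
experience = [0, 1, 2, 5, 10]            # years in the labour market
wage = [10 + 2 * x for x in experience]  # thousands of pounds

a_hat, b_hat = ols_simple(experience, wage)
print(f"a = {a_hat:.2f}, b = {b_hat:.2f}")
```

With real data the points would not sit exactly on a line, and the estimates would carry sampling error.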
Now, let us suppose that we had the (natural) log of X:
Y = a + bln(X) + e
Now the interpretation is that a 1% increase in X is associated with a (b/100) unit increase in Y. This is known as a semi-elasticity or a level-log model.
In our example, this would mean that a 1% increase in years of experience is associated with a (b/100) unit increase in wage. Since wage is measured in thousands of pounds, that is an increase of £(10*b).
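To see how close the b/100 rule is to the exact change, here is a small sketch comparing the two for a level-log model. The coefficient value b = 5 is purely hypothetical, as is the function name level_log_change.

```python
import math


def level_log_change(b, pct_increase):
    """Exact change in Y when X rises by pct_increase percent
    in the model Y = a + b*ln(X) + e."""
    return b * math.log(1 + pct_increase / 100)


b = 5.0  # hypothetical coefficient, chosen only for illustration

exact_1pct = level_log_change(b, 1)    # exact change for a 1% rise in X
approx_1pct = b / 100                  # rule-of-thumb change
exact_50pct = level_log_change(b, 50)  # exact change for a 50% rise in X
approx_50pct = 50 * b / 100            # rule of thumb scaled up to 50%

print(f"1% rise:  exact {exact_1pct:.4f}, approx {approx_1pct:.4f}")
print(f"50% rise: exact {exact_50pct:.4f}, approx {approx_50pct:.4f}")
```

The two figures agree closely for a 1% rise but drift apart for a large percentage change, which is why the rule is best used for small changes in X.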
Next, let us turn to a model where the dependent variable is logged but the independent variable is not:
ln(Y) = a + bX + e
This is known as a log-level model and the interpretation is that a unit increase in X is associated with a roughly 100*b% increase in Y (we multiply by 100 to convert the proportion b into a percentage).
This is a rough approximation, which works when b is small (approximately less than 0.15 in absolute value). For the exact figure, we should exponentiate the coefficient, subtract one and multiply by 100: (exp(b)-1)*100.
In our example, this would mean that an additional year of experience is associated with a roughly 100*b% increase in wage.
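The gap between the 100*b rule and the exact formula (exp(b)-1)*100 can be checked numerically. The sketch below uses three hypothetical coefficient values; the function name log_level_pct_change is an assumption made for this example.

```python
import math


def log_level_pct_change(b, delta_x=1.0):
    """Exact percentage change in Y when X rises by delta_x
    in the model ln(Y) = a + b*X + e."""
    return (math.exp(b * delta_x) - 1) * 100


# Hypothetical coefficients: small, borderline and large.
for b in (0.02, 0.15, 0.50):
    approx = 100 * b
    exact = log_level_pct_change(b)
    print(f"b = {b:.2f}: approx {approx:.1f}%, exact {exact:.2f}%")
```

The approximation is within a fraction of a percentage point for small b, but understates the exact effect noticeably once b is large.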
Our final model is a log-log model, with both dependent and independent variable appearing as (natural) logs:
ln(Y) = a + bln(X) + e
The interpretation is that a 1% increase in X is associated with a b% increase in Y.
Therefore, for a 1% increase in experience we would expect wages to rise by b%.
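As a quick check on the log-log rule, the sketch below computes the exact percentage change in Y implied by the model and compares it with b% per 1% change in X. The elasticity b = 0.5 and the function name log_log_pct_change are hypothetical.

```python
def log_log_pct_change(b, pct_increase_x=1.0):
    """Exact percentage change in Y when X rises by pct_increase_x percent
    in the model ln(Y) = a + b*ln(X) + e."""
    return ((1 + pct_increase_x / 100) ** b - 1) * 100


b = 0.5  # hypothetical elasticity, chosen only for illustration

small = log_log_pct_change(b, 1)   # 1% rise in X: very close to b%
large = log_log_pct_change(b, 50)  # 50% rise in X: below 50*b%

print(f"1% rise in X:  {small:.4f}% (rule of thumb: {b:.4f}%)")
print(f"50% rise in X: {large:.2f}% (rule of thumb: {50 * b:.2f}%)")
```

As with the other logged models, the elasticity reading is most accurate for small percentage changes.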
The above tells you how the interpretation works. This section is the boring bit which explains why it works, so skip it if you’re not keen on maths!
First, we have our model which is (1) Y = a + b(X) + e and we then deal with an increase in X: (2) Y = a + b(X+ΔX) + e.
Let us take the difference of (1) and (2) (i.e. subtract (1) from (2)) so that we can find ΔY:
ΔY = (a + b(X+ΔX) + e) – (a + b(X) + e)
ΔY = (a + bX + bΔX + e) – (a + bX + e)
ΔY = bΔX
We want to find out the change in Y with respect to X, ΔY/ΔX:
ΔY/ΔX = b
Et voila! This is what we saw earlier, if we were to differentiate the simple level-level model.
Secondly, we have a model of the form (1) Y = a + bln(X) + e. Again we deal with a change in X: (2) Y = a + bln(X+ΔX) + e. Again we subtract these:
(2) – (1) => ΔY = (a + bln(X+ΔX) + e) – (a + bln(X) + e)
ΔY = bln(X+ΔX) – bln(X)
ΔY = b[ln(X+ΔX) – ln(X)]
ΔY = b[ln((X+ΔX)/X)]
The last line came from the fact that ln(a) – ln(b) = ln(a/b).
ΔY = b[ln(1 + (ΔX/X))]
This is approximately equal to
ΔY = (bΔX)/X
This comes from the fact that ln(1 + z) approximately equals z when z is small (here z = ΔX/X). This can be shown by taking a Taylor series expansion about z=0.
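The quality of this approximation can be checked numerically. The sketch below measures the absolute error of replacing ln(1 + z) with z for a few values of z; the function name log_approx_error is hypothetical. The first term dropped from the Taylor series is z²/2 in magnitude, so the error should be roughly that size.

```python
import math


def log_approx_error(z):
    """Absolute error of approximating ln(1 + z) by z."""
    return abs(math.log(1 + z) - z)


# z = 0.01 corresponds to a 1% change, z = 0.50 to a 50% change.
for z in (0.01, 0.10, 0.50):
    print(f"z = {z:.2f}: ln(1+z) = {math.log(1 + z):.5f}, "
          f"error = {log_approx_error(z):.5f}, z^2/2 = {z * z / 2:.5f}")
```

The error is negligible for a 1% change but becomes substantial for a 50% change.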
Finally, we can rearrange:
(ΔY/ΔX)*X = b
Note that ΔX/X is the proportional change in X, and multiplying it by 100 gives the percentage change. Hence a 1% increase in X (ΔX/X = 0.01) is associated with a b/100 unit change in Y.
Thirdly, we have a model of the form: (1) ln(Y) = a + bX + e which we exponentiate to get Y = exp^(a + bX + e). We subtract this from (2): Y = exp^(a + b(X+ΔX) + e):
(2)-(1) => ΔY = exp^(a + b(X+ΔX) + e) – exp^(a + bX + e)
Note that the second term is obviously Y and the first term is exp^(bΔX)*Y (this comes from the fact that addition within an exponential is the same as multiplying the exponentials).
ΔY = Yexp^(bΔX) – Y
ΔY + Y = Yexp^(bΔX)
(ΔY + Y)/Y = exp^(bΔX)
Now we take logs of both sides:
ln[(ΔY + Y)/Y] = bΔX
Note that the left-hand side equals ln(1 + (ΔY/Y)) which, as we saw above, is approximately equal to ΔY/Y:
ΔY/Y = bΔX
ΔY/(ΔX*Y) = b
Hence, b equals the proportional change in Y (ΔY/Y) for a unit change in X. So we multiply by 100 to express it as a percentage.
Finally, we consider the log-log model: (1) ln(Y) = a + bln(X) + e, (2) ln(Y) = a + bln(X+ΔX) + e.
Taking the exponential and subtracting:
ΔY = exp^(a + bln(X+ΔX) + e) – exp^(a + bln(X) + e)
Now we are going to add and subtract bln(X) in the first exponential:
ΔY = exp^(a + bln(X+ΔX) + bln(X) – bln(X) + e) – Y
ΔY + Y= exp^(a + bln[(X+ΔX)/X] + bln(X) + e)
ΔY + Y = Y*exp^(bln[(X+ΔX)/X])
(ΔY + Y)/Y = exp^(bln(1+(ΔX/X)))
Take logs of both sides:
ln(1 + (ΔY/Y)) = bln(1 + (ΔX/X))
Applying the approximation ln(1 + z) ≈ z for small z to both sides:
ΔY/Y = bΔX/X
(ΔY/ΔX)*(X/Y) = b
The left-hand side is an elasticity, which tells us that for a 1% change in X we expect a b% change in Y.
As a side note, let us consider what happens when we are dealing with index data. For example, suppose that we want to see the impact of employment rates on GDP:
GDP = a + bEmployment + e
Employment is now a rate, e.g. 80 percent of people are employed. So a unit increase in X is a percentage point increase. However, the model stays the same, so a unit increase in X (employment) is associated with a b unit increase in Y (GDP). We would interpret this as a percentage point increase in employment being associated with a £b increase in GDP. Make sure not to get muddled up between percentages and percentage points!
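The difference between the two is easy to see with numbers. The sketch below takes a hypothetical rise in the employment rate from 80 to 81 and expresses it both ways; the function names are assumptions made for this example.

```python
def percentage_point_change(old_rate, new_rate):
    """Change in a rate measured in percentage points (a simple difference)."""
    return new_rate - old_rate


def percent_change(old_rate, new_rate):
    """Change in a rate measured in percent (relative to the old rate)."""
    return (new_rate - old_rate) / old_rate * 100


# Hypothetical employment rates, in percent of people employed.
pp = percentage_point_change(80, 81)
pct = percent_change(80, 81)

print(f"{pp} percentage point(s), which is a {pct:.2f}% increase")
```

In the level-level model above, b applies to the percentage point change, not to the percent change.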
|Model|Equation|Interpretation|
|---|---|---|
|Level-Level|Y = a + bX + e|A unit increase in X is associated with a b unit increase in Y|
|Level-Log|Y = a + bln(X) + e|A 1% increase in X is associated with a (b/100) unit increase in Y|
|Log-Level|ln(Y) = a + bX + e|A unit increase in X is associated with a roughly 100*b% increase in Y|
|Log-Log|ln(Y) = a + bln(X) + e|A 1% increase in X is associated with a b% increase in Y|