I recently attended a fantastic workshop on demand estimation and have been doing a lot of reading around the topic, so I wanted to share an outline of the IO story on demand estimation.
Estimating underlying demand functions – which tell us what happens to the quantity purchased when prices change – is useful for a variety of policy analyses, such as merger control, the introduction of tariffs, welfare effects and product entry. In essence, we want to estimate the own- and cross-price elasticities between the products in a market. However, this is not easy, either conceptually or computationally.
To begin with, there is a fundamental problem of endogeneity. Let’s assume we have a dataset of all products sold in a given market (a market consists of both the product market and the geographic market*) and we know the price, total quantity sold and observable characteristics of each product. A first attempt at estimating a demand function could be to regress quantity sold on price. However, this ignores the simultaneity of demand and supply. Remember, prices are determined by both demand and supply (for instance, higher costs push up prices), so it is difficult to interpret price differences without knowing something about supply: a simple regression of quantity on price conflates movements along the demand curve with shifts of the demand curve, and the estimated coefficients will be biased. The key way to deal with this endogeneity is to introduce instruments which are correlated with price but uncorrelated with unobserved demand shocks (we will come back to this later).
This led to the development of product-space approaches, such as multi-stage budgeting, which estimate demand functions from economic first principles. One such model is the Almost Ideal Demand System (AIDS). These models try to estimate price elasticities for all products, but this creates a computational problem: the curse of dimensionality. The more products we have in the model, the more own-price and cross-price elasticities we have to estimate. If the model has 2 products then we need to estimate 4 elasticities (2 own-price and 2 cross-price, although under symmetry the 2 cross-price elasticities coincide). If the model has 20 products then we need to estimate 400 elasticities (i.e. J^2 for J products). Very quickly, the estimation problem explodes when a market contains many different products.
Another pitfall of the product-space approach is that it treats each product as the unit of analysis, which means we cannot evaluate what would happen if a new, previously unobserved product entered the market. This rules out an important class of counterfactuals.
Finally, AIDS-type models are built up from economic theory based on a representative consumer, which ignores heterogeneity in consumer preferences, something we might be interested in studying. Whilst there are ways around this, the models become quite complicated.
Instead, another set of models estimates demand in characteristics space. Rather than focusing on individual products, this approach boils each product down to a bundle of characteristics. For fizzy drinks, we might consider the relevant characteristics of different products to be size, sugar content and flavourings. This approach is therefore less likely to suffer from the curse of dimensionality, as the number of characteristics is usually much smaller than the number of products. However, it clearly requires more data, since the various (observable) characteristics need to be collected.
Another advantage of the characteristics-space approach is that it can be built from first principles. To do so, we posit that the individual utility from a given product is a function of product price, observed product characteristics, and unobserved product characteristics:
u_{ijt} = \delta_{jt} + \mu_{ijt} + \epsilon_{ijt} \\ \delta_{jt} = \alpha p_{jt} + x'_{jt}\beta + \zeta_{jt}
Each consumer (i) in a given market (t) then chooses the single product (j) that maximises their utility (u). We can estimate this model by knowing the distribution of preferences over characteristics. Delta tells us the mean utility of a product, common to all individuals in a market (based on its price and characteristics), mu captures systematic heterogeneity (e.g. low-income people are more sensitive to high prices), and epsilon is an idiosyncratic taste term. Delta itself depends upon the product’s price, p, observable characteristics, x, and unobserved characteristics, zeta. The observable characteristics are those outlined above, such as sugar content and bottle size for fizzy drinks. Unobserved characteristics are those which cannot easily be captured in the data, so they are unobserved by the econometrician but likely matter to the consumer. For instance, the colour of the fizzy drink might not be contained in the econometrician’s dataset, but consumers may care about it when choosing a drink. We can then aggregate over consumers’ choices to work out the aggregate market share of each product.
Pure Logit Model
Let’s continue, for now, by assuming that mu=0, so that there is no systematic heterogeneity: all consumers have the same preferences over product characteristics. Clearly, this is a strong assumption, and we will show both why we might want to make it (ease of estimation) and the problem with doing so (unrealistic estimates of elasticities).
This means that, with a statistical assumption about the distribution of the error term (which, for convenience, we will assume to be independent and identically distributed type 1 extreme value), we can express the market share of a product as a function of the deltas of different products (i.e. price, observable characteristics and unobserved characteristics). This looks something like:
s_{jt} = \frac{\exp(\delta_{jt})}{1+\sum_{k}\exp(\delta_{kt})}
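To make this concrete, here is a minimal sketch in Python of how mean utilities map into market shares under this formula, with the outside option’s mean utility normalised to zero; the delta values are hypothetical numbers chosen purely for illustration.

```python
import numpy as np

def logit_shares(delta):
    """Pure logit market shares for one market, given mean utilities delta.
    The outside option has its mean utility normalised to zero."""
    expd = np.exp(delta)
    return expd / (1.0 + expd.sum())

# hypothetical mean utilities for three products in one market
delta = np.array([1.0, 0.5, -0.2])
shares = logit_shares(delta)
print(shares, 1 - shares.sum())  # inside-product shares and the outside-option share
```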
This model is the pure (multinomial) logit; the mixed logit and random coefficients logit come later, once we reintroduce heterogeneity. We can then use something called Berry’s trick, or the Berry inversion, to write this function in terms of the log of the ratio of each product’s market share to the market share of the outside option (i.e. not buying any product defined as within the market). This gives us a linear equation which can easily be estimated using OLS (note that we are implicitly assuming all consumers face the same price, i.e. there is no price discrimination occurring):
\log\left(\frac{s_{jt}}{s_{0t}}\right) = \alpha p_{jt} + x'_{jt}\beta + \zeta_{jt}
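To see what this estimation step looks like in practice, here is a minimal sketch in Python with made-up data (the shares, prices and sugar values are purely illustrative, and in real applications we would pool many markets): Berry’s trick reduces the problem to an OLS regression of log(s_j/s_0) on price and characteristics.

```python
import numpy as np

# made-up product-level data for one market: shares, prices and one characteristic (sugar content)
shares = np.array([0.20, 0.15, 0.10, 0.08, 0.05])
outside_share = 1.0 - shares.sum()
prices = np.array([2.0, 2.3, 2.8, 3.1, 3.5])
sugar = np.array([10.0, 8.0, 6.0, 3.0, 0.0])

# Berry inversion: the dependent variable is log(s_j / s_0)
y = np.log(shares / outside_share)

# OLS of y on a constant, price and the characteristic
X = np.column_stack([np.ones_like(prices), prices, sugar])
const, alpha, beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(alpha, beta)  # alpha: utils per dollar of price (expected negative), beta: utils per unit of sugar
```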
We can interpret the coefficient alpha as the change in utils per dollar of price and beta as the utils gained from a particular product characteristic. Utils are not very meaningful on their own, so it is often more interesting to divide beta by alpha to report the dollar willingness to pay for a particular product characteristic, as well as to report the elasticities of demand. The own-price elasticity of demand can easily be calculated as:
\eta_{jjt}=\alpha p_{jt} (1-s_{jt})
However, whilst it is possible to estimate this linear equation using OLS, price is likely to be correlated with the unobserved quality term (zeta). Zeta is observed by consumers and firms, but not by the econometrician, and a firm that knows its product is high quality will tend to charge a higher price. We would therefore expect price and zeta to be positively correlated, meaning our OLS estimates are biased; in particular, the estimate of alpha will be biased towards zero. As usual with endogeneity, we can take an instrumental variables (IV) approach: we find an instrument which is correlated with the endogenous variable, price (relevance), but uncorrelated with the unobserved quality term (exclusion). Possible instruments are discussed in footnote **. Additionally, we can include product and market fixed effects to absorb some of the bias (although this is unlikely to eliminate it completely, as zeta will typically vary by product and market).
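As a sketch of the IV step, here is a minimal two-stage least squares example in Python, assuming we observe a cost shifter (say, an input price) that is excluded from demand; the data-generating process, coefficients and variable names are hypothetical and only meant to show the mechanics and the direction of the OLS bias.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # product-market observations pooled across many markets

# hypothetical data-generating process: the cost shifter moves price,
# while zeta (unobserved quality) moves both price and demand
cost_shifter = rng.normal(size=n)           # e.g. an input price, excluded from demand
zeta = rng.normal(size=n)                   # unobserved quality
sugar = rng.uniform(0, 10, size=n)
price = 2.0 + 0.5 * cost_shifter + 0.3 * zeta + rng.normal(scale=0.1, size=n)
y = -1.0 * price + 0.2 * sugar + zeta       # log(s_j / s_0), with true alpha = -1

exog = np.column_stack([np.ones(n), sugar])
Z = np.column_stack([exog, cost_shifter])   # instruments: exogenous regressors plus the cost shifter

# first stage: project price on the instruments
price_hat = Z @ np.linalg.lstsq(Z, price, rcond=None)[0]

# second stage: replace price with its fitted value
X2 = np.column_stack([np.ones(n), price_hat, sugar])
const, alpha_iv, beta_iv = np.linalg.lstsq(X2, y, rcond=None)[0]

# naive OLS for comparison (biased towards zero because price and zeta are positively correlated)
X_ols = np.column_stack([np.ones(n), price, sugar])
alpha_ols = np.linalg.lstsq(X_ols, y, rcond=None)[0][1]
print(alpha_ols, alpha_iv)  # alpha_ols sits closer to zero than the true value of -1; alpha_iv does not
```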
Let’s assume that we can satisfactorily instrument for price and obtain unbiased estimates of our coefficients. A problem remains: the substitution patterns implied by the pure logit are mechanical. When the price of one product rises, consumers divert to every other product in proportion to its market share, regardless of how similar the products are. Hence consumers switch from a high-quality product to a low-quality product at the same rate, relative to market shares, as they switch to a similar high-quality product, which clearly seems unrealistic. Unfortunately, this is a property of the pure logit model and of the restrictive assumption that mu=0. We can see from the formula for the cross-price elasticity, given below, that the elasticity does not depend upon the characteristics of the products, only on price and market share:
\eta_{jkt} = -\alpha p_{kt} s_{kt}
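Putting the two elasticity formulas together, the full matrix of own- and cross-price elasticities implied by the pure logit follows directly from alpha, prices and shares. A short sketch with hypothetical numbers:

```python
import numpy as np

alpha = -1.5                        # hypothetical estimated price coefficient
prices = np.array([2.0, 2.5, 3.0])
shares = np.array([0.20, 0.15, 0.10])

J = len(prices)
elasticities = np.empty((J, J))
for j in range(J):
    for k in range(J):
        if j == k:
            elasticities[j, k] = alpha * prices[j] * (1 - shares[j])   # own-price elasticity
        else:
            elasticities[j, k] = -alpha * prices[k] * shares[k]        # cross-price elasticity
print(elasticities)
# note: every off-diagonal entry in column k is identical, the IIA substitution pattern
```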
This is sometimes known as the red-bus/blue-bus problem, which illustrates why the independence of irrelevant alternatives (IIA) property imposed by the logit is unrealistic. Suppose commuters can initially get to work by red bus or by car, each with a 50% market share, and we then introduce the option of travelling by blue bus. We would expect the red and blue buses to end up with 25% each and the car to retain its 50% share. The logit model, however, predicts a 33% market share for each option, despite people not really caring about the colour of the bus they take to work!
There are two potential solutions for dealing with this IIA problem and recovering more sensible cross-price elasticities. The first is the nested logit model, where we group products into nests according to their similarity. Within each nest the cross-price elasticities above still apply (i.e. they depend only on market shares, so IIA holds within a nest), but across nests we obtain different substitution patterns. For example, we could put the different coloured buses into one nest and the car into another, and recover more realistic cross-price elasticities. However, this assumes that we can assign products to nests along sensible dimensions.
The second approach, which does not rely on this somewhat arbitrary assignment, is to incorporate consumer preference heterogeneity. This means we no longer assume that mu=0, and we obtain what is called the random coefficients logit model. Remember: so far we have not incorporated any differences in consumer preferences, so we suffer a similar pitfall to the representative consumer models discussed above.
Random Coefficients Logit Model
Let us reincorporate preference heterogeneity back into the model, such that utility is given as:
u_{ijt} = \delta_{jt} + \mu_{ijt} + \epsilon_{ijt}
It is typical to express mu as a function of observed consumer demographics (y, e.g. income), unobserved taste shocks (nu), and parameters (Pi and Sigma) which map these into preferences over product characteristics:
\mu_{ijt} = x'_{jt}(\Sigma \nu_{it} + \Pi y_{it})
We can see that this systematic heterogeneity operates through the product characteristics, x. Pi shifts people’s preferences for product characteristics according to their observed demographics; for instance, sensitivity to price might vary with income. Sigma scales the unobserved taste shocks (nu), capturing heterogeneity in preferences that is not explained by observed demographics (a bit like zeta, but on the consumer side rather than the product side).
Now we need more than just product data: we also need data on consumers, from demographic sources such as the census or surveys. Furthermore, estimation becomes more complex. With preference heterogeneity, the Berry inversion no longer has a closed form, so there is no analytical expression mapping market shares directly to mean utilities. Instead, we use an estimation technique devised by Berry, Levinsohn and Pakes: for each guess of sigma and pi, the BLP contraction (a fixed-point iteration) recovers the mean utilities that reproduce the observed market shares. Those mean utilities then enter a GMM step built on the instruments, which deals with the price endogeneity even though it enters the model in a non-linear fashion.
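To give a flavour of what the contraction looks like, here is a stripped-down sketch in Python for a single market, with simulated consumers and a single random coefficient on price; all numbers and parameter guesses are hypothetical. For a fixed guess of the non-linear parameters, predicted shares are obtained by averaging individual logit choice probabilities over consumer draws, and mean utilities are updated until predicted shares match observed shares.

```python
import numpy as np

def predicted_shares(delta, mu):
    """Simulated shares: average individual logit probabilities over consumer draws.
    delta has shape (J,), mu has shape (n_consumers, J)."""
    u = delta[None, :] + mu                      # individual utilities, (n_consumers, J)
    expu = np.exp(u)
    probs = expu / (1.0 + expu.sum(axis=1, keepdims=True))
    return probs.mean(axis=0)

def blp_contraction(observed_shares, mu, tol=1e-12, max_iter=10_000):
    """Recover the mean utilities delta that reproduce observed shares, for a fixed mu."""
    delta = np.log(observed_shares) - np.log(1.0 - observed_shares.sum())  # pure logit starting value
    for _ in range(max_iter):
        delta_new = delta + np.log(observed_shares) - np.log(predicted_shares(delta, mu))
        if np.max(np.abs(delta_new - delta)) < tol:
            return delta_new
        delta = delta_new
    return delta

# hypothetical example: 3 products, 200 simulated consumers, one random coefficient on price
rng = np.random.default_rng(0)
prices = np.array([2.0, 2.5, 3.0])
sigma_guess = 0.5
nu = rng.normal(size=(200, 1))                   # unobserved taste draws
mu = (sigma_guess * nu) * prices[None, :]        # heterogeneous tastes over price
observed = np.array([0.20, 0.15, 0.10])
delta_hat = blp_contraction(observed, mu)
print(delta_hat, predicted_shares(delta_hat, mu))  # predicted shares now match the observed ones
```

In the full BLP procedure, this contraction sits inside an outer loop that searches over sigma and pi to minimise a GMM objective built from the instruments.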
By including preference heterogeneity and using the BLP estimation technique, we can estimate demand and find realistic elasticities, thereby solving our initial question of how to estimate demand! Additionally, we can use these estimated elasticities to impute a firm’s marginal costs. Assume that we are dealing with a single-product firm, then the Lerner Index tells us that the price-cost mark-up equals the inverse own-price elasticity of demand. We know price and we know the elasticity from the demand model, so we can thus back out marginal cost, if we are happy with our assumption about the market structure that leads to the price-cost mark-up relationship.
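As a quick illustration of backing out marginal cost under the single-product-firm assumption, using hypothetical numbers:

```python
# hypothetical price and estimated own-price elasticity for one product
price = 2.5
own_elasticity = -2.4   # from the demand model

# Lerner index: (p - mc) / p = -1 / eta  =>  mc = p * (1 + 1 / eta)
marginal_cost = price * (1.0 + 1.0 / own_elasticity)
markup = (price - marginal_cost) / price
print(marginal_cost, markup)   # mc of roughly 1.46 and a mark-up of roughly 0.42
```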
The advantage of the BLP method is that it is flexible enough to handle a variety of estimation problems and imposes relatively little structure on the theoretical model. The cost is the complexity of the model and the data requirements, in particular the need for exogenous instruments to remove the fundamental price endogeneity issue that plagues demand estimation. However, the BLP approach cannot, yet, handle dependence of current choices on past choices (i.e. consumers who are sticky due to high switching costs) or interactions between the choices of different individuals (e.g. consumers purchasing something because a friend also purchased it, or some other network effect).
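In practice, this machinery is rarely coded from scratch: the PyBLP package by Conlon and Gortmaker (see the references below) implements the full BLP routine. The following is a rough sketch of how a problem might be set up with that package, based on my reading of its documentation; the formulation, integration settings and starting values are illustrative rather than a recommended specification.

```python
import numpy as np
import pandas as pd
import pyblp

# Nevo's cereal data ships with PyBLP and has the columns the package expects:
# market_ids, shares, prices, product characteristics and demand_instruments0, 1, ...
product_data = pd.read_csv(pyblp.data.NEVO_PRODUCTS_LOCATION)

product_formulations = (
    pyblp.Formulation('1 + prices + sugar'),  # linear part of utility (enters delta)
    pyblp.Formulation('0 + prices'),          # part with a random coefficient (enters mu)
)

# configuration for simulating consumer taste draws for the random coefficient
integration = pyblp.Integration('monte_carlo', size=100)

problem = pyblp.Problem(product_formulations, product_data, integration=integration)
results = problem.solve(sigma=np.array([[0.5]]))  # starting value for the taste dispersion

elasticities = results.compute_elasticities()  # own- and cross-price elasticities, market by market
```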
Conclusion
To wrap things up, we have seen how the literature on demand estimation developed. It started by trying to estimate demand in product space, running into the curse of dimensionality and limits on the counterfactuals that could be studied. Clearly, if one were examining a market with a small number of products and did not want to study the effect of product entry, then this methodology may still be feasible and appropriate. Generally, though, we want to study markets with a large number of products and to examine what happens if a new product enters. This led to the development of discrete choice models based on the underlying characteristics of products, which removed the curse of dimensionality and meant new product entry could be studied, as the new product is simply a different bundle of characteristics that have already been modelled. To keep estimation easy, the early literature ignored preference heterogeneity and estimated simple logit demand functions, which suffered from unrealistic estimates of elasticities. Two approaches were used to solve this: nested models and the inclusion of preference heterogeneity. The latter approach then needed to solve the estimation problem, which is much trickier once consumer heterogeneity is included, and this led to the development of the BLP estimation technique. The literature is currently developing by using micro moments to add additional heterogeneity; some relevant sources are included below.
Footnotes
* It is not necessarily straightforward to determine the market size, but this can have important implications for the analysis, as the definition of the market determines the size of the outside option and hence affects our estimates. For instance, to determine the market size for fizzy drinks in a certain geographic area, we might assume that each person would reasonably drink X fizzy drinks a day and use this, together with the local population, as a basis for the potential market size. Usually, robustness checks would be conducted, testing different market sizes to see how they affect the results.
Note that even defining product markets can be tricky. For instance, what counts as a fizzy drink? In other contexts, we would need to look at things like the hypothetical monopolist test. In this context, we might reasonably use the product market definitions provided by our data.
** A number of different instruments tend to be used when estimating demand. (1) Cost or mark-up shifters: perhaps the most useful instrument is a variable which is excluded from demand but shifts the cost of the product and hence is highly correlated with price (e.g. input prices). However, in reality, input prices tend not to vary much within a single market, so variation may be low, making this a weak instrument. (2) Hausman instruments: the price of the same good in neighbouring markets; this assumes that demand shocks are not correlated across markets but that cost shocks are. (3) BLP instruments: the observed (and assumed exogenous) characteristics of rivals’ products; these work because the characteristics of competing products affect the mark-up on the firm’s own product. (4) Waldfogel instruments: average consumer characteristics in nearby locations, if we assume firms set uniform prices. You can read more about instruments here.
References
Ackerberg, D., Benkard, C.L., Berry, S. and Pakes, A., 2007. Econometric tools for analyzing market outcomes. Handbook of econometrics, 6, pp.4171-4276.
Berry, S.T. and Haile, P.A., 2021. Foundations of demand estimation. In Handbook of industrial organization (Vol. 4, No. 1, pp. 1-62). Elsevier.
Berry, S.T., Levinsohn, J.A. and Pakes, A., 1993. Automobile prices in market equilibrium: Part I and II.
Conlon, C. and Gortmaker, J., 2020. Best practices for differentiated products demand estimation with pyblp. The RAND Journal of Economics, 51(4), pp.1108-1161.
Conlon, C. and Gortmaker, J., 2023. Incorporating Micro Data into Differentiated Products Demand Estimation with PyBLP (No. w31605). National Bureau of Economic Research.
Nevo, A., 2000. A practitioner’s guide to estimation of random‐coefficients logit models of demand. Journal of economics & management strategy, 9(4), pp.513-548.
Nevo, A., 2001. Measuring market power in the ready‐to‐eat cereal industry. Econometrica, 69(2), pp.307-342.
Pinter, F., 2021. Demand Estimation Notes. https://www.frankpinter.com/notes/Demand_Estimation_Notes.pdf
Rasmusen, E.B., 2007. The BLP method of demand curve estimation in industrial organization. https://host.kelley.iu.edu/riharbau/repec/iuk/wpaper/bepp2006-04-rasmusen.pdf