The Data-Generating Process

John P. Hussman, Ph.D.

For anyone who works to infer information from a broad range of evidence, one of the important aspects of the job is to think carefully about the structure of the data – what is sometimes called the “data-generating process.” Data doesn’t just drop from the sky or out of a computer. It is generated by some process, and for any sort of data, it is critical to understand how that process works.

For example, one of the moments of market excitement last week was the reported jump in new housing starts for September. But later in the week, investors learned that there was a slump in existing home sales as well. If we just take those two data points at face value, it’s not clear exactly what we should conclude about housing. But the story is clearer once we consider the process that generates that data.

One part of the process is purely statistical. The housing data reported each month are monthly figures expressed at an annual rate, so the jump from 758,000 to 872,000 housing starts at an annual rate works out to a statement that “During September, in an economy of about 130 million homes, about 100 million of which are single detached units, a total of 9,500 more homes were started than in August – a fluctuation that is actually in the range of month-to-month statistical noise, but does bring recent activity to a recovery high.” Now, in prior recessions, the absolute low was about 900,000 starts on an annual basis, rising toward 2 million annual starts over the course of the recovery. The historical peak occurred in 1972 near 2.5 million starts, but the period leading up to 2006 was the longest sustained increase without a major drop. In the recent instance, housing starts bottomed at 478,000 in early 2009, so we’ve clearly seen a recovery in starts. But the present level is still so low that it has previously been observed only briefly at the troughs of prior recessions.
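The annual-rate arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the September annualized rate is taken as 872,000 (the level consistent with the 9,500 monthly increase quoted above), and the conversion simply divides by 12, ignoring the seasonal adjustment applied to the published figures.

```python
MONTHS_PER_YEAR = 12


def annual_rate_to_monthly(annual_rate: float) -> float:
    """Convert a figure quoted at an annual rate into a per-month count."""
    return annual_rate / MONTHS_PER_YEAR


def monthly_change(prev_annual_rate: float, curr_annual_rate: float) -> float:
    """Change in actual monthly activity implied by two annualized readings."""
    return annual_rate_to_monthly(curr_annual_rate - prev_annual_rate)


# Figures from the text; the September level is an assumption (see lead-in).
august_rate = 758_000        # August starts, annual rate
september_rate = 872_000     # September starts, annual rate
housing_stock = 130_000_000  # approximate number of homes in the economy

extra_homes = monthly_change(august_rate, september_rate)
share_of_stock = extra_homes / housing_stock

print(f"Additional homes started in September: {extra_homes:,.0f}")
print(f"As a share of the housing stock: {share_of_stock:.4%}")
```

Seen this way, a headline move of more than 100,000 in the annualized figure reflects a change in actual monthly activity that is less than one hundredth of one percent of the housing stock.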

The second part of the process is important to the question of what is sustainable. Here the question to ask is how and why a decision to “start” a house occurs. According to CoreLogic, about 22% of mortgages are underwater, with mortgage debt that exceeds the market value of the home. Likewise, banks have taken millions of homes into their own “real-estate owned” or REO portfolios, and have dribbled that inventory into the market at a very gradual rate. All of that means that the availability of existing homes for sale is far smaller than the actual inventory of homes that would be available if underwater homeowners were able, or banks were willing, to sell. Accordingly, much of the volume in “existing home sales” represents foreclosure sales, REO sales and short sales (sales allowed by banks for less than the value of the outstanding mortgage). That constrained supply of homes available for sale is one reason why home prices have held up. At the same time, constrained supply means that new home buyers face higher prices and fewer choices for existing homes than they would if the market were actually clearing properly. Given those facts, buyers who are able to secure financing (or pay cash) often find it more desirable to build to their preference instead of buying an existing home. It’s not clear how many of these starts represent “spec” building by developers, but it’s interesting to note that the average time to sell a newly completed home has been rising, not falling, over the past year.

