Why are there fewer counties in the 1950 1% sample compared to the 1940 1% sample?


I downloaded data from 1930-2010.

I used the 1% or 5% samples.

In 1930 and 1940 using the 1% samples, I have roughly 3,000 observations at the state-county level.

In the 1950 1% sample I get 145 observations, in the 1960 5% sample I get 435 observations, and in the 1970 1% sample I get 145 observations.

Am I doing something incorrect? Or, was there a change in the data/sampling that leads to fewer counties in later years?




You haven’t done anything incorrect. Beginning in 1950, the lowest level of geography identifiable in public use data is the PUMA (public use microdata area). PUMAs are sometimes identical to counties, but often they are not. As a result, only some counties in some samples can be identified from 1950 onward, which is why your county counts drop so sharply after 1940. More details are given in the COUNTYFIPS variable description. This recent blog post explains this issue and offers some ideas for alternatives.
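
If it helps to check this in your own extract, here is a minimal sketch in Python that counts the distinct state-county pairs identified in each sample year. It assumes your extract includes the YEAR, STATEFIP, and COUNTYFIPS variables and that a COUNTYFIPS value of 0 marks a county that is not identifiable; please confirm that coding against the COUNTYFIPS variable description. The function name and the toy rows are made up for illustration and are not real IPUMS data.

```python
from collections import defaultdict


def count_identified_counties(records):
    """Count distinct (STATEFIP, COUNTYFIPS) pairs per YEAR.

    Assumption: COUNTYFIPS == 0 means the county is not identifiable,
    so those records are skipped rather than counted as a county.
    """
    counties = defaultdict(set)
    for rec in records:
        if rec["COUNTYFIPS"] == 0:
            continue  # county not identifiable in this record
        counties[rec["YEAR"]].add((rec["STATEFIP"], rec["COUNTYFIPS"]))
    return {year: len(pairs) for year, pairs in sorted(counties.items())}


# Toy rows for illustration only (not real IPUMS data).
sample = [
    {"YEAR": 1940, "STATEFIP": 1, "COUNTYFIPS": 1},
    {"YEAR": 1940, "STATEFIP": 1, "COUNTYFIPS": 3},
    {"YEAR": 1940, "STATEFIP": 1, "COUNTYFIPS": 1},   # same county, second person
    {"YEAR": 1950, "STATEFIP": 1, "COUNTYFIPS": 0},   # not identifiable
    {"YEAR": 1950, "STATEFIP": 1, "COUNTYFIPS": 73},
    {"YEAR": 1950, "STATEFIP": 6, "COUNTYFIPS": 37},
    {"YEAR": 1950, "STATEFIP": 6, "COUNTYFIPS": 0},   # not identifiable
]

counts = count_identified_counties(sample)
print(counts)
```

Run on a real extract, the per-year counts should show the same pattern you describe: close to the full set of counties in 1930 and 1940, and far fewer from 1950 onward. A year in which no county is identifiable will simply be absent from the result.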