Huge Discrepancy with Historical Data

I am working with historical ACS data from 1950 onward to look at population trends in New York City. When working with the 1960 data and the CITY variable, I got a pretty accurate total count of people living in New York (I found 7,782,400 compared to 7,781,784 according to the Census estimate). When I ran the same calculation with the same variable for 1950, however, I found 8,735,653 compared to the Census estimate of 7,891,957.

Is there something I am doing wrong or is there an existing issue with the 1950 1% sample? I have included my code below for reference.

library(dplyr)  # provides the %>% pipe
ACS1950 <- ACS1950 %>%
  subset(CITY == 4610)

# Weighted person totals for each census year
with(subset(ACS1950, YEAR == 1950), sum(PERWT))
with(subset(ACS1950, YEAR == 1960), sum(PERWT))

I believe that you’re referring to the 1950 and 1960 census data; the ACS was first conducted in 2001. The issue that you’re encountering is that CITY identifies the city of residence only for households located in identifiable cities. While one might expect New York City to be an identifiable city, especially given that there are observations assigned to that city, the comparability tab for the variable provides more information on sample-by-sample identifiability. Regarding the 1950 1% sample, it states that only the central cities of metropolitan areas are identified. Moreover, if a metropolitan area had more than one central city, then all central cities of that metropolitan area share the same city code (i.e., the code of the largest city). The accompanying list of such groups of central cities includes “New York, NY, Newark, Jersey City, and Paterson, NJ”. This means that the 8,735,653 figure that you are observing for 1950 actually refers to the combined populations of all four of these cities.

Luckily, the solution in this case is simple: you can filter your observations to only include respondents in New York State (STATEFIP = 36). Doing so, I find a population of 7,991,312 for the NY State portion of CITY = 4610 (only 1.26% greater than the Census estimate you quote). There does not appear to be such an issue with the IPUMS 1960 5% sample.
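The state filter described above can be sketched in R roughly as follows. This is only an illustration on a made-up toy data frame (the rows and weights are hypothetical and stand in for a real extract); the column names CITY, STATEFIP, YEAR, and PERWT are the IPUMS variables already mentioned in this thread, and STATEFIP 34 is New Jersey.

```r
# Toy person-level extract; rows and weights are made up for illustration only.
toy <- data.frame(
  YEAR     = c(1950, 1950, 1950, 1950, 1960),
  CITY     = c(4610, 4610, 4610,    0, 4610),
  STATEFIP = c(  36,   36,   34,   36,   36),  # 36 = New York, 34 = New Jersey
  PERWT    = c( 100,  100,  100,  100,   20)
)

# In the 1950 1% sample, CITY == 4610 also covers the New Jersey central cities,
# so restrict to New York State before summing the person weights.
nyc_1950 <- subset(toy, YEAR == 1950 & CITY == 4610 & STATEFIP == 36)
nyc_1950_pop <- sum(nyc_1950$PERWT)

# Same total without the state filter, for comparison.
group_1950_pop <- sum(subset(toy, YEAR == 1950 & CITY == 4610)$PERWT)

print(c(nyc_only = nyc_1950_pop, whole_group = group_1950_pop))
```

On a real extract, the same `subset()` condition would presumably be applied to the full data frame in place of `toy`.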

You might additionally be interested in using the recently released 1950 full count file that contains all enumerated persons from the 1950 census. In this sample, the city of residence is given for households in any city with 10,000+ inhabitants. While this is a preliminary release with several noted issues, the estimate I get for the population of NYC using this sample is even more accurate at 7,932,929 (0.5% greater than the Census estimate).
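For anyone who wants to reproduce the percentage comparisons, a small sketch using only the figures quoted in this thread might look like this. The helper name `pct_diff` is a hypothetical one introduced here for illustration.

```r
# Percent difference of an estimate relative to a benchmark value.
pct_diff <- function(estimate, benchmark) {
  100 * (estimate - benchmark) / benchmark
}

census_1950 <- 7891957  # Census figure quoted in the question

# 1950 1% sample, all records with CITY == 4610 (no state filter)
diff_unfiltered <- round(pct_diff(8735653, census_1950), 2)

# 1950 1% sample, New York State portion of CITY == 4610
diff_ny_state <- round(pct_diff(7991312, census_1950), 2)

# 1950 full count file
diff_full_count <- round(pct_diff(7932929, census_1950), 2)

print(c(unfiltered = diff_unfiltered,
        ny_state   = diff_ny_state,
        full_count = diff_full_count))
```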
