Analyzing Industry and Occupational Data, MSA

Good afternoon IPUMS community,

I’m working on a project that examines janitors (OCC code 7690) in the Miami-Fort Lauderdale-West Palm Beach metro area. I’m using ACS five-year data, 2013-2017. I’m limited to working in Excel.

For starters, I’m trying to replicate some basic tables produced by the census, to make sure I’m doing things correctly.

I’m having trouble recreating ACS Table C24050: Industry by Occupation for the Civilian Employed Population 16 Years and Over.

In my IPUMS Excel table, I’m summing PERWT to obtain an estimate of workers in all industries. I’m excluding industry code 0, which eliminates persons under 16 years old, unemployed persons who never worked, and persons not in the labor force (NILF) who last worked more than 5 years ago. I’m also excluding industry codes 9670-9920, which represent military and unemployed persons.

My result is 3,408,762 persons, significantly larger than the ACS estimate of 2,858,792 and well outside its MOE.

Do folks have advice in terms of what mistakes I’m making? Do I need to divide by a certain amount given that I’m summarizing data for all five years? Is it incorrect to simply sum PERWT? Any advice would be appreciated. Thank you!

To be more helpful, here is an image of the ACS table I’m referring to:

The issue is your method for identifying employed people. Industry codes are not a good indicator of employment status: anyone who has been employed at all in the past 5 years has a non-zero industry code, whether or not they are employed now. You want to use the EMPSTAT variable (in fact EMPSTATD, the detailed version, which is automatically downloaded when you select EMPSTAT). Civilian employed people will have codes 10 or 12. When I did this for the Miami metro, I got an overall total of 2,842,510, only about 0.6% off from FactFinder. In general, we do not expect to be able to replicate FactFinder numbers exactly using public-use microdata. See this page for more info on why.
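For anyone who wants to check the logic outside of Excel, here is a minimal Python sketch of the two filters discussed above. The records are made-up toy rows, not real ACS data, and the function names are just for illustration. The only things taken from this thread are the variable names (EMPSTATD, IND, PERWT), the EMPSTATD codes 10 and 12 for civilian employed, and the industry code exclusions from the original question.

```python
import csv
import io

# EMPSTATD codes identified in the answer above as civilian employed.
CIVILIAN_EMPLOYED = {10, 12}


def total_by_empstatd(rows):
    """Sum PERWT over records whose EMPSTATD is 10 or 12."""
    return sum(
        float(r["PERWT"]) for r in rows if int(r["EMPSTATD"]) in CIVILIAN_EMPLOYED
    )


def total_by_industry(rows):
    """Sum PERWT using the question's filter: drop IND 0 and IND 9670-9920."""
    return sum(
        float(r["PERWT"])
        for r in rows
        if int(r["IND"]) != 0 and not (9670 <= int(r["IND"]) <= 9920)
    )


# Hypothetical toy extract. The EMPSTATD values other than 10 and 12 simply
# stand in for "not currently civilian employed" records.
TOY_EXTRACT = """EMPSTATD,IND,PERWT
10,7690,120
12,7690,80
20,7690,50
30,7690,40
0,0,60
14,9670,30
"""

rows = list(csv.DictReader(io.StringIO(TOY_EXTRACT)))
employed_total = total_by_empstatd(rows)
industry_total = total_by_industry(rows)

print("EMPSTATD filter:", employed_total)
print("Industry filter:", industry_total)
```

In the toy data, the two rows with a janitorial industry code but a non-employed EMPSTATD value are counted by the industry filter and dropped by the EMPSTATD filter. That is the same kind of overcount described in the question.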

Thank you Matt, your answer was informative and in-depth. Problem solved!