Is empstat in 1910 and 1930 derived from the occupational/gainful employment questions used to create labforce? If so, why are some individuals employed according to empstat but considered “not in the labor force” by labforce? In principle, should empstat provide us with independent information about those considered out of the labor force by labforce?
My understanding is that, prior to 1940, labforce was based on the reporting of a “gainful occupation.” In contrast, empstat in 1910 and 1930 comes from a question asking whether the respondent worked on the day of the census (or the last working day). From the enumerator instructions displayed on IPUMS’ website, it looks as though this question about work was asked only of the subset of individuals who reported a gainful occupation: both sets of instructions say to ask the question only of those reporting an occupation. However, when you crosstab these two variables in the microdata, there are some employed/unemployed persons in empstat who are out of the labor force in labforce. In addition, in 1930 there are a few people considered “not in the labor force” by empstat but “in the labor force” by labforce.
Is the discrepancy due to my misunderstanding of the data? Or is it due to some cleaning issue such as illegible occupations? Or is this enumerator error?
It looks like these cases are caused by the slightly different universe statements for EMPSTAT and LABFORCE in 1910 and 1930. Specifically, the universe for EMPSTAT in 1910 includes all persons except the self-employed, and in 1930 it includes all persons. On the other hand, the universe for LABFORCE in both 1910 and 1930 includes only persons who are age 16+. So, if you adjust the universe for EMPSTAT to exclude those who are aged 15 and below, these seemingly contradictory cases will disappear.
Thank you for your answer. This takes care of the lion’s share of the difference, but not all of it. In particular, my question concerns those classified as “not in labor force” via labforce, not as “NA”.
For example, if we use the 1910 1% sample and drop all individuals who are younger than 16 or have labforce equal to 0 (NA), we still get 3050 individuals who are employed/unemployed (codes 1 or 2) via empstat but “not in labor force” (code 1) via labforce. Similarly, in the 1930 5%, we get 42,948 who have a similar coding issue once we adjust for the universes.
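For anyone who wants to reproduce this check, here is a minimal sketch in Python. It assumes the extract has already been read into a list of records with `age`, `empstat`, and `labforce` fields; the toy records and the helper names `in_labforce_universe` and `crosstab` are made up for illustration and are not IPUMS data or tooling. The codes follow the ones described above (labforce 0 = NA, 1 = not in labor force; empstat 1 or 2 = employed/unemployed).

```python
from collections import Counter

# Toy records standing in for an IPUMS extract; the values are invented
# for illustration and are NOT real census microdata.
records = [
    {"age": 12, "empstat": 1, "labforce": 0},  # under 16: outside the labforce universe
    {"age": 34, "empstat": 1, "labforce": 2},  # employed and in the labor force
    {"age": 40, "empstat": 1, "labforce": 1},  # employed but "not in labor force"
    {"age": 51, "empstat": 2, "labforce": 1},  # unemployed but "not in labor force"
    {"age": 67, "empstat": 3, "labforce": 1},  # consistent: out of the labor force in both
    {"age": 29, "empstat": 2, "labforce": 2},  # unemployed and in the labor force
]


def in_labforce_universe(rec):
    """Keep persons aged 16+ whose labforce is not 0 (NA)."""
    return rec["age"] >= 16 and rec["labforce"] != 0


def crosstab(recs):
    """Count records for each (empstat, labforce) pair."""
    return Counter((r["empstat"], r["labforce"]) for r in recs)


restricted = [r for r in records if in_labforce_universe(r)]
table = crosstab(restricted)

# Cases coded employed/unemployed by empstat (1 or 2)
# but "not in labor force" by labforce (1).
mismatches = sum(
    n for (emp, lf), n in table.items() if emp in (1, 2) and lf == 1
)

for (emp, lf), n in sorted(table.items()):
    print(f"empstat={emp} labforce={lf} n={n}")
print("mismatches:", mismatches)
```

With a real extract, the same filter and count would be applied to the full sample, ideally using the person weights if weighted totals are wanted.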
Relative to the overall size of the samples, these are a small number of individuals. However, my concern is whether these differences reflect a more substantive difference in the source of the information and the type of work these variables are reflecting. Namely, I am not sure to what degree empstat is derived from occupational data and thus how much it is derivative of the labforce variable.
Your answer above would seem to indicate that empstat should match labforce exactly in 1910 and 1930 once we drop those who are not in the universe for labforce. Correct me if this is an incorrect interpretation.
There are other reasons why the comparison between LABFORCE and EMPSTAT in 1910 and 1930 looks different than in other years. One key reason is the difference in the universe statements in these years. Another reason is the source material for these questions. In particular, from 1850 to 1930, participation in the labor force (LABFORCE) is defined as reporting any “gainful occupation”, as recorded in OCC. For EMPSTAT, respondents were asked whether they were employed or unemployed in a specific reference period. In 1910, the reference period was the day the census was taken: April 15, 1910. In 1930, the reference period was the previous regular working day. Therefore, while these responses should overlap for most respondents, there are undoubtedly some cases where the responses don’t align perfectly. For more information about this nuance, see the Comparability tabs for EMPSTAT and LABFORCE.