WIF in IPUMS/earners definition

I’m interested in calculating the number of earners per family at a finer geographic level. Census Table B19122 suggests that the ACS collects this information, but I have had no luck finding a single variable that indicates it.

The answer in the thread “About WIF (workers in family) in household dataset, and how to make it comparable to ACS aggregated datasets?” seems to suggest that, as of 2017, IPUMS did not include WIF. Since it has been 9 years, I wanted to ask this question again. Is it still accurate to assume that I need to construct earner_flag myself?

Thank you so much!

IPUMS NHGIS provides summary data from the U.S. decennial census and ACS, aggregated at a variety of geographic levels, including more granular levels than the geography variables available from IPUMS USA. You can search for tables summarizing ACS data on the number of workers in a family by using the NHGIS search filters to select the year(s) and geographic levels of interest. Then select “Labor Force and Employment Status” under topics. This brief video tutorial shows how to use the NHGIS data finder to search for tables. There are tables on workers in family available at levels as granular as the census tract, for example, Table B19122 Number of Earners in Family.

IPUMS USA provides microdata from the U.S. decennial census and ACS. Microdata are person-level data, where each row in the dataset represents a person or household, and each column is a variable providing information about the person or household. IPUMS USA does not provide the workers in family variable (WIF in the original ACS data). You can use the IPUMS USA search function within the extract system to search for variables you are interested in. Using other variables from IPUMS USA, it is possible to estimate the number of workers in each household or family at the time of the census or survey, but not over the past 12 months. The variables LABFORCE and EMPSTAT report labor force status and employment status, respectively. The variables SAMPLE and SERIAL together uniquely identify households. Within a household, the variable FAMUNIT indicates to which family each person belongs (using the IPUMS definition of a family unit). The variable CBSUBFAM reports the subfamily number of each person (using the Census Bureau definitions of subfamilies). I will pass your interest in this variable on to our IPUMS USA team, as they consider user needs and interest when prioritizing adding new variables to existing IPUMS USA samples.
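As a rough illustration of the point-in-time count described above, the sketch below groups person records by SAMPLE and SERIAL and counts those coded as employed. The records are invented toy values, the function name is hypothetical, and the code assumes that EMPSTAT code 1 means "employed" (worth confirming on the EMPSTAT codes page). It is one possible approach, not an official IPUMS method.

```python
from collections import Counter

# Toy person records (invented values, not real ACS data).
persons = [
    {"SAMPLE": 1, "SERIAL": 10, "PERNUM": 1, "EMPSTAT": 1},
    {"SAMPLE": 1, "SERIAL": 10, "PERNUM": 2, "EMPSTAT": 1},
    {"SAMPLE": 1, "SERIAL": 10, "PERNUM": 3, "EMPSTAT": 3},
    {"SAMPLE": 1, "SERIAL": 11, "PERNUM": 1, "EMPSTAT": 2},
]

EMPLOYED = 1  # assumption: EMPSTAT code for "employed"; check the codes page


def workers_per_household(records):
    """Count employed persons in each household, keyed by (SAMPLE, SERIAL)."""
    counts = Counter()
    for rec in records:
        key = (rec["SAMPLE"], rec["SERIAL"])
        # Adding 0 still registers the household, so zero-worker
        # households appear in the result.
        counts[key] += int(rec["EMPSTAT"] == EMPLOYED)
    return dict(counts)


household_workers = workers_per_household(persons)
```

Swapping (SAMPLE, SERIAL) for a family-level key would give a per-family count instead, subject to the family definition chosen.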

I am not sure what you mean by constructing “earner_flag” yourself. If this is an ACS variable in the original data, I am unfamiliar with it and was unable to find it in any Census Bureau documentation. If you can provide more information about the flag or variable you are looking for, I may be able to better assist you.

Hi Isabel,

Thank you so much for your detailed response. I will check out IPUMS NHGIS for sure.

I’m primarily working with the ACS 2020-2024 5-year file right now, and I am interested in the number of earners in each household (which is a different concept from a family). Right now, I define a variable ‘earner_flag’ = 1 if INCEARN is not 0000000 (i.e., no earnings). The value of INCEARN appears to be essentially the same as INCWAGE + INCBUS00 (within a 1-dollar difference), which I believe is how the Census Bureau defines earnings (wage/salary income + net income from self-employment).

I want to verify whether this definition is correct by trying to replicate Census Table B19122. I will likely have to use FAMUNIT to restrict the analysis to families. I would like to know if you have any feedback on the way I identify earners! Thank you so much!

My understanding is that you would like to use ACS microdata from IPUMS USA to construct a variable that reports the number of earners in each household. It sounds like your end goal is not to replicate the Census Bureau published table you referred to; rather, you want to use this table as a benchmark to test whether your definition of an earner is consistent with the Census Bureau’s definition of an earner. Please feel free to provide more details or clarification if my interpretation of your work is not correct. Below, I provide some information on how you can estimate the number of earners in a household using IPUMS USA data, along with some other information about households and families that may be helpful.

First, we generally do not expect estimates produced with the public use microdata (e.g., data from IPUMS) to match official published estimates from the Census Bureau. These published estimates use the internally available microdata, while the publicly available data, the Public Use Microdata Sample (PUMS), are a subset of the internal data. The public use data have top codes imposed on most income variables, while official published estimates do not necessarily come from top-coded data. You can read more about why your calculations may not match official statistics in this IPUMS blog post.

I cannot find any definitive documentation confirming exactly how the Census Bureau defines earners for the purpose of creating the published table you referenced. Their list of subject definitions does not include earners. In my view, a reasonable assumption would be someone with earnings greater than zero. The Census Bureau typically defines earnings as income from wages and salary and from self-employment. Whether a restriction on employment is imposed on the definition of earners is not clear.
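To make the "earnings greater than zero" assumption concrete, the sketch below applies it next to the looser "INCEARN is not zero" rule from earlier in this thread, then counts earners per household. The two rules can differ only for people whose reported earnings are negative (for example, a net self-employment loss), if such values are present in the data. The records are invented toy values and all function names are hypothetical; this is not the Census Bureau's confirmed definition.

```python
from collections import defaultdict

# Toy person records (invented values, not real ACS data).
persons = [
    {"SAMPLE": 1, "SERIAL": 10, "INCEARN": 42000},
    {"SAMPLE": 1, "SERIAL": 10, "INCEARN": 0},
    {"SAMPLE": 1, "SERIAL": 10, "INCEARN": -3000},  # net loss
    {"SAMPLE": 1, "SERIAL": 11, "INCEARN": 0},
]


def is_earner_positive(incearn):
    """Earner rule: earnings strictly greater than zero."""
    return incearn > 0


def is_earner_nonzero(incearn):
    """Earner rule: any non-zero earnings, including net losses."""
    return incearn != 0


def earners_per_household(records, is_earner):
    """Count earners per (SAMPLE, SERIAL) household under a given rule."""
    counts = defaultdict(int)
    for rec in records:
        key = (rec["SAMPLE"], rec["SERIAL"])
        counts[key] += int(is_earner(rec["INCEARN"]))
    return dict(counts)


by_positive = earners_per_household(persons, is_earner_positive)
by_nonzero = earners_per_household(persons, is_earner_nonzero)
```

Comparing the two results against the published table may help indicate which rule comes closer, keeping in mind that an exact match is not expected.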

When attempting to approximately replicate Census Bureau estimates at the family level, you need to use the Census Bureau’s definition of families. The variable FAMUNIT identifies family units as defined by IPUMS, not as defined by the Census Bureau. You should not use FAMUNIT when comparing your estimates to Census Bureau published estimates. In the IPUMS USA extract system, you can find a number of variables that use Census Bureau definitions of families/subfamilies. See the family interrelationship variable group and the variables that mention Census Bureau classifications in parentheses. You can read our overview of subfamilies and family interrelationships here.

To uniquely identify households using IPUMS USA data, combine the variables SAMPLE and SERIAL. To uniquely identify families as defined by the Census Bureau, combine the variables SAMPLE, SERIAL, and SUBFAM. Note that SUBFAM does not separate individuals unrelated to the household head from the household head and the head’s family if that unrelated individual does not have a parent or child living in the same household. The head of household and their family are coded SUBFAM=0 (not in subfamily). Unrelated individuals without a parent or child in the household will also be coded SUBFAM=0. This means you cannot use SUBFAM alone to identify families within households. I suggest using RELATE, relationship to household head, to remedy this.
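The SUBFAM and RELATE logic above could be sketched roughly as follows. The sketch builds a family key from SAMPLE, SERIAL, and SUBFAM, and uses RELATE to set aside people who are coded SUBFAM=0 but are not related to the head. It assumes that general RELATE codes 1 through 10 cover the head and the head's relatives and that higher codes are non-relatives; please verify this against the RELATE codes page. The records are invented and the function names are hypothetical.

```python
from collections import defaultdict

# Assumption: RELATE codes 1-10 are the head and relatives of the head;
# codes above 10 are non-relatives. Verify on the RELATE codes page.
RELATED_MAX = 10

# Toy records (invented): head, spouse, a roommate, and a boarder with a child.
persons = [
    {"SAMPLE": 1, "SERIAL": 10, "PERNUM": 1, "RELATE": 1, "SUBFAM": 0},
    {"SAMPLE": 1, "SERIAL": 10, "PERNUM": 2, "RELATE": 2, "SUBFAM": 0},
    {"SAMPLE": 1, "SERIAL": 10, "PERNUM": 3, "RELATE": 12, "SUBFAM": 0},
    {"SAMPLE": 1, "SERIAL": 10, "PERNUM": 4, "RELATE": 12, "SUBFAM": 1},
    {"SAMPLE": 1, "SERIAL": 10, "PERNUM": 5, "RELATE": 12, "SUBFAM": 1},
]


def family_key(rec):
    """Return a family identifier, or None for an unrelated individual
    who is coded SUBFAM=0 and therefore belongs to no family here."""
    if rec["SUBFAM"] == 0 and rec["RELATE"] > RELATED_MAX:
        return None
    return (rec["SAMPLE"], rec["SERIAL"], rec["SUBFAM"])


def group_families(records):
    """Map each family key to the PERNUMs of its members."""
    families = defaultdict(list)
    for rec in records:
        key = family_key(rec)
        if key is not None:
            families[key].append(rec["PERNUM"])
    return dict(families)


families = group_families(persons)
```

In this toy household, the roommate (PERNUM 3) is left out of every family, while the boarder and child form their own unit under SUBFAM=1.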

As stated in the INCEARN variable description, this variable is the total of the IPUMS variables INCWAGE, INCBUS, and INCFARM (for 1990) and of INCWAGE and INCBUS00 (for the 2000 census, the ACS, and the PRCS). All of these income variables are top-coded. The codes section for each of these income variables provides information about the top codes. Note that a value of 0 in INCEARN is the NIU (not in universe) code, but this is not the NIU code for the other income variables listed. Be sure to check the codes section of all the income variables used to identify special codes.
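When checking INCEARN against INCWAGE + INCBUS00, one option is to recode special codes before summing, as in the sketch below. The values in SPECIAL_CODES are placeholders that I am assuming for illustration and should be replaced with the codes listed on each variable's codes page. The 1-dollar tolerance comes from the difference reported earlier in this thread, and the records and function names are hypothetical.

```python
# Placeholder special codes (assumed for illustration only); replace with
# the values documented in the codes section of each variable.
SPECIAL_CODES = {
    "INCWAGE": {999999, 999998},
    "INCBUS00": {999999},
}
COMPONENTS = ("INCWAGE", "INCBUS00")


def clean(value, variable):
    """Return None for a special code, otherwise the value unchanged."""
    return None if value in SPECIAL_CODES[variable] else value


def earnings_from_components(rec):
    """Sum the valid components; None if every component is a special code."""
    parts = [clean(rec[var], var) for var in COMPONENTS]
    valid = [p for p in parts if p is not None]
    return sum(valid) if valid else None


def matches_incearn(rec, tolerance=1):
    """Check whether INCEARN agrees with the component sum within tolerance."""
    total = earnings_from_components(rec)
    if total is None:
        # Per the note above, INCEARN uses 0 when the person is not in universe.
        return rec["INCEARN"] == 0
    return abs(total - rec["INCEARN"]) <= tolerance


# Toy records (invented values, not real ACS data).
records = [
    {"INCWAGE": 30000, "INCBUS00": 5000, "INCEARN": 35000},
    {"INCWAGE": 999999, "INCBUS00": 999999, "INCEARN": 0},
    {"INCWAGE": 10000, "INCBUS00": -2000, "INCEARN": 8500},
]
results = [matches_incearn(rec) for rec in records]
```

Records that fail the check are the ones worth inspecting by hand, since they may point to a special code or top code that the placeholder list does not cover.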