Sample Weights Approach Over Time for Decennial and CPS-ASEC

For 1940 through 1993, we are using income data from IPUMS USA and IPUMS CPS ASEC to create our own estimate of family market income by summing the income variables for households with dependent children. We use these estimates to create our own counts of families living in poverty.

For 1940, 1960, and 1970 Decennial, we use the household weight (HHWT). For 1950 Decennial, we use the sample line weight (SLWT). For CPS 1977-1993, we use the ASEC family weight (ASECFWT).

Is this weighting approach acceptable?

When running this analysis, you should be aware that the terms household and family are not synonymous in the data. A household refers to an address or dwelling where respondents reside and can include one or multiple families of related individuals. I assume that you are using FTOTINC in your measure of family income. If so, note that this value only includes the incomes of the members of one’s own family as defined by FAMUNIT. This complicates your weighting question because there is no family weight in the decennial census data. You might consider modifying your analysis to focus on households rather than families, or using the provided household weight to weight families with the caveat that the weight was not constructed for this type of analysis. In both cases, you will use HHWT as the weight in your analysis. You will also need to restrict your analysis to one member per household/family so that you do not count each household/family once for every respondent in it. For households, you can simply restrict your sample to PERNUM = 1 after creating your income estimate. For families, you will want to keep one record for each unique combination of SAMPLE, SERIAL, and FAMUNIT.
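As a minimal sketch of the two restrictions described above, the following plain-Python example keeps one record per family (unique SAMPLE, SERIAL, FAMUNIT) and one record per household (PERNUM = 1), then sums HHWT. The records and weights are made-up toy values, and `one_record_per_family` is a hypothetical helper name, not an IPUMS function.

```python
def one_record_per_family(persons, keys=("SAMPLE", "SERIAL", "FAMUNIT")):
    """Keep the first person record seen for each unique family key."""
    seen = set()
    families = []
    for row in persons:
        key = tuple(row[k] for k in keys)
        if key not in seen:
            seen.add(key)
            families.append(row)
    return families


# Toy person-level extract; every value here is invented for illustration.
persons = [
    {"SAMPLE": 1, "SERIAL": 1, "PERNUM": 1, "FAMUNIT": 1, "HHWT": 100},
    {"SAMPLE": 1, "SERIAL": 1, "PERNUM": 2, "FAMUNIT": 1, "HHWT": 100},
    {"SAMPLE": 1, "SERIAL": 1, "PERNUM": 3, "FAMUNIT": 2, "HHWT": 100},
    {"SAMPLE": 1, "SERIAL": 2, "PERNUM": 1, "FAMUNIT": 1, "HHWT": 100},
]

# Family-level analysis: one record per (SAMPLE, SERIAL, FAMUNIT).
families = one_record_per_family(persons)

# Household-level analysis: one record per household (PERNUM == 1).
households = [p for p in persons if p["PERNUM"] == 1]

weighted_families = sum(f["HHWT"] for f in families)
weighted_households = sum(h["HHWT"] for h in households)
print(weighted_families, weighted_households)
```

In this toy data the first household contains two families, so the family count exceeds the household count even though both use HHWT.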

Regarding the 1950 decennial, I assume you want to use SLWT because you are using INCTOT, or another individual-level income measure. However, doing so will not give you a household or family-level estimate, since these are individual income measures and were only asked of sample-line persons (see this IPUMS USA documentation for more information about sample line respondents in 1950). Your income variable therefore only includes income from respondents who were sample-line persons. As a result, your process would undercount household/family income and completely skip households/families where there is no sample-line person. Moreover, SLWT is a person-level weight designed to produce person-level estimates. One way that you might try to work around this is to restrict your sample in this year to households/families where the household head is the sample-line person (i.e., PERNUM = 1 & SLREC = 2). The households/families in your sample would then be broadly comparable, although you still would not be capturing total household/family income. This is the solution implemented by IPUMS for FTOTINC in 1950, where the variable is reported only for families with a sample-line household head. You would then follow the filtering and weighting procedure outlined in the previous paragraph by using HHWT.
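The 1950 restriction could be sketched as follows, assuming the coding used in this reply (the household head is PERNUM = 1 and a sample-line record has SLREC = 2). The data are toy values and `households_with_sample_line_head` is a hypothetical helper name.

```python
def households_with_sample_line_head(persons):
    """Return the SERIALs whose head (PERNUM == 1) is a sample-line record (SLREC == 2)."""
    return {p["SERIAL"] for p in persons if p["PERNUM"] == 1 and p["SLREC"] == 2}


# Toy 1950 person records; values are invented for illustration.
persons_1950 = [
    {"SERIAL": 1, "PERNUM": 1, "SLREC": 2},  # head is the sample-line person: keep
    {"SERIAL": 1, "PERNUM": 2, "SLREC": 1},
    {"SERIAL": 2, "PERNUM": 1, "SLREC": 1},  # head is not sample-line: drop household
    {"SERIAL": 2, "PERNUM": 2, "SLREC": 2},  # sample-line person, but not the head
    {"SERIAL": 3, "PERNUM": 1, "SLREC": 1},  # no sample-line person at all: drop
]

keep = households_with_sample_line_head(persons_1950)
retained = [p for p in persons_1950 if p["SERIAL"] in keep]
print(sorted(keep), len(retained))
```

Note that household 2 is dropped even though it contains a sample-line person, because that person is not the head.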

Running this analysis using CPS ASEC data is much more straightforward since, as you indicate, a family weight does exist. FTOTVAL is also recorded for all persons. For a family-level analysis, you will still want to keep one record for each unique combination of YEAR, MONTH, SERIAL, and FAMUNIT.
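For the CPS ASEC case, a rough sketch might look like the following: keep one record per unique (YEAR, MONTH, SERIAL, FAMUNIT) and use ASECFWT to compute the weighted share of families whose FTOTVAL falls below a cutoff. The cutoff of 10000 is a purely hypothetical number, not an official poverty threshold, and the records are toy values.

```python
def weighted_share_below(persons, threshold):
    """Weighted share of families with FTOTVAL below `threshold`, one record per family."""
    keys = ("YEAR", "MONTH", "SERIAL", "FAMUNIT")
    seen = set()
    total = 0.0
    below = 0.0
    for row in persons:
        key = tuple(row[k] for k in keys)
        if key in seen:
            continue  # family already counted via another member
        seen.add(key)
        total += row["ASECFWT"]
        if row["FTOTVAL"] < threshold:
            below += row["ASECFWT"]
    return below / total


# Toy ASEC person records; FTOTVAL and ASECFWT repeat on each family member.
asec = [
    {"YEAR": 1990, "MONTH": 3, "SERIAL": 1, "FAMUNIT": 1, "FTOTVAL": 5000, "ASECFWT": 1500.0},
    {"YEAR": 1990, "MONTH": 3, "SERIAL": 1, "FAMUNIT": 1, "FTOTVAL": 5000, "ASECFWT": 1500.0},
    {"YEAR": 1990, "MONTH": 3, "SERIAL": 2, "FAMUNIT": 1, "FTOTVAL": 20000, "ASECFWT": 500.0},
]

share = weighted_share_below(asec, threshold=10000)  # hypothetical cutoff
print(share)
```

Without the deduplication step, the two-person family would contribute its weight twice and inflate the share.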


Ivan – Thank you for the detailed response! I do have a few clarifying questions.

We were aware that the terms “household” and “family” are not synonymous in the data. Thus, as you suggest, we have attempted to restrict our sample to the family level.

However, we have attempted to restrict our sample to the family level in both CPS-ASEC and Decennial using a combination of YEAR, SERIAL, and FAMUNIT for both. Is this acceptable? Or should it always be SAMPLE, SERIAL, and FAMUNIT for USA/Decennial and YEAR, MONTH, SERIAL, and FAMUNIT for CPS-ASEC?

We are not using any of the IPUMS-provided income total variables (i.e., FTOTINC, INCTOT, or FTOTVAL). We are using the series of variables with the “INC” stem to generate our own definition of market income (because, to the extent possible, we want pre-transfer income and want to exclude income like INCWELFR).

For the 1950 sample, we wanted to use the SLWT because we are using a combination of these “INC” variables. We were aware of the sample line issue in 1950 and attempted to address this by restricting our 1950 sample only to families where the first person in the family was also the “sample line” person (i.e., the survey respondent). However, we did so using a combination of PERNUM and RELATE. We sort the data by PERNUM to identify the first person in the family and then keep only those first persons who were also the householder (i.e., RELATE = 1). Should we use SLREC instead of RELATE in that practice?

We do restrict our analyses to one person per family, because we are not able to focus on the household. For the Decennial data, am I understanding correctly that it is acceptable to use the provided household weight (HHWT) to weight families with the caveat that the weight was not constructed for this type of analysis?

For additional context, here is a summary of the variables we are using:

| Years | Variables | Source |
|---|---|---|
| 1940 | incwage | IPUMS USA |
| 1950 | incwage, incbusfm, incother, fbusinc, fwage2 | IPUMS USA |
| 1960 | incwage, incbusfm, incother | IPUMS USA |
| 1970 | incwage, incbus, incfarm, incother | IPUMS USA |
| 1977-1987 | incwage, incbus, incfarm, incaloth, incint, incretir, incdrt | IPUMS CPS |
| 1988 | incwage, incbus, incfarm, incint, incretir, incother, incdivid, incrent, incalim, incasist, incsurv, incdisa1, incdisa2 | IPUMS CPS |
| 1989-1993 | incwage, incbus, incfarm, incint, incretir, incother, incdivid, incrent, incalim, incasist, incsurv1, incsurv2, incdisa1, incdisa2 | IPUMS CPS |

If the only CPS data you are using come from the ASEC (i.e., one sample per year), then there is no need to restrict on MONTH; a combination of YEAR, SERIAL, and FAMUNIT will be adequate for both the CPS ASEC and the Decennial Census data.

I do not fully understand how you are retaining families in 1950 where the first person in the family was also the sample line person. Using a combination of PERNUM = 1 and RELATE = 1 will not accomplish this without using SLREC. While RELATE = 1 identifies the household head, anyone in the household (with any value of PERNUM) could have been the sample line person. For additional context, census enumerators in 1950 interviewed household members and entered each person’s information on a line in the enumeration form. Each census page contains information on thirty individuals. Every fifth line on the census page was designated as a sample line, and persons falling on this sample line were asked additional questions (this is where the INC questions appear). Restricting the 1950 sample to households with a sample line household head will require you to drop families where the household head (RELATE = 1, or equivalently in the 1950 census PERNUM = 1) is not a sample-line person, that is, where the head’s record has SLREC = 1 rather than SLREC = 2.

This approach retains all families where the head of the household is the sample line person. These families can be divided into two groups:

  • Primary families (i.e., a group of individuals residing in the same household who are all related to the household head) where the household head is a sample line person.
  • Secondary families (i.e., one or more groups of related individuals residing in the same household who are not related to the household head) where the household head is a sample line person.

Some of these secondary families may contain sample line respondents, but by definition, none of them will include the household head. To address these secondary families, you might designate a family head for each one yourself, using contextual variables, and retain only the families where that family head is the sample line person. Otherwise, you may choose to drop secondary families in multi-family units as well.

I stated in my previous response that HHWT was primarily constructed for household-level analysis. However, HHWT does permit users to generate estimates for families in many of the Decennial Census samples that you reference. In particular, the 1960 and 1970 IPUMS samples are all “flat” samples, meaning that the probability of being included in the IPUMS sample is identical for all households (and by extension, families as well). Since income questions were asked of all respondents 14+ in these years, no additional adjustment to HHWT is necessary in order to use it as a weight for analyses of families. The same can be said of both the 1940 1% and full count files. Even though the 1% file is not a flat sample, HHWT is constructed using a method that allows for representative estimates for families.

However, in the 1950 samples, since income was only asked of sample line persons, additional adjustments must be made. When restricting your analysis to households/families where the household/family head is a sample line person, your subsample is actually a flat sample since each household/family has an equal probability of having its head be a sample line person. Moreover, this probability is equal to the probability of any individual respondent being a sample line person. In the 1% file (due to complicated sampling), this probability is 1-in-330. In the full count file, this probability is 1-in-5. If using the 1950 1% sample, you should therefore create a weight that is equal to 330 for all observations and run your analysis using this weight, restricting your sample to one observation per family/household as mentioned previously. With the full count file, your weight should equal 5. You may also want to compare your estimates between the two samples since there are known issues with reported income in the 1950 preliminary release.
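To make the constant-weight idea concrete, here is a small sketch. It assumes the sample has already been restricted to one record per family with a sample-line head; the dictionary keys `"1pct"` and `"full_count"` and the `poor` flag are hypothetical labels chosen for this example.

```python
# Constant weights from the discussion above: 330 for the 1950 1% file,
# 5 for the 1950 full count file. The key names are illustrative only.
CONSTANT_WEIGHT_1950 = {"1pct": 330, "full_count": 5}


def weighted_count(families, file_type, flag="poor"):
    """Weighted number of retained families for which `flag` is true."""
    weight = CONSTANT_WEIGHT_1950[file_type]
    return sum(weight for fam in families if fam[flag])


# Toy retained families (one record each); the poverty flags are invented.
retained_families = [{"poor": True}, {"poor": False}, {"poor": True}]

count_1pct = weighted_count(retained_families, "1pct")
count_full = weighted_count(retained_families, "full_count")
print(count_1pct, count_full)
```

Because every retained family carries the same weight, weighted shares within a single 1950 file equal the unweighted shares; the constant matters only when scaling to population counts or combining with other years.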

Taking a quick look at your variable list, I’m seeing that you mention FWAGE2 and FBUSINC in 1950. While these variables may be helpful to check your aggregation of family income, you want to make sure not to double count income. For example, if both the household head and another member of the household/family were sample line persons in 1950 and reported wage earnings, you will have INCWAGE data for both respondents. If you then add FWAGE2 on top, you will be double counting the second respondent’s earnings. You mentioned not wanting to use FTOTINC since you wish to exclude transfers from income, but it may still make sense to use this variable for 1950 only (while constructing your own measure for other years) as it accounts for the sample line structure within families.
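The double-counting risk can be illustrated with toy arithmetic. This sketch assumes, as the paragraph above implies, that FWAGE2 on the head's record already reflects the wages of another family member; the dollar amounts are invented.

```python
# Toy 1950 family in which both the head and the spouse are sample-line persons.
# Assumption for illustration: FWAGE2 on the head's record already covers the
# spouse's wages, so it overlaps with the spouse's own INCWAGE.
head = {"INCWAGE": 3000, "FWAGE2": 1500}
spouse = {"INCWAGE": 1500}

# Summing person-level wages counts each earner once.
person_level_wages = head["INCWAGE"] + spouse["INCWAGE"]

# Adding FWAGE2 on top counts the spouse's earnings a second time.
naive_total = person_level_wages + head["FWAGE2"]

overcount = naive_total - person_level_wages
print(person_level_wages, naive_total, overcount)
```

Under that assumption, the naive total exceeds the person-level sum by exactly the amount already captured in the second respondent's INCWAGE.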

I hope my explanations are clear. Please let me know if you have any further questions.
