Hello, I am a doctoral student seeking to understand the effect of disability on educational attainment via a sibling design study. I want to use the full-count 1910 and 1940 censuses. For the 1910 Census, what would be the most appropriate household variable that would then allow me to identify siblings within the same household?
Secondly, I have a series of covariates pertaining to 1910 but only one outcome variable (plus the HIK variable) in 1940. Do I extract the 1910 Census data separately from the 1940 one and unlink the ‘linked census’ button? Or is there a way to link them such that I only choose certain covariates in 1910 and the two for 1940?
Is there a way to identify the occ score pertaining to the father and/or mother for each sibling pair?
You can match siblings in each household by using the IPUMS constructed family interrelationship variables MOMLOC and POPLOC. Households are uniquely identified within each Census year with a SERIAL number; since SERIAL values are reused across years, they must be used together with the variable SAMPLE to identify households when pooling multiple years of data. Within each unique combination of SERIAL and SAMPLE, individuals with the same values for MOMLOC and/or POPLOC are inferred by IPUMS as being siblings.
The original Census data only indicates the relationship of each household member to the household head without providing information on the relationships between individuals who are not the household head. As a result, IPUMS creates family interrelationship variables to simplify the process of linking family members for users. You can read about our linking method in detail on the Family Interrelationships User Guide. These variables include MOMLOC, which reports the person number (PERNUM) of the individual’s probable mother, and POPLOC, which reports the same for the individual’s probable father. This type of linking is only possible if the individual’s parent resides in the same household; it cannot identify parents who are not recorded in the household roster. In cases where there is no mother/father to link to, our algorithm assigns a value of 0 for the corresponding variable. While there is no age limit for an individual to be linked to their co-resident parents (i.e., adult siblings living with their parents will still be identified), this method works best for matching younger siblings since it misses siblings who live together without their parents or whose parents have died. MOMLOC/POPLOC include social relationships (such as stepfather and adoptive father) as well as biological relationships; STEPMOM/STEPPOP can be used to differentiate between these cases. The rule used to link each parent and child is reported in MOMRULE_HIST and POPRULE_HIST.
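As a rough illustration of the grouping logic described above, here is a minimal sketch in plain Python. It assumes each person record has been read into a dict with the IPUMS variables SAMPLE, SERIAL, PERNUM, MOMLOC, and POPLOC; the function name `siblings_by_parent` and the toy SAMPLE values are made up for this example. With a full-count extract you would likely do the same grouping in your statistical package of choice.

```python
from collections import defaultdict

def siblings_by_parent(records, parent_var):
    """Group co-resident children who share the parent pointed to by parent_var."""
    groups = defaultdict(list)
    for person in records:
        parent_pernum = person[parent_var]
        if parent_pernum == 0:  # 0 = no co-resident parent identified
            continue
        # SERIAL repeats across years, so the household key includes SAMPLE.
        key = (person["SAMPLE"], person["SERIAL"], parent_pernum)
        groups[key].append(person["PERNUM"])
    # Keep only parents with two or more linked children.
    return {key: kids for key, kids in groups.items() if len(kids) >= 2}

# Toy records; SAMPLE values are placeholders, not real IPUMS sample codes.
records = [
    {"SAMPLE": "A", "SERIAL": 1, "PERNUM": 1, "MOMLOC": 0, "POPLOC": 0},  # father
    {"SAMPLE": "A", "SERIAL": 1, "PERNUM": 2, "MOMLOC": 0, "POPLOC": 0},  # mother
    {"SAMPLE": "A", "SERIAL": 1, "PERNUM": 3, "MOMLOC": 2, "POPLOC": 1},
    {"SAMPLE": "A", "SERIAL": 1, "PERNUM": 4, "MOMLOC": 2, "POPLOC": 1},
    {"SAMPLE": "A", "SERIAL": 1, "PERNUM": 5, "MOMLOC": 0, "POPLOC": 1},  # father only
    {"SAMPLE": "B", "SERIAL": 1, "PERNUM": 1, "MOMLOC": 0, "POPLOC": 0},  # mother
    {"SAMPLE": "B", "SERIAL": 1, "PERNUM": 2, "MOMLOC": 1, "POPLOC": 0},  # only child
]

by_mother = siblings_by_parent(records, "MOMLOC")
by_father = siblings_by_parent(records, "POPLOC")
print(by_mother)  # {('A', 1, 2): [3, 4]}
print(by_father)  # {('A', 1, 1): [3, 4, 5]}
```

Grouping on MOMLOC and POPLOC separately covers the "and/or" case: person 5 shares only a father with persons 3 and 4, so they appear together in the father grouping but not in the mother grouping.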
The attach characteristics tool allows users to add variables pertaining to an individual’s mother/father (as well as spouse and household head) as an additional variable on the individual’s record. For example, you can use the tool to add the variables OCCSCORE_MOM and OCCSCORE_POP, which will provide the OCCSCORE value for each person’s mother and father as identified in MOMLOC/POPLOC. This option is available on the Extract Options page after clicking View on your data cart and proceeding to create your extract. Only variables that have already been added to your data cart will appear as options in the attach characteristics tool. OCCSCORE_MOM/OCCSCORE_POP will be missing if there is no corresponding parent record to link to.
Currently, only 10-year links are available through the IPUMS USA extract system. Based on your description of wanting to link 1910 to 1940 directly, I recommend using the 30-year links through the MLP version 1.2 crosswalks. These crosswalks link HISTIDs (a consistent individual-level identifier) across full count samples by assigning linked records the same HIK. This allows you to sequentially merge each separate 1910 and 1940 data extract with the variables that you requested onto the crosswalk to obtain your linked data. For this method, do not utilize the “link census data” feature on the website. Please see this user guide for further details and sample code.
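To make the sequential merge concrete, here is a minimal sketch in plain Python. It is not the sample code from the user guide. The crosswalk column names (`hik`, `histid_1910`, `histid_1940`), the function name `link_extracts`, and the toy values are all assumptions for illustration; please check the actual crosswalk layout before adapting it. OCCSCORE_POP and EDUC stand in for whichever 1910 covariates and 1940 outcome you request.

```python
def link_extracts(crosswalk, extract_1910, extract_1940):
    """Merge two separately downloaded extracts onto a crosswalk via HISTID."""
    by_histid_1910 = {row["HISTID"]: row for row in extract_1910}
    by_histid_1940 = {row["HISTID"]: row for row in extract_1940}
    linked = []
    for link in crosswalk:
        early = by_histid_1910.get(link["histid_1910"])
        late = by_histid_1940.get(link["histid_1940"])
        if early is None or late is None:
            continue  # keep only links found in both extracts
        merged = {"HIK": link["hik"]}
        # Suffix each variable with its census year to avoid name clashes.
        merged.update({f"{k}_1910": v for k, v in early.items() if k != "HISTID"})
        merged.update({f"{k}_1940": v for k, v in late.items() if k != "HISTID"})
        linked.append(merged)
    return linked

# Toy data; identifiers and column names are hypothetical.
crosswalk = [
    {"hik": "H1", "histid_1910": "x1", "histid_1940": "y1"},
    {"hik": "H2", "histid_1910": "x2", "histid_1940": "y9"},  # y9 not in 1940 extract
]
extract_1910 = [
    {"HISTID": "x1", "OCCSCORE_POP": 24},
    {"HISTID": "x2", "OCCSCORE_POP": 14},
]
extract_1940 = [{"HISTID": "y1", "EDUC": 6}]

linked = link_extracts(crosswalk, extract_1910, extract_1940)
print(linked)  # [{'HIK': 'H1', 'OCCSCORE_POP_1910': 24, 'EDUC_1940': 6}]
```

Because each extract is merged on its own HISTID, the 1910 extract can carry all of your covariates while the 1940 extract carries only the outcome variable.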
Will I get the same result if I identify siblings as those whose HISTID_POP and HISTID_MOM in an earlier census year are the same, and further use STEPMOM/STEPPOP to identify “pure” biological siblings? Thanks!
As I understand, you are trying to use the IPUMS Multigenerational Longitudinal Panel (MLP) to link adults in one census to a previous census record when they resided with their parents in order to identify their co-resident siblings. This method seems reasonable, but requires careful consideration. In cases where all members of a group of siblings resided in their parental household in a single census year, the linking method will produce only a subsample of the siblings from the pooling method that I described in the post above. This is because the MLP is unable to link all persons across all of their census records (see the MLP data description for more information). However, for cases where some siblings are linked back to their parental household in one year while others are linked back to the household in a different year, the MLP can help identify additional groups of siblings that pooling alone cannot.
Note that HISTID does not uniquely identify people across the full count samples. Instead, it uniquely identifies each person record, so a person who appears in multiple full count samples will have a different HISTID value for each of their records. HISTID values also remain consistent across releases of the IPUMS historical full count data, allowing researchers to track any edits to individual records across different release versions of the data. The HIK (Historical Identification Key) will be more helpful, as it tracks individuals across full count samples by matching HISTID values of persons linked using the IPUMS Multigenerational Longitudinal Panel (MLP) algorithm. It is possible to use the attach characteristics option to add HIK_MOM/POP to your extract.
As I mentioned, there are two broad cases of sibling links to consider: (a) those where all siblings are linked back to their parental household in a single census year, and (b) those where some siblings are linked back to their parental household in one year while others are linked back to the household in a different year (or not at all):
The first type can be identified using the pooling method described in the previous post. A linked census sample containing the same years as a pooled sample will identify only a subset of the siblings found in the pooled sample. This is because, when requesting linked census data, users are given the option to either “include only those persons linked across censuses” or “include all persons in the household of persons linked across censuses”. In both cases, a household in which no one is linked across the selected census samples is dropped from the extract. This filtering is not applied when requesting unlinked samples.
The MLP can still be helpful if you have a specific target year in which you want to identify siblings or if you want to track siblings over time. For example, if you are looking for all siblings in 1940, you can first identify all siblings who lived with their parents in that year using a single unlinked sample of the 1940 data. Then, you can use a linked census sample to add to those the siblings who can be linked from previous years. When you request a linked census sample (e.g., 1930 and 1940), the data will include two person records for each individual, one for 1930 and one for 1940. Each person’s HIK is automatically provided in the extract and can be used to group pairs of linked person records. By obtaining the linked individual’s 1930 household SERIAL number and MOMLOC/POPLOC values, you can then search for matching combinations of these values across your sample of linked 1940 individuals. This can also be done with a sample of multiple linked decennial censuses (e.g., 1920 and 1940 with 1930 and 1940), as long as all of the siblings in each group link back to their parental household in the same decennial census sample (whether 1920 or 1930 in this example).
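The search described above might look like the following minimal sketch in plain Python. It assumes a linked extract read into dicts with YEAR, HIK, SERIAL, MOMLOC, and POPLOC, and it assumes an unlinked person has an empty HIK; the function name `siblings_via_earlier_household` and the toy values are hypothetical. For simplicity it requires both MOMLOC and POPLOC to match, so it finds only siblings who share both co-resident parents.

```python
from collections import defaultdict

def siblings_via_earlier_household(records, early_year=1930):
    """Group linked persons (by HIK) who shared parents in the earlier census."""
    groups = defaultdict(set)
    for person in records:
        if person["YEAR"] != early_year or not person["HIK"]:
            continue  # use only linked records from the earlier census
        if person["MOMLOC"] == 0 and person["POPLOC"] == 0:
            continue  # no co-resident parent in the earlier household
        key = (person["SERIAL"], person["MOMLOC"], person["POPLOC"])
        groups[key].add(person["HIK"])
    # Each group of two or more HIKs is a set of siblings traceable in 1940.
    return [sorted(hiks) for hiks in groups.values() if len(hiks) >= 2]

# Toy linked extract: two records (1930 and 1940) per linked person.
records = [
    {"YEAR": 1930, "HIK": "a", "SERIAL": 10, "MOMLOC": 2, "POPLOC": 1},
    {"YEAR": 1930, "HIK": "b", "SERIAL": 10, "MOMLOC": 2, "POPLOC": 1},
    {"YEAR": 1930, "HIK": "",  "SERIAL": 10, "MOMLOC": 2, "POPLOC": 1},  # unlinked
    {"YEAR": 1930, "HIK": "c", "SERIAL": 11, "MOMLOC": 2, "POPLOC": 1},  # only child
    {"YEAR": 1940, "HIK": "a", "SERIAL": 70, "MOMLOC": 0, "POPLOC": 0},
    {"YEAR": 1940, "HIK": "b", "SERIAL": 85, "MOMLOC": 0, "POPLOC": 0},
]

sibling_sets = siblings_via_earlier_household(records)
print(sibling_sets)  # [['a', 'b']]
```

Persons "a" and "b" live in different households in 1940, but their shared 1930 household and parent pointers identify them as siblings.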
If some siblings link back to their parental household only in 1930 while others link only to 1920 (or do not link at all), then the MLP can help locate additional groups of siblings that pooling alone cannot by relying on the parental links in HIK_MOM/POP. For example, suppose that you have siblings with a relatively large age gap, where one is 8 years old in 1940 and the other is 18 years old. Additionally, the 18-year-old is living elsewhere and is not enumerated with their parents or siblings in 1940. In this case, the sibling pair will not be easily matched, since the younger sibling, not yet born in 1930, cannot be linked with the older sibling to the parental household in that year. However, with an extract of multiple linked decennial censuses that includes HIK_MOM/POP, you can group linked person records, obtain any HIK_MOM/POP values, and then identify persons with matching values across decennial census samples as siblings. Note that HIK_MOM/POP will only be available if the mother/father is in the household and the parent links to at least one other decennial sample (any of the 1850-1950 samples, regardless of whether they are in your data cart), since those who do not link are not assigned a HIK value.
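A minimal sketch of this parent-HIK matching in plain Python, using the 8-year-old and 18-year-old example. It assumes person records from several linked samples stacked together, with HIK and HIK_MOM attached, and it assumes missing values are empty strings; the function name `siblings_from_parent_hik` and the toy identifiers are hypothetical.

```python
from collections import defaultdict

def siblings_from_parent_hik(records, parent_var="HIK_MOM"):
    """Group children by a parent's HIK, pooling records from all census years."""
    children = defaultdict(set)
    for person in records:
        parent_hik = person.get(parent_var)
        own_hik = person.get("HIK")
        if not parent_hik or not own_hik:
            continue  # parent not co-resident or not linked, or child not linked
        # A set keyed on the child's own HIK collapses repeated records of
        # the same person, so a person is never counted as their own sibling.
        children[parent_hik].add(own_hik)
    return {p: sorted(kids) for p, kids in children.items() if len(kids) >= 2}

# Toy records; identifiers are made up.
records = [
    {"YEAR": 1930, "HIK": "c1", "HIK_MOM": "m1"},  # older sibling, age 8, at home
    {"YEAR": 1940, "HIK": "c1", "HIK_MOM": ""},    # older sibling, age 18, elsewhere
    {"YEAR": 1940, "HIK": "c2", "HIK_MOM": "m1"},  # younger sibling, age 8, at home
    {"YEAR": 1940, "HIK": "c3", "HIK_MOM": "m2"},  # unrelated only child
]

siblings = siblings_from_parent_hik(records)
print(siblings)  # {'m1': ['c1', 'c2']}
```

The same function can be run with `parent_var="HIK_POP"` to match on fathers instead.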
When using linked data to identify siblings, it will be important to distinguish matches that are the same linked person from matches that are that person’s siblings (the individual HIK values should help). Finally, all linking methods are affected by the representativeness of who is linked by the MLP. See the MLP data description page section on representativeness for more details.