IPUMS “rectangularized” data is organized so that household information is appended to each person’s row within that household. It is also possible to download the data in a “hierarchical” format, where each household is on its own line and individuals are listed beneath the household they belong to. For your purposes I would recommend the default, rectangularized data, as it is more flexible to work with. For example, you can create a new variable that is coded as 1 if the individual is an “other non-relative”. You could then generate a second variable that counts the number of “other non-relatives” within each household and assigns that count to every member of the household, with households identified by YEAR and SERIAL (and DATANUM if using multiple samples from a single year). Then, because all of the household information, including the new count of other non-relatives, is stored on each household member’s row, you can keep just one person to represent the household in a household-level analysis. The Stata code would look something like this:
. keep if gq==1
. gen roomer = 1 if relate==12
. sort year datanum serial
. egen nroomer = total(roomer==1), by(year datanum serial)
. keep if pernum==1
The first line (keep if gq==1) drops group quarters from the data set, so that only households remain. The last line (keep if pernum==1) keeps only the first person from every household, after which you can apply the household weight (HHWT) in further analyses.
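If it helps, here is a rough sketch of what a weighted household-level summary could look like once only one person per household remains. The variable name anyroomer is a placeholder I made up, and using aweights with tab is my assumption about what suits your analysis, so adjust as needed:
. * anyroomer is a hypothetical name; the weight types used here are assumptions
. gen byte anyroomer = nroomer > 0
. tab year anyroomer [aw=hhwt], row
. mean nroomer [pw=hhwt], over(year)
The tab line reports the weighted share of households in each year with at least one other non-relative, and the mean line reports the weighted average number of other non-relatives per household by year.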
I think your choice of OCC1950 is appropriate for determining the working status of children. However, a few codes within the range you defined as “Working” may actually represent work within the household, such as “100: Farmers (owners and tenants)” and “830: Farm laborers, unpaid family workers”, who are probably working on the family farm. Also, though it doesn’t look like it will affect your analysis since you are looking at children ages 10-15, it is important to note that the universe for OCC1950 changes over the period you are interested in. I hope this helps.