# Counting observations per year, people and households

Hi Folks!
I have an all-ASEC sample and I am trying to just count observations per year, of households and of people, and I am getting some results I don’t understand. The household numbers seem sensible – starting around 25 or 30 thousand in the sixties and gradually rising to about 100 thousand currently, with big jumps around '66, '76, and 2001.

But the number of observations of people is weird – nothing resembling a stable relationship between number of people and number of households. You see an enormous rise from about two people per household in the early 60s to almost four in 1980, and then there is a very regular pattern of bouncing back and forth by almost a full person in alternate years, but with a slow overall decline until around 2017, when it suddenly jumps up by, like, two full people. These are unweighted observations, so I don’t expect them to accurately reflect the true population ratio, but still, why the alternate-year pattern and the sudden recent jump? I would assume that this is a programming error of mine, except that I calculated it two different ways: by just counting lines, and by summing the maximum PERNUM in each household. Is there an alternate-year oversample or something?

I am not exactly sure how you are defining your sample, but if you are just looking at the raw counts of households and persons, the values you calculate should match the counts reported on this page of our website. If your extract is rectangular, the easiest way to get the total number of person records per ASEC is to count rows by year.
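For anyone wanting to reproduce the two counting methods mentioned in this thread (rows per year, and the sum of the largest PERNUM in each household), here is a minimal Python sketch. It runs on made-up toy rows rather than a real extract, and it assumes that a household can be identified by the pair (YEAR, SERIAL). That assumption only holds if each year contains a single sample; if several samples share a year, the sample identifier would need to be part of the key too.

```python
from collections import Counter, defaultdict

# Toy stand-in for a rectangular extract: one tuple per person record.
# Columns are (YEAR, SERIAL, PERNUM); the values are invented for illustration.
rows = [
    (1980, 1, 1), (1980, 1, 2), (1980, 2, 1),
    (1981, 1, 1), (1981, 2, 1), (1981, 2, 2), (1981, 2, 3),
]

# Method 1: person records per year = number of rows per year.
persons_by_rows = Counter(year for year, _serial, _pernum in rows)

# Largest PERNUM seen in each (YEAR, SERIAL) household.
# Assumption: (YEAR, SERIAL) identifies one household, i.e. one sample per year.
max_pernum = defaultdict(int)
for year, serial, pernum in rows:
    key = (year, serial)
    max_pernum[key] = max(max_pernum[key], pernum)

# Households per year = number of distinct (YEAR, SERIAL) pairs.
households = Counter(year for year, _serial in max_pernum)

# Method 2: person records per year = sum of each household's largest PERNUM.
persons_by_pernum = Counter()
for (year, _serial), top in max_pernum.items():
    persons_by_pernum[year] += top

for year in sorted(households):
    print(year, households[year], persons_by_rows[year], persons_by_pernum[year])
```

If the two person counts disagree for a year, or the persons-per-household ratio swings from year to year, that is a hint that the rows for that year come from more than one sample.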

My results from counting lines by year and then summing are very different from this, and I do not know why. All the samples I have checked have 25,404,662 lines.

The only thing that I have been able to think of is that maybe it is a product of sample selection. All my samples include the IPUMS-CPS automatically selected default samples, and also three Supplement Topics: Education, Fertility and Marriage, and Voter.

Does the Latino oversample increase the number of lines (above the numbers given on the web page you reference)?
Does the use of the IPUMS-CPS default samples automatically added from months other than March increase the number of lines?
Does the addition of Supplement Topics to the sample increase the number of lines?

If not, do you have any other suggestions as to why my results might be different, or diagnostic tests you would recommend?

If the difference is the result of sample choices such as those above, would I be correct in assuming that the proper response is to make them go away with some merger procedure? None of these other samples involve the addition of any more households or people, correct? Though I suppose there could be persons in sampled households who are not present in either March. Do you have code for such a merger that is already available, or would I write it myself based on Drew, Flood and Warren?

Okay, I think I may know what is happening here – although I am not 100% certain. Originally, it sounded like you were only looking at the ASEC samples. However, this follow-up question makes it sound like you are also including the basic monthly samples, along with some of the specific monthly samples that carry supplement topics. Note that if you include both the ASEC and basic monthly samples in the same IPUMS extract, then summing the number of observations by year will not match the figures listed on the IPUMS CPS sample sizes page, since that page reports sample sizes for the ASEC and basic monthly samples separately.

Let me now address each of your questions one at a time: (1) No, the Latino oversample does not change the sample size referenced on the CPS sample sizes page. (2) When you include multiple samples in one data extract, the data are appended together. So adding sample A with X observations to sample B with Y observations will result in a data set with X + Y observations. (3) The same idea goes for the supplement topics, since these are just additional variables available in specific basic monthly samples; selecting them adds those monthly samples to the extract. Finally, I do not think this discrepancy is due to any merging that needs to happen. If the problem persists, feel free to email ipums@umn.edu directly with additional detail about this issue (the specific extract number or the samples you are including).
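To make the appending point concrete, here is a small Python sketch on invented records. It assumes the extract carries the IPUMS CPS variables YEAR, MONTH and ASECFLAG, and that ASECFLAG equals 1 on ASEC records; please check the codebook for your own extract before relying on that coding. Representing a missing ASECFLAG as `None` on the non-March records is also just a convention of this toy example.

```python
from collections import Counter

# Invented person records for one ASEC sample (3 rows) ...
asec = [
    {"YEAR": 2001, "MONTH": 3, "ASECFLAG": 1, "SERIAL": s, "PERNUM": p}
    for s, p in [(1, 1), (1, 2), (2, 1)]
]
# ... and for one basic monthly sample from the same year (4 rows).
# ASECFLAG is set to None here to stand in for "not applicable" (an assumption).
basic = [
    {"YEAR": 2001, "MONTH": 11, "ASECFLAG": None, "SERIAL": s, "PERNUM": p}
    for s, p in [(1, 1), (2, 1), (2, 2), (3, 1)]
]

# Putting both samples in one extract appends them: X + Y rows in total.
extract = asec + basic

# Counting rows by YEAR alone mixes the two samples together.
rows_by_year = Counter(r["YEAR"] for r in extract)

# Counting by (YEAR, MONTH, ASECFLAG) shows which samples are present.
rows_by_sample = Counter((r["YEAR"], r["MONTH"], r["ASECFLAG"]) for r in extract)

# Keeping only ASEC records recovers the ASEC person count for the year.
asec_rows_by_year = Counter(r["YEAR"] for r in extract if r["ASECFLAG"] == 1)

print(len(extract), rows_by_year[2001], asec_rows_by_year[2001])
for sample, n in sorted(rows_by_sample.items(), key=lambda kv: kv[0][1]):
    print(sample, n)
```

Tabulating by sample first is a cheap diagnostic: if more than one (YEAR, MONTH, ASECFLAG) combination shows up for a year, the per-year row count is the sum over all of those samples and will not match the ASEC figure on the sample sizes page.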

My second posting on [this thread] (People, samples and lines - #3 by AHoerner) is intended as a follow-up question for both my first posting there and this posting. I am sorry if there is some redundancy in my questions here and there. It’s because there is also considerable redundancy in my various confusions.