# Why is there a difference in the number of strata and PSUs between NHIS data and NHIS-IPUMS data?

I’m trying to analyze nine years of NHIS data (2008-16). I downloaded the files from the NHIS website, then merged and appended them. I also downloaded NHIS-IPUMS data for the same years so that I could compare the results (I was trying to create an exercise for my students). The unweighted frequencies match (see the output for sex below). However, when I ran the weighted analysis in Stata, the results were different. The NHIS data give me 300 strata and 600 PSUs, whereas the NHIS-IPUMS data give me 352 strata and 1,255 PSUs. The number of observations, the population size, and the design df are all different. I used the WTFA weight in the NHIS analysis and PERWEIGHT in the NHIS-IPUMS analysis. The obs option also gives me different frequency distributions. Can anyone see what I’m doing wrong here?

NHIS
. tab sex

HHC.110_00. |
000: Sex | Freq. Percent Cum.
------------+-----------------------------------
1 Male | 309,782 47.36 47.36
2 Female | 344,327 52.64 100.00
------------+-----------------------------------
Total | 654,109 100.00

. svy linearized : tabulate sex, obs percent format(%9.3g) miss
(running tabulate on estimation sample)

Number of strata = 300 Number of obs = 579,934
Number of PSUs = 600 Population size = 1,867,950,988
Design df = 300

HHC.110_0 |
0.000: |
Sex | percentage obs
----------+----------------------
1 Male | 48.3 274487
2 Female | 51.7 305447
|
Total | 100 579934

Key: percentage = cell percentage
obs = number of observations

NHIS IPUMS
. tab sex, miss

Sex | Freq. Percent Cum.
------------+-----------------------------------
1. 1.Male | 309,782 47.36 47.36
2. 2.Female | 344,327 52.64 100.00
------------+-----------------------------------
Total | 654,109 100.00

. svy linearized : tabulate sex, obs percent format(%9.3g)
(running tabulate on estimation sample)

Number of strata = 352 Number of obs = 654,109
Number of PSUs = 1,255 Population size = 2,113,089,258
Design df = 903

Sex | percentage obs
----------+----------------------
1. 1.Mal | 48.3 309782
2. 2.Fem | 51.7 344327
|
Total | 100 654109

Key: percentage = cell percentage
obs = number of observations

I have a note of clarification and an observation. First, note that the IPUMS NHIS variables STRATA and PSU are both constructed by IPUMS and are not necessarily the same as the public-use sample design variables released by NCHS. The variables released by NCHS are more accurately understood as pseudo-strata and pseudo-PSU variables. The IPUMS-derived STRATA and PSU variables are designed so that they can be used when examining data from a single year or from many years at once. With this in mind, you may not be doing anything wrong. More information on variance estimation in IPUMS NHIS is available here. Second, I pulled your data extract with samples from 2008 through 2016. It shows a total of 880,195 observations, not 654,109 as the output above suggests. I am not sure what is causing this discrepancy, but it may be something you want to look into.
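The NHIS output above reports 579,934 observations in the estimation sample against 654,109 in the unweighted table, and Stata's `svy` commands drop records that are missing a weight, stratum, or PSU value. The sketch below is one way to check for that, not a definitive diagnosis. The variable names `srvy_yr`, `strat_p`, `psu_p`, and `wtfa` are assumptions about what the appended NCHS file contains; the design variables may be named differently in some years, so substitute your own names.

```stata
* Hypothetical variable names: replace srvy_yr, strat_p, psu_p and wtfa
* with the names actually present in your appended NCHS file.

* Flag records that svy would exclude because a design variable
* or the weight is missing (missing() returns 1 if any argument is missing)
generate byte design_missing = missing(strat_p, psu_p, wtfa)

* Tabulate the flag by survey year to see whether the excluded
* records are concentrated in particular years
tabulate srvy_yr design_missing, missing

* Assumption: all nine years are retained. Dividing the weight by the
* number of pooled years makes the population size an annual average
* rather than a nine-year sum; percentages are unaffected.
generate double wtfa_pooled = wtfa / 9

* Declare the design and rerun the weighted table
svyset psu_p [pweight = wtfa_pooled], strata(strat_p)
svy linearized: tabulate sex, obs percent format(%9.3g)
```

If the flagged records cluster in one survey year, the design variables for that year may not have lined up when the files were appended. That would also change the strata and PSU counts that Stata reports.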

The sample is restricted to adults aged 18 and older.