I adopted a two-stage sampling strategy for a study of patient medical expenses. Instead of visiting households, we decided to select health institutions in the first stage and, in the second stage, 50 patients in each institution selected in the first stage. We can evaluate the inclusion probability for the primary sampling units, but we are unsure how to do so for the patients. On the day of the team’s visit, the number of patients attending the institution may be less than the number of patients to be surveyed per institution (50 patients); in that case, a second day of visits is necessary to reach the target number. We also do not know whether each patient corresponds to a unique household. In this situation, how do we determine the inclusion probability of a patient? Note that if the institution does not have much attendance on the day of the visit, we survey all the patients present. Otherwise, we draw the 50 patients from the consultation register using systematic sampling.

We also know the number of patients who made their first visit per month and per year (for the year prior to the study) and the total number of visits. We also want to derive the sampling weight of the household from that of the patient. Is it possible to determine the survey weight for the household? Or should we not analyze these data using weights at all?

This type of question is outside the scope of the IPUMS user support team. Most surveys that IPUMS harmonizes have a sampling frame that includes the civilian noninstitutionalized population for a given country/year. In the case you outline, however, people only enter the sample if they were a patient at one of the health institutions included in your first stage. Households where no one visited a health institution during the survey period will have zero inclusion probability. You might therefore use weights to make inferences that are representative of the national patient population (rather than the total national population). You should consult the current methods being used by others in the field to adjust for this. Someone else on the forum might be able to provide more guidance as well.
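As a rough illustration only, the patient-level inclusion probability in a two-stage design of this kind is often written as the product of the institution's inclusion probability and the within-institution sampling fraction. The Python sketch below rests on assumptions that are not confirmed in this thread: that the systematic draw gives every register entry in the visit window an equal chance of selection, that each patient appears only once in that window, and that the first-stage probability is already known. The function names and the `n_register` argument are hypothetical. It does not address repeat visits or the patient-to-household link, both of which would need their own adjustment.

```python
def patient_inclusion_probability(pi_institution, n_register, n_target=50):
    """Approximate inclusion probability of one patient.

    pi_institution: first-stage inclusion probability of the institution
                    (assumed known from the first-stage design).
    n_register:     number of eligible patients in the consultation register
                    over the visit window (hypothetical input).
    n_target:       number of patients to sample per institution (50 here).
    """
    if not 0 < pi_institution <= 1:
        raise ValueError("pi_institution must be in (0, 1]")
    if n_register <= 0:
        raise ValueError("n_register must be positive")
    # If attendance is at or below the target, all patients are surveyed,
    # so the within-institution probability is 1.
    n_sampled = min(n_target, n_register)
    pi_within = n_sampled / n_register
    return pi_institution * pi_within


def design_weight(pi_institution, n_register, n_target=50):
    """Base design weight: inverse of the patient inclusion probability."""
    return 1.0 / patient_inclusion_probability(pi_institution, n_register, n_target)


# Busy institution: 50 of 200 registered patients are drawn.
busy = patient_inclusion_probability(0.1, 200)
# Quiet institution: all 30 patients present are surveyed.
quiet = patient_inclusion_probability(0.2, 30)
print(busy, quiet, design_weight(0.2, 30))
```

Weights built this way would support statements about the patient population served by the sampled institutions over the visit window, not about households in general.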

If you are interested in using US health data that is nationally representative, IPUMS MEPS provides longitudinal data from 1996 to the present on health status, medical conditions, healthcare utilization, prescribed medicines, and healthcare expenditures.