Analysis approach to find associations between disease outcomes within specific demographic groups

Hi all,

Please excuse my naivety as I am not sure how to best formulate my question.

Here is a toy problem demonstrating what I would like to accomplish:
Imagine we have four groups: male and female adults without hypertension, and male and female adults with hypertension.

Is there an analysis that can (1) sample from the “without” groups to create cohorts matched on all variables of interest except hypertension, and then (2) estimate associations between various other variables (some acting as covariates of no interest) and the emergence of hypertension in the “with” groups? I am picturing something like a survival analysis but without a time variable, given that IPUMS data are cross-sectional without repeated measures. References to relevant papers would also be appreciated.


What you are describing is a matching analysis, of which there are several varieties. Matching on exact combinations of variables, as you describe, is known as “exact matching”. Other common methods are propensity score matching, full matching, and coarsened exact matching.

Exact matching is equivalent to running a linear regression with “hypertension” as the outcome variable and a dummy variable for every combination of covariates, as explained in this Stata Blog post. However, this requires all covariates to be discrete, and it generally requires a very large sample to ensure a sufficient number of matches. The other matching methods are more flexible about what counts as a match and therefore have less stringent data requirements.

Alternatives to matching for this type of analysis are logistic, probit, and linear probability models, all of which are linear index models. Plenty of freely available resources online cover all of these methods. I hope that helps.
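To make the exact-matching idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy records, the variable names (`sex`, `age_band`, `smoker`), the hypothetical helper functions `exact_match` and `prevalence`, and the choice of 1:1 matching with random sampling inside each stratum. It is not taken from the Stata Blog post or from IPUMS, and for real work a dedicated matching package would be the safer route.

```python
import random
from collections import defaultdict


def exact_match(records, match_keys, outcome="hypertension", seed=0):
    """1:1 exact matching: pair people with and without the outcome
    who share the same values on every variable in match_keys."""
    rng = random.Random(seed)

    # Group records into strata defined by the exact covariate combination.
    strata = defaultdict(lambda: {"with": [], "without": []})
    for rec in records:
        key = tuple(rec[k] for k in match_keys)
        strata[key]["with" if rec[outcome] else "without"].append(rec)

    matched_with, matched_without = [], []
    for key in sorted(strata):
        groups = strata[key]
        # Strata lacking either group yield no matches and are dropped.
        n = min(len(groups["with"]), len(groups["without"]))
        if n == 0:
            continue
        matched_with.extend(rng.sample(groups["with"], n))
        matched_without.extend(rng.sample(groups["without"], n))
    return matched_with, matched_without


def prevalence(rows, var):
    """Share of rows where a 0/1 variable equals 1."""
    return sum(r[var] for r in rows) / len(rows)


def make(sex, age_band, hypertension, smoker):
    # Hypothetical record layout, for illustration only.
    return {"sex": sex, "age_band": age_band,
            "hypertension": hypertension, "smoker": smoker}


records = [
    make("F", "40-49", 1, 1), make("F", "40-49", 1, 1),
    make("F", "40-49", 0, 0), make("F", "40-49", 0, 0), make("F", "40-49", 0, 1),
    make("M", "40-49", 1, 1), make("M", "40-49", 0, 0),
    make("M", "50-59", 1, 0), make("M", "50-59", 1, 1),  # no match available
]

cases, controls = exact_match(records, ["sex", "age_band"])
print("matched pairs:", len(cases))
print("smoking prevalence, with hypertension:", prevalence(cases, "smoker"))
print("smoking prevalence, matched without:", prevalence(controls, "smoker"))
```

Note how the males aged 50-59 drop out entirely because nobody without hypertension shares their covariate values. With more matching variables this happens more often, which is the sample-size problem mentioned above and the reason the more flexible matching methods exist.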