How to set up a "time" variable for a Cox PH model to examine correlations to a diagnosis

Hi all – thanks so much for the great forum and data. Novice question here.

I am trying to recreate the analyses shown here: Heterogeneity among women with stroke: health, demographic and healthcare utilization differentials - PMC


Second, the correlation between healthcare utilization, demographic/lifestyle traits and health status and prevalence of stroke was calculated for the young and old cohorts of women using Cox proportional hazards regression on a match cohort of young and old women with and without stroke. Matching was performed using propensity scores derived from age, race, region of residence, education, household size, and marital status with replacement.

I am seeking to do a very similar analysis with a different subsetted population. I’ve performed the propensity score matching in R using the MatchIt package, e.g.,

Hisp.Matched <- matchit(Match ~ AGE + Black + REGION + FAMSIZE + Married, data = Hisp_Match2, method = "nearest", ratio = 1, replace = TRUE)

Here, Hisp_Match2 is the subsetted data free of NAs, and “Match” is the column separating the individuals WITH the diagnosis from those WITHOUT the diagnosis. I am unsure of how to proceed without a time variable, though.

Thanks for any and all help!

The NHIS data used in that paper are cross-sectional, so there are no repeated observations of the same individual over time. I’m not familiar with the use of duration models like the Cox model without repeated measures on the same individuals. The authors may be creating a synthetic panel by looking at people in the dataset born in a particular year. I suggest reaching out to the authors for more clarification.
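For what it's worth, one workaround sometimes used with cross-sectional data is to give every individual the same constant "time at risk", in which case the exponentiated Cox coefficient (with Breslow handling of ties) is read as a prevalence ratio rather than a hazard ratio. I do not know whether the authors did this, so treat the following as a minimal sketch under that assumption, not as their method. The data frame `toy` and the covariate `Smoker` are hypothetical stand-ins; only `Match` comes from the question above.

```r
library(survival)

# Hypothetical toy data standing in for the matched sample.
set.seed(1)
n <- 200
toy <- data.frame(
  Match  = rbinom(n, 1, 0.5),  # 1 = has the diagnosis, 0 = does not
  Smoker = rbinom(n, 1, 0.3)   # hypothetical binary covariate
)

# Assumption: no real follow-up time exists, so assign the same
# constant "time at risk" to every individual.
toy$time <- 1

# Cox model with constant time; Breslow ties so that exp(coef)
# corresponds to a prevalence ratio for the covariate.
fit <- coxph(Surv(time, Match) ~ Smoker, data = toy, ties = "breslow")

# Estimated prevalence ratio for Smoker = 1 versus Smoker = 0.
exp(coef(fit))
```

With matching done with replacement, the matching weights would also need to be carried into the model (for example through the `weights` argument of `coxph`), and the standard errors from a constant-time fit like this are usually treated with caution.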


The insight is welcomed! That was on my mind to do at this point – thanks for the suggestion.