State-level time-series analysis using CPS data


I am interested in using monthly CPS data on demographic and economic indicators in a state-level time-series analysis. This project will combine data from multiple sources, all with state-level variables measured monthly (e.g., monthly unemployment, monthly incarceration rate, monthly crime rate). The CPS variables I am interested in are at the person level (age, race, marital status, educational attainment, labor force status, etc.). I am only interested in using data from a single state, and would like to compute monthly summary statistics for these variables (e.g., median age in the state, percent of residents who are male, percent of residents who are white, etc.). My plan is to recode the data into binary indicators and average all of the responses for a given month to get the proportion of residents with a given characteristic. I see that the CPS data are supposed to be representative at the state level, but I am curious whether I need to weight these data to ensure my monthly estimates are not biased.

Thank you in advance for any advice!

Yes, you will need to weight the data. For person-level analysis, the basic weight is WTFINL. This is used for all variables from the basic monthly CPS (also known as “core” variables in the IPUMS CPS variable selection interface). If you are analyzing ASEC (aka March supplement) samples, you will want to use ASECWT. Similarly, each supplement to the CPS has its own set of weights, for example EDSUPPWT. You can find the appropriate weights in the variable group page for your supplement of interest (e.g. this page for the education supplement). There are also household-level versions of many of these weights.
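
As a rough illustration of what weighting means for the monthly summaries described in the question, here is a minimal sketch in plain Python. The records, the field layout, and the helper names `weighted_mean` and `weighted_median` are all made up for this example; only the idea of using WTFINL as the person weight comes from the answer above. A weighted proportion is the weighted mean of a 0/1 indicator, and a weighted median is the value at which the cumulative weight reaches half of the total.

```python
from collections import defaultdict

# Hypothetical person-level records for one state:
# (year, month, age, is_male, wtfinl). Values are invented for illustration.
records = [
    (2020, 1, 34, 1, 1500.0),
    (2020, 1, 52, 0, 2500.0),
    (2020, 1, 41, 1, 1000.0),
    (2020, 2, 29, 0, 3000.0),
    (2020, 2, 63, 1, 1000.0),
]


def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]


# Group the person records by (year, month).
by_month = defaultdict(list)
for year, month, age, is_male, wt in records:
    by_month[(year, month)].append((age, is_male, wt))

# One row of weighted summary statistics per month.
monthly = {}
for key, rows in sorted(by_month.items()):
    ages = [r[0] for r in rows]
    males = [r[1] for r in rows]
    wts = [r[2] for r in rows]
    monthly[key] = {
        "pct_male": weighted_mean(males, wts),
        "median_age": weighted_median(ages, wts),
    }

for key, stats in monthly.items():
    print(key, stats)
```

An unweighted average of the same indicator would treat every respondent as representing the same number of people, which is the source of bias the question asks about.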

Thank you, Matthew. I appreciate your prompt response. I have the WTFINL variable to use, but will likely need to weight the CPS variables manually, since other covariates in my model come from different data (so I cannot use pweight in a regression). When weighting manually, is it acceptable to rescale the weights before multiplying them by the variable values, as long as the same rescaling is applied to every weight? For example, WTFINL ranges from 0 to 15538.19 in my data. Since I have dichotomous indicators, multiplying a 0/1 value by 15538.19 would yield uninterpretable values. Can I divide all WTFINL values by some constant (let’s say the max value of WTFINL) and then manually weight the variables of interest?

Thank you for your time and I appreciate your help on this!

Since you are doing state-level analysis, what you want to do is use the person-level weights when calculating state-level indicators from the microdata (for example, the incarceration rate). Then merge these indicators with the other state-level data that you have. The regression itself will not use microdata at all, so you do not need to weight its observations. However, if you are interested in getting a nationally representative estimate (as opposed to an average across states), you should weight each state-level observation by that state's population.
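
On the rescaling question, a weighted proportion divides by the sum of the weights, so dividing every weight by the same positive constant cancels out and leaves the estimate unchanged. The sketch below checks this on invented numbers and then shows the merge step described above. All values, the helper `weighted_share`, and the dictionary layout keyed by (year, month) are assumptions for illustration, not part of any IPUMS tool.

```python
import math


def weighted_share(flags, weights):
    """Weighted proportion of a 0/1 indicator."""
    return sum(f * w for f, w in zip(flags, weights)) / sum(weights)


# Hypothetical indicator and WTFINL values for one state-month.
flags = [1, 0, 1, 0]
wtfinl = [0.0, 3884.5475, 7769.095, 15538.19]

# Rescale every weight by the same constant (here, the maximum weight).
scaled = [w / max(wtfinl) for w in wtfinl]

share_raw = weighted_share(flags, wtfinl)
share_scaled = weighted_share(flags, scaled)
print(math.isclose(share_raw, share_scaled))  # the constant cancels out

# Weighted monthly indicators computed from the microdata (made-up values),
# and monthly state-level series from another source (also made up).
cps_monthly = {(2020, 1): {"pct_male": 0.49}, (2020, 2): {"pct_male": 0.50}}
other_monthly = {(2020, 1): {"unemp_rate": 3.5}, (2020, 2): {"unemp_rate": 3.6}}

# Merge on (year, month), keeping months present in both sources.
panel = [
    {"year": y, "month": m, **cps_monthly[(y, m)], **other_monthly[(y, m)]}
    for (y, m) in sorted(cps_monthly.keys() & other_monthly.keys())
]
for row in panel:
    print(row)
```

The merged rows are already state-month aggregates, so a regression on them needs no person weights. Note that simply multiplying each 0/1 value by its weight without dividing by the sum of the weights gives a weighted count, not a proportion.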

Thank you for clarifying!