Merging basic monthly CPS files to create gross flows


I'm trying to merge the basic monthly CPS files from 1994 to 2014 to create gross flows of workers. I have three questions about the merging procedure and the analysis of the data.

1. My first question is:

I read in Drew et al. (2013) that new variables, CPSIDH and CPSIDP, would be made available at IPUMS to facilitate the merging. However, I cannot find these variables. Are they available, or do I have to construct them myself based on the paper? I also read here (…) that to link individuals I should use the identifiers HRHHID, HRSAMPLE, HRSERSUF, and HRHHNUM (for files from January 1994 to April 2004) and HRHHID1 and HRHHID2 (May 2004 to present), and add the variables PULINENO, PUSEX, and PRTAGE to both samples. I'm confused about which method to use. What do you recommend?

2. My second question is:

I also read that longitudinal weights would be available for flow data, but I cannot find them. Are they already available? If not, should I use WTFINL to weight people matched across months? And if so, how should I do it: do I have to add up the final weights for each month considered and then divide by the number of months?

3. My third question is:

To analyze the data in Stata, I have to svyset my data file. There are clear instructions on the web for doing this with ASEC data, but there is no information for the basic monthly data, and I do not know what to specify for strata, PSUs, etc. I read the CPS sample design documentation: it is a state-based sample, and the PSUs are counties… Does that mean I have to svyset my data with WTFINL as the weight, state as the PSU, no strata, and vce(linearized) (Taylor linearization)? I also found the variables CLUSTER and STRATA for variance estimation… The way I svyset my data matters for getting the variances and confidence intervals right (point estimates are not a problem), so how should I svyset my data correctly when working with the basic monthly files?

Thank you very much for your answer.



The CPS linking variables are currently available for all basic monthly samples. You can find the household linking key CPSID here and the person linking key CPSIDP here. These variables are adequate for linking across basic monthly samples. See this answer for background on the inherent complexity of creating these linking keys and its implications for researchers.
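To make the idea of a person linking key concrete, here is a minimal Python sketch (standard library only) of what a one-to-one link on CPSIDP does. It assumes each month has already been loaded as a list of dicts with lowercase field names such as `cpsidp` and `empstat`; those field names and the helper `link_one_to_one` are hypothetical illustrations, not part of any IPUMS tool. Like Stata's `merge 1:1`, it refuses to link when the key is not unique within a month.

```python
from collections import Counter


def link_one_to_one(month_a, month_b, key="cpsidp"):
    """Return (record_a, record_b) pairs that share the same person key.

    Mirrors the requirement of a 1:1 merge: the key must identify at most
    one record in each month, otherwise the link is ambiguous and we stop.
    """
    for label, records in (("month_a", month_a), ("month_b", month_b)):
        counts = Counter(r[key] for r in records)
        dupes = sorted(k for k, n in counts.items() if n > 1)
        if dupes:
            raise ValueError(f"{key} is not unique in {label}: {dupes[:5]}")

    # Index the second month by key so each lookup is constant time.
    by_key = {r[key]: r for r in month_b}
    # Inner join: keep only people observed in both months.
    return [(a, by_key[a[key]]) for a in month_a if a[key] in by_key]


jan = [{"cpsidp": 101, "empstat": "E"}, {"cpsidp": 102, "empstat": "U"}]
mar = [{"cpsidp": 102, "empstat": "E"}, {"cpsidp": 103, "empstat": "N"}]
pairs = link_one_to_one(jan, mar)
print(len(pairs))  # 1
```

If the uniqueness check fails on real data, inspecting which key values repeat is a reasonable first step before attempting the link.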

Longitudinal weights are not currently available in the IPUMS-CPS dataset. In the meantime, the NBER basic monthly files do contain longitudinal weights for samples from 1989 to present. Because the NBER and IPUMS-CPS files share a common sort order, it is possible to sequentially merge the raw CPS data files from NBER onto an IPUMS-CPS extract (as long as you have not used the select cases option).
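Once people are linked, gross flows are weighted counts of (status in the first month, status in the later month) pairs. The Python sketch below assumes each linked record carries a labor force status already recoded to "E", "U", or "N" (a hypothetical recode of EMPSTAT stored in a field named `lfs`) and a weight in a hypothetical field `lnkwt`. Which weight belongs there (for example, a longitudinal weight merged from the NBER files, or the first month's WTFINL) is an analytic choice this code does not settle.

```python
STATES = ("E", "U", "N")  # hypothetical recode: employed, unemployed, not in labor force


def gross_flows(pairs, status="lfs", weight="lnkwt"):
    """Weighted flow matrix: flows[s0][s1] sums weights of people moving s0 -> s1.

    Assumption: the weight is read from the earlier month's record of each pair.
    """
    flows = {s0: {s1: 0.0 for s1 in STATES} for s0 in STATES}
    for before, after in pairs:
        flows[before[status]][after[status]] += before[weight]
    return flows


def transition_rates(flows):
    """Row-normalize the flows into the share of each origin state moving to each destination."""
    rates = {}
    for s0, row in flows.items():
        total = sum(row.values())
        rates[s0] = {s1: (v / total if total else 0.0) for s1, v in row.items()}
    return rates


pairs = [
    ({"lfs": "U", "lnkwt": 100.0}, {"lfs": "E"}),
    ({"lfs": "U", "lnkwt": 300.0}, {"lfs": "U"}),
    ({"lfs": "E", "lnkwt": 50.0}, {"lfs": "E"}),
]
flows = gross_flows(pairs)
rates = transition_rates(flows)
print(flows["U"]["E"], rates["U"]["E"])  # 100.0 0.25
```

Here the weighted U-to-E flow is 100, and because 400 units of weight started in U, the estimated U-to-E transition rate is 0.25.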

There are no strata or cluster variables in the CPS public-use files, because the Census Bureau censors them from the data. However, the WTFINL variable does account for both strata and PSUs. Therefore, if you are using svyset in Stata, the WTFINL variable is sufficient. While IPUMS-CPS does not endorse any specific "best practice" method for generating variance estimates, we recommend that users explore current literature on estimating variance in the Current Population Survey. For example, this article describes a procedure for generating synthetic design variables to use in place of the censored variables.

Hope this helps.



Hi Tim,

Thanks so much for your reply; it was very useful. However, I have a question about linking people month to month in the CPS. I'm not sure how to do it in Stata. Is it enough to do a 1:1 merge on the person linking variable CPSIDP, or do I still have to use other variables to identify people? When I ran merge 1:1 on CPSIDP, Stata replied that the match was not possible. When I added more variables in addition to CPSIDP, such as race and sex, Stata gave the same answer. I'm a bit stuck and would appreciate any advice on linking monthly CPS files in Stata. I'm interested in generating unemployment flows from this information. Thank you very much.



I am unable to replicate your issue. Using your data extracts #28 and #26, the following Stata code successfully merges 1:1.

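```stata
* Sort the using file (extract #28) on the person linking key and save it
use cps_00028.dta, clear
sort cpsidp
save cps_00028.dta, replace
* Load the master file (extract #26) and link persons one-to-one on CPSIDP
use cps_00026.dta, clear
sort cpsidp
merge 1:1 cpsidp using cps_00028.dta
```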
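Stata's `merge` reports how many records were matched, found only in the master file, or found only in the using file (the `_merge` values 3, 1, and 2). A rough Python equivalent, useful for sanity-checking a match rate, is sketched below. The function name `merge_tally` and the choice to compute the rate against the master file are illustrative assumptions, and the sketch assumes keys are already unique, as `merge 1:1` requires.

```python
def merge_tally(master_keys, using_keys):
    """Count key overlap using Stata's _merge codes: 1 master only, 2 using only, 3 matched."""
    master, using = set(master_keys), set(using_keys)
    return {
        1: len(master - using),  # in master only
        2: len(using - master),  # in using only
        3: len(master & using),  # matched in both
    }


tally = merge_tally([101, 102, 104], [102, 103, 104])
# Share of master records that found a partner in the using file.
match_rate = tally[3] / (tally[1] + tally[3])
print(tally, round(match_rate, 3))  # {1: 1, 2: 1, 3: 2} 0.667
```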

Since this matches January 2011 to March 2011 (two months apart), we would expect fewer than 50% of the cases to match: under the CPS 4-8-4 rotation pattern, only the January rotation groups in their 1st, 2nd, 5th, or 6th month in sample are still interviewed in March, and some of those respondents drop out or move. The code above matched over 40% of the cases (61,512 matches). CPSIDP already takes sex and age into account, but researchers may choose to use additional variables that are expected to remain constant (e.g., RACE, BPL) to help assess the validity of a match.
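The validity check described above can be sketched as a filter over linked pairs: keep a pair only if attributes that should not change (here sex and race) agree and reported age moves forward by a plausible amount. This is a minimal Python illustration; the lowercase field names, the code values, and the 0-to-1 year age tolerance for a two-month gap are all assumptions, and the tolerance may need loosening to allow for misreported ages.

```python
def plausible_match(before, after, max_age_gain=1):
    """True if time-invariant traits agree and age neither decreases nor jumps too far."""
    if before["sex"] != after["sex"] or before["race"] != after["race"]:
        return False
    gain = after["age"] - before["age"]
    return 0 <= gain <= max_age_gain


# Illustrative linked pairs (first month, later month); values are made up.
pairs = [
    ({"sex": 1, "race": 100, "age": 34}, {"sex": 1, "race": 100, "age": 35}),
    ({"sex": 2, "race": 200, "age": 50}, {"sex": 1, "race": 200, "age": 50}),
    ({"sex": 1, "race": 100, "age": 40}, {"sex": 1, "race": 100, "age": 38}),
]
valid = [p for p in pairs if plausible_match(*p)]
print(len(valid))  # 1
```

The second pair fails because sex changes and the third because age goes backwards, so only the first pair is kept.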

Hope this helps. If you continue to experience trouble merging your data, please email for assistance.