Weights for linking CPS basic monthly data



I'm trying to link basic monthly CPS data across two consecutive months in order to create gross flows from 1976 to 2018 by age and sex. Which weight should I use, and how do you recommend using it?

I see Shimer (2012) has code that does this. He uses WTFINL, and applies this weight by adding the weights across the two months and dividing by two:

gen weight = (fweight(month1) + fweight(month2))/2
egen double flows = sum(weight), by(lfs2)
replace flows = flows/100

Should I divide the weight by 100, or make any further adjustment?

I see that the universe of WTFINL is all persons for 1976-1988 and civilians for 1989-2018. I also see that this is a cross-sectional weight, while I'm creating a time series. So, can I use this weight from 1976 to 2018 and get representative estimates of the population? How would the change in universe between the 1976-1988 and 1989-2018 samples affect my results and their representativeness? Can I use it to create estimates for certain groups of the population, say young people?

If I should use one of the longitudinal weights that CPS provides, which one do you recommend? If I want to estimate flows by age and sex, will these weights be representative?

I see LNKFW1MWT is a weight I could use from 1976 to 2018, though it has some breaks in the series. Does that mean I won't be able to use my data for those years?

On the other hand, I have PANLWT from 1994 to 2018, which seems to serve the purpose of analyzing panel data.

Finally, if I use any of the longitudinal weights to link people across two consecutive months, should I also add the weights across the two months and divide by two?

Thanks in advance for the answer!


From what you are explaining here, it sounds like you should use the LNKFW1MWT sampling weight. This variable is designed specifically for situations where users are linking together adjacent basic monthly samples. You are correct that LNKFW1MWT is not available in some years: it is only available in samples that can be linked to the following month using CPSIDP. See here for details on these samples.
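
As a rough illustration of what linking adjacent months with CPSIDP can look like in Stata, here is a minimal sketch. It assumes a long-format IPUMS CPS extract with lowercase variable names (cpsidp, year, month, empstat, lnkfw1mwt). The names mdate and empstat_next are made up for the example, and keeping the weight from the first-month record is an assumption to check against the variable documentation.

```stata
* Assumes one row per person per month in the extract
* Build a Stata monthly date from year and month
gen int mdate = ym(year, month)
format mdate %tm

* Stop with an error if a person appears more than once in the same month
isid cpsidp mdate

* Within each person, look one row ahead and keep that row's status only if
* it is exactly the next calendar month
bysort cpsidp (mdate): gen empstat_next = empstat[_n+1] if mdate[_n+1] == mdate + 1

* Keep first-month records that found a match in the following month
keep if !missing(empstat_next)
```

After this step each remaining row describes one person-pair, with the first-month status in empstat, the second-month status in empstat_next, and lnkfw1mwt as recorded on the first-month row.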


Hi Jeff, thanks a lot for your reply.
Regarding my other questions, would it be wrong to use WTFINL as well? In what way would using the wrong weight bias my results?

And how should I use LNKFW1MWT between two months? Should I add the weights of the two months and divide by 2? What about dividing that by 100 as Shimer does? Is that necessary, or is any further adjustment needed?

Thank you!


If you use WTFINL and do not adjust the value for the fact that you are (a) pooling samples together and (b) using the CPS data as a panel rather than a cross-section, the weights will inflate any population count to be larger than the actual US population size. The method you discuss above (i.e., adding the weights across months and dividing by two) is approximately appropriate, but really does not account for the longitudinal features of your data in the way LNKFW1MWT does. Regarding how to use LNKFW1MWT: no additional adjustments are needed. The IPUMS command files automatically make all of the necessary adjustments.
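
One way to see the inflation from pooling, sketched under the assumption of a pooled extract with lowercase variables wtfinl, year and month (pop, pooled_total and monthly_avg are names invented for the example):

```stata
* Weighted population count within each monthly sample
collapse (sum) pop = wtfinl, by(year month)

* Adding the months together counts the population once per pooled month
egen double pooled_total = total(pop)

* Averaging across months keeps the count on the scale of a single month
egen double monthly_avg = mean(pop)

list year month pop pooled_total monthly_avg
```

With two pooled months, pooled_total is twice monthly_avg, which is the inflation that dividing the summed weights by two is meant to undo.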


Hi Jeff, thanks a lot!

With respect to your answer: I'm constructing gross flows, which means measuring the transition of an individual from month 1 to month 2. Do I still have to handle LNKFW1MWT by adding the weights across the two months and then dividing by two? Is there any other way to handle the weights?


The specific answer to your question depends on what you intend to calculate. If you are aiming to estimate some sort of population average that is representative of the two months, then dividing the weight by two is a reasonable approach. If you are performing regression analysis, then dividing the weight by two is not necessary. For more detail on weighting in regression analysis, see the attached paper by Solon et al. (2015).

In investigating this question, I came across some resources that you might find helpful. First, the core difference between PANLWT and LNKFW1MWT is that the first comes from the Bureau of Labor Statistics and the second is generated by IPUMS. The BLS doesn't make PANLWT available in the data before 1994. Second, both weights are intended for linking between two months, but PANLWT uses the population controls and weights from the second month, whereas LNKFW1MWT uses those from the first month to calculate the longitudinal weight. Therefore, it is permissible to use LNKFW1MWT in years where PANLWT is not available. More info on PANLWT can be found in Technical Paper 66, p. 10-14 (page 85 of 175 in the PDF).

Solon et al. (2015 JHR).pdf (272.0 KB)


Hi Jeff, thank you for your answer, it was very useful!

I have another question regarding the representativeness of my sample.

  1. Let's say I have 65,000 unweighted observations in my CPS file and I want to know the distribution by age in the population. Is this an acceptable way to obtain that number?

svyset LNKFW1MWT
tabulate age [fweight= LNKFW1MWT]

  2. And if, in order to check whether I have worked out my data correctly, I want to compare the statistics I get with, let's say, Census data, can I do that? For example, suppose that using this weight I get 1,020,000 young persons aged 19-20. Can I compare that with the number of 19-20 year olds from a Census population table? If not, how could I check that I get the right statistics?

Thanks a lot!


After talking more about this with some folks around the office, the consensus is not to use LNKFW1MWT when PANLWT is not available. The reason is that PANLWT essentially adjusts time 2 weights whereas LNKFW1MWT uses time 1 weights. So, these sampling weight values will be close but not exactly the same. Therefore, the "best" way forward is to construct a sampling weight for the samples before 1994 using the methods used to create PANLWT. Ultimately, we'd like to make the PANLWT variable available for years prior to 1994, but at this time I am not able to forecast when this will become available via IPUMS CPS. In the meantime, we do have some resources available that will help you create this weight yourself. The attached paper has much more information about gross flow analysis using CPS data. Additionally, our longitudinal weights page has some starter code that can be helpful as you create this weight.


Hi Jeff, thanks again for your answer!

I wanted to ask, though, if you could please clarify a bit more. I found the answer a bit confusing in relation to what you answered before.

In your answer of April 18th you recommended using the LNKFW1MWT weight for longitudinal series such as the one I'm constructing. So I'm confused about when to use LNKFW1MWT and when to use PANLWT.



Sorry for the confusion. After hearing more about your intended analysis and talking with some folks around the office, our best advice is to use the PANLWT variable when performing gross flow analysis. Because the PANLWT variable is not available prior to 1994, you'll need to create this variable yourself in the pre-1994 samples. The resources provided in the previous post should be helpful. The LNKFW1MWT weight, on the other hand, is not specifically designed for gross flow analysis. The key difference between these weights is that PANLWT uses time period 2 sampling weights whereas LNKFW1MWT uses time period 1 weights. So, although the values of these weights will be similar, they will not be exactly the same.


Hi Jeff,

thanks a lot! Now I understand.

I have one more question regarding the use of the weight PANLWT. As you know, I'm constructing gross flows between employment states in two consecutive months. So, the question is this:

In order to get how many people transition from employment (month 1) to inactivity (month 2), is the right approach to use the average weight of the first and second month? Or should I use the weight of the second month?



Based on the discussion in the attached paper, I think that you do not need to divide the PANLWT values when performing gross flow analysis. I'd encourage you to read through the attached paper as it seems like it will be very helpful for your work.
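
For what it's worth, a weighted gross flow table along these lines might be sketched in Stata as follows. This is only an illustration under assumptions: it presumes you have already built a file with one row per linked person-pair, where lfs1 and lfs2 (hypothetical names) hold the month 1 and month 2 labor force status, year and month identify the pair, and panlwt is the longitudinal weight attached to that pair. The names flow, origin_total and rate are also made up for the example.

```stata
* Weighted count of people making each month 1 -> month 2 transition,
* using the longitudinal weight as is (no division by two)
collapse (sum) flow = panlwt, by(year month lfs1 lfs2)

* Weighted number of people starting in each month 1 state
bysort year month lfs1: egen double origin_total = total(flow)

* Share of each origin state that moves to each destination state
gen double rate = flow / origin_total
```

To get flows by age and sex, the same sketch should work with the grouping variables (for example a hypothetical agegroup, plus sex) added to both the by() option and the bysort list.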

HarleyJFrazisEdwinLRobiso.pdf (595.2 KB)