Replicating Gross Flow Lab Exercise 6 from the CPS 2018 Summer Data Workshop


I’m unable to replicate the Gross Flow answers from the 2018 summer workshop. I’m using the actual CPS extracts for November and December 2017 along with the code provided in the workshop (grossflow and validate_long.txt).

For example, here are the numbers I generated for Table 1 (exercise 3):

Design df = 142,669

          |                  empsimpT2
empsimpT1 |   employed   unemploy        NILF       Total
----------+----------------------------------------------
employed  |   69317573     738841     2192463    72248876
unemploy  |     658804    1443203      818377     2920385
NILF      |    1740520     680379    41952039    44372938
----------+----------------------------------------------
Total     |   71716897    2862424    44962879   119542199

Key: weighted count
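
For anyone comparing against the answer key, it can be easier to look at row shares (the share of each month-1 group that ends up in each month-2 status) than at raw weighted counts. Here is a minimal Python sketch of that arithmetic. The counts are copied from the table above; the names `flows` and `row_shares` are just illustrative, and this is not the workshop's Stata code.

```python
# Weighted counts copied from the table above.
# Rows: status in month 1 (empsimpT1); columns: status in month 2 (empsimpT2).
STATES = ["employed", "unemploy", "NILF"]
flows = {
    "employed": [69317573, 738841, 2192463],
    "unemploy": [658804, 1443203, 818377],
    "NILF": [1740520, 680379, 41952039],
}


def row_shares(table):
    """Convert each row of weighted counts into shares that sum to 1."""
    shares = {}
    for origin, counts in table.items():
        total = sum(counts)
        shares[origin] = [count / total for count in counts]
    return shares


shares = row_shares(flows)
for origin in STATES:
    cells = "  ".join(
        f"{dest}: {100 * share:6.2f}%"
        for dest, share in zip(STATES, shares[origin])
    )
    print(f"{origin:<9} -> {cells}")
```

Row shares are less sensitive than raw counts to differences in the overall scale of the weights, so they may help separate a weighting problem from a linking problem.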

I’m using lnkfw1mwt instead of panlwt for the weight. However, I don’t believe the different weight is the main reason my results differ so much from the answer key. Any insight as to why my numbers are so far off?


I just replicated this exercise. A couple of notes:

  1. I wasn’t able to exactly replicate the exercise numbers, but I got pretty close. This was using PANLWT. It’s not totally clear what’s causing the discrepancy, but it is likely that an old version of the linking validation code was used to produce the answer key. See below for screenshots.


  2. Using LNKFW1MWT and time==2 (Dec), my numbers were the same as yours. Note that this is not the proper use of this weight. It should be used for observations in the first month, designating cases that link to the following month. However, I also tried this with time==1 and got very different results, which I don’t have a good explanation for.

Overall the main issue is the use of weights. PANLWT is used when analyzing people who link to the previous month, while LNKFW1MWT is used when analyzing people who link to the subsequent month.
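
To make the distinction concrete, here is a minimal Python sketch (not the workshop's Stata code) of building a gross flows table from linked records under either weight. The records and all their values are made up for illustration, and `gross_flows` is a hypothetical helper. Only the names PANLWT, LNKFW1MWT, and CPSIDP come from this thread. The sketch assumes, following the description above, that LNKFW1MWT is read from the month-1 record and PANLWT from the month-2 record.

```python
from collections import defaultdict

# Hypothetical person records keyed by CPSIDP; every value is invented.
month1 = {
    101: {"empsimp": "employed", "LNKFW1MWT": 1500.0},
    102: {"empsimp": "unemploy", "LNKFW1MWT": 1200.0},
    103: {"empsimp": "NILF", "LNKFW1MWT": 1800.0},  # no month-2 match
}
month2 = {
    101: {"empsimp": "employed", "PANLWT": 1450.0},
    102: {"empsimp": "employed", "PANLWT": 1250.0},
    104: {"empsimp": "NILF", "PANLWT": 1700.0},  # no month-1 match
}


def gross_flows(first, second, weight):
    """Sum weights by (month-1 status, month-2 status) over linked people.

    weight="LNKFW1MWT" takes the weight from the month-1 record;
    weight="PANLWT" takes it from the month-2 record.
    """
    if weight == "LNKFW1MWT":
        source = first
    elif weight == "PANLWT":
        source = second
    else:
        raise ValueError(f"unexpected weight: {weight}")

    table = defaultdict(float)
    # Only people observed in both months contribute to the flows.
    for cpsidp in first.keys() & second.keys():
        cell = (first[cpsidp]["empsimp"], second[cpsidp]["empsimp"])
        table[cell] += source[cpsidp][weight]
    return dict(table)


forward = gross_flows(month1, month2, "LNKFW1MWT")
backward = gross_flows(month1, month2, "PANLWT")
print(forward)
print(backward)
```

The same linked pairs feed both tables, and only the month that supplies the weight changes. Pulling LNKFW1MWT from the second month's records, as in the time==2 run described above, would be the mismatch this layout is meant to avoid.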

Hi Matthew.

Thank you for your reply.

It turns out that I figured out your second point when working with the April 2020 data. Am I correct that April’s value of LNKFW1MWT won’t become available until the May CPS is released (with the CPSIDPs included)?

Also, when used correctly, is there a significant difference between LNKFW1MWT and PANLWT?

Again, thanks for getting back with me.



You’re correct that LNKFW1MWT will not be available until the next month’s data is released.

Regarding your second question, a priori I wouldn’t expect a large difference between estimates using PANLWT and LNKFW1MWT for most variables. I was surprised to see such large differences when I replicated the gross flows exercise. I will look into this a bit more and follow up.

Hi Matthew.

Thanks for getting back with me.

I decided to run an experiment comparing lnkfw1mwt (subsamp 1) with panlwt (subsamp 2) using the code from Lab Exercise 6. I found a very large discrepancy (see subpop pop). Here is a summary: