Proper use of WTFINL for longitudinal analysis

raven52 · April 10, 2018, 11:32pm

This is a more complete, detailed version of an earlier post of mine.

I have extracted monthly, county-level data for EMPSTAT and LABFORCE from 2010-2017. I am using CPSIDP to track an individual’s status from month to month, where applicable. I create a dummy variable equal to 1 if that individual went from “not in labor force” to “employed” in adjacent months, and sum these dummy variables by year to produce unweighted labor force flows.
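A minimal sketch of the unweighted step described above, in plain Python. The record layout and the status labels "NILF" / "EMP" are placeholders for illustration, not IPUMS codes, and the flag is assigned to the later month of each pair, which is an assumption about the bookkeeping rather than something stated in the post.

```python
from collections import defaultdict

# Hypothetical records: (CPSIDP, YEAR, MONTH, status).
# "NILF" / "EMP" are placeholder labels, not actual EMPSTAT codes.
records = [
    (101, 2010, 1, "NILF"),
    (101, 2010, 2, "EMP"),
    (101, 2010, 3, "EMP"),
    (102, 2010, 1, "NILF"),
    (102, 2010, 3, "EMP"),   # month 2 is missing, so this pair is not adjacent
    (103, 2010, 12, "NILF"),
    (103, 2011, 1, "EMP"),   # adjacent months across a year boundary
]

def month_index(year, month):
    """Map (year, month) to a running month count so adjacency is a difference of 1."""
    return year * 12 + (month - 1)

def nilf_to_emp_flags(rows):
    """Return {(cpsidp, year, month): 0 or 1}, keyed by the later month of each pair."""
    by_person = defaultdict(list)
    for cpsidp, year, month, status in rows:
        by_person[cpsidp].append((month_index(year, month), year, month, status))
    flags = {}
    for cpsidp, obs in by_person.items():
        obs.sort()
        for prev, cur in zip(obs, obs[1:]):
            adjacent = cur[0] - prev[0] == 1
            moved = prev[3] == "NILF" and cur[3] == "EMP"
            flags[(cpsidp, cur[1], cur[2])] = int(adjacent and moved)
    return flags

flags = nilf_to_emp_flags(records)

# Sum the dummies by the year of the later month to get unweighted flows.
flows_by_year = defaultdict(int)
for (_, year, _), flag in flags.items():
    flows_by_year[year] += flag
```

Because adjacency is checked on a running month index, a December-to-January move is counted here and lands in the January year; a version that only compares months within a calendar year would drop it.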

I am wondering how to properly incorporate WTFINL into this procedure. Another post suggested dividing the sample weight in each year by the number of sample years in the panel. Can these adjusted weights then be applied to the dummy variables described above (before summing by year, of course)?

For example, I can first sum the adjusted weights by month, for a particular year. Next, I can divide each individual WTFINL (for each CPSIDP) by these monthly totals, for the appropriate month. Finally, I can apply this result (again, for each CPSIDP) to the dummy variables of interest, then sum by year. Any insight into if/how this approach can be improved is greatly appreciated.
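To make the proposed procedure concrete, here is a small sketch under assumed data. The rows and weights are invented, and the per-year adjustment (dividing by the number of sample years) is left out; if that adjustment is the same constant for everyone within a month, it cancels once each weight is divided by its monthly total.

```python
from collections import defaultdict

# Hypothetical rows: (CPSIDP, YEAR, MONTH, WTFINL, transition dummy).
rows = [
    (101, 2010, 2, 1500.0, 1),
    (102, 2010, 2, 2500.0, 0),
    (101, 2010, 3, 1000.0, 0),
    (103, 2010, 3, 3000.0, 1),
]

# Step 1: total weight in each (year, month) cell.
month_totals = defaultdict(float)
for _, year, month, weight, _ in rows:
    month_totals[(year, month)] += weight

# Steps 2 and 3: normalize each weight by its monthly total, apply it to the
# dummy, and accumulate by year.
weighted_share_by_year = defaultdict(float)
for _, year, month, weight, flag in rows:
    weighted_share_by_year[year] += flag * weight / month_totals[(year, month)]
```

Note what this quantity is: each month contributes the weighted share of the sample that made the transition, so the yearly figure is a sum of monthly shares, not a count of people. Dividing by the number of months would give an average monthly transition rate instead.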

References:

(1) Is there longitudinal weight available for analyzing person across different monthly files?

(2) https://cps.ipums.org/cps/resources/l… (provided in response to a similar, recent post of mine; this was very helpful, and much appreciated!)

(3) What weight should I use in IPUMS-CPS

JeffBloem · April 11, 2018, 6:37pm

In general, there is no real consensus about the proper incorporation of sampling weights, particularly when pooling together samples as you describe. I am not certain I completely understand what you are doing, and if I am misunderstanding in any way feel free to provide more detail, but I can offer the following discussion.

I am not certain what you mean by using “county-level data” and this could influence how you use sampling weights. You could be extracting specific counties or first aggregating labor force statistics by county or something else entirely. In any case, do note that in most IPUMS CPS samples the COUNTY variable is identifiable for only about 45% of the population. The tricky bit is that when pooling CPS monthly samples, the pooled sample will include multiple observations for some (but not all) households. In principle issues relating to this detail can be avoided by limiting analysis to only those with MISH==1. However, it sounds like you want to exploit the fact that there are repeated observations of households.

If I understand correctly, you are interested in identifying individuals who transition from not in the labor force in one month to employed in the next. If so, you are limiting your sample to individuals with at least two observations in consecutive months. (Note that this means your sample is limited to consecutive observations within MISH 1-4 or within MISH 5-8. Since there is an eight-month gap between MISH 4 and MISH 5, that pair of observations does not meet your criteria.) Additionally, unless you are explicitly correcting for this, summing by year restricts the consecutive months to fall within a single calendar year. Said differently, individuals not in the labor force in December and employed in January are likely dropped from your sample. It is these restrictions on your sample that the sample weight should account for.
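The restrictions above can be written down as a small check. This is only an illustrative sketch: the function name and the (year, month, MISH) tuple layout are hypothetical, and the set of valid pairs simply encodes "consecutive within MISH 1-4 or within MISH 5-8".

```python
# MISH pairs that can occur in adjacent calendar months for the same person.
CONSECUTIVE_MISH_PAIRS = {(1, 2), (2, 3), (3, 4), (5, 6), (6, 7), (7, 8)}

def classify_pair(prev, cur):
    """Classify two observations of one CPSIDP, each given as (year, month, mish)."""
    months_apart = (cur[0] * 12 + cur[1]) - (prev[0] * 12 + prev[1])
    if months_apart != 1 or (prev[2], cur[2]) not in CONSECUTIVE_MISH_PAIRS:
        return "not adjacent"
    # Adjacent pairs that straddle December/January are the ones a
    # within-year calculation would silently drop.
    return "crosses year" if cur[0] != prev[0] else "within year"

within = classify_pair((2010, 5, 1), (2010, 6, 2))
boundary = classify_pair((2010, 12, 2), (2011, 1, 3))
gap = classify_pair((2010, 4, 4), (2011, 1, 5))
```

Counting how many pairs fall into the "crosses year" group is a quick way to see how much of the sample a within-year sum leaves out.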

Finally, a couple of suggestions. Perhaps first limit each monthly sample to individuals with MISH 1-4. Then create your “transition” dummy variables, as you describe, and sum the number of transitions for each individual (via CPSIDP). Next, keep only one observation per CPSIDP and apply the sampling weights (WTFINL). If you are first aggregating the data up to the county level, then you should apply the sampling weights within that aggregation. A reasonable way to check whether you’ve applied the sampling weights “correctly” is to calculate the total population size after weighting. If that total is roughly equal to the actual population of the US, then you are on the right track.
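One way these suggestions might be sketched in plain Python is shown below. The rows, status labels, and weights are invented for illustration, and using the WTFINL from each person's first retained observation is an assumption; the suggestion does not say which month's weight to keep.

```python
from collections import defaultdict

# Hypothetical rows: (CPSIDP, YEAR, MONTH, MISH, WTFINL, status).
rows = [
    (101, 2010, 1, 1, 1500.0, "NILF"),
    (101, 2010, 2, 2, 1600.0, "EMP"),
    (101, 2010, 3, 3, 1550.0, "NILF"),
    (101, 2010, 4, 4, 1500.0, "EMP"),
    (102, 2010, 1, 1, 2000.0, "EMP"),
    (102, 2010, 2, 2, 2100.0, "EMP"),
    (103, 2010, 1, 5, 1800.0, "NILF"),  # MISH 5: removed by the filter below
    (103, 2010, 2, 6, 1800.0, "EMP"),
]

# Limit the sample to MISH 1-4, then group observations by person.
by_person = defaultdict(list)
for cpsidp, year, month, mish, weight, status in rows:
    if 1 <= mish <= 4:
        by_person[cpsidp].append((year * 12 + month, weight, status))

weighted_transitions = 0.0
weighted_population = 0.0
for cpsidp, obs in by_person.items():
    obs.sort()
    # Count NILF -> EMP moves between adjacent months for this person.
    n_transitions = sum(
        1
        for prev, cur in zip(obs, obs[1:])
        if cur[0] - prev[0] == 1 and prev[2] == "NILF" and cur[2] == "EMP"
    )
    # Assumption: keep the first observation's WTFINL as the person's weight.
    weight = obs[0][1]
    weighted_transitions += n_transitions * weight
    weighted_population += weight
```

Here `weighted_population` is the figure to compare against a known population total as the sanity check, and `weighted_transitions` is the weighted count of moves. For county-level results, the same accumulation would be done separately within each county.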

I also encourage you to look into additional discussion on this topic here and here for more information.
