I’m using the 2021 5-year ACS sample data to look at the state of independent contractors in the largest cities in the US. I’m using CLASSWKRD==13 (self-employed, not incorporated) as my definition of an independent contractor. I previously did my analysis for 2009 and 2019, and I have now updated it to 2011 and 2021 using the exact same code in R. It seems, however, that for a few cities the number of independent contractors has dropped by 80-90% from 2019 to 2021. This is the case for Charlotte, Columbus, Dallas, Houston, Philadelphia, San Antonio, and Seattle. I use the PWCOUNTY variable for the geographic areas (which means they won’t be exact city boundaries, but the CITY variable is not available for the years and cities I need).
Since I’m using the same code, I’m trying to understand why there is this drop. Is it because of Covid? Is it because of some adjustment made due to Covid? Or have any geographic codes or other definitions changed in the 2021 version?
I’m not exactly sure which counties you’re using for your city boundaries with PWCOUNTY, but I tried replicating your finding using Franklin County, Ohio and Dallas County, Texas and did not find the significant decrease that you mention. Specifically, I’m finding 34,930 self-employed not incorporated workers in Franklin in 2019 and 33,175 in 2021. For Dallas, this value is 91,510 for 2019 and 93,120 for 2021. You can see my results in Stata for Dallas below. The 5-year ACS file provides an estimate of an average value over the 5-year period rather than estimates for each individual year. Therefore, I first multiply the weights (i.e. PERWT) by 5.
Note that neither of these counties is a perfect boundary for its respective city. You might therefore generate estimates for their respective metropolitan areas using the IPUMS variable MET2013, given that the sum of errors for both of these areas is relatively small. While the microdata does not allow for identification of many major cities that are intersected by PUMAs, official census tables are available for such aggregated data both through IPUMS NHGIS and the Census website (e.g. Columbus 2021). Note that the census table estimates include unpaid family workers and only respondents who are currently employed (the universe for CLASSWKR is persons age 16+ who had worked within the past 5 years). I hope this helps you figure out what’s happening with your data.
Thank you so much for looking into this, it is very helpful.
I am a little bit confused about one thing: why would I multiply PERWT by 5 if I only want the figures for 2021? I understand the 5-year file gives the average over the 5 years, but I only want the estimate that represents people in 2021.
The 5-year ACS file is a combination of five 1-year ACS files where the weights have been divided by 5 (see the PUMS 5-Year Accuracy of the Data report). If you’re looking to generate estimates for only a single year using the 5-year file, you will need to undo this process by multiplying the weights by 5. Alternatively, you can simply use the 2021 1-year ACS file.
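To make the mechanics concrete, here is a minimal sketch in plain Python (not the R or Stata code used elsewhere in this thread). It takes the description above at face value, that each record's 5-year weight is simply its 1-year weight divided by 5; the actual Census weighting may involve further adjustments. The annual totals, the sample size, and the names `build_five_year_file` and `weighted_count` are all made up for illustration.

```python
# Toy illustration with made-up numbers: pool five 1-year samples into a
# "5-year file" by dividing every 1-year weight by 5, then compare estimates.

# Hypothetical true counts of some group in one county, by survey year.
ANNUAL_TOTALS = {2017: 88_000, 2018: 89_000, 2019: 90_000, 2020: 87_000, 2021: 93_000}
RECORDS_PER_YEAR = 100  # hypothetical number of sampled records per year


def build_five_year_file(annual_totals, records_per_year):
    """Return pooled records whose weights are the 1-year weights divided by 5."""
    rows = []
    for year, total in annual_totals.items():
        one_year_weight = total / records_per_year  # people represented per record
        for _ in range(records_per_year):
            rows.append({"year": year, "perwt": one_year_weight / 5})
    return rows


def weighted_count(rows, year=None, scale=1):
    """Sum weights (times `scale`), optionally restricted to one survey year."""
    return sum(r["perwt"] * scale for r in rows if year is None or r["year"] == year)


five_year = build_five_year_file(ANNUAL_TOTALS, RECORDS_PER_YEAR)

pooled_estimate = weighted_count(five_year)                    # average over the 5 years
raw_2021 = weighted_count(five_year, year=2021)                # about 1/5 of the 2021 count
rescaled_2021 = weighted_count(five_year, year=2021, scale=5)  # recovers the 2021 count

print(f"pooled 5-year estimate: {pooled_estimate:,.0f}")
print(f"2021 only, raw weights: {raw_2021:,.0f}")
print(f"2021 only, weights x 5: {rescaled_2021:,.0f}")
```

In this toy setup the pooled estimate equals the average of the five annual totals, the unscaled 2021 subset gives one fifth of the 2021 total, and multiplying the weights by 5 recovers it.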
The reason is that weights are specific to the sample, and you are changing the sample by restricting it to 1 year. A helpful way to think about it might be: if you take 1 year of the 5-year ACS, you are cutting the sample down to 1/5th of the total, so your weights will then add up to 1/5th of the total that you want. It makes sense from that perspective! For another good example of this issue (from the flip side of the same coin as yours: adding samples together), there’s a similar post about the “Form 1” and “Form 2” samples of the 1970 census. It is a good idea to pool the 1970 form 1 metro sample and 1970 form 2 metro sample together.
Thanks for the responses. So, I don’t quite understand Chris when you say “1/5th of the total that you want”. I don’t want the total 5 year, but the average per year for this 5-year period. Would I not then want to take the average of the weights? If I multiply by 5, wouldn’t the resulting figure overrepresent how many there are per year on average?
So, I don’t quite understand Chris when you say “1/5th of the total that you want”. I don’t want the total 5 year, but the average per year for this 5-year period.
The most important thing to know here is that the ACS 5-year estimate for (say) self-employed-not incorporated people working in Dallas is not the sum of the estimates of the number of those people for all 5 years. This goes back to Ivan’s original reply:
The 5-year ACS file provides an estimate of an average value over the 5-year period rather than estimates for each individual year.
Now the number of these workers using the full 5-year ACS was 89,526, which we get from adding up the weights for that group. If you get rid of 4 of the 5 years of the 5-year sample, you are reducing the sample by approximately 4/5ths, so the weights only add up to approximately 1/5th of the full-sample estimate. This goes back to where I said in my reply: “weights are specific to the sample”. It’s exactly what I was referring to when I said you’ll get 1/5th of the number you want. To make these weights representative of a specific year in the sub-sample you want, you need to multiply them by 5 so that they (approximately) add up again.
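As a rough worked example using the Dallas figures quoted earlier in this thread: Ivan's 2021 estimate of 93,120 already includes the multiplication by 5, so (assuming that figure is exactly five times the raw sum) the unscaled 2021 weights in the 5-year file add up to about

$$
\sum_{i \in 2021} \mathrm{PERWT}_i \approx \frac{93{,}120}{5} = 18{,}624,
\qquad
5 \times 18{,}624 = 93{,}120 .
$$

The rescaled 2021 figure (93,120) is in the same range as the full 5-year estimate (89,526), while the unscaled sum (about 18,624) is roughly one fifth of it.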
Here is some R code (using tidyverse libraries) which replicates @Ivan_Strahof's Stata results, and shows that your single-year estimates will agree with the 5-year sample if you multiply the weights by 5:
This shows that if you don’t multiply by 5, the single-year estimates will come out at only about 1/5th of the ACS 5-year estimate: