State-level self-employment analysis

When cross-checking my data against the BLS’s CPS national self-employment data, my numbers line up pretty closely. The national self-employment figures are on page 12 of 16 at:

Unfortunately, when I try to cross-check my state-level numbers against ACS data (the only source available at the state level; ACS table K202402 provides yearly unincorporated self-employment estimates), my annual average counts of unincorporated self-employed workers are off by a much larger proportion.

Could this be because the ACS defines self-employment differently from the CPS? Or could the ACS figures be seasonally adjusted, whereas my numbers are not seasonally adjusted (NSA)?

My 12-month pool of data has a sample size of 711, so I gather that it is large enough. Here’s a sample of my R code:

```r
library(dplyr)

# Gathering yearly average of unincorporated self-employment count in GA
GASelfEmployment2019_Avg <- GAyear2019 %>%
  filter(empstat >= 10, classwkr == 13,
         month >= 1, month <= 12,
         wkstat >= 11, wkstat <= 41) %>%
  summarize(unincorporatedSEworkers = sum(wtfinl)) %>%
  # 12 monthly samples are pooled, so divide the summed weights by 12
  mutate(SE2019avg = unincorporatedSEworkers / 12)
```
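A quick way to check the divide-by-12 step is a toy example in base R (no packages needed). The data frame below is invented and assumed to be already filtered to unincorporated self-employed workers; it shows that summing WTFINL over the 12-month pool and dividing by 12 gives the same number as averaging the 12 monthly weighted totals, which only holds if all 12 months are actually in the pool.

```r
# Invented toy extract, assumed already filtered to unincorporated
# self-employed workers: two respondents per month
toy <- data.frame(
  month  = rep(1:12, each = 2),
  wtfinl = rep(c(1500, 2500), times = 12) + rep(seq(0, 110, by = 10), each = 2)
)

# Dividing by 12 is only correct if every month is present in the pool
stopifnot(setequal(unique(toy$month), 1:12))

# Approach in the post: pool all months, sum the weights, divide by 12
pooled_avg <- sum(toy$wtfinl) / 12

# Equivalent: weighted total for each month, then the mean of those totals
monthly_totals <- tapply(toy$wtfinl, toy$month, sum)
monthly_avg    <- mean(monthly_totals)

print(c(pooled = pooled_avg, monthly = monthly_avg))
```

If a month is missing from the extract, the pooled sum divided by 12 understates the average, so the `stopifnot()` guard is worth keeping in real runs.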

Also, if I’m looking to produce a disaggregated self-employment rate by quarter (merging a quarterly self-employment count with a quarterly all-employment count), would I be on the right track with this? See the R code below:

```r
# Gathering disaggregated counts for unincorporated self-employed workers, 2017 Q1
GASelfEmployment2017_Q1 <- GAyear2017 %>%
  filter(month >= 1, month <= 3,
         empstat >= 10, classwkr == 13,
         wkstat >= 11, wkstat <= 41) %>%
  group_by(new_race = haven::as_factor(new_race), sex = haven::as_factor(sex)) %>%
  summarize(unincorporatedSEworkers2017Q1 = sum(wtfinl))

# Gathering disaggregated counts for ALL employed workers, 2017 Q1
GAALLEmployed2017_Q1 <- GAyear2017 %>%
  filter(month >= 1, month <= 3,
         empstat >= 10, wkstat >= 11, wkstat <= 41) %>%
  group_by(new_race = haven::as_factor(new_race), sex = haven::as_factor(sex)) %>%
  summarize(AllEmployedWorkers2017Q1 = sum(wtfinl))

# Merging the self-employed and all-worker data sets to calculate
# the self-employment rate for 2017 Q1 (join keys stated explicitly)
SErateStat2017_Q1 <- full_join(GASelfEmployment2017_Q1, GAALLEmployed2017_Q1,
                               by = c("new_race", "sex")) %>%
  mutate(SErate = unincorporatedSEworkers2017Q1 / AllEmployedWorkers2017Q1)
```
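Rather than repeating the filter, summarize, and join steps for each quarter, one option is to derive the quarter from MONTH and compute all four quarters in one pass. Below is a minimal base-R sketch on an invented toy data frame; the column names follow the IPUMS CPS variables used above, `new_race` stands in for the poster’s own recoded race variable (its values here are placeholders), and `classwkr` values other than 13 are placeholders for other classes of worker.

```r
set.seed(1)
n <- 400
toy <- data.frame(
  month    = sample(1:12, n, replace = TRUE),
  classwkr = sample(c(13, 14, 21), n, replace = TRUE),  # 13 = SE, not incorporated
  wkstat   = sample(c(11, 12, 13, 41, 42), n, replace = TRUE),
  new_race = sample(c("Group A", "Group B"), n, replace = TRUE),
  sex      = sample(c("Male", "Female"), n, replace = TRUE),
  wtfinl   = runif(n, 1000, 3000)
)

# Employed persons, using the same WKSTAT range as the post
emp <- subset(toy, wkstat >= 11 & wkstat <= 41)
emp$quarter <- (emp$month - 1) %/% 3 + 1   # months 1-3 -> Q1, 4-6 -> Q2, ...

# Weighted counts of all employed by quarter, race, and sex
all_emp <- aggregate(wtfinl ~ quarter + new_race + sex, data = emp, FUN = sum)
names(all_emp)[names(all_emp) == "wtfinl"] <- "all_employed"

# Weighted counts of unincorporated self-employed by the same groups
se_emp <- aggregate(wtfinl ~ quarter + new_race + sex,
                    data = subset(emp, classwkr == 13), FUN = sum)
names(se_emp)[names(se_emp) == "wtfinl"] <- "se_uninc"

# Left join onto all employed; groups with no SE workers get a count of 0
rates <- merge(all_emp, se_emp, by = c("quarter", "new_race", "sex"), all.x = TRUE)
rates$se_uninc[is.na(rates$se_uninc)] <- 0
rates$se_rate <- rates$se_uninc / rates$all_employed
print(rates)
```

The left join is a design choice: because the self-employed are a subset of the employed, joining onto the all-employed table loses no numerator rows, and empty self-employment cells come out as a rate of 0 instead of `NA`.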

Based on the information you shared, I have a few ideas that may be helpful. Please follow up with any further questions.

Regarding the annual estimates, there are some universe differences you should consider. The ACS table you are referencing appears to have a universe of the civilian employed population aged 16 years and over (though I couldn’t find this specific table prior to 2014), whereas the BLS Monthly Labor Review figures are further restricted to workers in nonagricultural industries. I suspect this is the biggest driver of the discrepancy, but I will also note that ACS data include people living in group quarters (the civilian restriction in the table should adequately account for the difference between the CPS and ACS target populations with respect to persons in the Armed Forces). Differences in data collection methods may also be a factor: ACS data are self-reported, whereas the CPS is an interview with a more extensive series of questions about work.
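To see how much the industry restriction alone can move a count, here is a small base-R sketch on an invented extract of unincorporated self-employed workers. It assumes your CPS extract includes IND1990 and reuses the IND1990 40 to 932 range from the ACS tabulation described in this thread as the "nonagricultural" cut; the industry codes and weights below are placeholders, not real data.

```r
# Hypothetical toy CPS-style extract of unincorporated self-employed workers;
# all values are invented for illustration
toy <- data.frame(
  ind1990 = c(10, 31, 60, 641, 812, 882),
  wtfinl  = c(2000, 1500, 3000, 2500, 1000, 1800)
)

# IND1990 40-932 mirrors the nonagricultural, non-military range used
# in the ACS tabulation described in this thread
nonag <- toy$ind1990 >= 40 & toy$ind1990 <= 932

all_industries <- sum(toy$wtfinl)          # closer to the ACS table universe
nonag_only     <- sum(toy$wtfinl[nonag])   # closer to the BLS MLR universe

print(c(all_industries = all_industries,
        nonag_only = nonag_only,
        excluded_share = 1 - nonag_only / all_industries))
```

Running your own state extract both ways shows directly how much of the gap with the ACS table the agricultural self-employed account for.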

I am including a quick cross-tab from the online tabulator using the 1-year ACS PUMS data from IPUMS USA. It shows year by self-employment status (I listed both not incorporated and incorporated) with the following sample restrictions: persons aged 16 and over, with no military service (a bit aggressive, as it omits veterans too), not living in group quarters, and with IND1990 values between 40 and 932 (omitting agriculture and the military). I am only sharing it in case it is helpful to compare these numbers with yours.
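For anyone who wants to reproduce a tabulation like this from a downloaded IPUMS USA extract instead of the online tabulator, a minimal base-R sketch follows. The toy data are invented, and the variable codes are assumptions to verify against the IPUMS USA codebook: CLASSWKRD 13 and 14 for self-employed not incorporated and incorporated, GQ 1, 2, and 5 for households, and VETSTAT 1 as the "no military service" restriction.

```r
# Hypothetical toy ACS PUMS-style extract (values invented).
# Assumed IPUMS USA codes, to be verified against the codebook:
#   classwkrd 13 = self-employed, not incorporated; 14 = incorporated
#   gq 1, 2, 5   = households (not group quarters)
#   vetstat 1    = no military service
toy <- data.frame(
  year      = c(2017, 2017, 2017, 2018, 2018, 2018, 2018),
  age       = c(45, 15, 60, 38, 52, 29, 70),
  gq        = c(1, 1, 3, 1, 2, 1, 1),
  vetstat   = c(1, 1, 1, 1, 2, 1, 1),
  ind1990   = c(60, 60, 641, 20, 882, 812, 700),
  classwkrd = c(13, 13, 14, 13, 14, 14, 13),
  perwt     = c(120, 80, 95, 110, 70, 130, 90)
)

# Sample restrictions described above
keep <- with(toy,
  age >= 16 &
  gq %in% c(1, 2, 5) &
  vetstat == 1 &
  ind1990 >= 40 & ind1990 <= 932 &
  classwkrd %in% c(13, 14)
)

# Weighted year-by-class cross-tab: xtabs() sums PERWT within each cell
xt <- xtabs(perwt ~ year + classwkrd, data = toy[keep, ])
print(xt)
```

Using PERWT on the left-hand side of the `xtabs()` formula gives weighted population counts rather than unweighted sample sizes.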

Regarding the quarterly estimates, it looks to me like you are restricting to unincorporated self-employed workers who were at work in January through March, and using values of WKSTAT to define who is counted. This seems reasonable to me; note that for point estimates you would want to divide the weights by 3 to reflect the pooling of three monthly samples (this won’t affect the proportion, however). I also noticed that in both the quarterly-rate code and the annual-count code you are omitting persons who were not at work but usually work part-time (WKSTAT == 42) while including persons who were not at work but usually work full-time (WKSTAT == 13).
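To make both points concrete, here is a small base-R sketch on an invented pooled Q1 extract. It applies the WKSTAT 11 to 41 range from the original code with 13 also excluded (42 already falls outside that range), divides the pooled weights by 3 for the counts, and confirms the rate is unchanged by that division. The `classwkr` value 21 is only a placeholder for workers who are not unincorporated self-employed.

```r
# Invented pooled Q1 extract (January-March)
toy <- data.frame(
  month    = c(1, 1, 2, 2, 3, 3, 3),
  classwkr = c(13, 21, 13, 21, 13, 21, 21),   # 13 = SE, not incorporated
  wkstat   = c(11, 11, 13, 41, 41, 42, 12),
  wtfinl   = c(3000, 6000, 2400, 5100, 2700, 4800, 6300)
)

# At-work persons only: drop 13 (not at work, usually FT);
# 42 (not at work, usually PT) is already outside 11-41
at_work <- subset(toy, wkstat >= 11 & wkstat <= 41 & wkstat != 13)

# Point estimates: three monthly samples are pooled, so divide weights by 3
se_q1  <- sum(at_work$wtfinl[at_work$classwkr == 13]) / 3
all_q1 <- sum(at_work$wtfinl) / 3

# The rate is identical with or without the division by 3
rate_scaled   <- se_q1 / all_q1
rate_unscaled <- sum(at_work$wtfinl[at_work$classwkr == 13]) / sum(at_work$wtfinl)

print(c(se = se_q1, all = all_q1, rate = rate_scaled))
```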

Thanks Kari! Your response has been incredibly helpful in highlighting the differences between ACS and CPS data-gathering methodology. I corrected my filtering to exclude absent workers, both full-time and part-time. As for cross-checking my data pulls, I’ll just do that with national figures and trust that, when filtering at the state level, the rest of the code will generate reliable numbers. Thanks again.
