We have encountered some issues with the 2022 5-year ACS data for New York City. Each year, we use the ACS data to calculate the percentages of educational categories, races, and gender for the working population in NYC. However, this year, we have noticed some alarming patterns that differ significantly from previous years.
For example, for the occupation 11-3111 Compensation and Benefits Manager, the data indicates that individuals with a high school diploma have the highest percentage. In previous ACS data, this category was dominated by individuals with a bachelor’s degree. Additionally, the female/male percentage for this occupation in NYC is reported as 1. Upon checking the count, there are no male respondents for this occupation. We’ve seen such weird patterns for a number of occupations in NYC.
We have used the same R code to access the same variables each year, and this is the first time we have noticed such discrepancies. Could there be an issue with the 2022 5-year ACS data? Any insights or guidance would be greatly appreciated.
I downloaded an IPUMS USA extract that includes the 2022 5-year ACS and 2021 5-year ACS. I tabulated the unweighted frequencies of SEX and EDUC among those with OCCSOC==113111 and MET2013=35620 for each sample. The tables are below (the first table shows the 2022 5-year ACS and the second table shows the 2021 5-year ACS). I see some changes between the 2021 5-year ACS and 2022 5-year ACS. However, I do not see the patterns you describe (mostly high-school educated persons or 100% female).
I would be very cautious about drawing conclusions about the demographic (sex, race, or education) makeup of workers in this occupation in New York City, or about changes in that makeup over time, with sample sizes this small. When the sample size is very small, as it is in this case, standard errors become very large, and your estimates will not necessarily be representative.
Also, keep in mind that overlapping 5-year ACS samples share up to four overlapping years of data. For example, the 2022 5-year ACS and the 2021 5-year ACS both include data from 2018, 2019, 2020, and 2021. You may find comparing 1-year ACS samples to one another more informative than comparing adjacent 5-year samples, whose differences will depend almost solely on differences between the earliest year of the earlier sample (2017 in this example) and the latest year of the later sample (2022 in this example).
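To illustrate the overlap described above, here is a minimal Python sketch. The helper name `years_in_5yr_sample` is made up for illustration; it only encodes the fact that a 5-year sample pools the named year and the four years before it.

```python
def years_in_5yr_sample(end_year):
    """Return the data years pooled in a 5-year ACS sample ending in end_year."""
    return list(range(end_year - 4, end_year + 1))

acs_2021 = years_in_5yr_sample(2021)  # 2017 through 2021
acs_2022 = years_in_5yr_sample(2022)  # 2018 through 2022

# Years contributing to both samples, and years unique to each one.
shared = sorted(set(acs_2021) & set(acs_2022))
only_2021 = sorted(set(acs_2021) - set(acs_2022))
only_2022 = sorted(set(acs_2022) - set(acs_2021))

print("shared years:", shared)
print("only in 2021 5-year:", only_2021)
print("only in 2022 5-year:", only_2022)
```

Four of the five years are common to both samples, so any difference between the two has to come from the single year that each sample does not share with the other.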
Thank you for your response! It seems you used the MET2013 variable for the 2022 5-year data, which corresponds to the NYC metro area and includes more than just the five counties in NYC. For our analysis, we focus on the working population within the five counties of NYC, so we typically filter by PWSTATE2 (=36) and PWCOUNTY (5, 47, 61, 81, 85).
When I checked the unweighted count by sex for OCCSOC=113111 using PWSTATE2 and PWCOUNTY, the count was 22, and all were female. Additionally, I checked the unweighted counts by education level, and below is the distribution:
- High School Diploma or Equivalent: 5
- Some College: 2
- Bachelor’s Degree: 7
- Master’s Degree: 6
- Doctoral/Professional Degree: 2
When I applied PERWT to calculate the weighted count, the results were:
- High School Diploma or Equivalent: 142
- Some College: 18
- Bachelor’s Degree: 130
- Master’s Degree: 98
- Doctoral/Professional Degree: 43
Could you confirm whether you obtain the same results using PWSTATE2 and PWCOUNTY? Also, do these results seem typical to you, in particular a female share of 1 with no male respondents? According to other public BLS data and previous ACS data, a Bachelor’s degree should be the most prominent category for 113111.
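To make the comparison between the two sets of counts concrete, here is a small Python sketch (not the R code used for the analysis) that turns the unweighted and PERWT-weighted counts listed above into shares. The dictionary and function names are only illustrative.

```python
# Counts copied from the lists above (OCCSOC 113111, PWSTATE2/PWCOUNTY filter).
unweighted = {
    "High School Diploma or Equivalent": 5,
    "Some College": 2,
    "Bachelor's Degree": 7,
    "Master's Degree": 6,
    "Doctoral/Professional Degree": 2,
}
weighted = {
    "High School Diploma or Equivalent": 142,
    "Some College": 18,
    "Bachelor's Degree": 130,
    "Master's Degree": 98,
    "Doctoral/Professional Degree": 43,
}

def shares(counts):
    """Convert a dict of counts into a dict of proportions."""
    total = sum(counts.values())
    return {level: n / total for level, n in counts.items()}

unw_share = shares(unweighted)
w_share = shares(weighted)

for level in unweighted:
    # Average person weight carried by each respondent in this category.
    mean_weight = weighted[level] / unweighted[level]
    print(f"{level:<35} n={unweighted[level]:>2} "
          f"unweighted={unw_share[level]:.1%} weighted={w_share[level]:.1%} "
          f"mean PERWT={mean_weight:.1f}")

top_unweighted = max(unweighted, key=unweighted.get)
top_weighted = max(weighted, key=weighted.get)
print("largest unweighted category:", top_unweighted)
print("largest weighted category:", top_weighted)
```

In the unweighted counts, Bachelor’s Degree is the largest category (7 of 22). High School Diploma only moves ahead after weighting, because its 5 respondents carry an average weight of 28.4 (142/5), compared with about 18.6 (130/7) for the 7 Bachelor’s respondents.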
I created the same tabulations but defined NYC as PWSTATE2=36 & PWCOUNTY=(5 or 47 or 61 or 81 or 85). In the 2022 5-year ACS sample, I see 28 persons meeting those criteria, of which 26 are female. In terms of education, 5 persons have completed grade 12, 2 have completed 1 year of college, 1 has completed 2 years of college, 10 have completed 4 years of college, and 10 have completed 5+ years of college. The sex and education composition of these workers is similar in the 2021 5-year ACS.
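For readers who want to reproduce this kind of tabulation, the filter logic can be sketched as below. This is a hedged illustration in plain Python, not the code actually used. The rows in `records` are hypothetical toy data rather than real ACS microdata, OCCSOC is assumed to be stored as a string, and SEX is assumed to follow the IPUMS coding of 1 = male, 2 = female.

```python
from collections import Counter

# Place-of-work county codes for the five NYC boroughs (used with PWSTATE2 == 36).
NYC_PWCOUNTY = {5, 47, 61, 81, 85}

# Hypothetical toy records for illustration only; NOT real ACS data.
records = [
    {"OCCSOC": "113111", "PWSTATE2": 36, "PWCOUNTY": 61, "SEX": 2, "EDUC": 10, "PERWT": 20},
    {"OCCSOC": "113111", "PWSTATE2": 36, "PWCOUNTY": 47, "SEX": 2, "EDUC": 11, "PERWT": 15},
    {"OCCSOC": "113111", "PWSTATE2": 36, "PWCOUNTY": 81, "SEX": 1, "EDUC": 6, "PERWT": 30},
    # Works outside New York state, so excluded by the place-of-work filter.
    {"OCCSOC": "113111", "PWSTATE2": 34, "PWCOUNTY": 13, "SEX": 1, "EDUC": 10, "PERWT": 25},
    # Different occupation, so excluded by the OCCSOC filter.
    {"OCCSOC": "132011", "PWSTATE2": 36, "PWCOUNTY": 61, "SEX": 2, "EDUC": 10, "PERWT": 18},
]

def works_in_nyc(row):
    """True when the place of work is one of the five NYC counties."""
    return row["PWSTATE2"] == 36 and row["PWCOUNTY"] in NYC_PWCOUNTY

selected = [r for r in records if r["OCCSOC"] == "113111" and works_in_nyc(r)]

# Unweighted frequencies count respondents; weighted frequencies sum PERWT.
unweighted_by_sex = Counter(r["SEX"] for r in selected)
weighted_by_sex = Counter()
for r in selected:
    weighted_by_sex[r["SEX"]] += r["PERWT"]

print("unweighted by SEX:", dict(unweighted_by_sex))
print("weighted by SEX:", dict(weighted_by_sex))
```

Printing both the unweighted and the weighted tabulation side by side makes it easier to see how many actual respondents stand behind each weighted estimate.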
However, a sample size of 22 or 28 cases is simply too small to use to draw meaningful conclusions about a population. In other words, I would not feel confident making any claims about the sex composition or educational attainment of these workers in New York City based on these sample sizes. In general, there is no bright-line rule regarding “too small to study,” but double-digit sample sizes are usually regarded by statisticians as too small. In practice, the sampling error around estimated statistics will be relatively large, which limits any informative interpretation of the data.
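As a rough illustration of how large that sampling error can be, the sketch below computes a 95% Wilson score interval for the share of the 28 cases who completed 4 years of college (10 of 28). It assumes simple random sampling and ignores person weights, which is a simplification: the ACS has a complex sample design, and proper standard errors would normally rely on replicate weights. The function name is illustrative.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (simple random sampling assumed)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# 10 of the 28 unweighted cases completed 4 years of college.
low, high = wilson_interval(10, 28)
print(f"point estimate: {10 / 28:.1%}")
print(f"approx. 95% interval: {low:.1%} to {high:.1%}")
```

Even under these simplified assumptions, the interval runs from roughly 21% to 54%, which is far too wide to say whether this education category is more or less common than in an earlier sample.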
One way to increase the sample size of your estimates is to pool data from multiple geographic areas together; perhaps you could study workers in all of New York state or multiple metro areas or cities. You could also pool multiple occupation codes together; there may be similar occupations that you could analyze as a larger group. Alternatively, if you are only interested in this specific occupation in New York City, you could seek out administrative data sources that include data on a very large number of employees or taxpayers.