Table of average wages at ages 30 and 60 by industry


I am trying to make a table that shows the average wage of 30-year-olds and the average wage of 60-year-olds across industries. I made a variable wageat30 = hourwage if (age eq 30) and did the same for 60. When I make a table with wageat30 as the row variable and industry as the column variable, it has too many cells. I just want the AVERAGE wage for 30-year-olds in each industry, to compare with the average wage for 60-year-olds. Any ideas?


You might consider making industry your row variable instead of your column variable; statistical packages often allow larger tables when the additional cells can be displayed vertically. Otherwise, you might consider collapsing industries into broader groups to reduce the number of cells in your table. The industry codelists available through IPUMS USA (e.g., 2013-2017) or the Census Bureau may offer useful guideposts for aggregating codes into meaningful groups.

Thanks, Kari, I will give that a shot! One other quick question: I realized that the variables I am creating for wage at a certain age (30, 60) are coming out with exactly the same numbers.

Right now I am generating a variable like this:
wageat30 = hourwage if (age eq 30)
wageat60 = hourwage if (age eq 60)

Do you have any idea why they might be coming out the same?

Thanks again for your help. I am not a quant scholar and you are saving me!


You don’t need to create a new variable to do this; instead, you can use the “Comparison of Means” program in the online tabulator by clicking the “Means” tab. If you enter HOURWAGE as the dependent variable, each cell of the table will contain the mean value of HOURWAGE for the subpopulation defined by your row and column variables (age and industry, based on your original question). You can limit the size of the table by using the “Selection Filter(s)” field to include only persons of certain ages (e.g., entering “age(30,60)”).
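As a rough illustration of what each cell of that table holds, here is a minimal Python sketch (not SDA's own code) that averages an hourly wage within every industry-by-age cell after keeping only ages 30 and 60. The records and the industry labels "A" and "B" are made up, and the means are unweighted to keep the example short.

```python
from collections import defaultdict

# Hypothetical person records: (industry, age, hourly wage).
# Industry labels "A" and "B" and all wages are made up for illustration.
records = [
    ("A", 30, 18.50),
    ("A", 30, 21.50),
    ("A", 60, 30.00),
    ("B", 30, 12.00),
    ("B", 60, 15.00),
    ("B", 60, 17.00),
    ("B", 45, 14.00),  # dropped by the age filter below
]


def mean_table(rows, ages=(30, 60)):
    """Unweighted mean wage for each (industry, age) cell, keeping only the given ages."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for industry, age, wage in rows:
        if age not in ages:  # similar in spirit to an "age(30,60)" selection filter
            continue
        sums[(industry, age)] += wage
        counts[(industry, age)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


table = mean_table(records)
# Industry as rows, ages 30 and 60 as columns (None marks an empty cell).
for industry in sorted({ind for ind, _ in table}):
    print(industry, [table.get((industry, age)) for age in (30, 60)])
```

Because each cell's mean is computed only from the people who fall in that cell, separate wageat30 and wageat60 variables are not needed.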

I do want to share a few other comments on using the tabulator and CPS data.

First, for analyses using HOURWAGE, you should weight with EARNWT. Second, you may want to apply a selection filter to exclude missing and not-in-universe cases for HOURWAGE (e.g., entering “hourwage(0-998)”). Third, values of HOURWAGE are not adjusted for inflation; if you are pooling multiple years of data, you should adjust them with Consumer Price Index (CPI) adjustment factors.
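The weighting, filtering, and inflation points can be combined in one calculation. This is a minimal Python sketch under stated assumptions: the records are hypothetical, 999.99 stands in for a missing/not-in-universe HOURWAGE code (check the codebook for the codes that actually apply), and the CPI factors are placeholders rather than published values.

```python
# Hypothetical records: (survey year, HOURWAGE, EARNWT).
# 999.99 is used here as a stand-in for a missing/not-in-universe code;
# check the HOURWAGE codebook for the codes that actually apply.
records = [
    (2015, 20.00, 1500.0),
    (2015, 999.99, 1200.0),  # excluded, like an "hourwage(0-998)" filter
    (2018, 25.00, 500.0),
]

# Placeholder CPI adjustment factors that convert each year's dollars into a
# common base year; substitute factors from an actual CPI table.
CPI_FACTOR = {2015: 1.10, 2018: 1.00}


def weighted_mean_wage(rows, cpi_factor):
    """EARNWT-weighted mean of inflation-adjusted HOURWAGE, skipping missing codes."""
    weighted_total = 0.0
    weight_sum = 0.0
    for year, wage, earnwt in rows:
        if not 0 <= wage <= 998:  # drop missing / not-in-universe values
            continue
        weighted_total += wage * cpi_factor[year] * earnwt
        weight_sum += earnwt
    return weighted_total / weight_sum if weight_sum else float("nan")


print(round(weighted_mean_wage(records, CPI_FACTOR), 2))
```

With a single year of data, setting that year's factor to 1.0 leaves wages unchanged, so the result is simply the EARNWT-weighted mean.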

Another reason to consider using a single year of data is to avoid breaks in the industry coding scheme. Changes over time in how industry is classified and coded limit the comparability of the variable IND across years of the CPS; more information on these changes to industry classifications is available from the Census Bureau.

Finally, I would encourage you to look at the unweighted counts in your table cells; some of them may be quite small, which increases the sampling error around those estimates and limits how well you can interpret them. A common way around this is to pool multiple years of data, but there are other approaches too. As I noted in my earlier response, you might consider collapsing industries into fewer categories using the recoding functionality of the SDA tabulator. Besides reducing the size of your table, this also increases the unweighted count in each cell. You might also consider slightly wider age ranges rather than exactly 30 and 60 years old to address small cell sizes.
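A minimal Python sketch of the last two suggestions: collapsing industries, widening the age ranges, and then counting unweighted cases per cell. The industry codes, the grouping, the 28-32 and 58-62 bands, and the minimum-count threshold are all hypothetical choices for illustration, not IPUMS or Census Bureau conventions.

```python
from collections import Counter

# Hypothetical mapping from detailed industry codes to broader groups.
# A real grouping would follow an IPUMS or Census Bureau industry codelist.
INDUSTRY_GROUP = {101: "Manufacturing", 102: "Manufacturing", 201: "Retail", 202: "Retail"}

# Hypothetical age bands around 30 and 60 instead of single years of age.
AGE_BANDS = {"28-32": range(28, 33), "58-62": range(58, 63)}

# Hypothetical threshold for flagging small cells; choose one suited to your analysis.
MIN_CELL = 30


def age_band(age):
    """Return the band label containing this age, or None if it is in no band."""
    for label, ages in AGE_BANDS.items():
        if age in ages:
            return label
    return None


def unweighted_counts(rows):
    """Count persons (no weights) in each (industry group, age band) cell."""
    counts = Counter()
    for industry, age in rows:
        band = age_band(age)
        group = INDUSTRY_GROUP.get(industry)
        if band is None or group is None:
            continue  # outside the age bands or the industry grouping
        counts[(group, band)] += 1
    return counts


# Hypothetical records: (detailed industry code, age).
records = [(101, 29), (102, 31), (201, 60), (202, 45), (101, 62)]
counts = unweighted_counts(records)
small_cells = [cell for cell, n in counts.items() if n < MIN_CELL]
print(counts)
print("Cells below threshold:", small_cells)
```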