Share of Foreign Born Workers


Thank you very much for the wonderful databases. I’ve been doing some work with the IPUMS USA and CPS data, using RStudio. I wanted to hear your advice on how best to proceed in calculating the share of foreign born workers. My goal is to find the share of foreign born workers (CITIZEN = 3) who worked full time (35+ hours), by year and also by NAICS economic sector and industry. Eventually I’d like to do the same with SOC occupation, but I’ll hold off on that for now.

I just would like to make sure I understand what variables to use (I’m a bit unsure about which weights to use) and just how to proceed.

Thank you in advance for your consideration.

I suggest looking at the following variables, which are present in both IPUMS USA and IPUMS CPS unless otherwise specified:

IND1990 (CPS)
ASEC industry and occupation variables referring to the previous calendar year (ASEC only)

I’ll refer you to the respective variable description pages for details of each of these variables. The various industry and occupation variables contain either the raw codes (IND and OCC) or harmonized codes using various coding schemes (IND1950, OCC1990, etc.).

For most person-level analyses using IPUMS USA, the proper weight is PERWT. For CPS basic monthly samples, it is WTFINL. For ASEC samples, it is ASECWT.

Please see the video tutorials for how to create and download an extract. Using R, you’ll need to download the R command file that is generated by the extract system, as well as the DDI metadata file, and use these in conjunction with the package ipumsr to load the data into R.

Hi Matthew,

Thank you very much for your response. I have downloaded and explored many of these variables, though I wasn’t aware of the likes of FULLPART (CPS). My question is more about how I can make calculations using ACS or CPS data that are representative of the general population.

To give you an example, suppose I used the ACS data (IPUMS USA 2000-2019) to calculate the share of foreign born workers (CITIZEN = 3) by YEAR (2000-2019), looking only at individuals who were employed (EMPSTAT = 1), aged 18-65 (AGE), and working full time (UHRSWORK >= 35). I am aware that the sample density is not the same across these years (i.e. between 2000 and 2019). I believe that the sample density from 2005 onwards is 1% (though I’m not entirely sure if that is the case) and that between 2000 and 2004 it ranges between 0.13 and 0.42%. My question is how to properly use the IPUMS weights when calculating the share of foreign born workers by year, and subsequently by other variables like industry and sector. Again, ideally I’d like the calculations to be representative of the general population.

Thanks again for your help!


The weights will account for the differing sample densities. In R, the most common way to include weights in your analysis is the survey package (and its derivative, srvyr). For example, this package includes the svymean() function, which calculates weighted means. Since you’re calculating statistics for a subpopulation, you should use subset() on the survey design object, or svyby(), to define the subsample.
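
As a rough illustration of that workflow, here is a minimal sketch using the survey package on a small made-up data frame that stands in for an IPUMS USA extract. The values are invented, the column name foreign is a hypothetical helper, and the design below uses only PERWT (no strata or clusters), so it is meant to show the weighted point estimates rather than proper standard errors. It follows the definition used in this thread (CITIZEN = 3, employed, age 18-65, 35+ usual hours).

```r
library(survey)

# Toy stand-in for an IPUMS USA extract (values are invented for illustration).
acs <- data.frame(
  YEAR     = c(2000, 2000, 2000, 2000, 2019, 2019, 2019, 2019),
  CITIZEN  = c(3, 0, 0, 3, 3, 0, 0, 0),
  EMPSTAT  = c(1, 1, 1, 3, 1, 1, 1, 1),
  AGE      = c(30, 45, 52, 40, 28, 33, 61, 70),
  UHRSWORK = c(40, 40, 36, 0, 45, 40, 35, 40),
  PERWT    = c(300, 500, 200, 400, 100, 150, 250, 120)
)

# Hypothetical 0/1 indicator, using the thread's definition (CITIZEN == 3).
acs$foreign <- as.numeric(acs$CITIZEN == 3)

# Declare the weights on the full sample first (assumption: weights only,
# no strata or cluster variables).
des <- svydesign(ids = ~1, weights = ~PERWT, data = acs)

# Restrict to the subpopulation by subsetting the design object,
# not the raw data frame.
ft <- subset(des, EMPSTAT == 1 & AGE >= 18 & AGE <= 65 & UHRSWORK >= 35)

# Weighted share by year: the weighted mean of the 0/1 indicator.
share_by_year <- svyby(~foreign, ~YEAR, ft, svymean)
print(share_by_year)
```

The same call with a different grouping formula (for example ~YEAR + IND1990, if that variable is in the extract) would give shares by year and industry.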

The proper method for calculating standard errors and confidence intervals in the ACS and ASEC samples is to use replicate weights. You’ll need to add the variable REPWTP (same for USA and CPS) to your extract, and note that this will greatly increase the size of your extract. Example code for using replicate weights in R can be found on this forum thread.
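
For orientation, a replicate-weight design for ACS data might be set up roughly as follows. This is only a sketch: the data are simulated, the 80 replicate weight columns are random stand-ins named REPWTP1 to REPWTP80, foreign is a hypothetical 0/1 indicator, and the settings (type = "JK1", scale = 4/80) are the ones commonly used for the 80 ACS replicate weights. ASEC samples have a different number of replicate weights, so the scale would need adjusting there.

```r
library(survey)
set.seed(1)

# Simulated stand-in for an ACS extract that includes replicate weights.
n <- 200
acs <- data.frame(
  YEAR    = rep(c(2018, 2019), each = n / 2),
  foreign = rbinom(n, 1, 0.2),          # hypothetical 0/1 indicator
  PERWT   = round(runif(n, 50, 500))
)

# Fake replicate weights: random perturbations of PERWT (illustration only).
rep_wts <- sapply(1:80, function(i) acs$PERWT * runif(n, 0.5, 1.5))
colnames(rep_wts) <- paste0("REPWTP", 1:80)
acs <- cbind(acs, rep_wts)

# Replicate-weight design; the regular expression picks up REPWTP1..REPWTP80.
rep_des <- svrepdesign(
  data       = acs,
  weights    = ~PERWT,
  repweights = "REPWTP[0-9]+",
  type       = "JK1",
  scale      = 4 / 80,
  rscales    = rep(1, 80),
  mse        = TRUE
)

# Weighted share by year with replicate-based standard errors and intervals.
est <- svyby(~foreign, ~YEAR, rep_des, svymean)
print(est)
print(confint(est))
```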

Hi Matthew,

Many thanks for your response and help.