Code for cleaning original NSF NSCG Data

I recently downloaded the NSCG 2015 and 2017 files from NSF and noticed that they differ somewhat from the IPUMS versions of earlier NSCG data. Is there any prewritten code (ideally in Stata) for quickly making these data sets comparable? Also, when will the NSCG 2015 and 2017 data be released on IPUMS? Thanks.

Unfortunately, the grant that funded the existing IPUMS Higher Ed project is no longer active, so at this time we do not have plans to integrate the 2015 and 2017 NSCG samples into IPUMS Higher Ed. Also, the IPUMS harmonization process is not executed in Stata or any other statistical software, so we do not have any pre-written code that harmonizes these data.

Hello, I have a similar question. In the original NSCG dataset there are many variables that are present in every year of the survey but are not present in the IPUMS extracts. For instance, variables related to location of education (e.g. MRRGNX) do not show up for any year when only the NSCG sample is selected. Is this intended? Is there any way to create an extract with only NSCG data that includes all the variables?

While IPUMS Higher Ed provides data from three surveys (the National Survey of College Graduates (NSCG), the National Survey of Recent College Graduates (NSRCG), and the Survey of Doctorate Recipients (SDR)), we only release the SESTAT (Scientists and Engineers Statistical Data System) subsamples of the NSCG and the NSRCG. This subsample contains fewer respondents, including only those with science or engineering degrees or occupations, and also includes fewer variables. For example, MRRGN is available only as a restricted-use variable in the NSCG SESTAT sample.

You can merge respondents from an IPUMS Higher Ed data extract with the full NSCG file containing MRRGN and any other variables of interest (see the public use files page) using REFID.
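For anyone who wants to see the shape of that merge, here is a minimal sketch in plain Python. It assumes REFID uniquely identifies a respondent within a single survey year, which is worth checking on your own files. The two small CSV snippets stand in for an IPUMS Higher Ed extract and the full NSCG public use file; every value in them, and the YEAR and SALARY columns, are made up for illustration. The function name `merge_on_refid` is likewise hypothetical.

```python
import csv
import io

# Made-up stand-ins for the two files; real data would be read from disk.
IPUMS_EXTRACT = """REFID,YEAR,SALARY
1001,2013,52000
1002,2013,61000
1003,2013,47000
"""

NSCG_FULL = """REFID,MRRGN
1001,3
1002,7
1004,5
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_on_refid(extract_rows, full_rows, keep_vars):
    """Attach keep_vars from full_rows to each extract row, matching on REFID.

    Assumes REFID is unique in full_rows (one survey year); raises otherwise.
    Extract rows without a match keep an empty string for the added variables.
    """
    lookup = {}
    for row in full_rows:
        refid = row["REFID"]
        if refid in lookup:
            raise ValueError(f"duplicate REFID in full file: {refid}")
        lookup[refid] = row

    merged = []
    for row in extract_rows:
        match = lookup.get(row["REFID"])
        out = dict(row)
        for var in keep_vars:
            out[var] = match[var] if match is not None else ""
        merged.append(out)
    return merged


merged = merge_on_refid(read_rows(IPUMS_EXTRACT), read_rows(NSCG_FULL), ["MRRGN"])
for row in merged:
    print(row)
```

The sketch keeps every respondent from the extract and leaves MRRGN blank where the full file has no matching REFID, so unmatched cases are easy to spot. In Stata, the equivalent step would be a one-to-one `merge` on REFID, under the same uniqueness assumption.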

Thank you for the clarification, Ivan!