Hello all,
I am here for the first time. I tried my best to find an answer to my question here on the forum and on Google.
I would like to use decennial data from 1970 to 2018 (before COVID), aggregated at the state level, for my analysis.
The problem I am facing is how to merge (append) the datasets for each of these years, and which samples to pick:
1970 1% state (2 forms)
1980 5% state
1990 5% state
2000 ACS 1% or 5% ?
2010 ACS 1% or 5%
2018 ACS 1% or 5%
Or is there alternative data that I could use?
I would appreciate any advice. Thank you.
I forgot to mention that I am interested in employment status and related person-level variables for detailed occupations.
Hello Leulseged,
The simplest way to append all of these years into one dataset is to add them all into a single data extract on IPUMS. This will produce a single dataset containing records from all years. Please note that there is no way to link individuals between censuses in these years.
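Once a pooled extract like this is downloaded, state-level aggregates can be computed per year straight from the person records. The following is only a minimal sketch, assuming the extract is a CSV containing the IPUMS variables YEAR, STATEFIP, PERWT and EMPSTAT, and assuming EMPSTAT codes 1 (employed) and 2 (unemployed); please check the codes against the variable documentation. The rows are invented toy data, and `state_unemployment_rates` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Toy stand-in for a pooled extract; all values are invented for illustration.
TOY_EXTRACT = """YEAR,STATEFIP,PERWT,EMPSTAT
1980,27,20,1
1980,27,20,2
1980,27,20,3
2010,27,100,1
2010,27,100,1
2010,27,50,2
2010,55,100,1
"""


def state_unemployment_rates(csv_file):
    """Weighted unemployment rate per (YEAR, STATEFIP) among the labor force."""
    employed = defaultdict(float)
    unemployed = defaultdict(float)
    for row in csv.DictReader(csv_file):
        key = (int(row["YEAR"]), int(row["STATEFIP"]))
        weight = float(row["PERWT"])    # person weight
        status = int(row["EMPSTAT"])    # assumed coding: 1 employed, 2 unemployed
        if status == 1:
            employed[key] += weight
        elif status == 2:
            unemployed[key] += weight
        # Any other code (not in labor force, N/A) is left out of the rate.

    rates = {}
    for key in set(employed) | set(unemployed):
        labor_force = employed[key] + unemployed[key]
        if labor_force > 0:
            rates[key] = unemployed[key] / labor_force
    return rates


rates = state_unemployment_rates(io.StringIO(TOY_EXTRACT))
for (year, state), rate in sorted(rates.items()):
    print(year, state, round(rate, 3))
```

With a real extract, `io.StringIO(TOY_EXTRACT)` would be replaced by an open file handle. Because every record carries its own YEAR, no separate append step is needed.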
Which datasets to use depends on the analysis you are interested in conducting. You can find descriptions of samples at this page.
Regarding the 1970 sample, I recommend adding both samples to your data cart (by choosing Change Samples from the extract builder page). Then, when you are browsing variables, you will be able to see which of the two 1970 state samples contains each variable, and choose the sample that has the variables you are interested in.
For 2000, I would generally recommend using the 5% sample from the 2000 census instead of the ACS data. That is because 2000 was the first year of the ACS: it had a smaller sample size and was still in an experimental phase.
For 2010, the 1% ACS data uses data from only one year (2010). The sample size is around 3 million. The 5% sample combines data from 2006-2010, and has a sample size closer to 15 million. The 5% dataset gives more precise estimates because of the larger sample size, but it is less representative of the specific year 2010 because it also includes other years. You’ll need to decide which to use considering this tradeoff. The same applies to the 2018 ACS samples.
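To put a rough number on the precision side of this tradeoff, here is a back-of-the-envelope sketch. It assumes simple random sampling, which ignores the complex survey design (so real standard errors will differ), and it uses a hypothetical proportion of 5%. The sample sizes are the approximate figures quoted above.

```python
import math


def se_proportion(p, n):
    """Standard error of a proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)


p = 0.05                    # hypothetical proportion, for illustration only
n_one_year = 3_000_000      # approximate size of the 1% (single-year) sample
n_five_year = 15_000_000    # approximate size of the 5% (2006-2010) sample

se_1 = se_proportion(p, n_one_year)
se_5 = se_proportion(p, n_five_year)
ratio = se_1 / se_5         # equals sqrt(15 / 3) = sqrt(5)

print(f"{ratio:.3f}")       # prints 2.236
```

Under these assumptions the single-year standard error is about 2.2 times larger. The same ratio holds within any subgroup that is sampled at the same rates, which matters more for small cells such as a detailed occupation within one state.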
Hi Mathew,
Thank you very much for your quick and detailed reply, it is quite helpful.
I am more or less clear on how to go about it. The choice of 1970 sample depends on which variables I am interested in, as you explained well.
In 2000, when you said “…using the 5% sample from the 2000 census…”, I’m assuming you’re referring to the 5% sample in the IPUMS sample selection.
For 2010 and 2018, I can probably use both the 1% ACS and the 5% sample to check how sensitive the results are to the choice of sample.
Concerning linking individuals, thank you for clarifying this issue. I believe, however, that I can link occupations (and hence labour market variables for each occupation) using crosswalks. Am I right?
Thank you once again for your advice.
Yes, for 2000 the “5%” sample comes from the 2000 census.
Regarding occupations, yes, you can create a consistent series of statistics by occupation. For this purpose, I would recommend using one of the harmonized occupation variables created by IPUMS, most likely OCC1990. Please see the documentation for that variable to understand how it was created. You can read more about the IPUMS harmonized occupation variables here.
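As an illustration of what such a series could look like, here is a small sketch that sums person weights of employed people by OCC1990 and YEAR. It assumes EMPSTAT code 1 means employed. The occupation codes (95, 175) and all records are placeholder toy values, not a statement about what those codes mean, and `employment_series` is a hypothetical helper name.

```python
from collections import defaultdict

# Toy person records: (YEAR, OCC1990, PERWT, EMPSTAT). Values are invented.
TOY_RECORDS = [
    (1980, 95, 20.0, 1),
    (1980, 95, 20.0, 1),
    (1980, 175, 20.0, 1),
    (2010, 95, 100.0, 1),
    (2010, 95, 100.0, 2),   # unemployed, so excluded below
    (2010, 175, 100.0, 1),
]


def employment_series(records):
    """Weighted count of employed persons, as {occupation: {year: count}}."""
    series = defaultdict(dict)
    for year, occ, weight, empstat in records:
        if empstat != 1:    # assumed coding: 1 = employed
            continue
        series[occ][year] = series[occ].get(year, 0.0) + weight
    # Sort each occupation's years so the series reads chronologically.
    return {occ: dict(sorted(by_year.items())) for occ, by_year in series.items()}


series = employment_series(TOY_RECORDS)
for occ, by_year in sorted(series.items()):
    print(occ, by_year)
```

Since OCC1990 uses one classification across years, the same code can be used as the grouping key in every sample without a separate crosswalk step in the code.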
Thank you very much. I will go through the occupation crosswalk documentation.
Thank you once again for your swift replies.