Hello, I'm new to the website and I'm working on a migration analysis. I used the Analyze data section, selected the 1980 sample and the BPLD variable (detailed birthplace), and got a number. Then I used the Browse and select data section to download the same information in CSV format. I followed the whole procedure, but to my surprise the values didn't match the ones from the online analysis.
I'd appreciate some help explaining that.
Did you use weights for your analysis when you downloaded the data? You need to weight your analysis using PERWT in order to replicate the results in the online analysis system. If you did, can you please post the number you found from both the online system and the extract that you downloaded?
Sure. I multiplied PERWT by PERNUM and then summed the results to get the total population of the USA, so I could check it against the number in the Analyze data section. This way I got 32.361.600 instead of 226.862.400, which is the number in the Analyze data section and looks more reasonable for the total US population.
My question is whether that multiplication is correct. Also, I just noticed that when I open the file, Excel tells me it couldn't load all the data because it exceeds the limit of 1.048.576 rows. Any idea how I can solve this?
You shouldn’t multiply PERWT by PERNUM. PERNUM only numbers the persons within each household, so it is not a count. To check the population total, just sum PERWT. To get a weighted estimate (for example, average age), multiply the weight by the variable of interest, for example PERWT*AGE. I’ll also note that when calculating averages in Excel, you’ll need to divide by the population total (the sum of the weights): the weighted average is the sum of PERWT*AGE divided by the sum of PERWT. Statistical software packages will do this for you automatically.
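As a rough illustration of the arithmetic described above, here is a minimal Python sketch. The three records and their values are invented for the example and are not IPUMS data; the helper names weighted_total and weighted_mean are hypothetical. Only the column names PERWT, PERNUM, and AGE come from the thread.

```python
# Toy person records; the values are invented for illustration only.
records = [
    {"PERNUM": 1, "PERWT": 20.0, "AGE": 30},
    {"PERNUM": 2, "PERWT": 20.0, "AGE": 10},
    {"PERNUM": 1, "PERWT": 60.0, "AGE": 50},
]


def weighted_total(rows, weight="PERWT"):
    """Population estimate: the plain sum of the person weights."""
    return sum(r[weight] for r in rows)


def weighted_mean(rows, var, weight="PERWT"):
    """Weighted average: sum(weight * value) divided by sum(weight)."""
    return sum(r[weight] * r[var] for r in rows) / weighted_total(rows, weight)


population = weighted_total(records)      # sum of PERWT
mean_age = weighted_mean(records, "AGE")  # sum of PERWT*AGE / sum of PERWT

# The mistaken calculation from the question, shown only for contrast:
# multiplying by PERNUM distorts the total because PERNUM is not a count.
wrong_total = sum(r["PERWT"] * r["PERNUM"] for r in records)

print(population, mean_age, wrong_total)
```

With these toy values, the sum of PERWT and the PERWT*PERNUM sum differ, which is the same kind of mismatch as in the question.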
I think your population total is so small because you are only able to load about 1/10 of the dataset, so only about 10% of the people are being counted. There are a number of ways to overcome this limitation. Here are a few:
- Use a statistical package such as R (free) or Stata instead of Excel.
- Use PowerPivot in Excel (this turns Excel into a full-featured database and can load much more data).
- Take a random sample of the data when making your extract in IPUMS. More details on this are available here, and I’ve also attached a screenshot showing where you can choose this. For your particular sample, I think a density of .45 would be sufficiently small.
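For readers who prefer a scripting route, the same idea as the first option (reading the file outside Excel) can be sketched in Python, which is swapped in here for R or Stata. This is only a sketch under assumptions: the function name weighted_counts is hypothetical, the extract is assumed to be a CSV with PERWT and BPLD columns, and the tiny in-memory file and its birthplace codes are placeholders so the example runs on its own. Reading row by row avoids any row limit because the whole file is never held in memory.

```python
import csv
import io
from collections import defaultdict


def weighted_counts(lines, group="BPLD", weight="PERWT"):
    """Stream CSV rows and sum the weight column within each group value."""
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row[group]] += float(row[weight])
    return dict(totals)


# Placeholder data standing in for a real extract (values are invented).
sample = io.StringIO(
    "PERNUM,PERWT,BPLD\n"
    "1,20,100\n"
    "2,20,100\n"
    "1,60,20000\n"
)

counts = weighted_counts(sample)
total_population = sum(counts.values())
print(counts, total_population)

# For a real file, pass an open file handle instead, for example:
#   with open("extract.csv", newline="") as f:   # hypothetical file name
#       counts = weighted_counts(f)
```

The weighted counts by birthplace are the figures to compare against the online analysis, and their sum should reproduce the population total.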
Thanks for your support; I couldn't have done it without your help. I understood your point about the PERWT variable and used the Power Pivot tool, and the data finally matched.