Does a 'revised' extract include everything that the original did?


I used the ‘revise’ function to expand the geographies covered by my original extract (IPUMS-USA) from just certain cities to those cities plus the metro areas around them. In my revision, I marked the desired metros as “special cases” and made sure all the cities were still marked as “special cases.” But when my revised extract was finished, it was about half the size of the original (81,000 KB vs. 145,000 KB after unzipping the GZ files). I expected the revised file to be bigger, not smaller. Or am I supposed to merge this revised extract with the original extract? Thank you!



Yes, revising an extract retains all of the original samples and variables. However, when you select cases on both cities and metro areas, your data will include only the cases that meet both specifications. Your first extract selected cases within certain cities; the revised extract contains only cases that are within those cities AND within the chosen metropolitan areas, which is more restrictive. That is why your revised extract is much smaller.
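To see why adding a second selection variable shrinks the extract, here is a toy sketch of the two ways criteria can combine. The records and codes are made up, and CITY and MET2013 are used as the city and metro variables for illustration; case selection across variables keeps a record only if it matches every variable (AND), whereas what was expected was the union (OR).

```python
# Toy person records; CITY and MET2013 codes are hypothetical placeholders,
# with 0 standing in for "not coded to a selected city/metro".
records = [
    {"id": 1, "CITY": 101, "MET2013": 500},  # in a selected city and a selected metro
    {"id": 2, "CITY": 0,   "MET2013": 500},  # suburb: in a selected metro, not a selected city
    {"id": 3, "CITY": 102, "MET2013": 0},    # in a selected city, not coded to a selected metro
    {"id": 4, "CITY": 0,   "MET2013": 0},    # neither
]

selected_cities = {101, 102}
selected_metros = {500}

def select_all(rows, criteria):
    """Keep rows that match EVERY variable's selected values (logical AND)."""
    return [r for r in rows if all(r[var] in vals for var, vals in criteria.items())]

def select_any(rows, criteria):
    """Keep rows that match AT LEAST ONE variable's selected values (logical OR)."""
    return [r for r in rows if any(r[var] in vals for var, vals in criteria.items())]

cities_only = select_all(records, {"CITY": selected_cities})
cities_and_metros = select_all(records, {"CITY": selected_cities, "MET2013": selected_metros})
cities_or_metros = select_any(records, {"CITY": selected_cities, "MET2013": selected_metros})

print(len(cities_only), len(cities_and_metros), len(cities_or_metros))  # 2 1 3
```

The AND result (1 record) is smaller than the cities-only result (2 records), even though the OR result (3 records) is what "cities plus metros" was meant to capture.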

To keep the initial extract a manageable size without running into this restriction, I recommend selecting cases by state using STATEFIP only. Then, once the data are read into a statistical package, you can narrow the data down to the specific cities and metro areas.
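A minimal sketch of that second step, assuming the state-selected extract was downloaded in CSV format and uses CITY and MET2013 as its city and metro variables (your extract may use a different metro variable). The inline data and the selected codes are hypothetical placeholders; in practice you would read the file IPUMS provides and look up the real codes in the variable documentation.

```python
import csv
import io

# Stand-in for an extract selected by STATEFIP only; all codes are hypothetical.
extract_csv = """STATEFIP,CITY,MET2013
6,101,500
6,0,500
6,0,0
36,102,0
"""

# Hypothetical codes for the cities and metro areas of interest (as strings,
# because csv.DictReader returns every field as a string).
SELECTED_CITIES = {"101", "102"}
SELECTED_METROS = {"500"}

rows = list(csv.DictReader(io.StringIO(extract_csv)))

# Keep a person if they live in a selected city OR a selected metro area,
# i.e. the union the revised extract was expected to contain.
subset = [
    r for r in rows
    if r["CITY"] in SELECTED_CITIES or r["MET2013"] in SELECTED_METROS
]

print(len(rows), len(subset))  # 4 3
```

Filtering after loading lets you combine the geographic conditions with OR, which the extract's case selection does not do across variables.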

Hope this helps!



Thanks, that makes sense.