Dear Folks–
I have attached (or tried to; I hope it works) a spreadsheet containing data from the Case Count View of the IPUMS-CPS variable METAREA. If you scroll down the left-hand edge to about row 500 you will see a stacked area graph of METAREA counts divided into four categories: identified metro, unidentified metro, non-metro (which I take to be rural; I am not sure what the N.I.U. label means here, since the number is too big to be an NIU by the standard definition), and missing. These are percentages and add to 100 (and they do – they are not forced to by some spreadsheet operation).
I was struck by the downward spike (in yellow) in the share of “Unidentified Metropolitan” in 1977. Because this is a stacked graph and the top edge of that band does not change much, the spike represents a near-doubling of this share. Actually, it means that the share for rural people (or families?), which is between 29 and 32 percent in the five years preceding and the eight years following 1977, is zero in 1977, and all of that share seems to have been reported as unidentified metropolitan instead.
Since it is clear that the rural population of the U.S. didn’t disappear for a year, this would seem to be an outlier in the strictest sense, and I wonder if you know anything about it.
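For what it is worth, the check I did by eye can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the counts below are made-up placeholders, not the actual IPUMS-CPS case counts, and the category names, the function names, and the 10-point tolerance are my own choices.

```python
from statistics import median

# HYPOTHETICAL case counts per year, for illustration only.
# These are NOT the real IPUMS-CPS METAREA numbers.
counts = {
    1976: {"identified": 500, "unidentified": 180, "nonmetro": 310, "missing": 10},
    1977: {"identified": 500, "unidentified": 490, "nonmetro": 0, "missing": 10},
    1978: {"identified": 510, "unidentified": 175, "nonmetro": 305, "missing": 10},
}


def shares(year_counts):
    """Convert one year's category counts to percentages of that year's total."""
    total = sum(year_counts.values())
    return {cat: 100.0 * n / total for cat, n in year_counts.items()}


def zero_share_years(all_counts, category):
    """Return the years in which the given category has no cases at all."""
    return [y for y in sorted(all_counts) if all_counts[y][category] == 0]


def flag_outliers(all_counts, category, tolerance=10.0):
    """Return years whose share of `category` is more than `tolerance`
    percentage points away from the median share across all years."""
    by_year = {y: shares(c)[category] for y, c in all_counts.items()}
    mid = median(by_year.values())
    return [y for y in sorted(by_year) if abs(by_year[y] - mid) > tolerance]


if __name__ == "__main__":
    print(zero_share_years(counts, "nonmetro"))
    print(flag_outliers(counts, "unidentified"))
```

Because the shares are computed from the raw counts rather than forced, they add to 100 for every year, and a year in which one category drops to zero while another jumps by about the same amount stands out in both checks.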
OK, you don’t have the spreadsheet, and I am pissed about it. It is at odds with your mission as an open-government, open-data nonprofit to limit access and interaction with you to proprietary formats, when there are free, open-source equivalents with substantial market share and superior performance – in this case LibreOffice’s spreadsheet Calc, the OpenOffice spreadsheet (also called Calc), and GNOME’s Gnumeric. All three of these programs can read and write the OASIS OpenDocument Format, which, like the DDI format IPUMS uses, is an open standard set by an international nonprofit consortium. The open-source programs may be a bit less versatile, but, unlike Excel, Calc and Gnumeric consistently get the right answer, and so are more suitable for reproducible research and science more generally.
And they are enormously less likely to harbor macro viruses, so security concerns would suggest that you allow information exchange in the secure free open-source reproducible format and ban interaction through the insecure proprietary monopoly-priced expensive buggy inaccurate occult-coded format. Or at least allow the better format, for gosh sakes. As between Microsoft and LibreOffice’s network of volunteers, which is providing software “for Good, never for Evil”?
Anyway, you can see the anomaly on the METAREA page, Case Count View, Codes 9997 and 9998 for 1977.