Can I aggregate 1981 Census data from Spain to the municipality level?

I am interested in aggregate, municipality-level data, but the INE (National Statistics Institute) in Spain does not provide it for the 1981 Census. I thought one work-around could be to aggregate the microdata to the municipality level to serve as a rough approximation of the data I need. I have a few questions about this approach.

  1. As I understand it, the microdata is a 5% sample, meaning that I should have at least around 1,000 respondents for municipalities with populations over 20,000. Would the resulting aggregate data be representative of each municipality (if I apply the appropriate weights), or is treating the microdata in this way statistically inappropriate?

  2. Data for residents of municipalities with populations under 20,000 are reported at the province level (larger area than municipality). I suppose I am not guaranteed any sort of sample size, but could I use the data from respondents in these smaller municipalities to create aggregate data for municipality-groups by province? Is there a minimum number of respondents that I should have before proceeding with this approach?

  3. I read that the provinces of Alava, Guipuzcoa, Navarra, and Vizcaya were over-sampled at roughly 5 times the rate of other provinces. Does this over-sampling affect my first two points in any way?

  4. Is there anything else I should be aware of that could affect whether or not this proposed approach to create aggregate, municipality-level data is appropriate?

Yes, it is appropriate to use the 5% microdata to produce aggregate statistics for municipalities, provided that you apply the weights, compute standard errors, and take those standard errors into account when interpreting the estimates.
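As a rough illustration of what "weights plus standard errors" can look like in practice, here is a minimal Python sketch. It is not the INE's or any archive's official procedure. The record layout (municipality code, weight, 0/1 indicator), the code "muni_A", and the weight of 20 are all made up for the example. The standard error uses a simple approximation based on the Kish effective sample size; it assumes respondents are independent and ignores any household clustering or finite-population correction, so the true standard errors may differ.

```python
from collections import defaultdict
from math import sqrt


def weighted_share_by_group(records):
    """Weighted share of a 0/1 characteristic per group, with an approximate SE.

    records: iterable of (group_code, weight, indicator) tuples.
    These field names are hypothetical, not the names used in the microdata file.
    """
    # Per group: sum of weights, sum of weight*indicator, sum of squared weights, count
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for group, w, y in records:
        s = sums[group]
        s[0] += w
        s[1] += w * y
        s[2] += w * w
        s[3] += 1

    out = {}
    for group, (sw, swy, sw2, n) in sums.items():
        p = swy / sw                     # weighted share
        n_eff = sw * sw / sw2            # Kish effective sample size
        se = sqrt(p * (1 - p) / n_eff)   # assumes independent respondents
        out[group] = {"n": n, "n_eff": n_eff, "estimate": p, "se": se}
    return out


# Made-up records: four respondents in one municipality, each with weight 20.
records = [
    ("muni_A", 20, 1),
    ("muni_A", 20, 0),
    ("muni_A", 20, 1),
    ("muni_A", 20, 0),
]
result = weighted_share_by_group(records)
print(result["muni_A"])
```

With only four respondents the standard error is large relative to the estimate, which is the point of reporting it: the same calculation on a municipality with around 1,000 respondents gives a much tighter interval.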

Note, however, that municipalities with fewer than 20,000 inhabitants are not identified individually: their codes have been grouped within provinces to protect confidentiality. The grouped code in each province therefore covers the remaining, unidentified municipalities of that same province.

Weighting will take care of any over-sampling.
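To see why, here is a small Python sketch with made-up numbers. It assumes two areas of equal population, one sampled at 25% (weight 4) and one at 5% (weight 20), which mirrors the "roughly 5 times" over-sampling mentioned in the question. The actual weights should be taken from the weight variable supplied with the microdata, not from these illustrative values.

```python
def weighted_mean(values, weights):
    """Weighted mean of values; weights are the sampling (expansion) weights."""
    total_w = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_w


# Hypothetical data: 5 respondents from an over-sampled area (weight 4), all with y = 1,
# and 1 respondent from a normally sampled area (weight 20) with y = 0.
values = [1, 1, 1, 1, 1, 0]
weights = [4, 4, 4, 4, 4, 20]

unweighted = sum(values) / len(values)     # over-represents the over-sampled area
weighted = weighted_mean(values, weights)  # each area counts in proportion to its population

print(unweighted, weighted)
```

The unweighted share is pulled toward the over-sampled area, while the weighted share gives both areas equal influence. If every respondent within a single municipality carries the same weight, the weighted and unweighted shares for that municipality coincide, so the over-sampling mainly matters when pooling respondents across provinces.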

Question 4 cannot be answered without more details. It is more a matter of general statistics and sampling.