# What do Weights Really Mean?

I am using Unweighted Samples to see average income from wages, occupation, and other characteristics of different race/ethnic populations.

I am looking at Hawai’i, and some of the sample sizes are very small when I look at specific characteristics (e.g., Chinese people with a bachelor’s degree in 1930). If the sample is unweighted, can I still use a small sample (fewer than 20 records) to infer that Chinese people with a bachelor’s degree earned X amount?

If each record in my unweighted sample is meant to represent 100 people, and I find 3 Chinese people with very specific characteristics (e.g., work in Sales, have a bachelor’s degree, own a home), do those records really represent 300 Chinese people during that time?

Does the PERWT (person weight) variable only represent how that person’s racial background is weighted in the population, or does it represent all of the characteristics associated with that specific record (their job, their wage, etc.)?

Sampling weights correct for the non-random sampling strategy used in most US Census samples. That is, most samples are not a raw random sample of the US population. Rather, the sample over-samples some types of people (defined by a number of demographic and household type characteristics) in order to ensure that the sample includes enough individuals of a given type to perform valid statistical calculations. This blog post and this documentation page discuss the concept of sampling weights in much more detail. When performing person-level analysis, like you discuss above, it is generally a good idea to use the PERWT variable. Incorporating this variable will correct for the complex sampling strategy used to generate each sample and allows for the calculation of representative population estimates.

The exception to this rule is when analyzing one of the “flat” or unweighted IPUMS samples. Flat IPUMS samples include the 1% samples from 1850-1930, all samples from 1960, 1970, and 1980, the 1% unweighted samples from 1990 and 2000, the 10% 2010 sample, and any of the full count 100% census datasets. In these “flat” samples, the PERWT and HHWT values will be the same for everyone in the sample. So, the use of these variables is only necessary when estimating population counts of individuals who meet a given set of characteristics. More information about the sampling designs of US Census data is available here.
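To illustrate the point about flat samples, here is a minimal Python sketch. The income values are made up for illustration, and `weighted_mean` is a hypothetical helper, not an IPUMS function. It shows that a constant PERWT leaves an average unchanged but is still needed to scale record counts up to population counts.

```python
import math

def weighted_mean(values, weights):
    """Weighted average of values, using weights as PERWT-style person weights."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical wage incomes for five sampled persons (invented numbers).
incwage = [1200, 800, 1500, 950, 1100]

# In a flat 1% sample every record carries the same weight.
perwt = [100] * len(incwage)

unweighted = sum(incwage) / len(incwage)
weighted = weighted_mean(incwage, perwt)

# Constant weights cancel out of the average...
print(math.isclose(unweighted, weighted))

# ...but they are still needed to turn 5 records into a population count.
estimated_population = sum(perwt)
print(estimated_population)
```

With unequal weights the two averages would generally differ, which is why PERWT matters for the non-flat samples.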

Thanks so much for the response, Jeff. You mention that the 1% samples from 1850-1930 are flat but it seems like the 1910 1% sample is weighted. I just want to make sure that is correct.

This is an error. The values of PERWT in the 1910 1% sample should be the same for all observations. The IPUMS USA Team will work on getting this fixed as soon as possible. In the meantime, you can simply set PERWT to 100 for all observations.
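The interim workaround can be sketched in a few lines of Python. This is only an illustration: the rows are invented, `flatten_perwt` is a hypothetical helper, and it assumes the extract contains only the 1% sample for the affected year.

```python
# Hypothetical extract rows; only the YEAR and PERWT variables matter here.
records = [
    {"YEAR": 1910, "PERWT": 98.0},
    {"YEAR": 1910, "PERWT": 103.0},
    {"YEAR": 1920, "PERWT": 100.0},
]

def flatten_perwt(rows, year, flat_weight=100.0):
    """Return copies of rows with PERWT set to flat_weight for the given year.

    Assumes every row for that year belongs to the flat 1% sample.
    """
    return [
        {**row, "PERWT": flat_weight} if row["YEAR"] == year else dict(row)
        for row in rows
    ]

fixed = flatten_perwt(records, year=1910)
```

The original rows are left untouched, so the change is easy to undo once the corrected data are released.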

We like to reward users who bring errors such as this to our attention with an IPUMS coffee mug. To claim yours, email your mailing address to ipums@umn.edu.

Thanks so much!

What about the 1930 1% sample? It also appears to be weighted but is this an error?

Yes, this is also an error in the 1930 1% sample. All observations should have the same value for PERWT. The IPUMS USA Team will look into and address this issue as soon as possible.

Hi Jeff,

Thanks for the detailed info and the link to the blog; this was helpful. I think I’m still a bit confused about whether the PERWT variable is important/useful/accurate in all instances of person-level analysis. Here’s an example that might give more clarity:

I want to know what types of workers are moving from California and Texas. So I would pull some variables, including:

Employment status
Earnings
PUMA1
PUMA2
Age
Occupation

Now, let’s say one specific record shows a 25-year-old Software Developer earning \$100,000 who moved from California to Texas. Based on their demographics, this person represents 95 Californians according to the PERWT variable. But do they represent 95 Software Developers earning \$100,000 who moved between the two states? In other words, if this were the sole record (ignoring the limitations of a too-small sample), would it be accurate to use the statement in the previous sentence as an estimate of migration patterns?

-Hector

Yes, this use of PERWT is valid. However, note that all estimates have an associated margin of error. The general principle is that the more detailed the slice of the population you examine (e.g., someone of a particular age, in a particular job, making a particular amount of money, moving from one specific state to another), the larger the standard error of the estimate will be. So, although the PERWT value will provide a “valid” estimate, in the case you discuss it will provide a very noisy, imprecise estimate.
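A rough way to see how noisy such an estimate is, sketched in Python. This is a back-of-the-envelope approximation and not an official IPUMS method: it treats each record as an independent draw and ignores the complex sample design. The weights are invented, and `approx_relative_se` is a hypothetical helper.

```python
import math

def estimated_count(weights):
    """Estimated population count for a cell: the sum of its person weights."""
    return sum(weights)

def approx_relative_se(weights):
    """Approximate SE of the count divided by the count.

    Uses SE ~ sqrt(sum of squared weights), a simplification that
    ignores clustering, stratification, and finite-population effects.
    """
    return math.sqrt(sum(w * w for w in weights)) / sum(weights)

# One record with PERWT = 95, as in the example above.
single_record = [95]

# A broader cell with 400 records (hypothetical), each with weight 95.
broad_cell = [95] * 400

print(estimated_count(single_record), approx_relative_se(single_record))
print(estimated_count(broad_cell), approx_relative_se(broad_cell))
```

With equal weights the relative standard error is roughly one over the square root of the number of records, so a single record gives an estimate whose standard error is about as large as the estimate itself.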

Thanks for the quick response, Jeff. That makes sense. Do the estimates become less noisy/more precise if we aggregate the data more while still using PERWT? For example, if we just look at all respondents with a certain occupation code who migrated between two states? Is there good documentation on how to use the weights to find the standard error (I believe in this case we need the replicate weights as well)?

Alternatively, is there a methodology to perform analysis strictly using the samples and ignoring the personal weights when trying to get as granular as I’ve been suggesting? Also, is there an email/phone contact for IPUMS support?

Thanks!

In general, the more individuals in a given sample who meet your criteria, the more precise your estimates will be. So, if you condition only on occupation and migration information, the estimate will tend to be more precise than if you also condition on age, income, etc. I am not sure what software you are using, but here is some good documentation about applying sampling weights in Stata. Most statistical software packages have built-in features that automatically calculate standard errors around point estimates.
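For the replicate weights mentioned in the question, here is a minimal Python sketch of the standard-error calculation. It assumes an ACS-style sample with 80 replicate weights (REPWTP1 through REPWTP80) and the successive-difference formula SE = sqrt(4/80 × Σ(replicate estimate − full estimate)²). The estimates are made-up numbers, and `sdr_standard_error` is a hypothetical helper, not part of any IPUMS tool.

```python
import math

def sdr_standard_error(full_estimate, replicate_estimates):
    """Standard error from successive-difference replicate estimates.

    full_estimate: the statistic computed with PERWT.
    replicate_estimates: the same statistic recomputed once with each
    replicate weight (80 of them for ACS samples).
    """
    r = len(replicate_estimates)
    squared_diffs = sum((est - full_estimate) ** 2 for est in replicate_estimates)
    return math.sqrt(4.0 / r * squared_diffs)

# Invented numbers: a weighted count of 1,000 movers, and 80 replicate
# estimates that each differ from it by 10 in one direction or the other.
full_estimate = 1000.0
replicate_estimates = [
    full_estimate + (10.0 if i % 2 == 0 else -10.0) for i in range(80)
]

se = sdr_standard_error(full_estimate, replicate_estimates)

# A 90% margin of error uses the usual normal multiplier of 1.645.
margin_of_error_90 = 1.645 * se
print(se, margin_of_error_90)
```

In practice each replicate estimate comes from rerunning the same weighted calculation with one of the replicate weight columns in place of PERWT.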

If your goal is to estimate representative statistics with non-random sampled data, then you should in general always be applying sampling weights. With that said, the attached article by Solon et al. offers a nice discussion of when sampling weights should be applied.

Finally, you can contact the IPUMS user support team directly by emailing ipums@umn.edu.

Solon et al. (2015 JHR).pdf (272.0 KB)