How to get weighted N of cases using the API?

Context: I’m building a report at the state level and want to display, per state, the reason people moved out of that state.
I got the data “Reason for moving by state and year” using the Analyze Data Online web tool and also ipumsr in R.

Here is my query for getting the data using the online tool.

Here’s the R code for getting the same data:

library(ipumsr)
library(tidyverse)

api_key <- Sys.getenv("ipums_api_key")
set_ipums_api_key(api_key)

my_extract_definition <- define_extract_cps(
  description = "reason for moving to another state extract", 
  samples = c("cps2017_03s", "cps2018_03s",
              "cps2019_03s", "cps2020_03s", "cps2022_03s"),
  variables = c("WHYMOVE", "MIGSTA1", "MIGRATE1"),
  data_structure = "rectangular",
  data_format = "csv",
  case_select_who = "individuals"
)

# WHYMOVE - primary reason for moving, for people who lived in a different residence a year ago.
# MIGRATE1 - Movers across various geographic boundaries--county, state, and country. 
# MIGSTA1 - State of previous residence

# In WHYMOVE, the codes relate to family, work, housing, education, climate, and health. 

data <- my_extract_definition |>
  submit_extract() |>
  wait_for_extract() |>
  download_extract() |>
  read_ipums_micro()

data_movers_only <- data |>
  filter(WHYMOVE != 0) |>
  filter(MIGRATE1 == 5) |> 
  filter(!(MIGSTA1 == 91 | MIGSTA1 == 99)) |> 
  select(YEAR, WHYMOVE, MIGSTA1)

# initial exploring
ggplot(data_movers_only, aes(x = WHYMOVE)) +
  geom_bar() +
  labs(title = "Reason for Moving to Another State",
       x = "Reason",
       y = "Count")

table(data_movers_only$WHYMOVE)

I notice that with the data from the API, I only get a frequency distribution based on the unweighted number of cases (N). With the online tool, I get both weighted and unweighted counts. However, I still want to use the API, and I want to do my frequency and percentage analysis using weighted N instead of unweighted N. Please guide me on how to do so.

I notice the variable ASECWT in the data. Am I supposed to do something with it in R to get the weighted N of cases?

Thank you!

I want to preface this by saying that the API is a tool for accessing data, not an analysis tool for frequencies and percentages. The online analysis tool applies weights (ASECWT is the sdawt for CPS samples in the SDA) to produce weighted estimates, but the frequencies displayed on the website are unweighted counts (e.g. see the WHYMOVE case-count view). However, you can use the data you obtain through the API to run your analysis in R. IPUMS provides a variety of data training exercises with sample code for analyzing CPS data in R, including code that incorporates weights into analyses. I recommend reviewing these exercises to understand how to adapt your code to account for weights.
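
As a rough illustration of the kind of weighted tabulation those exercises cover, here is a minimal sketch that sums ASECWT within each state-by-reason cell using dplyr's count() and then converts the sums to percentages. The toy data frame and its values are made up purely for illustration. With the real extract you would need to keep ASECWT in the data (the select() call in the question drops it), and you may also want to group by YEAR if you need estimates for each year separately.

```r
library(dplyr)

# Toy stand-in for the extract; these values are invented for illustration.
# In practice, use the data frame returned by read_ipums_micro(), filtered
# to movers as in the question, with ASECWT kept alongside the other columns.
toy <- tibble(
  MIGSTA1 = c(6, 6, 6, 48, 48),
  WHYMOVE = c(1, 1, 5, 5, 9),
  ASECWT  = c(1500, 2500, 1000, 3000, 1000)
)

weighted_reasons <- toy |>
  # Sum the person weights within each state-by-reason cell (weighted N)
  count(MIGSTA1, WHYMOVE, wt = ASECWT, name = "weighted_n") |>
  # Express each cell as a share of its state's weighted total
  group_by(MIGSTA1) |>
  mutate(weighted_pct = 100 * weighted_n / sum(weighted_n)) |>
  ungroup()

print(weighted_reasons)
```

Leaving out the wt argument gives the unweighted case counts, which is what table() and geom_bar() report by default.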

Thank you! I was able to go through a few exercises and got the weighted data.