Context: I’m building a report at the state level and want to display, per state, the reason people moved out of that state.
I got the data “Reason for moving by state and year” using the Analyze Data Online web tool and also ipumsr in R.
Here is my query for getting the data using the online tool.
Here’s the R code for getting the same data:
library(ipumsr)
library(tidyverse)
api_key <- Sys.getenv("ipums_api_key")
set_ipums_api_key(api_key)
my_extract_definition <- define_extract_cps(
  description = "reason for moving to another state extract",
  samples = c("cps2017_03s", "cps2018_03s",
              "cps2019_03s", "cps2020_03s", "cps2022_03s"),
  variables = c("WHYMOVE", "MIGSTA1", "MIGRATE1"),
  data_structure = "rectangular",
  data_format = "csv",
  case_select_who = "individuals"
)
# WHYMOVE - primary reason for moving, for people who lived in a different residence a year ago.
# MIGRATE1 - Movers across various geographic boundaries--county, state, and country.
# MIGSTA1 - State of previous residence
# In WHYMOVE, the codes relate to family, work, housing, education, climate, and health.
data <- my_extract_definition |>
  submit_extract() |>
  wait_for_extract() |>
  download_extract() |>
  read_ipums_micro()
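data_movers_only <- data |>
  filter(WHYMOVE != 0) |>                        # drop not-in-universe cases
  filter(MIGRATE1 == 5) |>                       # keep people who moved between states
  filter(!(MIGSTA1 == 91 | MIGSTA1 == 99)) |>    # drop non-state codes 91 and 99
  select(YEAR, WHYMOVE, MIGSTA1)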
# initial exploring
ggplot(data_movers_only, aes(x = WHYMOVE)) +
  geom_bar() +
  labs(title = "Reason for Moving to Another State",
       x = "Reason",
       y = "Count")
table(data_movers_only$WHYMOVE)
I notice that with the data from the API, I only get a frequency distribution based on unweighted N cases, whereas the online tool gives me both weighted and unweighted counts. I still want to use the API, though, and I want to run the frequency and percentage analysis on weighted N cases instead of unweighted ones. Please guide me on how to do so.
I also notice the variable ASECWT. Am I supposed to do something with it in R to get the weighted N cases?
Thank you!
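For reference, a weighted frequency is usually obtained by summing the weight variable within each category instead of counting rows, and a weighted percentage is each category's share of the total weight. The sketch below illustrates that idea in base R on a small made-up data frame. The name `toy` and all of its values are hypothetical, not CPS data, and it assumes ASECWT is the appropriate person-level weight for the ASEC samples in the extract.

```r
# Toy stand-in for data_movers_only; the values are invented for illustration.
toy <- data.frame(
  WHYMOVE = c(1, 1, 4, 4, 4, 9),
  ASECWT  = c(1500, 500, 1000, 2000, 1000, 2000)
)

# Weighted N per reason: sum the weights within each WHYMOVE code.
weighted_n <- tapply(toy$ASECWT, toy$WHYMOVE, sum)

# Weighted percentage: each reason's share of the total weight.
weighted_pct <- 100 * weighted_n / sum(weighted_n)

# Put unweighted and weighted figures side by side, one row per code.
result <- data.frame(
  WHYMOVE      = as.numeric(names(weighted_n)),
  unweighted_n = as.vector(table(toy$WHYMOVE)),
  weighted_n   = as.vector(weighted_n),
  weighted_pct = as.vector(weighted_pct)
)
print(result)
```

With the real extract, the same idea would apply to data_movers_only, provided ASECWT is kept in the select() call, since the code above currently drops it. In dplyr, count(data_movers_only, YEAR, MIGSTA1, WHYMOVE, wt = ASECWT) sums the weights per group, which would give a weighted N by year, state of previous residence, and reason.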