Thank you!! Such a relief to have some success: I am able to retrieve race_mom and race_pop using your code.
When I select multiple variables, I do not see the attached variables. I am not very experienced in Python, so please pardon me if this is a problem with my Python and not with ipumspy. The code below runs without errors:
EXT = UsaExtract(
    ['us1910k', 'us1920a', 'us1930a', 'us1940a', 'us1950a', 'us1960a',
     'us1970c', 'us1980b', 'us1990b', 'us2000d', 'us2010a', 'us2020a'],
    ['YEAR', 'SAMPLE', 'SERIAL', 'CBSERIAL', 'HHWT', 'CLUSTER', 'STRATA', 'GQ',
     'PERNUM', 'PERWT', 'SEX', 'AGE', 'MARST', 'BIRTHYR', 'RACE', 'RACED',
     'HISPAN', 'HISPAND', 'BPL', 'BPLD', 'EDUC', 'EDUCD', 'FTOTINC', 'INCWAGE',
     'OCCSCORE'],
    data_format="stata",
    description="API retrieval",
)
EXT.attach_characteristics('RACED', ["mother", "father"])
#EXT.attach_characteristics('BPLD', ["mother", "father"])
#EXT.attach_characteristics('EDUCD', ["mother", "father"])
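If the goal is to attach the parents' characteristics for several variables, the separate (partly commented-out) calls above can be written as one loop. This is a minimal sketch: attach_parent_characteristics is a hypothetical helper, and it assumes, as the code above does, that attach_characteristics takes a variable name and a list of relationships and that calling it once per variable records each attachment.

```python
from typing import Iterable, Sequence


def attach_parent_characteristics(extract, variables: Iterable[str],
                                  of: Sequence[str] = ("mother", "father")):
    """Hypothetical helper: attach the same relationships to several variables.

    Assumes `extract` has an attach_characteristics(variable, relationships)
    method, like the ipumspy UsaExtract used above.
    """
    for variable in variables:
        # A fresh list per call, so the variables do not share one list object
        extract.attach_characteristics(variable, list(of))
    return extract


# Usage with the extract defined above (call it before submit_extract,
# so the attachments are part of the submitted request):
# attach_parent_characteristics(EXT, ["RACED", "BPLD", "EDUCD"])
```

The helper only forwards calls, so whether each attachment actually reaches the API still depends on how ipumspy handles attach_characteristics.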
When I look at the dataframe, however, the attached variables are missing. This is how I download and read the extract:
import gzip
import pathlib
import shutil
import pandas as pd
DOWNLOAD_DIR = pathlib.WindowsPath('c:/users/jcolo/Box/Dissertation/Descriptive/Data')
ipums_api.submit_extract(EXT)
ipums_api.wait_for_extract(EXT)
ipums_api.download_extract(EXT, download_dir=DOWNLOAD_DIR)
gz_file = f"{DOWNLOAD_DIR}/{EXT.collection}_{str(EXT.extract_id).zfill(5)}.dta.gz"
with gzip.open(gz_file, 'rb') as f_in:
    with open('extract.dta', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
df = pd.read_stata('extract.dta', convert_categoricals=False)
df.describe
<bound method NDFrame.describe of year sample serial cbserial hhwt cluster
0 1910 191002 101 NaN 100.0 1.910000e+12
1 1910 191002 101 NaN 100.0 1.910000e+12
2 1910 191002 102 NaN 100.0 1.910000e+12
3 1910 191002 201 NaN 100.0 1.910000e+12
4 1910 191002 201 NaN 100.0 1.910000e+12
… … … … … … …
21141062 2020 202001 1193466 2.020001e+12 112.0 2.020012e+12
21141063 2020 202001 1193466 2.020001e+12 112.0 2.020012e+12
21141064 2020 202001 1193467 2.020001e+12 50.0 2.020012e+12
21141065 2020 202001 1193467 2.020001e+12 50.0 2.020012e+12
21141066 2020 202001 1193468 2.020001e+12 172.0 2.020012e+12
strata gq pernum perwt ... raced hispan hispand bpl \
0 110100100.0 1 1 100.0 … 200 0 0 1
1 110100100.0 1 2 100.0 … 210 0 0 1
2 110100100.0 1 1 100.0 … 210 0 0 1
3 110100100.0 1 1 100.0 … 100 0 0 1
4 110100100.0 1 2 100.0 … 100 0 0 1
… … … … … … … … … …
21141062 50056.0 1 5 103.0 … 100 0 0 49
21141063 50056.0 1 6 107.0 … 100 0 0 56
21141064 20056.0 1 1 50.0 … 100 0 0 46
21141065 20056.0 1 2 53.0 … 100 0 0 31
21141066 30056.0 1 1 172.0 … 100 0 0 56
bpld educ educd ftotinc incwage occscore
0 100 NaN NaN NaN NaN 6
1 100 NaN NaN NaN NaN 4
2 100 NaN NaN NaN NaN 6
3 100 NaN NaN NaN NaN 80
4 100 NaN NaN NaN NaN 0
… … … … … … …
21141062 4900 1.0 17.0 108000.0 999999.0 0
21141063 5600 1.0 14.0 108000.0 999999.0 0
21141064 4600 6.0 63.0 27000.0 0.0 0
21141065 3100 6.0 63.0 27000.0 0.0 0
21141066 5600 11.0 114.0 22000.0 22000.0 20
[21141067 rows x 25 columns]>
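The output above is the repr of the bound method df.describe (it was referenced without parentheses, so it was never called). pandas truncates that repr to the first and last few rows and columns, which is why several columns show up only as "...". A minimal sketch for reading the download and listing every column follows. read_ipums_dta_gz is a hypothetical helper, and it assumes gz_file points at the downloaded .dta.gz, as built in the code above. It writes the decompressed .dta beside the .gz file instead of to extract.dta in the current working directory (the Stata output shows extract.dta in the Descriptive folder, not in Data).

```python
import gzip
import shutil
from pathlib import Path

import pandas as pd


def read_ipums_dta_gz(gz_path, convert_categoricals=False):
    """Hypothetical helper: decompress an IPUMS .dta.gz and load it with pandas.

    The .dta is written next to the .gz (e.g. usa_00012.dta.gz -> usa_00012.dta)
    instead of to a fixed 'extract.dta' in the current working directory.
    """
    gz_path = Path(gz_path)
    dta_path = gz_path.with_suffix("")  # drops only the final ".gz"
    with gzip.open(gz_path, "rb") as f_in, open(dta_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return pd.read_stata(dta_path, convert_categoricals=convert_categoricals)


# Usage (gz_file as built above):
# df = read_ipums_dta_gz(gz_file)
# print(df.columns.tolist())  # every column name, no "..." truncation
# df.describe()               # with parentheses: computes summary statistics
```

Printing df.columns.tolist() is a plain Python list, so it is not subject to pandas' display truncation; pd.set_option("display.max_columns", None) is the alternative if the wide table view is preferred.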
EDIT: I realized my describe was not showing all variables. Here is the Stata output:
. describe
Contains data from C:\Users\jcolo\Box\Dissertation\Descriptive\extract.dta
Observations: 21,141,067
Variables: 25 24 OCT 2023 04:07
Variable name   Storage type   Display format   Value label   Variable label
year int %8.0g YEAR census year
sample long %12.0g SAMPLE ipums sample identifier
serial long %12.0g household serial number
cbserial double %12.0g original census bureau household serial number
hhwt double %12.0g household weight
cluster double %12.0g household cluster for variance estimation
strata double %12.0g household strata for variance estimation
gq byte %8.0g GQ group quarters status
pernum byte %8.0g person number in sample unit
perwt double %12.0g person weight
sex byte %8.0g SEX sex
age int %8.0g AGE age
marst byte %8.0g MARST marital status
birthyr int %8.0g year of birth
race byte %8.0g RACE race [general version]
raced int %8.0g RACED race [detailed version]
hispan byte %8.0g HISPAN hispanic origin [general version]
hispand int %8.0g HISPAND hispanic origin [detailed version]
bpl int %8.0g BPL birthplace [general version]
bpld long %12.0g BPLD birthplace [detailed version]
educ byte %8.0g EDUC educational attainment [general version]
educd int %8.0g EDUCD educational attainment [detailed version]
ftotinc long %12.0g total family income
incwage long %12.0g INCWAGE wage and salary income
occscore byte %8.0g occupational income score
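A quick way to confirm whether the attachments made it into a downloaded file is to compare its column names against the names you expect. This is a minimal sketch. It assumes the attached columns follow the pattern of the earlier successful extract (race_mom, race_pop): the lower-cased variable name plus a suffix per relationship. The mother to mom and father to pop mapping is inferred from those names, and other relationships are not covered. expected_attached_columns and missing_attached_columns are hypothetical helpers.

```python
from typing import Dict, Iterable, List

# Assumed suffixes, inferred from the race_mom / race_pop columns of an earlier extract
RELATIONSHIP_SUFFIX: Dict[str, str] = {"mother": "mom", "father": "pop"}


def expected_attached_columns(variables: Iterable[str], of: Iterable[str]) -> List[str]:
    """Build the expected column names, e.g. RACED + mother -> 'raced_mom'."""
    relationships = list(of)  # materialize so it can be reused for every variable
    return [f"{var.lower()}_{RELATIONSHIP_SUFFIX[rel]}"
            for var in variables for rel in relationships]


def missing_attached_columns(columns: Iterable[str], variables: Iterable[str],
                             of: Iterable[str]) -> List[str]:
    """Return the expected attached columns that are absent (case-insensitive)."""
    present = {str(c).lower() for c in columns}
    return [c for c in expected_attached_columns(variables, of) if c not in present]


# Usage with the dataframe read above:
# missing_attached_columns(df.columns, ["RACED"], ["mother", "father"])
```

For the 25 variables listed in the Stata output, this check would report both expected RACED attachments as missing, which matches what the describe output shows.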