CPS extract requested in Stata format via JSON is delivered as .dat

I have an extract definition that specifies "dataFormat": "stata", yet the file I download is a .dat.
Below is my code. The definition clearly specifies "dataFormat":"stata", but when I unzip the download, there's a .dat file inside.

Where did I go wrong?

from pathlib import Path

from ipumspy import IpumsApiClient, api

IPUMS_API_KEY = 'XXX'
DOWNLOAD_DIR = Path('./')

ipums = IpumsApiClient(IPUMS_API_KEY)
extract_json = '''{"version":2,"dataStructure":{"rectangular":{"on":"P"}},"dataFormat":"stata","caseSelectWho":"individuals","description":"FOR REPLICATION BROADNESS Revision of (for predicting unemployment, 1995-2018)","samples":{"cps1996_01s":{},"cps1996_02s":{},"cps1996_03b":{},"cps1996_04b":{},"cps1996_05s":{},"cps1996_06b":{},"cps1996_07b":{},"cps1996_08b":{},"cps1996_09s":{},"cps1996_10s":{},"cps1996_11s":{},"cps1996_12b":{},"cps1997_01b":{},"cps1997_02s":{},"cps1997_03b":{},"cps1997_04s":{},"cps1997_05s":{},"cps1997_06b":{},"cps1997_07b":{},"cps1997_08b":{},"cps1997_09s":{},"cps1997_10s":{},"cps1997_11b":{},"cps1997_12b":{},"cps1998_01b":{},"cps1998_02s":{},"cps1998_03b":{},"cps1998_04b":{},"cps1998_05b":{},"cps1998_06s":{},"cps1998_07b":{},"cps1998_08s":{},"cps1998_09s":{},"cps1998_10s":{},"cps1998_11s":{},"cps1998_12s":{},"cps1999_01s":{},"cps1999_02s":{},"cps1999_03b":{},"cps1999_04s":{},"cps1999_05s":{},"cps1999_06b":{},"cps1999_07b":{},"cps1999_08b":{},"cps1999_09s":{},"cps1999_10s":{},"cps1999_11b":{},"cps1999_12b":{},"cps2000_01s":{},"cps2000_02s":{},"cps2000_03b":{},"cps2000_04b":{},"cps2000_05s":{},"cps2000_06s":{},"cps2000_07b":{},"cps2000_08s":{},"cps2000_09s":{},"cps2000_10s":{},"cps2000_11s":{},"cps2000_12b":{},"cps2001_01b":{},"cps2001_02s":{},"cps2001_03b":{},"cps2001_04s":{},"cps2001_05s":{},"cps2001_06s":{},"cps2001_07b":{},"cps2001_08s":{},"cps2001_09s":{},"cps2001_10s":{},"cps2001_11s":{},"cps2001_12s":{},"cps2002_01s":{},"cps2002_02s":{},"cps2002_03b":{},"cps2002_04b":{},"cps2002_05b":{},"cps2002_06s":{},"cps2002_07b":{},"cps2002_08s":{},"cps2002_09s":{},"cps2002_10s":{},"cps2002_11s":{},"cps2002_12s":{},"cps2003_01b":{},"cps2003_02s":{},"cps2003_03b":{},"cps2003_04b":{},"cps2003_05b":{},"cps2003_06s":{},"cps2003_07b":{},"cps2003_08s":{},"cps2003_09s":{},"cps2003_10s":{},"cps2003_11s":{},"cps2003_12s":{},"cps2004_01s":{},"cps2004_02b":{},"cps2004_03b":{},"cps2004_04b":{},"cps2004_05s":{},"cps2004_06s":{},"cps2004_07b":{},"cps2004_08b":{},
"cps2004_09s":{},"cps2004_10s":{},"cps2004_11s":{},"cps2004_12s":{},"cps2005_01s":{},"cps2005_02s":{},"cps2005_03b":{},"cps2005_04b":{},"cps2005_05s":{},"cps2005_06b":{},"cps2005_07s":{},"cps2005_08s":{},"cps2005_09s":{},"cps2005_10s":{},"cps2005_11s":{},"cps2005_12s":{},"cps2006_01s":{},"cps2006_02b":{},"cps2006_03b":{},"cps2006_04b":{},"cps2006_05s":{},"cps2006_06s":{},"cps2006_07b":{},"cps2006_08s":{},"cps2006_09s":{},"cps2006_10s":{},"cps2006_11s":{},"cps2006_12s":{},"cps2007_01s":{},"cps2007_02b":{},"cps2007_03b":{},"cps2007_04b":{},"cps2007_05b":{},"cps2007_06b":{},"cps2007_07b":{},"cps2007_08s":{},"cps2007_09s":{},"cps2007_10s":{},"cps2007_11b":{},"cps2007_12s":{},"cps2008_01s":{},"cps2008_02b":{},"cps2008_03b":{},"cps2008_04b":{},"cps2008_05s":{},"cps2008_06s":{},"cps2008_07b":{},"cps2008_08s":{},"cps2008_09s":{},"cps2008_10s":{},"cps2008_11s":{},"cps2008_12s":{},"cps2009_01s":{},"cps2009_02b":{},"cps2009_03b":{},"cps2009_04b":{},"cps2009_05b":{},"cps2009_06b":{},"cps2009_07b":{},"cps2009_08s":{},"cps2009_09s":{},"cps2009_10s":{},"cps2009_11s":{},"cps2009_12s":{},"cps2010_01s":{},"cps2010_02b":{},"cps2010_03b":{},"cps2010_04b":{},"cps2010_05s":{},"cps2010_06s":{},"cps2010_07s":{},"cps2010_08s":{},"cps2010_09s":{},"cps2010_10s":{},"cps2010_11s":{},"cps2010_12s":{},"cps2011_01s":{},"cps2011_02b":{},"cps2011_03b":{},"cps2011_04b":{},"cps2011_05s":{},"cps2011_06s":{},"cps2011_07s":{},"cps2011_08s":{},"cps2011_09s":{},"cps2011_10s":{},"cps2011_11s":{},"cps2011_12s":{},"cps2012_01s":{},"cps2012_02b":{},"cps2012_03b":{},"cps2012_04b":{},"cps2012_05s":{},"cps2012_06s":{},"cps2012_07s":{},"cps2012_08s":{},"cps2012_09s":{},"cps2012_10s":{},"cps2012_11s":{},"cps2012_12s":{},"cps2013_01b":{},"cps2013_02s":{},"cps2013_03b":{},"cps2013_04b":{},"cps2013_05b":{},"cps2013_06s":{},"cps2013_07s":{},"cps2013_08s":{},"cps2013_09s":{},"cps2013_10s":{},"cps2013_11s":{},"cps2013_12s":{},"cps2014_01s":{},"cps2014_02s":{},"cps2014_03b":{},"cps2014_04b":{},"cps2014_05b":{},
"cps2014_06s":{},"cps2014_07s":{},"cps2014_08s":{},"cps2014_09s":{},"cps2014_10s":{},"cps2014_11s":{},"cps2014_12s":{},"cps2015_01s":{},"cps2015_02s":{},"cps2015_03b":{},"cps2015_04b":{},"cps2015_05s":{},"cps2015_06s":{},"cps2015_07s":{},"cps2015_08s":{},"cps2015_09s":{},"cps2015_10s":{},"cps2015_11b":{},"cps2015_12s":{},"cps2016_01s":{},"cps2016_02s":{},"cps2016_03b":{},"cps2016_04b":{},"cps2016_05b":{},"cps2016_06s":{},"cps2016_07b":{},"cps2016_08s":{},"cps2016_09s":{},"cps2016_10s":{},"cps2016_11s":{},"cps2016_12s":{},"cps2017_01b":{},"cps2017_02s":{},"cps2017_03b":{},"cps2017_04b":{},"cps2017_05s":{},"cps2017_06s":{},"cps2017_07s":{},"cps2017_08s":{},"cps2017_09s":{},"cps2017_10s":{},"cps2017_11s":{},"cps2017_12s":{},"cps2018_01s":{},"cps2018_02s":{},"cps2018_03b":{},"cps2018_04b":{},"cps2018_06s":{}},"variables":{"YEAR":{"preselected":true},"SERIAL":{"preselected":true},"MONTH":{"preselected":true},"HWTFINL":{"preselected":true},"CPSID":{"preselected":true},"ASECFLAG":{"preselected":true},"REGION":{},"STATEFIP":{},"COUNTY":{},"METAREA":{},"METRO":{},"PERNUM":{"preselected":true},"WTFINL":{"preselected":true},"CPSIDP":{"preselected":true},"AGE":{},"SEX":{},"RACE":{},"MARST":{},"VETSTAT":{},"EMPSTAT":{},"LABFORCE":{},"OCC2010":{},"IND1990":{},"EDUC":{},"DIFFANY":{}},"collection":"cps"}'''

# the file extr contains the same definition as extract_json
extract = api.extract.define_extract_from_json('extr')

# Submit an API extract request

ipums.submit_extract(extract)
print(f"Extract submitted with id {extract.extract_id}")

# wait for the extract to finish
ipums.wait_for_extract(extract)

# Download the extract
ipums.download_extract(extract, download_dir=DOWNLOAD_DIR)

It looks like you might be submitting a different extract than that defined in your extract_json = ... statement. If you change extract = api.extract.define_extract_from_json('extr') to extract = api.extract.define_extract_from_json('extract_json'), you should get the Stata formatted data.

Hope that helps!

Oh sorry, the file contains exactly the same definition as the string. I just didn’t know how to define_extract_from_json using a string as opposed to a file, and didn’t know how to attach the file here.
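
For anyone else holding the definition as a string rather than a file: one way around it is to write the string to disk first and then point define_extract_from_json() at that path. The sketch below uses only the standard library. The helper name write_extract_definition, the shortened example_json, and the temp-directory location are illustrative assumptions, not part of ipumspy, and the shortened definition is a stand-in rather than a complete, submittable extract.

```python
import json
import tempfile
from pathlib import Path


def write_extract_definition(json_text: str, path: Path) -> dict:
    """Validate a JSON extract definition held in a string and save it to disk.

    Hypothetical helper for illustration; it is not an ipumspy function.
    """
    # json.loads raises json.JSONDecodeError if the string is malformed
    definition = json.loads(json_text)
    path.write_text(json.dumps(definition, indent=2), encoding="utf-8")
    return definition


# Shortened, made-up stand-in for the full extract_json string in the post
example_json = '''{"version": 2, "dataFormat": "stata", "collection": "cps",
"samples": {"cps1996_01s": {}}, "variables": {"AGE": {}}}'''

# Written to the system temp directory here; any writable path would do
out_path = Path(tempfile.gettempdir()) / "extr.json"
definition = write_extract_definition(example_json, out_path)

# Confirm what the saved definition actually requests before submitting
print(definition["dataFormat"])  # prints: stata
```

The saved path can then be passed to api.extract.define_extract_from_json() in the same way as the file used in the post.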

No worries - I also gave incomplete advice. There is a bug in define_extract_from_json() in v0.4.1 that overwrites *Extract keyword arguments in the JSON with default values. This bug is fixed in the next version (hopefully to be released soon!), but for now the quickest workaround is to update the extract's data_format attribute after reading from the JSON and before you submit your extract, as in the following:

...

# include the .json file extension in the argument
# this creates a CpsExtract object
extract = api.extract.define_extract_from_json('extr.json')

# even though the JSON defines the data format as stata,
# this bug reverts it to the fixed-width default,
# so the line below will show the fixed-width default rather than "stata"
print(extract.data_format)

# we don't want that! We want Stata!
extract.data_format = "stata"

# Now the same print statement should show "stata"
print(extract.data_format)

# Now you can submit and get a Stata-formatted extract!

...
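
If you would rather not hard-code "stata", the same workaround can be sketched as a small helper that re-reads "dataFormat" from the JSON file and copies it onto the extract object. This is only a sketch under assumptions: restore_data_format is a hypothetical name, not an ipumspy function, and the example uses a SimpleNamespace stand-in (with an assumed "fixed_width" default) instead of a real CpsExtract so that it runs without ipumspy or an API key.

```python
import json
import tempfile
from pathlib import Path
from types import SimpleNamespace


def restore_data_format(extract, json_path):
    """Copy "dataFormat" from the JSON definition onto extract.data_format.

    Hypothetical helper; assumes the extract exposes a settable
    data_format attribute, as the workaround above does.
    """
    definition = json.loads(Path(json_path).read_text(encoding="utf-8"))
    requested = definition.get("dataFormat")
    # Only touch the attribute when the file names a format that differs
    if requested is not None and extract.data_format != requested:
        extract.data_format = requested
    return extract


# Stand-in for the object returned by define_extract_from_json();
# the "fixed_width" starting value is an assumption about the default
stub_extract = SimpleNamespace(data_format="fixed_width")

with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "extr.json"
    json_path.write_text('{"dataFormat": "stata", "collection": "cps"}',
                         encoding="utf-8")
    restore_data_format(stub_extract, json_path)

print(stub_extract.data_format)  # prints: stata
```

With a real extract you would call the helper right after define_extract_from_json() and before submit_extract().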

I hope this helps!
