Stata version, .do files and file metadata conversion issues


#1

Do the .do files apply equally to all versions of Stata? If not, for what version of Stata are the files supplied?

Also, I am a little confused about the format of the Stata data, which I am converting to R. Because R holds its data in RAM, I need to read larger files in chunks (for transfer to a database). Unfortunately, R’s flagship Stata -> R conversion package, haven, does not have a setting for reading less than the entire file (though there are some other conversion packages, at least for files through Stata 13, that I have not explored yet). So I was thinking that I would use a line-reading utility to read the first n lines, use haven to convert those lines with all the right formatting and metadata, and then bind subsequent blocks of data to that converted top-of-file, which would carry the full column format, including all the levels information. But then it appeared to me that all this information is actually in the .do file, so that would not work, and I’d have to extract it from the .do file somehow.

However, for smaller files, haven translation seems to pick up everything, including the header info. I’m not quite sure how that can be without some sort of metadata in the file similar to that in the .do file. Is metadata such as the data types and the levels information also encoded in the first however many rows? If so, I’d like any information you have on the formatting of the metadata in those rows, and I especially want to know how the metadata is divided from the data: is it by assigning some number of rows, and if so, how many?


#2

The .do files should run in any version of Stata. Basically, these command files take the (decompressed) fixed-width .dat files and convert them into a data file, such as a Stata .dta file. As part of this process, the command files label variables and define value labels. Since you are trying to read IPUMS data into R, it sounds like you may benefit from using the ipumsr package. This package reads in data from an IPUMS extract into R along with all of the associated metadata, such as variable labels and value labels.
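To make the idea above concrete, here is a minimal sketch in Python (not Stata or R) of what a command file does to a fixed-width file: it slices each line at known column positions and attaches value labels, which is why the labels live in the command file rather than in the raw .dat rows. The variable names, column positions, and labels below are hypothetical and not taken from any real IPUMS extract; the sketch also assumes every field is a whole number. It reads records in chunks, since that was the original concern.

```python
from itertools import islice

# Hypothetical layout: (variable name, start, end), 0-based with end exclusive.
# A real command file would supply the actual column positions.
LAYOUT = [("YEAR", 0, 4), ("SEX", 4, 5), ("AGE", 5, 8)]

# Hypothetical value labels, of the kind a command file defines.
VALUE_LABELS = {"SEX": {1: "Male", 2: "Female"}}


def parse_line(line):
    """Slice one fixed-width record into integer fields."""
    return {name: int(line[start:end]) for name, start, end in LAYOUT}


def read_chunks(lines, chunk_size):
    """Yield lists of parsed records, at most chunk_size records each."""
    iterator = iter(lines)
    while True:
        chunk = [parse_line(line) for line in islice(iterator, chunk_size)]
        if not chunk:
            return
        yield chunk


def apply_labels(record):
    """Replace coded values with their labels where a label is defined."""
    return {
        name: VALUE_LABELS.get(name, {}).get(value, value)
        for name, value in record.items()
    }


# Three made-up records standing in for lines of a .dat file.
sample = ["20101035", "20102042", "20111007"]
chunks = list(read_chunks(sample, chunk_size=2))
labeled = [apply_labels(record) for chunk in chunks for record in chunk]
```

Because the layout and labels are kept outside the data, every chunk can be parsed and labeled the same way, with no need to read the top of the file first. In R, the ipumsr documentation describes a chunked reader, `read_ipums_micro_chunked()`, that appears to serve the same purpose.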


#3

Wow! Great answer. I’ll check it out immediately.