I have downloaded extracts of the Swedish 1880-1990 censuses. When opening them, all the Swedish characters in name strings have been replaced by question marks.
E.g.
NAMELAST
Eketr?
Jernstr?m
S?flund
Byd?n
This happens regardless of the file format and the tool I open the extract with. How do I download/open the census extracts properly without losing ÅÄÖ characters?
Thank you for alerting us to this issue. We are sorry about the misspecified character formatting in these strings. The technology that is supposed to handle the characters when transferring data from our internal area to the web extract system is not working properly. Our IT team is working on the problem. We will get it corrected as soon as possible.
This is still an issue. It occurred when I downloaded an extract of the 1910 SweCens as CSV today. It is even present in the command files for loading fixed-width text files. How can I import the data without losing å, ä, and ö in names?
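For anyone else hitting this: before trying different import options, it may help to check whether the letters are still present in the downloaded file at all. The sketch below looks only at raw bytes, so it makes no assumption about the encoding. The function name `diagnose_bytes` and the sample rows are made up for illustration, and I do not know which encoding the extract system is meant to use.

```python
def diagnose_bytes(raw: bytes) -> dict:
    """Summarise a raw file's bytes without assuming any text encoding."""
    report = {
        # Bytes above 0x7F can only come from non-ASCII characters.
        "non_ascii_bytes": sum(b > 0x7F for b in raw),
        # Literal question marks suggest the letters were replaced upstream.
        "question_marks": raw.count(b"?"),
        "valid_utf8": True,
    }
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        report["valid_utf8"] = False
    return report


# Hypothetical sample rows, encoded three ways to show the three outcomes.
lost = "NAMELAST\nJernstr?m\nByd?n\n".encode("ascii")
utf8 = "NAMELAST\nJernström\nBydén\n".encode("utf-8")
latin1 = "NAMELAST\nJernström\nBydén\n".encode("latin-1")

for label, raw in [("lost", lost), ("utf8", utf8), ("latin1", latin1)]:
    print(label, diagnose_bytes(raw))
```

On a real download you would pass in the file contents read in binary mode, e.g. `open(path, "rb").read()`. If there are no non-ASCII bytes and the names contain literal `?`, no import setting can bring the letters back, because they were already replaced before the file was written. If there are non-ASCII bytes but the file is not valid UTF-8, trying another encoding such as Latin-1 when importing is a reasonable next step.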
Thank you for alerting us to this issue. We are looking into it and will respond here when we have solved the problem.
Hi Grace! I am also working with this data. Do you have an estimate of how long the fix will take to implement?
Best wishes and many thanks,
Raoul
This issue should now be resolved. Please reply to this post or email ipums@umn.edu if you continue to experience difficulties.
This problem is still present for the source variable se1910a_resname in the SweCens 1910 data set.
Thanks for bringing this to our attention. I have an inquiry out to the IPUMS International team to see if I can get clarification on the previously implemented fix and why the issue is persisting.
I am following up to report that this appears to be working. Please let me know if you continue to encounter issues @Monir_Bounadi.
Hello! This problem is still present in 2024. For example, when downloading the CSV file, “Å” turns into “a”. Downloading the DTA file and then running the Stata file provided by IPUMS also turns “ö” into “a”… The main issues are with the se1890a variables, where data is taken from the source as transcribed, and that is where you find the Swedish letters.
How were you able to get it working?
For others: there are tedious ways to work around this by other means, but an internal fix at IPUMS would be better.
When checking this several months ago, we noticed appropriate characters in formatted data files for Stata, SPSS, and CSV; SAS was replacing “ä” with “ae” but all seemed to be working on the whole for Swedish letters å, ä, and ö. I resubmitted an extract today and both my Stata-formatted data file and my fixed-width file read in with the corresponding .do file worked for Swedish letters. However, I did notice issues with “é” and “è” in the fixed-width text file read in with the corresponding .do file (but not the Stata-formatted file). I will share this with the IPUMS International team and look into fixing it. Assuming you are seeing issues with other letters, could you please tell me which extract in your account you are using as well as the version of Stata you are running so I can try to troubleshoot further?
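When reporting which letters survive in a given file format, a quick tally of the non-ASCII characters in the name variable can be easier to compare than eyeballing rows. This is a minimal sketch, assuming the names have already been read into a list of strings. The function name `non_ascii_inventory` and the sample values are illustrative and not taken from an actual extract.

```python
import unicodedata
from collections import Counter


def non_ascii_inventory(values):
    """Count every non-ASCII character that appears in a list of strings."""
    counts = Counter()
    for value in values:
        # Normalise to NFC so that "o" + combining diaeresis counts as "ö".
        text = unicodedata.normalize("NFC", value)
        counts.update(ch for ch in text if ord(ch) > 127)
    return counts


# Illustrative values only.
names = ["Ekelöf", "Jernström", "Bydén"]
inventory = non_ascii_inventory(names)
print(inventory)
```

Running the same tally on the Stata-formatted file and on the fixed-width file read with the .do file would show directly which letters (for example é or è) are missing from one but present in the other.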
Hey! Thanks for your reply. It is still not working for me. I am using Stata MP 16.1 and, most recently, extract number 31 on my account. For example, a surname may appear as “Ekelaf” when it should be “Ekelöf”. See “Sofia Albertina Ekelaf” in my extract, and compare it with the original transcription: Ekelöf, Sofia Albertina - Riksarkivet - Sök i arkiven
Can you look into this matter? I would like it to be solved for all census years (1880, 1890, 1900, 1910).
Hey again! I’ve delved deeper into this and the problem is present for all “as transcribed” source variables for 1880-1900. The problem doesn’t seem to be there for 1910.
I noticed that in the extract we have
å, ä, ö, Å, Ä, Ö = a
and also e.g.
ü = a
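To document this kind of collapse, extract values can be compared with reference transcriptions character by character. The sketch below is only an illustration: the function name `substitution_table` is made up, the first pair is the Ekelöf example from this thread, and the second pair is a hypothetical name chosen to show the ü case.

```python
import unicodedata
from collections import defaultdict


def substitution_table(pairs):
    """Map each character seen in the extract to the reference characters it replaced."""
    table = defaultdict(set)
    for reference, extracted in pairs:
        reference = unicodedata.normalize("NFC", reference)
        extracted = unicodedata.normalize("NFC", extracted)
        if len(reference) != len(extracted):
            continue  # positions cannot be aligned reliably, so skip the pair
        for ref_ch, ext_ch in zip(reference, extracted):
            if ref_ch != ext_ch:
                table[ext_ch].add(ref_ch)
    return dict(table)


pairs = [
    ("Ekelöf", "Ekelaf"),  # example reported in this thread
    ("Müller", "Maller"),  # hypothetical pair illustrating "ü = a"
]
table = substitution_table(pairs)
print(table)
```

Because several different letters end up as the same “a”, the substitution cannot be undone by a simple find-and-replace on the extract. The original letter has to come from the source data.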
Thanks for the additional information. I have added these to my test extracts and will compare behavior across a number of different data extract specifications.