Swedish characters in names are replaced by question marks. How do I render them properly?

aronolof · May 27, 2017, 2:19am

I have downloaded extracts of the Swedish 1880-1910 censuses. When I open them, all the Swedish characters in name strings have been replaced by question marks.

E.g.
NAMELAST
Eketr?
Jernstr?m
S?flund
Byd?n

This happens regardless of the file format and the tool I open the extract with. How do I download and open the census extracts properly so that I do not lose the ÅÄÖ characters?

Lara_Cleveland · May 31, 2017, 3:35pm

Thank you for alerting us to this issue. We are sorry about the incorrectly encoded characters in these strings. The technology that is supposed to handle the characters when transferring data from our internal area to the web extract system is not working properly. Our IT team is working on the problem, and we will get it corrected as soon as possible.
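Question marks like these are the usual symptom of text passing through a step that can only represent ASCII (or another character set without å, ä and ö) and that substitutes unrepresentable characters instead of failing. The Python sketch below illustrates only that general mechanism. It is an assumption about the class of failure, not a description of the actual IPUMS transfer process, and the function names are hypothetical. It also shows why the damage cannot be undone from the extract alone: every unrepresentable letter becomes the same "?".

```python
def ascii_only_step(text: str) -> str:
    """Hypothetical lossy step: characters ASCII cannot represent become '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")


def utf8_step(text: str) -> str:
    """The same encode/decode round trip through UTF-8, which can represent å, ä, ö."""
    return text.encode("utf-8").decode("utf-8")


print(ascii_only_step("Ekelöf"))  # prints Ekel?f
print(ascii_only_step("åäöÅÄÖ"))  # prints ?????? (six distinct letters, one symbol)
print(utf8_step("Ekelöf"))        # prints Ekelöf
```

Because the replacement is many-to-one, the original letters can only be restored from the source transcription, not from the damaged file.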

Monir_Bounadi · September 23, 2021, 8:28pm

This is still an issue. It occurred when I downloaded an extract of the 1910 SweCens as CSV today. It is even present in the command files for loading fixed-width text files. How can I import the data without losing å, ä, and ö in names?

Grace_Cooper · October 4, 2021, 4:29pm

Thank you for alerting us to this issue. We are looking into it and will respond here when we have solved the problem.

Raoul_van_Maarseveen · January 31, 2022, 10:51am

Hi Grace! I am also working with this data. Do you have an estimate of how long the fix will take to implement?

Best wishes and many thanks,
Raoul

KariWilliams · August 4, 2022, 5:53pm

This issue should now be resolved. Please reply to this post or email ipums@umn.edu if you continue to experience difficulties.

Monir_Bounadi · April 6, 2023, 8:59am

This problem is still present for the source variable se1910a_resname in the SweCens 1910 data set.

KariWilliams · April 19, 2023, 8:26pm

Thanks for bringing this to our attention. I have an inquiry out to the IPUMS International team to see if I can get clarification on the previously implemented fix and why the issue is persisting.

KariWilliams · January 5, 2024, 9:35pm

I am following up to report that this appears to be working. Please let me know if you continue to encounter issues @Monir_Bounadi.

Monir_Bounadi · March 19, 2024, 10:08am

Hello! This problem is still present in 2024. For example, when downloading the CSV file, “Å” turns into “a”. Downloading the DTA file and then running the Stata file provided by IPUMS also turns “ö” into “a”… The main issues are with the se1890a variables, where data is taken from the source as transcribed, and that is where you find the Swedish letters.

How were you able to get it working?

Monir_Bounadi · March 19, 2024, 10:08am

For others: there are tedious workarounds that resolve this by other means, but an internal fix at IPUMS would be better.

KariWilliams · March 22, 2024, 9:26pm

When checking this several months ago, we noticed appropriate characters in formatted data files for Stata, SPSS, and CSV; SAS was replacing “ä” with “ae” but all seemed to be working on the whole for Swedish letters å, ä, and ö. I resubmitted an extract today and both my Stata-formatted data file and my fixed-width file read in with the corresponding .do file worked for Swedish letters. However, I did notice issues with “é” and “è” in the fixed-width text file read in with the corresponding .do file (but not the Stata-formatted file). I will share this with the IPUMS International team and look into fixing it. Assuming you are seeing issues with other letters, could you please tell me which extract in your account you are using as well as the version of Stata you are running so I can try to troubleshoot further?

Monir_Bounadi · March 26, 2024, 1:51pm

Hey! Thanks for your reply. It is still not working for me. I am using Stata MP 16.1 and, most recently, extract number 31 on my account. For example, a person's surname may appear as “Ekelaf” when it should be “Ekelöf”: compare “Sofia Albertina Ekelaf” in my extract with the original transcription, “Ekelöf, Sofia Albertina”, in Riksarkivet's (the Swedish National Archives') search service, Sök i arkiven (“Search the archives”).

Can you look into this matter? I would like it to be solved for all census years (1880, 1890, 1900, 1910).

Monir_Bounadi · March 27, 2024, 4:25pm

Hey again! I’ve delved deeper into this and the problem is present for all “as transcribed” source variables for 1880-1900. The problem doesn’t seem to be there for 1910.

I noticed that in the extract all of the following letters appear as “a”:

å, ä, ö, Å, Ä, Ö

The same happens to other letters, e.g. ü.

KariWilliams · April 2, 2024, 10:24am

Thanks for the additional information. I have added these to my test extracts and will compare behavior across a number of different data extract specifications.

Jonatan_Andersson · October 15, 2025, 11:34am

Hi!

The problem that Monir_Bounadi is describing is still unresolved. I use the ipumsr package to get the census extracts into R. NAMELAST and NAMEFRST contain åäö in 1910 but not in the other censuses.

What did you find when you inspected the extracts?

KariWilliams · October 17, 2025, 7:37pm

Thanks for your message and my apologies to @Monir_Bounadi for not following up after looking into this most recently.

Based on my examination, the characters are being rendered correctly in the fixed-width data files for 1880 and 1910 in both the integrated variables (e.g., NAMEFRST) and the unharmonized source variables (e.g., SE1880A_NAMEFRST). I did not see any of these characters in 1890 or 1900 (in the source variables or the integrated variables) – as previously highlighted, these are seemingly being translated to “a”, which makes them harder to detect than if they were presenting as something less typical (e.g., “?”).

It seems that one of our upstream processes for ingesting the original 1890 and 1900 data is stripping these characters out. My IPUMS International colleagues are exploring how we might efficiently recover the original version of the names that does not obscure these characters.
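Because letters collapsed to “a” leave no visible trace, one way to check an extract is to count, per census year, how many names contain å, ä or ö at all. A year in which virtually no Swedish name contains them is a red flag. The Python sketch below assumes a CSV extract containing the variables YEAR, NAMEFRST and NAMELAST and assumes the file is UTF-8 encoded. The function names are hypothetical and not part of any IPUMS tool. It also counts literal “?” characters, which catches the failure mode reported at the top of this thread.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List, Mapping, Sequence, Tuple

SWEDISH_LETTERS = set("åäöÅÄÖ")


def read_extract_csv(path: str, encoding: str = "utf-8") -> List[Dict[str, str]]:
    """Load an extract CSV as a list of dicts.

    The UTF-8 default is an assumption; pass another encoding if needed.
    """
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.DictReader(f))


def swedish_letter_counts(
    rows: Iterable[Mapping[str, str]],
    name_cols: Sequence[str] = ("NAMEFRST", "NAMELAST"),
    year_col: str = "YEAR",
) -> Dict[str, Tuple[int, int, int]]:
    """Per year: (rows, rows with å/ä/ö in a name field, rows with '?')."""
    totals: Counter = Counter()
    with_swedish: Counter = Counter()
    with_question: Counter = Counter()
    for row in rows:
        year = row[year_col]
        totals[year] += 1
        # Join the name fields; missing or empty fields count as "".
        text = " ".join(row.get(col) or "" for col in name_cols)
        if SWEDISH_LETTERS.intersection(text):
            with_swedish[year] += 1
        if "?" in text:
            with_question[year] += 1
    return {y: (totals[y], with_swedish[y], with_question[y]) for y in totals}


# Invented sample rows for illustration; the year assignments are made up.
sample = [
    {"YEAR": "1890", "NAMEFRST": "Sofia Albertina", "NAMELAST": "Ekelaf"},
    {"YEAR": "1910", "NAMEFRST": "Sofia Albertina", "NAMELAST": "Ekelöf"},
]
print(swedish_letter_counts(sample))  # {'1890': (1, 0, 0), '1910': (1, 1, 0)}
```

For a real extract you would call `swedish_letter_counts(read_extract_csv("your_extract.csv"))`, where the file name is a placeholder. A “?” count is only a hint, since a transcription may legitimately contain question marks.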

Jonatan_Andersson · December 4, 2025, 1:41pm

Hi Kari. Thank you for looking into this issue. Were your colleagues able to recover the original version?

Best

Jonatan

KariWilliams · December 5, 2025, 8:40pm

Thank you for following up. We are in the process of revising these files to correct the character issue in our next data release. I don’t have a specific timeline for when the revised version of the files will be available – please email ipums@umn.edu directly to request early access if these revised files are time-sensitive for your research.

Related topics

Topic	Category	Replies	Views	Activity
Swedish Censues 1880-1910: Missing Parish Names	INTERNATIONAL	5	768	October 8, 2021
How to change the strange encoding that appears when we ask for a csv document ?	INTERNATIONAL	1	393	May 19, 2016
CPS json requested extract in stata format is delivered as .dat	IPUMS API	3	104	June 13, 2024
not Stata format (tried with Stata 12.1)	INTERNATIONAL	1	3262	January 5, 2016
The variable PIDSE in the Swedish censuses 1880-1910	INTERNATIONAL	2	180	November 16, 2023