492999 – UTF-8 characters breaks windows and display of text


Summary: UTF-8 characters breaks windows and display of text

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	anaconda
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Anaconda Maintenance Team
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (3):	466644 494087 498760
Depends On:
Blocks:	F10Target F11Target

Reported:	2009-03-31 08:15 UTC by Jan ONDREJ
Modified:	2013-01-10 05:07 UTC
CC List:	14 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-05-21 06:37:06 UTC
Type:	---
Embargoed:
Dependent Products:
Flags:	jkeating: fedora_requires_release_note?

Attachments
network configuration screenshot (5.24 KB, image/png) 2009-03-31 08:15 UTC, Jan ONDREJ	no flags
Network manager (4.12 KB, image/png) 2009-03-31 08:16 UTC, Jan ONDREJ	no flags
Install image retrieve (4.07 KB, image/png) 2009-03-31 08:18 UTC, Jan ONDREJ	no flags
First screenshot for wrong "ť" character. (5.34 KB, image/png) 2009-05-19 16:11 UTC, Jan ONDREJ	no flags
Second screenshot of wrong "ť" character. (4.83 KB, image/png) 2009-05-19 16:23 UTC, Jan ONDREJ	no flags

Description Jan ONDREJ 2009-03-31 08:15:51 UTC

Created attachment 337276
network configuration screenshot

Description of problem:
After selecting the Slovak language when installing F11 (also F10, and maybe some older releases), text-mode windows are broken and text is not displayed properly.

Version-Release number of selected component (if applicable):
Fedora 11-Beta

How reproducible:
Always

Steps to Reproduce:
1. Boot the F11-Beta or a devel image.
2. Select the Slovak language.
3. Go to the next screens.
  
Actual results:
Windows are broken.
Some text is not displayed fully; for example, only "Z" is displayed for "Získava sa". Everything after the first non-ASCII character is gone.
Sometimes unwanted strings are displayed instead of some characters, for example "<8C>" instead of "Č".
Screenshots attached.

Expected results:
Normal windows and complete text.

Additional info:
I think something goes wrong when the length of a UTF-8 string is checked: the number of bytes is returned instead of the number of characters. Maybe it's a problem in ncurses or another library that displays these texts.
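To illustrate the suspected byte-vs-character mix-up, here is a minimal C sketch. It is not code from anaconda, newt, or S-Lang; the helper name utf8_strlen is hypothetical. It counts code points by skipping UTF-8 continuation bytes (bit pattern 10xxxxxx) and assumes the input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count Unicode code points in a NUL-terminated
 * UTF-8 string. Every byte that is NOT a continuation byte (10xxxxxx)
 * starts a new character. Assumes valid UTF-8 input. */
size_t utf8_strlen(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80)   /* lead byte or ASCII byte */
            count++;
    }
    return count;
}
```

For "Získava sa" ("í" is the two bytes C3 AD), strlen() returns 11 while utf8_strlen() returns 10. Code that sizes a window or positions text with the byte count is off by one column for every extra byte of each accented letter.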

Once the graphical installer starts, the text is fine, but this text-mode part of the installation can't be skipped.

Comment 1 Jan ONDREJ 2009-03-31 08:16:47 UTC

Created attachment 337277
Network manager

Comment 2 Jan ONDREJ 2009-03-31 08:18:35 UTC

Created attachment 337278
Install image retrieve

Instead of "Z", it should read "Získava sa".
Broken window.

Comment 4 Adam Pribyl 2009-04-20 08:57:40 UTC

Confirming. In a preupgrade cs_CZ anaconda install, the accented letters are completely broken, often leaving the screen unreadable.

Comment 5 Chris Lumens 2009-05-04 14:11:50 UTC

*** Bug 498760 has been marked as a duplicate of this bug. ***

Comment 6 Chris Lumens 2009-05-04 14:26:53 UTC

*** Bug 494087 has been marked as a duplicate of this bug. ***

Comment 7 Jan ONDREJ 2009-05-06 10:23:08 UTC

Trying to reproduce this on the second install console (F2).

I see that LANG=C is set for this console, and when I try to display UTF-8 characters there, I get very similar (maybe the same) problems.

I think this happens because whiptail runs in the "C" locale. Can somebody switch the locale on the boot disk to a UTF-8 charset, for example en_US.UTF-8?

"text" mode installation works well after these initial problems. It looks like this happens only at boot time, before stage2.img is downloaded. These text-mode install parts can't be skipped when using a network installation.

Comment 8 Jan ONDREJ 2009-05-09 06:19:25 UTC

Setting the blocker to F11AnacondaBlocker because I think this bug has been lost.
If you think the locale cannot be changed for Fedora 11, please change the blocker to F12.

Comment 9 Chris Lumens 2009-05-13 15:28:30 UTC

Regarding comment #7, the $LANG is only set to C for the shell on tty2.  On tty1 where loader is running, all locales are whatever.UTF-8.  The real problem here is that we don't have the proper locale information in the initrd to know how to handle these characters.  I believe the information we need is /usr/lib/locale/locale-archive which is about 80 MB, which grows the initrd by a huge amount.  This is fallout from not having the wlite code in anaconda anymore, which was essentially a complete duplication of code found elsewhere.
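A minimal C sketch of how a program can tell whether the locale data it needs is present. It assumes glibc behavior, where setlocale() returns NULL when the requested locale cannot be loaded (for example, because it is missing from locale-archive in the initrd). The helper name ctype_locale_available is hypothetical and does not come from loader.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return 1 if LC_CTYPE can be switched to `name`,
 * 0 otherwise. The previous LC_CTYPE setting is restored before returning. */
int ctype_locale_available(const char *name)
{
    /* setlocale() may reuse its static buffer, so copy the current name. */
    const char *cur = setlocale(LC_CTYPE, NULL);
    char *saved = NULL;
    if (cur != NULL) {
        size_t n = strlen(cur) + 1;
        saved = malloc(n);
        if (saved != NULL)
            memcpy(saved, cur, n);
    }

    int ok = setlocale(LC_CTYPE, name) != NULL;

    if (saved != NULL) {
        setlocale(LC_CTYPE, saved);   /* put the original locale back */
        free(saved);
    }
    return ok;
}
```

"C" and "POSIX" are always available. For a name such as "sk_SK.UTF-8", the result would presumably be 0 on an initrd that lacks the locale data. In that case the program stays in the C locale, and multibyte-aware library functions do not treat the input as UTF-8.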

Comment 10 Ville-Pekka Vainio 2009-05-13 21:23:45 UTC

*** Bug 466644 has been marked as a duplicate of this bug. ***

Comment 11 Ville-Pekka Vainio 2009-05-13 21:25:22 UTC

I marked Bug 466644 as a duplicate of this bug since this is in the Anaconda F11 blocker and the issue was the same.

Comment 12 Jan ONDREJ 2009-05-14 06:14:51 UTC

(In reply to comment #9)
> here is that we don't have the proper locale information in the initrd to know
> how to handle these characters.  I believe the information we need is
> /usr/lib/locale/locale-archive which is about 80 MB, which grows the initrd by
> a huge amount.  This is fallout from not having the wlite code in anaconda
> anymore, which was essentially a complete duplication of code found elsewhere.  

I don't think that's the problem. I have my own installer updates with a full or stripped locale-archive in initrd.img, and the problems remain.
A minimal locale-archive was built as described at:
  http://rwmj.wordpress.com/2009/03/20/why-minimal-is-225-mb/

Also, every combination I tested of:

        setenv("LANG", "en_US.UTF-8", 1);
        newtInit();
        SLutf8_enable(1);
        SLsmg_utf8_enable(1);
        SLtt_utf8_enable(1);
        SLinterp_utf8_enable(1);

failed with similar problems.

If nobody can fix this problem, maybe translators should be notified that translations in the first stage can't contain non-ASCII characters.

Any other ideas?

Comment 13 Ville-Pekka Vainio 2009-05-14 08:39:21 UTC

(In reply to comment #12)
> If nobody can fix this problem, may be translators should be noticed, that
> translations in first stage can't contain non ascii characters.

I already wrote about this issue to the translators' list in January, http://www.redhat.com/archives/fedora-trans-list/2009-January/msg00159.html, but I didn't get any comments on the list.

The translations can actually contain also other than just ASCII characters, for example ä and ö used in Finnish work well. As I've described in https://bugzilla.redhat.com/show_bug.cgi?id=466644#c9 I think whether a character works or not depends on the size of the character in bytes. 1- or 2-byte characters work, 3-byte characters don't.
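The byte length of a UTF-8 character can be read from its lead byte alone, which makes the 1-/2-/3-byte observation above easy to check. This is a simplified C sketch (the helper name utf8_seq_len is hypothetical) that only inspects bit patterns and does not reject overlong or out-of-range leads such as 0xC0, 0xC1, or 0xF5..0xF7.

```c
#include <assert.h>

/* Hypothetical helper: byte length of the UTF-8 sequence that starts with
 * lead byte `b`, or 0 if `b` is a continuation byte (10xxxxxx) or 0xF8..0xFF.
 * Simplified: checks bit patterns only, not overlong/out-of-range leads. */
int utf8_seq_len(unsigned char b)
{
    if (b < 0x80)           return 1;  /* 0xxxxxxx: ASCII            */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx: U+0080..U+07FF   */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx: U+0800..U+FFFF   */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx: U+10000 and up   */
    return 0;                          /* continuation or invalid    */
}
```

Finnish "ä" (C3 A4) is 2 bytes, as are Slovak "ť" (C5 A5) and "Č" (C4 8C), while a character such as "€" (E2 82 AC) is 3 bytes. The "<8C>" in the original report matches the second (continuation) byte of "Č" being printed on its own.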

Comment 14 Jan ONDREJ 2009-05-14 08:48:26 UTC

Partial translations, where only some characters display correctly, look poor, at least for Slovak. It's better to have either only standard characters or a fully translated string; otherwise it looks like a mistake.

Comment 15 Jan ONDREJ 2009-05-14 09:20:00 UTC

Converting to ASCII is problematic, because strings like "_Back" are used in both stages and need different translations.

Maybe it would be better to disable translations for the first stage of the installer and start using translated strings when the graphical installation starts, or after stage2 is loaded.

Reading English text is better than reading something like "Z", which is what the broken Slovak translation of "Retrieving" shows.

Comment 16 Jesse Keating 2009-05-18 19:05:25 UTC

Work is ongoing here, but the fixes are not friendly (an 80 MB increase to the loader). We are still trying to find a better solution and will take one if it comes up, but we won't block the release on this issue.

Comment 17 James Laska 2009-05-18 19:07:09 UTC

Proposed fixes for this issue (https://www.redhat.com/archives/anaconda-devel-list/2009-May/msg00234.html)

Comment 18 Jan ONDREJ 2009-05-18 19:26:14 UTC

Why don't you use a minimal locale-archive, as described here?
    http://rwmj.wordpress.com/2009/03/20/why-minimal-is-225-mb/

Maybe this script needs some updates to automate the build.

Comment 19 Bill Nottingham 2009-05-19 03:07:28 UTC

Patches pushed to git, will be in anaconda-11.5.0.54-1.

Comment 20 Jan ONDREJ 2009-05-19 06:05:12 UTC

(In reply to comment #19)
> Patches pushed to git, will be in anaconda-11.5.0.54-1.  

Can I test them?

How can I use scripts/mk-images? It requires 7-8 parameters, but they are not documented. Can you help me build my own initrd.img?

Comment 21 Jan ONDREJ 2009-05-19 08:27:12 UTC

I can confirm that this works well for Slovak.
Some characters, like "ť", are displayed in a different color, but the text is easily readable.

I still can't build my own initrd image, but the current initrd.img with an updated "sbin/loader" and "locale-archive" works.

Comment 22 Bill Nottingham 2009-05-19 15:34:12 UTC

Rebuild anaconda, use pungi to make new trees/isos is the 'easiest' way to test.

When you say characters in a different color, is that dependent on booting with or without 'nomodeset'?

Comment 23 Lubomir Rintel 2009-05-19 15:42:03 UTC

(In reply to comment #22)
> Rebuild anaconda, use pungi to make new trees/isos is the 'easiest' way to
> test.

I think there's something like make iso in the anaconda checkout; it may serve well for this too.

> When you say characters in a different color, is that dependent on booting with
> or without 'nomodeset'?  

I highly doubt it. The different color is "bold", which the display library uses when two characters overwrite the same character cell, as teletypes did. The library probably confuses this with wide (>8-bit) characters, which are also rendered in one cell despite being more than one byte long.

Comment 24 Jan ONDREJ 2009-05-19 16:11:09 UTC

Created attachment 344646
First screenshot for wrong "ť" character.

First screenshot for wrong "ť" character.

Comment 25 Bill Nottingham 2009-05-19 16:22:33 UTC

Looks fine for me on a 'normal' VT. You may want to check whatever you're using to display it (kvm?) and what font it's using.

Comment 26 Jan ONDREJ 2009-05-19 16:23:20 UTC

Created attachment 344650
Second screenshot of wrong "ť" character.

Second attachment. There are two "ť" characters in this image. You can see that on different backgrounds these characters have different colors (interesting).
It's curious that I can't see any problems with other characters, only with "ť".

Btw, this character has the wrong shape. In Slovak, "ť" should have an apostrophe-like "'" and not this inverted "^" (caron), although that form is often used too.

These problems are not new. I think they were present in Red Hat 7 and maybe earlier. I also don't know whether they need a fix.

Please ignore the network error; it was probably caused by my mistake when updating initrd.img.

Comment 27 Jan ONDREJ 2009-05-19 16:50:10 UTC

(In reply to comment #25)
> Looks fine for me on a 'normal' VT. You may want to check whatever you're using
> to display it (kvm?) and what font it's using.  

Colors are fine for me too on my real PC (nvidia card).
Possibly this is a problem with KVM, which was used to create these screenshots.

The shapes are still wrong, but this problem can be ignored; it's a font issue.

I think this bug has been fixed well.

If you need another test, please give me a link to a new initrd.img (I boot from the network via PXE, so I don't need ISOs).

Comment 28 Jan ONDREJ 2009-05-21 05:21:50 UTC

Today's devel images work well.
