Bug 76394

Summary: Fonts are good on my local files, but fail on servers.
Product: [Retired] Red Hat Linux
Reporter: Need Real Name <cult>
Component: XFree86
Assignee: Mike A. Harris <mharris>
Status: CLOSED NOTABUG
QA Contact: David Lawrence <dkl>
Severity: medium
Priority: medium    
Version: 8.0   
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2002-10-21 12:23:48 UTC

Description Need Real Name 2002-10-21 09:42:00 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020830

Description of problem:
When I upload files to a server, special characters like "é" become "Ã©", for example.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
Hello,

I can't stand the French Red Hat fonts. Since I installed the latest Red Hat, my
*French* characters like "é" don't work properly. I use my Red Hat for work...

Examples:
1/ I edit a file from an ovh server with gedit and gFTP for a little HTML
modification. Special characters are output correctly in gedit ("é"). I save the
file, I upload it to the server, and BOOM, all "é" characters have been changed
to "Ã©", and yet they were looking fine in gedit!!!!
So I had to write "é" as "&eacute;", because even with emacs the special
characters look fine, but once they are on the server, when I take a look at the
HTML source, they look like this: Ã©

2/ Using Mozilla, when I modify a filled form, the "é" is output as "Ã©"
inside the <textarea>...</textarea>


I do not understand why the fonts are like that. I have never had such font
troubles. I can't work as I would like, I can't edit my files and put them on
servers, I AM TRAPPED BY REDHAT FONTS.

I tried to resolve it by changing the file /etc/sysconfig/i18n from:
LANG="fr_FR.UTF-8"
SUPPORTED="en_US.UTF-8:en_US:en:fr_FR.UTF-8:fr_FR:fr"
SYSFONT="latarcyrheb-sun16"

to:

LANG="fr_FR@euro"
SUPPORTED="fr_FR@euro:fr_FR:fr"

But with these new settings it still does not work, and what's more, the fonts
are ugly now, check:
Before (nice fonts):
http://boxfly.free.fr/images/Capture2.png
After (ugly fonts):
http://boxfly.free.fr/images/Capture.png

So it does not work even with ugly fonts :-(

What should I do? I really need to resolve this for work...

Thanks,
Vincent.
Mail: cult

PS: The "é" character is just an example; I have the same trouble with, for example, the "à" character.

Comment 2 Mike A. Harris 2002-10-23 15:57:59 UTC
This is not an issue with "fonts" at all.  It is due to your two
computer systems not using the same character set encoding.  When any
2 computers are communicating text data between each other, either
via web pages, IRC, email, text files, or some other form of
communication, the text must be stored in an encoding that all of
the computers agree upon.  If it is not, then you will get an
encoding mismatch, and will see oddball characters showing up like
those in the screenshots linked to in the report above.

Red Hat Linux 8.0 uses the UTF-8 Unicode encoding by default now, whereas
prior versions of Red Hat Linux use a localized 8-bit encoding which
is dependent on what location you live in, and language you use.

Text consisting solely of English, with no other languages present,
will work in any 8-bit encoding because all of the various 8-bit
encodings in popular usage today in Linux, Windows, and other
platforms use ASCII as the common component for 7-bit text.  The
8-bit encodings differ only in bytes in which the high bit is set,
or in other words, the characters in positions 128 through 255.
Each 8-bit encoding assigns its own characters to this range, and so
when computers exchange 8-bit text data using any mechanism
whatsoever, if the text contains any bytes with the high bit set,
the receiving computer will display and use this text in whatever
encoding the user has chosen to use on their local system.  If the
user's encoding does not match the encoding of the text, then the
user will see wrong or garbled characters instead of what was
present on screen to the person who composed the text.
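
As a rough illustration of the point above, the following Python sketch
decodes one ASCII byte and one high-bit byte under a few ISO8859
encodings.  The choice of encodings and of the bytes 0x41 and 0xE9 is
only an example, not something taken from the reporter's systems.

```python
# Decode a single byte under several 8-bit encodings to show that they
# agree below 128 and may disagree from 128 upwards.
ENCODINGS = ["iso-8859-1", "iso-8859-2", "iso-8859-5", "iso-8859-7"]


def decode_byte(value):
    """Return the character a single byte maps to in each encoding."""
    return {name: bytes([value]).decode(name) for name in ENCODINGS}


ascii_view = decode_byte(0x41)  # a 7-bit ASCII byte
high_view = decode_byte(0xE9)   # a byte with the high bit set

for name in ENCODINGS:
    print(f"{name}: 0x41 -> {ascii_view[name]!r}, 0xE9 -> {high_view[name]!r}")
```

The ASCII byte comes out as the same letter everywhere, while the
high-bit byte does not mean the same character in every encoding.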

In short, encodings MUST match on all systems or you see junk.  This
is not a bug, it is totally an end user configuration issue.

UTF-8 is another byte-oriented encoding which is compatible with
7-bit ASCII: it stores each ASCII character as the same single byte,
and each non-ASCII character as a sequence of two or more bytes with
the high bit set.  When ASCII text is used, it fits right into the
UTF-8 data stream just fine.  Anyone using *any* 8-bit encoding
which has ASCII as the lower 7 bits will be able to read/view/edit
the text flawlessly as long as they remain within the confines of
ASCII and do not insert any 8-bit text into the document.
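
A small Python sketch of this, using made-up sample strings: pure
ASCII text produces identical bytes in every ASCII-compatible encoding
tried here, and the bytes only diverge once an accented character is
added.

```python
# Compare the bytes produced for the same text by several encodings.
NAMES = ("ascii", "utf-8", "iso-8859-1", "iso-8859-15")

plain = "plain ASCII text"
plain_bytes = {name: plain.encode(name) for name in NAMES}
# True when every encoding produced exactly the same byte string.
plain_identical = len(set(plain_bytes.values())) == 1

accented = "caf\u00e9"  # "café", which is no longer pure ASCII
as_utf8 = accented.encode("utf-8")         # é becomes two bytes
as_latin1 = accented.encode("iso-8859-1")  # é becomes one byte

print("ASCII text identical in all encodings:", plain_identical)
print("UTF-8 bytes:     ", as_utf8)
print("ISO8859-1 bytes: ", as_latin1)
```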

The problem you are seeing arises *only* when the text contains
characters which are *not* ASCII and which are stored in a different
encoding from the one used to read them.  Your computer is encoding
and decoding UTF-8 by default (Red Hat Linux 8), which allows
everyone to use the same encoding (Unicode) regardless of which
language they are using, and everyone will be able to read the text,
and modify it as well, so long as they are also using Unicode.

What you are seeing is that one of your systems is using UTF-8,
and the other one is not using UTF-8, but is using ISO8859-2 or
something else instead.  UTF-8 and ISO8859-* are simply not compatible
at all with each other except for the common ASCII component.
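
The mismatch can be reproduced in a few lines of Python.  This is only
a sketch: it assumes the receiving side reads the bytes as ISO8859-1,
which is a guess, since the bug report does not say which encoding the
server actually uses.

```python
# Write an accented character as UTF-8, then read it back with the wrong
# encoding, and the other way round.
original = "\u00e9"  # é, the character written as &eacute; in HTML

utf8_bytes = original.encode("utf-8")       # two bytes: 0xC3 0xA9
misread = utf8_bytes.decode("iso-8859-1")   # each byte shown as one character

latin1_bytes = original.encode("iso-8859-1")  # one byte: 0xE9
try:
    latin1_bytes.decode("utf-8")
    reverse_ok = True
except UnicodeDecodeError:
    # A lone 0xE9 is not a complete UTF-8 sequence.
    reverse_ok = False

print("UTF-8 bytes read as ISO8859-1:", misread)
print("ISO8859-1 bytes valid as UTF-8:", reverse_ok)
```

In one direction the text silently turns into two wrong characters, and
in the other direction the bytes are not even valid input.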

So, in summary, whatever language you use, if you are using
characters which are not straight 7bit ASCII, then all of the
computers you are using must be configured to use the exact same
8bit encoding.  You can do this by configuring all of the machines
to use UTF-8, or you can configure them all to use ISO8859-2 (or
whichever other 8bit encoding you use/prefer).  The best way to
accomplish this is to configure all of the machines to use UTF-8,
as that allows communication with any machine using UTF-8 regardless
of what language the people using the given machine use.
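
Reconfiguring the machines themselves is not something a snippet can
show, but text that already exists in a legacy encoding can be brought
in line with a common one.  The Python sketch below uses a hypothetical
helper named transcode, and assumes the source encoding is known in
advance (ISO8859-1 here, purely as an example).

```python
def transcode(data, source, target):
    """Re-encode bytes from a known source encoding to a target encoding.

    Hypothetical helper for illustration only.  The source encoding has
    to be known; it cannot be discovered reliably from the bytes alone.
    """
    return data.decode(source).encode(target)


legacy = "d\u00e9j\u00e0 vu".encode("iso-8859-1")  # "déjà vu" in a legacy encoding
converted = transcode(legacy, "iso-8859-1", "utf-8")
round_trip = transcode(converted, "utf-8", "iso-8859-1")

print("legacy bytes:   ", legacy)
print("converted bytes:", converted)
```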

Unicode has been used in Windows, Macintosh and other environments
for 10 years now or more, as a solution to internationalization
issues related to multilingual text.  Unfortunately, Linux and UNIX
in general did not catch on to the unicode bandwagon 10 years ago,
and so Linux has been behind in many respects WRT internationalization.
Unicode solves these problems, but ONLY if everyone USES it.  If one
person uses it, and another one does not, or one machine uses it,
and another machine does not - then the two machines and/or two
people are not speaking the same language, and the result is what
you see in your bug report.

In summary: Unicode solves the problem of communication of text in
a universal format regardless of language.  In order for it to be
useful, everything sharing text must use unicode, the most popular
encoding of which is UTF-8.  Red Hat uses UTF-8 by default now, and
will continue to do so in the future, as it is _the_ standard for
this stuff.  Until all Linux systems in usage out there as a whole
are reconfigured to use UTF-8, and all distros are using UTF-8 by
default, there will be some level of unicode growing pains.  The
end result once the changeover is complete, is that we all get to
enjoy hassle free internationalized text regardless of what languages
we speak and/or use, and without standing on our heads.  Until
that point however, users having mixed computing environments
need to reconfigure their systems to both use a common encoding,
either by configuring Red Hat Linux to use an older legacy encoding,
or by configuring all other systems to use unicode.  There is no
reliable way to autodetect and configure things like this
automatically.  If there were, unicode would not need to exist.
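
To see why detection is hard, consider this Python sketch.  It takes
the UTF-8 bytes for a single accented character and tries a handful of
encodings (an arbitrary selection) on them.  Every one of them decodes
without error, so the bytes by themselves do not say which reading was
intended.

```python
# The same two bytes are valid input for several different encodings.
data = "\u00e9".encode("utf-8")  # é stored as the bytes 0xC3 0xA9

candidates = {}
for name in ("utf-8", "iso-8859-1", "iso-8859-2", "iso-8859-5"):
    try:
        candidates[name] = data.decode(name)
    except UnicodeDecodeError:
        # This encoding rejects the bytes, so it can be ruled out.
        continue

for name, text in candidates.items():
    print(f"{name}: {text!r}")
```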

Closing bug as NOTABUG, since this is entirely a cross computer
encoding configuration issue.