Bug 34176
| Summary: | LANG/LC_* can not be used to indicate UTF-8 display-support to apps | | |
|---|---|---|---|
| Product: | [Retired] Red Hat Linux | Reporter: | Daniel Resare <noa-bugzilla-redhat> |
| Component: | glibc | Assignee: | Jakub Jelinek <jakub> |
| Status: | CLOSED NOTABUG | QA Contact: | Aaron Brown <abrown> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.0 | CC: | fweimer |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2001-03-30 22:34:19 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Daniel Resare
2001-03-30 22:34:01 UTC
Actually it is not. nl_langinfo(CODESET) informs you in which charset all other nl_langinfo values are encoded. Ulrich Drepper said he'll change glibc today so that it does not accept a locale setting like sv_SE.UTF-8 unless the sv_SE locale is encoded in the UTF-8 character set (the alternative would be translating the locales on the fly, but that's expensive).

What you should basically do is first check the $OUTPUT_CHARSET variable and, if it is not set, try nl_langinfo(CODESET). If the currently set locale uses the UTF-8 character encoding, then all standard input/output/error communication, all file names, and all data in plain-text files and pipes for which no other encoding is specified explicitly should be in UTF-8.

It is correct that the recommended way for an application to find out whether the current locale uses UTF-8 is a test like

    strcmp(nl_langinfo(CODESET), "UTF-8") == 0

For details, please read http://www.cl.cam.ac.uk/~mgk25/unicode.html#activate

To test on the command line which encoding you have set, use

    locale charmap

If you set LANG=sv_SE.UTF-8 on a system where no "sv_SE.UTF-8" locale is installed, then glibc will just silently stay in the "C" locale, unless the program checks the return value of setlocale() properly. This is what happened to you.

For instance, SuSE Linux 7.1 installs by default the precompiled UTF-8 locales

    de_DE.UTF-8 el_GR.UTF-8 en_GB.UTF-8 en_US.UTF-8 fa_IR.UTF-8
    fr_FR.UTF-8 hi_IN.UTF-8 ja_JP.UTF-8 ko_KR.UTF-8 mr_IN.UTF-8
    ru_RU.UTF-8 vi_VN.UTF-8 zh_CN.UTF-8 zh_TW.UTF-8

in /usr/share/locale/. Use one of these. If your favourite locale is not among these (or your distribution still lacks preinstalled UTF-8 locales), no problem: any non-root user can easily use, for instance,

    localedef -v -c -i da_DK -f UTF-8 $HOME/local/locale/da_DK.UTF-8
    export LOCPATH=$HOME/local/locale
    export LANG=da_DK.UTF-8

to generate and activate a Danish UTF-8 locale.
The root user can easily add a new UTF-8 locale for all users via

    localedef -v -c -i da_DK -f UTF-8 /usr/share/locale/da_DK.UTF-8

and if root wants to make da_DK.UTF-8 the system-wide default locale for every user, then adding the line

    export LANG=da_DK.UTF-8

to /etc/profile will do the trick. If you start xterm (XFree86 4.0.2 or newer) after setting LANG to a UTF-8 locale, it will automatically go into UTF-8 mode. If you have further questions on this matter, please consult the experts on the linux-utf8 mailing list: http://www.cl.cam.ac.uk/~mgk25/unicode.html#lists

The bug report speaks about the behaviour when the requested locale is not present in the system; so far glibc does not fall back to "C" when it cannot find the proper charset:

    #include <stdio.h>
    #include <locale.h>
    #include <langinfo.h>

    int main(void)
    {
        char *l = setlocale(LC_ALL, "cs_CZ.UTF-8");
        printf("%s %s\n", l, nl_langinfo(CODESET));
        return 0;
    }

gives

    cs_CZ.UTF-8 ISO-8859-2

on glibc 2.2.2 and

    cs_CZ ISO-8859-2

on glibc 2.1.3. This has been changed in glibc a few minutes ago.