It seems like the recomended way for an application to find out weather it is ok to output UTF-8 encoded chars to use the nl_langinfo(CODESET) call. If a user wants to communicate that her setup interprets UTF-8 data correctly (for example, xterm -u8 is used) the logical thing to do would be to set LANG to for example sv_SE.UTF-8. In theory that is. It turns out that glibc seems to discard the codeset information if the given locale is not encoded in that locale. By default a redhat-7 system returns ISO-8859-1 when nl_langinfo(CODESET) is called even if LANG is set to sv_SE.UTF-8. A workaround is to use create a private locale with the right 'codesetlocaledef -v -c -i sv_SE -f UTF-8 $HOME/locales/sv_SE.UTF-8' and then set LOCPATH for example like this: [noa@nestor locales]$ LOCPATH=$HOME/locales LANG=sv_SE.UTF-8 ~/c/cs_test codeset: UTF-8 Conclusion: the codeset part of LANG or LC_CTYPE/LC_ALL should reflect to what nl_langinfo(CODESET) returned even if the locale in question is not encoded in that codeset.
Actually it is not. nl_langinfo(CODESET) informs you in which charset are all other nl_langinfo values encoded. Ulrich Drepper said he'll actually change glibc today so that it does not accept locale setting like sv_SE.UTF-8 unless sv_SE locale is encoded in UTF-8 character set (the other way around this would be translating the locales on the fly, but that's expensive). What you should basically do is first check the $OUTPUT_CHARSET variable and if it is not given, try nl_langinfo(CODESET).
If the currently set locale uses the UTF-8 character encoding, then all standard input/output/error communication, all file names, and all data in plaintext files and pipes for which no other encoding is specified explicitely etc. should be in UTF-8. It is correct that the recomended way for an application to find out whether the current locale uses UTF-8 is to use a test like strcmp(nl_langinfo(CODESET), "UTF-8") == 0 For details, please read http://www.cl.cam.ac.uk/~mgk25/unicode.html#activate To test on the command line, which encoding you have set, use locale charmap If you set LANG=sv_SE.UTF-8 on a system where no "sv_SE.UTF-8" locale is installed, then glibc will just silently stay in the "C" locale, unless the program checked the return value of setlocale() properly. This is what happened to you. For instance SuSE Linux 7.1 installs by default the precompiled UTF-8 locales de_DE.UTF-8 el_GR.UTF-8 en_GB.UTF-8 en_US.UTF-8 fa_IR.UTF-8 fr_FR.UTF-8 hi_IN.UTF-8 ja_JP.UTF-8 ko_KR.UTF-8 mr_IN.UTF-8 ru_RU.UTF-8 vi_VN.UTF-8 zh_CN.UTF-8 zh_TW.UTF-8 in /usr/share/locale/. Use one of these. If your favourite locale it not among these (or your distribution still lacks preinstalled UTF-8 locales), no problem: Any non-root user can easily use for instance localedef -v -c -i da_DK -f UTF-8 $HOME/local/locale/da_DK.UTF-8 export LOCPATH=$HOME/local/locale export LANG=da_DK.UTF-8 to generate and activate for instance a Danish UTF-8 locale. The root user can easily add a new UTF-8 locale for all users via localedef -v -c -i da_DK -f UTF-8 /usr/share/locale/da_DK.UTF-8 and if root wants to make da_DK.UTF-8 the system-wide default locale for every user, then adding the line export LANG=da_DK.UTF-8 into /etc/profile will do the trick. If you start xterm (XFree86 4.0.2 or newer) after setting LANG to a UTF-8 locale, it will go automatically into UTF-8 mode. If you have further questions on this matter, please consult the experts on the linux-utf8 mailing list: http://www.cl.cam.ac.uk/~mgk25/unicode.html#lists
The bug report speaks about behaviour when the requested locale is not present in the system, and at so far glibc does not fall back to "C" when it cannot find proper charset: #include <locale.h> #include <langinfo.h> int main(void) { char *l = setlocale(LC_ALL, "cs_CZ.UTF-8"); printf ("%s %s\n", l, nl_langinfo(CODESET)); } gives cs_CZ.UTF-8 ISO-8859-2 on glibc 2.2.2 and cs_CZ ISO-8859-2 on glibc 2.1.3. It has been changed in glibc a few minutes ago.