From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225

Description of problem:
When less is configured to use the latin1 character set via the LESSCHARSET environment variable (or -Klatin1), several high-bit characters are displayed as "&" instead of something more useful. I accept the need for foreign language support, but latin1 seems pretty standard, and regardless, less should display *something* different for different characters instead of substituting one constant character.

While unsetting LESSCHARSET looks like a potential workaround, it prevents files containing the European character set from being displayed. Substituting iso8859 for latin1 is, as you would hope, exactly equivalent. This was tried both in an xterm and on the console.

Note that less 381 (from GNU) and 358+iso247 (RH 7.1) both do the right thing.

Version-Release number of selected component (if applicable):
less-378-7

How reproducible:
Always

Steps to Reproduce:
1. echo -e 'A\201\nA\253\nA&' | /usr/bin/less -Klatin1

Actual Results:
The second and third lines appear the same: "A&"

Expected Results:
All three lines should appear differently, either like:
----
A<81>
A<AB>
A&
----
or
----
A<81>
A«
A&
----

Additional info:
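For anyone who wants to check the expected results without a working less, here is a minimal Python sketch of the second rendering asked for above. It is not how less is implemented; the function name render_latin1 and the choice to show every control byte as a hex escape are assumptions made for illustration only.

```python
def render_latin1(data: bytes) -> str:
    """Render bytes the way the report's second expected output looks.

    Hypothetical helper: printable Latin-1 passes through, control bytes
    (C0 and C1, except newline) become a hex escape such as <81>.
    """
    out = []
    for b in data:
        if b == 0x0A or 0x20 <= b <= 0x7E or b >= 0xA0:
            # Newline, printable ASCII and printable Latin-1 (0xA0-0xFF).
            out.append(bytes([b]).decode("latin-1"))
        else:
            # Everything else is a control byte; show it as <XX>.
            out.append("<%02X>" % b)
    return "".join(out)


# Same bytes as the echo -e command in the steps to reproduce
# (octal escapes: \201 is 0x81, \253 is 0xAB).
sample = b"A\201\nA\253\nA&"
rendered = render_latin1(sample)
```

With these assumptions, rendered holds three distinct lines: A<81>, A« and A&. The point is only that 0xAB and "&" (0x26) are different bytes and should never end up looking the same.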
I wouldn't say that some 8-bit characters display as &; all 8-bit characters display as &. I have LC_CTYPE=cs_CZ, Latin2 fonts in the terminal, and LESSCHARSET=iso8859. Everything except less displays Latin2 characters correctly.

echo -e $(echo \\{2{4,5,6,7},3{0,1,2,3,4,5,6,7}}{0,1,2,3,4,5,6,7} | tr -d ' ')

prints a list of Latin2 characters (probably not reasonably pasteable here), while

echo -e $(echo \\{2{4,5,6,7},3{0,1,2,3,4,5,6,7}}{0,1,2,3,4,5,6,7} | tr -d ' ') | less

prints a list of 96 & characters.

Unsetting LESSCHARSET has an interesting effect. It makes *some* characters display properly; it seems to be those that are also present in Latin1. Latin2-specific characters are still shown as &.
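The guess that only characters shared with Latin1 survive can be checked against the code tables themselves. The sketch below is only an illustration, not anything taken from less: it builds the same 96 bytes as the brace expansion above (octal 240 through 377) and lists the byte values that decode to the same character in ISO-8859-1 and ISO-8859-2, using Python's standard codecs. The variable names are made up for the example.

```python
# The brace expansion above yields octal 240..377, i.e. bytes 0xA0..0xFF.
high_bytes = bytes(range(0o240, 0o400))

as_latin1 = high_bytes.decode("iso8859_1")
as_latin2 = high_bytes.decode("iso8859_2")

# Byte values whose character is identical in both character sets.
shared = {
    b for b, c1, c2 in zip(high_bytes, as_latin1, as_latin2) if c1 == c2
}
# Byte values that mean something different in Latin2.
latin2_only = set(high_bytes) - shared
```

Under this reading, 0xD3 and 0xF3 (Ó and ó) land in shared, while for example 0xB1 (ą in Latin2, ± in Latin1) lands in latin2_only. Whether less actually draws the line this way is an assumption; the tables only show that it would be consistent with the symptoms.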
I have the same problem with the Polish locale pl_PL as well. All Polish diacritical marks except Ó and ó are displayed incorrectly.
I'm running with LANG="en_US.UTF-8". I see this issue all the time with the man command (since less is the default pager) for characters that could translate as enclosing single quotes. An example is "man modprobe", for which with the default setting of LESSCHARSET=latin1 one sees:

-k, --autoclean Set &&&autoclean&&& on loaded modules.

and with LESSCHARSET undefined:

-k, --autoclean Set âautocleanâ on loaded modules.

The actual characters given to less for the relevant part of this (per "od -c") are:

S e t 342 200 231 a u t o c l e a n 342 200 231 o n
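To make the od -c output easier to read, here is a small Python sketch that decodes the three octal values from the dump both ways. It only uses the standard library; the variable names are illustrative.

```python
import unicodedata

# Octal byte values exactly as printed by "od -c" above.
octal_bytes = ["342", "200", "231"]
raw = bytes(int(o, 8) for o in octal_bytes)  # 0xE2 0x80 0x99

# Read as UTF-8, the three bytes form a single character.
as_utf8 = raw.decode("utf-8")
utf8_name = unicodedata.name(as_utf8)

# Read as Latin-1, each byte is its own character: U+00E2 (â) followed by
# two C1 control characters, U+0080 and U+0099.
as_latin1 = raw.decode("latin-1")
```

Here as_utf8 is U+2019 (RIGHT SINGLE QUOTATION MARK), and as_latin1 starts with â. That matches the "âautocleanâ" output if the terminal silently drops the two control characters, which is a guess about the terminal rather than something shown in the dump.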
The bug is introduced by the patches less-378+iso247-20030108.diff and/or less-378-multibyte.patch. If less-378 is compiled without those patches, everything looks OK.
Re comment #3, if your locale is en_US.UTF-8 (as opposed to en_US.ISO8859-1), you're not supposed to set LESSCHARSET to latin1. As your 'od -c' output showed, you're feeding a UTF-8 stream to less while 'lying to less' that you're feeding it ISO-8859-1, by setting LESSCHARSET to latin1. If you need to view files that are genuinely in ISO-8859-1, you have to filter them through 'iconv' first (i.e. 'iconv -f ISO-8859-1 -t UTF-8 | less').

This is not to say that less (in RH 9.0) is not 'buggy'. It is confusing to end users because its behavior is controlled by multiple factors: LESSCHARSET, the current locale setting, and who knows what else. IMO, it should IGNORE LESSCHARSET/JLESSCHARSET on POSIX-compliant systems (Linux with glibc 2.2 or later is one of them) where nl_langinfo(CODESET) is supported. Moreover, the last time I checked, its way of determining the current codeset was buggy. Priority should be given in the following order, but it gave a higher priority to LANG, IIRC:

1. nl_langinfo(CODESET)
2. LC_ALL
3. LC_CTYPE
4. LANG

> but latin1 seems pretty standard

This is rather a Western-Euro-centric point of view :-)

What less needs is a 'raw' or 'binary' view option.
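A rough Python sketch of the two points above: the lookup order for the locale variables, and the iconv step for real ISO-8859-1 files. This is not the code less uses. All function names are hypothetical, and the environment-variable fallback is only what a program might do where nl_langinfo(CODESET) is unavailable.

```python
import locale


def effective_ctype_locale(env):
    """Return the locale name that governs LC_CTYPE.

    Follows the order LC_ALL, then LC_CTYPE, then LANG; an empty value
    counts as unset. Falls back to "C" when none is set.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            return value
    return "C"


def codeset_from_name(locale_name):
    """Extract the codeset part of a name like "en_US.UTF-8", if any."""
    name = locale_name.split("@", 1)[0]  # drop a modifier such as @euro
    if "." not in name:
        return None  # e.g. "cs_CZ": the codeset is implied, not spelled out
    return name.split(".", 1)[1]


def current_codeset():
    """Preferred route: ask the C library (Unix only, may raise locale.Error)."""
    locale.setlocale(locale.LC_CTYPE, "")
    return locale.nl_langinfo(locale.CODESET)


def latin1_to_utf8(data):
    """Equivalent of 'iconv -f ISO-8859-1 -t UTF-8' for a byte string."""
    return data.decode("latin-1").encode("utf-8")
```

For example, effective_ctype_locale({"LANG": "en_US.UTF-8", "LC_CTYPE": "cs_CZ"}) picks cs_CZ, because LC_CTYPE outranks LANG. A name like cs_CZ carries no explicit codeset, which is one reason asking nl_langinfo(CODESET) is more reliable than parsing the variables by hand.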
Re comment #4 (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=88868#c4), the bug is produced by less-378-multibyte.patch. When I recompiled less without it, the problem disappeared. When I recompiled less from rawhide, with all the patches, the character `&' changed to `6'; I gather it's a value from a random spot in memory.
I've updated to less 381 and dropped the multibyte patch, so this report can be closed.