Bug 88868
Summary: | less using latin1 charset doesn't display binary files correctly | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | System Detection Staff <in-redbug> |
Component: | less | Assignee: | Karsten Hopp <karsten> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike McLean <mikem> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 9 | CC: | jshin, kasal, ml-bz-dale, yeti |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2004-03-29 09:09:03 UTC | Type: | --- |
Description
System Detection Staff
2003-04-15 03:32:51 UTC
I wouldn't say that some 8-bit characters display as &; all 8-bit characters display as &. I have LC_CTYPE=cs_CZ, Latin2 fonts in the terminal, and LESSCHARSET=iso8859. Everything except less displays Latin2 characters correctly.

    echo -e $(echo \\{2{4,5,6,7},3{0,1,2,3,4,5,6,7}}{0,1,2,3,4,5,6,7} | tr -d ' ')

prints a list of Latin2 characters (probably not reasonably pasteable here), while

    echo -e $(echo \\{2{4,5,6,7},3{0,1,2,3,4,5,6,7}}{0,1,2,3,4,5,6,7} | tr -d ' ') | less

prints a list of 96 &.

Unsetting LESSCHARSET has an interesting effect: it makes *some* characters display properly, apparently those that are also present in Latin1. Latin2-specific characters are still shown as &.

I have the same problem with the Polish locale pl_PL as well. All Polish diacritical marks except Ó and ó are displayed incorrectly.

I'm running with LANG="en_US.UTF-8". I see this issue all the time with the man command (since less is the default pager) for characters that could translate as enclosing single quotes. An example is "man modprobe", for which, with the default setting of LESSCHARSET=latin1, one sees:

    -k, --autoclean Set &&&autoclean&&& on loaded modules.

and with LESSCHARSET undefined:

    -k, --autoclean Set âautocleanâ on loaded modules.

The actual characters given to less for the relevant part of this (per "od -c") are:

    S e t 342 200 231 a u t o c l e a n 342 200 231 o n

The bug is introduced by the patches less-378+iso247-20030108.diff and/or less-378-multibyte.patch. If less-378 is compiled without those patches, everything looks OK.

Re comment #3: if your locale is en_US.UTF-8 (as opposed to en_US.ISO8859-1), you're not supposed to set LESSCHARSET to latin1. As your 'od -c' output showed, you're feeding a UTF-8 stream to less while 'lying' to less that you're feeding it ISO-8859-1 by setting LESSCHARSET to latin1. If you need to view files that are genuinely in ISO-8859-1, you have to filter them through 'iconv' first (i.e. 'iconv -f ISO-8859-1 -t UTF-8 | less').

This is not to say that less (in RH 9.0) is not 'buggy'. It is confusing to end users because its behavior is controlled by multiple factors: LESSCHARSET, the current locale setting, and who knows what else. IMO, it should IGNORE LESSCHARSET/JLESSCHARSET on POSIX-compliant systems where nl_langinfo(CODESET) is supported (Linux with glibc 2.2 or later is one of them). Moreover, the last time I checked, its way of determining the current codeset was buggy. Priority should be given in the following order, but IIRC it gave a higher priority to LANG:

1. nl_langinfo(CODESET)
2. LC_ALL
3. LC_CTYPE
4. LANG

> but latin1 seems pretty standard

This is rather a Western-Euro-centric point of view :-) What less needs is a 'raw' or 'binary' view option.

Re comment #4 (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=88868#c4): the bug is produced by less-378-multibyte.patch. When I recompiled less without it, the problem disappeared. When I recompiled less from rawhide, with all the patches, the character `&' changed to `6'; I gather it is a value from a random spot in memory.

I've updated to less 381 and dropped the multibyte patch, so this report can be closed.
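
A minimal sketch for checking what the 'od -c' dump in comment #3 implies, assuming Python 3 is available. It decodes the three octal bytes 342 200 231 both as UTF-8 and as ISO-8859-1, and mimics the 'iconv -f ISO-8859-1 -t UTF-8' filter suggested above. The helper name latin1_to_utf8 is made up for illustration and is not part of less or iconv.

```python
# Bytes from the "od -c" dump in comment #3: octal 342 200 231.
raw = bytes([0o342, 0o200, 0o231])

# Interpreted as UTF-8, the three bytes form one character.
as_utf8 = raw.decode("utf-8")

# Interpreted as ISO-8859-1, every byte becomes its own character.
as_latin1 = raw.decode("iso-8859-1")


def latin1_to_utf8(data: bytes) -> bytes:
    """Hypothetical helper: same conversion as 'iconv -f ISO-8859-1 -t UTF-8'."""
    return data.decode("iso-8859-1").encode("utf-8")


# Print code points only, so the output does not depend on the terminal charset.
print("UTF-8 view:     ", [f"U+{ord(c):04X}" for c in as_utf8])
print("ISO-8859-1 view:", [f"U+{ord(c):04X}" for c in as_latin1])
print("0xF3 re-encoded:", latin1_to_utf8(b"\xf3").hex())
```

Read as UTF-8, the sequence is the single character U+2019 (right single quotation mark). Read as ISO-8859-1, it is U+00E2 (â) followed by two C1 control codes, which are normally invisible on a terminal. That would be consistent with the 'âautocleanâ' output seen with LESSCHARSET unset, although the report itself does not confirm this.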