Bug 108994 - text mangled by man and less
Status: CLOSED RAWHIDE
Product: Red Hat Linux
Classification: Retired
Component: less
Version: 9
Hardware: i686 Linux
Priority: medium, Severity: medium
Assigned To: Eido Inoue
QA Contact: Ben Levenson
Reported: 2003-11-03 23:03 EST by Joe R. Doupnik
Modified: 2007-04-18 12:59 EDT
Fixed In Version: 1.5m2
Doc Type: Bug Fix
Last Closed: 2004-02-19 11:49:19 EST


Description Joe R. Doupnik 2003-11-03 23:03:13 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4)
Gecko/20030624 Netscape/7.1 (ax)

Description of problem:
Text processing bug:

RH 8 and 9, and WS v3 all share a common problem: the *roff
processing of man pages, and "less" displays, pass a three-byte
binary code through to the user. That code appears wherever
quotation marks are parsed in the source. It is most likely
*roff internal data being passed between processing stages
which, alas, is not recognized and is instead passed on to the user.
 One barely notices this on a console display, but it's still
wrong. One very much notices it when telnetting in with a good
terminal emulator.
 A handy test item I use is  man iptables. If we look at the
second paragraph of running text and do a debug display in a
terminal emulator, we see something like this:

       Each chain is a list of rules which can match a set of 
packets.   EachMJ
[[24;1H[[K:[[24;1H[[24;1H[[K       rule specifies what to do with a
packet that
matches.  This is called aMJ
       �@Xtarget�@Y, which may be a jump to a user-defined chain in
the  same  t


  Notice the word "target" above and the binary gibberish surrounding
it. The "@" item turns out to be hex 80, a non-printable control code,
and so on.
 Doing a  more filename   on text with similar quoting can yield the same
difficulty.
 Here is the same material viewed through a binary editor (I did
man iptables >/tmp/x.x   and then viewed x.x with the editor bvi):


000005B0  63 6B 65 74 73 2E 20 20 20 45 61 63 68 0A 20 20 ckets.   Each.
000005C0  20 20 20 20 20 72 75 6C 65 20 73 70 65 63 69 66      rule specif
000005D0  69 65 73 20 77 68 61 74 20 74 6F 20 64 6F 20 77 ies what to do w
000005E0  69 74 68 20 61 20 70 61 63 6B 65 74 20 74 68 61 ith a packet tha
000005F0  74 20 6D 61 74 63 68 65 73 2E 20 20 54 68 69 73 t matches.  This
00000600  20 69 73 20 63 61 6C 6C 65 64 20 61 0A 20 20 20  is called a.
00000610  20 20 20 20 E2 80 98 74 61 72 67 65 74 E2 80 99     ...target...

 ---- the line above has the details around word target ----

00000620  2C 20 77 68 69 63 68 20 6D 61 79 20 62 65 20 61 , which may be a
00000630  20 6A 75 6D 70 20 74 6F 20 61 20 75 73 65 72 2D  jump to a user-
00000640  64 65 66 69 6E 65 64 20 63 68 61 69 6E 20 69 6E defined chain in
00000650  20 74 68 65 20 20 73 61 6D 65 20 20 74 61 2D 0A  the  same  ta-.
00000660  20 20 20 20 20 20 20 62 6C 65 2E 0A 0A 0A 54 08        ble....T.
00000670  54 41 08 41 52 08 52 47 08 47 45 08 45 54 08 54 TA.AR.RG.GE.ET.T
00000680  53 08 53 0A 20 20 20 20 20 20 20 41 20 20 66 69 S.S.       A  fi
00000690  72 65 77 61 6C 6C 20 72 75 6C 65 20 73 70 65 63 rewall rule spec
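
The last rows of the dump also contain runs such as 54 08 54 41 08 41.
These are not part of the quoting problem: each letter is followed by a
backspace (hex 08) and the same letter again, which is how nroff-style
output marks bold text on a terminal. A minimal Python sketch that
collapses such overstrikes, roughly in the spirit of col -b, follows; the
helper name strip_overstrike is hypothetical and the byte string is copied
from rows 00000660 to 00000680 of the dump.

```python
import re


def strip_overstrike(text: str) -> str:
    """Drop 'X<backspace>' pairs, keeping only the final character.

    Hypothetical helper for illustration: it removes every character
    that is immediately followed by a backspace (0x08), which undoes
    both bold (X\\bX) and underline (_\\bX) overstriking.
    """
    return re.sub(r".\x08", "", text)


# Bytes copied from the hex dump: T\bT A\bA R\bR G\bG E\bE T\bT S\bS
dump = bytes.fromhex(
    "54 08 54 41 08 41 52 08 52 47 08 47 45 08 45 54 08 54 53 08 53"
).decode("ascii")

plain = strip_overstrike(dump)
print(plain)
```

Applied to the bytes above, this leaves the section heading as plain
letters, with no backspaces remaining.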

Or, viewing    more x.x  on FreeBSD to see clearly what's happening:

       Each chain is a list of rules which can match a set of 
packets.   Each
       rule specifies what to do with a packet that matches.  This is
called a
       <E2><80><98>target<E2><80><99>, which may be a jump to a
user-defined cha
in in the  same  ta-
       ble.
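
For reference, the two three-byte sequences around "target" (E2 80 98 and
E2 80 99) are valid UTF-8. The sketch below decodes the bytes from row
00000610 of the dump both as UTF-8 and as Latin-1, to show why a viewer
that treats each byte as a separate character displays three odd symbols
where one quotation mark was intended. The variable names are
illustrative only.

```python
import unicodedata

# Bytes copied from row 00000610 of the hex dump:
# E2 80 98, then "target", then E2 80 99
raw = bytes.fromhex("E2 80 98 74 61 72 67 65 74 E2 80 99")

# Interpreted as UTF-8, each three-byte group is a single character.
as_utf8 = raw.decode("utf-8")
names = [unicodedata.name(ch) for ch in (as_utf8[0], as_utf8[-1])]

# Interpreted one byte per character (Latin-1), every byte becomes its
# own character, including 0x80, 0x98 and 0x99, which are C1 controls.
as_latin1 = raw.decode("latin-1")

print(names)
print(len(as_utf8), "characters as UTF-8")
print(len(as_latin1), "characters as Latin-1")
```

The byte 80 noted earlier in the report is the middle byte of each of
these sequences.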


 For what it's worth dept: SuSE does not have these problems, using
the same source material. Nor do *BSD systems. So it's a bug in the
RH way of constructing these utilities, deep within something
roff-like.
 Thanks,
 Joe Doupnik
 jrd@cc.usu.edu

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. man iptables
   and similar, as illustrated in the report above

Actual Results:  Please refer to the report above for explicit
information.

Expected Results:  Quote marks, rather than binary gibberish
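
As an illustration of the expected result, the sketch below maps the
UTF-8 typographic quotes seen in the dump to plain ASCII quote marks.
This is only a workaround-style example under stated assumptions: the
report does not say which quote characters should appear, so the choice
of ' and " here is an assumption, and the function name ascii_quotes is
hypothetical.

```python
# Assumed mapping: typographic quotes to their closest ASCII equivalents.
QUOTE_MAP = {
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
}


def ascii_quotes(data: bytes) -> bytes:
    """Decode UTF-8 input and replace typographic quotes with ASCII ones.

    Any other non-ASCII character is replaced with '?' on output.
    """
    text = data.decode("utf-8")
    return text.translate(str.maketrans(QUOTE_MAP)).encode("ascii", errors="replace")


# The bytes around the word "target" from the hex dump in the report.
sample = bytes.fromhex("E2 80 98 74 61 72 67 65 74 E2 80 99")
print(ascii_quotes(sample))
```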

Additional info:

Two test systems, uname -a on each:
Linux netlab4.usu.edu 2.4.20-20.9 #1 Mon Aug 18 11:28:34 EDT 2003 i586
i586 i386 GNU/Linux

Linux netlab6.usu.edu 2.4.21-4.0.1.EL #1 Thu Oct 23 01:42:27 EDT 2003
i686 athlon i386 GNU/Linux
Comment 1 Need Real Name 2003-11-10 12:38:01 EST
Fedora Core 1 default packages:
less ignores the LESSCHARSET variable and displays incorrect characters
for iso8859-2 and utf-8 encoded text files (other encodings are probably
misinterpreted too). Using another locale setting has no effect on it.
If less is built from less-378-11.1.src.rpm without patch#1, patch#2 and
patch#3, everything works OK.
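
For context, LESSCHARSET is the environment variable less consults to
choose its character set. A minimal sketch of launching a pager with that
variable set explicitly is shown below; less_env is a hypothetical helper
and notes.txt a hypothetical file name. According to this comment, the
Fedora Core 1 build ignored the variable, so this shows the intended
usage rather than a fix.

```python
import os


def less_env(charset, base=None):
    """Return a copy of the environment with LESSCHARSET set.

    Hypothetical helper: 'base' defaults to the current process
    environment; the original mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    env["LESSCHARSET"] = charset
    return env


# Build an environment for a UTF-8 file without touching os.environ.
env = less_env("utf-8", base={"PATH": "/usr/bin"})
print(env["LESSCHARSET"])

# Intended use (not run here, since it needs a terminal and less):
#   import subprocess
#   subprocess.run(["less", "notes.txt"], env=less_env("utf-8"))
```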
Comment 2 Eido Inoue 2004-02-19 11:49:19 EST
This is fixed with the nroff/man UTF-8 combination in rawhide
