Bug 175459 - nroff outputs warning about character encoding on stdout
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: groff
Version: 8
Hardware: All
OS: Linux
Priority: medium
Severity: low
Assigned To: Marcela Mašláňová
QA Contact: David Lawrence
Keywords: Reopened
Reported: 2005-12-10 20:26 EST by JW
Modified: 2011-08-01 22:49 EDT (History)
Fixed In Version: perl-5.8.8-33.fc8.x86_64
Doc Type: Bug Fix
Last Closed: 2008-03-26 09:49:08 EDT
Description JW 2005-12-10 20:26:58 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; MSIE 6.0; Windows; U; AIIEEEE!; Win98; Windows 98; en-US; Gecko masquerading as IE; should it matter?; rv:1.8b) Gecko/20050217

Description of problem:
When running nroff (usually as a result of running man) a warning message is emitted on standard output (which is the wrong place for warning messages anyhow).
It serves absolutely no purpose, because it isn't as though people are going to start running around converting existing nroff source files into UTF-8.
Besides, we have computers that can do that sort of thing automatically (apparently).  So if a script can satisfactorily convert from one character set to another then why do we need a warning message to tell us that it is doing what it is supposed to do?

And as the /usr/bin/nroff script says:

>   # This shell script is intended for use with man, so warnings are
>   # probably not wanted.

But in any case the nroff script makes a lot of wrong assumptions about the input language.


Version-Release number of selected component (if applicable):
groff-1.18.1.1-5

How reproducible:
Always

Steps to Reproduce:
1. man perlcn

Actual Results:
XXX
XXX WARNING: old character encoding and/or character set
XXX
..... <man page data> ...



Expected Results:  No "XXX" warnings.


Additional info:

Perhaps all warning lines should begin with "YYY"? 
I don't know what is special about the "XXX" sequence anyhow.
Is it part of any known standard that applies when (erroneously!) writing error messages on standard output?
Comment 1 Jindrich Novy 2005-12-12 04:24:23 EST
JW, the characters "XXX" are used only to highlight the warning message and have
no special meaning. The warning message is emitted just to ensure that all the
man pages are correctly converted to UTF-8 because all the man pages should use
universal encoding at the time. Since there's no precise way to detect a
particular encoding of a man page, some assumptions about encoding are made, but
they're not needed when all the man pages are in UTF-8.

If you see such a warning, please check to which package the man page belongs
(rpm -qf /path/to/man/page) and report it as a new bug against this package. You
can Cc me on the bug so that I can react.

Jindrich
Comment 2 JW 2005-12-12 05:31:45 EST
Sorry to be critical, but surely it would be easier to simply install all rpms
(including extras) somewhere and do:

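   # MANPATH is colon-separated, so split it before handing it to find
   find $(echo "$MANPATH" | tr ':' ' ') -type f | while IFS= read -r f
   do
       # print the name of every page that is not valid UTF-8
       zcat -f "$f" | iconv -f utf-8 -t utf-8 -o /dev/null 2>/dev/null || echo "$f"
   done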

Let's face it, nobody is going to be able to file a bug report against something
you don't distribute!

Why wait 10 years for all the bug reports for each man page to filter in before
you fix them?  That's assuming people bother to file one!

But my initial criticism is still valid - the nroff script should be able to do
the conversion automatically. After all, that is what computers are for - to do
work for us.

Even more importantly, such debugging really belongs on some debugging platform,
not in released software.

Mind you, when you consider what a ludicrous situation it is when you can cut
and paste portions of documents and lose character set information in the
process, there is always going to be a certain amount of flakiness where it
comes to character sets.

Of course in true RedHat style this gets closed as notabug.
But it is a bug.  Somebody has left debugging code in released software.

Comment 3 Jindrich Novy 2005-12-12 08:22:33 EST
> But my initial criticism is still valid - the nroff script should be able to do
> the conversion automatically. After all, that is what computers are for - to do
> work for us.

I agree, but sorry, if you want this feature, please send a patch which I can
review and then apply. These statements won't solve anything. The right place to
complain is the groff's upstream.
Comment 4 JW 2005-12-13 01:33:54 EST
Also, I notice that man pages most certainly are not in utf8.
In fact 'man' uses some heuristics to map NROFF_OLD_CHARSET onto the most likely
input format (eg. /usr/share/man1/en/man1/xxx is assumed to be ISO-8859-1).

But nroff partly negates this by trying a utf-8 to utf-8 conversion and, if that
works, it forgets the argument supplied to --legacy and assumes the input really
is utf-8. That doesn't seem right to me, because an ISO-8859-1 byte sequence just
might happen to be valid utf-8, which would cause the wrong type of conversion.
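
The risk described in this comment can be reproduced with two bytes. The sketch below is only an illustration, not code taken from the nroff script; the variable names and the sample text are made up. It writes the bytes 0xC3 0xA9, which read as "é" in UTF-8 and as "Ã©" in ISO-8859-1, and shows that a UTF-8 round trip through iconv accepts them regardless of what the author intended.

```shell
#!/bin/sh
# Illustration only: a legacy ISO-8859-1 file can pass a UTF-8 validity check.
sample=$(mktemp)

# Bytes 0xC3 0xA9: "é" in UTF-8, but "Ã©" in ISO-8859-1.
printf 'caf\303\251\n' > "$sample"

# The kind of check this comment describes: a UTF-8 to UTF-8 round trip.
if iconv -f utf-8 -t utf-8 "$sample" >/dev/null 2>&1; then
    echo "accepted as UTF-8"
fi

# The two readings produce different output once converted to UTF-8:
# the 6 input bytes stay 6 when read as UTF-8, but become 8 when read
# as ISO-8859-1, because each high byte turns into a two-byte sequence.
utf8_len=$(iconv -f utf-8 -t utf-8 "$sample" | wc -c | tr -d ' ')
latin1_len=$(iconv -f iso-8859-1 -t utf-8 "$sample" | wc -c | tr -d ' ')
echo "read as UTF-8: $utf8_len bytes; read as ISO-8859-1: $latin1_len bytes"
```

A successful round trip therefore only proves the bytes are well-formed UTF-8, not that UTF-8 is what the page was written in.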

Comment 5 Marcela Mašláňová 2006-05-29 08:43:54 EDT
Manual page for perlcn isn't in UTF-8.
Comment 6 Robin Norwood 2006-10-01 19:31:31 EDT
assigning to rnorwood@redhat.com
Comment 7 Jan Pazdziora 2008-03-04 03:38:41 EST
I just ran man perlcn on my Fedora 8 and no XXX warnings were shown. Not that
the content of that page would make much sense but if the topic of this bugzilla
are the XXX warnings, it seems fixed in CURRENTRELEASE.

$ rpm -q perl groff
perl-5.8.8-33.fc8.x86_64
groff-1.18.1.4-11.fc8.x86_64
Comment 8 JW 2008-03-12 22:19:13 EDT
(In reply to comment #7)
> I just ran man perlcn on my Fedora 8 and no XXX warnings were shown. Not that
> the content of that page would make much sense but if the topic of this bugzilla
> are the XXX warnings, it seems fixed in CURRENTRELEASE.

It is not fixed in FC8. Just "seeming" to be fixed doesn't really mean that it
is fixed. One would hope for a little more effort than having a quick
think and some seeming.

You should simply read /usr/bin/nroff to confirm that (look for 
"XXX WARNING:")

Try, for example, "man -c pdftohtml" on a non-UTF-8 system.
Comment 9 Marcela Mašláňová 2008-03-13 04:12:54 EDT
We converted everything in Fedora into utf-8 some time ago. So looking at only
one problematic page is sufficient.
We appreciate your outstanding help and I hope, if you find any other wrong
page, you'll let us know.
Comment 10 JW 2008-03-13 04:29:57 EDT
(In reply to comment #9)
> We converted everything in Fedora into utf-8 some time ago.

But wait a moment ... what is /etc/sysconfig/i18n for?
Or do you mean that you converted everything to support any locale so long as it
is utf-8?

But the problem isn't with your conversion ... it is with a silly piece of
code in nroff script which generates an "XXX WARNING:..." which doesn't even go
to stderr.

If you must insist on a warning then why don't you add '1>&2' after the echo?
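
To make the stdout versus stderr distinction concrete, here is a minimal sketch. The function names are hypothetical and the real nroff script is not reproduced here; only the warning text comes from this report. It compares a warning written with a plain `echo` against one written with `echo ... 1>&2` when the caller discards stderr, as man does.

```shell
#!/bin/sh
# Hypothetical stand-ins for a script that prints a warning and then the page.
msg="XXX WARNING: old character encoding and/or character set"

render_stdout() { echo "$msg";      echo "page text"; }  # warning on stdout
render_stderr() { echo "$msg" 1>&2; echo "page text"; }  # warning on stderr

# The caller throws stderr away and pipes stdout onward (here into wc -l).
lines_stdout=$(render_stdout 2>/dev/null | wc -l | tr -d ' ')
lines_stderr=$(render_stderr 2>/dev/null | wc -l | tr -d ' ')

echo "warning on stdout: $lines_stdout line(s) reach the pager"
echo "warning on stderr: $lines_stderr line(s) reach the pager"
```

With the warning on stdout it becomes part of the page data and cannot be filtered out by the caller; with `1>&2` it can be shown, logged, or dropped independently of the page.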

But I have to say that the nroff conversion is terrible code. For a start it
assumes that all input is utf-8.  Why should it be? If the system is set for
non-utf8 and if the system is designed right then the correct input language
would be specified in each file (or filesystem) wouldn't it?
Comment 11 Jan Pazdziora 2008-03-13 05:04:16 EDT
(In reply to comment #10)
> But wait a moment ... what is /etc/sysconfig/i18n for?

It specifies the default environment variables. It does not specify what character
set your files use.

> Or do you mean that you converted everything to support any locale so long as it
> is utf-8?

Locale is a run-time thing. File encoding (character set) is a persistent thing.
You can have two programs running, processing the same file, with different locales.

RHEL and Fedora have assumed UTF-8 in files for quite some time. The main reason is
that with regular text files, you have no way of specifying the encoding /
character set for files.
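
The difference between a run-time locale and a persistent file encoding can be seen with a small experiment. This is only a sketch: it assumes the `en_US.UTF-8` locale is installed (if it is not, both runs will typically report the same numbers), and the sample text is made up.

```shell
#!/bin/sh
# One file, written once; its bytes do not change between runs.
f=$(mktemp)
printf 'caf\303\251\n' > "$f"   # "café" stored as UTF-8

# Process the same file under two different locales.
for loc in C en_US.UTF-8; do
    chars=$(LC_ALL=$loc wc -m < "$f" | tr -d ' ')
    bytes=$(LC_ALL=$loc wc -c < "$f" | tr -d ' ')
    echo "LC_ALL=$loc: $chars characters, $bytes bytes"
done
```

The byte count is a property of the file and stays the same in every locale; the character count is an interpretation made by the program at run time and can differ.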

> But the problem isn't with your conversion ... it is with a silly piece of
> code in nroff script which generates an "XXX WARNING:..." which doesn't even go
> to stderr.

Alright. So is the problem with nroff? Let's update the component to groff
because this bugzilla is clearly filed under wrong one (perl), and I believe
that this is part of the misunderstanding.

> But I have to say that the nroff conversion is terrible code. For a start it
> assumes that all input is utf-8.  Why should it be? If the system is set for
> non-utf8 and if the system is designed right then the correct input language
> would be specified in each file (or filesystem) wouldn't it?

Out of curiosity -- how do you specify input language in a text file?
Comment 12 JW 2008-03-13 05:16:55 EDT
(In reply to comment #11)
> Alright. So is the problem with nroff? Let's update the component to groff
> because this bugzilla is clearly filed under wrong one (perl), and I believe
> that this is part of the misunderstanding.

This bug is not filed under perl.
See the original post:
>> Version-Release number of selected component (if applicable):
>> groff-1.18.1.1-5

>
> Out of curiosity -- how do you specify input language in a text file?
> 
You mean the input character set I suppose?
Not by testing iconv with utf-8 to see whether that works or not!
That is what nroff code does.

Here is a snippet:
>>   iconv -f utf-8 -t utf-8 $TMPFILE &>/dev/null && charset_in=utf-8
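
Read on its own, the quoted line amounts to the heuristic sketched below. The function name `guess_charset` and its two arguments are hypothetical, and the surrounding logic of the real script may differ; only the iconv round trip is taken from the snippet.

```shell
#!/bin/sh
# guess_charset FILE LEGACY
# Prints "utf-8" if FILE survives a UTF-8 round trip, otherwise prints LEGACY
# (the charset the caller would have passed via --legacy).
guess_charset() {
    if iconv -f utf-8 -t utf-8 "$1" >/dev/null 2>&1; then
        echo utf-8
    else
        echo "$2"
    fi
}

valid=$(mktemp)
invalid=$(mktemp)
printf 'plain ASCII is also valid UTF-8\n' > "$valid"
printf 'caf\351\n' > "$invalid"   # lone 0xE9 byte: "é" in ISO-8859-1, invalid UTF-8

guess_charset "$valid" ISO-8859-1     # prints utf-8
guess_charset "$invalid" ISO-8859-1   # prints ISO-8859-1
```

The heuristic overrides the caller's charset whenever the bytes happen to be well-formed UTF-8, which is the behaviour being objected to here.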
Comment 13 Jan Pazdziora 2008-03-13 05:54:23 EDT
(In reply to comment #12)
> (In reply to comment #11)
> > Alright. So is the problem with nroff? Let's update the component to groff
> > because this bugzilla is clearly filed under wrong one (perl), and I believe
> > that this is part of the misunderstanding.
> 
> This bug is not filed under perl.
> See the original post:
> >> Version-Release number of selected component (if applicable):
> >> groff-1.18.1.1-5

It was filed under groff, it was later changed to perl. I'd like to see it
changed back to groff.

> > Out of curiosity -- how do you specify input language in a text file?
>
> You mean the input character set I suppose?

You said "input language". I guess you know better than me what you meant.

> Not by testing iconv with utf-8 to see whether that works or not!

So, you told us how not to do it. Now please tell us how to do it.
Comment 14 Marcela Mašláňová 2008-03-26 09:49:08 EDT
The warning now goes to stderr.

For the "input language" issue, please send a patch. The best way would be to send it upstream.
Comment 15 j.clevii 2009-05-06 21:25:58 EDT
(In reply to comment #7)
> I just ran man perlcn on my Fedora 8 and no XXX warnings were shown. Not that
> the content of that page would make much sense but if the topic of this bugzilla
> are the XXX warnings, it seems fixed in CURRENTRELEASE.
> 
> $ rpm -q perl groff
> perl-5.8.8-33.fc8.x86_64
> groff-1.18.1.4-11.fc8.x86_64
>   

I am VERY new at Fedora/Linux, (first semester in school) SO that being said I use the man pages frequently. Well I got that "XXX" warning today in terminal, I ran the "man perlcn" command and everything is back to normal. (I am currently using Fedora Core 4 for my class.) THANK YOU VERY MUCH it was a huge help!
Comment 16 Roman 2009-07-27 03:55:54 EDT
Just an idea how to make this error appear: change umask to some crazy value.


[vr@squirrel ~]$ umask
0002
[vr@squirrel ~]$ man bash <--- works OK
[vr@squirrel ~]$ umask 0644
[vr@squirrel ~]$ umask
0644
[vr@squirrel ~]$ man bash
XXX
XXX WARNING: old character encoding and/or character set
XXX

hehe )))

cat /proc/6549/cmdline
sh -c (cd /usr/share/man && (echo ".ll 14.2i"; echo ".nr LL 14.2i"; echo ".pl 1100i"; /usr/bin/gunzip -c '/usr/share/man/man1/bash.1.gz'; echo ".\\\""; echo ".pl \n(nlu+10") | /usr/bin/gtbl | /usr/bin/nroff -c --legacy ISO-8859-1 -mandoc 2>/dev/null | /usr/bin/less -is)
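
The thread does not explain why an unusual umask triggers the warning. One possible explanation, offered purely as an assumption, is that the nroff script stores its input in a temporary file and reads it back for the encoding check: under umask 0644 a newly created file has no read permission for its owner, so the check would fail for a reason unrelated to encoding. The sketch below only demonstrates the permission effect; `probe_mode` is a made-up helper.

```shell
#!/bin/sh
# Create the scratch directory before touching the umask, so it stays usable.
dir=$(mktemp -d)

# probe_mode UMASK
# Creates a fresh file under the given umask (in a subshell, so the caller's
# umask is untouched) and prints the file's permission string.
probe_mode() {
    (
        umask "$1"
        rm -f "$dir/probe"
        : > "$dir/probe"
        ls -l "$dir/probe" | cut -c1-10
    )
}

probe_mode 0002   # prints -rw-rw-r--
probe_mode 0644   # prints -----w--w-
```

With 0644 the owner read bit is masked out, so any later attempt by an unprivileged user to read the file back fails with a permission error.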
Comment 17 Jan Pazdziora 2009-07-27 05:00:59 EDT
(In reply to comment #16)
> Just an idea how to make this error appear: change umask to some crazy value.
> 
> 
> [vr@squirrel ~]$ umask
> 0002
> [vr@squirrel ~]$ man bash <--- works OK
> [vr@squirrel ~]$ umask 0644
> [vr@squirrel ~]$ umask
> 0644
> [vr@squirrel ~]$ man bash
> XXX
> XXX WARNING: old character encoding and/or character set
> XXX
> 
> hehe )))

On F11, the output is empty. So no XXX warnings on latest Fedora version.
Comment 18 Brett Ryan 2011-08-01 22:49:24 EDT
(In reply to comment #1)
> JW, the characters "XXX" are used only to highlight the warning message and have
> no special meaning. The warning message is emitted just to ensure that all the
> man pages are correctly converted to UTF-8 because all the man pages should use
> universal encoding at the time. Since there's no precise way to detect a
> particular encoding of a man page, some assumptions about encoding are made, but
> they're not needed when all the man pages are in UTF-8.

Why should an end user be informed of this? Shouldn't this be something package maintainers need to worry about? This warning is misleading and has no benefit to any end user unless you are a package maintainer.

> If you see such a warning, please check to which package the man page belongs
> (rpm -qf /path/to/man/page) and report it as a new bug against this package. You
> can Cc me on the bug so that I can react.

Again, why would an end user be responsible for reporting these fixes? As a RHEL customer I would expect that an upgrade would not result in half our man pages suddenly containing this warning at the start.
