Bug 485072 - gencat(1p) does not document the "$ codeset" option
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: man-pages
Version: 4.7
Hardware: All Linux
Priority: low  Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Ivana Varekova
QA Contact: BaseOS QE
Reported: 2009-02-11 09:15 EST by Michael Solberg
Modified: 2009-02-12 08:01 EST (History)
Doc Type: Bug Fix
Last Closed: 2009-02-12 04:36:43 EST
Description Michael Solberg 2009-02-11 09:15:32 EST
Description of problem:

When converting ISO-8859-1 message files into NLS catalogs, you'll get "invalid character" errors unless you specify the "$ codeset=ISO-8859-1" option in your message file.  UTF-8 and ASCII work fine, so I suppose this could be a bug in iconv.

It'd be nice if we included the "$ codeset=" option in the man page for gencat so that programmers would know about the option.  It doesn't appear to be part of the POSIX specification.
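
For reference, a minimal message source file using the directive might look like the sketch below. The file names demo.msg and demo.cat and the message text are made up for illustration; the "$ codeset=ISO-8859-1" line is the one from this report, and "$set" plus numbered message lines are the standard message-file syntax. Because the behaviour of the directive is undocumented, treat this as an assumption about glibc's gencat rather than a guarantee.

```shell
# Write a hypothetical message source file, demo.msg, whose text is ISO-8859-1.
# The first line is the codeset directive from this report; to a strictly
# POSIX gencat, a line starting with "$ " is only a comment.
{
    printf '$ codeset=ISO-8859-1\n'
    printf '$set 1\n'
    # \363 is the ISO-8859-1 byte for o-acute, \341 the byte for a-acute.
    printf '1 Identificaci\363n inv\341lida\n'
} > demo.msg

# Compile the catalog only if gencat is installed on this machine.
if command -v gencat >/dev/null 2>&1; then
    gencat demo.cat demo.msg || echo "gencat reported an error" >&2
fi
```

The directive goes on the first line, before any "$set" or message line, which matches the diff shown later in this report.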
Comment 1 Jakub Jelinek 2009-02-11 09:21:55 EST
ISO-8859-1 isn't valid UTF-8, unless you use only the ASCII subset thereof, and
as UTF-8 is the default encoding, no wonder you need to tell gencat that you are using a different encoding.

gencat manpage isn't part of glibc though (and it is just the 1p manpage, so it really should describe just what POSIX says and nothing else).
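
To make the encoding point concrete, here is a small shell sketch. It is not taken from the gencat sources; is_valid_utf8 is a hypothetical helper name, and the sketch assumes an iconv that exits non-zero on malformed input, as glibc's does.

```shell
# Hypothetical helper: succeeds only if standard input is well-formed UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# A Spanish word encoded as ISO-8859-1: n-tilde is the single byte 0xF1
# (octal 361), which is not a complete UTF-8 sequence on its own.
if printf 'a\361o\n' | is_valid_utf8; then
    echo "ISO-8859-1 bytes: valid UTF-8"
else
    echo "ISO-8859-1 bytes: not valid UTF-8"
fi

# The same word after conversion: n-tilde becomes the two bytes 0xC3 0xB1.
if printf 'a\361o\n' | iconv -f ISO-8859-1 -t UTF-8 | is_valid_utf8; then
    echo "converted bytes: valid UTF-8"
fi
```

Pure ASCII input passes the same check, which is why ASCII-only message files are not affected.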
Comment 2 Ivana Varekova 2009-02-12 04:36:43 EST
The POSIX man pages describe only the POSIX standard, so this kind of change is not relevant to them. (Only one change of this sort has been made: version 2.67, which is not in RHEL4, added a note to all POSIX man pages saying that they describe only the POSIX specification. Older versions do not have that note yet.)
I'm closing this bug.
Comment 3 Michael Solberg 2009-02-12 07:50:32 EST
Would it be possible to get this documented upstream in the glibc manual?  Maybe a knowledgebase article?  I understand not wanting to mess with the POSIX pages, but the only way to find out about the option at this point is to read the source for gencat.
Comment 4 Michael Solberg 2009-02-12 08:01:58 EST
(In reply to comment #1)
> ISO-8859-1 isn't valid UTF-8, unless you use only the ASCII subset thereof, and
> as UTF-8 is the default encoding, no wonder you need to tell gencat that you
> are using a different encoding.

I should have been a little more specific.  The files were valid ISO-8859-1, and running gencat on them produced the errors.  One that I converted to UTF-8 with iconv (-f ISO-8859-1 -t UTF-8) worked fine, as did the ones I had in ASCII.  Only the files in ISO-8859-1 format were a problem:

# Without "$ codeset=ISO-8859-1", I get invalid characters.
[l3p5xms@cpliis13 src]$ file set_real_id.msg.es_AR
set_real_id.msg.es_AR: ISO-8859 English text
[l3p5xms@cpliis13 src]$ gencat set_real_id.cat.es_AR set_real_id.msg.es_AR
set_real_id.msg.es_AR:25: invalid character: message ignored
set_real_id.msg.es_AR:26: invalid character: message ignored
set_real_id.msg.es_AR:27: invalid character: message ignored

# If I convert to UTF-8, it works.
[l3p5xms@cpliis13 src]$ iconv -f ISO-8859-1 -t UTF-8 set_real_id.msg.es_AR > set_real_id.msg.es_AR.utf8
[l3p5xms@cpliis13 src]$ gencat set_real_id.cat.es_AR set_real_id.msg.es_AR.utf8

# If I add the codeset line, it works.
[l3p5xms@cpliis13 src]$ cp set_real_id.msg.es_AR set_real_id.msg.es_AR~
[l3p5xms@cpliis13 src]$ vi set_real_id.msg.es_AR
[l3p5xms@cpliis13 src]$ diff set_real_id.msg.es_AR~ set_real_id.msg.es_AR
0a1
> $ codeset=ISO-8859-1
[l3p5xms@cpliis13 src]$ gencat set_real_id.cat.es_AR set_real_id.msg.es_AR

# Even if LANG is set to ISO-8859-1, I still get errors with the old file.
[l3p5xms@cpliis13 src]$ export LANG=ISO-8859-1
[l3p5xms@cpliis13 src]$ gencat set_real_id.cat.es_AR set_real_id.msg.es_AR~
set_real_id.msg.es_AR~:25: invalid character: message ignored
set_real_id.msg.es_AR~:26: invalid character: message ignored
set_real_id.msg.es_AR~:27: invalid character: message ignored

Do I just not understand how the tool is supposed to work?
