Bug 485072

Summary: gencat(1p) does not document the "$ codeset" option
Product: Red Hat Enterprise Linux 4
Reporter: Michael Solberg <msolberg>
Component: man-pages
Assignee: Ivana Varekova <varekova>
Status: CLOSED NOTABUG
QA Contact: BaseOS QE <qe-baseos-auto>
Severity: low
Priority: low
Version: 4.7
Target Milestone: rc
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-02-12 09:36:43 UTC

Description Michael Solberg 2009-02-11 14:15:32 UTC
Description of problem:

When converting ISO-8859-1 message files into NLS catalogs, you'll get "invalid character" errors unless you specify the "$ codeset=ISO-8859-1" option in your message file.  UTF-8 and ASCII work fine, so I suppose this could be a bug in iconv.

It'd be nice if we included the "$ codeset=" option in the man page for gencat so that programmers would know about the option.  It doesn't appear to be part of the POSIX specification.
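As an illustration (not part of the original report), a minimal ISO-8859-1 message file that declares its encoding on its first line might be built and compiled like this. The file names `demo.msg` and `demo.cat` are hypothetical, and the `$ codeset=` line is the glibc gencat behavior described in this report, not something POSIX specifies.

```shell
#!/bin/sh
# Hypothetical file names. The first line is the glibc-specific
# "$ codeset=" directive from this report; POSIX treats "$ " lines as comments.
# \361 is the single ISO-8859-1 byte for "ñ" (0xF1).
printf '$ codeset=ISO-8859-1\n$set 1\n1 Mensaje en espa\361ol\n' > demo.msg

# Compile the ISO-8859-1 source into a message catalog.
gencat demo.cat demo.msg
```

Per this report, omitting that first line is what produced the "invalid character: message ignored" errors for ISO-8859-1 input.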

Comment 1 Jakub Jelinek 2009-02-11 14:21:55 UTC
ISO-8859-1 isn't valid UTF-8 unless you use only its ASCII subset, and since UTF-8 is the default encoding, it's no wonder you need to tell gencat that you are using a different encoding.

The gencat man page isn't part of glibc, though (and it is just the 1p man page, so it really should describe only what POSIX says and nothing else).
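To make the encoding point in this comment concrete, here is a small shell check (an illustration, not part of the original comment) that feeds the ISO-8859-1 byte for "é" through iconv. The bytes are rejected when read as UTF-8 but convert cleanly from ISO-8859-1 into the two-byte UTF-8 sequence.

```shell
#!/bin/sh
# "caf\351" is "café" in ISO-8859-1: é is the single byte 0xE9 (octal 351).
# Read as UTF-8, 0xE9 starts a 3-byte sequence, but "\n" follows
# instead of a continuation byte, so iconv rejects the input.
if printf 'caf\351\n' | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    echo "valid UTF-8"
else
    echo "not valid UTF-8"
fi

# Converting from ISO-8859-1 yields the UTF-8 encoding of é (bytes c3 a9).
printf 'caf\351\n' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1
```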

Comment 2 Ivana Varekova 2009-02-12 09:36:43 UTC
The POSIX man pages describe only the POSIX standard, so this kind of change is not relevant for them. The only related change was made in 2.67 (it is not in RHEL 4): it added a note to all POSIX man pages stating that they describe only the POSIX specification. Older versions do not have that note for now.
I'm closing this bug.

Comment 3 Michael Solberg 2009-02-12 12:50:32 UTC
Would it be possible to get this documented upstream in the glibc manual? Or maybe in a knowledge base article? I understand not wanting to mess with the POSIX pages, but at this point the only way to find out about the option is to read the gencat source.

Comment 4 Michael Solberg 2009-02-12 13:01:58 UTC
(In reply to comment #1)
> ISO-8859-1 isn't valid UTF-8, unless you use only the ASCII subset thereof, and
> as UTF-8 is the default encoding, no wonder you need to tell gencat that you
> are using a different encoding.

I should have been a little more specific. The files were valid ISO-8859-1, and running gencat on them produces the errors. I converted one to UTF-8 with iconv (-f ISO-8859-1 -t UTF-8) and it worked fine, and the files I had in ASCII also worked fine. Only the files in ISO-8859-1 format gave me problems:

# Without "$ codeset=ISO-8859-1", I get invalid character errors.
[l3p5xms@cpliis13 src]$ file set_real_id.msg.es_AR
set_real_id.msg.es_AR: ISO-8859 English text
[l3p5xms@cpliis13 src]$ gencat set_real_id.cat.es_AR set_real_id.msg.es_AR
set_real_id.msg.es_AR:25: invalid character: message ignored
set_real_id.msg.es_AR:26: invalid character: message ignored
set_real_id.msg.es_AR:27: invalid character: message ignored

# If I convert to UTF-8, it works.
[l3p5xms@cpliis13 src]$ iconv -f ISO-8859-1 -t UTF-8 set_real_id.msg.es_AR > set_real_id.msg.es_AR.utf8
[l3p5xms@cpliis13 src]$ gencat set_real_id.cat.es_AR set_real_id.msg.es_AR.utf8

# If I add the codeset line, it works.
[l3p5xms@cpliis13 src]$ cp set_real_id.msg.es_AR set_real_id.msg.es_AR~
[l3p5xms@cpliis13 src]$ vi set_real_id.msg.es_AR
[l3p5xms@cpliis13 src]$ diff set_real_id.msg.es_AR~ set_real_id.msg.es_AR
0a1
> $ codeset=ISO-8859-1
[l3p5xms@cpliis13 src]$ gencat set_real_id.cat.es_AR set_real_id.msg.es_AR

# Even if LANG is set to ISO-8859-1, I still get errors with the old file.
[l3p5xms@cpliis13 src]$ export LANG=ISO-8859-1
[l3p5xms@cpliis13 src]$ gencat set_real_id.cat.es_AR set_real_id.msg.es_AR~
set_real_id.msg.es_AR~:25: invalid character: message ignored
set_real_id.msg.es_AR~:26: invalid character: message ignored
set_real_id.msg.es_AR~:27: invalid character: message ignored

Do I just not understand how the tool is supposed to work?