Bug 485072

Summary:	gencat(1p) does not document the "$ codeset" option
Product:	Red Hat Enterprise Linux 4	Reporter:	Michael Solberg <msolberg>
Component:	man-pages	Assignee:	Ivana Varekova <varekova>
Status:	CLOSED NOTABUG	QA Contact:	BaseOS QE <qe-baseos-auto>
Severity:	low	Docs Contact:
Priority:	low
Version:	4.7
Target Milestone:	rc
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2009-02-12 09:36:43 UTC	Type:	---

Description Michael Solberg 2009-02-11 14:15:32 UTC

Description of problem:

When converting ISO-8859-1 message files into NLS catalogs, you'll get "invalid character" errors unless you specify the "$ codeset=ISO-8859-1" option in your message file.  UTF-8 and ASCII work fine, so I suppose this could be a bug in iconv.

It'd be nice if we included the "$ codeset=" option in the man page for gencat so that programmers would know about the option.  It doesn't appear to be part of the POSIX specification.
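
For illustration, here is a minimal sketch of a message file that carries the option. The file names and the message text are hypothetical. Only the "$ codeset=ISO-8859-1" line comes from this report, and its handling appears to be specific to the glibc gencat rather than something POSIX defines.

```shell
#!/bin/sh
# Hypothetical file names and message text, used only for illustration.
workdir=$(mktemp -d)
msgfile="$workdir/example.msg"

# Write the source file in ISO-8859-1: \361 is the single byte 0xF1
# (n with tilde). The "$ codeset=" line is placed first, as in the
# report, so that it precedes every message.
{
    printf '$ codeset=ISO-8859-1\n'
    printf '$set 1\n'
    printf '1 Espa\361ol\n'
} > "$msgfile"

# Compile the catalog only if gencat is installed on this system.
if command -v gencat >/dev/null 2>&1; then
    gencat "$workdir/example.cat" "$msgfile"
fi
```

Placing the line at the top of the file mirrors the diff shown later in this report.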

Comment 1 Jakub Jelinek 2009-02-11 14:21:55 UTC

ISO-8859-1 isn't valid UTF-8, unless you use only the ASCII subset thereof, and
as UTF-8 is the default encoding, no wonder you need to tell gencat that you are using a different encoding.
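
The encoding point can be checked with iconv alone. This is a rough sketch: the sample string is made up, and it assumes an iconv that exits with a non-zero status on invalid input, as the glibc one does.

```shell
#!/bin/sh
# "Espa\361ol" holds the byte 0xF1, which is n-with-tilde in ISO-8859-1.
# In UTF-8, 0xF1 starts a multi-byte sequence, so an ASCII "o" after it
# makes the input invalid.
if printf 'Espa\361ol\n' | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    latin1_as_utf8=valid
else
    latin1_as_utf8=invalid
fi

# Pure ASCII is valid UTF-8, which is why the ASCII files worked.
if printf 'Spanish\n' | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    ascii_as_utf8=valid
else
    ascii_as_utf8=invalid
fi

# A proper conversion turns the one byte 0xF1 into the two bytes 0xC3 0xB1.
converted=$(printf 'Espa\361ol\n' | iconv -f ISO-8859-1 -t UTF-8 \
    | od -An -tx1 | tr -d ' \n')

echo "ISO-8859-1 bytes read as UTF-8: $latin1_as_utf8"
echo "ASCII bytes read as UTF-8: $ascii_as_utf8"
echo "converted bytes: $converted"
```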

gencat manpage isn't part of glibc though (and it is just the 1p manpage, so it really should describe just what POSIX says and nothing else).

Comment 2 Ivana Varekova 2009-02-12 09:36:43 UTC

The POSIX man pages describe only the POSIX standard, so changes of this kind are not relevant to them. (There is only one related change, made in 2.67, which is not in RHEL4: it added a note to all POSIX man pages saying that they describe only the POSIX specification. The older version does not have that note for now.)
I'm closing this bug.

Comment 3 Michael Solberg 2009-02-12 12:50:32 UTC

Would it be possible to get this documented upstream in the glibc manual?  Maybe a knowledgebase article?  I understand not wanting to mess with the POSIX pages, but the only way to find out about the option at this point is to read the source for gencat.

Comment 4 Michael Solberg 2009-02-12 13:01:58 UTC

(In reply to comment #1)
> ISO-8859-1 isn't valid UTF-8, unless you use only the ASCII subset thereof, and
> as UTF-8 is the default encoding, no wonder you need to tell gencat that you
> are using a different encoding.

I should have been a little more specific.  The files were valid ISO-8859-1.  When running gencat on them, you get the errors.  I converted one to UTF-8 with iconv (-f ISO-8859-1 -t UTF-8) and it worked fine and the ones I had in ASCII worked fine.  It was only files in ISO-8859-1 format that I had a problem with:

#Without "$ codeset=ISO-8859-1", I get invalid characters
[l3p5xms@cpliis13 src]$ file set_real_id.msg.es_AR
set_real_id.msg.es_AR: ISO-8859 English text
[l3p5xms@cpliis13 src]$ gencat set_real_id.cat.es_AR set_real_id.msg.es_AR
set_real_id.msg.es_AR:25: invalid character: message ignored
set_real_id.msg.es_AR:26: invalid character: message ignored
set_real_id.msg.es_AR:27: invalid character: message ignored

# If I convert to UTF-8, it works.
[l3p5xms@cpliis13 src]$ iconv -f ISO-8859-1 -t UTF-8 set_real_id.msg.es_AR > set_real_id.msg.es_AR.utf8
[l3p5xms@cpliis13 src]$ gencat set_real_id.cat.es_AR set_real_id.msg.es_AR.utf8

# If I add the codeset line, it works.
[l3p5xms@cpliis13 src]$ cp set_real_id.msg.es_AR set_real_id.msg.es_AR~
[l3p5xms@cpliis13 src]$ vi set_real_id.msg.es_AR
[l3p5xms@cpliis13 src]$ diff set_real_id.msg.es_AR~ set_real_id.msg.es_AR
0a1
> $ codeset=ISO-8859-1
[l3p5xms@cpliis13 src]$ gencat set_real_id.cat.es_AR set_real_id.msg.es_AR

# Even if LANG is set to ISO-8859-1, I still get errors with the old file.
[l3p5xms@cpliis13 src]$ export LANG=ISO-8859-1
[l3p5xms@cpliis13 src]$ gencat set_real_id.cat.es_AR set_real_id.msg.es_AR~
set_real_id.msg.es_AR~:25: invalid character: message ignored
set_real_id.msg.es_AR~:26: invalid character: message ignored
set_real_id.msg.es_AR~:27: invalid character: message ignored

Do I just not understand how the tool is supposed to work?