Bug 599516

Summary: rpmlint should warn packages with unusable translations
Product: [Fedora] Fedora Reporter: Praveen Arimbrathodiyil <parimbra>
Component: rpmlint Assignee: Ville Skyttä <ville.skytta>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: 13 CC: eng-l10n-bugs, llim, manuel.wolfshant, pnemade, tmz, ville.skytta
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: rpmlint-0.98-1.fc13 Doc Type: Bug Fix
Last Closed: 2010-07-01 18:47:14 UTC Type: ---

Description Praveen Arimbrathodiyil 2010-06-03 12:23:31 UTC
Description of problem:
A package appears to ship locales for a language but uses an unknown locale code as a subdirectory of /usr/share/locale. This usually results in users of the intended target language not finding the locale. The language codes used in the locale directories are those from the ISO 639-1 and ISO 639-2 standards, not those usually used as TLDs (which are from the ISO 3166 standard).

It is possible that the language code was mistyped or incorrectly guessed from the language's or country's name.

Refer to http://www.loc.gov/standards/iso639-2/php/code_list.php for details.

How reproducible:

always

Steps to Reproduce:
1. Run rpmlint against the lekhonee-gnome package.

Actual results:

rpmlint does not give any warning.

Expected results:

rpmlint warns about the presence of the incorrect translation code "bb"; the correct code is "bjs".

Additional info:
http://lintian.debian.org/tags/unknown-locale-code.html

http://en.wikipedia.org/wiki/.bb
http://en.wikipedia.org/wiki/Bajan

Comment 1 Ville Skyttä 2010-06-06 10:23:07 UTC
Sure, this would be a useful check, but the problem is: what is the actual full list of valid translation dir names for the /usr/share/locale dir?  (And could we use that list to check for other things besides /usr/share/locale, for example the /usr/share/man/$locale dirs?)

The /usr/share/locale dir contains many entries that are not in ISO 639, it contains language_COUNTRY formatted things, language@something, language_COUNTRY.CHARSET etc.
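
The name shapes listed above can be split apart before any validity check is attempted. The following is only a sketch: the pattern language[_COUNTRY][.CHARSET][@something] is an assumption drawn from the examples in this comment, and split_locale_dir is a hypothetical helper, not part of rpmlint.

```python
import re

# Assumed layout: language[_COUNTRY][.CHARSET][@something]
# (derived from the examples in this comment, not from a specification).
LOCALE_DIR_RE = re.compile(
    r"(?P<language>[a-z]{2,3})"
    r"(?:_(?P<territory>[A-Z]{2}))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9-]+))?"
    r"(?:@(?P<modifier>[A-Za-z0-9]+))?"
)


def split_locale_dir(name):
    """Return the components of a locale dir name, or None if it does not fit."""
    match = LOCALE_DIR_RE.fullmatch(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name in ("en_NZ", "en@quot", "en_US.UTF-8", "hne", "Bajan"):
        print(name, split_locale_dir(name))
```

A name such as "bb" still parses here, since the pattern only checks the shape; whether the language part is a real code has to be decided against a separate list.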

"locale -a" output contains a bunch of things, but it does not have any 3-char ISO 639 codes, and language-only things would have to be parsed from language_COUNTRY lines.  It also lacks some foo@bar things that I have in my /usr/share/locale, for example en@quot, and I don't know if those are valid or not.

The iso-codes package contains XML files from which one could generate lists.  It doesn't have any foo@bar info or info about charsets so those could not be checked based on such a list alone.

Does glibc/gettext/whatever-uses-usr-share-locale support all ISO 639 2 and 3 char languages?  What about the ISO 3166 countries if they're present after the ISO 639 code?

I don't have a good feeling about manually maintaining a list of valid codes in rpmlint; ideally there would be a way to check the validity of each locale dir name at runtime using some external tool/library, or some tool that could be used for generating a list that we could ship pre-generated in rpmlint.  I'd also like to be able to check the foo@bar and foo_BAR.CHARSET entries.

Thoughts?

Comment 2 Parag Nemade 2010-06-07 09:28:30 UTC
I think all the valid translation locale codes are provided by the filesystem package. This package installs pre-defined locale directories.

See the output of "rpm -ql filesystem | grep LC_MESSAGES"

Comment 3 Ville Skyttä 2010-06-08 06:08:39 UTC
$ rpm -ql filesystem | grep LC_MESSAGES | wc -l
562

$ grep id= /usr/share/xml/iso-codes/iso_639_3.xml | wc -l
7702

/usr/share/locale also contains dirs that are not owned by the filesystem package.  Does that mean that those are not valid, for example en_NZ and hne?

Comment 4 Parag Nemade 2010-06-08 06:36:19 UTC
Not like that. Those are valid. 

Another suggestion to resolve this bug would be to check
grep id= /usr/share/xml/iso-codes/iso_639_3.xml | grep \"CODE\"
or

grep code= /usr/share/xml/iso-codes/iso_639.xml | grep \"CODE\"

e.g.
grep id= /usr/share/xml/iso-codes/iso_639_3.xml | grep \"bjs\"
		id="bjs"
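
The same lookup could be done once to build a set of codes rather than grepping per code. This is a rough sketch only: the element layout of the sample below is an assumption that mirrors the id="bjs" line shown above, the sample entries are illustrative, and language_ids is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; the real file would be
# /usr/share/xml/iso-codes/iso_639_3.xml and its exact layout is assumed here.
SAMPLE = """<iso_639_3_entries>
  <iso_639_3_entry id="bjs" name="Bajan" />
  <iso_639_3_entry id="hne" name="Chhattisgarhi" />
</iso_639_3_entries>"""


def language_ids(xml_text):
    """Collect every non-empty 'id' attribute found in the XML text."""
    root = ET.fromstring(xml_text)
    return {el.get("id") for el in root.iter() if el.get("id")}


if __name__ == "__main__":
    codes = language_ids(SAMPLE)
    print("bjs" in codes, "bb" in codes)
```

For the installed file, the text could be read with open(path, encoding="utf-8").read() and passed to the same function.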

Comment 5 Ville Skyttä 2010-06-08 18:01:36 UTC
That would still leave open the question of what to do about things like en_NZ, en_US.UTF-8 (note the charset; even if this is unconventional I believe it works just fine), sv_FI (Swedish as spoken in Finland) etc.  I'm looking for a way to implement this so that it produces as little "noise" (false positives/warnings) as possible.

Comment 6 Praveen Arimbrathodiyil 2010-06-09 06:02:29 UTC
Ville,

We are more concerned about the use of invalid language codes. I think there is a fallback defined for country-specific locales.

from http://userguide.icu-project.org/locale (for ICU, other rendering engines should be following a similar approach). 

"Fallback

Locale fallback proceeds as follows:

   1. The variant is removed, if there is one.
   2. The country is removed, if there is one.
   3. The script is removed, if there is one.
   4. The ICU default locale is examined. The same set of steps is performed for the default locale."

But when the language code itself is invalid, the only fallback would be to the C locale, which would mean the user won't be able to use any localization. In the case of sv_FI, even if the directory is missing, it would fall back to sv, which is not as bad as missing the whole sv localization.
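
The first three fallback steps quoted above can be sketched roughly as follows. This is not ICU's API: fallback_chain is a hypothetical helper, it assumes the components appear in language_Script_COUNTRY_VARIANT order separated by underscores, and it does not model step 4 (the default locale).

```python
def fallback_chain(locale_id):
    """List the locale IDs tried in order, dropping the last component each time.

    Assumes underscore-separated components in the order
    language, script, country, variant.
    """
    parts = locale_id.split("_")
    return ["_".join(parts[:n]) for n in range(len(parts), 0, -1)]


if __name__ == "__main__":
    print(fallback_chain("sv_FI"))
    print(fallback_chain("sr_Latn_RS"))
    print(fallback_chain("bb"))
```

A bare code such as "bb" has nothing left to strip, so if it is not a known language there is no more specific-to-general step that could rescue it.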

Comment 7 Ville Skyttä 2010-06-20 21:10:55 UTC
Turns out rpmlint already had support for this but the language listing was missing so many things that I disabled this in the Fedora rpmlint package a long time ago.  Improved upstream now:

http://rpmlint.zarb.org/cgi-bin/trac.cgi/changeset/1795

(Also needs removal of 'addFilter("invalid-(lc-messages|locale-man)-dir")' from /usr/share/rpmlint/config)

$ ./devel-rpmlint.sh lekhonee-gnome-0.10-1.fc13.x86_64.rpm | grep bb
lekhonee-gnome.x86_64: E: invalid-lc-messages-dir /usr/share/locale/bb/LC_MESSAGES/lekhonee-gnome.mo
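
For readers unfamiliar with this kind of check, the idea can be sketched as below. This is not rpmlint's actual implementation: KNOWN_LANGUAGES is a tiny hypothetical sample standing in for the real language listing, and check_lc_messages_path is an illustrative helper that only reuses the invalid-lc-messages-dir tag name from the output above.

```python
import re

# Matches files under /usr/share/locale/<dir>/LC_MESSAGES/.
LC_MESSAGES_RE = re.compile(r"^/usr/share/locale/([^/]+)/LC_MESSAGES/")

# Hypothetical sample; a real check would use a full language listing.
KNOWN_LANGUAGES = frozenset({"bjs", "en", "fi", "hne", "sv"})


def check_lc_messages_path(path, known=KNOWN_LANGUAGES):
    """Return a message for an unknown language dir, or None if nothing to report."""
    match = LC_MESSAGES_RE.match(path)
    if not match:
        return None
    # Keep only the language part of names like en_US.UTF-8 or en@quot.
    language = re.split(r"[_.@]", match.group(1), maxsplit=1)[0]
    if language not in known:
        return "invalid-lc-messages-dir " + path
    return None


if __name__ == "__main__":
    print(check_lc_messages_path(
        "/usr/share/locale/bb/LC_MESSAGES/lekhonee-gnome.mo"))
```

Checking only the language part keeps entries such as en_NZ or en_US.UTF-8 from being reported, which is one way to limit false positives.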

Comment 8 Fedora Update System 2010-06-23 19:53:24 UTC
rpmlint-0.98-1.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/rpmlint-0.98-1.fc13

Comment 9 Fedora Update System 2010-06-23 19:53:56 UTC
rpmlint-0.98-1.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/rpmlint-0.98-1.fc12

Comment 10 Fedora Update System 2010-06-24 16:23:45 UTC
rpmlint-0.98-1.fc13 has been pushed to the Fedora 13 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
su -c 'yum --enablerepo=updates-testing update rpmlint'.
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/rpmlint-0.98-1.fc13

Comment 11 Fedora Update System 2010-07-01 18:46:59 UTC
rpmlint-0.98-1.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 12 Fedora Update System 2010-07-01 18:54:10 UTC
rpmlint-0.98-1.fc12 has been pushed to the Fedora 12 stable repository.  If problems still persist, please make note of it in this bug report.