Red Hat Bugzilla – Bug 599516
rpmlint should warn packages with unusable translations
Last modified: 2013-06-17 00:07:08 EDT
Description of problem:
rpmlint should warn when a package appears to ship translations for a language but uses an unknown locale code as a subdirectory of /usr/share/locale. Such a mistake usually means that users of the intended target language never find the translations. The language codes used in the locale directories come from the ISO 639-1 and ISO 639-2 standards, not from the country codes commonly seen as TLDs (which come from the ISO 3166 standard).
It is possible that the language code was mistyped or incorrectly guessed from the language's or country's name.
Refer to http://www.loc.gov/standards/iso639-2/php/code_list.php for details.
Steps to Reproduce:
1. run rpmlint against lekhonee-gnome package
Actual results:
rpmlint does not give any warning.
Expected results:
rpmlint warns about the incorrect translation code "bb"; the correct code is "bjs".
Sure, this would be a useful check, but the problem is: what is the actual full list of valid translation dir names for the /usr/share/locale dir? (And could we use that list to check other things besides /usr/share/locale, for example the /usr/share/man/$locale dirs?)
The /usr/share/locale dir contains many entries that are not in ISO 639, it contains language_COUNTRY formatted things, language@something, language_COUNTRY.CHARSET etc.
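The name shapes listed above can be split into their parts before any validity check. The following is a minimal sketch, assuming the general form language[_COUNTRY][.CHARSET][@modifier]; the regular expression and the function name are illustrative and are not taken from rpmlint or glibc. It only checks the shape of a name, so a mistyped code such as "bb" still parses.

```python
import re

# Assumed shape: language[_COUNTRY][.CHARSET][@modifier].
# This pattern is an illustration, not the rule glibc or gettext applies.
LOCALE_DIR_RE = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:_(?P<country>[A-Z]{2}))?"
    r"(?:\.(?P<charset>[A-Za-z0-9_-]+))?"
    r"(?:@(?P<modifier>[A-Za-z0-9]+))?$"
)


def split_locale_dir(name):
    """Return the components of a locale dir name, or None if the shape does not fit."""
    match = LOCALE_DIR_RE.match(name)
    return match.groupdict() if match else None
```

Each component can then be checked against its own list (language codes, country codes, charsets), which keeps the question of "is the shape right" separate from "is the code known".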
"locale -a" output contains a bunch of things, but it does not have any 3-char ISO 639 codes, and language-only things would have to be parsed from language_COUNTRY lines. It also lacks some foo@bar things that I have in my /usr/share/locale, for example en@quot, and I don't know if those are valid or not.
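Deriving language-only entries from language_COUNTRY lines could look roughly like this. It is a sketch that works on lines of text in the style of "locale -a" output; the function name is hypothetical, and alias names that some systems list would pass through unchanged.

```python
def languages_from_locale_list(lines):
    """Collect the language part of each `locale -a`-style entry."""
    languages = set()
    for line in lines:
        name = line.strip()
        # Skip blank lines and the special non-language locales.
        if not name or name in ("C", "POSIX") or name.startswith("C."):
            continue
        # Cut at the first '_', '.' or '@' so that only the language remains.
        for separator in "_.@":
            name = name.split(separator, 1)[0]
        languages.add(name)
    return languages
```

The input could come from something like subprocess.run(["locale", "-a"], capture_output=True, text=True).stdout.splitlines(). The result still cannot say anything about foo@bar names that "locale -a" does not print.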
The iso-codes package contains XML files from which one could generate lists. It doesn't have any foo@bar info or info about charsets so those could not be checked based on such a list alone.
Does glibc/gettext/whatever-uses-usr-share-locale support all 2- and 3-character ISO 639 language codes? What about ISO 3166 country codes when they follow the ISO 639 code?
I don't have a good feeling about manually maintaining a list of valid codes in rpmlint; ideally there would be a way to check the validity of each locale dir name at runtime using some external tool/library, or some tool that could generate a list that we could ship pre-generated in rpmlint. I'd also like to be able to check the foo@bar and foo_BAR.CHARSET entries.
I think all the valid translation locale codes are provided by the filesystem package. This package installs pre-defined locale directories.
See the output of "rpm -ql filesystem | grep LC_MESSAGES"
$ rpm -ql filesystem | grep LC_MESSAGES | wc -l
$ grep id= /usr/share/xml/iso-codes/iso_639_3.xml | wc -l
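As a rough sketch of turning the "rpm -ql filesystem" output into a list of names, the following picks the locale directory name out of each LC_MESSAGES path. The function name and the regular expression are illustrative only, and treating the result as one source of known names rather than a complete list is an assumption of this sketch.

```python
import re

# Matches ".../locale/<name>/LC_MESSAGES" lines from a file listing.
LC_MESSAGES_RE = re.compile(r"^/usr/share/locale/([^/]+)/LC_MESSAGES/?$")


def locale_dirs_from_file_list(lines):
    """Return the locale dir names that have an LC_MESSAGES entry in the listing."""
    names = set()
    for line in lines:
        match = LC_MESSAGES_RE.match(line.strip())
        if match:
            names.add(match.group(1))
    return names
```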
/usr/share/locale also contains dirs that are not owned by the filesystem package. Does that mean that those are not valid, for example en_NZ and hne?
Not like that. Those are valid.
Another suggestion to resolve this bug would be to check the iso-codes XML files:
grep id= /usr/share/xml/iso-codes/iso_639_3.xml | grep \"CODE\"
grep code= /usr/share/xml/iso-codes/iso_639.xml | grep \"CODE\"
grep id= /usr/share/xml/iso-codes/iso_639_3.xml | grep \"bjs\"
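The same lookup could be done by parsing the XML instead of grepping it. This is a sketch only: the "id" attribute comes from the grep commands above, while "part1_code" and the element names in the sample are assumptions about the iso-codes file layout, and the sample text is a hand-written excerpt rather than the real file.

```python
import xml.etree.ElementTree as ET

# Hand-written excerpt; element and attribute names other than "id" are assumed.
SAMPLE = """
<iso_639_3_entries>
  <iso_639_3_entry id="bjs" name="Bajan"/>
  <iso_639_3_entry id="swe" part1_code="sv" name="Swedish"/>
</iso_639_3_entries>
"""


def language_codes(xml_text, attributes):
    """Collect the values of the given attributes from every element."""
    codes = set()
    for element in ET.fromstring(xml_text).iter():
        for attribute in attributes:
            value = element.get(attribute)
            if value:
                codes.add(value)
    return codes
```

With the real file, the text would be read from /usr/share/xml/iso-codes/iso_639_3.xml, and "bb" would be expected to be absent from the resulting set while "bjs" is present.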
That would still leave open the question of what to do about things like en_NZ, en_US.UTF-8 (note the charset; even if this is unconventional I believe it works just fine), sv_FI (Swedish as spoken in Finland) etc. I'm looking for a way to implement this so that it produces as little "noise" (false positives/warnings) as possible.
We are more concerned about the use of invalid language codes. I think there is a fallback defined for country-specific locales.
From http://userguide.icu-project.org/locale (for ICU; other rendering engines should be following a similar approach):
"Locale fallback proceeds as follows:
1. The variant is removed, if there is one.
2. The country is removed, if there is one.
3. The script is removed, if there is one.
4. The ICU default locale is examined. The same set of steps is performed for the default locale."
But when the language code itself is invalid, the only fallback would be to the C locale, which would mean the user won't be able to use any localization. In the case of sv_FI, even if the directory is missing, it would fall back to sv, which is not as bad as missing the whole sv localization.
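The quoted fallback steps can be sketched as follows. This is an illustration and not ICU code: it assumes underscore-separated names whose fields appear in the order language, script, country, variant with at most one field of each kind, so that dropping the last field repeatedly removes the variant, then the country, then the script. The final step (examining the default locale) is not modelled.

```python
def fallback_chain(locale_id):
    """List the names tried after locale_id, most specific first."""
    # Drop empty fields, such as the gap in a name like "de__PHONEBOOK".
    parts = [part for part in locale_id.split("_") if part]
    chain = []
    while len(parts) > 1:
        parts.pop()  # remove the most specific remaining field
        chain.append("_".join(parts))
    return chain
```

Under these assumptions, sv_FI yields the single candidate sv, whereas a bare invalid code such as bb yields no candidates at all, which is the case this bug is about.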
Turns out rpmlint already had support for this but the language listing was missing so many things that I disabled this in the Fedora rpmlint package a long time ago. Improved upstream now:
(Needs also removal of 'addFilter("invalid-(lc-messages|locale-man)-dir")' from /usr/share/rpmlint/config)
$ ./devel-rpmlint.sh lekhonee-gnome-0.10-1.fc13.x86_64.rpm | grep bb
lekhonee-gnome.x86_64: E: invalid-lc-messages-dir /usr/share/locale/bb/LC_MESSAGES/lekhonee-gnome.mo
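For illustration, a check that produces a line like the one above might be sketched as follows. This is not rpmlint's actual implementation: the function name, the regular expression and the known_languages argument are hypothetical, and only the language part of the directory name is looked up.

```python
import re

# Captures the locale dir name from a path under /usr/share/locale.
LC_MESSAGES_PATH_RE = re.compile(r"^/usr/share/locale/([^/]+)/LC_MESSAGES/")


def check_lc_messages(package, paths, known_languages):
    """Return one error line per file whose locale dir has an unknown language."""
    messages = []
    for path in paths:
        match = LC_MESSAGES_PATH_RE.match(path)
        if not match:
            continue
        # Keep only the language, ignoring _COUNTRY, .CHARSET and @modifier.
        language = re.split(r"[_.@]", match.group(1), maxsplit=1)[0]
        if language not in known_languages:
            messages.append(f"{package}: E: invalid-lc-messages-dir {path}")
    return messages
```

Checking only the language part keeps the noise low for names such as en_NZ or en_US.UTF-8, at the cost of not validating the country, charset or modifier.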
rpmlint-0.98-1.fc13 has been submitted as an update for Fedora 13.
rpmlint-0.98-1.fc12 has been submitted as an update for Fedora 12.
rpmlint-0.98-1.fc13 has been pushed to the Fedora 13 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
su -c 'yum --enablerepo=updates-testing update rpmlint'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/rpmlint-0.98-1.fc13
rpmlint-0.98-1.fc13 has been pushed to the Fedora 13 stable repository. If problems still persist, please make note of it in this bug report.
rpmlint-0.98-1.fc12 has been pushed to the Fedora 12 stable repository. If problems still persist, please make note of it in this bug report.