524990 – setroubleshoot should use ngettext

Bug 524990 - setroubleshoot should use ngettext

Summary: setroubleshoot should use ngettext

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	setroubleshoot
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Daniel Walsh
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	525036
TreeView+	depends on / blocked

Reported:	2009-09-22 21:58 UTC by Miloš Komarčević
Modified:	2009-10-01 20:49 UTC (History)
CC List:	4 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-10-01 13:26:05 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
C ngettext patch (2.34 KB, patch) 2009-09-24 20:24 UTC, Miloš Komarčević	no flags	Details \| Diff
Improved C ngettext patch (2.29 KB, patch) 2009-09-24 21:07 UTC, Miloš Komarčević	no flags	Details \| Diff
Show Obsolete (1) View All

Description Miloš Komarčević 2009-09-22 21:58:58 UTC

Hacks like

num_of_times = _("s")
if alert.report_count == 1:
  num_of_times = ""
...
start_label_text = _("This alert has occurred <b>%d time%s</b> since %s") % (alert.report_count, num_of_times, start_date.format(date_format))

are highly inappropriate and don't work for languages other than English.

http://live.gnome.org/TranslationProject/DevGuidelines/Plurals

Please use ngettext instead, see bug #467603 for an example.

Comment 1 Daniel Walsh 2009-09-23 01:07:08 UTC

Fixed in setroubleshoot-2_2_30-1_fc12

Comment 2 Miloš Komarčević 2009-09-23 07:48:49 UTC

Daniel, thanks for picking this up so quickly. Unfortunately, this is just another hack, admittedly less bad than the last one from i18n point of view. It still might not work well for some languages.

I'm reopening this so you can fix it properly using ngettext when you have some more time.

Comment 3 Daniel Walsh 2009-09-23 14:20:26 UTC

How about

        start_label_text = gettext.ngettext("This alert has occurred <b>%d time</b> since %s", "This alert has occurred <b>%d times</b> since %s", alert.report_count) %  % (alert.report_count, start_date.format(date_format))

Comment 4 Miloš Komarčević 2009-09-23 21:23:29 UTC

That's the idea, and it might "just work", but I don't get the inner workings of python, automake, and intltool.

More typically an alias for ngettext() is made, like

        start_label_text = P_("This alert has occurred <b>%d
time</b> since %s", "This alert has occurred <b>%d times</b> since %s",
alert.report_count) %  % (alert.report_count, start_date.format(date_format))

with P_() bound to gettext.ngettext() wherever _() is bound to gettext.gettext()

Then, you need to tell intltool how to extract these messages by passing some extra args to xgettext. From the intltool README:


Changing keywords used in xgettext invocation
.............................................

If you need to change default keywords used to extract messages from 
source code, you need to add variable XGETTEXT_KEYWORDS to 
Makefile.in.in file inside directory where intltool-update is run
from, eg.

        --- start ----

        XGETTEXT_KEYWORDS = --keyword --keyword=P_

        --- end ----

Default keywords xgettext looks for if no XGETTEXT_KEYWORDS is defined
are _, N_ and U_.



So, to your po/Makefile.in.in you'd need to add something like

XGETTEXT_KEYWORDS = --keyword=P_:1,2


As I said, I'm not an expert, so some details are probably missing...

Comment 5 Miloš Komarčević 2009-09-23 23:32:15 UTC

Found more candidates:

browser.py:        return _("%d days ago") % days

seapplet.c:	sprintf(msg, _("Since your last login, there are %d new security alerts to view."), yellow+red);  

seapplet.c:		sprintf(msg, _("Since your last login, there are %d new security alerts to view.  %d of the alerts may be very serious security violations."), yellow+red, red);

seapplet.c:	sprintf(title, _("%d New Security Alerts"), yellow+red);

Note that the second message in seapplet.c is probably best broken down, the added benefit being that the first sentence already appears earlier, will be recognized as such by xgettext during extraction, and will need to be translated only once.

Comment 6 Miloš Komarčević 2009-09-24 20:24:16 UTC

Created attachment 362563 [details]
C ngettext patch

Attaching a proposed ngettext patch for the C part, just missing the Python bit (could figure out where gettext is imported and _() bound).

Comment 7 John Dennis 2009-09-24 20:39:57 UTC

Dan: this is only somewhat tangentially related. In the python code we were using somewhat of a hack to get around the fact the default encoding in Python was hardcoded to ascii instead of utf-8. This required use to tell gettext() to return utf-8 in Python str object instead of UCS-4 in Unicode objects.

Since that code was first written we've figured out a way to force Python into the correct utf-8 default encoding mode. Which means we can get rid of the old hack and clean up some misleading comments in the code.

This is really just a FYI, it won't affect the output, but at some point should be cleaned up and use the "correct" idoms.

Comment 8 Miloš Komarčević 2009-09-24 21:07:45 UTC

Created attachment 362568 [details]
Improved C ngettext patch

Splitting the two strings, second sentence form might depend on the count in some more languages like it does for Serbian.

Comment 9 Daniel Walsh 2009-09-25 14:12:51 UTC

All this is fixed in setroubleshoot-2.2.31-1.fc12

But we still are not seeing the actual translations in the code.  The problem seems to be that we are shipping the viewer in one package "setroubleshoot"  and the text in another package "setroubleshoot-plugins"

And I think the text is not being translated because of this.

Comment 10 Miloš Komarčević 2009-09-25 21:37:17 UTC

Hm, now we're missing desktop and glade strings...

Btw, don't think putting XGETTEXT_KEYWORDS in Makefile.am is appropriate, the README clearly indicates po/Makefile.in.in. Also the --keywords option without any keywords overrides the default keywords, which is not what one would want I think (I'd use only --keyworrd=_P:1,2 as in my patch). Unfortunately neither of these changes bring back the desktop and glade strings, could really use an intltool expert here.


One plural string also slipped through the net:

--- setroubleshoot-2.2.32/src/browser.py	2009-09-25 22:13:40.168164466 +0100
+++ setroubleshoot-2.2.32_ngettext/src/browser.py	2009-09-25 22:18:17.618165876 +0100
@@ -736,7 +736,7 @@
             return _("Today")
         if days == 1:
             return _("Yesterday")
-        return _("%d days ago") % days
+        return P_("%d day ago", "%d days ago", days) % days
      
     def show(self):
         self.main_window.present()

Comment 11 John Dennis 2009-09-25 21:59:40 UTC

Have you checked to see that locale dir is correct and the domain names match?

Comment 12 Miloš Komarčević 2009-09-25 22:11:38 UTC

John, they are missing during string extraction to pot. When XGETTEXT_KEYWORDS is commented out, the extraction works as before, without the plural strings of course.

Comment 13 John Dennis 2009-09-25 22:13:45 UTC

I wrote up this guide to localization and i18n. It tries to give a concise overview, a lot of the other guides are hard to follow IMHO, 

https://wiki.idm.lab.bos.redhat.com/export/idmwiki/Internationalization_Overview

HTH

Comment 14 John Dennis 2009-09-25 22:14:45 UTC

re comment #12, so the pot files are empty?

Comment 15 Miloš Komarčević 2009-09-25 22:52:41 UTC

They are not empty, just missing/skipping desktop and glade files, although they are listed in POTFILES.in. Strings from Python and C files still get extracted. See e.g. 

http://hg.fedoraproject.org/hg/setroubleshoot/diff/712c021e447c/framework/po/setroubleshoot.pot

Comment 16 Piotr Drąg 2009-10-01 13:17:48 UTC

It seems that it has been fixed in this commit:

http://hg.fedorahosted.org/hg/setroubleshoot/rev/ef68b011d6cb

Comment 17 Daniel Walsh 2009-10-01 13:26:05 UTC

I believe this is fixed in setroubleshoot-2.2.33-1.fc12.x86_64

Comment 18 Miloš Komarčević 2009-10-01 15:37:22 UTC

Yep, just missing one more P_() marked string (see bottom of comment #10).

Thanks Daniel and Piotr.

Comment 19 Daniel Walsh 2009-10-01 15:40:15 UTC

No because if it happened 1 day ago, the tool prints "Yesterday"  So it will only print the plural.

Comment 20 Piotr Drąg 2009-10-01 15:47:22 UTC

Some languages have more than one plural form.

Comment 21 Miloš Komarčević 2009-10-01 15:50:30 UTC

Sure Daniel, that makes perfect sense, but again, only for English. For other languages you could use different plural word forms for "days" even for numbers greater than 2. For example, in Serbian "3 dana", but "21 dan" (same is singular because it ends wit a 1). Plural rules get even more complex for other languages, as mentioned in the link in the opening comment.

This is the whole point of using ngettext. You as a developer do not have to, and indeed shoudn't think about strings with plurals. As they say: "assumption is..." ;)

Comment 22 Miloš Komarčević 2009-10-01 20:49:35 UTC

Last change confirmed in hg, thank you very much Daniel!

Note You need to log in before you can comment on or make changes to this bug.