Bug 83914 - msgmerge does not add spaces for line-breaks
Summary: msgmerge does not add spaces for line-breaks
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: gettext
Version: 8.0
Hardware: i686
OS: Linux
medium
low
Target Milestone: ---
Assignee: Leon Ho
QA Contact: Jay Turner
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2003-02-10 01:17 UTC by Bernd Groh
Modified: 2015-01-08 00:03 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2003-07-08 00:28:28 UTC
Embargoed:


Attachments
Testcase: pot, old po, merged po for an entry with this problem (1.66 KB, text/html)
2003-02-10 06:00 UTC, Bernd Groh
pot-file of the previous testcase (740 bytes, text/plain)
2003-02-10 06:10 UTC, Bernd Groh
old po-file of the previous testcase (1.06 KB, text/plain)
2003-02-10 06:11 UTC, Bernd Groh
new merged po-file of the previous testcase (1.06 KB, text/plain)
2003-02-10 06:11 UTC, Bernd Groh

Description Bernd Groh 2003-02-10 01:17:47 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020830

Description of problem:
Problem:

msgstr:
[snip] blau und
gelb ist [snip]

msgmerge'd into:
[snip] blau undgelb 
ist [snip]

Many German translations have missing spaces between words.

I believe the root cause is msgmerge, since it appends the next line to the
previous one without checking whether the previous line ends in a space. Many
translators are not aware that this space at the end of the line is *absolutely*
required.

I believe msgmerge should do a sanity-check, or is there any particular reason
that it doesn't?

An option not to add a space might be OK, but I believe that by default
msgmerge should not rearrange lines, but leave them as they are.

Cheers,
Bernd


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Create a multi-line string in which some lines are well short of the wrap
width (e.g. only 40 characters long). End none of these lines in either "\n"
or " ".
2. Perform a msgmerge on the file containing this entry.


Actual Results:  msgmerge re-formats the lines and introduces a more suitable
line-break. In doing so, it wrongly merges the last word of the first line with
the first word of the second line into a single word.

Expected Results:  The entry should have been left as is, or, if a new
line-break was introduced, there should be a space in between the last word of
the first line and the first word of the second line.
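A minimal Python sketch of the reported effect, assuming a simple greedy fill: `rewrap` is a hypothetical helper that breaks only after a space, not msgmerge's actual wrapping code, and the 30-column width is chosen only to keep the example short. The old msgstr is joined exactly as the PO format joins it, then re-split into new lines.

```python
def join_pieces(pieces):
    """Concatenate the quoted pieces of a msgstr, as the PO format does."""
    return "".join(pieces)


def rewrap(text, width):
    """Greedily re-split text into pieces of at most `width` characters,
    breaking only after a space (a single word longer than `width` stays
    whole). Every piece keeps its trailing space, so joining the result
    gives back exactly `text`."""
    # Split into words that keep their trailing space: "blau ", "und", ...
    words = [w + " " for w in text.split(" ")]
    words[-1] = words[-1][:-1]  # the final word has no trailing space
    pieces, current = [], ""
    for word in words:
        if current and len(current) + len(word) > width:
            pieces.append(current)
            current = ""
        current += word
    if current:
        pieces.append(current)
    return pieces


# The translator meant two words but typed no space before the line break.
old_msgstr = ["Der Himmel ist blau und", "gelb ist die Sonne."]
merged = rewrap(join_pieces(old_msgstr), width=30)
print(merged)  # ['Der Himmel ist blau undgelb ', 'ist die Sonne.']
```

Joining the re-wrapped pieces gives back exactly the joined original, so the re-wrap only moves the line boundary; the space was already missing from the string itself.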

Additional info:

Comment 1 Bernd Groh 2003-02-10 06:00:02 UTC
Created attachment 89959
Testcase: pot, old po, merged po for an entry with this problem

The first row is from the pot-file. msgid on the right, no msgstr available (as
seen on left).
The second row is from the old po-file. Correct msgstr is on the left.
The third row is the result after msgmerge (using default options). On the left
side now is the msgstr in which this problem occurs, i.e. "zusagen" instead of
"zu sagen", "derGemeinschaft" rather than "der Gemeinschaft", and
"könnten,und" where it should read "könnten, und"

Comment 2 Bernd Groh 2003-02-10 06:10:11 UTC
Created attachment 89960
pot-file of the previous testcase

Comment 3 Bernd Groh 2003-02-10 06:11:10 UTC
Created attachment 89961
old po-file of the previous testcase

Comment 4 Bernd Groh 2003-02-10 06:11:52 UTC
Created attachment 89962
new merged po-file of the previous testcase

Comment 5 Leon Ho 2003-02-10 06:42:01 UTC
The original idea of multiple lines is easier reading. People would type text
across line breaks (very similar behaviour to editors with wrapping on), since,
as you know, the message is on one line if there isn't any \n.

I don't think it is sane to change the behaviour or to force people to add a
space at the end of the line.

On the other hand, in your test case that needed a space between "," and "u",
how will it show in the software? Possibly it will show without the space as well.

Comment 6 Bernd Groh 2003-02-10 07:19:49 UTC
> The original idea of multiple lines is easier reading. People would type text
> across line breaks (very similar behaviour to editors with wrapping on), since,
> as you know, the message is on one line if there isn't any \n.

Yes, I know. But most tools for po-files do not wrap automatically and you have
to wrap manually, by pressing [Enter].

> I don't think it is sane to change the behaviour or to force people to add a
> space at the end of the line.

That's what I meant. Therefore, msgmerge should check whether there's a space at
the end in such cases. If there isn't one, msgmerge should add a space
automatically before joining the lines into one, since the joining is what
causes the problem when spaces are missing.

> On the other hand, in your test case that needed a space between "," and "u",
> how will it show in the software? Possibly it will show without the space as well.

Yes, that's right, it will show without the space. Not just in this case, but in
all three cases of the given testcase. "zusagen" is not a German word, neither is
"derGemeinschaft". This is how it would appear in the software, and that is wrong.

msgmerge does that by re-wrapping the lines and ignoring that users might not
have added a space at the end of a line. As such, I believe msgmerge should do a
sanity-check, and if there is no space at the end of the line, add one.


Comment 7 Miloslav Trmac 2003-02-10 17:19:55 UTC
This is NOT a msgmerge bug.
The format is *simple*, no exceptions: list of strings (with possible
\foo escape sequences) concatenated together.

If you "fix" msgmerge not to reformat the entries, gettext
will *still* join the strings, causing the same output. The fact
that there are newlines outside the " " quotes around the strings has
*no* influence on the translated string contents, neither does the
way string is broken into parts:

"a"
" "
"b"

"a b"

"a "
"b"

all generate the same string in the .mo file.

Use a reasonable .po editor (emacs, kbabel, whatever), fix your .po files
and be happy.
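The concatenation rule can be checked with a short Python sketch. `po_string_value` is a hypothetical helper, and it assumes Python's string-literal escapes as a stand-in for the C-style escapes used in PO files (they agree for `\n`, `\t`, `\"` and `\\`).

```python
import ast


def po_string_value(lines):
    """Return the string that the quoted lines of a PO entry encode.

    Each line holds one C-style quoted string. Whatever lies outside the
    quotes, including the line break itself, is ignored, and the decoded
    pieces are concatenated with nothing in between.
    """
    # Assumption: Python literal syntax stands in for PO's C-style escapes.
    return "".join(ast.literal_eval(line.strip()) for line in lines)


for lines in (['"a"', '" "', '"b"'], ['"a b"'], ['"a "', '"b"'], ['"a"', '"b"']):
    print(lines, "->", repr(po_string_value(lines)))
```

The first three inputs all decode to 'a b'; the fourth, "a" followed by "b", decodes to 'ab', because the line break between the quoted strings contributes nothing.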

Comment 8 Bernd Groh 2003-02-10 23:56:08 UTC
> This is NOT a msgmerge bug.
> The format is *simple*, no exceptions: list of strings (with possible
> \foo escape sequences) concatenated together.

Technically, it might not be. But it sure is a problem! Do you know just how
many of these wrong entries I have fixed in German translations?

> If you "fix" msgmerge not to reformat the entries, gettext
> will *still* join the strings, causing the same output. The fact
> that there are newlines outside the " " quotes around the strings has
> *no* influence on the translated string contents, neither does the
> way string is broken into parts:

That was an *example*, which I don't like as much as reformatting and simply
adding a space at the end of the line if there isn't one, or at least providing
an option that enables this. That way, you fix all such potential problems the
first time you run msgmerge on the file. And that's what I said.

And there won't be any problem with gettext after that anymore either.

> "a"
> " "
> "b"
>
> "a b"
>
> "a "
> "b"
>
> all generate the same string in the .mo file.

Yes, I know.

But what about:

"a"
"b"

This will generate "ab", not "a b".

Now you can certainly tell me about the *simple* format, but then you completely
ignore common practice. If you insert a line break while writing some text, for
example, do you mean to say that a new word follows, or do you intend the next
line to be read as part of that very same line, i.e. the last string of the
first line and the first string of the second line forming a single word?

Example:

Since this example doesn't fit entirely in my box, I really do
prefer to use multiple lines, even though I'd like to have it
in one line later on in my output.

This example will be msgmerge'd into:

Since this example doesn't fit entirely in my box, I really doprefer to use 
multiple lines, even though I'd like to have itin one line later on in my 
output.

If that is the desired output for you, you do have a point.

Then I'd like to make the mere suggestion of doing a sanity-check, at least in
msgmerge and simply adding a space if there isn't one.
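A sanity check along these lines could start out as a warning rather than an automatic edit. This Python sketch is one possible heuristic, not a feature of msgmerge; `suspicious_joins` is a hypothetical name, and `pieces` is assumed to hold the raw text between the quotes of one msgstr.

```python
def suspicious_joins(pieces):
    """Yield (index, left, right) for each boundary between two quoted pieces
    that would glue the last word of `left` to the first word of `right`.

    `pieces` holds the raw text between the quotes, escapes not decoded.
    A boundary is flagged when `left` ends in neither a space nor an escaped
    newline (backslash + n) and `right` does not start with a space. This is
    only a heuristic: the PO format also allows breaking inside a word.
    """
    for i in range(len(pieces) - 1):
        left, right = pieces[i], pieces[i + 1]
        if not left or not right:
            continue  # e.g. the empty first string of a wrapped msgstr
        if left.endswith((" ", "\\n")) or right.startswith(" "):
            continue
        yield i, left, right


entry = [
    "Since this example doesn't fit entirely in my box, I really do",
    "prefer to use multiple lines, even though I'd like to have it",
    "in one line later on in my output.",
]
for i, left, right in suspicious_joins(entry):
    print(f"boundary {i}: ...{left[-10:]!r} + {right[:10]!r}...")
```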

> Use a reasonable .po editor (emacs, kbabel, whatever), fix your .po files
> and be happy.

I *do* use kbabel! But kbabel doesn't tell me, hey, you've forgotten a space at
the end of the line there, which is *absolutely* required, because if you don't
have one there, msgmerge will scramble your translation next time 'round! ;-)

Cheers,
Bernd

Comment 9 Leon Ho 2003-02-11 06:03:37 UTC
A couple of points for us to brainstorm:
- msgmerge is not a compulsory step for converting .po to .mo, so we could not
rely on msgmerge alone to do a sanity check and add a space at the end of the
line; we would also have to update msgfmt, etc. for that behaviour.
- Third-party scripts may break lines simply by counting the width. If they
break one word across two lines, they may complain if we change the
behaviour.
- Ben has pointed out that applications like KBabel automatically add a space
to the previous line when you break into a new line. This is absolutely a good
feature for the client to handle (in KBabel 0.9.6).

I will try to get approval from upstream if you still prefer to handle it in the
gettext tools instead of at the translation-application level.

Comment 10 Bernd Groh 2003-02-11 07:17:23 UTC
> - msgmerge is not a compulsory step for converting .po to .mo, so we could
> not rely on msgmerge alone to do a sanity check and add a space at the end of
> the line; we would also have to update msgfmt, etc. for that behaviour.

In my case, I'll never ever convert anything to .mo. I simply create a new pot
and then do a msgmerge to fill the entries. If a space was missing, as
addressed, my previous translation breaks. While this is easily avoidable, it is
annoying if you happen to have forgotten the required space (in practice, this
happens, as seen in some of the German translations I took over).
How this could be achieved otherwise, I don't know.

> - Third-party scripts may break lines simply by counting the width. If they
> break one word across two lines, they may complain if we change the
> behaviour.

What exactly do you mean here? If they don't use msgmerge, they don't have any
problem. And if they use the output, I don't see any problem either. Or do you
mean that they want to use po-files and write words over the edge, e.g.
[snip]and today is a re
ally nice day?

I'd prefer the same behaviour as in markup-languages: if you insert a
line-break, a space is implied unless one is placed explicitly.
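The two readings of a line break can be contrasted in Python. `join_po` follows the PO rule; `join_markup_style` is a hypothetical helper modelling HTML-like whitespace handling, which gettext does not implement.

```python
import re


def join_po(pieces):
    """PO semantics: quoted pieces are concatenated verbatim."""
    return "".join(pieces)


def join_markup_style(pieces):
    """Markup-style reading (NOT what the PO format does): every line break
    counts as whitespace, and runs of whitespace collapse to a single space,
    roughly as in HTML text flow."""
    return re.sub(r"\s+", " ", " ".join(pieces)).strip()


pieces = ["blau und", "gelb ist"]
print(join_po(pieces))            # blau undgelb ist
print(join_markup_style(pieces))  # blau und gelb ist
```

Under the markup reading, a legal mid-word break such as "wrongfi" "le" would turn into two words, so the two rules cannot simply be swapped on existing files.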

> - Ben has pointed out that applications like KBabel automatically add a
> space to the previous line when you break into a new line. This is absolutely
> a good feature for the client to handle (in KBabel 0.9.6).

I agree. KBabel therefore complies with the behaviour we know from
markup-languages. But it doesn't enforce it. If you happen to do some changes
and end up with no space at the end, you might not realise it until your
translation is broken. And if you do not use kbabel, then it is even more likely
that you forget a space now and then. I just think it is wrong that forgetting a
simple space at the end of a line can break your translation.

> I will try to get approval from upstream if you still prefer to handle it in
> the gettext tools instead of at the translation-application level.

Just see what their take on this is. If you've worked a lot with
markup-languages and know that a line-break actually implies a space, then you
are likely to expect that from some tool too. And if it breaks your translation,
because you didn't explicitly add a space at the end of the line, then it simply
is annoying. And if you keep adding spaces in previous translations and know it
could have been easily avoided, you are annoyed too.

I'm happy to keep the default as it is, but an option would be really good.

And while I am all for doing it on the application-level, I am also for doing it
on the gettext tools, because otherwise you are dependent on your application.
And what tool would like to make itself dependent on some application?

But if you want to resolve it as NOTABUG, that's fine with me too. I'm sure I'll
find another merge-tool, or I'll simply write my own. :-)

Cheers,
Bernd


Comment 11 Miloslav Trmac 2003-02-12 15:03:47 UTC
> "a"
> "b"
> This will generate "ab", not "a b".
Yes, and that's *right*.

> Since this example doesn't fit entirely in my box, I really doprefer to use 
> multiple lines, even though I'd like to have itin one line later on in my 
> output.
> If that is the desired output for you, you do have a point.
That is not the desired output when using clear text or *ML, but
.po is not *ML. The " " marks are part of the syntax and only the marks
are there to delimit what is the string. No additional rules.

Changing .po to be more *ML-like now after years of use is
a very bad idea IMHO.

> I *do* use kbabel! But kbabel doesn't tell me, hey, you've forgotten a space at
> the end of the line there, which is *absolutely* required, because if you don't
> have one there, msgmerge will scramble your translation next time 'round! ;-)

> If a space was missing, as
> addressed, my previous translation breaks.
Your translation is *already* broken (it will be wrong when compiled)
in those cases.


> Then I'd like to make the mere suggestion of doing a sanity-check, at least in
> msgmerge and simply adding a space if there isn't one.

I guess this bug can be laid to rest with mention of something like this:
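--------------- addspaces.awk
# Comment lines start a new entry; pass them through unchanged.
/^#/                 { in_str = 0; print; next }
/^(msgid|msgctxt)/   { in_str = 0 }
/^msgstr/            { in_str = 1 }
{
  # Inside a msgstr, append a space to any non-empty string that does not
  # already end in a space or an escaped newline (\n). This also pads the
  # last string of each msgstr, which is why the results need checking.
  if (in_str && $0 ~ /[^" ]"$/ && $0 !~ /\\n"$/)
    sub(/"$/, " \"")
  print
}
---------------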
Then hand-check the results.


Comment 12 Bernd Groh 2003-02-13 00:01:07 UTC
> > "a"
> > "b"
> > This will generate "ab", not "a b".
> Yes, and that's *right*.

That's nothing I disagree with, from a technical viewpoint that is.

> > Since this example doesn't fit entirely in my box, I really doprefer to use 
> > multiple lines, even though I'd like to have itin one line later on in my 
> > output.
> > If that is the desired output for you, you do have a point.
> That is not the desired output when using clear text or *ML, but
> .po is not *ML. The " " marks are part of the syntax and only the marks
> are there to delimit what is the string. No additional rules.

No, it's not. But given how many of these problems I've fixed in previous
translations, I can tell you that many translators don't seem to be aware of the
po-syntax and simply assume that po is clear text. Let's redirect my criticism,
then, away from the tools and toward whoever is responsible for not explaining
the exact syntax of po-files to translators. Since some translators might get
the impression that all they are meant to do is translate some clear text, we
should make an effort to really tell them that they are not simply translating
clear text, but po-strings, which have specific syntax requirements.

> Changing .po to be more *ML-like now after years of use is
> a very bad idea IMHO.

This point is taken. But *ML was simply an example, an example of clear text
written in an editor that doesn't support line-wrapping, like the editors you
write your po-files with. Even when simply writing clear text in an email, for
example, one often puts a line-break -- without a space at the end of the line --
to start a new word on the next line, so that the line doesn't get too long. If
translators behave in the same -- usual -- way when writing their po-files,
their translation might break.

While I agree with you that changing the default-behaviour of a tool after years
of use is a bad idea, let's just think about what else we can do, so that this
won't happen anymore, on a more global level.

There are really only two options (well, three to be exact). 1) Make sure that
every translator is aware that they are not simply translating clear text, but
po-strings, with specific syntax requirements, some of which are not enforced
by the common tools. The other two are obvious: 2) have some software pick up
potential problems, or 3) waste a lot of time fixing broken translations
afterwards.

A matter of choice I guess, but I, personally, surely don't choose option 3).
And I don't choose option 1) because I believe that option 2) is
much easier to achieve than option 1).

> > If a space was missing, as addressed, my previous translation breaks.
> Your translation is *already* broken (it will be wrong when compiled)
> in those cases.

True. But it isn't lost for the translators, since it still appears correct in
your editor and can easily be auto-fixed, i.e. by adding a space at the end.
Once a msgmerge is performed, these fixes must be done manually, because the
entries really are broken. As such, from the perspective of a translator working
on clear text, the translations aren't really broken until you do a msgmerge. Or
do you expect all translators to know that these texts will be broken in the
output? If you do, have you taken all the necessary steps to ensure they do? If
you can assure me that every translator but me (since I do know) is aware that
this space is required, then I'm happy not to do anything further about this
issue and to resolve it myself as NOTABUG. :-)

> > Then I'd like to make the mere suggestion of doing a sanity-check, at
> > least in msgmerge and simply adding a space if there isn't one.
> I guess this bug can be laid to rest with mention of something like this:
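> --------------- addspaces.awk
> # Comment lines start a new entry; pass them through unchanged.
> /^#/                 { in_str = 0; print; next }
> /^(msgid|msgctxt)/   { in_str = 0 }
> /^msgstr/            { in_str = 1 }
> {
>   # Inside a msgstr, append a space to any non-empty string that does not
>   # already end in a space or an escaped newline (\n). This also pads the
>   # last string of each msgstr, which is why the results need checking.
>   if (in_str && $0 ~ /[^" ]"$/ && $0 !~ /\\n"$/)
>     sub(/"$/, " \"")
>   print
> }
> ---------------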
> Then hand-check the results.

If you believe this will solve the problem, and every translator will now be
aware that this space is required, then we can safely lay this issue (which I
completely agree is not a bug technically) to rest.

But I even doubt that all translators know about awk, or about any kind of
programming at all.

Cheers,
Bernd


Comment 13 Bernd Groh 2003-02-13 00:26:45 UTC
Just in case I left it too vague how I would fix the problem if I had a say,
here's what I'd do.

> Changing .po to be more *ML-like now after years of use is
> a very bad idea IMHO.

And we wouldn't want that. But if you simply don't reformat the lines in msgmerge
-- even though that changes the default-behaviour -- this, as you said yourself,
has no negative effect, since, and I quote:

''
"a"
" "
"b"

"a b"

"a "
"b"

all generate the same string in the .mo file.
''

On the other hand, not reformatting the lines has the advantage that missing
spaces at the end of the line do not cause the translation to break, i.e. end up
in a state where a manual fix is required. While these entries are still broken
in the software, they can still be fixed automatically at any time, whenever we
come across such problems.
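The difference between fixing an entry before and after a re-wrap can be sketched with a naive auto-fix. `add_missing_spaces` is a hypothetical helper, not part of gettext, and `pieces` holds the raw text between the quotes. The third case shows the limitation raised in comment 21: a legal mid-word break gets a spurious space.

```python
def add_missing_spaces(pieces):
    """Naive auto-fix: append a space to every non-empty piece except the
    last one when it ends in neither a space nor an escaped newline (the
    two characters backslash and n)."""
    fixed = []
    for i, piece in enumerate(pieces):
        is_last = i == len(pieces) - 1
        if piece and not is_last and not piece.endswith((" ", "\\n")):
            piece += " "
        fixed.append(piece)
    return fixed


# Entry as the translator left it: the line boundary still marks the gap.
as_typed = ["Der Himmel ist blau und", "gelb ist die Sonne."]
# The same entry after a re-wrap: the boundary now falls elsewhere.
rewrapped = ["Der Himmel ist blau undgelb ", "ist die Sonne."]
# A legal PO entry broken in the middle of a word.
mid_word = ["wrongfi", "le"]

print("".join(add_missing_spaces(as_typed)))   # Der Himmel ist blau und gelb ist die Sonne.
print("".join(add_missing_spaces(rewrapped)))  # Der Himmel ist blau undgelb ist die Sonne.
print("".join(add_missing_spaces(mid_word)))   # wrongfi le
```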

What, then, is your argument against this? Why is it so essential for you that
msgmerge reformats the lines? What disadvantage would this cause for you?
And do you think this is a bigger disadvantage than the one I face?

Anyone else? Any other options? Please *do* fire away!

Thank you,
Bernd


Comment 14 Bruno Haible 2003-02-18 19:45:34 UTC
Three comments:

- As was already said, the PO file format has had a specification for 8 years
  now, and msgmerge is obeying this specification. Basically, white space outside
  "..." doesn't count.

- Translators should test their translations before submitting them. The first
  step of this test is to call msgfmt to get a .mo file and install this .mo
  file at the appropriate place.

- A good editor for PO files (Emacs PO mode does this; I don't know whether
  KBabel does) should:
  1. Display the translation in a way that displays newlines as newlines and
     not \n, does not depend on extra whitespace and line breaks outside strings
     in the PO file, and show where the string ends.
  2. Let the translator see where there are spaces at the end of line.
  3. If it does automatic line wrapping, let the translator see where there
     are linebreaks (\n in PO mode syntax).
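A rough Python sketch of the display rules listed above, assuming an editor that shows the joined string with visible markers; `render_for_translator` and its marker characters (`↵`, `·`, `$`) are hypothetical choices, not Emacs PO mode's actual rendering. Because piece boundaries are dropped, the missing space shows up directly in the rendered text.

```python
def render_for_translator(pieces):
    """Hypothetical editor view of a msgstr.

    * Piece boundaries in the file are ignored; only the joined string counts.
    * Each escaped newline (backslash + n) becomes a real line break marked '↵'.
    * Spaces at the end of a displayed line are made visible as '·'.
    * '$' marks where the string ends.
    `pieces` holds the raw text between the quotes.
    """
    lines = "".join(pieces).split("\\n")
    shown = []
    for i, line in enumerate(lines):
        body = line.rstrip(" ")
        body += "·" * (len(line) - len(body))  # make trailing spaces visible
        if i < len(lines) - 1:
            body += "↵"
        shown.append(body)
    return "\n".join(shown) + "$"


print(render_for_translator(["Der Himmel ist blau und", "gelb ist die Sonne."]))
# Der Himmel ist blau undgelb ist die Sonne.$
```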

Comment 15 Bernd Groh 2003-02-19 05:41:57 UTC
> - As was already said, the PO file format has had a specification for 8
> years now, and msgmerge is obeying this specification. Basically, white
> space outside "..." doesn't count.

That's true. But if you adapted msgmerge as suggested in the previous comment,
it would still obey the very same specification that it has obeyed for the last
8 years. This change doesn't mean it suddenly stops obeying this specification.
But it does solve one problem that seems to occur in practice.

> - Translators should test their translations before submitting them. The
> first step of this test is to call msgfmt to get a .mo file and install this
> .mo file at the appropriate place.

That's a nice idea. But how do you ensure that every translator really does that?

> - A good editor for PO files (Emacs PO mode does this; I don't know whether
> KBabel does) should:
> 1. Display the translation in a way that displays newlines as newlines and
>    not \n, does not depend on extra whitespace and line breaks outside
>    strings in the PO file, and show where the string ends.

What do you mean by extra whitespace and line breaks outside strings in the
PO-file? We are only talking about strings in the po-file, aren't we? What do
you mean by outside? I'm not really talking about anything but the po-file.

> 2. Let the translator see where there are spaces at the end of line.
> 3. If it does automatic line wrapping, let the translator see where there
>    are linebreaks (\n in PO mode syntax).

But do we want to dictate to translators what capabilities their editor needs to
have? What if I want to edit the po-file directly? In an ordinary text-editor?
Do you simply want to tell me that I shouldn't do that? And that if I am not
willing or able to create and test the .mo, that I shouldn't do any translations
in the first place?

Thanks,
Bernd


Comment 16 Miloslav Trmac 2003-02-19 09:44:58 UTC
> But do we want to dictate translators what capabilities their editor needs to
> have? What if I want to edit the po-file directly? In an ordinary text-editor?
> Do you simply want to tell me that I shouldn't do that?
You can use whatever tools you want, but it is your responsibility to
ensure the output is right.

Comment 17 Bernd Groh 2003-02-19 23:16:19 UTC
> You can use whatever tools you want, but it is your responsibility to
> ensure the output is right.

Yes, and I do that. But in practice, it seems that not every translator does.
It's nice to say that everybody has to ensure their output is right, but that
doesn't change the fact that it doesn't always happen in practice. Sometimes
people don't even have the time. And then a simple mistake like this not only
shows up wrong in the output; once you have performed a msgmerge, it is even
broken in the source. As a result, you have to spend additional time (which in
some cases means additional money) fixing manually what otherwise could have
been fixed automatically.

This could easily have been avoided, simply by msgmerge not changing the
msgstr-entries, but leaving them as they are and copying them to the new file
unchanged. Why does msgmerge need to reformat lines? Some translators might
even lay out their texts nicely, so that they break after a comma or a period,
but msgmerge simply ignores that and does its own layout. Some people might even
find that annoying. By default, software shouldn't change things that it wasn't
told to change. For such things, it can always provide an option.

Just because that's how it was done for the last 8 years doesn't mean we have to
do it this way for the rest of time. Especially not if more and more non-software
text gets converted into po and therefore more and more translators might use
it, maybe even in a completely different domain.

I will repeat my question. Why is it so essential, that msgmerge reformats the
msgstr-entry in a po-file, rather than using it as it is? If you can give me a
reasonable answer to this very question, you might just convince me of your point.

Thank you,
Bernd


Comment 18 Bernd Groh 2003-02-20 23:27:07 UTC
FYI, here's what I've received this morning:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=84724

And this is only one, currently at the end of a very long list, not including
the ones I found.

Cheers,
Bernd


Comment 19 Miloslav Trmac 2003-02-21 12:46:05 UTC
I still can't see *any* indication why the proposed change would help anything.
* The translation is broken in both cases
* Manual intervention is needed in both cases
* Not reformatting means the files still spell-check fine; reformatting could
  catch (some of) these errors.

Comment 20 Bernd Groh 2003-02-23 23:25:49 UTC
> I still can't see *any* indication why the proposed change would help
> anything.

???

> * The translation is broken in both cases

Depends on your perspective. It is not broken from a clear-text point of view,
but only from a po-string point of view. And yes, I am only addressing po-files
here, not mo-files. I'm not primarily concerned with mo-files and a lot of my
po-files aren't software-files, therefore no mo-file is ever to be created.

> * Manual intervention is needed in both cases

Yes. But in one case the manual intervention is simply running a script that
automatically fixes all these problems, which does not require an understanding
of the language in which the translation is in, while in the second case,
somebody who understands the language has to go manually through every entry in
the po-file to ensure it is still valid. One manual intervention takes a few
seconds, the other can take up to several hours. IMO, that's a big difference.

> * Not reformatting means the files still spell-check fine; reformatting could
>   catch (some of) these errors.

Ok, let's get into this issue. Does msgmerge do a spell-check that works only if
it reformats the line? Will msgmerge automatically notify you of incorrect
spellings? Or what exactly do you refer to here? Do you have a concrete example?

Thank you,
Bernd


Comment 21 Miloslav Trmac 2003-02-26 02:12:52 UTC
> > * The translation is broken in both cases
> Depends on your perspective. It is not broken from a clear-text point of view,
> but only from a po-string point of view. And yes, I am only addressing po-files
> here, not mo-files. I'm not primarily concerned with mo-files and a lot of my
> po-files aren't software-files, therefore no mo-file is ever to be created.
If you are creating "po" files which consider the sequence < " new-line " >
to be white space, you are not using the .po format.

> > * Manual intervention is needed in both cases
> Yes. But in one case the manual intervention is simply running a script that
> automatically fixes all these problems, which does not require an understanding
> of the language in which the translation is in, while in the second case,
> somebody who understands the language has to go manually through every entry in
> the po-file to ensure it is still valid.
No. A "smart .po editor" is perfectly allowed to break the lines in the
middle of a word, in which case your "script that automatically fixes
all the problems" introduces new errors.
And if a "script that automatically fixes all the problems"
has to be run, why shouldn't it be run before you even start using the .po
file in the first place?

> > * Not reformatting means the files still spell-check fine; reformatting could
> >   catch (some of) these errors.
> 
> Ok, let's get into this issue. Does msgmerge do a spell-check that works only if
> it reformats the line? Will msgmerge automatically notify you of incorrect
> spellings? Or what exactly do you refer to here? Do you have a concrete example?
Nothing that involved. Merely that using my simple spell check script
(which does have the bug about " \n "  ;-),
msgstr "wrong"
"file"
passes ok, but
msgstr "wrongfi"
"le"
does not.

After msgmerge does its job, there is a high probability that a spell
check will reveal at least some instances of this problem.
(This might not be that useful in German, where almost any word combination
is grammatically correct, or correct enough for the spellchecker not to
complain).
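The spell-check argument can be illustrated with a per-piece checker. The toy dictionary and `unknown_words_per_piece` are hypothetical stand-ins for the simple script mentioned above; as in that script, each quoted piece is checked on its own.

```python
import re

# Hypothetical toy dictionary standing in for a real spell checker.
DICTIONARY = {"wrong", "file", "der", "gemeinschaft"}


def unknown_words_per_piece(pieces, dictionary=DICTIONARY):
    """Spell-check each quoted piece on its own, as a simple line-oriented
    script would, and return the words not found in `dictionary`."""
    unknown = []
    for piece in pieces:
        for word in re.findall(r"\w+", piece):
            if word.lower() not in dictionary:
                unknown.append(word)
    return unknown


print(unknown_words_per_piece(["wrong", "file"]))  # [] although the string is "wrongfile"
print(unknown_words_per_piece(["wrongfile"]))      # ['wrongfile']
```

Before a re-wrap, the glued halves sit in separate pieces and both pass; once they land in the same piece, the glued word is reported.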


Yes, once we are discussing "not-really-po" files, this all makes some sense.
But IMHO, it still should be solved by using the *strict* .po format, thus
allowing full interoperability, not by trying to expand the number of
utilities that support a particular dialect.

Comment 22 Bernd Groh 2003-02-26 04:53:20 UTC
> If you are creating "po" files which consider the sequence < " new-line " >
> to be white space, you are not using the .po format.

But I don't. I am only using .po format.

> No. A "smart .po editor" is perfectly allowed to break the lines in the
> middle of a word, in which case your "script that automatically fixes
> all the problems" introduces new errors.

It is not meant to be applied to all po-files, and such a script doesn't exist
yet either. The point was that the file remains valid if you look at it as clear
text. And if you ask me how I know which files it can be applied to
automatically and which it can't: some files have an X-Generator entry. In
fact, all po-files I ever came across (and these are many, and they are the ones
that concern me) have an X-Generator entry, e.g.

"X-Generator: KBabel 1.0beta2\n"

A "smart .po editor" uses this feature! ;-)
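Selecting files by their X-Generator header could look like this Python sketch. It assumes each header field sits on its own quoted line, as in the example above; `po_header_field`, `trusted_for_autofix` and the list of trusted generators are hypothetical, and the header only records which tool last saved the file, not how every line was written.

```python
import re


def po_header_field(po_text, field):
    """Return the value of a header field such as X-Generator, or None.

    Assumes the usual layout in which each header field sits on its own
    quoted line of the form "Field: value\\n".
    """
    pattern = r'^"' + re.escape(field) + r':\s*(.*?)\\n"\s*$'
    match = re.search(pattern, po_text, re.MULTILINE)
    return match.group(1) if match else None


def trusted_for_autofix(po_text, generators=("KBabel",)):
    """Hypothetical policy: only auto-fix files saved by editors assumed
    not to break lines inside words."""
    generator = po_header_field(po_text, "X-Generator")
    return generator is not None and generator.startswith(generators)


sample = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset=UTF-8\\n"\n'
    '"X-Generator: KBabel 1.0beta2\\n"\n'
)
print(po_header_field(sample, "X-Generator"))  # KBabel 1.0beta2
print(trusted_for_autofix(sample))             # True
```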

> And if you need a "script that automatically fixes all the problems"
> has to be run, why shouldn't it be run before you even start using the .po
> file in the first place?

Because the engineers don't wait for me with a change in po-strings. They do
their changes, and create a revision. And that's the way it should be. Sure, you
could enforce running a script on commit, but I still believe it is much easier
and more sane if msgmerge simply leaves the entries as is and doesn't reformat
them. Because then you could check the entries any time in the future, without
having to worry about having to check all the entries, but simply the ones with
a space missing at the end. And you don't use a script that might wrongfully fix
entries it shouldn't fix either (good argument, by the way, but it only further
proves my point).

> Nothing that involved. Merely that using my simple spell check script
> (which does have the bug about " \n "  ;-),
> msgstr "wrong"
> "file"
> passes ok, but
> msgstr "wrongfi"
> "le"
> does not.

Are you now being funny? :-)

> After msgmerge does its job, there is a high probability that a spell
> check will reveal at least some instances of this problem.

Well, why not add an option to msgmerge to allow such reformatting? I would. But
this doesn't change that I still believe the default shouldn't change the
entries, but leave them as is.

For your spell-checking problem, alternatively, you could fix your script! ;-)

> (This might not be that useful in German, where almost any word combination
> is grammatically correct, or correct enough for the spellchecker not to
> complain).

It still does complain though. But I don't have a problem spell-checking anyway,
since I do it in kbabel. :-)

> Yes, once we are discussing "not-really-po" files, this all makes some sense.

But I'm not. I am discussing nothing but po-files and the disadvantages you face
in practice with their strict format, i.e. no whitespace after \n and a space at
the end of the line, or else it no longer correlates with clear text. While I am
completely in favour of not changing the format, I am also in favour of msgmerge
not changing the format -- of the po-entries. I *do* talk about po-files, but
also about the problem we seem to run into constantly, with translators
understanding po as clear text. And my suggestion of msgmerge not changing the
entries is the best way to ensure that such small mistakes do not completely
mess up the translation-source.

> But IMHO, it still should be solved by using the *strict* .po format, thus
> allowing full interoperability, not by trying to expand the number of
> utilities that support a particular dialect.

Me too. As I said, I am not for changing the format. I have long since departed
from my initial idea of adding a space if msgmerge really needs to reformat.
But I do strongly believe that msgmerge shouldn't change any format either, not
even that of po-entries.

Cheers,
Bernd

