Bug 1035959

Summary: RFE: Remove Project Type, Zanata should recognise file types individually
Product: [Retired] Zanata Reporter: Damian Jansen <djansen>
Component: Component-UI Assignee: Michelle Kim <mkim>
Status: CLOSED UPSTREAM QA Contact: Zanata-QA Mailling List <zanata-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.4 CC: amatlack, camunoz, damason, dchen, sflaniga, zanata-bugs
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-07-29 03:24:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Damian Jansen 2013-11-29 03:03:36 UTC
Description of problem:

File types should be recognised by their extension or, failing that, metadata.
It seems somewhat pointless and restrictive to force a project version to be one type.

Users should be able to simply upload their sources, and have Zanata:
- set the type,
- ask them if it can't work it out,
- allow the maintainer to set it to something else if Zanata gets it wrong,
- show a little icon in the file list telling the user what kind of file it is.
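A minimal sketch of the requested behaviour, assuming a simple extension table. `EXTENSION_TYPES` and `detect_type` are hypothetical names, not Zanata code: the extension wins, then any type sniffed from metadata, and `None` means Zanata should ask the user. A maintainer override always takes precedence.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical extension-to-type table; type names mirror the project types
# discussed in this bug. Note .pot alone cannot tell gettext from podir.
EXTENSION_TYPES = {
    ".po": "gettext",
    ".pot": "gettext",
    ".properties": "properties",
    ".xlf": "xliff",
    ".xliff": "xliff",
    ".odt": "file",
    ".txt": "file",
}

def detect_type(filename: str, sniffed: Optional[str] = None,
                override: Optional[str] = None) -> Optional[str]:
    """Return a document type, or None if the user must be asked."""
    if override:                       # maintainer corrected Zanata's guess
        return override
    ext = PurePosixPath(filename).suffix.lower()
    if ext in EXTENSION_TYPES:         # 1. recognise by file extension
        return EXTENSION_TYPES[ext]
    return sniffed                     # 2. metadata guess, else None -> ask
```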

Version-Release number of selected component (if applicable):
3.2

Comment 1 Sean Flanigan 2013-11-29 07:02:00 UTC
We currently use project type so that the server will know about files (e.g. .xlf, .properties) which were converted into Zanata format by the client.  (We need to know this to produce an "offline PO" safely.)  By the time the server sees the file, it is just XML, with no file extension.

Another significant limitation with the idea of "project type" is that it means a given project can only support one type of file (xliff, properties, gettext, podir or "raw").

One way to solve this would be to use the FileService for all files (as we do for "raw" files).  It would need to be extended to handle XLIFF/Properties/Gettext files, but the FileService already knows about file types, and it stores the original file, which could be very useful.

This would imply that clients would send their .xlf, .properties or .po files to the server for conversion by the FileService, just like, say, .odt files ("raw").  The difference is that the FileService would probably not use Okapi for the conversion of these file types.
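A rough sketch of how a FileService-style upload could dispatch on file extension, with a non-Okapi adapter for .properties and the existing Okapi path for "raw" files. `ADAPTERS`, `parse_properties`, `parse_with_okapi` and `process_upload` are hypothetical names, and the properties parser is deliberately simplified (no escapes, continuation lines or `:` separators).

```python
from pathlib import PurePosixPath
from typing import Callable, Dict, List

# Hypothetical adapter signature: raw uploaded bytes in, source strings out.
Adapter = Callable[[bytes], List[str]]

def parse_properties(data: bytes) -> List[str]:
    """Simplified: 'key=value' lines only; comments and blank lines skipped."""
    entries = []
    for line in data.decode("latin-1").splitlines():
        line = line.strip()
        if line and not line.startswith(("#", "!")) and "=" in line:
            entries.append(line.split("=", 1)[1].strip())
    return entries

def parse_with_okapi(data: bytes) -> List[str]:
    # Placeholder for the existing Okapi-based path used for "raw" files.
    raise NotImplementedError("delegate to the Okapi filter")

ADAPTERS: Dict[str, Adapter] = {
    ".properties": parse_properties,
    ".odt": parse_with_okapi,
}

def process_upload(filename: str, data: bytes) -> List[str]:
    """Choose a converter from the file extension, as the FileService does."""
    ext = PurePosixPath(filename).suffix.lower()
    try:
        adapter = ADAPTERS[ext]
    except KeyError:
        raise ValueError(f"unsupported file type: {ext or filename}") from None
    return adapter(data)
```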

Deciding on the best migration strategy for existing projects would be the tricky bit, especially if the project type hasn't been set (or is wrong!).

Comment 2 Carlos Munoz 2013-12-01 23:37:30 UTC
I think eventually that's where we want to go. As for a migration strategy, maybe we still need to support both ways of uploading, and we hide the project type. Users may continue to use the 'legacy' way of uploading files, with their last stored project type (now hidden or non-editable); but if they want to change to a newer client, they would start using the 'new' upload mechanism that Sean describes.

This comes with a few technical challenges. We need to find a way to process files (in any format) as they are streamed from the client (as we do with TMX); this will prevent HTTP timeouts. We also still need to break up transactions for larger files. The alternative is to push the file first and then either keep polling the server for a result (which is what we currently do when pushing PO or properties files), or use Comet or some other kind of long polling.
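One way to break up transactions while consuming an upload as a stream is to process entries in fixed-size batches. This sketch assumes a hypothetical `commit` callback that persists one batch per database transaction; it does not describe how Zanata actually streams TMX.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items from a stream."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def import_stream(entries: Iterable[T], commit: Callable[[List[T]], None],
                  batch_size: int = 100) -> int:
    """Consume entries as they arrive, committing one transaction per batch."""
    count = 0
    for chunk in batches(entries, batch_size):
        commit(chunk)          # hypothetical: one DB transaction per chunk
        count += len(chunk)
    return count
```

Because only one batch is held at a time, memory use and transaction size stay bounded regardless of file size.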

Comment 3 Carlos Munoz 2013-12-03 01:51:30 UTC
Sean and I discussed this in a bit of detail and came up with a stage 1 for this:

Extend the File REST service to support other types of files that are not currently processed using Okapi (po, properties, xliff). The service would know how to process the files based on the file extension, as it does right now.

The current XML-based REST service for uploading translations will still rely on the project type field, and consequently will still only support one file type.

Did I miss anything Sean?

Comment 4 Sean Flanigan 2013-12-03 02:08:19 UTC
One minor thing: we also plan to have the client upload its idea of the project type, so that the server can store it (for projects where projectType is null).
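A sketch of the rule described here, using a hypothetical `ProjectVersion` record: the client's project type is stored only when the server has none, so existing settings are never overwritten.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProjectVersion:
    """Hypothetical stand-in for the server's project-version record."""
    slug: str
    project_type: Optional[str] = None   # None for versions with no type set

def record_client_project_type(version: ProjectVersion, client_type: str) -> bool:
    """Store the client's project type only when the server has none."""
    if version.project_type is None:
        version.project_type = client_type
        return True
    return False   # an existing value is never overwritten by the client
```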

Also, we will have to make a big improvement to the client's support for File type projects.  At present, File projects may not have any overlap between source and translation directories: translation directories are always completely separate from source directories, and the translated filename is the same as the source filename, but with a locale directory at the front.

In other words, for a source file:
  src/abc/xyz.odt

The translated files will be:
  translated/de/abc/xyz.odt
  translated/fr/abc/xyz.odt
etc.
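The current File-project layout amounts to a simple path transformation: the locale becomes a directory and the filename is unchanged. The `src` and `translated` roots are illustrative defaults taken from the example above; the function name is hypothetical.

```python
from pathlib import PurePosixPath

def file_project_translation_path(source: str, locale: str,
                                  src_root: str = "src",
                                  trans_root: str = "translated") -> str:
    """Current File-project layout: <trans_root>/<locale>/<path under src_root>."""
    relative = PurePosixPath(source).relative_to(src_root)
    return str(PurePosixPath(trans_root) / locale / relative)
```

Applied to src/abc/client_messages.properties with locale de, this yields translated/de/abc/client_messages.properties, whereas Java's ResourceBundle naming convention expects abc/client_messages_de.properties.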

That's okay for ODT files, but it won't work for .properties files, because the locale is expected to become part of the filename, not the directory name.  So we may need a customisable way of specifying the template for the pathname.


Also, when there is overlap between source and translated directories, you need a mechanism to avoid treating translated files as if they were source files.

For instance, try to upload the files:

  abc/client_messages.properties
  abc/client_messages_de.properties

As a Java programmer, I know the first one is a source file, and the second is a translation file, but telling them apart programmatically is quite difficult.  We have something implemented as part of the current Properties support (it uses the list of locales found in zanata.xml), but this would need to be adapted or reinvented to work with File projects.  (Instead of looking at zanata.xml, perhaps we could see whether part of the filename "looks like" a locale code.)  NB: these rules may need to differ between file types, since most file types don't use underscores to delimit locales.
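A sketch of the "looks like a locale code" heuristic for .properties names, optionally restricted to a known locale list (analogous to using the locales in zanata.xml). The regex assumes locale codes of the form `de` or `pt_BR`; real locale identifiers can be more varied, and `split_properties_name` is a hypothetical helper.

```python
import re
from typing import Optional, Set, Tuple

# Heuristic: a trailing "_ll" or "_ll_CC" before ".properties" looks like a locale.
_PROPS = re.compile(r"^(?P<base>.+?)_(?P<locale>[a-z]{2,3}(?:_[A-Z]{2})?)\.properties$")

def split_properties_name(filename: str,
                          known_locales: Optional[Set[str]] = None
                          ) -> Tuple[str, Optional[str]]:
    """Return (source_name, locale); locale is None for a source file."""
    m = _PROPS.match(filename)
    if m:
        locale = m.group("locale")
        # Without a locale list, trust the pattern; with one, require a match.
        if known_locales is None or locale in known_locales:
            return m.group("base") + ".properties", locale
    return filename, None
```

Without a locale list, client_msg.properties is wrongly treated as a translation into "msg", which illustrates why a list of known locales is safer than pattern matching alone.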

As a simplification, we could start by disallowing overlaps between source and translation, but this limitation will have to be removed before we can use File projects for a typical Java software project.

We should check what other projects do to handle these problems.

Comment 5 Ding-Yi Chen 2013-12-03 14:31:30 UTC
I had some experience with something like this when I developed zanata-util.
One major issue I encountered is that without existing translations, there is no way of knowing what to pull.

For example:

1. Maintainer uploads PROJ.pot
2. Translators translate it into de and ja.

When pulling, which layout should the files be output in:

1. de.po and ja.po (gettext)
2. de/PROJ.po ja/PROJ.po (podir)

Comment 6 Sean Flanigan 2013-12-04 05:18:05 UTC
Exactly, that's why we need pathname templates.  We may still need to keep some idea of the file types (similar to our current project type enumeration) so that we know which template to use for each file.

So if the document type is "properties" we will use the standard template for properties files, which might be ${dirname}/${basename}_${locale}.properties,

if the document type is "gettext" we use ${dirname}/${locale}.po,

and if the document type is "podir" we use ${locale}/${dirname}/${basename}.po.
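Python's `string.Template` uses the same `${name}` placeholder syntax, so the three default templates can be tried out directly. This is a sketch only: `translation_path` is a hypothetical helper, and an empty `dirname` is treated as the current directory so that top-level files don't gain a leading slash.

```python
import posixpath
from string import Template

# The three default templates proposed above.
TEMPLATES = {
    "properties": Template("${dirname}/${basename}_${locale}.properties"),
    "gettext": Template("${dirname}/${locale}.po"),
    "podir": Template("${locale}/${dirname}/${basename}.po"),
}

def translation_path(doc_type: str, source_path: str, locale: str) -> str:
    """Expand the template for doc_type for one source file and locale."""
    dirname, filename = posixpath.split(source_path)
    basename = posixpath.splitext(filename)[0]
    path = TEMPLATES[doc_type].substitute(
        dirname=dirname or ".", basename=basename, locale=locale)
    return posixpath.normpath(path)   # e.g. "./de.po" -> "de.po"
```

With this, PROJ.pot maps to de.po under the gettext template and to de/PROJ.po under the podir template, which is the distinction raised in comment 5.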

In advanced cases, the user might need to specify the actual templates in zanata.xml, along with a way of associating files/directories with the right templates.


And another thing to bear in mind: we need templates when parsing filenames for upload, as well as when generating files.
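Parsing needs the templates run in reverse. A hedged sketch: turn a template into a regular expression with one named group per placeholder. It assumes `${dirname}` spans one or more directories (so a directory component is required), other placeholders match a single path segment, and `${locale}` matches codes like `de` or `pt_BR`. A template that repeats a placeholder would need extra handling, since Python regexes cannot reuse a group name.

```python
import re
from typing import Dict, Optional

# Assumed locale shape, e.g. "de", "ja", "pt_BR". Real locale codes are richer.
LOCALE = r"[a-z]{2,3}(?:_[A-Z]{2})?"

def template_to_regex(template: str) -> "re.Pattern[str]":
    """Turn a ${...} pathname template into a regex with named groups."""
    parts = re.split(r"\$\{(\w+)\}", template)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            out.append(re.escape(part))          # literal text between placeholders
        elif part == "locale":
            out.append(f"(?P<locale>{LOCALE})")
        elif part == "dirname":
            out.append(r"(?P<dirname>.+)")      # may span several directories
        else:
            out.append(rf"(?P<{part}>[^/]+?)")  # a single path segment
    return re.compile("^" + "".join(out) + "$")

def parse_path(template: str, path: str) -> Optional[Dict[str, str]]:
    """Recover placeholder values from a path, or None if it doesn't fit."""
    m = template_to_regex(template).match(path)
    return m.groupdict() if m else None
```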

Comment 7 Ding-Yi Chen 2013-12-04 05:50:01 UTC
Then we need a table of supported locales as well. :)

There are also not-so-corner cases: po/polish.po
"po" can mean Polish language, or directory of po files.

That said, we may take advantage of some mutually exclusive cases: for example, if you discover UTF-8 in a .properties file, then the project type should be utf8properties, not properties.
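Java's `Properties.load(InputStream)` reads ISO-8859-1, with other characters written as `\uXXXX` escapes, so valid UTF-8 multi-byte sequences are a reasonable hint for the utf8properties type. This is a heuristic sketch; `guess_properties_type` is hypothetical, and some ISO-8859-1 byte sequences are also valid UTF-8, so it can be fooled.

```python
def guess_properties_type(data: bytes) -> str:
    """Guess 'properties' (ISO-8859-1) or 'utf8properties' from raw bytes."""
    try:
        data.decode("ascii")
        return "properties"          # plain ASCII, possibly with \uXXXX escapes
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf8properties"      # non-ASCII bytes that form valid UTF-8
    except UnicodeDecodeError:
        return "properties"          # not UTF-8: assume ISO-8859-1
```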

Comment 8 David Mason 2013-12-04 06:07:58 UTC
(In reply to Sean Flanigan from comment #6)
> Exactly, that's why we need pathname templates.  We may still need to keep
> some idea of the file types (similar to our current project type
> enumeration) so that we know which template to use for each file.
> 
> So if the document type is "properties" we will use the standard template
> for properties files, which might be
> ${dirname}/${basename}_${locale}.properties,
> 
> if the document type is "gettext" we use ${dirname}/${locale}.po,
> 
> and if the document type is "podir" we use
> ${locale}/${dirname}/${basename}.po.
> 
> In advanced cases, the user might need to specify the actual templates in
> zanata.xml, along with a way of associating files/directories with the right
> templates.
> 
> 
> And another thing to bear in mind: we need templates when parsing filenames
> for upload, as well as when generating files.

In addition to a mechanism for specifying how source files are identified and how they map to translation files, a preview of the results of such a mapping would help maintainers confirm that they have the correct settings. This would entail a display showing some or all of the source files detected with the current settings, and which translation files they would map to in one or more of the active locales (possibly indicating whether those translation files are found on the file system). This would be especially valuable if maintainers are given access to something like the above syntax to specify the exact mapping of source files to translation files manually.
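A minimal sketch of such a preview, assuming the mapping is supplied as a function from (source path, locale) to translation path. `preview_mapping` and `format_preview` are hypothetical names; the sketch reports whether each mapped translation file already exists on disk.

```python
import os
from typing import Callable, Iterable, List, Tuple

Row = Tuple[str, str, str, bool]

def preview_mapping(sources: Iterable[str], locales: Iterable[str],
                    to_translation: Callable[[str, str], str],
                    root: str = ".") -> List[Row]:
    """For each source and locale, report the mapped translation path and
    whether that file already exists under `root`."""
    locale_list = list(locales)
    rows: List[Row] = []
    for source in sources:
        for locale in locale_list:
            target = to_translation(source, locale)
            found = os.path.isfile(os.path.join(root, target))
            rows.append((source, locale, target, found))
    return rows

def format_preview(rows: List[Row]) -> str:
    """One line per mapping, e.g. 'po/PROJ.pot -> [de] po/de.po (found)'."""
    return "\n".join(f"{src} -> [{loc}] {dst} ({'found' if ok else 'missing'})"
                     for src, loc, dst, ok in rows)
```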

Comment 9 Damian Jansen 2014-08-12 02:14:11 UTC
*** Bug 1096495 has been marked as a duplicate of this bug. ***

Comment 10 Ding-Yi Chen 2014-08-21 03:55:31 UTC
Perhaps the ultimate solution is to let you specify conversion rules or use default ones.

For example, for gettext projects:
*.pot: 
   <locale>.po

For podir projects:
*.pot:
   <locale>/<name>.po

Freemind has both properties and document translations, so it could benefit from:
Resources*.properties:
    Resources<locale>.properties

FM_Key_Mappings_Quick_Guide.odt:
    FM_Key_Mappings_Quick_Guide_<locale>.odt


Even files with multiple translations in the same file, like .desktop and .spec, can benefit:

*.spec:
    <name>.spec

*.desktop:
    <name>.desktop

Frequently used settings should be provided and selectable, while project maintainers can customize them to their own needs.
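A sketch of rule matching in this style, using shell-style globs via `fnmatch`. The rule format, the first-match-wins ordering, and the meaning of `<name>` (the source filename without its extension) are assumptions read from the examples above; `apply_rules` is hypothetical.

```python
import fnmatch
import posixpath
from typing import List, Optional, Tuple

# (glob pattern, output template); the first matching rule wins (an assumption).
Rule = Tuple[str, str]

def apply_rules(rules: List[Rule], filename: str, locale: str) -> Optional[str]:
    """Return the translated filename for `locale`, or None if no rule matches."""
    for pattern, output in rules:
        if fnmatch.fnmatchcase(filename, pattern):
            name = posixpath.splitext(filename)[0]
            return output.replace("<name>", name).replace("<locale>", locale)
    return None
```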

Comment 11 David Mason 2014-08-21 05:39:36 UTC
> Frequent used setting should be provided and selectable, while project maintainers can customized into their own need.

I think we can do even better than that: zanata init should be able to look at what files are in the directory and suggest a likely setting based on them, so that in many cases the maintainer will not even have to select a setting manually.
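A rough sketch of the kind of guess `zanata init` could make, working from a list of relative paths (a real implementation would walk the directory, e.g. with `os.walk`). The rules are assumptions, not existing client behaviour: a .po file sharing its stem with a .pot suggests podir (de/PROJ.po), other .po files suggest gettext (de.po), and anything unrecognised falls back to a File project.

```python
import posixpath
from collections import Counter
from typing import List, Optional, Set

def _stems(paths: List[str], ext: str) -> Set[str]:
    """Filenames without directory or extension, for paths ending in ext."""
    return {posixpath.splitext(posixpath.basename(p))[0]
            for p in paths if p.lower().endswith(ext)}

def suggest_project_type(paths: List[str]) -> Optional[str]:
    exts = Counter(posixpath.splitext(p)[1].lower() for p in paths)
    if exts[".pot"] or exts[".po"]:
        # de/PROJ.po alongside PROJ.pot looks like podir; de.po, ja.po like gettext.
        return "podir" if _stems(paths, ".pot") & _stems(paths, ".po") else "gettext"
    if exts[".properties"]:
        return "properties"
    if exts[".xlf"] or exts[".xliff"]:
        return "xliff"
    return "file" if paths else None
```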

Comment 12 Ding-Yi Chen 2014-09-04 03:10:42 UTC
(In reply to David Mason from comment #11)
> > Frequent used setting should be provided and selectable, while project maintainers can customized into their own need.
> 
> I think we can do even better than that: zanata init should be able to look
> what files are in the directory and suggest a likely setting based on the
> files that are present, so that in many cases the maintainer will not even
> have to select a setting manually.

But first, we have to change how "File" type projects output translations.

Currently, Zanata outputs the fr translation of README.txt to fr/README.txt.
However, most projects I have seen use README-fr.txt instead.

e.g.
http://sourceforge.net/p/freemind/code/ci/master/tree/freemind/doc/
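A sketch of the README-fr.txt convention in both directions: generating a translated name and recognising one. The regex assumes locale codes like `fr`, `pt-BR` or `pt_BR`; the function names are hypothetical.

```python
import posixpath
import re
from typing import Optional, Tuple

# Assumed shape: <stem>-<locale><ext>, e.g. README-fr.txt or README-pt-BR.txt.
_SUFFIXED = re.compile(
    r"^(?P<stem>.+)-(?P<locale>[a-z]{2,3}(?:[-_][A-Z]{2})?)(?P<ext>\.[^./]+)$")

def suffixed_translation_path(source: str, locale: str) -> str:
    """README.txt + fr -> README-fr.txt (locale inserted before the extension)."""
    stem, ext = posixpath.splitext(source)
    return f"{stem}-{locale}{ext}"

def split_suffixed(path: str) -> Optional[Tuple[str, str]]:
    """README-fr.txt -> ("README.txt", "fr"); None if there is no locale suffix."""
    m = _SUFFIXED.match(path)
    if not m:
        return None
    return m.group("stem") + m.group("ext"), m.group("locale")
```

Like any suffix heuristic, this can misfire: how-to.txt would be read as a "to" translation of how.txt, so a list of known locales is still needed.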

Comment 13 Allison Matlack 2014-09-04 10:27:27 UTC
(In reply to Ding-Yi Chen from comment #12)
> (In reply to David Mason from comment #11)
> > > Frequent used setting should be provided and selectable, while project maintainers can customized into their own need.
> > 
> > I think we can do even better than that: zanata init should be able to look
> > what files are in the directory and suggest a likely setting based on the
> > files that are present, so that in many cases the maintainer will not even
> > have to select a setting manually.
> 
> But first, we have to change how "File" type output translation.
> 
> Currently, Zanata out fr translation of README.txt to fr/README.txt
> However, most projects I saw are README-fr.txt instead.
> 
> e.g.
> http://sourceforge.net/p/freemind/code/ci/master/tree/freemind/doc/

^ That's the file structure for the Customer Portal. Prior to uploading stuff to Zanata, I have to manually create folders for each language and rename each file before putting it in the folder. It's a tedious process for sure.

Comment 14 David Mason 2014-09-04 12:15:33 UTC
(In reply to Allison Matlack from comment #13)
> ^ That's the file structure for the Customer Portal. Prior to uploading
> stuff to Zanata, I have to manually create folders for each language and
> rename each file before putting it in the folder. It's a tedious process for
> sure.

You can use command hooks to automate that process as a workaround until different project structures are supported. Some information on command hooks is available on the wiki, and I am happy to help with scripting as needed. See: https://github.com/zanata/zanata-server/wiki/Client-Command-Hooks
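As an example of the kind of script a command hook could run before pushing, this sketch copies files named like README-fr.txt into the fr/README.txt layout that File projects currently use. The script, its naming regex and its arguments are assumptions; how a hook is configured is covered on the wiki page above and is not shown here.

```python
import os
import re
import shutil
from typing import List

# Assumed naming: <stem>-<locale><ext>, e.g. README-fr.txt.
_SUFFIXED = re.compile(
    r"^(?P<stem>.+)-(?P<locale>[a-z]{2,3}(?:[-_][A-Z]{2})?)(?P<ext>\.[^.]+)$")

def to_locale_dirs(src_dir: str, out_dir: str, locales: List[str]) -> List[str]:
    """Copy README-fr.txt to <out_dir>/fr/README.txt for each listed locale."""
    written = []
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        m = _SUFFIXED.match(name)
        if not os.path.isfile(path) or not m or m.group("locale") not in locales:
            continue                      # a source file, or not a known locale
        target_dir = os.path.join(out_dir, m.group("locale"))
        os.makedirs(target_dir, exist_ok=True)
        target = os.path.join(target_dir, m.group("stem") + m.group("ext"))
        shutil.copyfile(path, target)
        written.append(os.path.relpath(target, out_dir))
    return written
```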

Comment 15 Ding-Yi Chen 2014-09-04 23:26:21 UTC
(In reply to David Mason from comment #14)
> (In reply to Allison Matlack from comment #13)
> > ^ That's the file structure for the Customer Portal. Prior to uploading
> > stuff to Zanata, I have to manually create folders for each language and
> > rename each file before putting it in the folder. It's a tedious process for
> > sure.
> 
> You can use command hooks to automate that process as a workaround until
> different project structures are supported. Some information on command
> hooks is available on the wiki, and I am happy to help with scripting as
> needed. See:
> https://github.com/zanata/zanata-server/wiki/Client-Command-Hooks

Command hooks won't help people who use the web UI and download all translations as a zip. It would be better to make the rules available on the server side, so that downloading from the web UI and pulling with the client give consistent results.

Comment 16 David Mason 2014-09-05 01:14:45 UTC
(In reply to Ding-Yi Chen from comment #15)
> Command hook won't help the people that use WebUI and download all
> translation as zip. It will be better to make the rule available on
> serverside, so download from WebUi and pull with client get consistant
> result.

I agree fully - I am only suggesting command hooks as a workaround to avoid some manual work until different project structures are supported. Since the server generates zip files with translations, we definitely need to implement support for arbitrary project structures on both the client and the server.

Comment 17 Ding-Yi Chen 2014-09-10 03:27:47 UTC
One good test case is Freemind: it has translations for both .odt and .properties files.

Comment 19 Damian Jansen 2015-07-14 00:20:47 UTC
Reassigned to PM

Comment 20 Zanata Migrator 2015-07-29 03:24:00 UTC
Migrated; check JIRA for bug status: http://zanata.atlassian.net/browse/ZNTA-242