Bug 1102964 - CopyTrans takes excessively long hours to complete copying translations
Summary: CopyTrans takes excessively long hours to complete copying translations
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Zanata
Classification: Retired
Component: Component-CopyTrans
Version: 3.3
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 3.4
Assignee: Patrick Huang
QA Contact: Zanata-QA Mailing List
URL:
Whiteboard:
Duplicates: 1104469
Depends On:
Blocks:
Reported: 2014-05-29 23:24 UTC by Yuko Katabami
Modified: 2014-07-28 02:18 UTC

Fixed In Version: 3.4.2-SNAPSHOT (git-server-3.4.1-47-g88e8fe3)
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-07-17 06:39:32 UTC
Embargoed:


Description Yuko Katabami 2014-05-29 23:24:49 UTC
Description of problem:
CopyTrans is taking an excessively long time to complete copying translations.
With a particular project I am working with, the developer hit multiple timeouts because of the slow operation and could not complete the push.
Pushing source only, without CopyTrans enabled, was normal and fast; however, running CopyTrans on those files from the GUI took more than 10 hours.

Version-Release number of selected component (if applicable): 3.3.2


How reproducible:


Steps to Reproduce:
1. Push the source to an existing project with CopyTrans enabled,
or push the source to an existing project without CopyTrans enabled and then run CopyTrans from the GUI

Actual results:
It takes an excessively long time or may fail due to a timeout

Expected results:
It should not take that long

Additional info:
The project where we found the issue is: https://translate.zanata.org/zanata/iteration/view/ovirt-reports-history/3.5

Comment 1 Yuko Katabami 2014-05-29 23:28:47 UTC
Number of words: 5480
Number of strings: 1080
Number of locales: 7 (zh-TW is not active)

Comment 2 Yuko Katabami 2014-05-30 00:02:37 UTC
(In reply to Yuko Katabami from comment #1)
Correction:
Number of words: 5480
Number of strings: 1084
Number of locales: 7 (zh-TW is not active)

Comment 3 Carlos Munoz 2014-06-03 22:32:38 UTC
We are currently investigating this issue. Patrick has identified a degradation point where the process starts getting slower and slower as it goes.

Comment 4 Carlos Munoz 2014-06-03 22:49:56 UTC
See also:
https://github.com/zanata/zanata-server/pull/484

Comment 5 Patrick Huang 2014-06-04 23:40:28 UTC
On a local machine with 2000 messages, the fix reduced copyTrans time from 30 min to 12 min. Not sure how well it will do in production.

Comment 6 Ding-Yi Chen 2014-06-04 23:50:46 UTC
*** Bug 1104469 has been marked as a duplicate of this bug. ***

Comment 7 Ding-Yi Chen 2014-06-05 04:40:17 UTC
Tested with Zanata 3.4.2-SNAPSHOT (git-server-3.4.1-6-g206676f)
With gettext type project GCC (gcc-4.8.3)
gcc.pot  (9657 messages)

And server log shows:
12:46:33,012 INFO  [org.zanata.service.impl.CopyTransServiceImpl] (DefaultQuartzScheduler_Worker-2) copyTrans: 0 zh-CN translations for document "gcc/po/gcc" - duration: 2555 s
13:28:47,419 INFO  [org.zanata.service.impl.CopyTransServiceImpl] (DefaultQuartzScheduler_Worker-2) copyTrans: 0 zh-TW translations for document "gcc/po/gcc" - duration: 2534 s
14:10:42,116 INFO  [org.zanata.service.impl.CopyTransServiceImpl] (DefaultQuartzScheduler_Worker-2) copyTrans: 0 de-DE translations for document "gcc/po/gcc" - duration: 2515 s
.....

In other words, the copyTrans speed is 9657/2555 = 3.78 messages per second for one locale.

I will test this with another server that has not had this fix applied.
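
The throughput figure above can be reproduced from a log line with a short script. This is a minimal sketch for illustration only: the regular expression and the function name `messages_per_second` are assumptions made here, not part of Zanata, and the message count (9657) is supplied by hand from the gcc.pot figure quoted in this comment.

```python
import re

# Hypothetical pattern (not from Zanata) matching the CopyTransServiceImpl
# duration lines quoted above.
LOG_PATTERN = re.compile(
    r'copyTrans: (?P<copied>\d+) (?P<locale>\S+) translations for document '
    r'"(?P<doc>[^"]+)" - duration: (?P<seconds>\d+) s'
)


def messages_per_second(log_line, message_count):
    """Return (locale, messages processed per second) for one log line."""
    match = LOG_PATTERN.search(log_line)
    if match is None:
        raise ValueError("not a copyTrans duration line")
    return match.group("locale"), message_count / int(match.group("seconds"))


line = (
    '12:46:33,012 INFO  [org.zanata.service.impl.CopyTransServiceImpl] '
    '(DefaultQuartzScheduler_Worker-2) copyTrans: 0 zh-CN translations '
    'for document "gcc/po/gcc" - duration: 2555 s'
)
locale, rate = messages_per_second(line, 9657)
print(f"{locale}: {rate:.2f} msg/s")  # prints "zh-CN: 3.78 msg/s"
```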

Comment 8 Ding-Yi Chen 2014-06-05 06:54:17 UTC
Tested with Zanata 3.5.0-SNAPSHOT (git-server-3.4.1-62-g6551e0d) which does not include the fix:

16:39:12,427 INFO  [org.zanata.service.impl.CopyTransServiceImpl] (DefaultQuartzScheduler_Worker-2) copyTrans: 0 zh-CN translations for document "gcc/po/gcc" - duration: 2423 s


Note that this test was on 3.5.0-SNAPSHOT, a different branch.

Comment 9 Carlos Munoz 2014-06-09 23:01:10 UTC
So, essentially there's no difference?

(In reply to Ding-Yi Chen from comment #8)
> Tested with Zanata 3.5.0-SNAPSHOT (git-server-3.4.1-62-g6551e0d) which does
> not include the fix:
> 
> 16:39:12,427 INFO  [org.zanata.service.impl.CopyTransServiceImpl]
> (DefaultQuartzScheduler_Worker-2) copyTrans: 0 zh-CN translations for
> document "gcc/po/gcc" - duration: 2423 s
> 
> 
> Note that the test on 3.5.0-SNAPSHOT, different branch.

Comment 10 Patrick Huang 2014-06-09 23:22:33 UTC
With a large enough data set, yes, the current fix makes no difference. Our latest findings lead us to believe the cache is more likely to be the culprit.
(In reply to Carlos Munoz from comment #9)
> So, essentially there's no difference?
> 
> (In reply to Ding-Yi Chen from comment #8)
> > Tested with Zanata 3.5.0-SNAPSHOT (git-server-3.4.1-62-g6551e0d) which does
> > not include the fix:
> > 
> > 16:39:12,427 INFO  [org.zanata.service.impl.CopyTransServiceImpl]
> > (DefaultQuartzScheduler_Worker-2) copyTrans: 0 zh-CN translations for
> > document "gcc/po/gcc" - duration: 2423 s
> > 
> > 
> > Note that the test on 3.5.0-SNAPSHOT, different branch.

Comment 11 Patrick Huang 2014-06-17 02:16:01 UTC
11:48:34,096 INFO  [org.zanata.service.impl.CopyTransServiceImpl] (DefaultQuartzScheduler_Worker-2) copyTrans start: document "gcc/po/gcc"
12:07:42,845 INFO  [org.zanata.service.impl.CopyTransServiceImpl] (DefaultQuartzScheduler_Worker-2) copyTrans: 0 fr translations for document "gcc/po/gcc" - duration: 1149 s

We now have an improvement of over 50%.
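
As a rough check of that figure, the 1149 s duration can be compared with the per-locale durations reported earlier for the same document (2555 s in comment 7 and 2423 s in comment 8). This is only a sketch: the earlier runs were for zh-CN and this one is for fr, possibly on different machines, so the comparison is indicative rather than exact. The function name `reduction` is made up for this example.

```python
def reduction(before_seconds, after_seconds):
    """Fractional reduction in duration going from before to after."""
    return 1 - after_seconds / before_seconds


new_duration = 1149  # fr, this comment
for label, old_duration in (("comment 7", 2555), ("comment 8", 2423)):
    pct = reduction(old_duration, new_duration) * 100
    print(f"vs {label}: {pct:.1f}% less time")
# prints:
# vs comment 7: 55.0% less time
# vs comment 8: 52.6% less time
```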

Comment 12 Patrick Huang 2014-06-19 22:42:01 UTC
16:04:21,394 INFO  [org.zanata.service.impl.CopyTransServiceImpl] (DefaultQuartzScheduler_Worker-1) copyTrans start: document "gcc/po/gcc"


16:23:30,190 INFO  [org.zanata.service.impl.CopyTransServiceImpl] (DefaultQuartzScheduler_Worker-1) copyTrans: 6 de translations for document "gcc/po/gcc" - duration: 1149 s
16:42:08,828 INFO  [org.zanata.service.impl.CopyTransServiceImpl] (DefaultQuartzScheduler_Worker-1) copyTrans: 0 zh translations for document "gcc/po/gcc" - duration: 1119 s
17:00:50,631 INFO  [org.zanata.service.impl.CopyTransServiceImpl] (DefaultQuartzScheduler_Worker-1) copyTrans: 2 ja translations for document "gcc/po/gcc" - duration: 1122 s
17:19:03,680 INFO  [org.zanata.service.impl.CopyTransServiceImpl] (DefaultQuartzScheduler_Worker-1) copyTrans: 2 pl translations for document "gcc/po/gcc" - duration: 1093 s
17:37:30,717 INFO  [org.zanata.service.impl.CopyTransServiceImpl] (DefaultQuartzScheduler_Worker-1) copyTrans: 0 en-US translations for document "gcc/po/gcc" - duration: 1107 s
17:55:56,921 INFO  [org.zanata.service.impl.CopyTransServiceImpl] (DefaultQuartzScheduler_Worker-1) copyTrans: 1 de-DE translations for document "gcc/po/gcc" - duration: 1106 s
18:14:01,691 INFO  [org.zanata.service.impl.CopyTransServiceImpl] (DefaultQuartzScheduler_Worker-1) copyTrans: 0 it translations for document "gcc/po/gcc" - duration: 1085 s
18:32:14,236 INFO  [org.zanata.service.impl.CopyTransServiceImpl] (DefaultQuartzScheduler_Worker-1) copyTrans: 0 es translations for document "gcc/po/gcc" - duration: 1093 s
18:50:41,244 INFO  [org.zanata.service.impl.CopyTransServiceImpl] (DefaultQuartzScheduler_Worker-1) copyTrans: 0 zh-Hant-TW translations for document "gcc/po/gcc" - duration: 1107 s
19:08:44,455 INFO  [org.zanata.service.impl.CopyTransServiceImpl] (DefaultQuartzScheduler_Worker-1) copyTrans: 0 uk translations for document "gcc/po/gcc" - duration: 1083 s
19:26:54,246 INFO  [org.zanata.service.impl.CopyTransServiceImpl] (DefaultQuartzScheduler_Worker-1) copyTrans: 0 en translations for document "gcc/po/gcc" - duration: 1090 s
19:26:54,250 INFO  [org.zanata.service.impl.CopyTransServiceImpl] (DefaultQuartzScheduler_Worker-1) copyTrans finished: document "gcc/po/gcc"

Comment 13 Ding-Yi Chen 2014-06-20 01:17:58 UTC
Tested with Zanata 3.4.2-SNAPSHOT (git-server-3.4.1-43-g2f664d4)

The first is very fast: 9657/1124 = 8.59 msg/s
The second and later ones are slow, about 4 msg/s

Perhaps some heuristic logic can be applied here.
When only one version exists and all the copyTrans options are set to "Don't Copy", there is nothing to copy from, so it should not take thousands of seconds to process this.

This will definitely help for the first version push.

Comment 14 Patrick Huang 2014-06-24 01:31:03 UTC
The above suggestion is implemented. Now, if project mismatch or docId mismatch is set to reject, copyTrans is skipped when there is only one version. It also skips locales that don't have any translations.
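
The two shortcuts described in this comment could be sketched roughly as follows. This is an illustration of the described behaviour under stated assumptions, not the actual Zanata code: the names `CopyTransOptions`, `REJECT`, `should_skip_copy_trans` and `locales_worth_processing` are all hypothetical, and the inputs (a version count and a per-locale translation count) are assumed to be available cheaply.

```python
from dataclasses import dataclass

REJECT = "REJECT"  # hypothetical label for the "Don't Copy" setting


@dataclass
class CopyTransOptions:
    """Hypothetical stand-in for the copyTrans mismatch settings."""
    project_mismatch: str
    doc_id_mismatch: str


def should_skip_copy_trans(version_count, options):
    """Skip the whole run when there is only one version and either
    the project-mismatch or the docId-mismatch option rejects the copy."""
    only_one_version = version_count <= 1
    rejects_mismatch = REJECT in (
        options.project_mismatch,
        options.doc_id_mismatch,
    )
    return only_one_version and rejects_mismatch


def locales_worth_processing(translation_counts):
    """Keep only locales that have at least one existing translation."""
    return [loc for loc, count in translation_counts.items() if count > 0]


options = CopyTransOptions(project_mismatch=REJECT, doc_id_mismatch="IGNORE")
print(should_skip_copy_trans(1, options))  # True: first version push
print(should_skip_copy_trans(2, options))  # False: an older version exists
print(locales_worth_processing({"de": 6, "zh": 0, "ja": 2}))  # ['de', 'ja']
```

The point of both checks is that they are cheap compared with scanning every TextFlow, so they are worth evaluating before any per-message work starts.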

Comment 15 Ding-Yi Chen 2014-06-24 06:02:51 UTC
VERIFIED with Zanata 3.4.2-SNAPSHOT (git-server-3.4.1-47-g88e8fe3),
as it no longer spends needless time checking unrelated TextFlows.

Comment 16 Ding-Yi Chen 2014-07-28 02:18:45 UTC
*** Bug 1120034 has been marked as a duplicate of this bug. ***

