Bug 1077439
Summary: | RFE: Use lucene indexes to do Copy Trans. | ||
---|---|---|---|
Product: | [Retired] Zanata | Reporter: | Carlos Munoz <camunoz> |
Component: | Component-CopyTrans | Assignee: | Alex Eng <aeng> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Zanata-QA Mailling List <zanata-qa> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.3 | CC: | camunoz, damason, dchen, djansen, sflaniga, ykatabam, zanata-bugs |
Target Milestone: | --- | ||
Target Release: | 3.4 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | 3.4.0-SNAPSHOT (git-server-3.3.1-244-gcebf76a) | Doc Type: | Bug Fix |
Doc Text: | Story Points: | 8 | |
Clone Of: | Environment: | ||
Last Closed: | 2014-07-17 06:39:36 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1088122 |
Description
Carlos Munoz
2014-03-18 02:47:25 UTC
This combines the backend for CopyTrans and TM Merge.

Concerns about how obsolete documents will be handled: CopyTrans doesn't ignore them at the moment, TM Merge does.

The hash column could still be used for 100% matches (currently only used by CopyTrans).

# Testing

Will need to make sure test data is indexed before tests are run. Could test different aspects separately:
- test that indexes can be generated properly
- test that indexes are used properly during searches
- test that indexes are updated when new data is added

Secondary: camunoz

The test should also cover CJK languages, both Han characters and punctuation, as we did have bugs in Lucene search with CJK before.

(In reply to Ding-Yi Chen from comment #3)
> The test should also cover CJK languages, both Han character and
> punctuation, as we did have bugs on lucence search with CJK before.

Thanks. Do you know the bug numbers? We should make sure we have tests.

Development branch is here: https://github.com/zanata/zanata-server/commits/rhbz1077439

I don't think it is recorded in Bugzilla; it was discovered in Translate Editor search and fixed straight away. Yet I can come up with some test cases like:
"性": U+6027 CJK UNIFIED IDEOGRAPH-6027, -ity, nature, character
"。": U+3002 IDEOGRAPHIC FULL STOP
"、": U+3001 IDEOGRAPHIC COMMA (used to separate items in a list)

Pull request is here: https://github.com/zanata/zanata-server/pull/418

Although Hibernate Search/Lucene is indexing the translated contents, we won't be using those fields in the CopyTrans query. So those CJK characters shouldn't give us any trouble unless they appear in source contents. (And right now we are querying by contentHash, which doesn't depend on CJK-compatible Lucene Analyzers.)

Pull request: https://github.com/zanata/zanata-server/pull/418

We've implemented Lucene search for CopyTrans (same as TM Merge), but it is disabled at the moment due to performance. This pull request is now mainly a refactoring of the unit tests.
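For anyone following the thread, here is a rough sketch of why an exact match by content hash is unaffected by the choice of Lucene analyzer, and of how the obsolete-document question could be exposed as a flag. It is an illustration in Python, not Zanata's code: `content_hash`, `ExactMatchIndex` and `include_obsolete` are hypothetical names, and SHA-256 is an assumption, since the thread does not say which hash the contentHash column uses.

```python
import hashlib


def content_hash(source_text: str) -> str:
    """Hypothetical stand-in for the contentHash column (SHA-256 is an assumption)."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()


class ExactMatchIndex:
    """In-memory sketch of a 100% match lookup keyed by the hash of the source text."""

    def __init__(self):
        # hash of source text -> list of (translation, obsolete flag)
        self._by_hash = {}

    def add(self, source_text, translation, obsolete=False):
        key = content_hash(source_text)
        self._by_hash.setdefault(key, []).append((translation, obsolete))

    def find(self, source_text, include_obsolete=False):
        # The lookup compares whole-string hashes, so no tokenizing or
        # analysis of CJK characters is involved.
        entries = self._by_hash.get(content_hash(source_text), [])
        return [t for t, obs in entries if include_obsolete or not obs]


index = ExactMatchIndex()
index.add("性", "nature")
index.add("A、B。", "A, B.")
index.add("Old text", "Texto antiguo", obsolete=True)
```

With `include_obsolete=False` the lookup skips obsolete entries, as TM Merge is described as doing; `include_obsolete=True` corresponds to the current CopyTrans behaviour described in the first comment.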
Verified