Bug 988204 - RFE: Efficiently push and pull recently changed documents
Product: Zanata
Classification: Community
Component: Component-API
Hardware: Unspecified Unspecified
Priority: unspecified, Severity: medium
Assigned To: Michelle Kim
Zanata-QA Mailling List
Depends On:
Reported: 2013-07-25 01:53 EDT by Matthew Casperson
Modified: 2015-07-28 22:46 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: 13
Clone Of:
Last Closed: 2015-07-28 22:46:31 EDT
Type: Bug

Description Matthew Casperson 2013-07-25 01:53:44 EDT
Currently the only way to sync documents with Zanata is to do a full sync of a project. There is no way to search for documents that have been changed within a specified period. As more content is included in Zanata, performing a full sync is going to become less practical.

It would be great if we could sync a subset of documents based on whether or not they have been edited, either by searching for documents based on their last edited time, or by receiving notifications through a queue.
Comment 1 Matthew Casperson 2014-01-13 19:48:44 EST
After talking to Carlos we could implement ETags to save some bandwidth.
Comment 2 Carlos Munoz 2014-02-13 18:46:23 EST
Assigning to Damian for triage.

This is something we need sooner rather than later.
Comment 3 Damian Jansen 2014-02-13 19:24:43 EST
Sounds like a great idea.
Comment 4 Sean Flanigan 2014-02-27 00:40:41 EST
I think this bug might be misnamed, because Zanata already pushes and pulls individual documents.  The problem is knowing *which* individual documents to push or pull.

And is it really about push and pull, or just about making pull more efficient?  (Working out what to push is mainly the client's problem, although storing ETags returned from Zanata on PUT could help here too.)

A queue sounds like it might be a good solution.  We should see if it would be feasible to expose a HornetQ queue for updated documents/locales within a project version.
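
As a rough illustration of the ETag idea mentioned above, here is a minimal client-side sketch in Python. It assumes the server returns an ETag header with each document and answers 304 Not Modified to a matching If-None-Match header; the class and method names are hypothetical and not part of any Zanata client.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ETagCache:
    """Remembers the last ETag and body seen for each document URL (hypothetical helper)."""
    entries: Dict[str, Tuple[str, bytes]] = field(default_factory=dict)

    def request_headers(self, url: str) -> Dict[str, str]:
        # Send If-None-Match only when a copy of the document is already cached.
        cached = self.entries.get(url)
        return {"If-None-Match": cached[0]} if cached else {}

    def handle_response(self, url: str, status: int,
                        etag: Optional[str], body: bytes) -> bytes:
        if status == 304:
            # Unchanged on the server: reuse the cached body, no content was transferred.
            return self.entries[url][1]
        if status == 200 and etag:
            # New or changed document: remember its ETag for the next request.
            self.entries[url] = (etag, body)
        return body
```

A client would attach `request_headers(url)` to each GET and pass the status, ETag and body of the reply to `handle_response`. This saves bandwidth per document, but the client still has to ask about every document, which is why it does not replace a way of finding out which documents changed.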
Comment 5 Carlos Munoz 2014-03-02 18:09:23 EST
After discussing with Sean and a bit of research, HornetQ offers a REST API for interacting with queues, which would avoid any dependence on JMS APIs. It also allows for subscribing consumers (basically URLs that HornetQ will push to upon receipt of a message). We might need to restrict the number of consumers per queue, but all in all it sounds like a good solution.
Comment 6 Sean Flanigan 2014-03-02 20:46:11 EST
I don't think we need to avoid dependence on JMS APIs, but being able to expose the queues over REST would give us another option, and may help with firewalls.
Comment 7 Carlos Munoz 2014-03-02 21:10:15 EST
We don't need to avoid that dependence, but I think we definitely want to. Exposing this solely as a JMS API would make it difficult for non-Java clients to make use of these 'notifications'. Even if we decided (for some reason) not to go with HornetQ in the end, we should take a cue from their API and try to implement it as a RESTful endpoint.
Comment 8 Sean Flanigan 2014-03-06 01:02:19 EST
Here are the docs for HornetQ's REST API:

Incidentally, the HornetQ docs mention the "Accept-Wait" header, which could be useful in a few other places I can think of.

But perhaps to be more RESTful (and for integration with other tools like RSS readers or Yahoo Pipes) we should think about publishing something like an Atom feed of changed documents: http://answers.oreilly.com/topic/2153-rest-in-practice-how-to-use-atom-for-event-driven-systems/
Comment 9 Sean Flanigan 2014-03-11 00:40:00 EDT
Based on our discussion today:

1. Zanata should expose a REST query resource which returns a list of documents changed since a specified date. 

Perhaps something like this:
where the three parameters are optional, but version may only be specified if project is specified.  Suggestions welcome for the actual URL.

The result could be returned as an Atom feed, where each entry includes a link to a REST resource for a translated document, plus some metadata [1] to identify the project slug, version slug, docName and locale, plus perhaps a link to let humans view the document in Zanata's editor or download it.

2. As an extension, Zanata might publish a HornetQ topic which pushes newly changed document IDs.  (A subscriber of this topic would still need the above REST query resource for initial synchronisation or synchronisation after an extended disconnection.)  The advantage of the topic is that changes would be visible immediately to the subscriber, and the synchronisation load could be spread throughout the day.

[1] https://tools.ietf.org/html/rfc4287#section-6.4.1
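
To make point 1 concrete, here is a minimal client-side sketch in Python. It is only an illustration under assumptions: the `/changed-docs` path and the `project`, `version` and `since` parameter names are hypothetical placeholders, not the URL proposed in this comment, and the feed is assumed to be plain Atom (RFC 4287) with one link per entry.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def changed_docs_url(base: str, since: Optional[datetime] = None,
                     project: Optional[str] = None,
                     version: Optional[str] = None) -> str:
    """Build a query URL. Path and parameter names are hypothetical."""
    # All three parameters are optional, but version requires project.
    if version is not None and project is None:
        raise ValueError("version may only be specified if project is specified")
    params: Dict[str, str] = {}
    if project is not None:
        params["project"] = project
    if version is not None:
        params["version"] = version
    if since is not None:
        # Expects a timezone-aware datetime; sent as a UTC timestamp.
        params["since"] = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    query = urlencode(params)
    return f"{base}/changed-docs" + (f"?{query}" if query else "")


def parse_changed_docs(feed_xml: str) -> List[Dict[str, str]]:
    """Extract title, updated time and first link from each Atom entry."""
    root = ET.fromstring(feed_xml)
    docs = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        docs.append({
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
            "href": link.get("href", "") if link is not None else "",
        })
    return docs
```

A client would remember the time of its last successful sync, pass it as `since`, and then fetch only the documents whose links appear in the returned entries. The project slug, version slug, docName and locale metadata mentioned above would need extension elements in each entry, which this sketch does not attempt to model.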
Comment 10 David Mason 2014-05-13 22:01:04 EDT
 - estimate assumes that we have a queue provider working
 - add items to a queue when there are any changes to translations (just starting with translations in the initial implementation to keep it simple).
 - interested parties poll an endpoint that will provide an Atom feed based on the queue.
 - items in the feed should be available to consumers for at least a month
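
The points above could be sketched roughly as follows in Python. This is an in-memory stand-in for illustration only, not a design for the real queue provider: the `ChangeEvent` and `ChangeFeed` names are hypothetical, events are assumed to arrive in chronological order, and 31 days is an assumed reading of "at least a month".

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Deque, List

RETENTION = timedelta(days=31)  # assumption: covers "at least a month"


@dataclass(frozen=True)
class ChangeEvent:
    project: str
    version: str
    doc_name: str
    locale: str
    changed_at: datetime


class ChangeFeed:
    """Holds recent translation changes that polling consumers can query."""

    def __init__(self) -> None:
        self._events: Deque[ChangeEvent] = deque()

    def record(self, event: ChangeEvent) -> None:
        # Called whenever a translation changes; assumes chronological order.
        self._events.append(event)

    def prune(self, now: datetime) -> None:
        # Drop events from the front once they are older than the retention window.
        while self._events and now - self._events[0].changed_at > RETENTION:
            self._events.popleft()

    def since(self, cutoff: datetime, now: datetime) -> List[ChangeEvent]:
        # What a polling consumer would receive, e.g. rendered as feed entries.
        self.prune(now)
        return [e for e in self._events if e.changed_at > cutoff]
```

A consumer that polls less often than the retention window would miss events and would have to fall back to a full sync.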
Comment 13 Damian Jansen 2015-07-13 20:20:30 EDT
Reassigned to PM
Comment 14 Zanata Migrator 2015-07-28 22:46:31 EDT
Migrated; check JIRA for bug status: http://zanata.atlassian.net/browse/ZNTA-185
