Bug 988204 - RFE: Efficiently push and pull recently changed documents
Summary: RFE: Efficiently push and pull recently changed documents
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Zanata
Classification: Retired
Component: Component-API
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Michelle Kim
QA Contact: Zanata-QA Mailling List
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-07-25 05:53 UTC by Matthew Casperson
Modified: 2015-07-29 02:46 UTC

Fixed In Version:
Story Points: 13
Clone Of:
Environment:
Last Closed: 2015-07-29 02:46:31 UTC
Embargoed:


Description Matthew Casperson 2013-07-25 05:53:44 UTC
Currently the only way to sync documents with Zanata is to do a full sync of a project. There is no way to search for documents that have been changed within a specified period. As more content is added to Zanata, performing a full sync is going to become less practical.

It would be great if we could sync a subset of documents based on whether or not they have been edited, either by searching for documents based on their last edited time, or by receiving notifications through a queue.

Comment 1 Matthew Casperson 2014-01-14 00:48:44 UTC
After talking to Carlos, we could implement ETags to save some bandwidth.
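
A minimal sketch of how a client could use ETags this way, assuming Zanata's document resources returned an ETag header. The function names and the example URL are hypothetical and are not part of Zanata's actual client API. The client remembers the ETag from its last download and sends it back in If-None-Match; a 304 response then means the cached copy is still current and no body is transferred.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, cached_etag=None):
    """Build a GET request that lets the server skip the body if nothing changed."""
    request = urllib.request.Request(url, method="GET")
    if cached_etag:
        # Standard HTTP conditional request header.
        request.add_header("If-None-Match", cached_etag)
    return request


def fetch_if_changed(url, cached_etag=None, opener=urllib.request.urlopen):
    """Return (body, etag). body is None when the server answers 304 Not Modified."""
    request = build_conditional_request(url, cached_etag)
    try:
        with opener(request) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError; treat it as "cached copy is current".
        if err.code == 304:
            return None, cached_etag
        raise
```

The `opener` argument is only there so the function can be exercised without a live server.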

Comment 2 Carlos Munoz 2014-02-13 23:46:23 UTC
Assigning to Damian for triage.

This is something we need sooner rather than later.

Comment 3 Damian Jansen 2014-02-14 00:24:43 UTC
Sounds like a great idea.

Comment 4 Sean Flanigan 2014-02-27 05:40:41 UTC
I think this bug might be misnamed, because Zanata already pushes and pulls individual documents. The problem is knowing *which* individual documents to push or pull.

And is it really about push and pull, or just about making pull more efficient?  (Working out what to push is mainly the client's problem, although storing ETags returned from Zanata on PUT could help here too.)

A queue sounds like it might be a good solution.  We should see if it would be feasible to expose a HornetQ queue for updated documents/locales within a project version.

Comment 5 Carlos Munoz 2014-03-02 23:09:23 UTC
After discussing with Sean and doing a bit of research, I found that HornetQ offers a REST API for interacting with queues, which would avoid any dependence on JMS APIs. It also allows subscribing consumers (basically URLs that HornetQ will push to upon receipt of a message). We might need to restrict the number of consumers per queue, but all in all it sounds like a good solution.

Comment 6 Sean Flanigan 2014-03-03 01:46:11 UTC
I don't think we need to avoid dependence on JMS APIs, but being able to expose the queues over REST would give us another option, and may help with firewalls.

Comment 7 Carlos Munoz 2014-03-03 02:10:15 UTC
We don't need to avoid that dependence, but I think we definitely want to. Exposing this solely as a JMS API would make it difficult for non-Java clients to make use of these 'notifications'. Even if we decided (for some reason) not to go with HornetQ in the end, we should take a cue from their API and try to implement it as a RESTful endpoint.

Comment 8 Sean Flanigan 2014-03-06 06:02:19 UTC
Here are the docs for HornetQ's REST API:
  http://docs.jboss.org/hornetq/2.3.0.Final/docs/user-manual/html/rest.html#message-pull

Incidentally, the HornetQ docs mention the "Accept-Wait" header, which could be useful in a few other places I can think of.

But perhaps to be more RESTful (and for integration with other tools like RSS readers or Yahoo Pipes) we should think about publishing something like an Atom feed of changed documents: http://answers.oreilly.com/topic/2153-rest-in-practice-how-to-use-atom-for-event-driven-systems/

Comment 9 Sean Flanigan 2014-03-11 04:40:00 UTC
Based on our discussion today:

1. Zanata should expose a REST query resource which returns a list of documents changed since a specified date. 

Perhaps something like this:
  http://zanata.example.com/rest/updated_translations?project=${projectSlug}&version=${versionSlug}&locale=${locale}
where the three parameters are optional, but version may only be specified if project is specified.  Suggestions welcome for the actual URL.

The result could be returned as an Atom feed, where each entry includes a link to a REST resource for a translated document, plus some metadata [1] to identify the project slug, version slug, docName and locale, plus perhaps a link to let humans view the document in Zanata's editor or download it.

2. As an extension, Zanata might publish a HornetQ topic which pushes newly changed document IDs.  (A subscriber of this topic would still need the above REST query resource for initial synchronisation or synchronisation after an extended disconnection.)  The advantage of the topic is that changes would be visible immediately to the subscriber, and the synchronisation load could be spread throughout the day.

[1] https://tools.ietf.org/html/rfc4287#section-6.4.1
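
A rough sketch of how a client might use the query resource from point 1. The `since` parameter name, its timestamp format, and the exact feed layout are assumptions, since the actual URL is still open for suggestions. The sketch builds the query URL, enforces the rule that version requires project, and pulls the document links out of a returned Atom feed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace from RFC 4287


def updated_translations_url(base, since, project=None, version=None, locale=None):
    """Build the query URL; project, version and locale are optional filters."""
    if version is not None and project is None:
        raise ValueError("version may only be specified if project is specified")
    params = {"since": since}  # hypothetical parameter name
    for name, value in (("project", project), ("version", version), ("locale", locale)):
        if value is not None:
            params[name] = value
    return base + "/rest/updated_translations?" + urlencode(params)


def changed_document_links(atom_xml):
    """Return (title, href) for every entry in an Atom feed of changed documents."""
    feed = ET.fromstring(atom_xml)
    return [
        (entry.findtext(ATOM + "title"), entry.find(ATOM + "link").get("href"))
        for entry in feed.iter(ATOM + "entry")
    ]
```

A client would then fetch each returned href to pull only the documents that changed, instead of doing a full sync.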

Comment 10 David Mason 2014-05-14 02:01:04 UTC
 - The estimate assumes that we have a queue provider working.
 - Add items to a queue whenever translations change (starting with translations only in the initial implementation, to keep it simple).
 - Interested parties poll an endpoint that provides an Atom feed based on the queue.
 - Items in the feed should remain available to consumers for at least a month.
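
The points above could be sketched roughly as follows. This is an in-memory stand-in for the real queue provider, not a proposed implementation: the class name, the 31-day retention value, and the event fields are assumptions. It records translation changes in time order, drops entries older than the retention window, and lets a polling consumer ask for everything newer than its last check.

```python
from collections import deque
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=31)  # assumed value for "at least a month"


class ChangeFeed:
    """Holds translation change events, oldest first, for polling consumers."""

    def __init__(self, retention=RETENTION):
        self.retention = retention
        self._events = deque()  # tuples of (timestamp, doc_id, locale)

    def record(self, doc_id, locale, when):
        """Add a change event; events are assumed to arrive in time order."""
        self._events.append((when, doc_id, locale))

    def prune(self, now):
        """Drop events that have fallen out of the retention window."""
        cutoff = now - self.retention
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()

    def entries_since(self, since, now):
        """Return retained events strictly newer than the consumer's last poll."""
        self.prune(now)
        return [event for event in self._events if event[0] > since]
```

A consumer that has been disconnected for longer than the retention window would miss events and would need to fall back to a full sync.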

Comment 13 Damian Jansen 2015-07-14 00:20:30 UTC
Reassigned to PM

Comment 14 Zanata Migrator 2015-07-29 02:46:31 UTC
Migrated; check JIRA for bug status: http://zanata.atlassian.net/browse/ZNTA-185

