Bug 1460701 - [RFE] Add support to search jobs by correlation_id
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: Search-Backend
Version: future
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.2.0
Target Release: ---
Assigned To: Ori Liel
QA Contact: Petr Matyáš
Keywords: FutureFeature
Depends On:
Blocks:
Reported: 2017-06-12 08:32 EDT by Benny Zlotnik
Modified: 2017-12-20 06:17 EST (History)

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-20 06:17:59 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑4.2+
pmatyas: testing_plan_complete-
mgoldboi: planning_ack+
mperina: devel_ack+
lsvaty: testing_ack+


Attachments
Search events by correlation id (15.65 KB, image/png)
2017-07-10 04:36 EDT, Eli Mesika


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 6260 None None None 2017-07-10 06:27 EDT
oVirt gerrit 6276 None None None 2017-07-10 06:27 EDT
oVirt gerrit 83370 master MERGED engine: Enable searching job by correlation id 2017-11-13 02:17 EST

Description Benny Zlotnik 2017-06-12 08:32:41 EDT
Description of problem:
Currently the correlation ID is not exposed via the REST API.
Exposing this field could later provide a way to follow up on a job's status.
Comment 1 Allon Mureinik 2017-06-12 11:16:38 EDT
IMHO, the Correlation-ID is an implementation detail.  In other words, the real requirement here is to be able to poll commands managed by CoCo in some intelligent fashion, without relying on entity polling - if correlation id is the way to do it, that's fine. If you decide to go some other way, that's fine too.
Comment 2 Juan Hernández 2017-06-26 04:15:31 EDT
Benny, I don't really understand what your need is. Can you elaborate on how you intend to use the correlation id to follow up on a job's status? It would be nice if you gave examples of the API calls that you would like to use, and how they would be combined.
Comment 3 Benny Zlotnik 2017-06-26 07:43:56 EDT
The problem which I encountered was in the live storage migration scenario, which consists of 3 jobs:
1. Create a snapshot
2. Move the disk
3. Delete the snapshot

I have added a test for this in OST:
https://github.com/oVirt/ovirt-system-tests/blob/master/basic-suite-master/test-scenarios/004_basic_sanity.py#L302

It seems that using the SDK to check the number of snapshots, the SD of the disk, and the status of the disk would be enough. However, because there is a memory lock on the disk, the fact that the 3 steps have finished successfully does not indicate that the job is completed, and this currently results in a race condition.

Currently the /jobs/:id endpoint looks like this:
<job href="/ovirt-engine/api/jobs/826193e8-26a3-426c-a42f-1bfb17b77db0" id="826193e8-26a3-426c-a42f-1bfb17b77db0">
  <actions>
    <link href="/ovirt-engine/api/jobs/826193e8-26a3-426c-a42f-1bfb17b77db0/clear" rel="clear"/>
    <link href="/ovirt-engine/api/jobs/826193e8-26a3-426c-a42f-1bfb17b77db0/end" rel="end"/>
  </actions>
  <description>
    Removing Snapshot Auto-generated for Live Storage Migration of VM vmo
  </description>
  <link href="/ovirt-engine/api/jobs/826193e8-26a3-426c-a42f-1bfb17b77db0/steps" rel="steps"/>
  <auto_cleared>true</auto_cleared>
  <end_time>2017-06-26T13:46:28.990+03:00</end_time>
  <external>false</external>
  <last_updated>2017-06-26T13:46:28.990+03:00</last_updated>
  <start_time>2017-06-26T13:45:28.372+03:00</start_time>
  <status>finished</status>
  <owner href="/ovirt-engine/api/users/593fd8dd-03c9-0239-01ee-0000000003d0" id="593fd8dd-03c9-0239-01ee-0000000003d0"/>
</job>
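
For illustration only, a client could read the status out of a job representation like the one above roughly as follows. This is a minimal sketch using Python's standard library, not the oVirt SDK; the function name `summarize_job` is hypothetical and the XML is a trimmed copy of the response quoted in this comment.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the job representation quoted in this comment.
JOB_XML = """
<job href="/ovirt-engine/api/jobs/826193e8-26a3-426c-a42f-1bfb17b77db0" id="826193e8-26a3-426c-a42f-1bfb17b77db0">
  <description>
    Removing Snapshot Auto-generated for Live Storage Migration of VM vmo
  </description>
  <end_time>2017-06-26T13:46:28.990+03:00</end_time>
  <start_time>2017-06-26T13:45:28.372+03:00</start_time>
  <status>finished</status>
</job>
"""


def summarize_job(xml_text):
    """Return (id, status, description) for a single <job> document.

    Hypothetical helper: it only reads fields visible in the quoted XML.
    """
    job = ET.fromstring(xml_text.strip())
    description = (job.findtext("description") or "").strip()
    return job.get("id"), job.findtext("status"), description


job_id, status, description = summarize_job(JOB_XML)
print(job_id, status, description)
```

Note that nothing in this representation ties the job back to the request that started it, which is the gap this RFE is about.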

After speaking with oliel I saw that the Correlation ID is passed in the response after sending a request, for instance using disk_service.move(...). If I could use it to query the jobs service, it would allow me to poll the job's status and correctly wait for its completion. I believe this could be useful for other scenarios as well.
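
The polling described here could look roughly like the sketch below. Everything in it is an assumption made for illustration: `wait_for_jobs` and `fetch_statuses` are hypothetical names, `fetch_statuses` stands in for whatever call would return the status of each job matching a correlation id, and the status string "started" is assumed to mean "still running".

```python
import time


def wait_for_jobs(fetch_statuses, timeout=60.0, interval=1.0, sleep=time.sleep):
    """Poll until every job reported by fetch_statuses() has left "started".

    fetch_statuses is a hypothetical callable returning a list of status
    strings, one per job found for the correlation id. An empty list is
    treated as "no job visible yet", so polling continues.
    """
    deadline = time.monotonic() + timeout
    while True:
        statuses = fetch_statuses()
        if statuses and all(s != "started" for s in statuses):
            return statuses
        if time.monotonic() >= deadline:
            raise TimeoutError("jobs did not complete within %.1f seconds" % timeout)
        sleep(interval)


# Usage with a fake data source standing in for the jobs service.
_responses = iter([[], ["started"], ["finished", "finished"]])
result = wait_for_jobs(lambda: next(_responses), timeout=5.0, interval=0.0,
                       sleep=lambda _seconds: None)
print(result)
```

A caveat with this approach: a flow made of several consecutive jobs (such as the three listed above) may briefly show only finished jobs before the next one is created, so a real client would likely also need to know how many jobs to expect.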
Comment 4 Daniel Erez 2017-06-27 04:40:11 EDT
I've also opened an RFE for tackling a similar issue - bug 1199011. It was closed as Moti preferred to use the existing Job polling type. But I'm not sure whether we can facilitate it in the LSM flow (because of the memory lock mentioned in comment #3).
Comment 5 Juan Hernández 2017-06-27 06:22:49 EDT
Note that the correlation id is generated by the engine only if it isn't provided by the caller. You can provide your own correlation id explicitly, with any API call:

  GET /ovirt-engine/api/whatever
  Correlation-Id: myid
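
As a sketch of the above, a client could attach its own correlation id like this. Only the `Correlation-Id` header name and the example path come from this comment; the host `engine.example.com` and the helper `build_request` are hypothetical, authentication is omitted, and the request is only constructed, not sent.

```python
import urllib.request
import uuid


def build_request(url, correlation_id=None, method="GET"):
    """Build (but do not send) a request carrying a Correlation-Id header.

    If the caller gives no id, a random UUID is generated client-side so
    that the caller always knows the id it can search for later.
    """
    cid = correlation_id or str(uuid.uuid4())
    headers = {"Correlation-Id": cid, "Accept": "application/xml"}
    return urllib.request.Request(url, headers=headers, method=method), cid


# Hypothetical engine host; the path is the placeholder used above.
req, cid = build_request("https://engine.example.com/ovirt-engine/api/whatever", "myid")
print(req.get_method(), req.full_url, cid)
```

Choosing the id on the client side means there is no need to read it back from the response.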

If I understand correctly you want to be able to find the jobs that have a certain correlation id. If we want to do so then we need more than just adding the attribute to the API job type: we also need to implement search for jobs:

  GET /ovirt-engine/api/jobs?search=correlation_id%3Dmyid

Otherwise you would need to retrieve all the jobs and then look for those that have the relevant correlation id. I think that we don't have search capability for jobs in the backend, so that would also need to be added.
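
For reference, the search URL proposed above can be built with proper escaping as in the following sketch. The helper name `jobs_search_url` and the host `engine.example.com` are hypothetical; the path and the `search=correlation_id=...` query are taken from the example in this comment, and whether the engine accepts the query depends on job search being implemented.

```python
from urllib.parse import urlencode


def jobs_search_url(base_url, correlation_id):
    """Return the jobs search URL for a correlation id, with the query escaped."""
    query = urlencode({"search": "correlation_id=" + correlation_id})
    return base_url.rstrip("/") + "/ovirt-engine/api/jobs?" + query


# Hypothetical engine host.
url = jobs_search_url("https://engine.example.com", "myid")
print(url)
```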

We can implement these two things, the new attribute and the search, but I still wonder how reliable this can be. We don't have any guarantee that a job will be created for a task. The next commit of the engine may change the way that live migration is implemented, to use a mechanism that doesn't use jobs, and then your code to check the status will silently fail. The presence of jobs for a task isn't part of the contract of the API, so you had better avoid relying on them.
Comment 6 Juan Hernández 2017-06-27 09:39:05 EDT
Allon, what would be the right way to poll for live storage migration completion with the current API?
Comment 7 Allon Mureinik 2017-06-29 10:30:13 EDT
(In reply to Juan Hernández from comment #6)
> Allon, what would be the right way to poll for live storage migration
> completion with the current API?

I don't think there is one - hence this BZ.
I'm not sure correlation ID is the way to go, but theoretically, any long running operation should be a job, regardless of whether it's backed by an SPM task or not.
Ideally, I'd like the return value of an action to hold the job id, and be able to poll it until it completes.
Comment 8 Juan Hernández 2017-07-06 06:57:11 EDT
So, we need to add the capability to search based on "correlation_id". That needs to be implemented in the backend and the API won't need to be changed. Benny, please confirm that will solve your issue. Then we can move the bug to the backend search component.
Comment 9 Benny Zlotnik 2017-07-09 05:03:37 EDT
Confirming
Comment 10 Eli Mesika 2017-07-10 04:36 EDT
Created attachment 1295730 [details]
Search events by correlation id
Comment 11 Eli Mesika 2017-07-10 04:37:45 EDT
AFAIK searching events by correlation_id is already supported (see attached screenshot).

I think that we should close this RFE. Martin?
Comment 12 Eli Mesika 2017-07-10 05:19:49 EDT
This RFE was actually added long ago by [1]; the correlation id is also supposed to be exposed to the API.


[1] https://gerrit.ovirt.org/#/q/topic:search_by_correlation_id+(status:open+OR+status:merged)
Comment 13 Martin Perina 2017-07-10 06:26:24 EDT
Based on the above, targeting to 4.2 and moving to MODIFIED
Comment 14 Allon Mureinik 2017-07-10 06:43:00 EDT
Eli/Martin, someone is missing something here (possibly me, of course). Those patches are about searching **audit logs**.

The requirement here is to search **jobs** as a mechanism to monitor long running operations (e.g. live storage migration).
Could you either explain how I can use the current REST API/SDK to do so, or move this RFE back to NEW/ASSIGNED? 

Thanks!
Comment 15 Martin Perina 2017-07-17 05:57:59 EDT
After offline discussion fixing the title and moving back to NEW
Comment 16 Eli Mesika 2017-07-17 15:19:21 EDT
(In reply to Martin Perina from comment #15)
> After offline discussion fixing the title and moving back to NEW

This will require adding Jobs as a searchable entity
Comment 17 Petr Matyáš 2017-11-14 10:14:27 EST
Verified on ovirt-engine-4.2.0-0.0.master.20171113223918.git25568c3.el7.centos.noarch

Events can now be searched for by correlation id through the REST API; there is no search in UI events though.
Comment 18 Sandro Bonazzola 2017-12-20 06:17:59 EST
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.
