Bug 1460701 - [RFE] Add support to search jobs by correlation_id
Summary: [RFE] Add support to search jobs by correlation_id
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Search-Backend
Version: future
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.2.0
Target Release: ---
Assignee: Ori Liel
QA Contact: Petr Matyáš
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-06-12 12:32 UTC by Benny Zlotnik
Modified: 2017-12-20 11:17 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-20 11:17:59 UTC
oVirt Team: Infra
Embargoed:
rule-engine: ovirt-4.2+
pmatyas: testing_plan_complete-
mgoldboi: planning_ack+
mperina: devel_ack+
lsvaty: testing_ack+


Attachments
Search events by correlation id (15.65 KB, image/png)
2017-07-10 08:36 UTC, Eli Mesika


Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 6260 | 0 | None | MERGED | core: adding search by correlation-id | 2021-02-10 10:09:41 UTC
oVirt gerrit | 6276 | 0 | None | MERGED | api: Adding correlation_id to Event | 2021-02-10 10:09:41 UTC
oVirt gerrit | 83370 | 0 | master | MERGED | engine: Enable searching job by correlation id | 2021-02-10 10:09:41 UTC

Description Benny Zlotnik 2017-06-12 12:32:41 UTC
Description of problem:
Currently the correlation ID is not exposed via the REST API.
Exposing this field could later provide a way to follow-up on a job's status.

Comment 1 Allon Mureinik 2017-06-12 15:16:38 UTC
IMHO, the Correlation-ID is an implementation detail.  In other words, the real requirement here is to be able to poll commands managed by CoCo in some intelligent fashion, without relying on entity polling - if correlation id is the way to do it, that's fine. If you decide to go some other way, that's fine too.

Comment 2 Juan Hernández 2017-06-26 08:15:31 UTC
Benny, I don't really understand what you need. Can you elaborate on how you intend to use the correlation id to follow up on a job's status? It would be nice if you could give examples of the API calls that you would like to use, and how they would be combined.

Comment 3 Benny Zlotnik 2017-06-26 11:43:56 UTC
The problem which I encountered was in the live storage migration scenario, which consists of 3 jobs:
1. Create a snapshot
2. Move the disk
3. Delete the snapshot

I have added a test for this in OST:
https://github.com/oVirt/ovirt-system-tests/blob/master/basic-suite-master/test-scenarios/004_basic_sanity.py#L302

It would seem that using the SDK to check the number of snapshots, the storage domain (SD) of the disk, and the status of the disk would be enough. However, because there is a memory lock on the disk, the fact that the 3 steps have finished successfully does not indicate that the job is completed, and this currently results in a race condition.

Currently the /jobs/:id endpoint looks like this:
<job href="/ovirt-engine/api/jobs/826193e8-26a3-426c-a42f-1bfb17b77db0" id="826193e8-26a3-426c-a42f-1bfb17b77db0">
  <actions>
    <link href="/ovirt-engine/api/jobs/826193e8-26a3-426c-a42f-1bfb17b77db0/clear" rel="clear"/>
    <link href="/ovirt-engine/api/jobs/826193e8-26a3-426c-a42f-1bfb17b77db0/end" rel="end"/>
  </actions>
  <description>
    Removing Snapshot Auto-generated for Live Storage Migration of VM vmo
  </description>
  <link href="/ovirt-engine/api/jobs/826193e8-26a3-426c-a42f-1bfb17b77db0/steps" rel="steps"/>
  <auto_cleared>true</auto_cleared>
  <end_time>2017-06-26T13:46:28.990+03:00</end_time>
  <external>false</external>
  <last_updated>2017-06-26T13:46:28.990+03:00</last_updated>
  <start_time>2017-06-26T13:45:28.372+03:00</start_time>
  <status>finished</status>
  <owner href="/ovirt-engine/api/users/593fd8dd-03c9-0239-01ee-0000000003d0" id="593fd8dd-03c9-0239-01ee-0000000003d0"/>
</job>
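
For illustration only, here is a minimal Python sketch of how a client might read such a job document. It uses only the standard library and a trimmed copy of the XML above; the helper name job_summary is made up for this example and is not part of the SDK. It mainly shows that the representation carries a status but no field that ties the job back to the request that started it.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the job representation shown above.
JOB_XML = """
<job href="/ovirt-engine/api/jobs/826193e8-26a3-426c-a42f-1bfb17b77db0" id="826193e8-26a3-426c-a42f-1bfb17b77db0">
  <description>
    Removing Snapshot Auto-generated for Live Storage Migration of VM vmo
  </description>
  <auto_cleared>true</auto_cleared>
  <end_time>2017-06-26T13:46:28.990+03:00</end_time>
  <external>false</external>
  <start_time>2017-06-26T13:45:28.372+03:00</start_time>
  <status>finished</status>
</job>
"""


def job_summary(xml_text):
    """Return id, status, description and child element names of one <job>.

    Hypothetical helper for illustration; not an SDK function.
    """
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),
        "status": root.findtext("status"),
        "description": (root.findtext("description") or "").strip(),
        # Names of the direct child elements, in document order.
        "fields": [child.tag for child in root],
    }


summary = job_summary(JOB_XML)
print(summary["id"], summary["status"])
print(summary["fields"])
```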

After speaking with oliel, I saw that the Correlation ID is passed in the response after sending a request using disk_service.move(...), for instance. If I could use it to query the jobs service, it would allow me to poll the job's status and correctly wait for its completion. I believe this could be useful for other scenarios as well.
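
A rough sketch of the polling loop I have in mind, in plain Python. Everything here is an assumption for illustration: fetch_jobs is a hypothetical callable that would return the status strings of the jobs carrying a given correlation id (no such query exists in the API at the time of writing), and "finished" is the only status value taken from the output above.

```python
import time


def wait_for_jobs(fetch_jobs, correlation_id, timeout=600.0, interval=5.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until all jobs found for correlation_id report 'finished'.

    fetch_jobs is a hypothetical callable (not an SDK function): it takes a
    correlation id and returns a list of job status strings.
    Raises TimeoutError if the jobs do not all finish within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        statuses = fetch_jobs(correlation_id)
        # An empty list means no job has been registered yet, so keep waiting.
        if statuses and all(status == "finished" for status in statuses):
            return statuses
        if clock() >= deadline:
            raise TimeoutError(
                "jobs for correlation id %r did not finish" % correlation_id)
        sleep(interval)


# In-memory stand-in for the engine: each call returns the next snapshot.
snapshots = iter([[], ["started"], ["finished", "started"],
                  ["finished", "finished"]])
result = wait_for_jobs(lambda cid: next(snapshots), "myid",
                       sleep=lambda seconds: None)
print(result)
```

Note that a flow made of several jobs that start one after another (such as the 3 steps above) could look finished between two steps, so a real test would probably still need to know how many jobs to expect.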

Comment 4 Daniel Erez 2017-06-27 08:40:11 UTC
I've also opened an RFE for tackling a similar issue - bug 1199011. It was closed, as Moti preferred to use the existing Job polling type. But I'm not sure whether we can use it in the LSM flow (because of the memory lock mentioned in comment #3).

Comment 5 Juan Hernández 2017-06-27 10:22:49 UTC
Note that the correlation id is generated by the engine only if it isn't provided by the caller. You can provide your own correlation id explicitly, with any API call:

  GET /ovirt-engine/api/whatever
  Correlation-Id: myid
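
As a small illustration, this is roughly how a client could attach its own id using only the Python standard library. The host name is a placeholder and build_request is a name made up for this sketch; the request is only built here, not sent.

```python
import urllib.request


def build_request(url, correlation_id):
    """Build (but do not send) a GET request with a caller-chosen Correlation-Id.

    Hypothetical helper for illustration only.
    """
    request = urllib.request.Request(url, method="GET")
    request.add_header("Correlation-Id", correlation_id)
    return request


# Placeholder engine address; "myid" is the id from the example above.
req = build_request("https://engine.example.com/ovirt-engine/api/whatever", "myid")

# urllib stores header names capitalized ("Correlation-id"); HTTP header
# names are case-insensitive, so this does not change the request's meaning.
print(req.get_method(), req.full_url)
print(req.header_items())
```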

If I understand correctly, you want to be able to find the jobs that have a certain correlation id. If we want to do so, then we need more than just adding the attribute to the API job type: we also need to implement search for jobs:

  GET /ovirt-engine/api/jobs?search=correlation_id%3Dmyid
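
A sketch of how a client could build that URL with the Python standard library, assuming the search were implemented as proposed. The host name is a placeholder and jobs_search_url is a name made up for this example; the point is only that the "=" inside the search expression has to be percent-encoded as %3D.

```python
from urllib.parse import parse_qs, quote, urlsplit


def jobs_search_url(base_url, correlation_id):
    """Return the proposed jobs search URL for a correlation id.

    Hypothetical helper; the jobs search itself is only a proposal here.
    """
    expression = "correlation_id=" + correlation_id
    # safe="" makes quote() encode "=" (and "/") inside the expression.
    return base_url.rstrip("/") + "/jobs?search=" + quote(expression, safe="")


url = jobs_search_url("https://engine.example.com/ovirt-engine/api", "myid")
print(url)
# → https://engine.example.com/ovirt-engine/api/jobs?search=correlation_id%3Dmyid

# Decoding the query string gives back the original search expression.
print(parse_qs(urlsplit(url).query))
```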

Otherwise you would need to retrieve all the jobs and then look for those that have the relevant correlation id. I don't think we have search capability for jobs in the backend, so that would also need to be added.

We can implement these two things, the new attribute and the search, but I still wonder how reliable this can be. We don't have any guarantee that a job will be created for a task. The next commit of the engine may change the way that live migration is implemented, to use a mechanism that doesn't use jobs, and then your code to check the status will silently fail. The presence of jobs for a task isn't part of the contract of the API, so you had better avoid relying on them.

Comment 6 Juan Hernández 2017-06-27 13:39:05 UTC
Allon, what would be the right way to poll for live storage migration completion with the current API?

Comment 7 Allon Mureinik 2017-06-29 14:30:13 UTC
(In reply to Juan Hernández from comment #6)
> Allon, what would be the right way to poll for live storage migration
> completion with the current API?

I don't think there is one - hence this BZ.
I'm not sure correlation ID is the way to go, but theoretically, any long-running operation should be a job, regardless of whether it's backed by an SPM task or not.
Ideally, I'd like the return value of an action to hold the job id, and be able to poll it until it completes.

Comment 8 Juan Hernández 2017-07-06 10:57:11 UTC
So, we need to add the capability to search based on "correlation_id". That needs to be implemented in the backend, and the API won't need to be changed. Benny, please confirm that this will solve your issue. Then we can move the bug to the backend search component.

Comment 9 Benny Zlotnik 2017-07-09 09:03:37 UTC
Confirming

Comment 10 Eli Mesika 2017-07-10 08:36:09 UTC
Created attachment 1295730 [details]
Search events by correlation id

Comment 11 Eli Mesika 2017-07-10 08:37:45 UTC
AFAIK, searching events by correlation_id is already supported (see the attached screenshot).

I think that we should close this RFE. Martin?

Comment 12 Eli Mesika 2017-07-10 09:19:49 UTC
This RFE was actually implemented long ago by [1]; the correlation id is also supposed to be exposed in the API.


[1] https://gerrit.ovirt.org/#/q/topic:search_by_correlation_id+(status:open+OR+status:merged)

Comment 13 Martin Perina 2017-07-10 10:26:24 UTC
Based on the above, targeting to 4.2 and moving to MODIFIED.

Comment 14 Allon Mureinik 2017-07-10 10:43:00 UTC
Eli/Martin, someone is missing something here (possibly me, of course). Those patches are about searching **audit logs**.

The requirement here is to search **jobs** as a mechanism to monitor long running operations (e.g. live storage migration).
Could you either explain how I can use the current REST API/SDK to do so, or move this RFE back to NEW/ASSIGNED? 

Thanks!

Comment 15 Martin Perina 2017-07-17 09:57:59 UTC
After offline discussion, fixing the title and moving back to NEW.

Comment 16 Eli Mesika 2017-07-17 19:19:21 UTC
(In reply to Martin Perina from comment #15)
> After offline discussion, fixing the title and moving back to NEW.

This will require adding Jobs as a searchable entity.

Comment 17 Petr Matyáš 2017-11-14 15:14:27 UTC
Verified on ovirt-engine-4.2.0-0.0.master.20171113223918.git25568c3.el7.centos.noarch

Events can now be searched for by correlation id through the REST API; there is no such search in the UI events, though.

Comment 18 Sandro Bonazzola 2017-12-20 11:17:59 UTC
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

