Bug 860194 - Sometimes event that export template was finished is not created
Sometimes event that export template was finished is not created
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine (Show other bugs)
3.1.0
x86_64 Linux
unspecified Severity medium
: ---
: 3.2.0
Assigned To: Arik
Barak Dagan
virt
: Reopened, TestBlocker
Depends On:
Blocks: 915809
  Show dependency treegraph
 
Reported: 2012-09-25 04:45 EDT by Jakub Libosvar
Modified: 2014-07-13 19:18 EDT (History)
14 users (show)

See Also:
Fixed In Version: sf9
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-11-21 09:27:20 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
logs (2.63 MB, application/x-bzip)
2012-09-25 04:45 EDT, Jakub Libosvar
no flags Details
engine log (99.59 KB, application/x-compressed-tar)
2013-02-20 09:20 EST, Barak Dagan
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 12432 None None None Never

  None (edit)
Description Jakub Libosvar 2012-09-25 04:45:46 EDT
Created attachment 616919 [details]
logs

Description of problem:
When export template is finished, there should be created event about this with same correlation id as start has.

Response body for action request is: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<action>
    <async>false</async>
    <grace_period>
        <expiry>10</expiry>
    </grace_period>
    <storage_domain>
        <name>ExportDomainRest</name>
    </storage_domain>
    <exclusive>false</exclusive>
    <status>
        <state>complete</state>
    </status>
</action>

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<events>
    <event id="1846" href="/api/events/1846">
        <description>Starting to export Template rest_templ1 to ExportDomainRest</description>
        <code>1164</code>
        <severity>normal</severity>
        <time>2012-09-25T07:13:19.060+02:00</time>
        <correlation_id>519</correlation_id>
        <user id="9b9002d1-ec33-4083-8a7b-31f6b8931648" href="/api/users/9b9002d1-ec33-4083-8a7b-31f6b8931648"/>
        <storage_domain id="f3e8db7f-69f9-4fcc-8564-b21b10390c91" href="/api/storagedomains/f3e8db7f-69f9-4fcc-8564-b21b10390c91"/>
        <template id="5f593530-db21-4f9f-bc0c-45b6b29a359b" href="/api/templates/5f593530-db21-4f9f-bc0c-45b6b29a359b"/>
        <data_center id="510bb763-77c3-47ac-8a0e-b42d145e8c3c" href="/api/datacenters/510bb763-77c3-47ac-8a0e-b42d145e8c3c"/>
    </event>
</events>

In above events collection should be two events - started task, finished task

Version-Release number of selected component (if applicable):
master branch revision 9bc69502cb44370a856afc359b699614144de265

How reproducible:
20%

Steps to Reproduce:
1. Export template with passed correlation id
2. After export task is completed, check events with given correlation id

Additional info:
The export was started at about 2012-09-25 07:13:18,290
Logs are attached
Comment 1 Jakub Libosvar 2012-09-25 06:39:21 EDT
For comparison, this is how <events> collection should look:
<events>
    <event id="1882" href="/api/events/1882">
        <description>Template rest_templ1 was exported successfully to ExportDomainRest</description>
        <code>1156</code>
        <severity>normal</severity>
        <time>2012-09-25T01:05:57.420+02:00</time>
        <correlation_id>519</correlation_id>
        <user id="9b9002d1-ec33-4083-8a7b-31f6b8931648" href="/api/users/9b9002d1-ec33-4083-8a7b-31f6b8931648"/>
        <storage_domain id="1be2d15f-6f4d-416f-95d7-965cb00d1a0e" href="/api/storagedomains/1be2d15f-6f4d-416f-95d7-965cb00d1a0e"/>
        <template id="8939f177-2aee-4ccd-8d60-3e7317b827b6" href="/api/templates/8939f177-2aee-4ccd-8d60-3e7317b827b6"/>
        <data_center id="422a4b9d-f3c0-485f-809c-e5c5a68a89ef" href="/api/datacenters/422a4b9d-f3c0-485f-809c-e5c5a68a89ef"/>
    </event>
    <event id="1880" href="/api/events/1880">
        <description>Starting to export Template rest_templ1 to ExportDomainRest</description>
        <code>1164</code>
        <severity>normal</severity>
        <time>2012-09-25T01:05:55.372+02:00</time>
        <correlation_id>519</correlation_id>
        <user id="9b9002d1-ec33-4083-8a7b-31f6b8931648" href="/api/users/9b9002d1-ec33-4083-8a7b-31f6b8931648"/>
        <storage_domain id="1be2d15f-6f4d-416f-95d7-965cb00d1a0e" href="/api/storagedomains/1be2d15f-6f4d-416f-95d7-965cb00d1a0e"/>
        <template id="8939f177-2aee-4ccd-8d60-3e7317b827b6" href="/api/templates/8939f177-2aee-4ccd-8d60-3e7317b827b6"/>
        <data_center id="422a4b9d-f3c0-485f-809c-e5c5a68a89ef" href="/api/datacenters/422a4b9d-f3c0-485f-809c-e5c5a68a89ef"/>
    </event>
</events>
Comment 2 Jakub Libosvar 2012-11-21 09:27:20 EST
Hasn't reproduced in last 20 tests, closing as current release
Comment 3 Barak Dagan 2013-02-20 09:20:16 EST
Created attachment 700041 [details]
engine log

Seems to happen again, SI27 - SI27.1


Response body for action request is: 
<action>
    <async>false</async>
    <grace_period>
        <expiry>10</expiry>
    </grace_period>
    <storage_domain>
        <name>ExportDomainRest</name>
    </storage_domain>
    <exclusive>false</exclusive>
    <status>
        <state>complete</state>
    </status>
</action>
 
Response code is valid: [200, 201] 
Action status is valid: ['complete'] 

Response body for QUERY request is: 
<events>
    <event href="/api/events/1796" id="1796">
        <description>Starting to export Template rest_templ1 to ExportDomainRest</description>
        <code>1164</code>
        <severity>normal</severity>
        <time>2013-02-20T00:24:46.010+02:00</time>
        <correlation_id>519</correlation_id>
        <user href="/api/users/fdfc627c-d875-11e0-90f0-83df133b58cc" id="fdfc627c-d875-11e0-90f0-83df133b58cc"/>
        <storage_domain href="/api/storagedomains/7bba6760-1789-4d6d-a0f9-48ad9e3d1947" id="7bba6760-1789-4d6d-a0f9-48ad9e3d1947"/>
        <template href="/api/templates/24e8c8a6-5fa1-469a-9aaf-3bfd3a359366" id="24e8c8a6-5fa1-469a-9aaf-3bfd3a359366"/>
        <data_center href="/api/datacenters/85086c9f-c6ab-4b5c-99de-16eefbf078b1" id="85086c9f-c6ab-4b5c-99de-16eefbf078b1"/>
    </event>
</events>
Comment 5 Arik 2013-02-21 07:21:11 EST
The problem is with the assumption that the end-action event will exist immediately after the control is gained-back from REST, when invoking an action asynchronically.

REST is monitoring the VDSM-tasks to track the action's status, and when they're done, REST return a response and the caller gets the control back.

The engine adds the end-action event after the VDSM tasks are finished and is not synchronized with the response that is returned by the REST api, thus it can lead to a race - the REST can return a respond that the action is completed, and the query on the events will be made before the end-action event is added.

It was planned to change the REST monitoring to be on the job state instead of the VDSM tasks, such change will solve this race because the response will be returned after the end-action event is added.
Comment 8 Andrew Cathrow 2013-02-25 08:29:19 EST
Arik, did you make a BZ as per comment #7
Comment 9 Arik 2013-02-25 09:45:26 EST
No, I've checked with the guys working on REST and it seems that the required infrastrcuture for monitoring the job id instead of the VDSM tasks is already there and should work, so I'll try to use it for the export template command and see if it solves this bug. so this bug can remain assigned on me.
Comment 10 Oded Ramraz 2013-02-26 02:49:01 EST
We have the same issue for remove VM command . Should we open separate bug per action or you are planning to solve this issue in some general way for all actions?

http://jenkins.qa.lab.tlv.redhat.com:8080/view/Core+Tools/view/3.1/job/3.1-automation_restapi_reg_vms_rhevh/66/
http://jenkins.qa.lab.tlv.redhat.com:8080/view/Core+Tools/view/3.1/job/3.1-automation_restapi_reg_vms_rhevh/68/

Since this issue has major impact on our automation tests results I suggest to fix it for 3.2.
Comment 11 Arik 2013-02-26 06:30:00 EST
Michael - is there a reason not to make the job-polling as the default monitoring for all commands now that we see it's working well for two commands?
Comment 14 Michael Pasternak 2013-02-27 06:50:30 EST
(In reply to comment #11)
> Michael - is there a reason not to make the job-polling as the default
> monitoring for all commands now that we see it's working well for two
> commands?

yes, the noise it will produce ..., this is obviously not 3.2 stuff.
Comment 16 Barak Dagan 2013-03-21 09:06:07 EDT
Verified on SF10.1

This error wasn't since for 3 weeks since SF9 was built
Comment 17 Itamar Heim 2013-06-11 05:46:55 EDT
3.2 has been released
Comment 18 Itamar Heim 2013-06-11 05:47:27 EDT
3.2 has been released
Comment 19 Itamar Heim 2013-06-11 05:56:42 EDT
3.2 has been released

Note You need to log in before you can comment on or make changes to this bug.