Bug 1498580

Summary: Snapshot preview failure leaves jobs running and image locked
Product: [oVirt] ovirt-engine Reporter: Ravi Nori <rnori>
Component: Backend.Core Assignee: Ravi Nori <rnori>
Status: CLOSED CURRENTRELEASE QA Contact: Michael Burman <mburman>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.2.0 CC: bugs, mburman, mkalfon, mperina
Target Milestone: ovirt-4.1.7 Keywords: Automation, AutomationBlocker, Regression
Target Release: 4.1.7.4 Flags: rule-engine: ovirt-4.1+
rule-engine: blocker+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-11-13 12:25:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
engine.log none
screenshots none

Description Ravi Nori 2017-10-04 16:37:14 UTC
Created attachment 1334377 [details]
engine.log

The job never finishes and stays stuck in the engine DB.
The image is locked forever and can't be released.
Preview VM Snapshot fails and the engine is aware of it, but the job keeps running forever. There is no timeout.

<jobs>
<job href="/ovirt-engine/api/jobs/19f3e594-a84c-406e-bd0f-5bf681179fc1" id="19f3e594-a84c-406e-bd0f-5bf681179fc1">
<actions>
<link href="/ovirt-engine/api/jobs/19f3e594-a84c-406e-bd0f-5bf681179fc1/clear" rel="clear"/>
<link href="/ovirt-engine/api/jobs/19f3e594-a84c-406e-bd0f-5bf681179fc1/end" rel="end"/>
</actions>
<description>Preview VM Snapshot snap4 of VM VM6</description>
<link href="/ovirt-engine/api/jobs/19f3e594-a84c-406e-bd0f-5bf681179fc1/steps" rel="steps"/>
<auto_cleared>true</auto_cleared>
<external>false</external>
<last_updated>2017-09-27T10:52:18.730+03:00</last_updated>
<start_time>2017-09-27T10:52:16.205+03:00</start_time>
<status>started</status>
<owner href="/ovirt-engine/api/users/586c19dc-00b9-00fa-0364-00000000012f" id="586c19dc-00b9-00fa-0364-00000000012f"/>
</job>
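
A job like the one above can be spotted mechanically. The following is a minimal Python sketch, not part of the original report: it parses a jobs listing in the same shape as the XML above and flags jobs that are still "started" but have not been updated for longer than a threshold. The function name find_stale_jobs, the one-hour threshold, and the trimmed SAMPLE document (with a closing </jobs> added so it parses) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# Trimmed copy of the job from this report; only the fields used below are kept.
SAMPLE = """<jobs>
  <job href="/ovirt-engine/api/jobs/19f3e594-a84c-406e-bd0f-5bf681179fc1"
       id="19f3e594-a84c-406e-bd0f-5bf681179fc1">
    <description>Preview VM Snapshot snap4 of VM VM6</description>
    <last_updated>2017-09-27T10:52:18.730+03:00</last_updated>
    <start_time>2017-09-27T10:52:16.205+03:00</start_time>
    <status>started</status>
  </job>
</jobs>"""


def find_stale_jobs(xml_text, now, max_age=timedelta(hours=1)):
    """Return (id, description, age) for jobs still 'started' and idle longer than max_age.

    Hypothetical helper: the threshold is arbitrary and only illustrates the check.
    """
    stale = []
    for job in ET.fromstring(xml_text).iter("job"):
        if job.findtext("status") != "started":
            continue
        # Timestamps carry a UTC offset, so the subtraction below is timezone-aware.
        last_updated = datetime.fromisoformat(job.findtext("last_updated"))
        age = now - last_updated
        if age > max_age:
            stale.append((job.get("id"), job.findtext("description"), age))
    return stale


if __name__ == "__main__":
    # Use the time this bug was filed as "now".
    filed = datetime(2017, 10, 4, 16, 37, 14, tzinfo=timezone.utc)
    for job_id, description, age in find_stale_jobs(SAMPLE, filed):
        print(job_id, description, age)
```

The sketch only reads the listing; it does not call the clear or end actions linked in the XML, since whether those apply to a non-external job like this one is not established in this report.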

- Snapshot-Preview snap4 for VM VM6 was initiated by admin@internal-authz.
- No MAC addresses left in the MAC Address Pool.
- Some MAC addresses had to be reallocated, but operation failed because of insufficient amount of free MACs.
- Failed to complete Snapshot-Preview snap4 for VM VM6.

- When a snapshot preview needs to allocate a new MAC for the VM and no MACs are left, the operation must fail. It does fail and the engine is aware of the failure, but the job runs forever, stuck in the Finalize state.
Worst of all, the image stays locked forever!

- Steps to reproduce -
1) Master 4.2
2) 2 VMs in cluster
3) MAC pool range with only one MAC address in cluster
4) Start VM1 with 1 vNIC with MAC 'z'
5) Create snapshot from VM1
6) Unplug the vNIC from VM1, give MAC 'w' (not from the pool) to VM1, and assign MAC address 'z' (the original MAC) to VM2
7) Try to preview VM1 from snapshot
Expected result - should fail
Actual result - the engine failed the operation -
- Snapshot-Preview snap4 for VM VM6 was initiated by admin@internal-authz.
- No MAC addresses left in the MAC Address Pool.
- Some MAC addresses had to be reallocated, but operation failed because of insufficient amount of free MACs.
- Failed to complete Snapshot-Preview snap4 for VM VM6.
But the job is stuck and the image stays locked!
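
The expected behavior in this scenario can be modeled with a small Python sketch. It is a toy model only: ovirt-engine is written in Java, and none of the names below (MacPool, Image, Job, preview_snapshot) come from the engine or from the actual fix. It illustrates the invariant the report asks for: when MAC reallocation fails, the job must reach a final state and the image lock must be released.

```python
from dataclasses import dataclass, field


class NoFreeMacError(Exception):
    """Raised when the pool has no free address to hand out."""


@dataclass
class MacPool:
    # Addresses in the pool range that are not currently assigned to any VM.
    free: list = field(default_factory=list)

    def allocate(self):
        if not self.free:
            raise NoFreeMacError("No MAC addresses left in the MAC Address Pool")
        return self.free.pop()


@dataclass
class Image:
    locked: bool = False


@dataclass
class Job:
    description: str
    status: str = "started"


def preview_snapshot(job, image, pool, macs_in_use, snapshot_mac):
    """Toy preview: reuse the snapshot's MAC if it is free, else take one from the pool."""
    image.locked = True
    try:
        mac = snapshot_mac if snapshot_mac not in macs_in_use else pool.allocate()
        job.status = "finished"
        return mac
    except NoFreeMacError:
        # The failure is known here, so the job must not stay in 'started'.
        job.status = "failed"
        raise
    finally:
        # Runs on success and on failure, so the image never stays locked.
        image.locked = False


if __name__ == "__main__":
    # Steps 3-6: the single pool MAC 'z' now belongs to VM2, and VM1 uses 'w'.
    job = Job("Preview VM Snapshot snap4 of VM VM6")
    image = Image()
    try:
        preview_snapshot(job, image, MacPool(free=[]), {"z", "w"}, "z")
    except NoFreeMacError as err:
        print(err, job.status, image.locked)
```

In this model the preview in step 7 still fails, as expected, but the job ends as failed and the image is unlocked, which is the opposite of the actual result described above.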

Comment 1 Red Hat Bugzilla Rules Engine 2017-10-05 13:40:03 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 2 Mor 2017-10-18 15:36:12 UTC
I'm also experiencing this problem on Red Hat Virtualization Manager Version: 4.1.7.2-0.1.el7

Comment 3 Mor 2017-10-19 07:12:45 UTC
This is also relevant for 4.2.0-0.0.master.20171013142622.git15e767c.el7.centos

Comment 4 Mor 2017-10-19 07:13:36 UTC
Created attachment 1340563 [details]
screenshots

Comment 5 Martin Perina 2017-10-19 07:33:04 UTC
(In reply to Mor from comment #3)
> This is also relevant for
> 4.2.0-0.0.master.20171013142622.git15e767c.el7.centos

The fix was merged to master on Oct 17th, so it should be included in nightly build 4.2.0-0.0.master.20171018...

For 4.1.7, this fix is included in the 4.1.7.4 build.

Comment 6 Michael Burman 2017-10-25 06:27:57 UTC
I think that BZ 1506092 should be a blocker for this bug. It can't be tested properly until that is fixed.

Comment 7 Michael Burman 2017-10-25 06:54:42 UTC
(In reply to Michael Burman from comment #6)
> I think that BZ 1506092 should be a blocker for this bug. It can't be tested
> properly until that is fixed.

BZ 1506092 should be a blocker for this bug on 4.2, as 4.1.7.4 is not affected.

Comment 8 Michael Burman 2017-10-26 08:46:16 UTC
As BZ 1506092 wasn't affecting 4.1.7, only 4.2, this report can be verified without any blockers or issues.

Comment 9 Michael Burman 2017-10-26 10:49:29 UTC
Verified on - 4.1.7.4-0.1.el7